How many times have we heard that prompt engineering is the job of the future? Ever since ChatGPT was launched, people have been experimenting with different ways of prompting the chatbot and calling it a job. Others were scared of losing their jobs to this new role. Well, it now turns out that AI can do it, and do it better than humans.
“Did anybody, but the most desperate climbers, ever honestly believe that ‘typing prompts into ChatGPT’ was ever going to be a high-paying full-time job?” asked a user on HackerNews.
In a recent study, VMware researchers found that LLMs become more unpredictable when humans start experimenting with weird prompts. Even more interesting, another research team concluded that “no human should manually optimise prompts ever again”. The best prompt engineering is done by the AI model itself.
Problems Aplenty with Manual Prompting
Rick Battle and Teja Gollapudi from VMware experimented with big and small language models and tried different prompting techniques to figure out the most effective and efficient way to do it. “It’s both surprising and irritating that trivial modifications to the prompt can exhibit such dramatic swings in performance,” read the conclusion of the paper.
They also noted that no obvious methodology reliably improves performance, and that whatever gains one did offer would be marginal. The research further concluded that practitioners do not even need GPT-4 or PaLM-2 size models to generate effective prompts. In their experiment, Llama 13B and Mistral-7B were able to produce “superior prompts”, which they found shocking.
“It’s undeniable that the automatically generated prompts perform better and generalise better than hand-tuned ‘positive thinking’ prompts,” the paper concluded. In other words, prompts generated automatically, an approach also called auto prompting, outperform even those that humans pad with positive affirmations for the chatbot.
“I literally could not believe some of the stuff that it generated,” Battle said in an interview, explaining that the prompts the system generated for itself were so bizarre that no human would ever have written them.
Another aspect of prompt engineering is that it creates bias in the output of the model, which observers can attribute to the AI model itself. “Arguably, prompt engineering can be an even worse source of bias,” said François Chollet.
People confuse prompt engineering with simply typing prompts into a chatbot in English. What the model actually does is a lot of maths; English is just the frontend. That is why AI models can do it better.
It only makes sense. “In LLMs, the need for prompt engineering is a sign of *lack* of robust language understanding,” said Melanie Mitchell in a post on X last year. That seems to be the case even now, even after a year of development and scaling of language models.
Was Prompting Just a Passing Fad?
Just as C++ was considered a “dying language”, prompt engineering is regarded as a passing fad. Logging onto ChatGPT or Codex and just typing in what you want does not work that easily. It’s a skill to learn. And now, with AI doing it better than humans, it is only a matter of time before it vanishes. But will it?
Most of prompt engineering is just trial and error, and with AI able to do that, the manual effort is no longer needed. Companies are now hiring for roles such as LLMOps, which might come to be labelled the new prompt engineering, so as a job, it won’t die.
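To make the trial-and-error point concrete, here is a minimal sketch in Python of how that loop can be automated: score each candidate prompt against a handful of examples and keep the winner. It is not the VMware researchers' method. The names `pick_best_prompt` and `toy_model`, the candidate prompts and the examples are all hypothetical, and `toy_model` is a stand-in for a real LLM call. In a real auto-prompting setup, the candidates themselves would typically be proposed by a model rather than written by hand.

```python
from typing import Callable, Iterable


def pick_best_prompt(
    candidates: Iterable[str],
    examples: list[tuple[str, str]],
    model: Callable[[str, str], str],
) -> tuple[str, float]:
    """Return the candidate prompt with the highest accuracy on the examples."""
    best_prompt, best_score = "", -1.0
    for prompt in candidates:
        # Count how many example questions the model answers correctly
        # when this candidate is used as the system prompt.
        correct = sum(model(prompt, q) == a for q, a in examples)
        score = correct / len(examples)
        if score > best_score:
            best_prompt, best_score = prompt, score
    return best_prompt, best_score


def toy_model(system_prompt: str, question: str) -> str:
    """Hypothetical stand-in for an LLM call (an assumption for this sketch).

    It only adds correctly when the prompt contains the word 'step'.
    """
    a, b = (int(x) for x in question.split("+"))
    return str(a + b) if "step" in system_prompt.lower() else "?"


# Hypothetical evaluation set and hand-written candidates.
EXAMPLES = [("2+3", "5"), ("10+7", "17"), ("4+4", "8")]
CANDIDATES = [
    "You are a helpful assistant.",
    "Think step by step before answering.",
    "Answer quickly.",
]

best, score = pick_best_prompt(CANDIDATES, EXAMPLES, toy_model)
print(best, score)
```

The point of the sketch is only that the selection step needs no human judgement once there is a way to score outputs; swapping `toy_model` for a real model call would leave the loop unchanged.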
Douglas Crockford, one of the developers of JavaScript and the brain behind the JSON format, is worried about English as a programming language “because it’s so ambiguous,” he said in an interview with AIM.
He clarified that the fundamental law of programming is that the program has to be perfect. “It has to be perfect in every aspect, in every detail, for all states for all time”. He further elaborated that if it isn’t, the computer is licensed to do the worst possible thing at the worst possible time. “It’s not the computer’s fault, it is the programmers’ fault,” he noted.
Another important aspect of prompt engineering is that it changes as people experiment with different data sources and different AI models. And as companies adopt different open source and closed source models, working out the best combination of prompts for each model might still be a job that someone needs to do.
We’re calling them prompt engineers today; maybe as AI improves, we will call them something else. That is not to say that prompt engineering holds no value. It has already found its niche and is probably going to stay for a while, but not for too long.