These days, everyone seems to have strong opinions on LLMs. While some of those opinions are grounded in the research of experts such as Yann LeCun, others simply follow the hype to criticise the technology. Some say LLMs are our ticket to AGI, while others think they are just glorified text-producing algorithms with a fancy name.
One of the biggest arguments against LLMs achieving AGI is that they’re just not like us. As one user puts it in a Reddit discussion, “Human intelligence develops from small amounts of data, in real-time, on 20W of power, using metacognition. By contrast, LLMs work with massive amounts of data, are pre-trained, use massive power, and operate without any cognitive awareness. Therefore, AGI requires a different paradigm.”
Meanwhile, LeCun advises people entering the AI space to work on something other than LLMs. “If you are a student interested in building the next generation of AI systems, don’t work on LLMs. This is in the hands of large companies, there’s nothing you can bring to the table,” said LeCun.
Francois Chollet, the creator of Keras, recently shared similar thoughts. “OpenAI has set back the progress towards AGI by 5-10 years because frontier research is no longer being published and LLMs are an offramp on the path to AGI,” he said in an interview.
Interestingly, the discussion on alternatives to LLM-based models has been ongoing for a while now. Recently, Mufeed VH, the young creator of Devika, an alternative to Devin, spoke about how people should move away from Transformer models, build new architectures, and push for more innovation.
LLMs are projections of the world
But here’s the kicker, according to people on Reddit: human intelligence also developed with tons of data and energy. Our cognitive architecture is like the universe’s ultimate information superhighway, built over hundreds of millions of years and encoded in our DNA.
Ilya Sutskever, the former chief scientist at OpenAI, has noted several times that text is a projection of the world. How far that idea carries over to LLMs is still an open question. LLMs, in a way, are building cognitive architecture from scratch, echoing evolutionary and real-time learning processes, albeit with a bit more electricity.
One insightful comment noted, “An essential similarity between the human brain and LLMs is that they are essentially compression algorithms… compressing massive amounts of world data into worldviews that provide predictive models to guide action.”
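To see why “prediction is compression” holds, here is a minimal back-of-the-envelope sketch in Python, using hypothetical numbers: a model’s average cross-entropy loss is, in effect, the number of bits an optimal coder would need to encode each token.

import math

# A language model's average next-token cross-entropy loss doubles as its
# compression rate: an optimal coder (e.g. arithmetic coding) driven by the
# model's predictions needs roughly this many bits per token.
avg_loss_nats = 2.1        # hypothetical average cross-entropy, in nats
tokens_per_char = 0.25     # hypothetical tokens-to-characters ratio

bits_per_token = avg_loss_nats / math.log(2)    # convert nats to bits
bits_per_char = bits_per_token * tokens_per_char

print(f"{bits_per_token:.2f} bits/token, ~{bits_per_char:.2f} bits/char")
# The better the model predicts text, the fewer bits it needs -- that is,
# the better it compresses the "world data" it was trained on.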
The brain’s architecture is that of a finely tuned machine, learning efficiently from small amounts of data in real time. LLMs, on the other hand, need vast amounts of data and computational power to get anywhere close to that level of performance.
That said, it’s a misconception that LLMs are all about scaling up datasets. In fact, progress is also happening with ever-smaller datasets and clever techniques like synthetic data generation with positive-feedback cycles, alongside compact models such as Phi-3 and the smaller Llama variants.
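As a rough illustration of what such a positive-feedback cycle looks like, here is a hedged sketch in Python; generate, score and finetune are hypothetical stand-ins for a real model’s sampling, filtering and training steps, not any particular lab’s method.

import random

# Hypothetical stand-ins: generate() samples candidate training examples from
# the current model, score() filters them for quality (e.g. via a verifier or
# reward model), and finetune() updates the model on the accepted examples.
def generate(model, n):
    return [f"sample-{i}" for i in range(n)]

def score(sample):
    return random.random()               # placeholder quality score

def finetune(model, data):
    return model + len(data)             # placeholder parameter update

model = 0                                # placeholder model state
for round_num in range(3):
    candidates = generate(model, 100)
    accepted = [s for s in candidates if score(s) > 0.8]  # keep high quality only
    model = finetune(model, accepted)    # the model trains on its own best outputs
    print(f"round {round_num}: kept {len(accepted)}/{len(candidates)} samples")

Each pass generates data, keeps only what survives the quality filter, and feeds it back into training, which is how smaller models can keep improving without ever-larger scraped datasets.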
One user wisely noted, “Our brain isn’t one architecture for all processes. There are different parts of our brain that process things differently to do different things in life.” Indeed, while LLMs emulate parts of the brain related to language, there’s still much work to be done.
Diffusion models are tackling visual processing, while retrieval-augmented generation (RAG) is trying to mimic the hippocampus’s role in recalling relevant memories. But the road to AGI is long and winding, with many more brain regions to cover.
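To make the RAG analogy concrete, here is a minimal sketch in Python; word-overlap similarity stands in for a real embedding model, and the documents are made up for illustration.

# Minimal retrieval-augmented generation (RAG) sketch: fetch the documents
# most similar to the query and prepend them to the prompt, much like
# recalling relevant memories before answering.
documents = [
    "The hippocampus consolidates short-term memories into long-term ones.",
    "Diffusion models generate images by iteratively denoising random noise.",
    "Retrieval-augmented generation grounds LLM answers in external text.",
]

def similarity(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)   # Jaccard word overlap

def retrieve(query, k=2):
    return sorted(documents, key=lambda d: similarity(query, d), reverse=True)[:k]

query = "How does retrieval-augmented generation help language models?"
context = "\n".join(retrieve(query))
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
print(prompt)                            # this prompt would then go to an LLM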
Then what to focus on? A little more than just LLMs
Just as LeCun suggests, many people are working in other fields of AI, exploring directions beyond LLMs.
A Reddit thread sparked this conversation, asking, “What is it like to work on niche topics that aren’t LLM or Vision?” One user shared, “I shifted my focus from computer vision to ML theory. Now I’m working on kernel methods which are nascent but hold promise to explain and interpret large and over-parameterised networks.”
While LLMs grab the headlines, the world of AI research is full of unsung heroes working on everything from speech synthesis to climate modelling. As one comment wisely put it, “Most of the cooler AI applications aren’t chatbots or generating bounding boxes on camera feeds.”
The bottom line is that LLMs are algorithms that utilise extensive datasets, rely on unsupervised learning, generalise to skills they were not explicitly trained for, and apply broadly to various downstream tasks. This approach resembles human intelligence, but current LLMs do not continuously update themselves the way humans do.
This is a discussion that could go on indefinitely. Humans can learn continuously, while models trained on static datasets with current training methods cannot. Still, it might be too soon to write off LLMs.