LLM development is a distinctly different and more complex undertaking than typical software development, with its own unique set of challenges. One of the most formidable is the “curse of multilinguality”.
Sara Hooker, VP of research at Cohere AI, said, “When you try and make AI actually work for the world, you’re talking about this vast array of different languages. There are 7,000 languages in the world, and 80% of those have no text data.”
This lack of diverse language data leads to models that overfit high-resource languages like English and Chinese while under-serving the “longtail” of low-resource languages.
The challenges don’t stop there. When it comes to reasoning, things get even harder.
The Elusive Nature of Reasoning
As Subbarao Kambhampati, professor at Arizona State University, illustrates with the classic “manhole cover” interview question, “The tricky part about reasoning is if you ask me a question that requires reasoning and I gave an answer to you, on the face of it, you can never tell whether I memorised the answer and gave it to you or I actually reasoned from first principles.”
Assessing whether an LLM can truly reason rather than just match patterns is difficult. There is often a gap between an LLM’s ability to generate code or text that looks plausible and its actual understanding of the underlying logic and ability to reason about it.
Natural language relies heavily on context, shared understanding, and inference to convey meaning. This makes it difficult for LLMs to extract, from language examples alone, the precise semantics and formal logic needed for rigorous reasoning.
Furthermore, LLMs have no concept of reality outside of language and cannot test the truth of statements. They are unconcerned about whether concepts contradict each other and only focus on generating sentences that follow language rules.
David Ferrucci, the founder of Elemental Cognition, argues that natural language is insufficient for reliable logical reasoning and computations in complex domains. He states that “for complex reasoning problems where you cannot afford to be wrong, natural language is not the right medium”.
“Without any underlying formalism, natural language’s ambiguity and subjectivity are great for casually navigating around into another human’s brain, but not the best for ensuring shared meaning and precise, reliable outcomes,” he added.
Ferrucci suggests that formal languages and reasoning systems are needed to enable complex problem-solving.
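To make the contrast concrete, here is a minimal sketch (not Ferrucci’s system) that encodes three plausible-sounding policy statements in propositional logic and asks the open-source Z3 solver whether they can all hold at once; the scenario and variable names are illustrative only.

```python
# A minimal sketch: three statements that each read fine in natural language
# are encoded in propositional logic, and the Z3 solver proves they cannot
# all be true at the same time.
# pip install z3-solver
from z3 import Solver, Bool, Implies, Not, unsat

is_admin = Bool("alice_is_admin")          # "Alice is an admin"
is_user = Bool("alice_is_user")            # "Alice is a user"
deleted_logs = Bool("alice_deleted_logs")  # "Alice deleted the audit logs"

s = Solver()
s.add(Implies(is_admin, is_user))           # "Every admin is also a user"
s.add(Implies(is_user, Not(deleted_logs)))  # "No user may delete audit logs"
s.add(is_admin, deleted_logs)               # "Alice, an admin, deleted the logs"

# check() returns unsat: no assignment satisfies all three statements,
# i.e. the policy is contradictory, a fact the solver guarantees rather
# than guesses.
print("contradictory" if s.check() == unsat else "consistent")
```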
The Verification Gap
Perhaps the most critical challenge that LLM developers face is the lack of robust methods for verifying the outputs of these models. As Kambhampati notes, “It’s very hard to show what is and what is not on the web,” making it difficult to determine whether an LLM’s output is grounded in factual knowledge or mere hallucination.
A research paper titled ‘TrustLLM: Trustworthiness in Large Language Models’ introduced a trustworthiness evaluation framework that examines 16 mainstream LLMs across eight dimensions: fairness, machine ethics, privacy, robustness, safety, truthfulness, accountability, and transparency.
The researchers found that none of the tested models was truly trustworthy according to their benchmarks, highlighting the need for improved verification methods.
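As a toy illustration of what checking even a single dimension such as truthfulness involves (this is not the TrustLLM benchmark itself), the sketch below scores a model’s answers against a handful of known facts; `ask_model` is a hypothetical stand-in for whatever LLM client is available, and the question set is made up for the example.

```python
# A toy truthfulness check, not the TrustLLM benchmark: score a model's
# answers against a small set of questions with known answers.

def ask_model(question: str) -> str:
    # Hypothetical placeholder: plug in any LLM API call here.
    raise NotImplementedError("connect your LLM client")

GOLD = {
    "What is the chemical symbol for gold?": "au",
    "How many planets are in the solar system?": "8",
    "Who wrote 'Pride and Prejudice'?": "jane austen",
}

def truthfulness_score(qa_pairs: dict[str, str]) -> float:
    """Fraction of questions whose answer contains the expected string."""
    hits = 0
    for question, gold in qa_pairs.items():
        answer = ask_model(question).lower()
        hits += int(gold in answer)
    return hits / len(qa_pairs)

# Example usage once ask_model is wired up:
# print(f"truthfulness: {truthfulness_score(GOLD):.0%}")
```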
Aidan Gomez, the CEO of Cohere, mentioned that to improve reasoning, language models need to be shown how to break down tasks at a low level, think through problems step-by-step, and have an “inner monologue”.
“However, data demonstrating this type of reasoning process is extremely scarce on the internet,” he added.
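A minimal sketch of the kind of step-by-step prompting Gomez describes is shown below; `complete` is a hypothetical stand-in for any text-completion API, and the template is illustrative rather than a recommended recipe.

```python
# A minimal sketch of step-by-step ("inner monologue") prompting.

def complete(prompt: str) -> str:
    # Hypothetical placeholder: plug in any text-completion API here.
    raise NotImplementedError("connect your LLM client")

COT_TEMPLATE = """Answer the question by reasoning step by step.

Question: {question}

Let's think through this step by step:
1."""

def answer_with_reasoning(question: str) -> str:
    # The numbered scaffold nudges the model to expose intermediate steps
    # instead of jumping straight to a (possibly memorised) final answer.
    return complete(COT_TEMPLATE.format(question=question))

# Example usage:
# print(answer_with_reasoning(
#     "A bat and a ball cost $1.10 and the bat costs $1 more than the ball. "
#     "How much does the ball cost?"))
```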
One of the most significant challenges in verifying the outputs of LLMs is their inherent “black box” nature. LLMs are complex, opaque systems, which makes it difficult for developers and researchers to understand how they arrive at their outputs.
LLMs suffer from a lack of interpretability, which means it is challenging to understand the reasoning behind their responses. This opacity makes it difficult to identify the root causes of incorrect or inconsistent outputs, hindering efforts to improve the models’ reliability.
Another related issue is the limited explainability of LLMs. Even when an LLM provides an answer, it is often unclear how it arrived at that particular response. This lack of explainability makes it challenging for developers to troubleshoot issues and refine the models.
Addressing the challenges faced by LLM developers will require a multifaceted approach. This includes developing more advanced verification methods to assess the factual accuracy and logical consistency of LLM outputs, and improving the interpretability and explainability of LLMs to better understand their inner workings.
By focusing on these key areas, researchers and developers can work towards creating LLMs that are more reliable, trustworthy, and capable of complex reasoning across diverse languages and domains.