Just as we began getting comfortable with the mixture of experts (MoE) method, the mixture of agents (MoA) approach started to gain prominence. MoA takes the concept of specialisation a notch higher by leveraging the collective strengths of multiple LLMs.
Unlike MoE, which operates within a single model, MoA employs a layered architecture where each layer comprises several LLM agents.
“While mixture of experts is an innovative approach to overcome hardware restrictions, mixture of agents goes one step further in providing flexibility and depth, which is not possible with MoE,” Arjun Reddy, the co-founder of Nidum.AI, told AIM.
For countries like India, where computational resources and data availability can be limiting factors, MoA offers a practical and scalable solution. MoA can achieve state-of-the-art results without the need for extensive computational power or data by utilising open-source models and focusing on collaboration rather than individual model performance.
Recent research highlights the transformative potential of MoA. A study by Together AI demonstrates how MoA significantly enhances the capabilities of LLMs by constructing a layered architecture, where each layer comprises multiple agents.
These agents collaboratively generate responses by utilising outputs from the previous layer, leading to state-of-the-art performance on benchmarks like AlpacaEval 2.0, MT-Bench, and FLASK. For instance, the MoA model achieved a score of 65.1% on AlpacaEval 2.0, outperforming GPT-4 Omni’s 57.5%.
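To make the layered flow concrete, here is a minimal sketch in Python. It is not Together AI's implementation: the names `run_moa`, `build_prompt`, and `make_stub` are hypothetical, and the agents are stub functions standing in for real LLM calls. It only illustrates how each layer receives the previous layer's responses before a final aggregator produces the answer.

```python
from typing import Callable, List

# Assumption: an "agent" is any function mapping a prompt string to a response.
# In a real system each agent would wrap a call to a different LLM.
Agent = Callable[[str], str]


def build_prompt(question: str, previous: List[str]) -> str:
    """Attach the previous layer's responses (if any) to the user's question."""
    if not previous:
        return question
    numbered = "\n".join(f"{i}. {r}" for i, r in enumerate(previous, start=1))
    return f"{question}\n\nResponses from the previous layer:\n{numbered}"


def run_moa(question: str, layers: List[List[Agent]], aggregator: Agent) -> str:
    """Pass the question through each layer, then let an aggregator answer."""
    previous: List[str] = []
    for layer in layers:
        prompt = build_prompt(question, previous)
        # Every agent in the layer sees the same prompt and answers on its own.
        previous = [agent(prompt) for agent in layer]
    return aggregator(build_prompt(question, previous))


def make_stub(name: str) -> Agent:
    """Hypothetical stand-in for an LLM: reports how many responses it was shown."""
    def agent(prompt: str) -> str:
        n_refs = sum(1 for line in prompt.splitlines() if line[:1].isdigit())
        return f"{name} saw {n_refs} earlier responses"
    return agent


proposers = [make_stub("A"), make_stub("B"), make_stub("C")]
final = run_moa("What is MoA?", [proposers, proposers], make_stub("aggregator"))
print(final)
```

With two layers of three stub agents each, the aggregator is shown the three responses from the second layer, which in turn were written with the first layer's three responses in view.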
The Rise of MoA
OpenAI is exploring the MoA framework through its multi-agent debate technique. This method involves multiple independent agents simultaneously attempting to solve the same problem proposed by the user. Each agent retains its solution in memory, and the system synthesises these solutions to arrive at a final response.
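The pattern described here, several agents answering independently with the system then combining their stored answers, can be sketched as follows. This is an illustration under stated assumptions and not OpenAI's actual method: the agents are stub functions, they run one after another, not simultaneously, and a simple majority vote stands in for the synthesis step, which in practice would more likely be another model call. The names `solve_independently` and `synthesise` are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict

# Assumption: an agent is a function from a question to an answer string.
Agent = Callable[[str], str]


def solve_independently(question: str, agents: Dict[str, Agent]) -> Dict[str, str]:
    """Each agent answers the same question; answers are kept in a shared memory."""
    return {name: agent(question) for name, agent in agents.items()}


def synthesise(memory: Dict[str, str]) -> str:
    """Stand-in synthesis step: return the most common answer (majority vote)."""
    counts = Counter(memory.values())
    answer, _ = counts.most_common(1)[0]
    return answer


# Stub agents with fixed answers, standing in for independent LLM calls.
agents: Dict[str, Agent] = {
    "agent_1": lambda q: "42",
    "agent_2": lambda q: "42",
    "agent_3": lambda q: "41",
}

memory = solve_independently("What is 6 x 7?", agents)
print(synthesise(memory))
```

Two of the three stub agents agree, so the vote settles on their answer; a real system would need a more careful strategy when agents disagree evenly.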
In a post on X, Together AI explains how MoA works and how it can be implemented in just 50 lines of code, showcasing the approach’s simplicity and effectiveness.
Research by Ajith’s AI Pulse elaborates on MoA’s layered architecture, where each layer includes multiple LLM agents. Each agent processes the outputs of agents from the previous layer, refining and enhancing the response iteratively.
This collaborative process enables the model to leverage the strengths of different LLMs, resulting in improved performance. The rise of MoA also benefits a general audience, who can build their own local mixture of agents using a LlamaIndex pack, which gives a glimpse of how flexible and effective the approach is.
The research paper titled ‘Mixture of Agents: A New Paradigm for Large Language Models’ provides a comprehensive theoretical foundation for the MoA framework, exploring how the collaboration of multiple agents leads to improved performance metrics, enhanced accuracy, and scalability.
Mixture of Agents in Action
This innovative approach is already being harnessed in various cutting-edge applications, demonstrating its potential to revolutionise the field. For instance, the integration of MoA with Grok has shown remarkable improvements in AI performance, surpassing even GPT-4 in speed and efficiency.
Notably, Andrej Karpathy has also shared his insights in recent posts, discussing how people will take Llama 3.1 405B and distil it into small agents for narrow tasks and applications. This points towards a growing community of AI enthusiasts and professionals actively exploring the potential of MoA.
NVIDIA has demonstrated the use of AI agents to optimise supply chains through their cuOpt microservice. This system uses multiple LLM agents to handle complex optimisation tasks, transforming natural language queries into optimised plans.
This approach exemplifies how MoA can be applied to real-world problems, enhancing efficiency and decision-making processes in large-scale operations.