Microsoft Research Bangalore Says Accuracy is Not All You Need

A recent report by researchers at Microsoft Research in Bangalore has uncovered significant issues with current methods for compressing and quantising LLMs.

The paper, titled “Accuracy is Not All You Need,” highlights that commonly used compression techniques, such as quantisation, can lead to changes in model behaviour that are not captured by traditional accuracy metrics.

The study by Abhinav Dutta, Sanjeev Krishnan, Nipun Kwatra, and Ramachandran Ramjee emphasises the importance of looking beyond accuracy when evaluating compressed models.

The researchers point out that while compressed models often maintain accuracy levels similar to their baseline counterparts, their behaviour can differ significantly. This phenomenon, referred to as “flips,” occurs when individual answers change from correct to incorrect and vice versa, undermining the model’s reliability even when aggregate accuracy appears unchanged.

The researchers propose using distance metrics like KL-Divergence and the percentage of flips to better assess the impact of compression. These metrics provide a more nuanced view of how compression affects model outputs as perceived by end-users.
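To make this concrete, here is a minimal Python sketch of how the two metrics could be computed. This is an illustration rather than the authors’ code, and it assumes you already have per-question correctness labels and per-token output probabilities for both the baseline and the compressed model.

    import numpy as np

    def flip_percentage(baseline_correct, compressed_correct):
        # Percentage of questions whose correctness changes between the
        # baseline and the compressed model (correct -> incorrect or vice versa).
        b = np.asarray(baseline_correct, dtype=bool)
        c = np.asarray(compressed_correct, dtype=bool)
        return 100.0 * np.mean(b != c)

    def mean_kl_divergence(baseline_probs, compressed_probs, eps=1e-9):
        # Average KL(baseline || compressed) over token positions.
        # Each argument is an (n_tokens, vocab_size) array of probabilities.
        p = np.asarray(baseline_probs, dtype=np.float64) + eps
        q = np.asarray(compressed_probs, dtype=np.float64) + eps
        p /= p.sum(axis=-1, keepdims=True)
        q /= q.sum(axis=-1, keepdims=True)
        return float(np.mean(np.sum(p * np.log(p / q), axis=-1)))

    # Toy example: accuracy stays at 70% for both models,
    # yet 40% of individual answers flip.
    baseline   = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
    compressed = [1, 0, 1, 1, 1, 1, 0, 0, 1, 1]
    print(flip_percentage(baseline, compressed))  # 40.0

In this toy example both models score the same accuracy, yet a large fraction of individual answers change, which is exactly the behaviour the flips metric is designed to expose.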

The research team conducted experiments using multiple LLMs, such as Llama2 chat and Yi chat, across various quantisation techniques and datasets. They found that compressed models perform significantly worse on generative tasks, as evidenced by evaluations on the MT-Bench dataset.
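For readers unfamiliar with the setup, quantisation here means loading a model’s weights at reduced precision. A hypothetical sketch using the Hugging Face transformers library with 4-bit bitsandbytes quantisation is shown below; the model ID and settings are assumptions for illustration, not the paper’s exact configuration.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "meta-llama/Llama-2-7b-chat-hf"  # assumed model ID, for illustration

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # store weights in 4-bit precision
        bnb_4bit_quant_type="nf4",              # NormalFloat4 quantisation
        bnb_4bit_compute_dtype=torch.bfloat16,  # do matrix multiplies in bfloat16
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
    )

    # The full-precision baseline and this quantised model can then be run on
    # the same benchmark and their answers compared question by question.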

Key Findings

The researchers acknowledge that predicting performance degradation in real-world applications remains challenging. They note that distance metrics may not always indicate visible degradation in downstream tasks.

They also found that compressed models often exhibit significant behavioural differences from their baseline versions, affecting the user experience. The flips metric revealed that a substantial proportion of answers change, highlighting the limitations of accuracy as a sole performance indicator.

Moreover, in tasks requiring generative capabilities, compressed models underperformed compared to their baseline versions, underscoring the need for more comprehensive evaluation metrics.

The Microsoft Research study concludes that traditional accuracy metrics are insufficient for evaluating the quality of compressed LLMs. The introduction of distance metrics such as KL-Divergence and flips offers a more accurate assessment of model performance, capturing changes that affect end-users. 

The researchers argue that these metrics are essential for all optimisation methods that aim to minimise visible changes in model behaviour from a baseline. By adopting these metrics, the field of model optimisation and compression can progress more effectively, ensuring that compressed models meet user expectations and maintain high-quality outputs.


Mohit Pandey

Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words.