Karya has partnered with Microsoft Research to conduct a large-scale human evaluation of large language models (LLMs). The project involved over 90,000 human evaluations of 30 models in 10 Indian languages, completed in under three weeks. This effort is one of the largest multilingual human evaluations of LLMs to date.
The evaluation addressed challenges such as linguistic diversity, benchmark contamination, and the incorporation of local cultural nuances. Karya engaged a broad community of workers representative of the general Indian population to ensure a comprehensive assessment.
The evaluation results, including detailed leaderboards on model performance, are available in the published paper here. Karya’s services encompass various benchmarks, including linguistic acceptability, cultural sensitivity, and reasoning.
A presentation of Karya’s LLM Evaluation Services can be viewed here. The project is noted for its complexity, with Karya’s team successfully delivering accurate results.
Multi-Turn Evaluation: Karya workers assessed models using various multi-turn conversation benchmarks, including tasks such as recollection, expansion, refinement, and follow-ups. The evaluation measured model performance in perception, reasoning, and creativity.
Model Comparison: The evaluation also involved comparing multiple models against a common benchmark to determine their relative performance. This analysis aimed to identify the best-performing model for specific tasks.
The execution of this project was led by Jeevitha and included Bipin, Victor, Sakshi, Deepsikha, Swarup, Rizwan, Praveen, Hari, Iqbal, Rupal, Neha, Anand, Monali, Aishwarya, and Anushree. Karya aims to expand such high-complexity digital tasks, potentially increasing earning opportunities for workers and advancing pathways out of poverty.
Founded by Manu Chopra and Vivek Seshadri in 2021, Karya provides a variety of AI services, including generating high-quality training data, labelling multimodal data, and performing culturally sensitive language model evaluations. By offering industry-leading wages and investing in worker welfare, Karya is pioneering an equitable model in the AI data industry.