Microsoft has released the new Phi-3.5 models: Phi-3.5-MoE-instruct, Phi-3.5-mini-instruct, and Phi-3.5-vision-instruct. The Phi-3.5-mini-instruct, with 3.82 billion parameters, is built for basic and quick reasoning tasks.
The Phi-3.5-MoE-instruct, with 41.9 billion parameters, handles more advanced reasoning. The Phi-3.5-vision-instruct, with 4.15 billion parameters, is designed for vision tasks like image and video analysis.
Phi-3.5-MoE-instruct
Phi-3.5-MoE-instruct is a 42-billion-parameter open-source model that demonstrates significant improvements in reasoning capabilities, outperforming models with more active parameters, such as Llama 3.1 8B and Gemma 2 9B, across various benchmarks.
While competitive overall, Phi-3.5-MoE falls slightly behind GPT-4o-mini on benchmarks but surpasses Gemini 1.5 Flash. The model supports multilingual applications, although the specific languages covered remain unclear.
Phi-3.5-MoE features 16 experts, two of which are activated for each token during generation, so roughly 6.6 billion parameters are engaged in each inference. The model also extends its context length to 128,000 tokens.
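These figures are consistent with standard top-2 mixture-of-experts routing, where a small router picks two expert feed-forward networks per token. The PyTorch sketch below is a generic illustration of that idea rather than Microsoft’s actual implementation; the class name and layer sizes are placeholder assumptions.

```python
# Generic top-2 mixture-of-experts routing sketch (illustrative only, not
# Microsoft's implementation). Each token is routed to 2 of 16 expert FFNs,
# which is why only a fraction of the total parameters is active per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                          # x: (tokens, d_model)
        gate_logits = self.router(x)               # (tokens, n_experts)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalise over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e           # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = Top2MoELayer()
tokens = torch.randn(8, 512)
print(layer(tokens).shape)  # torch.Size([8, 512])
```

Routing each token to only two experts is what keeps per-token compute closer to a dense model of roughly 6.6 billion parameters than to the full 42 billion.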
The model was trained over 23 days using 512 H100-80G GPUs, with a total training dataset of 4.9 trillion tokens.
The model’s development included supervised fine-tuning, proximal policy optimisation, and direct preference optimisation to ensure precise instruction adherence and robust safety measures. The model is intended for use in memory and compute-constrained environments and latency-sensitive scenarios.
Key use cases for Phi-3.5-MoE include general-purpose AI systems, applications requiring strong reasoning in code, mathematics, and logic, and as a foundational component for generative AI-powered features.
The model’s tokenizer supports a vocabulary size of up to 32,064 tokens, with placeholder tokens reserved for downstream fine-tuning. Microsoft provided a sample code snippet for local inference, demonstrating how the model generates responses to user prompts.
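That official snippet is not reproduced here, but a local-inference sketch along the same lines, using the Hugging Face transformers chat pipeline, might look as follows; the example prompt and generation settings are illustrative assumptions rather than Microsoft’s exact code.

```python
# Hedged sketch of local inference with Phi-3.5-MoE-instruct via transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3.5-MoE-instruct"

# Loading the full 42B-parameter model needs substantial GPU memory;
# device_map="auto" spreads it across available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain the difference between a list and a tuple in Python."},
]

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
output = pipe(
    messages,
    max_new_tokens=256,
    do_sample=False,
    return_full_text=False,  # return only the newly generated reply
)
print(output[0]["generated_text"])
```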
Phi-3.5-mini-instruct
With 3.8 billion parameters, this model is lightweight yet powerful, outperforming larger models such as Llama 3.1 8B and Mistral 7B. It supports a 128K-token context length, significantly more than its main competitors, many of which support only up to 8K.
Microsoft’s Phi-3.5-mini is positioned as a competitive option in long-context tasks such as document summarisation and information retrieval, outperforming several larger models like Llama-3.1-8B-instruct and Mistral-Nemo-12B-instruct-2407 on various benchmarks.
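As a rough sketch of how the 128K context window could be put to work for document summarisation, the snippet below feeds an entire document to Phi-3.5-mini-instruct through its chat template; the file name, prompt wording, and generation settings are assumptions for illustration only.

```python
# Hedged sketch: long-document summarisation with Phi-3.5-mini-instruct.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3.5-mini-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

# "report.txt" stands in for any long document; with a 128K context the full
# text can go straight into the prompt instead of being chunked.
long_document = open("report.txt", encoding="utf-8").read()

messages = [
    {
        "role": "user",
        "content": f"Summarise the following report in five bullet points:\n\n{long_document}",
    }
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    generated = model.generate(input_ids, max_new_tokens=300, do_sample=False)

# Decode only the tokens produced after the prompt.
print(tokenizer.decode(generated[0, input_ids.shape[-1]:], skip_special_tokens=True))
```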
The model is intended for commercial and research use, particularly in memory and compute-constrained environments, latency-bound scenarios, and applications requiring strong reasoning in code, math, and logic.
The Phi-3.5-mini model was trained over 10 days using 512 H100-80G GPUs. Training covered 3.4 trillion tokens, combining synthetic data with filtered publicly available web data to enhance the model’s reasoning capabilities and overall performance.
Phi-3.5-vision-instruct
Phi-3.5-vision-instruct is a 4.2-billion-parameter model that excels in multi-frame image understanding and reasoning. It has shown improved performance on benchmarks such as MMMU, MMBench, and TextVQA, demonstrating its capability in visual tasks, and it even outperforms OpenAI’s GPT-4o on several benchmarks.
The model integrates an image encoder, connector, projector, and the Phi-3 Mini language model. It supports both text and image inputs and is optimised for prompts using a chat format, with a context length of 128K tokens. The model was trained over 6 days using 256 A100-80G GPUs, processing 500 billion tokens that include both vision and text data.
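A hedged sketch of multimodal inference in that chat format, loosely following the pattern used for Phi-3 vision models on Hugging Face, is shown below; the image URL, prompt, and specific loading arguments are illustrative assumptions rather than Microsoft’s official example.

```python
# Hedged sketch of image + text inference with Phi-3.5-vision-instruct.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3.5-vision-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
    _attn_implementation="eager",  # flash-attention can be used instead where available
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True, num_crops=4)

# Placeholder image URL; any local or remote image works.
image = Image.open(requests.get("https://example.com/chart.png", stream=True).raw)

# Images are referenced in the chat prompt via numbered <|image_N|> placeholders.
messages = [{"role": "user", "content": "<|image_1|>\nDescribe what this image shows."}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(prompt, [image], return_tensors="pt").to(model.device)
generated = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=False,
    eos_token_id=processor.tokenizer.eos_token_id,
)
reply = processor.batch_decode(
    generated[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(reply)
```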
The Phi-3.5 models are now available on the AI platform Hugging Face under an MIT license, making them accessible for a wide range of applications. This release aligns with Microsoft’s commitment to providing open-source AI tools that are both efficient and versatile.