
Microsoft Launches New Phi-3.5 Models, Outperforming Google Gemini 1.5 Flash, Meta’s Llama 3.1, and OpenAI’s GPT-4o

The Phi-3.5 models are now available on the AI platform Hugging Face under an MIT license, making them accessible for a wide range of applications.


Microsoft has released the new Phi-3.5 models: Phi-3.5-MoE-instruct, Phi-3.5-mini-instruct, and Phi-3.5-vision-instruct. The Phi-3.5-mini-instruct, with 3.82 billion parameters, is built for basic and quick reasoning tasks. 

The Phi-3.5-MoE-instruct, with 41.9 billion parameters, handles more advanced reasoning. The Phi-3.5-vision-instruct, with 4.15 billion parameters, is designed for vision tasks like image and video analysis.

Phi-3.5-MoE-instruct

Phi-3.5-MoE-instruct is a 41.9-billion-parameter open-source model that demonstrates significant improvements in reasoning capabilities, outperforming models such as Llama 3.1 8B and Gemma 2 9B across various benchmarks.

Despite its competitive performance, Phi-3.5-MoE falls slightly behind GPT-4o-mini but surpasses Gemini 1.5 Flash in benchmarks. The model supports multilingual applications, although the specific languages covered remain unclear.

https://twitter.com/rohanpaul_ai/status/1825984804451463210

Phi-3.5-MoE features 16 experts, two of which are activated during generation, engaging 6.6 billion active parameters in each inference. The model supports multilingual capabilities and extends its context length to 128,000 tokens. 
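To make the routing concrete, here is a toy sketch of top-2 expert selection in PyTorch. The layer dimensions are purely illustrative and are not Phi-3.5-MoE’s actual sizes; it only shows how a router can score 16 experts per token and mix the outputs of the two highest-scoring ones.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy top-2 routing over 16 experts; dimensions are illustrative,
# not Phi-3.5-MoE's real layer sizes.
NUM_EXPERTS, TOP_K, D_MODEL = 16, 2, 64

experts = nn.ModuleList(nn.Linear(D_MODEL, D_MODEL) for _ in range(NUM_EXPERTS))
router = nn.Linear(D_MODEL, NUM_EXPERTS)

def moe_layer(x: torch.Tensor) -> torch.Tensor:
    """x: (num_tokens, D_MODEL) -> weighted mix of each token's top-2 experts."""
    scores = router(x)                               # (tokens, 16): one score per expert
    weights, chosen = scores.topk(TOP_K, dim=-1)     # keep the 2 best experts per token
    weights = F.softmax(weights, dim=-1)             # normalise their mixing weights
    out = torch.zeros_like(x)
    for e in range(NUM_EXPERTS):                     # only chosen experts do any work
        for k in range(TOP_K):
            mask = chosen[:, k] == e                 # tokens that picked expert e in slot k
            if mask.any():
                out[mask] += weights[mask, k].unsqueeze(-1) * experts[e](x[mask])
    return out

print(moe_layer(torch.randn(4, D_MODEL)).shape)      # torch.Size([4, 64])
```

Because each token only passes through two of the 16 expert networks, compute per inference scales with the active parameters (6.6 billion here) rather than the full 41.9 billion.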

The model was trained over 23 days using 512 H100-80G GPUs, with a total training dataset of 4.9 trillion tokens.

The model’s development included supervised fine-tuning, proximal policy optimisation, and direct preference optimisation to ensure precise instruction adherence and robust safety measures. The model is intended for use in memory- and compute-constrained environments and latency-sensitive scenarios.

Key use cases for Phi-3.5-MoE include general-purpose AI systems, applications requiring strong reasoning in code, mathematics, and logic, and as a foundational component for generative AI-powered features. 

The model’s tokenizer supports a vocabulary size of up to 32,064 tokens, with placeholders for downstream fine-tuning. Microsoft provided a sample code snippet for local inference, demonstrating its application in generating responses to user prompts.
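Microsoft’s own snippet is not reproduced here; the following is a minimal sketch in the same spirit, assuming the Hugging Face transformers chat pipeline. The model ID matches the Hugging Face release, while the prompt and generation settings are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3.5-MoE-instruct"

# trust_remote_code pulls the MoE modelling code from the model repo.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # spread the 41.9B parameters across available devices
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Solve 2x + 3 = 11 and explain each step."},
]

# The chat pipeline applies the model's chat template automatically;
# the assistant's reply is the last message in the returned conversation.
output = pipe(messages, max_new_tokens=256, do_sample=False)
print(output[0]["generated_text"][-1]["content"])
```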

Phi-3.5-mini-instruct

With 3.8 billion parameters, this model is lightweight yet powerful, outperforming larger models such as Llama 3.1 8B and Mistral 7B. It supports a 128K-token context length, significantly more than many competing small models, some of which support only up to 8K.

Microsoft’s Phi-3.5-mini is positioned as a competitive option in long-context tasks such as document summarisation and information retrieval, outperforming several larger models like Llama-3.1-8B-instruct and Mistral-Nemo-12B-instruct-2407 on various benchmarks. 

The model is intended for commercial and research use, particularly in memory- and compute-constrained environments, latency-bound scenarios, and applications requiring strong reasoning in code, math, and logic. 

The Phi-3.5-mini model was trained over 10 days using 512 H100-80G GPUs. The training process involved processing 3.4 trillion tokens, leveraging a combination of synthetic data and filtered publicly available websites to enhance the model’s reasoning capabilities and overall performance.
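As a sketch of the long-context summarisation use case described above, the snippet below feeds a long document to Phi-3.5-mini-instruct; report.txt is a hypothetical input file, and the printed token count simply confirms the prompt fits inside the 128K window.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3.5-mini-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto", trust_remote_code=True
)

# Hypothetical long input; any text far beyond an 8K window works here.
long_document = open("report.txt").read()

messages = [
    {"role": "user",
     "content": f"Summarise the following document in five bullet points:\n\n{long_document}"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
print(f"Prompt length: {input_ids.shape[1]} tokens (context window: 128K)")

output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```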

Phi-3.5-vision-instruct 

Phi-3.5-vision-instruct is a 4.2-billion-parameter model that excels in multi-frame image understanding and reasoning. It has shown improved performance on benchmarks like MMMU, MMBench, and TextVQA, demonstrating its capability in visual tasks. It even outperforms OpenAI’s GPT-4o on several benchmarks. 

The model integrates an image encoder, connector, projector, and the Phi-3 Mini language model. It supports both text and image inputs and is optimised for prompts using a chat format, with a context length of 128K tokens. The model was trained over 6 days using 256 A100-80G GPUs, processing 500 billion tokens that include both vision and text data.
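Below is a minimal sketch of that chat-format prompting, assuming the transformers AutoProcessor interface and the <|image_1|> image-placeholder convention from the model card; the image URL is a stand-in.

```python
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3.5-vision-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Chat-format prompt; <|image_1|> marks where the first image is attached.
messages = [{"role": "user", "content": "<|image_1|>\nDescribe this image in detail."}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Placeholder URL; substitute a real image.
image = Image.open(requests.get("https://example.com/photo.jpg", stream=True).raw)

inputs = processor(prompt, [image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=200)

# Strip the prompt tokens before decoding the model's reply.
reply = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(reply)
```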

All three Phi-3.5 models are available on Hugging Face under an MIT license, a release that aligns with Microsoft’s commitment to providing open-source AI tools that are both efficient and versatile.



Siddharth Jindal

Siddharth is a media graduate who loves exploring tech through journalism and putting forward ideas worth pondering in the era of artificial intelligence.