
German AI Startup Aleph Alpha Launches Pharia-1-LLM Model Family


German AI startup Aleph Alpha has announced the release of its latest foundation model family, Pharia-1-LLM, comprising Pharia-1-LLM-7B-control and Pharia-1-LLM-7B-control-aligned. Both models are publicly available under the Open Aleph License, which permits non-commercial research and educational use.

Pharia-1-LLM-7B-control is designed to produce concise, length-controlled responses and is optimized for German, French, and Spanish. The model was trained on a multilingual base corpus and complies with EU and national regulations, including copyright and data privacy laws. It is specifically engineered for domain-specific applications in industries such as automotive and engineering.

The Pharia-1-LLM-7B-control-aligned variant adds safety guardrails via alignment methods. It is tailored for conversational settings such as chatbots and virtual assistants, where safety and clarity are prioritized.

The training of Pharia-1-LLM-7B involved two phases. Initially, the model was pre-trained on a 4.7 trillion token dataset with a sequence length of 8,192 tokens, using 256 A100 GPUs. In the second phase, the model was trained on an additional 3 trillion tokens with a new data mix, utilizing 256 H100 GPUs. Training used mixed-precision strategies and various optimization techniques to improve throughput and performance.
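The scale of the two phases can be put in perspective with some quick arithmetic (a sketch using only the figures quoted above; batch sizes and parallelism details are not disclosed):

```python
# Back-of-envelope numbers from the two pre-training phases described above.
PHASE_1_TOKENS = 4.7e12   # 4.7 trillion tokens (phase one)
PHASE_2_TOKENS = 3.0e12   # 3 trillion tokens (phase two)
SEQ_LEN = 8192            # sequence length used in phase one
GPUS = 256                # 256 A100s in phase one, 256 H100s in phase two

total_tokens = PHASE_1_TOKENS + PHASE_2_TOKENS
phase1_sequences = PHASE_1_TOKENS / SEQ_LEN       # 8,192-token sequences seen
tokens_per_gpu_phase1 = PHASE_1_TOKENS / GPUS     # naive per-GPU token share

print(f"Total training tokens: {total_tokens:.1e}")                    # 7.7e+12
print(f"Phase-one sequences of 8,192 tokens: {phase1_sequences:.2e}")  # 5.74e+08
print(f"Phase-one tokens per GPU: {tokens_per_gpu_phase1:.2e}")        # 1.84e+10
```

In other words, each phase-one GPU was responsible for roughly 18 billion tokens of the corpus, which is why throughput optimizations such as mixed precision matter at this scale.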

In terms of performance, Pharia-1-LLM-7B-control and Pharia-1-LLM-7B-control-aligned were evaluated against similarly sized weight-available multilingual models, including Mistral's Mistral-7B-Instruct-v0.3 and Meta's Llama-3.1-8B-Instruct.

The comparison results, detailed in the model card, provide insights into the models’ effectiveness across multiple languages, including German, French, and Spanish. The evaluation highlighted areas where Pharia-1-LLM-7B outperforms or matches its peers in specific benchmarks and use cases.

Aleph Alpha detailed the model architecture, hyperparameters, and training process in a comprehensive blog post accompanying the release.



Siddharth Jindal

Siddharth is a media graduate who loves to explore tech through journalism and putting forward ideas worth pondering about in the era of artificial intelligence.