German AI startup Aleph Alpha has announced the release of its latest foundation model family, Pharia-1-LLM, featuring Pharia-1-LLM-7B-control and Pharia-1-LLM-7B-control-aligned. These models are now publicly available under the Open Aleph License, which permits non-commercial research and educational use.
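For readers who want to experiment with the release, the sketch below shows one way the control variant might be loaded for inference with the Hugging Face transformers library. The repository ID Aleph-Alpha/Pharia-1-LLM-7B-control and the need for trust_remote_code are assumptions about how the checkpoint is hosted, not details confirmed in the announcement; consult the official model card for the exact identifiers and license terms.

```python
# Minimal sketch: loading Pharia-1-LLM-7B-control for inference.
# The repository ID and the trust_remote_code flag are assumptions; check the
# official model card for the actual hosting details before running this.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Aleph-Alpha/Pharia-1-LLM-7B-control"  # assumed Hugging Face repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # load weights in the checkpoint's native precision
    device_map="auto",       # place weights on available GPU(s) via accelerate
    trust_remote_code=True,  # the architecture may ship as custom model code
)

prompt = "Erkläre in zwei Sätzen, was ein Foundation Model ist."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```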
Pharia-1-LLM-7B-control is designed to produce concise, length-controlled responses and is optimized for German, French, and Spanish. The model was trained on a multilingual base corpus and adheres to EU and national regulations, including copyright and data privacy laws. It is specifically engineered for domain-specific applications in industries such as automotive and engineering.
The Pharia-1-LLM-7B-control-aligned variant includes additional safety features through alignment methods. This model is tailored for use in conversational settings like chatbots or virtual assistants, where safety and clarity are prioritized.
The training of Pharia-1-LLM-7B involved two phases. Initially, the model was pre-trained on a 4.7 trillion token dataset with a sequence length of 8,192 tokens, using 256 A100 GPUs. In the second phase, the model was trained on an additional 3 trillion tokens with a new data mix, utilizing 256 H100 GPUs. The training was performed using mixed-precision strategies and various optimization techniques to enhance throughput and performance.
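The announcement does not publish the training code itself. Purely as an illustration of what a mixed-precision training step looks like in practice, the sketch below shows a generic bf16 autocast loop in PyTorch for an HF-style causal language model; it is not Aleph Alpha's implementation, and the model, optimizer, and batch are placeholders.

```python
# Generic illustration of a bf16 mixed-precision training step in PyTorch.
# This is NOT Aleph Alpha's training code; model, optimizer, and data are placeholders.
import torch

def train_step(model, optimizer, batch):
    """One causal-LM step with bf16 autocast on CUDA (HF-style model with built-in loss)."""
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        # batch["input_ids"] is a (batch, seq_len) tensor, e.g. seq_len = 8192
        outputs = model(input_ids=batch["input_ids"], labels=batch["input_ids"])
        loss = outputs.loss
    loss.backward()  # gradients land in the parameters' dtype (fp32 here)
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```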
In terms of performance, Pharia-1-LLM-7B-control and Pharia-1-LLM-7B-control-aligned were evaluated against similarly sized weight-available multilingual models, including Mistral’s Mistral-7B-Instruct-v0.3 and Meta’s Llama-3.1-8B-Instruct.
The comparison results, detailed in the model card, provide insights into the models’ effectiveness across multiple languages, including German, French, and Spanish. The evaluation highlighted areas where Pharia-1-LLM-7B outperforms or matches its peers in specific benchmarks and use cases.
Aleph Alpha detailed the model architecture, hyperparameters, and training process in a comprehensive blog post, with the full evaluation results published in the model card.