Meta FAIR Releases Transfusion for Multimodal AI Training

Transfusion is a state-of-the-art approach to combining text and image modalities in a single model

In a collaborative effort with Waymo and the University of Southern California, Meta FAIR has released its research on multimodal generative models. Transfusion aims to bridge the gap between discrete sequence modelling and continuous media generation.

The Transfusion Model 

The model is trained on equal amounts of text and image data. Per Meta, Transfusion scales better than quantising images and training a language model over discrete image tokens, and its performance can be improved further through “modality-specific” encoding and decoding layers. For text, the model predicts the next token in a sequence, and training minimises the gap between its predictions and the actual tokens. For images, it is trained with a diffusion objective. With 7 billion parameters and 2 trillion multimodal tokens, Transfusion is reported to perform on par with similar-scale diffusion and language models, and to outperform image generators such as DALL-E 2 and SDXL. Meta also reports that it beats Chameleon, using less compute while generating better results.
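As a rough illustration of how one model can learn from both kinds of data, the sketch below combines a next-token loss for text with a noise-prediction loss for images into a single training signal. It is a minimal sketch, not Meta's implementation: the function names and the `balance` weight are hypothetical, and the weighted sum is an assumption about how the two objectives are joined.

```python
import math


def next_token_loss(logits, target):
    """Cross-entropy at one text position: -log softmax(logits)[target]."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[target]


def noise_prediction_loss(predicted_noise, true_noise):
    """Mean squared error between predicted and actual noise (diffusion-style)."""
    squared = [(p - t) ** 2 for p, t in zip(predicted_noise, true_noise)]
    return sum(squared) / len(squared)


def combined_loss(text_loss, image_loss, balance=1.0):
    """Assumed joint objective: text loss plus a weighted image loss."""
    return text_loss + balance * image_loss


# Toy values only: two equally likely tokens, and a two-element "image" patch.
text_term = next_token_loss([0.0, 0.0], target=0)
image_term = noise_prediction_loss([1.0, 2.0], [1.0, 0.0])
total = combined_loss(text_term, image_term, balance=0.5)
```

Lowering `balance` makes the text term dominate the update, while raising it shifts the emphasis towards image quality.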

One possible limitation is that diffusion models do not yet perform on par with traditional language models. More research is needed in this area to improve overall performance.

Transfusion’s Uniqueness & the Future of Innovation in AI Research

What differentiates Transfusion from the rest is its unified architecture, which runs end to end to generate both text and images. Existing models like Flamingo, LLaVA, GILL, and DreamLLM combine separate architectures for different types of data, and those components are trained separately.

The goal of Transfusion is to combine the two modalities in a single joint model, with each modality trained on its own objective. The appeal is that such a model is versatile, resource-efficient, and cost-effective, handling different types of data without additional cost.

Aditi Suresh

Aditi is a political science graduate, and is interested in technology, social media, and culture.