With the release of Llama 3.1, Mark Zuckerberg has established himself as the king of open-source AI. Contrary to popular belief, Zuckerberg has admitted that the pursuit of an open-source strategy is driven by somewhat selfish reasons: he wants to influence how the models are developed and integrated into the social fabric.
“We’re not pursuing this out of altruism, though I believe it will benefit the ecosystem. We’re doing it because we think it will enhance our offerings by creating a strong ecosystem around contributions, as seen with the PyTorch community,” said Zuckerberg at SIGGRAPH 2024.
“I mean, this might sound selfish, but after building this company for a while, one of my goals for the next ten or 15 years is to ensure we can build the fundamental technology for our social experiences. There have been too many times when I’ve tried to build something, only to be told by the platform provider that it couldn’t be done,” he added.
Zuckerberg does not want the AI industry to follow the path of the smartphone industry, as seen with Apple. “Because of its closed ecosystem, Apple essentially won and set the terms. Apple controls the entire market and profits, while Android has largely followed Apple. I think it’s clear that Apple won this generation,” he said.
He explained that when something becomes an industry standard, other folks' work starts to revolve around it. “So, all the silicon and systems will end up being optimised to run this thing really well, which will benefit everyone. But it will also work well with the system we’re building, and that’s, I think, just one example of how this ends up being really effective,” he said.
Earlier this year, Meta open-sourced Horizon OS built for its AR/VR headsets. “We’re basically making the Horizon OS that we’re building for mixed reality an open operating system, similar to what Android or Windows was. We’re making it so that we can work with many different hardware companies to create various kinds of devices,” said Zuckerberg.
Jensen Loves Llama
NVIDIA chief Jensen Huang could not agree more with Zuckerberg. He said that using Llama 2, NVIDIA has developed fine-tuned models that assist engineers at the company.
“We have an AI for chip design and another for software coding that understands USD (Universal Scene Description) because we use it for Omniverse projects. We also have an AI that understands Verilog, our hardware description language. We have an AI that manages our bug database, helps triage bugs, and directs them to the appropriate engineers. Each of these AIs is fine-tuned based on Llama,” said Huang.
“We fine-tune them, we guardrail them. If we have an AI designed for chip design, we’re not interested in asking it about politics, you know, and religion and things like that,” he explained. Huang joked that an AI chip engineer costs them just $10 an hour.
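For teams curious what such domain-specific tuning looks like in practice, below is a minimal sketch using Hugging Face transformers and PEFT. The base checkpoint, toy corpus, and hyperparameters are illustrative assumptions, not NVIDIA’s actual pipeline, and the guardrails Huang describes would sit in a separate layer in front of the tuned model.

```python
# Minimal sketch: LoRA fine-tuning a Llama model on a small domain corpus.
# Model name, dataset, and hyperparameters are illustrative assumptions only.
import torch
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Meta-Llama-3.1-8B"  # assumed; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)

# Attach small trainable LoRA adapters instead of updating all weights.
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Tiny illustrative "domain" corpus; a real pipeline would use internal docs or Q&A.
samples = ["Q: What does this Verilog module do?\nA: It implements a 4-bit counter."]
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)
ds = Dataset.from_dict({"text": samples}).map(tokenize, batched=True,
                                              remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama-domain-lora",
                           per_device_train_batch_size=1, num_train_epochs=1),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("llama-domain-lora")
```

LoRA keeps the base Llama weights frozen and trains only small adapter matrices, which is why each team can maintain its own specialised assistant without retraining, or even storing, a full copy of the model.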
Moreover, Huang said he believes the release of Llama 2 was “the biggest event in AI last year.” He explained that this was because suddenly, every company, enterprise, and industry—especially in healthcare—was building AI. Large companies, small businesses, and startups alike were all creating AIs. It provided researchers with a starting point, enabling them to re-engage with AI. And he believes that Llama 3.1 will do the same.
Army of AI Agents
Meta released AI Studio yesterday, a new platform where people can create, share, and discover AIs without needing technical skills. AI Studio is built on the Llama 3.1 models. It allows anyone to build and publish AI agents across Messenger, Instagram, WhatsApp, and the web.
Taking a dig at OpenAI, Zuckerberg said, “Some of the other companies in the industry are building one central agent.
“Our vision is to empower everyone who uses our products to create their own agents. Whether it’s the millions of creators on our platform or hundreds of millions of small businesses, we aim to pull in all your content and quickly set up a business agent.”
He added that this agent would interact with customers, handle sales, take care of customer support, and more.
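AI Studio itself requires no code, but conceptually such a business agent amounts to a Llama 3.1 chat model wrapped in a persona prompt built from the business’s own content. The sketch below, using the Hugging Face transformers pipeline, is an assumed illustration of that pattern rather than Meta’s implementation; the checkpoint name and shop details are placeholders.

```python
# Illustrative sketch only: a "business agent" as a Llama 3.1 chat model with a
# persona prompt built from the business's own content. Not Meta's AI Studio code.
from transformers import pipeline

chat = pipeline("text-generation",
                model="meta-llama/Meta-Llama-3.1-8B-Instruct")  # assumed checkpoint

business_facts = "We sell handmade ceramics. Shipping takes 3-5 days. Returns within 30 days."
messages = [
    {"role": "system", "content": f"You are the support agent for this shop.\n{business_facts}"},
    {"role": "user", "content": "How long does shipping take?"},
]
reply = chat(messages, max_new_tokens=100)
print(reply[0]["generated_text"][-1]["content"])  # the assistant's answer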
Forget Altman, SAM 2 is Here
While the world is still awaiting the voice features in GPT-4o as promised by Sam Altman, Meta released another model called SAM 2. Building upon the success of its predecessor, SAM 2 introduces real-time, promptable object segmentation capabilities for both images and videos, setting a new standard in the industry.
SAM 2 is the first model to unify object segmentation across both images and videos. This means that users can now seamlessly apply the same segmentation techniques to dynamic video content as they do to static images.
One of the standout features of SAM 2 is its ability to perform real-time segmentation at approximately 44 frames per second. This capability is particularly beneficial for applications that require immediate feedback, such as live video editing and interactive media.
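For developers who want to try the promptable interface, the sketch below follows the image-predictor API published in the facebookresearch/sam2 repository; the config and checkpoint filenames are assumptions that depend on which model variant is downloaded.

```python
# Minimal sketch of promptable image segmentation with SAM 2, based on the
# interface in the facebookresearch/sam2 repository; the config and checkpoint
# paths below are assumptions and vary by model variant.
import numpy as np
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor(
    build_sam2("sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt"))

image = np.array(Image.open("frame.jpg").convert("RGB"))
predictor.set_image(image)

# A single foreground point is enough to prompt a mask; boxes also work.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 300]]),  # (x, y) pixel the user clicked
    point_labels=np.array([1]),           # 1 = foreground point
)
print(masks.shape, scores)  # masks: (num_masks, H, W)
```

A clicked point or a box is the prompt; for video, the repository’s companion video predictor propagates the resulting mask across frames, which is what makes the near-real-time rates quoted above possible.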
Huang said this would be particularly useful since NVIDIA is now training robots, as he believes the future will be physical AI. “We’re now training AI models on video so that we can understand the world model,” said Huang.
He added that they will connect these AI models to the Omniverse, allowing them to better represent the physical world and enabling robots to operate in these Omniverse worlds.
On the other hand, this model would be beneficial for Meta, as the company is bullish on its Meta Ray-Ban glasses. “When we think about the next computing platform, we break it down into mixed reality, the headsets, and the smart glasses,” said Zuckerberg.