IBM recently announced that it would host Meta’s Llama 2-chat 70 billion parameter model in the watsonx.ai studio, with early access available to select clients and partners.
Enterprises are now embracing the trend of generative AI to bolster their business strategies. To harness its potential effectively, they require streamlined methods for training and constructing their own LLMs using their accumulated years of data. To address this challenge, various cloud providers, including AWS and Azure, have stepped up to offer assistance.
OpenAI’s partnership with Microsoft gave Azure customers GPT-4, while AWS’s multi-LLM approach offered a buffet of models to choose from, including AI21, Cohere, Anthropic’s Claude 2, and Stability AI’s SDXL 1.0. Apart from the well-known clouds, several other service providers have popped up recently.
Enterprises want a solution from service providers that they can trust. AI enthusiasts have recently devised methods to train and build Llama 2 models, yet a critical concern remains: can these approaches be trusted to handle enterprise data?
A few days back, AI expert Santiago tweeted, “You can now test Llama 2 in less than 10 minutes,” introducing Monster API, a new tool that lets you effortlessly access powerful generative AI models such as Falcon, Llama, Stable Diffusion, and GPT-J, without having to worry about managing the models or scaling them up to handle large volumes of requests.
However, new initiatives like this are too risky for established companies to trust, as they have not yet proved their ability to scale.
IBM has Customers’ Trust
IBM is dedicated to prioritizing trust and security as it introduces its generative AI features. For example, when users run the Llama 2 model in the prompt lab of watsonx.ai, they can activate the AI guardrails function, which automatically filters harmful language from both the input prompt and the model’s generated output.
In an exclusive conversation with AIM, Geeta Gurnani, IBM Technology CTO and technical sales leader, IBM India & South Asia, said IBM is introducing an AI governance toolkit, expected to be generally available later this year. It will help operationalise governance to mitigate the risk, time, and cost associated with manual processes, and provide the documentation necessary to drive transparent and explainable outcomes.
“It will also have mechanisms to protect customer privacy, proactively detect model bias and drift, and help organisations meet their ethics standards,” she said.
Why Llama 2 and not GPT-4
Llama 2 has gained popularity among enterprises. This is backed by the fact that it is available on Amazon SageMaker, Databricks, watsonx.ai, and even Microsoft Azure, the backbone of the proprietary GPT-4.
Furthermore, the partnership between Meta and several prominent companies like Amazon, Hugging Face, NVIDIA, Qualcomm, Zoom, and Dropbox, as well as academic leaders, underscores the significance of open-source software.
Even OpenAI’s Karpathy, a prominent figure in the field of deep learning, couldn’t resist using Llama 2, which led to him creating Baby Llama, aka llama2.c, where he has been exploring the concept of running large language models (LLMs) on a single computer as part of his recent experiments. Moreover, he even hinted that OpenAI might release open-source models in the near future.
In a similar vein, AI expert Santiago expressed that Llama 2 possesses all the elements for potential success: being open-source, having a commercial license, allowing cost-effective GPU usage, and enabling comprehensive control over the entire utilization process.
“I’ve talked to two startups migrating from proprietary models into Llama 2. How many more companies will ditch commercial alternatives and embrace Llama 2?” he questioned.
GPT-4 is accessible through the Microsoft Azure OpenAI Service, and enterprises can also purchase the GPT-4 API directly from OpenAI. Nonetheless, GPT-4’s limitation is its closed-source nature, which prevents users from building their own models or experimenting with its code. Unlike Llama 2, which is free for commercial use, the GPT-4 API comes with a price tag: charges are calculated per 1,000 tokens, at $0.03 for input and $0.06 for output.
For a moderately complex use case, the monthly inference cost for a GPT-4 API (with a 16K context length) can run anywhere between $250,000 and $300,000, as per AIM Research. Therefore, when using the GPT-4 API, it’s essential to keep track of token usage and manage it effectively to control costs, just as you would with a website integration.
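To see how token-based pricing adds up to figures of that order, here is a back-of-the-envelope cost estimator. The rates ($0.03 per 1,000 input tokens, $0.06 per 1,000 output tokens) come from the article; the request volume and token counts in the example are purely illustrative assumptions, not AIM Research’s figures.

```python
# Back-of-the-envelope GPT-4 API cost estimator.
# Rates as stated in the article: $0.03 / 1K input tokens, $0.06 / 1K output tokens.

INPUT_RATE = 0.03 / 1000   # dollars per input token
OUTPUT_RATE = 0.06 / 1000  # dollars per output token

def monthly_cost(requests_per_day: int,
                 input_tokens: int,
                 output_tokens: int,
                 days: int = 30) -> float:
    """Estimate the monthly bill for a given request profile."""
    per_request = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
    return requests_per_day * per_request * days

# Illustrative assumption: 100,000 requests/day, each with
# 1,500 input tokens and 500 output tokens.
print(f"${monthly_cost(100_000, 1_500, 500):,.0f} per month")
```

Under these assumed volumes the estimate lands around $225,000 a month, which is roughly the scale AIM Research reports for a complex use case.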
This quarter, we initially observed companies leaning towards Azure for GPT-4, which was exclusively available there, a trend that boosted Azure’s cloud revenue. However, things took an intriguing turn when Microsoft partnered with Meta to host Llama 2. This underscores the fact that open-source LLMs possess a unique advantage that shouldn’t be overlooked.