How These Indian Startup Founders will Help OpenAI Build GPT-6

Rockset’s main product features real-time indexing, SQL-based search, and analytics capabilities that can process data from sources such as Kafka, MongoDB, and DynamoDB. 

Illustration by Nikhil Kumar

OpenAI recently acquired Rockset, a data analytics company co-founded in 2016 by former Meta engineers Dhruba Borthakur and Venkat Venkataramani, for $105 million (INR 905 crore). This acquisition aims to leverage Rockset’s advanced analytics to enhance OpenAI’s retrieval infrastructure.

Venkataramani, a graduate of NIT Tiruchirappalli, played a crucial role in scaling Meta’s data infrastructure to handle billions of queries per second with high reliability and contributed to the development of technologies like TAO, Dragon, Memcache, McRouter, MySQL, RocksDB, HBase, MongoRocks, and MyRocks.

“AI has the opportunity to transform how people and organisations leverage their own data. That’s why we’ve acquired Rockset, a leading real-time analytics database that provides world-class data indexing and querying capabilities,” said OpenAI in its blog post.

“Rockset’s infrastructure empowers companies to transform their data into actionable intelligence. We’re excited to bring these benefits to our customers by integrating Rockset’s foundation into OpenAI products,” said OpenAI COO Brad Lightcap.

AI Agents To Lead the Way

Rockset will help OpenAI build AI agents capable of taking actions on users’ behalf. “In the future, assistants will become agents to whom you will delegate certain data access rights and permissions to act on your behalf. It might start with simple tasks like booking a meeting or making a reservation, and then they will quickly get better from there,” posted Rockset co-founder Venkataramani on LinkedIn a month ago.

This is similar to what Microsoft AI chief Mustafa Suleyman recently said about GPT-6, which is most likely two years away, and its ability to take actions.

OpenAI CTO Mira Murati said in an interview that the next generation of GPT will be ‘PhD-level’ compared to GPT-3 (toddler) and GPT-4 (high school). She also said the next model will be released in a year and a half.

Rockset, based in San Mateo, California, employs approximately 88 people. The company has secured $109 million in funding from prominent investors, including Greylock Partners, Sequoia Capital, and Glynn Capital. 

This funding, through a Series B, implies a $300-$500 million valuation. Despite substantial backing over eight years, the business has struggled to gain significant traction, with public figures indicating $10-20 million in revenue.

But why Rockset? 

Borthakur said there are two sets of companies: one builds the latest AI models and the second comprises traditional data companies that store enterprise data to power business analytics and applications.

“And they (AI companies) want AI models to be able to leverage the data to power intelligent assistants and agents. So, the AI world and the data world are converging towards the same end goal,” he said.

Rockset announced that it will join OpenAI to enhance the retrieval infrastructure for OpenAI’s product suite. “We will help OpenAI solve the complex database challenges that AI applications face at a massive scale,” the post said.

Rockset’s main product features real-time indexing, SQL-based search, and analytics capabilities that can process data from sources such as Kafka, MongoDB, and DynamoDB. 

Its technology is optimised for quick search, filtering, aggregations, and joins, making it an excellent tool for applications that need real-time data processing and analytics. 

Integrating SQL with advanced search capabilities will allow OpenAI to streamline its data workflows, reducing the complexity of data management and improving overall efficiency. This is particularly beneficial for research and development, where quick and accurate data analysis is essential.

OpenAI Loves RAG 

Moreover, Rockset supports retrieval-augmented generation (RAG), which extends what LLMs can do by supplying them with relevant data at query time. This would be a useful addition for OpenAI enterprise customers who want to use models with their proprietary data and reduce hallucination.

By incorporating their data, customers can enhance LLMs to provide contextual and more accurate results. This integration expands the possibilities of LLMs in various applications, from content generation to information retrieval.
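The retrieve-then-prompt idea behind RAG can be sketched in a few lines. This is a toy illustration only, not Rockset's or OpenAI's implementation: the names `retrieve`, `build_prompt`, and `DOCS` are hypothetical, the relevance score is a plain count of shared words, and no model is actually called.

```python
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase the text and strip basic punctuation from each word."""
    return [w.strip(".,?!").lower() for w in text.split()]

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return up to k documents that share at least one term with the query."""
    q_terms = set(tokenize(query))
    scored = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        score = sum(counts[t] for t in q_terms)  # occurrences of query terms
        scored.append((score, doc))
    # Stable sort: documents with equal scores keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Place the retrieved passages ahead of the question as context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical proprietary data a customer might supply.
DOCS = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Refunds require the original receipt.",
]

prompt = build_prompt("How long do refunds take?", DOCS)
```

The resulting `prompt` contains only the two refund-related passages, so a model answering from it is steered towards the customer's own data rather than its general training knowledge.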

Rockset uses a converged indexing approach that combines row, columnar, and search indexes, enabling fast searches across multiple data types and formats. Currently, other database offerings such as MongoDB, Elasticsearch, and Amazon Redshift do not provide this specific combination.
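To make the idea of converged indexing concrete, here is a minimal sketch in which every document is written to three structures at once. It is an assumption-laden toy: the class `ConvergedIndex` and its methods are hypothetical and say nothing about how Rockset stores data internally.

```python
from collections import defaultdict

class ConvergedIndex:
    """Toy store that indexes each document three ways on write."""

    def __init__(self):
        self.rows = {}                    # row index: doc_id -> whole document
        self.columns = defaultdict(dict)  # columnar index: field -> {doc_id: value}
        self.search = defaultdict(set)    # search index: (field, value) -> doc_ids

    def add(self, doc_id, doc):
        self.rows[doc_id] = doc
        for field, value in doc.items():
            self.columns[field][doc_id] = value
            self.search[(field, value)].add(doc_id)

    def get(self, doc_id):
        """Point lookup, served by the row index."""
        return self.rows[doc_id]

    def column_sum(self, field):
        """Aggregation, served by the columnar index."""
        return sum(self.columns.get(field, {}).values())

    def find(self, field, value):
        """Selective filter, served by the search (inverted) index."""
        return sorted(self.search.get((field, value), set()))

idx = ConvergedIndex()
idx.add(1, {"city": "Pune", "amount": 30})
idx.add(2, {"city": "Delhi", "amount": 50})
idx.add(3, {"city": "Pune", "amount": 20})
```

Each query type reads from the structure that suits it: `idx.find("city", "Pune")` uses the inverted index, `idx.column_sum("amount")` scans one column, and `idx.get(2)` fetches a full row. The trade-off is extra write and storage cost in exchange for fast reads across query shapes.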

While Elasticsearch excels in keyword search and Weaviate and Pinecone specialise in vector search, Rockset merges these capabilities to provide both precise keyword matches and semantically rich search results. 

Rockset’s hybrid vector search combines traditional keyword search with vector search, allowing for more relevant and context-aware search results. This hybrid approach is particularly beneficial for OpenAI, which deals with massive volumes of unstructured data, including text, images, and audio.
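A common way to combine keyword and vector search is a weighted sum of the two scores. The sketch below assumes that approach; the function names, the `alpha` weight, and the two-dimensional embeddings are all hypothetical and chosen for illustration, not taken from Rockset's product.

```python
import math

def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def hybrid_search(query, query_vec, docs, alpha=0.5):
    """Rank doc ids by alpha * keyword score + (1 - alpha) * vector score."""
    scored = [
        (alpha * keyword_score(query, d["text"])
         + (1 - alpha) * cosine(query_vec, d["vec"]), d["id"])
        for d in docs
    ]
    return [doc_id for score, doc_id in sorted(scored, reverse=True)]

# Made-up documents with made-up 2-d embeddings.
docs = [
    {"id": "a", "text": "rockset real-time analytics database", "vec": [1.0, 0.0]},
    {"id": "b", "text": "streaming data warehouse", "vec": [0.9, 0.1]},
    {"id": "c", "text": "analytics for cooking recipes", "vec": [0.0, 1.0]},
]

balanced = hybrid_search("real-time analytics", [1.0, 0.0], docs, alpha=0.5)
keyword_only = hybrid_search("real-time analytics", [1.0, 0.0], docs, alpha=1.0)
```

With `alpha=0.5`, document `b` outranks `c` because it is semantically close to the query even though it shares no keywords. With `alpha=1.0` the ranking falls back to pure keyword matching and `c` moves ahead of `b`.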

Unlike some other solutions that require complex infrastructure management, Rockset offers a fully managed, serverless architecture. This removes the need for manual infrastructure management, reducing operational overhead and allowing OpenAI to scale its operations efficiently.

This scalability is crucial for handling the vast amounts of data required for training large AI models, ensuring that OpenAI can continue to push the boundaries of AI research without being hindered by infrastructure limitations.

OpenAI is Not Alone

Databricks acquired MosaicML in 2023 for $1.3 billion to enhance its capabilities in generative AI and LLMs. More recently, Databricks also acquired Tabular to build on Apache Iceberg, a leading open-source table format for data lakes.

Last year, ThoughtSpot, an AI-powered analytics platform last valued at $4.2 billion, acquired Mode Analytics, a business intelligence startup, for $200 million in cash and stock.

Recently, IBM agreed to acquire HashiCorp in a deal valued at $6.4 billion, aiming to expand its portfolio of cloud-based software products to capitalise on the growing demand for AI-powered solutions. The deal is expected to close by the end of 2024.

HashiCorp’s technology will complement IBM subsidiary Red Hat, IBM’s AI platform watsonx, its consulting arm, and its offerings in data security and IT automation.

The trend of acquiring data and AI companies is likely to continue through 2024.

Siddharth Jindal

Siddharth is a media graduate who loves to explore tech through journalism and putting forward ideas worth pondering about in the era of artificial intelligence.