OpenAI recently acquired Rockset, a data analytics company co-founded in 2016 by former Meta engineers Dhruba Borthakur and Venkat Venkataramani, for $105 million (INR 905 crore). This acquisition aims to leverage Rockset’s advanced analytics to enhance OpenAI’s retrieval infrastructure.
Venkataramani, a graduate of NIT Tiruchirappalli, played a crucial role in scaling Meta’s data infrastructure to handle billions of queries per second with high reliability. He also contributed to the development of technologies such as TAO, Dragon, Memcache, McRouter, MySQL, RocksDB, HBase, MongoRocks, and MyRocks.
“AI has the opportunity to transform how people and organisations leverage their own data. That’s why we’ve acquired Rockset, a leading real-time analytics database that provides world-class data indexing and querying capabilities,” said OpenAI in its blog post.
“Rockset’s infrastructure empowers companies to transform their data into actionable intelligence. We’re excited to bring these benefits to our customers by integrating Rockset’s foundation into OpenAI products,” said OpenAI COO Brad Lightcap.
AI Agents To Lead the Way
Rockset will help OpenAI build AI agents capable of taking actions on users’ behalf. “In the future, assistants will become agents to whom you will delegate certain data access rights and permissions to act on your behalf. It might start with simple tasks like booking a meeting or making a reservation, and then they will quickly get better from there,” posted Rockset co-founder Venkataramani on LinkedIn a month ago.
This echoes what Microsoft AI chief Mustafa Suleyman recently said about GPT-6, a model most likely two years away, and its ability to take actions.
OpenAI CTO Mira Murati said in an interview that the next generation of GPT will be ‘PhD-level’ compared to GPT-3 (toddler) and GPT-4 (high school). She also said the next model will be released in a year and a half.
Rockset, based in San Mateo, California, employs approximately 88 people. The company has secured $109 million in funding from prominent investors, including Greylock Partners, Sequoia Capital, and Glynn Capital.
This funding, through a Series B, implies a $300-$500 million valuation. Despite substantial backing over eight years, the business has struggled to gain significant traction, with public figures indicating $10-20 million in revenue.
But why Rockset?
Borthakur said there are two sets of companies: one builds the latest AI models and the second comprises traditional data companies that store enterprise data to power business analytics and applications.
“And they (AI companies) want AI models to be able to leverage the data to power intelligent assistants and agents. So, the AI world and the data world are converging towards the same end goal,” he said.
Rockset announced that it will join OpenAI to enhance the retrieval infrastructure for OpenAI’s product suite. “We will help OpenAI solve the complex database challenges that AI applications face at a massive scale,” the post said.
Rockset’s main product features real-time indexing, SQL-based search, and analytics capabilities that can process data from sources such as Kafka, MongoDB, and DynamoDB.
Its technology is optimised for quick search, filtering, aggregations, and joins, making it an excellent tool for applications that need real-time data processing and analytics.
Integrating SQL with advanced search capabilities will allow OpenAI to streamline its data workflows, reducing the complexity of data management and improving overall efficiency. This is particularly beneficial for research and development, where quick and accurate data analysis is essential.
OpenAI Loves RAG
Moreover, Rockset augments the capabilities of LLMs through its retrieval-augmented generation (RAG) feature. This would be a useful addition for OpenAI enterprise customers who want to use models with their proprietary data and aim to reduce hallucination.
By incorporating their data, customers can enhance LLMs to provide contextual and more accurate results. This integration expands the possibilities of LLMs in various applications, from content generation to information retrieval.
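As a rough illustration of the RAG idea described above, the following minimal Python sketch retrieves the most relevant snippets from a customer's own documents and places them in the prompt sent to a model. Everything here is an assumption made for illustration: the function names `retrieve` and `build_prompt`, the word-overlap scoring, and the prompt wording are hypothetical and do not represent Rockset's or OpenAI's actual implementation.

```python
from typing import List


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank documents by the number of words shared with the query."""
    query_words = set(query.lower().split())
    scored = [
        (len(query_words & set(doc.lower().split())), doc) for doc in documents
    ]
    # sorted() is stable, so documents with equal scores keep their original order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Keep at most k documents and drop any that share no words with the query.
    return [doc for score, doc in ranked[:k] if score > 0]


def build_prompt(query: str, documents: List[str], k: int = 2) -> str:
    """Assemble a prompt that grounds the model in the retrieved context."""
    context = retrieve(query, documents, k)
    context_block = "\n".join(f"- {doc}" for doc in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )
```

A production system would replace the word-overlap retriever with a real index and pass the resulting prompt to an LLM; the point of the sketch is only that the model answers from supplied context rather than from memory alone.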
Rockset uses a converged indexing approach that combines row, columnar, and search indexes, enabling fast searches across multiple data types and formats. Currently, no other database company (MongoDB, Elasticsearch, or Amazon Redshift) offers this specific service.
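To make the idea of converged indexing more concrete, here is a toy Python sketch in which every document is written to three structures at once: a row store for point lookups, a column store for aggregations, and an inverted index for search. The class `ConvergedIndex` and its methods are hypothetical names invented for this example, field values are assumed to be hashable, and the code is a conceptual sketch rather than a description of Rockset's internals.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set, Tuple


class ConvergedIndex:
    """Toy index that stores each document in three layouts simultaneously."""

    def __init__(self) -> None:
        # Row index: document id -> full document, for fetching whole records.
        self.rows: Dict[int, Dict[str, Any]] = {}
        # Columnar index: field -> {document id: value}, for scans and aggregates.
        self.columns: Dict[str, Dict[int, Any]] = defaultdict(dict)
        # Search (inverted) index: (field, value) -> document ids, for filtering.
        self.inverted: Dict[Tuple[str, Any], Set[int]] = defaultdict(set)

    def add(self, doc_id: int, doc: Dict[str, Any]) -> None:
        """Write one document into all three indexes."""
        self.rows[doc_id] = doc
        for field, value in doc.items():
            self.columns[field][doc_id] = value
            self.inverted[(field, value)].add(doc_id)

    def get(self, doc_id: int) -> Dict[str, Any]:
        """Point lookup served by the row index."""
        return self.rows[doc_id]

    def find(self, field: str, value: Any) -> List[int]:
        """Selective filter served by the inverted index."""
        return sorted(self.inverted.get((field, value), set()))

    def total(self, field: str) -> float:
        """Aggregation served by the columnar index (numeric fields only)."""
        return sum(self.columns[field].values())
```

The trade-off in this sketch is extra storage and write work in exchange for letting each kind of query use the layout that suits it best.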
While Elasticsearch excels in keyword search and Weaviate and Pinecone specialise in vector search, Rockset merges these capabilities to provide both precise keyword matches and semantically rich search results.
Rockset’s hybrid vector search combines traditional keyword search with vector search, allowing for more relevant and context-aware search results. This hybrid approach is particularly beneficial for OpenAI, which deals with massive volumes of unstructured data, including text, images, and audio.
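One common way to combine keyword and vector search is to blend the two relevance scores with a weight. The Python sketch below illustrates that general technique; the function names, the `alpha` weighting parameter, and the hand-written embedding vectors are assumptions for the example, and the article does not say how Rockset actually combines the two signals.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query words that appear in the text."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & set(text.lower().split())) / len(query_words)


def hybrid_search(
    query: str,
    query_vec: Sequence[float],
    docs: List[Tuple[str, Sequence[float]]],
    alpha: float = 0.5,
) -> List[Tuple[float, str]]:
    """Rank docs by alpha * keyword score + (1 - alpha) * vector similarity."""
    scored = [
        (
            alpha * keyword_score(query, text)
            + (1 - alpha) * cosine(query_vec, vec),
            text,
        )
        for text, vec in docs
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

With `alpha` near 1 the ranking behaves like exact keyword matching, and with `alpha` near 0 it behaves like pure semantic search; a document that matches on both signals rises to the top.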
Unlike some other solutions that require complex infrastructure management, Rockset offers a fully managed, serverless architecture. This reduces operational overhead and simplifies scalability. It eliminates the need for manual infrastructure management, allowing OpenAI to scale its operations seamlessly and efficiently.
This scalability is crucial for handling the vast amounts of data required for training large AI models, ensuring that OpenAI can continue to push the boundaries of AI research without being hindered by infrastructure limitations.
OpenAI is Not Alone
Databricks acquired MosaicML in 2023 for $1.3 billion to enhance its capabilities in generative AI and LLMs. Most recently, Databricks also acquired Tabular to leverage Apache Iceberg, a leading open-source table format for data lakes.
Last year, ThoughtSpot, an AI-powered analytics platform last valued at $4.5 billion, acquired Mode Analytics, a business intelligence startup, for $200 million in cash and stock.
Recently, IBM agreed to acquire HashiCorp in a deal valued at $6.4 billion, aiming to expand its portfolio of cloud-based software products to capitalise on the growing demand for AI-powered solutions. The deal is expected to close by the end of 2024.
HashiCorp’s technology will complement IBM subsidiary Red Hat, IBM AI platform Watsonx, the vendor’s consulting arm, and its offerings in data security and IT automation.
The trend of acquiring data and AI companies is likely to continue through 2024.