A lot goes into making the videos we consume online, from brainstorming ideas and writing scripts to editing and recording voice-overs, and all of this takes up a substantial amount of content creators’ time.
This is where generative AI can step in to ease the burden. It can help creators streamline these processes and cut the time spent on routine tasks.
Imagine an AI tool that lets you accomplish everything in one place with just a few prompts. InVideo AI promises exactly that, which is why content creators around the globe are falling in love with the platform.
According to Sanket Shah, co-founder and CEO of InVideo, the platform gets nearly 3 million new users every month.
“We don’t track the total number of people using our free product, but if we take 3 million on average, we could have around 35 million users on our platform in just one year. In terms of paid users, we have about 150,000 of them,” he said.
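As a rough check on the arithmetic behind Shah's estimate, here is a minimal sketch in Python. It assumes a flat 3 million sign-ups every month with no churn or duplicate accounts (neither is stated in the article), and the variable names are purely illustrative.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumption: a constant 3 million new users per month, no churn.
MONTHLY_NEW_USERS = 3_000_000
PAID_USERS = 150_000
MONTHS_PER_YEAR = 12

# Cumulative sign-ups after one year at the quoted monthly rate.
projected_users = MONTHLY_NEW_USERS * MONTHS_PER_YEAR

# Paid users as a fraction of that one-year projection.
paid_share = PAID_USERS / projected_users

print(f"Projected sign-ups in one year: {projected_users:,}")
print(f"Paid users as a share of that: {paid_share:.2%}")
```

Under these assumptions the projection comes to 36 million sign-ups, in line with the "around 35 million" Shah cites, and the 150,000 paid users would be roughly 0.4% of that total.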
AIM met Shah in early August at the Bengaluru edition of AWS GenAI Loft, a collaborative gathering of developers, startups, and AI enthusiasts to learn, build, and connect.
Shah revealed that the startup has already secured over 50% of its $30 million revenue target for the year.
Avoiding the Uncanny Valley
Interestingly, InVideo has not developed a text-to-video model like OpenAI’s Sora or Kling. Instead, the startup has partnered with several media providers like Getty Images and Shutterstock and pays them a licence fee.
One primary reason for this approach, according to Shah, is that InVideo wants to provide users with publishable videos.
Currently, even though models like Sora produce high-quality videos, there is limited clarity about the datasets they are trained on, which complicates the publishability of these videos.
“Moreover, at InVideo, one of our core principles is to avoid anything that falls into the uncanny valley. As of last week, we felt that generative image and video technology still often produced results with issues like extra fingers or multiple eyes—things that are not acceptable for our purposes.
“We focus on delivering high-quality, publishable videos, so we prioritise ensuring that our users receive content that meets professional standards,” Shah said.
The platform uses AI to understand the user’s intent, write the script, and handle voice-overs, practically cloning the user’s voice.
It selects and integrates media, performs editing tasks, adds music, and ensures that all elements like transitions and zooms are correctly applied, effectively automating much of the post-production process.
That said, the startup is not averse to using models like Sora. Once such models become available, it could consider integrating Sora and Kling into its platform. It is just not in the business of building them.
“We don’t want to enter into a race with the hyperscalers (model builders) to build the next-big model. Models are also perishable and we have seen that already,” he said.
The AI in InVideo
While the startup is refraining from entering the territory of model builders like OpenAI, Anthropic, Microsoft, and Google, it is building models that suit its business model.
“We prefer to focus on developing models that are niche and highly specific to our needs. For instance, we are working on a lip-sync model tailored to our requirements,” Shah revealed.
The startup also plans to launch a new AI-powered feature next month that will allow users to create an avatar or a digital clone of themselves.
“Here’s how it works: you record a short video, speaking for 30 seconds to a minute, and the AI generates an avatar. Once you have your avatar, you can input a prompt or specify what you want it to say,” Shah revealed.
The platform does leverage LLMs from Anthropic, OpenAI and Google, but Shah refrained from revealing much about their use cases. “This is very proprietary and that is where most of the magic happens.”
InVideo also uses Amazon Bedrock, which gives it access to some of the top LLMs through a single API.
The startup also uses AWS’ multi-region fleet of Spot GPUs for video rendering and editing with open-source solutions. This lets it run 90% of its workload on Spot instances, cutting costs by close to 40%.
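To see how the two Spot figures relate, here is a minimal sketch in Python. It assumes a simple blended-cost model in which the non-Spot remainder runs at full on-demand price and the overall saving comes entirely from the Spot discount. The implied discount it prints is a derived illustration, not a figure from InVideo or AWS, and the function names are hypothetical.

```python
def blended_saving(spot_share: float, spot_discount: float) -> float:
    """Overall saving versus running everything on-demand.

    Assumption: only the Spot portion of the workload is discounted.
    """
    return spot_share * spot_discount


def implied_discount(spot_share: float, overall_saving: float) -> float:
    """Average Spot discount needed to reach a given overall saving."""
    return overall_saving / spot_share


SPOT_SHARE = 0.90      # share of workload on Spot, from the article
OVERALL_SAVING = 0.40  # "close to 40%" cost reduction, from the article

discount = implied_discount(SPOT_SHARE, OVERALL_SAVING)
print(f"Implied average Spot discount: {discount:.1%}")
print(f"Check, overall saving: {blended_saving(SPOT_SHARE, discount):.1%}")
```

Under this simplified model, the quoted numbers would correspond to an average Spot discount of roughly 44% relative to on-demand pricing. Real Spot pricing varies by region, instance type, and time, so this is only a consistency check.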
Enabling Content Creators with AI
The startup began with a pre-AI product in 2017 and was initially focussed on enterprises. However, pivoting to AI and shifting from a B2B to a more B2C business model proved to be a game changer.
Today, it caters to YouTubers as well as established and new creators making content for Facebook, Instagram, and TikTok.
“The platform is also leveraged by small businesses, for example, a lady selling horses in Texas, a partially deaf teacher in Palo Alto who is teaching a community college, a bunch of students and teachers, someone selling water bottles, and some restaurants,” Shah revealed.
“About 5% of our customers are also filmmakers.”
When AIM asked Shah about some of the most fun and interesting videos he has seen users generate using the platform, he revealed that the brand manager of the legendary rock band Aerosmith used InVideo to generate content on how to deal with depression.
“One day, the brand manager received an email from a viewer who was contemplating suicide. After watching the video and following some advice, they decided against it. Stories like these are incredibly powerful,” Shah revealed.