Join top executives in San Francisco July 11-12 to hear how leaders are integrating and optimizing AI investments for success. Learn more
Refuel AI, a company that uses large language models (LLMs) to generate high-quality training data for AI models, today emerged from stealth with $5.2 million in seed funding. The company said it will use the round to grow its team and build out the capabilities of its platform ahead of a commercial launch in July.
Founded by Stanford graduates Nihit Desai and Rishabh Bhargava, Refuel has also opened access to AutoLabel, an open source library that allows any AI team to easily label their data in their environment and with any desired LLM.
The offerings come as a response to data challenges that are slowing the development of AI, preventing companies from incorporating next-generation technology into their products and business functions.
Every AI company needs AI-ready data
Today, every business is racing to become an AI company, collaborating with internal experts and third-party vendors to develop models that can address different business-specific use cases. The task can be daunting, but every AI project has the same starting point: clean, labeled data. If this is done right, the project can easily come to life.
Now, while companies have a lot of data, not all of it is ready for training by default. Information needs to be cleaned and annotated before a model can learn from it, a task that is typically handled by human teams and takes weeks or months. That pace doesn’t fit the needs of AI today.
“Many teams [we spoke to] had all these amazing ideas for models they wanted to train and products they wanted to build, if only they had the data ready to train. We realized at that moment that making clean, labeled data available at the speed of thought was what we wanted to focus on,” Bhargava told VentureBeat.
Then, in 2021, the duo started Refuel and went on to build a dedicated platform that uses specialized LLMs to automate the creation and labeling of datasets (at or above human quality) for every company and every use case.
According to the company, business users will be able to use the platform by simply uploading their own datasets and instructing LLMs to label the data. They can also provide guidelines and a few examples to ensure that only high-quality, training-ready data is produced.
“Within an hour, they [the users] will have enough data to start training their AI models, which they can then seamlessly plug into their model training infrastructure. As these teams collect more data (especially from production), they can route it to Refuel for labeling, measure performance, and enhance their datasets for model retraining,” the CEO added.
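The workflow described above (guidelines plus a few examples, labels from an LLM, then a performance check) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Refuel's platform or the AutoLabel API: every function name here is hypothetical, and `fake_llm` is a simple keyword rule standing in for a real model call.

```python
from typing import Callable, Sequence


def build_prompt(guidelines: str, examples: Sequence[tuple[str, str]], text: str) -> str:
    """Assemble a few-shot labeling prompt from guidelines and labeled examples."""
    shots = "\n".join(f"Text: {t}\nLabel: {label}" for t, label in examples)
    return f"{guidelines}\n\n{shots}\n\nText: {text}\nLabel:"


def label_dataset(
    texts: Sequence[str],
    guidelines: str,
    examples: Sequence[tuple[str, str]],
    llm: Callable[[str], str],
) -> list[str]:
    """Label each text by sending one prompt per item to the supplied LLM callable."""
    return [llm(build_prompt(guidelines, examples, t)).strip() for t in texts]


def agreement(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of predicted labels that match a human-labeled reference set."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM: keyword rule on the final message only."""
    message = prompt.rsplit("Text: ", 1)[1].lower()
    return "billing" if ("refund" in message or "charged" in message) else "other"


if __name__ == "__main__":
    guidelines = "Classify each support message as 'billing' or 'other'."
    examples = [("I was charged twice", "billing"), ("How do I reset my password?", "other")]
    texts = ["Please refund my last invoice", "The app crashes on launch"]
    labels = label_dataset(texts, guidelines, examples, fake_llm)
    print(labels)
    print(agreement(labels, ["billing", "other"]))
```

In practice, `fake_llm` would be replaced by a call to whichever model a team prefers, and the agreement score against a small human-labeled sample gives a rough signal of whether the guidelines and examples need revising.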
In private beta testing with select companies, the offering has been found to speed up the data creation and labeling process by up to 100%. Bhargava didn’t share the names of these companies, but noted that Refuel AI is seeing interest from multiple verticals, from social media and fintech to healthcare, HR and e-commerce.
The road ahead
With this round, jointly led by General Catalyst and XYZ Ventures, Refuel plans to grow its engineering team from six to 12 members and further invest in the platform and its LLM infrastructure to prepare for commercial launch by the end of July. The company will also invest capital in its open source library and community.
“As a concrete example, we’re running a competition to push the boundaries of LLM-based data labeling, with prizes up to $10,000,” Bhargava noted.
Currently, in the data labeling space, the company competes with players like Tasq AI, Snorkel AI, and SuperAnnotate.
VentureBeat’s mission is to be a digital city square for technical decision makers to gain insights into transformative business technology and transactions.