Comet partners with Snowflake to improve the reproducibility of machine learning datasets

Join top executives in San Francisco July 11-12 to hear how leaders are integrating and optimizing AI investments for success. Learn more

MLOps platform Comettoday announced a strategic partnership with Snowflake aimed at enabling data scientists to build superior machine learning (ML) models at an accelerated pace.

Comet said the collaboration will enable Comets solutions to be integrated into the Snowflakes unified platform, allowing developers to track and version their Snowflake queries and datasets within their Snowflake environment.

Comet says this integration will enable the lineage and performance of a model to be traced, providing greater visibility and insight than traditional development processes. It will also impact model performance in response to changes in the data.

Overall, the company believes that using Snowflake data in the Comet platform will result in a streamlined and more transparent model development process.


Transform 2023

Join us in San Francisco July 11-12, where top executives will share how they integrated and optimized AI investments for success and avoided common pitfalls.

subscribe now

Faster model training, deployment and monitoring

The combination of the Snowflakes Data Cloud platform and Comets ML will allow customers to build, train, deploy and monitor models much faster, according to the companies.

Additionally, this partnership fosters a feedback loop between model development at Comet and data management at Snowflake, Comet CEO Gideon Mendels told VentureBeat.

>>Don’t miss our special issue: Building the Foundation for Customer Data Quality.<

Mendels said integrating such a loop can continuously improve models and bridge the gap between model experimentation and model deployment, while delivering the key promise of machine learning, which is the ability to learn and adapt over time. He said clear versioning between datasets and models will enable organizations to better address data changes and their impact on models in production.

Comet’s new offering follows the recent release of a suite of tools and integrations designed to accelerate workflows for data scientists working with large language models (LLMs).

Improve ML models through constant feedback

When data scientists or developers run queries to pull datasets from Snowflake for their ML models, Comet will be able to register, versione and directly link these queries to the resulting models.

Mendels said this approach offers several benefits, including increased reproducibility, collaboration, auditability, and iterative improvement.

The integration between Comet and Snowflake aims to provide a more robust, transparent and efficient framework for ML development by enabling monitoring and versioning of Snowflake queries and datasets within Snowflake itself, he explained. By versioning SQL queries and datasets, data scientists can always determine the exact version of data used to train a specific version of the model. This is crucial for the reproducibility of the model.

Track changes in model performance to data alterations

In ML, the training data is just as important as the model itself. Data alterations, such as introducing new features, fixing missing values, or changing data distributions, can profoundly affect the performance of a model.

Comet says that by tracing a model’s lineage, it becomes possible to establish a connection between changes in model performance and specific alterations in the data. This not only helps in debugging and performance understanding, but drives data quality and feature engineering.

Mendels said tracking queries and data over time can create a feedback loop that drives continuous improvements in both the data management and model development stages.

Model lineage can facilitate collaboration among a team of data scientists, as it allows anyone to understand the history of a model and how it was developed without the need for extensive documentation, Mendels said. This is especially useful when team members leave or when new members join the team, allowing for seamless knowledge transfer.

What is the future of Comet?

The company says customers currently using Comets like Uber, Etsy, and Shopify typically report a 70% to 80% improvement in their ML speed.

This is due to faster research cycles, the ability to understand model performance and detect problems faster, better collaboration and more, Mendels said. With the joint solution, this should increase even more as there are still challenges today in connecting the two systems. Customers save on input and consumption costs by keeping data within Snowflake instead of transferring it over the wire and saving it to other locations.

Mendels said Comet aims to establish itself as the in fact AI development platform.

Our view is that companies will see real value from AI only after they implement these models based on their own data, he said. Whether they’re training from scratch, fine-tuning an OSS model, or using context input in ChatGPT, Comets’ job is to make this process seamless and bridge the gap between research and production.

VentureBeat’s mission it is to be a digital city square for technical decision makers to gain insights into transformative business technology and transactions. Discover our Briefings.

#Comet #partners #Snowflake #improve #reproducibility #machine #learning #datasets

Leave a Comment