The startup has also partnered with Databricks, a data infrastructure provider, which found that using Cleanlab reduced errors by 37% and increased test accuracy from 65% to 78% in an OpenAI Davinci model. Cleanlab's software has also been used by consulting firm Berkeley Research Group, saving a legal client about $30 million in costs. Despite competition from other startups offering data solutions, Cleanlab's founders believe their product's ability to improve models post-release sets it apart.
Key takeaways:
- Cleanlab, a startup founded by three MIT PhDs, offers software that can automatically label up to 90% of a raw, unlabeled data set and flag potential duplicates or errors. This helps users clean their data faster and more cheaply, and get more accurate results.
- The company recently raised $25 million in a funding round co-led by Menlo Ventures and TQ Ventures, valuing Cleanlab at $100 million. Cloud heavyweight Databricks also joined the round as an investor and partner.
- A free, open-source version of Cleanlab's software has been available since 2017, with teams from Chase, Google, and Tesla among its users. The company launched its paid enterprise version, Cleanlab Studio, in July.
- Despite competition from other startups offering data solutions, investors argue that Cleanlab is more than just a labeling company. By measuring a model's output, it can also make models more valuable after their release, not just during training.