2023/4/25 | 3 min read
Building and managing performant models without increasing labeling time, effort, or spending is vital to maintaining ROI. Superb Curate is designed to help you do exactly that.

We’re excited to announce that Curate, formerly known as DataOps, is now available!

Curate is Superb AI’s answer to the data-related questions we’ve all faced at one time or another. What data should I label first? Which data should I use for training vs. validation? How much data do I actually need? And so on.

With Curate, you can easily manage, curate, and visualize all your organization’s computer vision data in a single place. You can also use AI to answer all of the above questions with minimal manual effort, for example by auto-curating a balanced slice for training that best represents your entire dataset, or by uncovering your most valuable data in the form of edge cases or potential mislabels.

At Superb AI, we believe the future of computer vision and machine learning is one where every organization, regardless of ML team size or resources, can build and deploy AI applications. This product is the next step toward making that a reality.

In this article, we’ll walk you through all the tools and features you can use today to achieve better model performance with much less data. Want to see it for yourself right away? Let our team know, and we’d be happy to give you a personalized demo.

Data Management

Figure: Uploading and pipelining large volumes of data into one place is easy with Superb Curate.

Let’s get started with data management. With Curate, uploading and pipelining large volumes of data into one place as soon as it’s collected is easy. With this initial release, you can upload raw data and labels with associated annotations and metadata using our SDK. Soon, we’ll be working on other upload mechanisms, including an API and a CLI, as well as various forms of pipeline automation.

Embedding Store

Figure: Get access to proprietary, high-dimensional embedding generation algorithms.

One thing that makes Curate unique is that it provides you access to proprietary, high-dimensional embedding generation algorithms. This eliminates the need to build, train, and maintain custom embedding models and infrastructure, which can carry a substantial price tag in data, computational resources, and in-house expertise. It also reduces, or can even eliminate, the need to rely on otherwise slow and cumbersome manual curation techniques.

How it works is simple. Whenever new images or objects are uploaded, high-quality embeddings are generated automatically, and Curate uses unsupervised learning to cluster the image or object data based on visual similarity. Our curation algorithm, which we’ll cover shortly, then uses these clusters to automatically select the data most suitable for your model needs, such as a training or validation set.
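To make the clustering step concrete, here is a minimal sketch of grouping embeddings by similarity. Curate’s embedding and clustering algorithms are proprietary, so this is not how Curate does it: it swaps in plain k-means, and the toy two-dimensional vectors, the `kmeans` helper, and the choice of `k` are all assumptions for illustration.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means over lists of floats; returns one cluster index per point."""
    rng = random.Random(seed)
    # Start from k distinct points chosen at random.
    centroids = rng.sample(points, k)
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assign each embedding to its nearest centroid (Euclidean distance).
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments


# Toy 2-D "embeddings" (assumed data): two visually distinct groups.
embeddings = [
    [0.1, 0.2], [0.0, 0.1], [0.2, 0.0],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
]
labels = kmeans(embeddings, k=2)
```

Real image embeddings have hundreds or thousands of dimensions, but the idea is the same: items whose embeddings sit close together end up in the same cluster.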

Query and Slice

Figure: Easily find and meaningfully group data into slices using any combination of metadata or annotation information tagged within your images.

After uploading your data via the SDK, you can easily find and meaningfully group data into slices using any combination of metadata or annotation information tagged within your images. A slice, essentially a data subset, is a foundational concept in Curate. Slices can be created manually, as above, or automatically via our AI tools. Using queries to create slices lets you manually curate data as you see fit, without having to rely on old-school (and painful) search methods like file names.
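Conceptually, a query-based slice is just a filter over metadata and annotation fields. The sketch below illustrates that idea in plain Python. It does not use the Curate SDK or its query syntax; the `ImageRecord` type, the `query` helper, and the field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    """Hypothetical stand-in for an uploaded image with metadata and labels."""
    key: str
    metadata: dict
    classes: list = field(default_factory=list)


def query(records, predicate):
    """Return the subset (a 'slice') of records matching the predicate."""
    return [r for r in records if predicate(r)]


records = [
    ImageRecord("img_001", {"weather": "rain", "camera": "front"}, ["car", "person"]),
    ImageRecord("img_002", {"weather": "clear", "camera": "front"}, ["car"]),
    ImageRecord("img_003", {"weather": "rain", "camera": "rear"}, ["bicycle"]),
]

# Combine a metadata condition with an annotation condition.
rainy_with_person = query(
    records,
    lambda r: r.metadata.get("weather") == "rain" and "person" in r.classes,
)
slice_keys = [r.key for r in rainy_with_person]
```

Because the conditions operate on what is in the image rather than on how the file is named, the same slice can be rebuilt as new data arrives.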

Combining queries with image-level views makes it easy to quickly find the exact data you need, and tile and scatter views at the object level are coming soon. Semantic search, also in development, will reduce the time and cognitive effort required even further.

Plus, all your slices are saved within the platform and easy to find, so you can use them immediately or refer back to them whenever needed. 

Auto Curate

Figure: Auto-Curate is an AI-based tool that uses high-quality curation algorithms to curate computer vision data at scale.

Too often, as machine learning engineers or project managers, we rely heavily on good old-fashioned intuition and brute force when curating data for computer vision. While there’s no doubt this can work from time to time to solve small problems, the risk of introducing issues like bias, subjectivity, and overfitting, among others, grows as the problem or task we are trying to solve grows. And, to be frank, it’s not all that scalable.

That’s where Auto-Curate comes in. Auto-Curate is an AI-based tool that uses high-quality curation algorithms to curate the following for you at scale:

  • Training sets

  • Validation sets

  • Edge cases

  • Mislabels
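As a rough illustration of what a balanced selection can look like, the sketch below picks items by cycling through clusters so that rare clusters are not drowned out by common ones. This is not Auto-Curate’s actual algorithm, which is not described here; the `balanced_sample` helper, the round-robin strategy, and the toy frame names are assumptions.

```python
from collections import defaultdict
from itertools import zip_longest


def balanced_sample(items, cluster_ids, budget):
    """Pick up to `budget` items by cycling through clusters round-robin."""
    by_cluster = defaultdict(list)
    for item, cid in zip(items, cluster_ids):
        by_cluster[cid].append(item)

    picked = []
    # zip_longest walks all clusters in lockstep; None pads exhausted clusters.
    for round_items in zip_longest(*by_cluster.values()):
        for item in round_items:
            if item is not None and len(picked) < budget:
                picked.append(item)
    return picked


# Toy data (assumed): cluster 0 dominates, clusters 1 and 2 are rare.
frames = ["f0", "f1", "f2", "f3", "f4", "f5", "f6", "f7"]
clusters = [0, 0, 0, 0, 0, 0, 1, 2]
subset = balanced_sample(frames, clusters, budget=4)
```

Random sampling with the same budget would most likely return frames from cluster 0 only, while this selection takes one frame from every cluster before returning to the largest one.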

Curating relevant data from visualized clusters was something we always wanted to develop internally as we deal with a ton of visually similar raw video frames. Superb AI’s new Auto Curation feature will help us effectively curate a balanced dataset while allowing us to move away from time or random-based sampling.

Yongjin Shin

ML Engineer at ioCrops

Want to know more about how our curation algorithms work and what they could do for you? We’ve been hard at work recently putting our AI to the test on popular datasets, and we’re pretty excited with the results. Here’s what we’ve published so far:

Scatter Visualization and Analytics

Figure: Cluster visualization of computer vision data in Superb Curate.

Finally, we have some tools to help you better understand patterns and potential outliers in your datasets. To start, we have scatter visualization, which uses embeddings to cluster images or objects over a two-dimensional space based on visual similarity, allowing you to visualize their distribution in seconds. You can also consult our in-depth analytics dashboards, which provide insight into the distribution of metadata, annotation types, and object classes in your data pool, among other helpful information.
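The kind of distribution such a dashboard summarizes can be sketched in a few lines. The example below computes the share of each object class across a set of annotated images. It is only an illustration of the statistic, not of Curate’s dashboards; the `class_distribution` helper and the toy labels are assumptions.

```python
from collections import Counter


def class_distribution(annotations):
    """Return each object class's share of all annotated objects."""
    counts = Counter(label for labels in annotations for label in labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders classes from most to least frequent.
    return {label: count / total for label, count in counts.most_common()}


# Toy data (assumed): the object classes annotated in each of three images.
annotations = [["car", "person"], ["car"], ["car", "bicycle"]]
distribution = class_distribution(annotations)
```

A heavily skewed distribution is an early hint that a training set may under-represent some classes.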

Stay tuned for many how-to guides on all of these new features!

