We’re excited to announce that Curate, formerly known as DataOps, is now available!
Curate is Superb AI’s answer to the data-related questions we’ve all faced at one time or another. What data should I label first? Which data should I use for training vs. validation? How much data do I actually need? And so on.
With Curate, you can easily manage, curate, and visualize all your organization’s computer vision data in a single place. And use AI to answer all of the above questions with minimal manual effort, such as auto-curating a balanced slice for training that best represents your entire dataset or uncovering your most valuable data in the form of edge cases or potential mislabels.
At Superb AI, we believe the future of computer vision and machine learning is one where every organization, regardless of ML team size or resources, can build and deploy AI applications, and this product is the next step in our goal of making this a reality.
In this article, we’ll walk you through all the tools and features you can use today to achieve better model performance with much less data. Want to see it for yourself right away? Let our team know, and we’d be happy to give you a personalized demo.
Let’s get started with data management. With Curate, uploading and pipelining large volumes of data into one place as soon as it’s collected is easy. With this initial release, you can upload raw data and labels with associated annotations and metadata using our SDK. Soon, we’ll be working on many other upload mechanisms, including API and CLI, among others, and various forms of pipeline automation.
One thing that makes Curate unique is that it provides you access to proprietary, high-dimensional embedding generation algorithms. This eliminates the need to build, train, and maintain custom embedding models and infrastructure, which can carry a substantial price tag in data, computational resources, and in-house expertise. It also reduces, or can even eliminate, the need to rely on otherwise slow and cumbersome manual curation techniques.
How it works is simple. Whenever new images or objects are uploaded, high-quality embeddings are generated automatically, and Curate uses unsupervised learning to cluster the image or object data based on visual similarity. Our curation algorithm, which we’ll cover shortly, then uses this to select the data most suitable for your model needs automatically, such as a training or validating set.
Query and Slice
After uploading your data via the SDK, you can easily find and meaningfully group data into slices using any combination of metadata or annotation information tagged within your images. A slice, essentially a data subset, is a foundational concept in Curate, and they can be created manually, as above, or automatically via our AI tools. Using queries to create slices allows you to manually curate data as you see fit - without having to rely on old-school (and painful) search operators like file names.
Using queries combined with image-level views, with tile and scatter views coming soon at the object level, makes it easy to quickly find the exact data you need. Semantic search, also in development, will likewise reduce the time and cognitive effort required even further.
Plus, all your slices are saved within the platform and easy to find, so you can use them immediately or refer back to them whenever needed.
Too often, as machine learning engineers or project managers, we rely heavily on good old-fashioned intuition and brute force when curating data for computer vision. While there’s no doubt this can work from time to time to solve small problems, the risk of introducing issues like bias, subjectivity, and overfitting, among others, grows as the problem or task we are trying to solve grows. And, to be frank, it’s not all that scalable.
That’s where Auto-Curate comes in. Auto-Curate is an AI-based tool that uses high-quality curation algorithms to curate the following for you at scale:
Curating relevant data from visualized clusters was something we always wanted to develop internally as we deal with a ton of visually similar raw video frames. Superb AI’s new Auto Curation feature will help us effectively curate a balanced dataset while allowing us to move away from time or random-based sampling.
Want to know more about how our curation algorithms work and what they could do for you? We’ve been hard at work recently putting our AI to the test on popular datasets, and we’re pretty excited with the results. Here’s what we’ve published so far:
Scatter Visualization and Analytics
Stay tuned for many how-to guides on all of these new features!
Enjoyed this? Subscribe to our mailing list to hear more from us!
Learn More About Curate
As the scope and scale of your data and feature sets grow, the ability to build and manage performant models without increasing labeling time, effort, or spending is vital to maintaining ROI. That’s where Curate can help. Talk to our sales team today for a personalized look at how much easier intelligent curation can be!