Developing High-Quality Computer Vision Applications with Superb AI Suite and NVIDIA TAO Toolkit

Superb AI

2023/2/22 · 5 min
Image of Superb AI computer vision tools in action tagging images.

Data labeling is essential for all forms of supervised learning (where an entire dataset is fully labeled) and a key ingredient of semi-supervised learning (which combines a smaller set of labeled data with algorithms designed to automate the labeling of the rest of the dataset programmatically). Labeling is essential to computer vision, one of the most advanced and developed areas of machine learning.

Model training is another major bottleneck in machine learning, alongside labeling. Whereas labeling is slow because it requires scaling a distributed human labor team, training is slow because it involves waiting for machines to finish complex calculations. It requires teams to know about networking, distributed systems, storage, specialized processors (GPUs or TPUs), and cloud management systems (Kubernetes and Docker).

In the 2022 report by the AI Infrastructure Alliance, labeling and training are consistently ranked as the most significant challenges teams face when building their AI/ML infrastructure. Both are essential steps in the ML application development process, and if not done correctly, they can lead to inaccurate results and decreased performance.

Introducing Superb AI Suite and NVIDIA TAO

Superb AI has introduced a revolutionary way for computer vision teams to drastically decrease the time it takes to deliver high-quality training datasets. Instead of relying on human labelers for a majority of the data preparation workflow, teams can now implement a much more time- and cost-efficient pipeline with the Superb AI Suite.

NVIDIA TAO Toolkit, built on TensorFlow and PyTorch, is a low-code version of the NVIDIA TAO framework that accelerates model development by abstracting away framework complexity. The TAO Toolkit lets you use transfer learning to fine-tune NVIDIA pre-trained models with your own data and optimize them for inference.

Computer vision engineers can use the Superb AI Suite and the NVIDIA TAO Toolkit in combination to address the challenges of data labeling and model training. More specifically, they can quickly generate labeled data in Suite and train models with TAO to perform specific computer vision tasks, whether classification, detection, or segmentation.

A Step-By-Step Guide to Using Superb AI Suite and NVIDIA TAO

In this tutorial, we’ll demonstrate how to use Superb AI Suite to prepare a high-quality computer vision dataset that’s compatible with NVIDIA TAO Toolkit. We’ll walk through downloading the COCO dataset, creating a new project on Suite, uploading our data to the project via the Suite SDK, using Superb AI’s Auto-Label capability to quickly label our dataset, exporting the labeled dataset, and setting up a TAO Toolkit configuration to use our data.

1 - Get Started with Suite SDK

First, head over to superb-ai.com to make an account. After creating your account, you can follow this quick-start guide to install and authenticate Suite CLI. You should be able to install the latest version of spb-cli and retrieve the Suite Account Name / Access Key for authentication.

2 - Download The COCO Dataset

For this tutorial, we’ll work with the COCO dataset, a large-scale object detection, segmentation, and captioning dataset that is very popular in the computer vision research community.

You can use this code snippet to download the dataset. Save it in a file called download-coco.sh and run bash download-coco.sh from the terminal. This will create a data/ directory that stores the COCO dataset.
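For readers who prefer Python over a shell script, the download step can be sketched roughly as follows. This is not the linked download-coco.sh script: the helper names `archive_path` and `download_and_extract` are hypothetical, and the sketch assumes that only the 2017 validation images and the 2017 train/val annotations from the public COCO site are needed.

```python
import urllib.request
import zipfile
from pathlib import Path

# Public COCO 2017 archives: validation images and train/val annotations.
# Assumption: this tutorial only needs these two files.
COCO_URLS = [
    "http://images.cocodataset.org/zips/val2017.zip",
    "http://images.cocodataset.org/annotations/annotations_trainval2017.zip",
]


def archive_path(url: str, data_dir: str = "data") -> Path:
    """Return the local path where the archive behind `url` is stored."""
    return Path(data_dir) / url.rsplit("/", 1)[-1]


def download_and_extract(url: str, data_dir: str = "data") -> Path:
    """Download one archive into `data_dir` (if missing) and unzip it there."""
    target = archive_path(url, data_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        urllib.request.urlretrieve(url, target)
    with zipfile.ZipFile(target) as archive:
        archive.extractall(target.parent)
    return target
```

Calling `download_and_extract(url)` for each entry in `COCO_URLS` would populate a data/ directory; the archives are large, so expect the download to take some time.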

The next step is to convert COCO to Suite SDK format. We’ll sample the 5 most frequent classes in the COCO validation 2017 dataset. For this tutorial, we’ll handle bounding box annotations only, but Suite can also handle polygons and key points.

You can use this code snippet to perform the conversion. Save it in a file called convert.py and run python convert.py from the terminal. This will create an upload-info.json file that stores information about the image name and annotations.
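As a rough illustration of what such a conversion involves, the sketch below picks the most frequent classes from a COCO annotation dictionary and groups their bounding boxes by image file name. It is not the linked convert.py: the function names and the output layout are hypothetical and do not claim to match the actual upload-info.json schema.

```python
from collections import Counter
from typing import Dict, List


def most_frequent_classes(coco: dict, k: int = 5) -> List[str]:
    """Return the names of the k categories with the most annotations."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(a["category_id"] for a in coco["annotations"])
    return [id_to_name[cid] for cid, _ in counts.most_common(k)]


def boxes_per_image(coco: dict, class_names: List[str]) -> Dict[str, list]:
    """Group bounding boxes of the selected classes by image file name."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    id_to_file = {img["id"]: img["file_name"] for img in coco["images"]}
    keep = set(class_names)
    result: Dict[str, list] = {}
    for ann in coco["annotations"]:
        name = id_to_name[ann["category_id"]]
        if name not in keep:
            continue
        x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        result.setdefault(id_to_file[ann["image_id"]], []).append(
            # Hypothetical output layout, chosen for illustration only
            {"class_name": name, "coord": {"x": x, "y": y, "width": w, "height": h}}
        )
    return result
```

To try it, load the validation annotation file with `json.load` (assumed here to be data/annotations/instances_val2017.json after unzipping) and pass the resulting dictionary to both functions.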

3 - Create A Project in Suite

Creating projects via the SDK is a work in progress, so in this tutorial we’ll create a project on the web by following Superb AI’s official guide for project creation. We’ll use this configuration:

  1. Choose the “Image” data type.

  2. Set the “Project Name” as “CocoTest.”

  3. Select the “Annotation Type” as “Bounding Box.”

  4. Create 5 object classes that match the COCO class names: ['person', 'car', 'chair', 'book', 'bottle'].

By the end of this process, we can view the main project page as seen below:

4 - Upload Data via the SDK

After creating the project, we can start uploading the data. You can use this code snippet to accomplish this task. Save it in a file called upload.py and run python upload.py --project CocoTest --dataset coco-dataset in the terminal, where CocoTest is the project name and coco-dataset is the dataset name. This starts the upload, which takes a while depending on the processing power of the device. For example, it took us close to 2.5 hours to finish uploading the dataset.

Snapshot of code snippet to load COCO data into Superb AI Suite.

We can check the uploaded dataset on the Suite web page in real time.

Screenshot of Superb AI Suite dataset upload.

5 - Use Auto-Label to Label The Dataset

The next step is to label the COCO dataset. To do so quickly, we’ll utilize Suite’s powerful automated labeling capabilities. More specifically, Auto-Label and Custom Auto-Label are both powerful tools that can boost labeling efficiency by automatically detecting objects and labeling them. Auto-Label is a pre-trained model developed by Superb AI that detects and labels 100+ common objects, whereas Custom Auto-Label is a model trained using your own data that detects and labels niche objects.


Since the COCO data in this tutorial consists of 5 common objects that Auto-Label can label, we’ll use Auto-Label. Follow this official guide to configure Auto-Label. The important thing to remember is to choose “MSCOCO Box CAL” as the Auto-Label AI and to map each object name to its respective applied object.

Screenshot of Superb AI Suite Auto-Label review.

After Auto-Label finishes running (it took about an hour to process all 3,283 labels in the uploaded COCO dataset), it reports how “difficult” each automated labeling task was (red is difficult, yellow is moderate, and green is easy). The higher the difficulty, the more likely it is that Auto-Label labeled that image incorrectly. This “difficulty,” or estimated uncertainty, is calculated from factors such as small object size, poor lighting conditions, and scene complexity. In a real-world situation, you can sort and filter labels by difficulty to prioritize reviewing the labels most likely to contain errors.
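Suite lets you sort and filter by difficulty in the web interface. If you would rather do this offline on exported label files, a minimal sketch could look like the following. It relies on the "difficulty" and "uncertainty" fields that appear in the exported label JSON shown later in this tutorial, and it assumes that a larger difficulty value means a harder label; the function names are hypothetical.

```python
from typing import List


def label_uncertainty(label: dict) -> float:
    """Highest per-object uncertainty in one exported label (0.0 if empty)."""
    values = [
        obj.get("annotation", {}).get("uncertainty", 0.0)
        for obj in label.get("objects", [])
    ]
    return max(values, default=0.0)


def review_order(labels: List[dict]) -> List[dict]:
    """Sort labels so the most difficult, most uncertain ones come first.

    Assumption: a larger "difficulty" value means a harder label.
    """
    return sorted(
        labels,
        key=lambda lb: (lb.get("difficulty", 0), label_uncertainty(lb)),
        reverse=True,
    )
```

Reviewing labels in this order puts human attention on the items Auto-Label was least sure about first.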

6 - Export the Labeled Dataset from Suite

After obtaining the labeled dataset, you want to export and download the labels. There is more to a label than just the annotation information. In order to fully utilize a label for training ML models, you must know additional information, such as the project configuration and meta-information about the raw data. In order to download all this information along with the annotation files, you must first request an export so that the Suite system can create a zip file for download. Follow this official guide to export and download labels from the Suite.

Example of COCO test labels in Superb AI Suite.

When you export labels, a compressed zip file will be created for you to download. The export result folder will contain general information regarding the project as a whole, annotation information for each label, and the metadata for each data asset. More details are on this page.
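A quick way to sanity-check a downloaded export is to open the zip and count what it contains. The sketch below assumes the layout used by the SuiteDataset code later in this tutorial: label files under a labels/ prefix and a project.json whose object classes sit under object_detection.object_classes. The function name `summarize_export` is hypothetical.

```python
import json
from io import BytesIO
from zipfile import ZipFile


def summarize_export(zip_bytes: bytes) -> dict:
    """Count label files and read class names from an export archive."""
    with ZipFile(BytesIO(zip_bytes), "r") as export:
        label_files = [
            name for name in export.namelist()
            if name.startswith("labels/") and not name.endswith("/")
        ]
        project = json.loads(export.read("project.json"))
    # Assumed location of the class list inside project.json
    classes = project.get("object_detection", {}).get("object_classes", [])
    return {
        "num_labels": len(label_files),
        "class_names": [c["name"] for c in classes],
    }
```

You can feed it the raw bytes of the downloaded file, for example the result of reading the zip in binary mode, and compare the counts with what the Suite web page reports.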

7 - Convert The Output To COCO Format

From here, you need a script to convert your labeled data to a format that the TAO Toolkit can consume, such as the COCO format. Since we are working with the COCO dataset in this tutorial, the exported labels already carry the class names and box coordinates (x, y, width, height) that a COCO annotation needs. For instance, here is the JSON file of a randomly chosen exported label:
{
  "objects": [
      {
          "id": "7e9fe8ee-50c7-4d4f-9e2c-145d894a8a26",
          "class_id": "7b8205ef-b251-450c-b628-e6b9cac1a457",
          "class_name": "person",
          "annotation_type": "box",
          "annotation": {
              "multiple": false,
              "coord": {
                  "x": 275.47,
                  "y": 49.27,
                  "width": 86.39999999999998,
                  "height": 102.25
              },
              "meta": {},
              "difficulty": 0,
              "uncertainty": 0.0045
          },
          "properties": []
      },
      {
          "id": "70257635-801f-4cad-856a-ef0fdbfdf613",
          "class_id": "7b8205ef-b251-450c-b628-e6b9cac1a457",
          "class_name": "person",
          "annotation_type": "box",
          "annotation": {
              "multiple": false,
              "coord": {
                  "x": 155.64,
                  "y": 40.61,
                  "width": 98.34,
                  "height": 113.05
              },
              "meta": {},
              "difficulty": 0,
              "uncertainty": 0.0127
          },
          "properties": []
      }
  ],
  "categories": {
      "properties": []
  },
  "difficulty": 1
}
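To make the mapping concrete, the sketch below turns the box objects of one exported label (in the shape shown above) into COCO-style annotation dictionaries. It assumes that the exported x and y are the top-left corner of the box, as in COCO, and that you supply your own mapping from class names to integer category ids; the function name and its parameters are hypothetical.

```python
from typing import Dict, List


def suite_label_to_coco(label: dict, image_id: int,
                        category_ids: Dict[str, int],
                        first_ann_id: int = 1) -> List[dict]:
    """Turn the box objects of one exported label into COCO annotations."""
    annotations = []
    for obj in label.get("objects", []):
        if obj.get("annotation_type") != "box":
            continue  # this sketch handles bounding boxes only
        c = obj["annotation"]["coord"]
        annotations.append({
            "id": first_ann_id + len(annotations),
            "image_id": image_id,
            "category_id": category_ids[obj["class_name"]],
            # Assumption: (x, y) is the top-left corner, as COCO expects
            "bbox": [c["x"], c["y"], c["width"], c["height"]],
            "area": c["width"] * c["height"],
            "iscrowd": 0,
        })
    return annotations
```

A complete COCO file also needs top-level "images" and "categories" lists, which you would build from the export metadata and the project configuration.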

8 - Prepare The Labeled Data for Model Training 

You can pull the labeled COCO data from Suite into model development by using SuiteDataset, which makes a dataset exported from Suite accessible through PyTorch’s data pipeline. The code snippet below (also accessible at this URL) defines the SuiteDataset class for your training set.

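# Imports needed by this snippet. call_with_retry, build_transforms,
# load_label, and load_image are helper functions from the linked source
# file and are not reproduced here.
import json
import os
from io import BytesIO
from typing import Callable, List, Optional
from urllib.request import urlopen
from zipfile import ZipFile

import numpy as np
import spb.sdk
from torch.utils.data import Dataset

class SuiteDataset(Dataset):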
  """
  Instantiate the SuiteDataset object class for training set
  """
  def __init__(
          self,
          team_name: str,
          access_key: str,
          project_name: str,
          export_name: str,
          train: bool,
          caching_image: bool = True,
          transforms: Optional[List[Callable]] = None,
          category_names: Optional[List[str]] = None,
  ):
      """Function to initialize the object class"""
      super().__init__()

      # Get project setting and export information through the SDK
      # Initialize the Python Client
      client = spb.sdk.Client(team_name=team_name, access_key=access_key, project_name=project_name)
      # Use get_export
      export_info = call_with_retry(client.get_export, name=export_name)
      # Download the export compressed file through download_url in Export
      export_data = call_with_retry(urlopen, export_info.download_url).read()

      # Load the export compressed file into memory
      with ZipFile(BytesIO(export_data), 'r') as export:
          label_files = [f for f in export.namelist() if f.startswith('labels/')]
          label_interface = json.loads(export.open('project.json', 'r').read())
          category_infos = label_interface.get('object_detection', {}).get('object_classes', [])

      cache_dir = None
      if caching_image:
          cache_dir = f'/tmp/{team_name}/{project_name}'
          os.makedirs(cache_dir, exist_ok=True)

      self.client = client
      self.export_data = export_data
      self.categories = [
          {'id': i + 1, 'name': cat['name'], 'type': cat['annotation_type']}
          for i, cat in enumerate(category_infos)
      ]
      self.category_id_map = {cat['id']: i + 1 for i, cat in enumerate(category_infos)}
      self.transforms = build_transforms(train, self.categories, transforms, category_names)
      self.cache_dir = cache_dir

      # Store label_files as a NumPy byte-string array
      # (np.bytes_ replaces the np.string_ alias removed in NumPy 2.0)
      self.label_files = np.array(label_files).astype(np.bytes_)

  def __len__(self):
      """Function to return the number of label files"""
      return len(self.label_files)

  def __getitem__(self, idx):
      """Function to get an item"""
      idx = idx if idx >= 0 else len(self) + idx
      if idx < 0 or idx >= len(self):
          raise IndexError('index out of range')

      image_id = idx + 1
      label_file = self.label_files[idx].decode('ascii')

      # Load label information corresponding to idx from the export compressed file into memory
      with ZipFile(BytesIO(self.export_data), 'r') as export:
          label = load_label(export, label_file, self.category_id_map, image_id)

      # Download the image through the Suite SDK based on label_id
      # (the data is downloaded in real time using get_data from the Suite SDK)
      try:
          image = load_image(self.client, label['label_id'], self.cache_dir)
      except Exception as e:
          # Fall back to the next sample; this raises IndexError on the last index
          print(f'Failed to load the {idx}-th image due to {repr(e)}, getting {idx + 1}-th data instead')
          return self.__getitem__(idx + 1)

      target = {
          'image_id': image_id,
          'label_id': label['label_id'],
          'annotations': label['annotations'],
      }

      if self.transforms is not None:
          image, target = self.transforms(image, target)
      return image, target

Next, handle the test set in a similar fashion. The code snippet below (also accessible at this URL) defines the SuiteCocoDataset class for the test set by wrapping SuiteDataset to make it compatible with torchvision’s official COCOEvaluator.

class SuiteCocoDataset(C.CocoDetection):
  """
  Instantiate the SuiteCocoDataset object class for test set
  (by wrapping SuiteDataset to make compatible with torchvision's official COCOEvaluator)
  """
  def __init__(
          self,
          team_name: str,
          access_key: str,
          project_name: str,
          export_name: str,
          train: bool,
          caching_image: bool = True,
          transforms: Optional[List[Callable]] = None,
          category_names: Optional[List[str]] = None,
          num_init_workers: int = 20,
  ):
      """Function to initialize the object class"""
      super().__init__(img_folder='', ann_file=None, transforms=None)

      # Call the SuiteDataset class
      dataset = SuiteDataset(
          team_name, access_key, project_name, export_name,
          train=False, transforms=[],
          caching_image=caching_image, category_names=category_names,
      )
      self.client = dataset.client
      self.cache_dir = dataset.cache_dir

      self.coco = build_coco_dataset(dataset, num_init_workers)
      self.ids = list(sorted(self.coco.imgs.keys()))
      self._transforms = build_transforms(train, dataset.categories, transforms, category_names)

  def _load_image(self, id: int):
      """Function to load an image"""
      label_id = self.coco.loadImgs(id)[0]['label_id']
      image = load_image(self.client, label_id, self.cache_dir)
      return image

  def __getitem__(self, idx):
      """Function to get an item"""
      try:
          return super().__getitem__(idx)
      except Exception as e:
          print(f'Failed to load the {idx}-th image due to {repr(e)}, getting {idx + 1}-th data instead')
          return self.__getitem__(idx + 1)

SuiteDataset and SuiteCocoDataset can then be used in your training code. The code snippet below (also accessible at this URL) illustrates how to use them. During model development, you train with train_loader and evaluate with test_loader.

train_dataset = SuiteDataset(
  team_name=args.team_name,
  access_key=args.access_key,
  project_name=args.project_name,
  export_name=args.train_export_name,
  caching_image=args.caching_image,
  train=True,
)
test_dataset = SuiteCocoDataset(
  team_name=args.team_name,
  access_key=args.access_key,
  project_name=args.project_name,
  export_name=args.test_export_name,
  caching_image=args.caching_image,
  train=False,
  num_init_workers=args.workers,
)

train_loader = DataLoader(
  train_dataset, num_workers=args.workers,
  batch_sampler=G.GroupedBatchSampler(
      RandomSampler(train_dataset),
      G.create_aspect_ratio_groups(train_dataset, k=3),
      args.batch_size,
  ),
  collate_fn=collate_fn,
)
test_loader = DataLoader(
  test_dataset, num_workers=args.workers,
  sampler=SequentialSampler(test_dataset), batch_size=1,
  collate_fn=collate_fn,
)
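The loaders above pass a collate_fn that is not shown in this tutorial. Detection samples usually differ in image size and in the number of boxes, so they cannot be stacked into one tensor; a common approach, sketched here as an assumption about what the linked source does, is to regroup the batch into a tuple of images and a tuple of targets.

```python
from typing import List, Tuple


def collate_fn(batch: List[Tuple[object, dict]]) -> Tuple[tuple, tuple]:
    """Regroup [(image, target), ...] into ((images...), (targets...)).

    Keeping images and targets in tuples avoids stacking tensors of
    different shapes, which the default collate function would attempt.
    """
    return tuple(zip(*batch))
```

With this function, each iteration over a loader yields `images, targets`, where both are tuples whose length equals the batch size.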

9 - Train Your Model with NVIDIA TAO

Your data annotated with Suite can now be used to train your object detection model. The TAO Toolkit allows you to train, fine-tune, prune, and export highly optimized and accurate computer vision models for deployment by adapting popular network architectures and backbones to your data. For this tutorial, you can choose YOLOv4, an object detection model included in TAO.

First, download the notebook samples.

$ wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/tlt_cv_samples/versions/v1.0.2/zip -O tlt_cv_samples_v1.0.2.zip

$ unzip -u tlt_cv_samples_v1.0.2.zip  -d ./tlt_cv_samples_v1.0.2 && rm -rf tlt_cv_samples_v1.0.2.zip && cd ./tlt_cv_samples_v1.0.2

After the notebook samples are downloaded, you can start the notebook server using the following command:

$ jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root

Open a web browser on the local machine and go to the following URL: http://0.0.0.0:8888

Because you are creating a YOLOv4 model, open the yolo_v4/yolo_v4.ipynb notebook. Follow the notebook instructions to train the model.

Based on the results, fine-tune the model until it achieves your metric goals. If desired, you can create your own active learning loop at this stage. In a real-world scenario, you’ll want to query samples of failed predictions, assign human labelers to annotate this new batch of sample data, and supplement your model with newly labeled training data. Superb AI Suite can further assist you with data collection and annotation in subsequent rounds of model development as you iteratively improve your model performance.
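One simple way to decide which samples to send back for labeling is to flag images where the model missed a ground-truth box. The sketch below is only one possible selection rule, not a feature of Suite or TAO: it uses boxes in [x, y, width, height] form, ignores class labels and false positives, and the function names and the 0.5 threshold are assumptions.

```python
from typing import Dict, List, Sequence

Box = Sequence[float]  # [x, y, width, height]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two [x, y, width, height] boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def images_to_relabel(truth: Dict[str, List[Box]],
                      preds: Dict[str, List[Box]],
                      threshold: float = 0.5) -> List[str]:
    """Return ids of images where some ground-truth box has no match."""
    failed = []
    for image_id, gt_boxes in truth.items():
        pred_boxes = preds.get(image_id, [])
        # A ground-truth box is missed if no prediction overlaps it enough
        missed = any(
            all(iou(gt, p) < threshold for p in pred_boxes) for gt in gt_boxes
        )
        if missed:
            failed.append(image_id)
    return failed
```

The returned image ids form the batch you would route to human labelers before the next training round.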

NVIDIA recently released version 4.0 of TAO, which makes it even easier to get started and create high-accuracy models without needing any AI expertise. In this release, you can automatically tune your hyperparameters with AutoML, experience turnkey deployment of TAO into various cloud services, integrate TAO with third-party MLOps services, and explore new transformer-based vision models (CitySemSegformer, PeopleNet Transformer). We recommend experimenting with TAO on this Google Colab notebook.

Conclusion

Data labeling in computer vision can present many unique challenges. It can be difficult and expensive due to the amount of data that needs to be labeled. Additionally, it can be subjective since different people may interpret and label an image or video frame differently, making it hard to achieve consistently high-quality labeled outputs across a large dataset.

Model training can be challenging as well. There is a plethora of algorithms and hyperparameters that require tuning and optimization. This process requires a deep understanding of both the data and the model, plus significant experimentation, to achieve the best results. Additionally, computer vision models tend to require a lot of computing power to train, making it difficult to do so on a limited budget and timeline.

Superb AI Suite lets you collect and label high-quality computer vision datasets. NVIDIA TAO Toolkit lets you train and optimize pre-trained computer vision models. Their combination can speed up computer vision application development by as much as a hundredfold without sacrificing quality.

About Superb AI

Superb AI provides a training data platform that makes building, managing, and curating computer vision datasets faster and easier than ever before. Specializing in adaptable automation models for labeling and quality assurance, our solutions help companies drastically reduce the time and cost of building data pipelines for computer vision models. Launched in 2018 by researchers and engineers with decades of experience in computer vision and deep learning (including 25+ publications, 7,300+ citations, and 100+ patents), our vision is to empower companies at all stages to develop computer vision applications faster than ever before.

Superb AI is also a proud partner with NVIDIA through the NVIDIA Inception Program for Startups. This program helps nurture the development of the world’s cutting-edge startups, providing them with access to NVIDIA’s technologies and experts, opportunities to connect with venture capitalists, and co-marketing support to heighten their visibility.