Best Practices for Annotating Images in Computer Vision Datasets

Hanan Othman

Content Writer | 2022/9/27 | 7 min read

Computer vision datasets are some of the most valuable resources available to researchers and developers today. These datasets feature a variety of images and labels that enable machine learning engineers to train models for advanced functions such as detecting objects, faces, or other visual features within their deployed environment. However, with many of these datasets, it can be difficult to understand what needs to be specifically annotated and to properly do so.

For that reason, to get the most out of CV data, it's important to make the annotation process as thorough as possible. This is a common challenge in the ML industry, because images vary in complexity at both the batch and the individual dataset level: some because of their appearance or content, others because of their subject matter.

To simplify image annotation efforts, several specific strategies can be employed to identify the most crucial regions or details within images that are relevant to their target industry and development goals. Read on to learn about a number of essential best practices for how to annotate images in computer vision datasets.

The Different Types of Visual Data

An enormous amount of data is generated every minute of every day, on a global scale. That data comes in many forms, the most prevalent being speech, video, text, and images. Often, these forms are used together, intermingled to give AI applications the ability to interpret a complex and fast-paced world once deployed.

Of the forms cited above, the most notable is image data. As most data scientists know, raw images need a considerable amount of sorting, review, and refinement before they can be considered usable in any AI/ML application. Using a computer vision technique such as image classification is only the starting point of the broader best-practice effort to annotate imagery or any other type of data that an ML project might need.

Depending on the visual data type, other techniques, such as localization, segmentation, and object detection, can be used alongside one another, each fulfilling a distinct purpose at a different stage of the data processing pipeline. With multiple methods of annotation available, there's also more leeway to experiment and more opportunity to explore new ways of prepping data from one use case to another. Different models require different types of data, and that data may need to be more or less diverse and specialized to provide the most reliable and consistent results.

Models that are stable and make accurate predictions are simply easier to produce if the data is tailored to their training needs. Achieving that degree of customization in the datasets fed into them requires meticulous work on the annotation and data labeling side. As with any tool, CV annotation features are at their most effective only when they're used as instructed and intended.

Best Practices for Image Annotation

Any CV project has the primary aim of developing a DL model that can precisely detect real-world objects in its environment. Those objects are read or processed as input data in the form of images or videos. A variety of models, from those powering satellites to those that allow self-driving vehicles to navigate the streets safely, depend on image segmentation techniques.

These techniques were developed and put into practice as effective methods for helping models interpret the images they're trained on. That makes it crucial that the methods offer a well-rounded representation of the data most relevant to each individual model. To achieve this, image segmentation can be split into three distinct approaches: semantic, instance, and panoptic.


Semantic segmentation assigns every pixel in an image to a class, such as road, car, or person, which lets annotators mark the boundaries between different kinds of objects. It doesn't go beyond that function: objects of the same class aren't separated from one another. This method is best for top-level analysis of an image and its contents; for anything that requires more granular detail, one of the other segmentation methods is recommended.


Instance segmentation identifies and labels objects, much like semantic segmentation, but it can also distinguish individual objects within the same class. That is useful for certain industries and the CV-driven applications that serve them, like autonomous vehicles or medical imaging, where a model should be trained to pick up on the smallest objects.


Panoptic segmentation can be considered the most complex segmentation method, as it unifies the functions that the other two offer separately and goes a step further in labeling capability. It assigns a label to each pixel in an image according to semantic class while also differentiating instances. Because each pixel receives a single label tied to one instance, annotations don't overlap.
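The difference between the three approaches is easiest to see on a toy example. The sketch below is illustrative only: the scene, the class ids, and the `class_id * 1000 + instance_id` encoding are assumptions chosen for readability, not a format prescribed by any particular tool.

```python
# Hypothetical toy scene: a 4x6 image with two "car" instances on a "road" background.
ROAD, CAR = 0, 1
HEIGHT, WIDTH = 4, 6

# Each instance is (class_id, instance_id, set of (row, col) pixels).
instances = [
    (CAR, 1, {(1, 0), (1, 1), (2, 0), (2, 1)}),
    (CAR, 2, {(1, 4), (1, 5), (2, 4), (2, 5)}),
]


def semantic_map(instances, height, width, background=ROAD):
    """One class id per pixel; both cars collapse into the same label."""
    grid = [[background] * width for _ in range(height)]
    for class_id, _, pixels in instances:
        for r, c in pixels:
            grid[r][c] = class_id
    return grid


def instance_map(instances, height, width):
    """One instance id per pixel for countable objects; 0 means 'no instance'."""
    grid = [[0] * width for _ in range(height)]
    for _, instance_id, pixels in instances:
        for r, c in pixels:
            grid[r][c] = instance_id
    return grid


def panoptic_map(instances, height, width, background=ROAD):
    """Every pixel carries both its class and its instance in a single value."""
    grid = [[background * 1000] * width for _ in range(height)]
    for class_id, instance_id, pixels in instances:
        for r, c in pixels:
            grid[r][c] = class_id * 1000 + instance_id
    return grid


sem = semantic_map(instances, HEIGHT, WIDTH)
ins = instance_map(instances, HEIGHT, WIDTH)
pan = panoptic_map(instances, HEIGHT, WIDTH)

print(sem[1])  # [1, 1, 0, 0, 1, 1]
print(ins[1])  # [1, 1, 0, 0, 2, 2]
print(pan[1])  # [1001, 1001, 0, 0, 1002, 1002]
```

In the semantic row the two cars are indistinguishable, in the instance row the background carries no class, and in the panoptic row every pixel has exactly one value that encodes both.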

Choosing a Segmentation Method

When choosing between the different types of image segmentation methods, it all comes down to the parameters of different projects and the level of detail required. According to the model development criteria, each segmentation type has a specific use that can be implemented to match and satisfy those requirements.

Semantic segmentation is best for basic datasets, the generic bulk of information that serves as the foundation for training a model. Instance segmentation goes a bit deeper into the particulars of dataset content. Panoptic segmentation goes into even greater detail, giving a comprehensive annotation of what each image contains in its entirety, from detected objects to backdrops or scenes.

Best Practices for Bounding Box Annotation

When thinking about manually labeling objects in image datasets, the first thing that probably comes to mind is bounding boxes. It's easy to see why they're popular, considering how simple and quick they are to use. However, they're also restrictive because of their natural limitation of forming either a rectangular or square shape to label objects.

Since models need to be trained on real-life situations and scenarios, bounding box annotation can only go so far in properly preparing them to interpret the objects in those scenarios, as irregular objects and shapes are more common in the real world than geometric ones.

Tight Annotation

Being as precise and tight as possible when drawing bounding boxes is critical for training accuracy. That means the edges of the box should be as close as possible to the targeted object. Imprecise or inconsistent bounding boxes introduce errors into the ground truth data, which in turn cause gaps in the model's training and predictions later on.
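One way to make "tight" concrete is to compare a drawn box against the smallest box that encloses the object's pixels. The following is a minimal sketch that assumes a binary object mask is available (for example, from a prior segmentation pass); the function names and the sample mask are hypothetical.

```python
def tight_box(mask):
    """Return (x_min, y_min, x_max, y_max), inclusive, enclosing all nonzero pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        raise ValueError("mask contains no object pixels")
    return (min(cols), min(rows), max(cols), max(rows))


def box_area(box):
    """Area in pixels of a box given in inclusive pixel coordinates."""
    x_min, y_min, x_max, y_max = box
    return (x_max - x_min + 1) * (y_max - y_min + 1)


# Hypothetical 5x6 mask: 1 marks object pixels, 0 marks background.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

tight = tight_box(mask)
loose = (0, 0, 5, 4)  # a carelessly drawn box covering the whole image

coverage = box_area(tight) / box_area(loose)
print(tight)     # (1, 1, 4, 3)
print(coverage)  # 0.4
```

Here the tight box occupies only 40% of the loose one, so the loose annotation would teach the model that a large band of background belongs to the object.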

Angling or Rotation Capability

To suit the variety of objects that annotators come across in image data, bounding box labeling tools should accommodate real-world cases in which a box needs to be rotated or angled to fit more closely around the item being annotated.
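To illustrate why rotation matters, the sketch below computes the corners of a rotated box and compares its area with the axis-aligned box needed to cover the same object. The `(cx, cy, width, height, angle)` parameterization is an assumption made for this example; annotation tools differ in how they store rotated boxes.

```python
import math


def rotated_box_corners(cx, cy, width, height, angle_deg):
    """Corners of a box centered at (cx, cy), rotated counterclockwise by angle_deg."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_w, half_h = width / 2, height / 2
    corners = []
    for dx, dy in [(-half_w, -half_h), (half_w, -half_h), (half_w, half_h), (-half_w, half_h)]:
        # Standard 2D rotation of the corner offset, then translation to the center.
        corners.append((cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t))
    return corners


def enclosing_axis_aligned_box(corners):
    """Smallest unrotated box (x_min, y_min, x_max, y_max) covering all corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical long, thin object (10 x 2) lying at 45 degrees.
corners = rotated_box_corners(0, 0, 10, 2, 45)
x_min, y_min, x_max, y_max = enclosing_axis_aligned_box(corners)

rotated_area = 10 * 2
axis_aligned_area = (x_max - x_min) * (y_max - y_min)
print(rotated_area)                 # 20
print(round(axis_aligned_area, 3))  # 72.0
```

For this diagonal object, an unrotated box would be more than three times larger than the rotated one, with most of that extra area being background.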

Reduce Overlap

Sometimes the way an image is laid out gives annotators no choice but to overlap annotations, especially in crowded or cluttered imagery. Excessive annotation overlap is known to make it harder for models to distinguish different elements. In that case, when using bounding boxes, it's worthwhile to consider a secondary annotation method such as polygons to help with differentiation and reduce instances of overlap.
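A simple quality check for overlap is to compute the intersection over union (IoU) of every pair of boxes in an image and flag the pairs that exceed a threshold. This is a sketch only: the box names, coordinates, and the 0.3 threshold are assumptions, and a suitable threshold depends on the dataset.

```python
from itertools import combinations


def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def flag_overlaps(boxes, threshold=0.3):
    """Return (name_a, name_b, iou) for every pair whose IoU exceeds the threshold."""
    flagged = []
    for (name_a, box_a), (name_b, box_b) in combinations(boxes.items(), 2):
        score = iou(box_a, box_b)
        if score > threshold:
            flagged.append((name_a, name_b, round(score, 3)))
    return flagged


# Hypothetical annotations for one crowded image.
boxes = {
    "person_1": (0, 0, 10, 10),
    "person_2": (5, 0, 15, 10),
    "bag_1": (20, 20, 30, 30),
}

print(flag_overlaps(boxes))  # [('person_1', 'person_2', 0.333)]
```

Flagged pairs are candidates for re-annotation with polygons, which can follow each object's outline instead of sharing a rectangular region.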

Best Practices for Polygon Annotation

Polygonal annotations are essentially a variation of the bounding box technique. Polygon annotations utilize complex shapes and go beyond the typical bounding box rectangle or square to help track or annotate a target object's structure completely or more accurately.

Add and Subtract

A commonly known downside to polygon use in annotation is that, although they're typically more accurate than bounding boxes, they tend to capture multiple objects and result in overlapping. Having the ability to subtract and add polygon "branches" while annotating helps to prevent these overlapping occurrences. Using the union and subtraction feature available through the Superb AI Suite is demonstrated in the image below.
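The idea behind adding and subtracting polygon regions can be sketched by rasterizing each polygon to a set of pixels and applying set operations. This is not how the Superb AI Suite implements the feature; it is a minimal illustration, and the polygons, grid size, and function names are assumptions.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon's vertex list."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, width, height):
    """Set of (col, row) pixels whose centers fall inside the polygon."""
    return {
        (col, row)
        for row in range(height)
        for col in range(width)
        if point_in_polygon(col + 0.5, row + 0.5, polygon)
    }


# Two hypothetical overlapping annotations on an 8x8 image.
front_object = rasterize([(0, 0), (4, 0), (4, 4), (0, 4)], 8, 8)
back_object = rasterize([(2, 2), (6, 2), (6, 6), (2, 6)], 8, 8)

overlap = front_object & back_object       # pixels claimed by both annotations
merged = front_object | back_object        # "add": union of the two regions
back_only = back_object - front_object     # "subtract": remove the occluding object

print(len(overlap), len(merged), len(back_only))  # 4 28 12
```

After the subtraction, no pixel belongs to both annotations, which is the property the add and subtract tools are meant to preserve.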

Best Practices for Polyline Annotation

Polyline annotation comes in handy whenever annotators are looking to label objects that are unconventional or don't conform to basic shapes in image datasets. They're highly versatile when used along with the other annotation methods because they fill in the gaps that a bounding box might leave due to its fixed shape as a labeling tool.

Stay Within the Lines

Because the world isn't made up of boxes and straight lines, relying exclusively on bounding boxes and polylines is impractical when annotating images, so it's useful to have a third option at hand. Splines are that third option: polylines that can be manipulated to outline markings that stray from a straight line or structure. They are available through the Superb AI Suite.
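To show how a handful of control points can describe a curved marking, the sketch below samples a uniform Catmull-Rom spline, one common kind of interpolating spline. Which spline type a given annotation tool uses is not stated here, so treat the choice, the function names, and the sample points as assumptions.

```python
def catmull_rom_point(p0, p1, p2, p3, t):
    """Point at parameter t in [0, 1] on the uniform Catmull-Rom segment from p1 to p2."""
    def axis(a0, a1, a2, a3):
        return 0.5 * (
            2 * a1
            + (-a0 + a2) * t
            + (2 * a0 - 5 * a1 + 4 * a2 - a3) * t ** 2
            + (-a0 + 3 * a1 - 3 * a2 + a3) * t ** 3
        )
    return (axis(p0[0], p1[0], p2[0], p3[0]), axis(p0[1], p1[1], p2[1], p3[1]))


def spline_to_polyline(control_points, samples_per_segment=8):
    """Sample a smooth curve through every control point as a dense polyline."""
    if len(control_points) < 2:
        raise ValueError("need at least two control points")
    # Repeat the end points so the curve also passes through the first and last one.
    padded = [control_points[0]] + list(control_points) + [control_points[-1]]
    polyline = []
    for i in range(len(padded) - 3):
        p0, p1, p2, p3 = padded[i:i + 4]
        for step in range(samples_per_segment):
            polyline.append(catmull_rom_point(p0, p1, p2, p3, step / samples_per_segment))
    polyline.append(tuple(control_points[-1]))
    return polyline


# Hypothetical curved lane marking described by four clicks.
lane_clicks = [(0, 0), (10, 5), (20, 5), (30, 0)]
curve = spline_to_polyline(lane_clicks)

print(len(curve))  # 25
```

Four clicks yield 25 points along a smooth curve that still passes through each original click, which is far less work than placing every vertex of an equivalent polyline by hand.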

Best Practices for Keypoint Annotation

Keypoint annotation techniques are mainly known for enabling CV-powered applications to detect facial characteristics and motions. They do so by plotting "dots" or "keypoints" to measure face dimensions and identify the features that characterize certain expressions and facial changes, which in turn helps applications recognize human emotions.

Focus on Structure

Keypoints are best applied when the shapes in an image dataset share the same structural qualities, such as a human face or body part. Concentrate on working within these shapes and use the keypoints as a skeletal outline around smaller "key" features inside a bigger shape you're trying to label, such as eyes or the outline of a mouth when labeling a face.

Mark Obscure Points

In some instances, certain points are difficult to annotate when labeling a shape with keypoints. When those cases come up, ensure that an individual keypoint is still placed, even if the point is hard to mark or the annotation is hard to complete. That way, a model will still have some context to work from when predicting where a point belongs to complete the shape, regardless of visibility.

This is one of the features offered through the Superb AI Suite and is specific to keypoint annotation needs: it enables annotators to mark individual keypoints as visible or invisible rather than leaving a point unmarked and incomplete.
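A widely used way to record visibility is the COCO keypoint convention, where each point is stored as `x, y, v` and `v` is 0 for not labeled, 1 for labeled but not visible, and 2 for labeled and visible. The sketch below follows that convention; the `Keypoint` class, the face skeleton, and the coordinates are hypothetical, and this is not a description of the Superb AI export format.

```python
from dataclasses import dataclass

# COCO-style visibility flags.
NOT_LABELED, OCCLUDED, VISIBLE = 0, 1, 2


@dataclass
class Keypoint:
    name: str
    x: float
    y: float
    visibility: int


def to_flat_triplets(keypoints):
    """Flatten keypoints into [x1, y1, v1, x2, y2, v2, ...] as COCO stores them."""
    flat = []
    for kp in keypoints:
        flat.extend([kp.x, kp.y, kp.visibility])
    return flat


def missing_keypoints(keypoints, skeleton):
    """Names in the skeleton that have no placed keypoint, visible or occluded."""
    labeled = {kp.name for kp in keypoints if kp.visibility != NOT_LABELED}
    return [name for name in skeleton if name not in labeled]


# Hypothetical five-point face skeleton.
FACE_SKELETON = ["left_eye", "right_eye", "nose", "mouth_left", "mouth_right"]

face = [
    Keypoint("left_eye", 40, 50, VISIBLE),
    Keypoint("right_eye", 80, 50, OCCLUDED),  # hidden behind hair, position estimated
    Keypoint("nose", 60, 70, VISIBLE),
    Keypoint("mouth_left", 45, 95, VISIBLE),
]

print(missing_keypoints(face, FACE_SKELETON))  # ['mouth_right']
print(to_flat_triplets(face)[3:6])             # [80, 50, 1]
```

A check like `missing_keypoints` can run before an annotation is submitted, so that hidden points are marked as occluded instead of being silently left out.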

Precise and Accessible Image Annotation

Image annotation strategies often look different according to ML and CV model initiatives. Still, every team is working towards a common purpose: achieving the most accurate AI model possible. They accomplish that by being aware of the tools and resources now available to them on the market that once were not.

Among those resources are image segmentation and labeling methods that are customized to varying modeling needs. Different projects call for different types of segmentation and more general or focused techniques relative to the level of detail needed when annotating certain portions of data, such as edge cases or ground truth datasets that are used as the basis for supervised learning algorithms.

Through the Superb AI Suite, users can delegate many of the basic data labeling tasks that are repetitive and time-consuming to automation and dedicate their efforts primarily to the most complex and meaningful edge case datasets that make the difference in a model's accuracy and overall performance.
