How to Write Better Annotation Guidelines for Human Labelers: 4 Top Tips

Abimbola Oshodi

Project Manager, Data Services · 2022/7/26 · 4 min read

The quality of the data used to train ML models is a primary determinant of how well they will ultimately perform, that is, how accurately they carry out their intended function. Given that, it’s no surprise that ML dev teams have in recent years recognized the need to optimize their data labeling processes.

Producing high-quality data depends heavily on the data management practices that machine learning and labeling teams choose to follow. More often than not, those practices require manual annotation or the direct involvement of human labelers.

Annotation Instruction Inefficiency

Data labeling is repetitive, precise work, and the human element is just one of the factors that can devalue a dataset through inefficient labeling. Others include the number of labelers involved, whether their expertise aligns with the project focus, and whether annotators have clear and thorough guidelines.

Of all the ways to lower the risk of compromised labeling, the simplest is to provide comprehensive instructions that educate and support labelers. Just as high-quality data enables high-performing AI applications, better reference documentation for annotators leads to a higher percentage of accurate annotations.

The Importance of Well-Written Instructions

The greatest threats to annotation consistency are largely attributable to human error. These errors usually come down to one of two causes: each labeler’s subjective interpretation of the directions, or unclear task specifications.

It may seem obvious, but well-written instructions that anticipate and address different interpretations will equip labelers with the information they need to perform at their best and will reduce easily preventable errors.

In a basic sense, any guidelines that will be utilized by human labelers should include the following:

• Concept descriptions of individual tasks.

• Information that can help both experienced and non-experienced labelers understand a project’s particular use case.

• Specific labeling details for different dataset types and groupings.

As a starting point, a well-constructed annotation document should specify the following:

• The labels that are relevant to datasets along with descriptions of how and when to apply them.

• Clarification of edge cases to help combat misinterpretation.

• Any distinguishing remarks on labels that are difficult to differentiate or might be improperly used.
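The elements listed above can also be kept in a structured form, which makes it easy to spot labels whose guidance is incomplete. The following is a minimal sketch, not a prescribed format: the `LabelSpec` class, its field names, the `missing_sections` helper, and the example labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LabelSpec:
    """One label in a guideline: what it means and when to apply it (hypothetical schema)."""
    name: str
    description: str
    when_to_apply: str
    edge_cases: list[str] = field(default_factory=list)
    easily_confused_with: list[str] = field(default_factory=list)


def missing_sections(spec: LabelSpec) -> list[str]:
    """Return the names of the guideline sections left empty for this label."""
    gaps = []
    if not spec.description.strip():
        gaps.append("description")
    if not spec.when_to_apply.strip():
        gaps.append("when_to_apply")
    if not spec.edge_cases:
        gaps.append("edge_cases")
    return gaps


# Illustrative labels for an image task; the wording is invented for this example.
pedestrian = LabelSpec(
    name="pedestrian",
    description="A person on foot.",
    when_to_apply="Draw a box around the full visible body.",
    edge_cases=["Person partly hidden behind a car: box the visible part only."],
    easily_confused_with=["cyclist"],
)
cyclist = LabelSpec(
    name="cyclist",
    description="A person riding a bicycle.",
    when_to_apply="",
)

for spec in (pedestrian, cyclist):
    print(spec.name, missing_sections(spec))
```

Here the `cyclist` entry would be flagged because it has no application rule and no edge cases, which is the kind of gap that tends to produce inconsistent labels.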

In addition to well-organized written instructions, visual aids are a helpful supplement for directives that benefit from illustration. For example, provide one or two visual examples of the correct way to label a person in an image, along with an example of an incorrect method.

Keep in mind that even when guidelines are structured well enough to hand to labelers with confidence, there will always be room for improvement and a need for revision. Those who supervise data labeling teams should be prepared to produce improved iterations of the guidelines based on team performance and on problem areas that become apparent over time.

Common Mistakes We See

Creating perfect guidelines, especially the first time around, is an unrealistic expectation. Still, there are well-known oversights and pitfalls that machine learning and labeling teams can look out for to cut down on revision. Being mindful of these common mistakes also helps produce a high-quality document, which in turn leads to better data further down the pipeline.

1. Ambiguous Instructions

Any instructional information meant to guide data labelers should be communicated in a straightforward and detailed manner. However, the guidance shouldn’t be drawn out to the point of being convoluted. Try to state each guideline separately and keep it brief.

A labeler should be able to comprehend and act on a basic lesson or concept in about 10 minutes, although this timeframe depends heavily on the difficulty of each section. It should never take more than an hour to get through all segments of an instructional document. If labelers regularly experience delays as they review the guidelines, or seem to have trouble grasping and following the material, it may be a sign that the instructions need to be rephrased and made more succinct.
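As a rough first check against the time limits above, section length can be converted into an estimated reading time. This is only a sketch under stated assumptions: the reading speed of 200 words per minute is an assumed figure, not one from this article, the function names are hypothetical, and reading time is a loose proxy for how long it takes to comprehend and act on a section.

```python
WORDS_PER_MINUTE = 200  # assumed average reading speed; adjust for your team


def estimated_minutes(text: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Estimate reading time in minutes from a whitespace word count."""
    return len(text.split()) / wpm


def check_time_budget(sections: dict[str, str],
                      per_section_limit: float = 10.0,
                      total_limit: float = 60.0) -> tuple[list[str], bool]:
    """Return the titles of sections over the per-section limit, and
    whether the whole document exceeds the total limit."""
    times = {title: estimated_minutes(body) for title, body in sections.items()}
    too_long = [title for title, minutes in times.items()
                if minutes > per_section_limit]
    return too_long, sum(times.values()) > total_limit


# Placeholder text standing in for real guideline sections.
sections = {
    "Bounding boxes": "word " * 400,   # about 2 minutes at 200 wpm
    "Occlusion rules": "word " * 3000,  # about 15 minutes at 200 wpm
}
long_sections, document_too_long = check_time_budget(sections)
print(long_sections, document_too_long)
```

A flagged section is a candidate for splitting or trimming; actual labeler feedback should still take precedence over the estimate.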

2. Domain Knowledge Gaps

It is ill-advised to assume that annotators possess industry-specific knowledge or familiarity with the raw data they are handling. Making a habit of prepping labelers, especially if they’re outsourced, is one way ML teams can be proactive and prepare instructions with that consideration in mind.

This tip is most relevant to in-house and outsourced data labeling assignments, but it can also apply to other options such as crowdsourcing. As a precaution, incorporate industry and niche-specific know-how into any resource provided to labelers. It’s safer to write for an unfamiliar audience than to presume, in the less likely scenario, that readers are experts.

3. Standard and Non-Standard Cases

The cases labelers encounter when organizing and labeling data fall into two distinct categories: standard and non-standard. Instructions should anticipate that labelers will come across non-standard, outlier situations and should enable them to handle those situations accordingly.

When determining the differences between a standard and non-standard labeling case, try to provide case study examples that demonstrate how previous instances were handled and the precedent they set for handling similar instances in the future. Consider including a stipulation in guidelines that labelers should ask for further guidance if they ever come across unusual cases that are not addressed or acknowledged through the standard guidelines provided.
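The precedent-and-escalation rule described above can be sketched as a simple lookup: if a case has a documented ruling, follow it, and otherwise ask for guidance. The case keys, rulings, and the `resolve_case` function below are hypothetical and only illustrate the idea.

```python
from typing import Optional

# Hypothetical precedent log: a short case key mapped to the ruling
# that reviewers agreed on when the case first came up.
PRECEDENTS = {
    "reflection_of_person": "Do not label reflections.",
    "person_on_poster": "Do not label people in printed images.",
}


def resolve_case(case_key: str,
                 precedents: dict[str, str]) -> tuple[str, Optional[str]]:
    """Return ("follow_precedent", ruling) for documented cases,
    or ("escalate", None) when no precedent exists."""
    ruling = precedents.get(case_key)
    if ruling is not None:
        return ("follow_precedent", ruling)
    return ("escalate", None)


print(resolve_case("reflection_of_person", PRECEDENTS))
print(resolve_case("store_mannequin", PRECEDENTS))
```

Once an escalated case has been decided, adding it to the log gives the next labeler a precedent to follow instead of a judgment call.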

4. Irregular QC Review

Conducting routine QA reviews is crucial for meeting and maintaining expectations for accurately labeled data, such as satisfying ground-truth criteria and keeping the data pool valid. When this phase of managing labeled data is neglected, inconsistent and flawed results persist and can lead to faulty models.

To save time, effort, and resources from a managerial perspective, QC and QA procedures should be enforced and followed continuously. Modern solutions that meet present and growing development needs, such as Superb AI’s range of QA tools integrated into its data training platform, allow ML and CV project managers to conduct audits faster and more efficiently than before.
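Two measurements commonly used in a routine review are accuracy against a reviewed ground-truth sample and agreement between two labelers on the same items. The sketch below is a generic illustration, not a description of any particular product’s QA tooling; the function names and sample labels are assumptions, and Cohen’s kappa is used here as one common agreement statistic.

```python
from collections import Counter


def accuracy_against_gold(labels: list[str], gold: list[str]) -> float:
    """Fraction of items where the labeler matches the reviewed gold label."""
    if len(labels) != len(gold) or not gold:
        raise ValueError("labels and gold must be non-empty and the same length")
    return sum(l == g for l, g in zip(labels, gold)) / len(gold)


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Agreement between two labelers, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelers used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented sample: four items labeled by two labelers, plus a gold set.
labeler_a = ["cat", "cat", "dog", "dog"]
labeler_b = ["cat", "dog", "dog", "dog"]
gold = ["cat", "dog", "dog", "dog"]

print(accuracy_against_gold(labeler_a, gold))  # 0.75
print(cohens_kappa(labeler_a, labeler_b))      # 0.5
```

Low accuracy for one labeler suggests an individual training need, while low agreement across many labelers more often points to an ambiguity in the guidelines themselves.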

Practical Steps

Like any other stage of data preparation for ML workflows, creating guidelines that lead to more productive and accurate data labeling takes repetition and iterative improvement. Manual tagging will always carry some chance of imperfection, as labelers are only human at the end of the day.

However, a higher-quality result is more achievable now than before, largely because of the effort machine learning and labeling teams are willing to dedicate, collaboratively, to creating fine-tuned and practical guidance for better data management.