Class AI — A tool to improve the efficiency of data labeling processes

William Law
Dec 31, 2019

With more and more discoveries made in AI each day, the applications of AI have been expanding. Take AI Dungeon 2, an open-source text adventure game that uses OpenAI’s GPT-2 (an NLP model) to generate effectively limitless open-ended storylines.

Or take self-driving cars. Companies like Uber are researching how to predict the intent of drivers from signal cues, and building more efficient algorithms that fuse data from different sensors into a more robust, detailed input for the computer.

In both cases, the machine learning model needs to learn from data (which is why I left out the recent breakthroughs in reinforcement learning, where agents learn mostly from rewards rather than from a fixed, labeled dataset). The performance of the model depends on the quality of the data you give it, and roughly follows this rule:

Higher-quality data = higher accuracy/performance from the model

However, cleaning and processing data takes more time and resources than many companies can spare, so it’s not surprising that they outsource this work. That’s why startups like Scale AI, Labelbox, and Playment exist: to provide high-quality training data for AI.

These data labeling companies use a combination of automation, machine learning, and human review to ensure that the ground-truth data they deliver is accurate. The number of contractors involved in human review varies by company; for example, Scale AI has about 30,000 contractors aiding in the labeling process.
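One common way to combine those three stages, sketched here as an assumption rather than any particular company’s pipeline: a model pre-labels each item, predictions above a confidence threshold are accepted automatically, and the rest go to a human review queue. The `Prediction` type, the `route_predictions` function, and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    image_id: str
    label: str
    confidence: float  # model score in [0, 1]


def route_predictions(preds, threshold=0.9):
    """Split model pre-labels into an auto-accepted list and a human-review queue.

    `threshold` is a hypothetical cutoff; a real pipeline would tune it per task.
    """
    auto_accepted, needs_review = [], []
    for p in preds:
        if p.confidence >= threshold:
            auto_accepted.append(p)  # trusted enough to skip human review
        else:
            needs_review.append(p)   # sent to a human labeler to confirm or fix
    return auto_accepted, needs_review


if __name__ == "__main__":
    preds = [
        Prediction("img_001", "car", 0.97),
        Prediction("img_002", "pedestrian", 0.62),
        Prediction("img_003", "car", 0.91),
    ]
    accepted, review = route_predictions(preds)
    print([p.image_id for p in accepted])  # ['img_001', 'img_003']
    print([p.image_id for p in review])    # ['img_002']
```

Raising the threshold sends more items to humans (higher cost, fewer errors); lowering it does the opposite, which is where the contractor budget comes from.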

Depending on the rate, companies spend sizeable amounts on this contracting process. That’s why I built Class AI, which aims to increase image quality in order to improve object detection. Class AI is meant to help data labelers reduce the amount of labeling that needs to be done across data of widely varying quality.

Class AI was built using Super-Resolution Generative Adversarial Networks (SRGANs).
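For reference, the original SRGAN paper (Ledig et al., 2017) trains the generator $G_{\theta_G}$ on a "perceptual loss": a content loss computed on VGG19 feature maps plus a small adversarial term from the discriminator $D_{\theta_D}$. Here $\phi_{i,j}$ is the feature map from the $j$-th convolution (after activation) before the $i$-th max-pooling layer of VGG19, $W_{i,j}$ and $H_{i,j}$ are its dimensions, and $N$ is the number of training images. Specific implementations, including TensorLayer’s, may weight or combine these terms differently.

```latex
\begin{aligned}
l^{SR} &= l^{SR}_{X} + 10^{-3}\, l^{SR}_{Gen} \\
l^{SR}_{X} = l^{SR}_{VGG/i,j} &= \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}}
  \Big( \phi_{i,j}(I^{HR})_{x,y} - \phi_{i,j}\big(G_{\theta_G}(I^{LR})\big)_{x,y} \Big)^2 \\
l^{SR}_{Gen} &= \sum_{n=1}^{N} -\log D_{\theta_D}\big(G_{\theta_G}(I^{LR}_n)\big)
\end{aligned}
```

Comparing feature maps instead of raw pixels is what pushes SRGAN toward sharp, realistic textures rather than the blurry averages that a pure pixel-wise MSE loss tends to produce.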

I used the SRGAN model from TensorLayer. Training took ~10 hours on a GTX 1060 6GB for 50 epochs of 500 steps each, with a batch size of 2.
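A quick back-of-the-envelope check of that training run, assuming each step processes exactly one batch and that the ~10 hours is all training time. The post only gives these totals, so the derived numbers are approximate. `training_budget` is a hypothetical helper, not part of TensorLayer.

```python
# Training configuration reported above.
EPOCHS = 50
STEPS_PER_EPOCH = 500
BATCH_SIZE = 2
WALL_CLOCK_HOURS = 10  # "~10 hrs", so every derived figure is approximate


def training_budget(epochs, steps_per_epoch, batch_size, hours):
    """Return (total steps, training examples processed, seconds per step)."""
    total_steps = epochs * steps_per_epoch      # one optimizer step per batch (assumed)
    examples_seen = total_steps * batch_size    # counts repeats across epochs
    seconds_per_step = hours * 3600 / total_steps
    return total_steps, examples_seen, seconds_per_step


steps, examples, sec_per_step = training_budget(
    EPOCHS, STEPS_PER_EPOCH, BATCH_SIZE, WALL_CLOCK_HOURS
)
print(f"{steps:,} steps, {examples:,} examples, ~{sec_per_step:.2f} s/step")
# 25,000 steps, 50,000 examples, ~1.44 s/step
```

If the training set holds fewer than 50,000 images, the model revisits the same images across epochs.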


