I recently got my very first introduction to Active Learning by reading Active Learning for Visual Object Recognition [Freund et al.]. At its core, the paper presents Active Learning as a way to speed up the tedious process of labeling objects in images for visual recognition. Specifically, it looks at pedestrian detection, where the dataset comes from a video stream recorded by a camera mounted on a moving car. The input images are frames extracted from that video stream.
The labeling part is clever. Instead of manually marking every region that contains a pedestrian, the idea is to use a classifier, which is itself being trained along the way, to help with the labeling. Here's the high-level idea:
1. Split the input images into a number of sets.
2. Pick one of the sets and manually label the regions containing pedestrians.
3. Train a pedestrian classifier using the labeled set.
4. Pick a different set of unlabeled images and let the previously trained classifier propose the pedestrian regions; a human then verifies those proposals instead of drawing every box.
5. Re-train the classifier on all the labeled sets.
6. Repeat steps (4) and (5) until all sets are labeled.
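The loop above can be sketched as a small Python simulation. This is a minimal, hypothetical sketch, not the paper's implementation. Each image is reduced to a list of pre-extracted candidate regions, each with a single made-up `feature` value. The "classifier" is a toy 1-D threshold learner (`train_threshold`) standing in for a real pedestrian detector. The human labeler is simulated by reading each region's ground truth. `Region`, `active_label`, and `train_threshold` are all names I made up for illustration. The per-box timings (~20 s to draw, ~3 s to verify) are the ones quoted in this post.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

MANUAL_SECONDS = 20  # drawing a box by hand (~20 s, figure from the post)
VERIFY_SECONDS = 3   # accepting/rejecting a proposed box (~3 s, figure from the post)


@dataclass(frozen=True)
class Region:
    """Hypothetical candidate region: one scalar feature plus the ground truth
    that only the (simulated) human labeler knows."""
    feature: float
    is_pedestrian: bool


Image = List[Region]
Labeled = List[Tuple[Region, bool]]


def train_threshold(labeled: Labeled) -> float:
    """Toy stand-in for the pedestrian classifier: choose the feature threshold
    (predict 'pedestrian' when feature >= threshold) with the fewest training errors.
    Assumes at least one labeled region."""
    candidates = sorted({region.feature for region, _ in labeled})

    def errors(threshold: float) -> int:
        return sum((region.feature >= threshold) != label for region, label in labeled)

    return min(candidates, key=errors)


def active_label(sets: Sequence[Sequence[Image]]) -> Tuple[Labeled, int, float]:
    """Label image sets (already split) one at a time, retraining after each set.

    Returns the labeled regions, the simulated labeling time in seconds,
    and the final classifier threshold."""
    labeled: Labeled = []
    seconds = 0

    # First set: the human draws every pedestrian box by hand, then we train.
    for image in sets[0]:
        for region in image:
            if region.is_pedestrian:
                seconds += MANUAL_SECONDS
            labeled.append((region, region.is_pedestrian))
    threshold = train_threshold(labeled)

    # Remaining sets: the classifier proposes, the human verifies, then retrain.
    for image_set in sets[1:]:
        for image in image_set:
            for region in image:
                if region.feature >= threshold:
                    seconds += VERIFY_SECONDS  # confirm or reject the proposal
                elif region.is_pedestrian:
                    seconds += MANUAL_SECONDS  # classifier missed it; draw by hand
                labeled.append((region, region.is_pedestrian))  # verified label
        threshold = train_threshold(labeled)  # retrain on all sets labeled so far

    return labeled, seconds, threshold


if __name__ == "__main__":
    sets = [
        [[Region(0.9, True), Region(0.1, False)]],   # set labeled by hand
        [[Region(0.95, True), Region(0.2, False)]],  # set labeled with assistance
    ]
    labeled, seconds, threshold = active_label(sets)
    print(seconds, threshold)  # 23 0.9
```

In the toy data each set holds one pedestrian: it costs 20 s to draw by hand in the first set and only 3 s to confirm the classifier's proposal in the second.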
The key insight is that manually specifying a pedestrian region (e.g. carefully drawing a bounding box with the mouse) takes significantly longer (~20 seconds) than marking whether a region proposed by the classifier is a true positive or a false positive (~3 seconds). And as the classifier is trained on more labeled sets, its accuracy improves. That reduces the number of mistakes to fix (false positives to reject and missed pedestrians to draw by hand) and ultimately speeds up the labeling process. Interesting, isn't it? :)
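To make the timing argument concrete, here is a back-of-the-envelope cost model in Python. It rests on my own simplifying assumption, not the paper's model. Every box the classifier proposes costs one ~3 s check, whether it is right or wrong. Every pedestrian it misses still costs a ~20 s hand-drawn box. The function names are hypothetical, and the counts in the example are made up for illustration.

```python
MANUAL_SECONDS = 20  # drawing one bounding box by hand (~20 s, per the post)
VERIFY_SECONDS = 3   # accepting or rejecting one proposed box (~3 s, per the post)


def manual_cost(pedestrians: int) -> int:
    """Seconds to label a set entirely by hand."""
    return MANUAL_SECONDS * pedestrians


def assisted_cost(pedestrians: int, true_positives: int, false_positives: int) -> int:
    """Seconds when the classifier proposes boxes and a human checks them.

    Every proposal (correct or not) is verified; every pedestrian the classifier
    missed still has to be drawn by hand.
    """
    if not 0 <= true_positives <= pedestrians:
        raise ValueError("true_positives must be between 0 and pedestrians")
    if false_positives < 0:
        raise ValueError("false_positives must be non-negative")
    proposals = true_positives + false_positives
    missed = pedestrians - true_positives
    return VERIFY_SECONDS * proposals + MANUAL_SECONDS * missed


if __name__ == "__main__":
    # Hypothetical set with 100 pedestrians, labeled with increasingly accurate classifiers.
    print(manual_cost(100))            # 2000
    print(assisted_cost(100, 90, 30))  # 3*120 + 20*10 = 560
    print(assisted_cost(100, 98, 5))   # 3*103 + 20*2 = 349
```

Under this model a false positive costs 3 s and a miss costs 20 s, so each round of retraining that cuts either kind of mistake directly shortens the time needed to label the next set.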