Using MTurk Workers for content creation

There’s an interesting new study on how MTurk Workers can be used to create content, specifically video content, to help computers learn about our daily tasks. The idea is that if computers can learn where we put our keys, they can help us find them. So a group of researchers recruited Workers to film short videos of themselves doing boring things, creating a dataset for computers to learn from.

“Our Hollywood in Homes approach allows not only the labeling, but the data gathering process to be crowdsourced. In addition, Charades offers a novel large-scale dataset with diversity and relevance to the real world.”

There are some interesting descriptions of how they got Workers to participate: they found that Workers only started filming videos when the task price was set at $3, which was above their goal of $1. The article details the methods they used to recruit and retain people, including a bonus for the first submission and ‘refer a friend’ bonuses. Workers also got a bonus after every 15th video they created. From the abstract:

“Instead of shooting videos in the lab, we ensure diversity by distributing and crowdsourcing the whole process of video creation from script writing to video recording and annotation. Following this procedure we collect a new dataset, Charades, with hundreds of people recording videos in their own homes, acting out casual everyday activities. The dataset is composed of 9,850 annotated videos with an average length of 30 seconds, showing activities of 267 people from three continents. Each video is annotated by multiple free-text descriptions, action labels, action intervals and classes of interacted objects. In total, Charades provides 27,847 video descriptions, 37,972 temporally localized intervals for 160 action classes and 24,623 labels for 40 object classes. Using this rich data, we evaluate and provide baseline results for several tasks including action recognition and automatic description generation. We believe that the realism, diversity, and casual nature of this dataset will present unique challenges and new opportunities for computer vision community.”


Sigurdsson, G. A., Varol, G., Wang, X., Farhadi, A., Laptev, I., & Gupta, A. (2016). Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding. arXiv preprint arXiv:1604.01753.
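To put the abstract's numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted in the abstract; the function and variable names are my own, and because the 30-second video length is an average, the totals are approximations rather than figures reported by the authors.

```python
# Figures quoted in the Charades abstract.
NUM_VIDEOS = 9_850
AVG_LENGTH_SECONDS = 30   # an average, so derived totals are approximate
NUM_PEOPLE = 267
NUM_DESCRIPTIONS = 27_847
NUM_INTERVALS = 37_972


def summarize_dataset():
    """Derive rough per-video and per-person figures (illustrative only)."""
    return {
        "total_hours": NUM_VIDEOS * AVG_LENGTH_SECONDS / 3600,
        "videos_per_person": NUM_VIDEOS / NUM_PEOPLE,
        "descriptions_per_video": NUM_DESCRIPTIONS / NUM_VIDEOS,
        "intervals_per_video": NUM_INTERVALS / NUM_VIDEOS,
    }


if __name__ == "__main__":
    for name, value in summarize_dataset().items():
        print(f"{name}: {value:.1f}")
```

By this estimate the dataset holds roughly 82 hours of footage, with each participant contributing about 37 videos on average.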
