Turkers versus algorithms!

A new study shows Turkers’ predictions of recidivism are pretty much as accurate as predictions by an algorithm. Read it here!

Note that this study paid workers $1, and workers could receive a $5 bonus for accuracy. The rest of this is completely out of my wheelhouse, but interesting nonetheless.


The study is by Julia Dressel and Hany Farid and can be found in: Science Advances  17 Jan 2018: Vol. 4, no. 1, eaao5580 DOI: 10.1126/sciadv.aao5580



New population estimates

A new study uses some interesting statistical techniques adapted from ecology to estimate the number of workers on MTurk. The big finding, from 40,000 unique workers, is that at any given time there are 2,450 people on the platform available for work. This is in contrast to an earlier study (Stewart et al.) that found that the average lab can reach only about 7,300 people for a given study (corrected with thanks to an author from that study).
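For a flavor of how ecology-style population estimates work, here is a minimal sketch of the textbook two-sample capture-recapture estimator (Lincoln-Petersen, with Chapman's correction). This is only an illustration of the general idea, not necessarily the model the authors fit, and the worker IDs and the function name are made up for the example.

```python
def chapman_estimate(first_sample, second_sample):
    """Estimate population size from two samples of worker IDs.

    Uses the Chapman-corrected Lincoln-Petersen estimator:
        N = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    where n1 and n2 are the sample sizes and m is the overlap.
    """
    first, second = set(first_sample), set(second_sample)
    n1, n2 = len(first), len(second)
    m = len(first & second)  # workers "recaptured" in the second sample
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical worker IDs from two survey waves (not real data).
wave_one = ["w01", "w02", "w03", "w04", "w05", "w06"]
wave_two = ["w04", "w05", "w06", "w07", "w08", "w09", "w10", "w11"]

estimate = chapman_estimate(wave_one, wave_two)
print(f"Estimated population: {estimate:.1f}")
```

The intuition: the smaller the overlap between the two waves, the larger the pool the samples must have been drawn from.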


Djellel Difallah, Elena Filatova, and Panos Ipeirotis. 2018. Demographics
and Dynamics of Mechanical Turk Workers. In Proceedings of WSDM 2018:
The Eleventh ACM International Conference on Web Search and Data Mining, Marina Del Rey, CA, USA, February 5–9, 2018 (WSDM 2018), 9 pages.

Stewart et al study:

Neil Stewart, Christoph Ungemach, Adam J. L. Harris, Daniel M. Bartels, Ben R.
Newell, Gabriele Paolacci, and Jesse Chandler. 2015. The average laboratory
samples a population of 7,300 Amazon Mechanical Turk workers.
Judgment and Decision Making 10, 5 (2015), 479–491

Are Turkers healthy?

One new study says no. Well, it says they say they aren’t as healthy as others.

“Demographic, socioeconomic, and health status variables in an adult MTurk sample collected in 2016 (n=1916), the 2015 MEPS household survey component (n=21,210), and the 2015 BRFSS (n=283,502).


Our findings indicate statistically significant differences in the demographic, socioeconomic, and self-perceived health status tabulations in the MTurk sample relative to the unweighted and weighted MEPS and BRFSS. The MTurk sample is more likely to be female (65.8% in MTurk, 50.9% in MEPS, 50.2% in BRFSS), white (80.1% in MTurk, 76.9% in MEPS, and 73.9% in BRFSS), non-Hispanic (91.1%, 82.4%, and 81.4%, respectively), younger, and less likely to report excellent health status (6.8% in MTurk, 28.3% in MEPS, and 20.2% in BRFSS).”
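As a rough sanity check on the "statistically significant" claim, here is a sketch of a plain two-proportion z-test on the excellent-health figures quoted above. It assumes simple unweighted samples and ignores the survey weights, so it is not the authors' analysis; the function name is just for illustration.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Return the z statistic for the difference between two proportions."""
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_err


# Share reporting excellent health: MTurk (n=1916) vs. MEPS (n=21,210).
z = two_proportion_z(0.068, 1916, 0.283, 21210)
# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))
print(f"z = {z:.1f}, p = {p_value:.3g}")
```

With samples this large, a gap of more than twenty percentage points lands far beyond any conventional significance threshold, so the more interesting question is why the gap exists.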



The study concludes that researchers should be hesitant about using MTurk for health research. I think this study overrepresents women on MTurk, but perhaps women were more interested in an MTurk study than men were. The brief report available here does not indicate whether the study was limited to people in the US. I’m also unclear on whether self-reports of healthiness are valid, but that might be just me.



Title: Self-reported Health Status Differs for Amazon’s Mechanical Turk Respondents Compared With Nationally Representative Surveys
Author: Karoline Mortensen, Manuel Alcalá, Michael French, et al
Publication: Medical Care
Publisher: Wolters Kluwer Health, Inc.
Date: Dec 21, 2017

The Downsides of Flexibility in Crowdwork

A new study looks at flexibility in crowdwork on platforms like MTurk and how such flexibility supports and inhibits workers. The full text is available here.

A researcher at Oxford points out the biggest issue with MTurk:

“One finding is that workers’ ability to choose their hours of work is limited by structural constraints: the availability of work on the platform at any given moment, and the degree to which the worker depends on that work for their living. The severity of these constraints varied significantly between the three platforms. Mechanical Turk was formally the freest platform, but its design resulted in intense competition between workers. This made it necessary for workers who depended on MTurk income to remain constantly on call, lest someone else grab the decently-paying tasks.”

The paper talks about the support systems among workers that keep people engaged and motivated. Community in crowdwork is so important. Why doesn’t Amazon get that??

Lehdonvirta, V. (2018). Flexibility in the Gig Economy: Managing Time on Three Online Piecework Platforms. New Technology, Work & Employment (forthcoming).



“Ten years” of crowdsourcing

This new article (full text available for free!) gives an overview of ‘ten years’ of crowdsourcing. The article begins with Howe’s quote from 2006, so it is really 12 years, and MTurk launched to the public in 2005, so OK, ten years more or less. The article is a nice overview of crowdsourcing and its benefits, tapping into a bunch of different studies on crowdsourcing reviewed by the author. It covers both benefits and concerns, and is a basic and straightforward, albeit cursory, analysis of crowdsourcing.

The paragraph on ethics says “As crowdsourcing is a nascent field, there is no Review Ethics Board (REB) or Institutional Review Board (IRB) process specific to it, to the author’s knowledge, despite it being quite different from other methodologies.” This was published in a UK journal but it is important to note that many IRBs in the US are providing very specific information on addressing MTurk.

This very important paragraph is sort of buried, so let me highlight it here:

Finally, some authors reviewed gave tips for using crowdsourcing in research. Most importantly, selecting a clear and appropriate research question was emphasised. Having a big challenge, and clear, measurable goals that are communicated to participants was seen as important as this helps motivate the participants, along with as providing options regarding levels and modes of participation. Finally, the importance of acknowledging participation was highlighted

Citation: Wazny, K. (2017). “Crowdsourcing” ten years in: A review. Journal of Global Health, 7(2).



MTurk: Life in the Iron Mills

Some professors from UT Dallas were seeking a way to bring the book “Life in the Iron Mills” to life. “Life in the Iron Mills” is a novella about unregulated work in the 19th century, in which the protagonist creates a work of art out of the waste from the mill to symbolize the mindlessness of industrial labor.

Here’s where MTurk comes in:

“Burrough and Starnaman  (the professors) …offered Mechanical Turk workers an unusual, self-reflective task. “Each month, we ask nine workers how this virtual platform affects their bodies,” Burrough said. “They respond, and trace and measure their hands for us. The hands are laser-cut from cardboard or wood and the sentiments are embroidered on those, or if the written response is longer, it is shined through a light box.” As a socially engaged artist trying to highlight the workers’ experience, Burrough tries to remove her own input as much as possible. “I’m depicting what the workers send to me,” Burrough said. “I am trying not to speak for them — I’m a conduit for their sentiments.” Since the statue in Life in the Iron Mills is constructed from a byproduct, the cardboard hands used for “The Laboring Self” come from a modern equivalent — recycled packing boxes. “All this stuff gets shipped with so much packaging, and you remove one little thing that you ordered,” Burrough said. “These donated boxes allow us to take the workers’ voices and put it on the byproduct of our era.”

Read the whole story here.



Recruiting very specific populations on MTurk

This article is from 2016 and discusses how to recruit military veterans using MTurk. What I like a lot about this article is that it gives some specific guidelines on how to ask screening questions that can truly work to get a specific population.

Too many studies would use questions like “are you a military veteran?” to screen and—yeah. People often state they are something they are not in order to do a HIT.  This study on recruiting veterans used these screening questions:

  • What is the acronym for the locations where final physicals are taken prior to shipping off for basic training? (four letters)
  • What is the acronym for the generic term the military uses for various job fields? (three letters)
  • Please put these officer ranks in order: (participants were given visual insignia to rank order).
  • Please put these enlisted ranks in order: (contextualized branch-specific question; participants were given visual insignia to rank order)
  • In which state is your basic training base located? (contextualized branch-specific question)
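Knowledge questions like these are easy to score automatically. Below is a minimal sketch, assuming responses arrive as a dictionary keyed by question name. The question names, the placeholder answers, and the pass threshold are all hypothetical; the article does not describe how responses were scored.

```python
def passes_screener(responses, answer_key, min_correct):
    """Return True if enough knowledge questions were answered correctly.

    Answers are compared after trimming whitespace and ignoring case.
    """
    correct = sum(
        1
        for question, expected in answer_key.items()
        if responses.get(question, "").strip().lower() == expected.strip().lower()
    )
    return correct >= min_correct


# Placeholder key: the researcher fills in the real answers privately.
answer_key = {"physical_site_acronym": "ABCD", "job_field_acronym": "XYZ"}

veteran_like = {"physical_site_acronym": " abcd ", "job_field_acronym": "xyz"}
guesser = {"physical_site_acronym": "idk"}

print(passes_screener(veteran_like, answer_key, min_correct=2))
print(passes_screener(guesser, answer_key, min_correct=2))
```

Keeping the answer key out of the HIT itself matters here: the whole point is that only people with the lived experience know the answers offhand.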

I’m not a military veteran, and if I had tried to answer some of these questions I would clearly get them wrong. I guess I could try googling the answers, but that would take time away from doing other work, so I’m pretty sure I would just move on to another HIT.

This article is a great lesson in how to make sure you get the people you want to get in your studies.

MTurkers and motivation

A new study used Turkers to test ideas about whether ‘early’ or ‘late’ rewards had different effects on motivations. It is a series of five experiments about rewards; four use Turkers. The bottom line is that early rewards are more motivating than late rewards, and higher rewards are more motivating than lower rewards.

A few thoughts on this study:

-The rewards were paid out (usually) as bonuses. Given what I’ve heard, some (many) workers are highly skeptical that they’ll actually get the bonus they are promised. Additional research needs to examine if skepticism is a moderator on motivation.

-These studies were low-paid to begin with; I have to think that someone who takes on these tasks has a different perspective than others. For example, one task had people reading five pages of a book and then answering questions. It took AT LEAST 5 minutes, probably more like 7 or 8, and paid $0.25. That’s just wrong, people. And it is going to have an influence on motivation.
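To put that pay in hourly terms, here is a quick back-of-the-envelope calculation. The completion times are the rough guesses given above, not measured values, and the function name is just for illustration.

```python
def hourly_rate(payment_dollars, minutes):
    """Convert a per-task payment and completion time to dollars per hour."""
    return payment_dollars * 60 / minutes


# 25 cents for the reading task, at the 5-minute floor and the 7-8 minute guess.
for minutes in (5, 7, 8):
    print(f"{minutes} min -> ${hourly_rate(0.25, minutes):.2f}/hour")
```

Even at the most generous 5-minute estimate that is $3.00 an hour, and at 8 minutes it drops below $2.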

Citation (apparently it isn’t published in a journal):

Woolley, K., & Fishbach, A. (2017). It’s About Time: Earlier Rewards Increase Intrinsic Motivation.



Turkers vs. Citizen Scientists

Who does better at interpreting data about climate change: MTurk workers or citizen scientists? My gut reaction is that it would be citizen scientists (that is, unpaid volunteers with an interest in science). This new study looks to see if I am correct.

The researchers selected 600 Tweets about climate change and asked these two groups to evaluate the tweets regarding the attitudes expressed in the Tweet (that is, negative to positive) and to classify the Tweet into broader themes (ten themes provided, respondents could pick up to three). The responses of the 127 Citizen Scientists and 574 Turkers were compared. The results?

“Despite the fact that only the best quality Amazon MTurk workers were selected (HIT approval rating ≥ 95%), the performance of the paid workers was still inferior to the performance of volunteers. Consequently, we found that for a particular task, the same accuracy level can be achieved with 12 paid workers as with only four volunteers. The associated cost increase may be prohibitive for many scientific projects, which makes volunteer crowdsourcing an attractive alternative. The downside of volunteer crowdsourcing is that it requires a much longer time to complete the project. In our case, Amazon MTurk processing was completed in five days, with most of the time taken up with validation of the already processed data. The Citizen Scientist platform processing took one year; on average, ~600 tweets per month were processed. We also found that an interaction between the scientists and volunteers was required to keep the public interested in donating their time to the project.”
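One way to see why a less accurate crowd needs more people to reach the same accuracy is a toy majority-vote model. The sketch below assumes a two-way label, independent raters who all share the same accuracy, and a coin flip on ties. The study's actual tasks (a sentiment scale and multiple themes) are more complex, and the per-rater accuracies here are invented, not taken from the paper.

```python
from math import comb


def majority_vote_accuracy(n_raters, p_correct):
    """Probability that a majority of independent raters picks the right label.

    Assumes a two-way choice, identical rater accuracy, and a coin flip on ties.
    """
    total = 0.0
    for k in range(n_raters + 1):
        prob_k = comb(n_raters, k) * p_correct**k * (1 - p_correct) ** (n_raters - k)
        if 2 * k > n_raters:
            total += prob_k  # a clear majority is correct
        elif 2 * k == n_raters:
            total += 0.5 * prob_k  # tie, broken at random
    return total


# Hypothetical per-rater accuracies; these are not figures from the study.
print(f"12 raters at 60%: {majority_vote_accuracy(12, 0.60):.3f}")
print(f" 4 raters at 70%: {majority_vote_accuracy(4, 0.70):.3f}")
```

Under these assumptions, adding raters helps only when each one is right more often than not, and a modest drop in individual accuracy has to be offset by a much larger panel.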

The researchers in this study used Workers from both India and the US, and reflected that perhaps this might have been a problem. Some other nuggets:

Regarding their choice of ‘best quality’ workers: “We, however, speculate that workers’ reputation might not be a very reliable indicator of their performance. Proliferation of the online rating system means that the workers have become highly motivated in the protection of their online reputation. In a handful of cases, we had to reject incomplete tasks; subsequently, we received complaints and threats to blacklist us as bad requesters. Given the time and effort required to follow up requests from unsatisfied workers and a low cost of individual tasks, there is a strong incentive to avoid a dispute and comply with workers’ requests, which thus artificially boosts the approval ratings of workers.”

This statement needs more context: why was the work rejected? Was it explained to the Workers?

Payment: “The workers participating in our study were on average earning ~$2/h, which is similar to average MTurk earnings. It is possible that a higher pay rate would return better quality results; however, Gillick and Liu hypothesized that lower compensation might attract the workers less interested in monetary rewards and hence spend more time per task. Having read the online discussion of the MTurk workers, we also noticed that they associate an unusually high pay rate with possible fraud and recommend abstaining from taking such HITs.”

The key words here are ‘unusually high’. Whatever ‘unusually high’ might be, it isn’t $2 per hour. Is minimum wage ‘unusually high’? Maybe, but in my experience (and the experience of others I’ve counseled on MTurk), it results in better work.

Citation: Kirilenko, Andrei P., Travis Desell, Hany Kim, and Svetlana Stepchenkova. “Crowdsourcing Analysis of Twitter Data on Climate Change: Paid Workers vs. Volunteers.” Sustainability 9, no. 11 (2017): 2019.

Yet Another MTurk Competitor: GEMS

There’s a new game in town (or will be soon): it’s called GEMS  and it is designed to address some of MTurk’s issues. The article identifies two key issues: what it calls the challenging process to even work on MTurk, and the ‘middleman’ fee issues (“Amazon stands in between those hiring and those working and charges high and opaque middleman fees”).

About those fees: “O’Reilly said that the platform will be completely market driven. This means that requesters are free to offer as much pay as they want, and it’s up to workers to accept the jobs or not.”

How will GEMS be different? “Users sign up with an Ethereum address and complete microtasks. In exchange for completing tasks, they are paid in Gems and their work is rated by trusted parties. The more highly rated work, meaning accurate work, the user completes the higher their score will become. Higher scores will lead to access to more complex jobs, potentially higher pay, and the ability to become a task verifier and get paid for doing so.”

Gems is apparently a type of cryptocurrency, similar to Bitcoin. The microwork platform is in some ways a vehicle to get more people easily engaged in the cryptocurrency marketplace. According to another story:
“The Gems Platform allows participants to earn GEM tokens by completing micro tasks, and anyone with a working internet connection can participate. By lowering the barriers to entering the crypto world — banks, wait times, association with an investment and perceived risk of loss — we want to make it easier for millions of users to participate.”

Here’s a White Paper on the whole system.

I’m just not sure how current MTurk workers will benefit from getting paid in Gems rather than cash, but I guess we’ll see down the line. This system might seem to benefit MTurk Requesters (although I am not sure how an IRB would accept Gems as incentives), but I venture to guess it might just be too complex for most academic researchers to engage with.