Recent Studies on MTurk Validity

MTurk has come under a lot of scrutiny in academic circles, much of it based on anecdotal data and false assumptions. Here is some recent research on MTurk validity that you might enjoy.


Every sample has its strengths and weaknesses. Today, any data collected must be scrutinized for satisficing, attention, and representativeness. MTurk, though, is consistently seen as a valid sample. MTurk has been used for data collection in thousands of studies since its inception a decade ago. Google Scholar lists more than 124,000 studies using MTurk, about 15,000 of them published in 2015 and so far in 2016. Some of these studies compared MTurk to other sampling frames. The studies published through mid-2015 are summarized in Sheehan and Pittman’s (2016) book (see the citation at the end of this post); highlights follow.

  • A 2014 study compared MTurk samples to those from Survey Monkey and Qualtrics (which purchases panels from companies including Survey Spot International). The study’s results showed that there was considerable similarity between many of the treatment effects obtained from MTurk and nationally representative population-based samples.
  • A 2015 study found that 95% of MTurk Workers passed an “Instructional Manipulation Check” (that is, asking whether the respondent read the instructions and having the respondent answer ‘yes’). At the same time, only 39% of a student pool answered yes. In a second study, the respondents were given a question asking ‘which of these personality traits describes you’, followed by a list of characteristics with check boxes. However, the instructions following the question clearly told respondents to check ‘other’ and type in ‘I read the instructions’. About 96% of the MTurk Workers passed this test, while only 26% of the student pool passed.
  • Another 2015 study compared MTurk Workers’ responses to attention check questions to those from other crowdsourcing sites. About 86% of MTurk Workers passed two attention check questions in the study. This passing rate was significantly higher than the rates from other sites: about 53% of the CBDR panelists and 40% of the MicroWorkers panelists passed both checks, while only about 25% of the CrowdFlower and 6% of the RapidWorkers panelists passed both checks.
  • A 2011 study found that extrinsic motivational factors (such as social motivations) have a strong effect on the amount of time a Worker spends on MTurk, and so perhaps also on the number of academic surveys that they complete. For many Workers, intrinsic motivations are more important, especially the different facets of enjoyment-based motivation like “task autonomy” and “skill variety”. Some Workers (about 9%) report that compensation is not important to them at all; they choose to participate in HITs to pass the time and because they find the work at the site interesting or fun.



Below are portions of abstracts of studies published since Sheehan and Pittman’s book went to press (i.e., since July of last year). Please note that several of these studies focus on tactics to make sure that data collected on MTurk is valid. We have placed the relevant findings in bold.

  • “In the present study, MTurk-based responses for a personality scale were found to be significantly less reliable than scores previously reported for a community sample. While score reliability was not affected by the length of the survey or the payment rates, the presence of an item asking respondents to affirm that they were attentive and honest was associated with more reliable responses.” Rouse, S. V. (2015). A reliability analysis of Mechanical Turk data. Computers in Human Behavior, 43, 304-307.



  • “This study compares whether participants recruited through AMT give different responses than participants recruited through an online forum or recruited directly on a university campus. Moreover, we compare whether a study conducted within AMT results in different responses compared to a study for which participants are recruited through AMT but which is conducted using an external online questionnaire service. The results of this study show that there is a statistical difference between results obtained from participants recruited through AMT compared to the results from the participant recruited on campus or through online forums. We do, however, argue that this difference is so small that it has no practical consequence. There was no significant difference between running the study within AMT compared to running it with an online questionnaire service. There was no significant difference between results obtained directly from within AMT compared to results obtained in the campus and online forum condition. This may suggest that AMT is a viable and economical option for recruiting participants and for conducting studies as setting up and running a study with AMT generally requires less effort and time compared to other frequently used methods.” Bartneck, C., Duenser, A., Moltchanova, E., & Zawieska, K. (2015). Comparing the similarity of responses received from studies in Amazon’s Mechanical Turk to studies conducted online and with direct recruitment. PLoS ONE, 10(4), e0121595.


  • “Here, we compare the results of Amazon Mechanical Turk online surveys of refrigerators, freezers, televisions, and ceiling fans to the nationwide Residential Energy Consumption Survey (RECS) deployed by the US Energy Information Administration. To account for differences in demographic distributions between the online survey results and the general population, we weighted the results using standard cell weighting and raking techniques, as well as a combination of these, termed “hybrid.” The weighted results gave a distribution of product ownership that was reasonably close to RECS, albeit with small, statistically significant differences in some cases. The cell weighting method provided a slightly better agreement with RECS than the other two approaches. We recommend online surveys as an efficient and cost-effective way of gathering in-home use data on appliances that are not adequately covered by existing data sources.” Yang, H. C., Donovan, S. M., Young, S. J., Greenblatt, J. B., & Desroches, L. B. (2015). Assessment of household appliance surveys collected with Amazon Mechanical Turk. Energy Efficiency, 8(6), 1063-1075.


  • “The findings reveal very different performances between two types of strategies: those that “pull in” online users actively looking for paid work (MTurk workers and Craigslist users) and those that “push out” a recruiting ad to online users engaged in other, unrelated online activities (Google AdWords and Facebook). The pull-method recruits were more cost efficient and committed to the survey task, while the push-method recruits were more demographically diverse.” Antoun, C., Zhang, C., Conrad, F. G., & Schober, M. F. (2015). Comparisons of Online Recruitment Strategies for Convenience Samples: Craigslist, Google AdWords, Facebook, and Amazon Mechanical Turk. Field Methods, 1525822X15603149.



  • “In this paper, we evaluate this claim by comparing a large MTurk sample to two benchmark national samples – one conducted online and one conducted face-to-face. We examine the personality and value-based motivations of political ideology across the three samples. All three samples produce substantively identical results with only minor variation in effect sizes. In short, liberals and conservatives in our MTurk sample closely mirror the psychological divisions of liberals and conservatives in the mass public, though MTurk liberals hold more characteristically liberal values and attitudes than liberals from representative samples. Overall, our results suggest that MTurk is a valid recruitment tool for psychological research on political ideology.” Clifford, S., Jewell, R. M., & Waggoner, P. D. (2015). Are samples drawn from Mechanical Turk valid for research on political ideology? Research & Politics, 2(4), 2053168015622072.



  • A 2015 study reported on three different online studies where participants from MTurk and collegiate populations participated in a task that included a measure of attentiveness to instructions (an instructional manipulation check: IMC). “In all studies, MTurkers were more attentive to the instructions than were college students, even on novel IMCs (Studies 2 and 3), and MTurkers showed larger effects in response to a minute text manipulation. These results have implications for the sustainable use of MTurk samples for social science research and for the conclusions drawn from research with MTurk and college subject pool samples.” Hauser, D. J., & Schwarz, N. (2015). Attentive Turkers: MTurk participants perform better on online attention checks than do subject pool participants. Behavior research methods, 1-8.
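
The Yang et al. abstract above mentions raking as one way to weight an MTurk sample toward known population margins. As a rough illustration only, here is a minimal Python sketch of raking (iterative proportional fitting) on a two-way table. The function name `rake`, the age-by-gender cells, and all counts and targets are hypothetical and made up for this example; none of them come from the study, and this is not presented as the authors' procedure.

```python
def rake(sample_counts, row_targets, col_targets, iterations=50, tol=1e-9):
    """Rake a 2-D table of sample counts to target row and column totals.

    Assumes every cell count is positive and that row_targets and
    col_targets sum to the same grand total. Returns the weighted counts.
    """
    weighted = [[float(c) for c in row] for row in sample_counts]
    for _ in range(iterations):
        # Step 1: scale each row so its total matches the row target.
        for i, row in enumerate(weighted):
            factor = row_targets[i] / sum(row)
            weighted[i] = [c * factor for c in row]
        # Step 2: scale each column so its total matches the column target.
        for j, target in enumerate(col_targets):
            factor = target / sum(row[j] for row in weighted)
            for row in weighted:
                row[j] *= factor
        # Columns now match exactly; stop once the rows also match.
        if all(abs(sum(row) - t) < tol
               for row, t in zip(weighted, row_targets)):
            break
    return weighted


# Hypothetical sample of 100 respondents.
# Rows: age (under 35, 35 and over). Columns: gender (female, male).
sample = [[30, 40],
          [10, 20]]
age_targets = [40, 60]      # hypothetical population margin for age
gender_targets = [50, 50]   # hypothetical population margin for gender

weighted = rake(sample, age_targets, gender_targets)

# Per-respondent weight in each cell = weighted count / sample count.
weights = [[w / s for w, s in zip(w_row, s_row)]
           for w_row, s_row in zip(weighted, sample)]
```

Each respondent in a cell would then carry that cell's weight in later analyses. Cell weighting, the other method named in the abstract, would instead match the full age-by-gender table directly, which requires knowing the population count for every cell rather than only the margins.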

Our new book on MTurk is available from Amazon!

  • Sheehan, Kim and Matthew Pittman (2016). The Academic’s Guide to Using Amazon’s Mechanical Turk: The HIT Handbook for Social Science Research. Irving: Melvin & Leigh.
