All posts by profsheehan

Do Turkers’ health habits mirror the US population?

I am guessing that the answer, found in this new study, is no. They studied 1000 workers (including a bunch of Masters workers) and compared them to a national health study.

“Adjusting for covariates, MTurk users were less likely to be vaccinated for influenza, to smoke, to have asthma, to self-report being in excellent or very good health, to exercise, and have health insurance but over twice as likely to screen positive for depression relative to a national sample. Results were fairly consistent among different age groups.”

Hmmm. I think the Masters threw them off. (Kidding). But really, why so many Masters workers, when we don’t know how someone attains this status?

Citation: Walters K, Christakis DA, Wright DR (2018) Are Mechanical Turk worker samples representative of health status and health behaviors in the U.S.? PLoS ONE 13(6): e0198835.

UPDATE: Do Turkers do better on better-paid HITs?

According to this study, apparently not. The study looked at how paying four different hourly rates ($2, $4, $6 and $8) affected things like attention as well as answers.

“Looking at demographics and using measures of attention, engagement and evaluation of the candidates, we find no effects of pay rates upon subject recruitment or participation. We conclude by discussing implications and ethical standards of pay.”


They do find some indication that lower-paid workers do not do as well on some attention checks. They also suggest that they didn’t have problems getting people to do the study, although each pay condition was capped at 99 people. Two things are important to note:

“Our larger concern is for things that we were not able to measure, such as Turker experience. It is possible that more experienced Turkers may gravitate toward higher pay rates, or studies that they feel have a higher pay-to-effort ratio. This is, regrettably, something that we were not able to measure. However, since experimental samples do not tend to seek representative samples on Mechanical Turk, we feel that the risk of any demographic or background differences in who we recruit is that it could then lead to differences in behavior, either through attention to the study or in reaction to the various elements of the study. ”

They also clearly state: “Paying a fair wage for work done does still involve ethical standards.”



Andersen, D., & Lau, R. (2018). Pay Rates and Subject Performance in Social Science Experiments Using Crowdsourced Online Samples. Journal of Experimental Political Science, 1-13. doi:10.1017/XPS.2018.7

Let’s pay Turkers 25 cents per minute!

So advocates Alexandra Samuel in this excellent article that describes MTurk as part of the ‘golden age’ of research.

And good for her. She writes:

“The use of crowdsourced survey platforms is likely to increase in the years ahead, so now is the time to entrench research practices that ensure fair wages for online survey respondents. Peer-reviewed journals, academic publishers, and universities can all play a part in promoting ethical treatment of online respondents, simply by requiring full disclosure of payment rate and task time allocation as part of any study that uses a crowdsourced workforce. We already expect academic researchers to disclose their sample size; we should also expect them to disclose whether their respondents earned a dollar for a five-minute survey, or a quarter for a half-hour survey.”

MTurk versus Google Surveys

This new study by Hulland and Miller compares MTurk to Google Surveys for survey research. Interestingly, the second author works for a commercial research company and the piece starts out by saying that the use of MTurk is fairly non-existent in the commercial world. Hmmm. Not sure about that. I’ve heard anecdotal evidence that large panel companies turn to the Turk when they can’t get enough responses in some categories. So I start out reading this document with a bit of skepticism.

The authors review the good parts of MTurk (calling Turkers ‘agreeable’ which makes me smile) and then move on to the bad parts. There’s non-representativeness, self-selection (a problem with panels as well), non-naivete, and participant misrepresentation (i.e., lying on answers). The authors suggest misrepresentation is most problematic when screening for specific populations. That may be true, but that may be more on the researcher writing the screener than the audience.

The authors then sing some praises of Google Surveys, including that people who complete surveys want to be there, that the surveys garner high response rates, and that self-selection isn’t a problem since respondents read an article of their preference and are then asked to answer questions in exchange. The maximum number of questions one can ask is ten, by the way, and Google Surveys uses an algorithm to deduce the demographics of the respondent.

The authors compare four samples (GS, MTurk, Burke research firm employees, and SSI), asking about mobile phone purchases. They conclude:

“For example, our results suggest that surveys about shopping behavior incidence rates should be placed neither with an Amazon audience nor with a convenience sample of relatively educated and affluent respondents (e.g., the Burke internal sample), whereas Google Surveys may prove adequate for providing reliable estimates of behavioral incidence. Yet use of MTurk may be completely suitable for studies regarding different types of attitudes or behaviors, or for research studying effect differences across experimental conditions. (Much of the existing work in Marketing making use of MTurk workers has been experimental.) ”

It’s an interesting study.

Citation: Hulland, J. & Miller, J. J. of the Acad. Mark. Sci. (2018).

Turkers <3 science

Sometimes I wonder (as a researcher) whether Turkers take my surveys and think “oh that is a FASCINATING study and I would love to know more about it and I’m so glad I had the opportunity to help in the pursuit of knowledge.” Or some such.

And guess what! This study says they do!

“Our findings show that 40% of our participants on Mechanical Turk actively sought out post-experiment learning opportunities despite having already received their financial compensation. Participants expressed high interest in a range of research topics, including previous research and experimental design. Finally, we find that participants comprehend and accurately recall facts from post-experiment learning opportunities. Our findings suggest that Mechanical Turk can be a valuable platform for learning at scale and scientific outreach.”

Full study is available at the link above and the citation is:

Jun, Eunice, Morelle Arian, and Katharina Reinecke. “The Potential for Scientific Outreach and Learning in Mechanical Turk Experiments.” (2018).

Research Crowdfunding

This isn’t about MTurk per se, but rather about new platforms for crowdfunding academic research. Journalist’s Resource reports on a platform called “Experiment” where “each project has a page that makes a pitch for your support. But on Experiment, consumer gadgets are replaced with research questions about amphibians, cancer cell growth and mental health. (All proposals are subject to review and approval by the site’s staff, and work involving human or animal subjects must have support from an institutional review board.)”

And it works well, particularly for junior scientists, who are more likely to get funding.

The linked page has more links to a variety of studies if you are interested in learning more.

Shady HITs

A new study examines privacy protection behaviors on MTurk (extended abstract here).

It discusses an interesting challenge for AMT workers–what do you do if you are partway through a HIT and the requesters want information that you’re uncomfortable giving? Do you give it away or do you stop the HIT and risk losing the payment?

It also suggests that workers provide misinformation in some of these situations, which is a typical privacy protection behavior.


Recruiting hard to reach populations

A new study looks at the feasibility of recruiting hard-to-reach populations in the health segment to complete online studies. In this study, the hard-to-reach population is women who smoke during pregnancy, and they were recruited through four platforms: MTurk, Qualtrics, Soapbox Sample and Reddit. Here are the recruitment results:

“Amazon mTurk yielded the fewest completed responses (n=9), 100% (9/9) of which passed several quality metrics verifying pregnancy and smoking status. Qualtrics Panel yielded 14 completed responses, 86% (12/14) of which passed the quality screening. Soapbox Sample produced 107 completed surveys, 67% (72/107) of which were found to be quality responses. Advertising through Reddit produced the highest completion rate (n=178), but only 29.2% (52/178) of those surveys passed the quality metrics. We found significant differences in eligibility yield, quality yield, age, number of previous pregnancies, age of smoking initiation, current smokers, race, education, and income (P<.001). Although each platform successfully recruited pregnant smokers, results varied in quality, cost, and percentage of complete responses. Moving forward, investigators should pay careful attention to the percentage yield and cost of online recruitment platforms to maximize internal and external validity.”

I’m not surprised that Turkers passed the quality metrics, but I am surprised at how poorly the Reddit people did. And I’ve never heard of Soapbox Sample, but I’m off to investigate. Qualtrics did well, but of course it probably cost 10x as much as the MTurk sample.

Citation: Ibarra JL, Agas JM, Lee M, Pan JL, Buttenheim AM. Comparison of Online Survey Recruitment Platforms for Hard-to-Reach Pregnant Smoking Populations: Feasibility Study. JMIR Res Protoc. 2018 Apr;7(4) e101. doi:10.2196/resprot.8071. PMID: 29661751.

New research buzzacronym: UIR

A UIR is an “unpaid internet resource,” and a new study examines the similarities between UIR (i.e., people you find online who will do your survey for free) and AMT (i.e., people you find online at Mechanical Turk who will do your survey for $). The results show that these two groups differ on several psychological measures.

The UIR was recruited through “free press releases, list serves, free websites recruiting participants for research studies (CraigsList), and through social media.” Measures included demographics, a 10-item measure of depression, a 7-item measure of anxiety, and 4 questions about mood. Participants then received an intervention and answered more questions.

For this, the UIR was paid nothing and Turkers were paid ten cents. Note to researchers: that is a ridiculous payment for that amount of work. Even if it only took 3 minutes, that is equivalent to $2 per hour. And that is not fair at all.
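If you want to check that math for your own HITs, here is a minimal sketch of the conversion. The function name `hourly_rate` is mine and purely illustrative, and the 3-minute completion time is the same guess I made above, not a figure reported in the study.

```python
def hourly_rate(payment_dollars: float, minutes: float) -> float:
    """Convert a per-task payment and completion time into an effective hourly wage."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    # Dollars per minute, scaled up to a 60-minute hour
    return payment_dollars * 60 / minutes


# Ten cents for a task assumed (not measured) to take 3 minutes
print(f"${hourly_rate(0.10, 3):.2f} per hour")
```

By the same arithmetic, the 25 cents per minute that Alexandra Samuel advocates works out to $15 per hour.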

Anyway, there are a bunch of results:

“The AMT sample reported significantly lower depression and anxiety scores (p < .001 and p < .005, respectively) and significantly higher mood, motivation, and confidence (all p < .001) compared to the UIR sample. AMT participants spent significantly less time on the site (p < .05) and were more likely to complete follow-ups than the UIR sample (p < .05). Both samples reported a significant increase in their level of confidence and motivation from pre- to post-intervention. AMT participants showed a significant increase in perceived usefulness of the intervention (p < .0001), whereas the UIR sample did not (p = .1642).”

The researchers also talk about how BOTH methods are useful for this type of research (into interventions).

Citation: Eduardo Bunge, Haley M. Cook, Melissa Bond, Rachel E. Williamson, Monique Cano, Alinne Z. Barrera, Yan Leykin, Ricardo F. Muñoz, Comparing Amazon Mechanical Turk with unpaid internet resources in online clinical trials, Internet Interventions, Available online 15 April 2018, ISSN 2214-7829,


Turkers and Chatbots

This study uses MTurk workers to study different types of chatbots and it gives me pause.

Here’s why: the study uses a ‘virtual agent designed to assist customers’ named Emma.

I’m reading a terrific book in a graduate seminar I’m teaching called “Technically Wrong: Sexist Apps, Biased Algorithms and Other Threats of Toxic Tech” by Sara Wachter-Boettcher. In it, she talks about the biases that are reinforced by having women (even disembodied women) represent these virtual chatbots. It’s the type of bias that MTurk would be PERFECT to investigate.


Araujo, T. (2018). Living up to the chatbot hype: The influence of anthropomorphic design cues and communicative agency framing on conversational agent and company perceptions. Computers in Human Behavior.