Turkers helped with this study.
Remember a while back when Requesters got all excited because Amazon was having Workers do (unpaid) qualification tests with the idea we’d be able to select Respondents based on age or gender?
Well, buttercups, the blog “Tips for Requesters on Mechanical Turk” posted that Amazon was planning to charge Requesters for these Qualifications! The rumor is that they will charge $0.50 per worker if you limit by age, gender, education, or employment: $0.50 per qualification. So if you want to do a survey of women college graduates 25+ who are employed full time, that will cost you $2 in addition to what you’re paying to the Worker.
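To put the rumored pricing in perspective, here is a quick sketch of the arithmetic. The $0.50-per-qualification fee is the rumor, not a confirmed price, and the worker payment is just an example number:

```python
QUAL_FEE = 0.50  # rumored fee per premium qualification, per worker (unconfirmed)

def cost_per_respondent(worker_payment, num_qualifications, fee=QUAL_FEE):
    """Total Requester cost for one respondent: the Worker's payment
    plus the rumored per-qualification fees."""
    return worker_payment + num_qualifications * fee

# A survey of employed women college graduates aged 25+ uses four
# qualifications (gender, employment, education, age). Paying the
# Worker $1.00 would then cost the Requester $3.00 per respondent:
print(cost_per_respondent(1.00, 4))  # -> 3.0
```

(This also ignores Amazon's usual commission, which would come on top.)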
And I’m guessing (although I don’t know) that Susie Smith who graduated from the University of Smithville who works for Susie Corporation won’t see a penny. I hope I’m wrong.
Argh. I know Workers hate screeners but honestly, do a short screener, give someone $0.03 if they get screened out, and then have the rest do the HIT. WHAT ARE THEY THINKING?
This might be a bit of a rant. It is about excuses we use for paying substandard wages on MTurk.
The song and dance seems to be that academics pay the ‘market rate’ for MTurk work. But what is the market? If it is Amazon, well, they recommend somewhere in the $6/hour range (below minimum wage). Other academics, like the one I wrote about yesterday, said that the market is what other people on MTurk are paying. That is an excuse to pay $3 to $5 an hour.
But. If we (as in we academics) are the market, then we can IMPROVE the market and not just reinforce existing poor payment practices. I don’t buy that paying someone more will result in poorer work–there’s just nothing in the literature to back that up. I also think paying more will bring MORE people into MTurk and begin to address the ‘non-naive’ concern.
I think it is also important to recognize that a Turker isn’t making money from the minute he or she logs on til he or she logs off. It takes time to find HITs, to do qualification tests, to get through screeners, and all those are done for free. Then you add in the wacky way some people set up surveys and a Worker has to wait for a screen to load and that takes time too. So all these things need to be considered when thinking about wages.
I was thinking the other night that Amazon should pay Workers a nominal fee from when they log in to when they log off, not a lot, maybe $0.50 per hour, to cover the unpaid work. There would have to be ways to make sure people actually WERE working and not gaming the system, but that’s one way to start to improve the worker experience. Baby steps.
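The point about unpaid time can be made concrete with a back-of-the-envelope effective-wage calculation. The minute figures below are illustrative assumptions, not measured data:

```python
def effective_hourly_wage(payment, task_minutes, unpaid_minutes):
    """Actual hourly rate once unpaid time (finding HITs, qualification
    tests, screeners, slow page loads) is counted alongside the paid task."""
    total_hours = (task_minutes + unpaid_minutes) / 60.0
    return payment / total_hours

# A HIT paying $0.80 for 10 minutes looks like $4.80/hr on paper:
print(round(effective_hourly_wage(0.80, 10, 0), 2))  # -> 4.8

# Add 6 unpaid minutes of searching and screening and it drops to $3/hr:
print(round(effective_hourly_wage(0.80, 10, 6), 2))  # -> 3.0
```

The more unpaid overhead a Worker carries per HIT, the further the real wage falls below the advertised one.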
Rochelle tweeted about this study and it is just another one of those that doesn’t really get MTurk. It promotes the ‘fast and cheap’ story while suggesting that paying 80 cents for a 10-minute study is OK (it isn’t). Some other things I want to pick on:
Quote 1: “Specifically, if participants are all paid to take surveys, what incentivizes them to answer accurately and how do we know that they answer accurately? For example, Oppenheimer, Meyvis, & Davidenko (2009) found that online survey participants are often less attentive than those watched by experimenters in a lab, meaning they may pay less attention to the treatment and bias the experiment. Especially given that the amount of payment is small and fixed, the only way Turkers can increase payment per hour worked is working faster.”
My response: it is basically impossible to do a random survey any longer. I would suggest that the great majority of all research results in someone being paid–either with money, or with ‘points’ toward gift cards, or with entries into a sweepstakes. In fact, I got an email just yesterday asking me to take part in a survey for a publisher to help with course materials and I thought first thing “what am I being paid?” (answer: nothing). So compensation is getting to be a requirement, in my opinion.
At the same time, the author seems to assume that being paid means you won’t answer accurately, and conflates working fast with working inaccurately. There are many studies showing that Turkers are accurate. Just read this blog. And in the author’s favor, they did state that the possibility of rejection is an incentive to answer correctly and take one’s time.
Quote 2: “How many participants fail catch trials? It depends on the difficulty of the catch trials. Rouse (2015) found that ~5% of his population did not pass checks, while Antin & Shaw (2012) found 5.6% of theirs. These numbers can vary widely — in an experiment I personally ran, I found 10-30% of people would fail comprehension checks. More importantly, survey completion rates and catch trial pass rates have equaled or exceeded that of other online survey samples or traditional college student samples (Paolacci, Chandler, & Ipeirotis, 2010; Berinsky, Huber, & Lenz, 2012). However, care must be taken to selecting catch trials that participants do not have prior exposure to (see Kahan, 2013).”
My response: I did read the author’s study, and I think the questions he used for his comprehension checks were challenging and not that great. The literature on the ability of Turkers to answer catch trials is deep and convincing. Cherrypicking literature to show otherwise is problematic.
Quote 3: (regarding the size of the MTurk population): “Therefore, completing a 10,000 person study could take months or years, which could be a substantial concern given that these samples may be necessary for animal advocacy researchers attempting to detect small treatment effects.”
My response: Who needs 10,000 people? Some ‘rule of thumb’ guidance suggests that 1,000 respondents is enough for many purposes. And the numbers for experiments, which is what the author is talking about, should be smaller still. 10,000 people in an experiment?
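A standard power calculation shows what detecting a small effect actually requires. Here is a sketch using the usual normal-approximation formula for a two-group comparison; the effect size, alpha, and power are conventional illustrative choices, not numbers from the study under discussion:

```python
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-group comparison,
    using the normal approximation:
        n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2
    where d is Cohen's d."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile for desired power
    return 2 * ((z_alpha + z_beta) / effect_size) ** 2

# Even a small effect (Cohen's d = 0.2) at alpha = .05 and 80% power
# needs roughly 392 per group, i.e. under 800 total, far short of 10,000:
print(round(n_per_group(0.2)))  # -> 392
```

Very small effects (d around 0.1) do push the total into the thousands, but for most treatment effects researchers care about, samples of 10,000 are overkill.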
Quote: “Therefore, I recommend offering a wage of $3/hr-$5/hr, which appears close to the mean wage offered by most studies and is respectfully above the average wage on the platform. Notably, this does conflict with Casey, et. al. (2016) who state “minimum acceptable pay norms of $0.10 per minute” ($6/hr or 83% FMW), but this appears to be a statement based more on ethics of justice (which are certainly important and could prevail depending on your point of view) than data accuracy.”
My response: I will always side with ethics over this person’s calculations, which lack validity.
Quote 4: “Avoid putting study-specific language in HIT titles or descriptions.”
My response: yes, because we want to trick people into doing our work. I’m being sarcastic.
I know I seem angry in this post, but I get so fed up with people writing this type of muck that taints MTurk studies for all of us.
The title of this post is NSFW but the content (labels of which are also NSFW) contains a terrific group of articles that describe the challenges facing quantitative research today.
This newspaper report tells us that every stereotype about cat lovers and dog lovers is true, according to Facebook.
And it also gleefully reports it didn’t have to go to the bother of using MTurk but instead used “its in-house image-recognition technology — a computational neural network, trained on millions of images.”
Who trained that network, I wonder?
This isn’t a new article, but it is an interesting one that is becoming very important: the notion of the ‘invisible workforce’. Once a workforce is invisible, the reasons to treat workers well become invisible too. And that has continued to be a problem with MTurk in the two years since this article was written.
What I’m not sure about is whether women suffer more than men.
“Female mechanical turkers meet their parallel in the female computers before them. Before the word “computer” came to describe a machine, it was a job title. David Skinner wrote in The New Atlantis, “computing was thought of as women’s work and computers were assumed to be female.” Female mathematicians embraced computing jobs as an alternative to teaching, and they were often hired in place of men because they commanded a fraction of the wages of a man with a similar education.”
The article suggests that 70% of workers are female, and that was the stimulus behind this. The Pew Center research, published a few weeks ago, found the gender balance was more even. Do men resent being invisible more than women? Might that change the power dynamics?
This Bloomberg article explains that Amazon and Google are leaders in research and development. We’ll just stop there and let you think about that for a while.
A new study looks at MTurk for biomedical research.
There were two interesting points. First, the study said that crowdsourcing shouldn’t be used to replace experts, but rather to extend and reinforce the ‘gold standard’. That is a bit different from other studies showing that Turkers do just as well as experts, but this study is looking at biomedicine, which could explain the difference.
The other point is that there are institutional issues in that federal funding will not fund projects using MTurk since it is seen as a consumer survey and not a research tool. This is the first I’ve heard of that–have any of my readers heard of this?
Citation: Khare, Ritu. Crowdsourcing in biomedicine: challenges and opportunities.
A new study does exactly this. The panel is purchased through Qualtrics, which doesn’t really tell us where the panel came from as Qualtrics uses different companies to collect data. The study also didn’t put a ‘US only’ qualification on MTurk.
“The USA Regular panel members took more time to complete their responses than either of the two MTurk samples. This suggests that the MTurk respondents did not read the questions as thoroughly and were, in fact, speeding—potentially yielding lower quality data.” Or that the Qualtrics people were slow. And not used to answering surveys.
“Responses to all attention filter questions differed significantly (p < .001) across the three sample groups (editorial note: the three groups are Qualtrics, MTurk US, and MTurk non-US) indicating differences in attentiveness to survey instructions (with implications for data quality). The response pattern is similar for the two USA respondent groups, but the non-USA MTurk group deviated greatly. Moreover, the non-USA MTurk sample had the least percent of correct responses for all questions indicating that this group of respondents paid the least amount of attention to the questions (and thus would be expected to furnish the lowest quality of data).”
“Our results suggest that MTurk samples may be dominated by non-USA respondents, which may result in different sample characteristics, response patterns and data quality. This in turn can impact the substantive results and conclusions drawn from the research. Trading off cost (MTurk has an ease and cost advantage), the research must make an informed choice of an Internet online sample source.”
The article does suggest that screening questions can be put in, although these are difficult on MTurk (what?). All in all, this study seems to start with a hatred of MTurk and builds from there.
Citation: Smith S, Roster C, Golden L, Albaum G. A multi-group analysis of online survey respondent data quality: Comparing a regular USA consumer panel to MTurk samples. Journal of Business Research. August 2016;69(8):3139-3148. Available from: Business Source Complete, Ipswich, MA. Accessed August 9, 2016.