MTurk is offering a new premium service that allows Requesters to screen out people without having to run a screener test. For example, they will now target your HIT to people based on their primary mobile device, their political affiliation, or their marital and parenthood status. There is, of course, a price, and it isn’t clear whether any of it gets passed on to the workers (my guess is that it doesn’t). The cost is $0.40 per assignment for political affiliation and $0.50 for the rest. Which is odd, because getting Republicans is really hard, and these days they could probably charge 3x that for Republicans.
I really don’t think this will save Requesters much money, especially on a quick survey you might pay $0.50 for (a 3-4 minute one), where you’d now have to pay $1.00 (and again, the extra probably isn’t going to the worker). You know, I’d rather they spend their time figuring out a way to pay people a tiny amount to take the screener and then pay the full amount once they’ve passed it.
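To make that concrete, here is a rough cost sketch in Python. The screener pay, pass rate, and the decision to ignore Amazon’s usual commission are all my own assumptions for illustration, not real fee schedules:

```python
# Rough comparison (illustrative numbers only, ignoring Amazon's usual
# commission): premium qualifications add a per-assignment fee on top of
# worker pay, vs. running your own cheap screener HIT first.

def premium_cost(n, base_pay, premium_fee=0.50):
    """Total cost of n assignments using a premium qualification."""
    return n * (base_pay + premium_fee)

def two_stage_cost(n, base_pay, screener_pay=0.05, pass_rate=0.5):
    """Screen workers yourself with a cheap HIT, then pay passers in full.

    pass_rate is a guess at the share of screener takers who qualify.
    """
    screened = n / pass_rate  # screener takers needed to yield n passers
    return screened * screener_pay + n * base_pay

n = 100
print(f"premium qualification: ${premium_cost(n, 0.50):.2f}")    # $100.00
print(f"two-stage screener:    ${two_stage_cost(n, 0.50):.2f}")  # $60.00
```

Even with a fairly pessimistic 50% pass rate, the two-stage screener comes out well ahead in this toy example.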
Have you used this? What do you think?
That’s the conclusion from a recent study:
“These results suggest that AMT workers use the pricing of individual tasks as a signal in order to assess the difficulty of tasks when choosing among HIT groups, particularly low prices in order to assess easiness. As a consequence, workers might be collectively driven towards high volume but low price tasks, since they would select HIT Groups in terms of their expected wages, and since low price would signal easiness, in a sense “like” a more detailed description. This interpretation could explain why we do not find any elasticity of work supply in our dataset with respect to the pricing of individual tasks, even when this pricing changes for the same task.”
I have to admit, though, much of this paper was way over my head.
I’m kind of a sucker for satisficing studies and a new one has some interesting findings.
“We administered surveys to university students and respondents—half of whom held college degrees—from a for-pay survey website, and we used an experimental method to randomly assign the participants to survey formats, which presumably differed in task difficulty. Based on satisficing theory, we predicted that ability, motivation, and task difficulty would predict satisficing behavior and that satisficing would artificially inflate internal consistency reliability and both convergent and discriminant validity correlations. Indeed, results indicated effects for task difficulty and motivation in predicting survey satisficing, and satisficing in the first part of the study was associated with improved internal consistency reliability and convergent validity but also worse discriminant validity in the second part of the study.”
In “Educational and Psychological Measurement”: http://epm.sagepub.com/content/early/2016/01/22/0013164415627349.abstract
That’s part of the very clever title of a new paper, “Turking Overtime: How Participant Characteristics and Behavior Vary Over Time and Day on Amazon Mechanical Turk” available here: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2836946.
“We find no time or day differences in behavior, but do find that participants at nights and on weekends are less experienced with online studies; on weekends are less reflective and worse at comprehension questions; and at night are less conscientious and more neurotic. These results are largely robust to finer grained measures of time and day. We also find that those who participated early on in the course of the study are more experienced, comprehending, reflective, and agreeable, but less charitable, than later participants.”
Interesting stuff, especially if you want to get more ‘naive’ workers. Nights might be better.
This piece uses Marxist theory to examine aspects of alienation on MTurk. It cites the work of Lily Irani quite a bit, and is interesting, if you like that Marxist stuff.
Karin Hansson, Tanja Aitamurto, Thomas Ludwig, Michael Muller (2016): From alienation to relation: Modes of production in crowd work. In Karin Hansson, Tanja Aitamurto, Thomas Ludwig, Michael Muller (Eds.), International Reports on Socio-Informatics (IRSI), Proceedings of the CSCW 2016 Workshop: Toward a Typology of Participation in Crowdwork (Vol. 13, Iss. 1, pp. 13-22).
Yes this is a thing.
“At its simplest, MmmTurkey is a tool for easily developing and managing external HITs on Amazon Mechanical Turk. Though other toolkits have been created that offer some similar services, MmmTurkey stands out as the first open source framework we are aware of providing auditors: a unique and powerful feature. Rather than just collect responses to tasks, these auditors can record a worker’s interactions on the front-end, providing researchers and task designers a wealth of new data to study and better understand worker behaviors in task execution. MmmTurkey, with its modular architecture and core auditor feature, enables its users to collect comprehensive data without the trouble of having to code a HIT that has already been created before.”
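MmmTurkey’s actual auditors live in its JavaScript front end, but the core idea (appending timestamped interaction events per worker, rather than keeping only final answers) can be sketched in a few lines of Python; the class and field names here are mine, not MmmTurkey’s:

```python
import time

# Sketch of the "auditor" idea (not MmmTurkey's actual code): record
# timestamped worker interactions so task designers can study behavior
# during task execution, not just the submitted answers.

class Auditor:
    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.events = []

    def record(self, event_type, detail=None):
        self.events.append({
            "worker": self.worker_id,
            "event": event_type,
            "detail": detail,
            "t": time.time(),
        })

audit = Auditor("W123")              # hypothetical worker id
audit.record("focus", "question_1")  # worker clicks into a field
audit.record("keypress", "a")        # each keystroke can be logged
audit.record("submit")               # final submission
print(len(audit.events))             # 3
```

The payoff is in analysis: with an event stream like this you can measure time on question, revisits, and hesitation, which a bare answer set can’t show.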
That’s the conclusion of a new study:
“In this study, we conduct an initial investigation on the effect of crowd type and task complexity on work quality by crowdsourcing a simple and more complex version of a data extraction task to paid and unpaid crowds. We then measure the quality of the results in terms of its similarity to a gold standard data set. Our experiments show that the unpaid crowd produces results of high quality regardless of the type of task while the paid crowd yields better results in simple tasks. We intend to extend our work to integrate existing quality control mechanisms and perform more experiments with more varied crowd members.”
Borromeo, Ria Mae, Thomas Laurent, and Motomichi Toyama. “The Influence of Crowd Type and Task Complexity on Crowdsourced Work Quality.” In Proceedings of the 20th International Database Engineering & Applications Symposium, pp. 70-76. ACM, 2016.
This new study is really interesting and multifaceted. They used MTurk, and they also found ways to identify individuals who forecast better than the experts. Here is the abstract:
“We analyze how 208 experts forecast the results of 15 treatments involving monetary and non-monetary motivators in a real-effort task. We compare these forecasts to those made by PhD students and non-experts: undergraduates, MBAs, and an online sample. We document seven main results. First, the average forecast of experts predicts quite well the experimental results. Second, there is a strong wisdom-of-crowds effect: the average forecast outperforms 96 percent of individual forecasts. Third, correlates of expertise—citations, academic rank, field, and contextual experience—do not improve forecasting accuracy. Fourth, experts as a group do better than non-experts, but not if accuracy is defined as rank ordering treatments. Fifth, measures of effort, confidence, and revealed ability are predictive of forecast accuracy to some extent, especially for non-experts. Sixth, using these measures we identify ‘superforecasters’ among the non-experts who outperform the experts out of sample.”
Predicting Experimental Results: Who Knows What?
UC Berkeley and NBER
U Chicago and NBER
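The wisdom-of-crowds result (the average forecast beating 96 percent of individual forecasts) is easy to see in a toy simulation. The numbers below are made up for illustration and are not from the paper:

```python
import random

# Toy wisdom-of-crowds illustration: when individual forecasts are noisy
# but unbiased, their average lands much closer to the truth than almost
# any single forecaster does.
random.seed(42)

truth = 15.0          # hypothetical true treatment effect
n_forecasters = 1000
forecasts = [random.gauss(truth, 2.0) for _ in range(n_forecasters)]

avg_forecast = sum(forecasts) / n_forecasters
avg_error = abs(avg_forecast - truth)

# Share of individual forecasters the average forecast beats
beaten = sum(1 for f in forecasts if abs(f - truth) > avg_error)
share_beaten = beaten / n_forecasters
print(f"average forecast beats {share_beaten:.0%} of individuals")
```

With 1,000 simulated forecasters the average typically beats well over 90 percent of individuals, which is the same flavor of effect the paper reports.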