MTurk results on privacy and security: as good as nationally representative samples


“In this paper, we compare the results of a survey about security and privacy knowledge, experiences, advice, and internet behavior distributed using MTurk (n=480), a nearly census-representative web-panel (n=428), and a probabilistic telephone sample (n=3,000) statistically weighted to be accurate within 2.7% of the true prevalence in the U.S. Surprisingly, we find that MTurk responses are slightly more representative of the U.S. population than are responses from the census-representative panel, except for users who hold no more than a high-school diploma or who are 50 years of age or older. Further, we find that statistical weighting of MTurk responses to balance demographics does not significantly improve generalizability. This leads us to hypothesize that differences between MTurkers and the general public are due not to demographics, but to differences in factors such as internet skill.”

Read the whole paper here!


Redmiles, Elissa M., Sean Kross, Alisha Pradhan, and Michelle L. Mazurek. How Well Do My Results Generalize? Comparing Security and Privacy Survey Results from MTurk and Web Panels to the US. 2017.
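The "statistical weighting" the authors tested is commonly done by post-stratification: each respondent is weighted by the ratio of their demographic cell's population share to its sample share, so that weighted cell shares match the population. A minimal sketch of the idea; the demographic cells and population proportions below are hypothetical, not taken from the paper.

```python
# Minimal post-stratification sketch: weight = population share of a
# respondent's demographic cell / that cell's share of the sample.
# Cells and proportions are illustrative, not from the Redmiles et al. data.
from collections import Counter

def poststratify(sample_cells, population_props):
    """Return one weight per respondent so weighted cell shares match the population."""
    n = len(sample_cells)
    sample_props = {cell: count / n for cell, count in Counter(sample_cells).items()}
    return [population_props[cell] / sample_props[cell] for cell in sample_cells]

# Hypothetical sample: three respondents aged 18-49, one aged 50+,
# against a population that is 60% / 40%.
weights = poststratify(
    ["18-49", "18-49", "18-49", "50+"],
    {"18-49": 0.6, "50+": 0.4},
)
```

With these weights, the weighted share of each cell equals its population share (0.6 and 0.4), which is exactly the balancing the paper found did not significantly improve generalizability.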

More on defining Amazon as an ‘employer’

This article adds to the debate about whether crowdsourcing sites are employers or merely agents that connect employers with workers. This is a key distinction in employment law that many countries and platforms are struggling with.

The author, a professor of Law at Oxford, states in reference to Uber:

“An increasing number of online resources provide insights into the reality of the relationship between the platform and its drivers: through its app, the platform has close control over the routes drivers are to choose and the prices customers will be charged for each ride. All financial transactions take place via the app, which also sits at the core of Uber’s rating system, enlisting customers to act as the platform’s agents in monitoring worker performance. Even the supposed freedom to work when and as desired is mostly illusionary: ratings are carried from engagement to engagement, and a refusal to accept a series of offers will soon have an impact on a drivers’ ratings.

In my mind, there is therefore little doubt that Uber should be classified as the employer of its drivers, who would therefore be guaranteed access to the core of fundamental worker rights in English law. Even customers will profit from such a decision: well-rested drivers will be much safer, and in the unhappy event of an accident or other problems, they too will be able to assert their claims for reparation against the employing platform.”

Looking at this from the perspective of Amazon as an employer versus agent: Amazon exercises control over the work people can complete (by issuing blocks). All financial transactions take place through the platform, as do ratings of workers. However, Amazon isn’t involved in pricing tasks (other than charging extra for special demographics), and Amazon imposes no requirements on how much work a worker completes.


MTurk and validity

This new study adds to the existing literature on the validity of MTurk, examining the platform’s suitability for spatial cueing research.

“Ultimately, the present study empirically validated the use of AMT to study the symbolic control of attention by successfully replicating four hallmark effects reported throughout the visual attention literature: the left/right advantage, cue type effect, cued axis effect, and cued endpoint effect.”

Academic article on alternatives to MTurk

This article is available here.

The upshot?

“After surveying several options, we empirically examined two such platforms, CrowdFlower (CF) and Prolific Academic (ProA). In two studies, we found that participants on both platforms were more naïve and less dishonest compared to MTurk participants. Across the three platforms, CF provided the best response rate, but CF participants failed more attention-check questions and did not reproduce known effects replicated on ProA and MTurk. Moreover, ProA participants produced data quality that was higher than CF’s and comparable to MTurk’s. ProA and CF participants were also much more diverse than participants from MTurk.”
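Attention-check questions like those compared across platforms above are typically used to screen out inattentive respondents before analysis. A minimal sketch of such a filter; the check questions, answer key, and pass threshold here are hypothetical assumptions, not the paper's actual criteria.

```python
# Minimal attention-check filter sketch: keep respondents who answered
# at least min_correct check questions correctly. All question names and
# the threshold are illustrative, not from the study.

def passed_attention_checks(responses, answer_key, min_correct=2):
    """Return only the respondents meeting the attention-check threshold."""
    kept = []
    for r in responses:
        correct = sum(r["checks"].get(q) == a for q, a in answer_key.items())
        if correct >= min_correct:
            kept.append(r)
    return kept

key = {"ac1": "blue", "ac2": "agree"}
data = [
    {"id": 1, "checks": {"ac1": "blue", "ac2": "agree"}},  # passes both checks
    {"id": 2, "checks": {"ac1": "red", "ac2": "agree"}},   # passes only one
]
clean = passed_attention_checks(data, key)
```

A platform with a higher failure rate on such checks (as reported for CrowdFlower above) would lose a larger share of its sample at this step.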


New study analyzes tasks on MTurk

This new study analyzes tasks over a long time period.

I haven’t quite got my head around what they did, but the study has some interesting take-aways sprinkled throughout:

  • The most popular 10 sources account for 95% of the tasks performed on the marketplace. Furthermore, these 10 include many companies that we, the authors, have never heard of.
  • The marketplace only supports a handful of workers on a full-time basis. A majority of the active workers appear to view the marketplace as a supplemental source of income, as is indicated by their daily hours of activity.

MTurk and Piecework: an analysis

This new article provides historical context for analyzing crowdwork like that which occurs on MTurk. It looks at the use of piecework in manufacturing and draws some interesting implications, including:

  • Complexity: when work moves from a simple and familiar task (sewing) to a more constrained one (sewing to a manufacturer’s requirements), measurement and verification remain persistent challenges that will limit complexity unless solved.
  • Decomposition: at some point, specialized training for work will become the norm.
  • Workers: the decentralized nature of workers will continue to limit collective action.

This is a great foundational piece for anyone doing MTurk research.

Studying infants using MTurk

Yep, it can be done.

“We investigated whether the online platform, Amazon Mechanical Turk (MTurk), could be used as a resource to more easily recruit and measure the behavior of infant populations. Using a looking time paradigm, with users’ webcams we recorded how long infants aged 5 to 8 months attended while viewing children’s television programs.”

Here’s the procedure:

“After accepting the HIT, participants were directed to a webpage that provided an information sheet and asked for their informed consent. Consent was obtained via online checkbox and button press. This was followed by an evaluation of the suitability of participants’ computers, software, webcams, speakers, and internet connectivity. To do this, we recorded a brief 5-s video of caregivers and their infants and asked them to move and make sounds. This video was then played back to caregivers, and they were asked to check a box to indicate whether or not they were able to see and hear themselves. If they indicated they could not, they were thanked and excluded from participation. Otherwise, they were directed to a new webpage instructing them to position their infants on their laps in the center of the screen and in a well-lit room. This was specified to ensure that infants’ eyes were visible while recording the webcam videos. Once they were in a comfortable position, caregivers were instructed to press a “start” button to commence the experiment. Then 1 of 10 pseudorandomly selected movies was presented. Afterward, participants completed a short demographic questionnaire from which the ages and language backgrounds of their infants were obtained.”

Read it all here: