A new study in linguistics and psychology by Tara McAllister Byun examines how aggregating crowdsourced responses can produce accurate ratings for linguistics research.
Modeling studies have shown that even when individual responses to a task are not highly accurate, aggregated or crowdsourced responses from a large number of people generally converge with those of experts. In this study, the researchers tested the validity of having Amazon Mechanical Turk (AMT) users rate speech sounds, compared with ratings collected from experienced listeners.
Listeners were asked to rate recordings of 100 words containing the “r” sound, collected from children who had trouble pronouncing the sound and were working to correct it in speech therapy. Twenty-five experienced listeners and 153 AMT listeners scored the “r” sounds as correct or incorrect. Data from the experienced listeners were collected over a period of three months, while data gathering using AMT took a mere 23 hours.
The researchers found that when responses were aggregated, there was a very high level of overall agreement. When items were classified as correct or incorrect based on the majority vote across all listeners in a group, the AMT group and the experienced listener group were in agreement on all but seven of 100 items.
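The majority-vote classification described above is simple to express in code. The sketch below uses invented ratings purely for illustration; the item names, listener judgments, and helper function are not from the study:

```python
from collections import Counter

def majority_vote(ratings):
    """Classify one item as 'correct' or 'incorrect' by majority vote.

    `ratings` is the list of per-listener judgments for that item.
    """
    return Counter(ratings).most_common(1)[0][0]

# Hypothetical ratings for three items, five listeners each.
items = {
    "word_01": ["correct", "correct", "incorrect", "correct", "correct"],
    "word_02": ["incorrect", "incorrect", "correct", "incorrect", "incorrect"],
    "word_03": ["correct", "incorrect", "correct", "correct", "incorrect"],
}

labels = {item: majority_vote(r) for item, r in items.items()}
```

Running the same aggregation over both listener groups and comparing the resulting labels item by item is the kind of check that yielded agreement on 93 of the 100 items.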
In a further analysis, the researchers sought to determine how many AMT listeners were needed to obtain valid responses that converged with those of experienced listeners. They found that samples of nine or more AMT listeners demonstrated a level of performance consistent with typical expectations for experienced listeners.
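One way to probe this kind of question is to repeatedly draw random subsamples of k listeners and measure how often their majority vote matches the reference labels. The sketch below is a minimal illustration of that idea, not the study's actual procedure, and all ratings and labels here are invented:

```python
import random

def agreement_at_sample_size(amt_ratings, reference_labels, k, n_trials=1000, seed=0):
    """Estimate agreement with reference labels using only k sampled AMT listeners.

    `amt_ratings` maps each item to its full list of AMT judgments;
    `reference_labels` maps each item to the expert-group majority label.
    """
    rng = random.Random(seed)
    matches, total = 0, 0
    for _ in range(n_trials):
        for item, ratings in amt_ratings.items():
            sample = rng.sample(ratings, k)  # draw k listeners without replacement
            label = max(set(sample), key=sample.count)  # majority vote (k odd: no ties)
            matches += label == reference_labels[item]
            total += 1
    return matches / total

# Hypothetical pools of ten AMT judgments per item.
amt_ratings = {
    "word_01": ["correct"] * 9 + ["incorrect"],
    "word_02": ["incorrect"] * 8 + ["correct"] * 2,
}
reference_labels = {"word_01": "correct", "word_02": "incorrect"}

rate = agreement_at_sample_size(amt_ratings, reference_labels, k=3)
```

Sweeping k upward and watching where the agreement curve plateaus is how one would identify a threshold like the nine listeners reported in the study.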