Empirical Methodology for Crowdsourcing Ground Truth

Dumitrache, Anca; Inel, Oana; Timmermans, Benjamin; Ortiz, Carlos; Sips, Robert-Jan; Aroyo, Lora; Welty, Chris

doi:10.3233/SW-200415

Computer Science > Human-Computer Interaction

arXiv:1809.08888 (cs)

[Submitted on 24 Sep 2018]

Title: Empirical Methodology for Crowdsourcing Ground Truth

Authors: Anca Dumitrache, Oana Inel, Benjamin Timmermans, Carlos Ortiz, Robert-Jan Sips, Lora Aroyo, Chris Welty

Abstract: The process of gathering ground truth data through human annotation is a major bottleneck in the use of information extraction methods for populating the Semantic Web. Crowdsourcing-based approaches are gaining popularity in the attempt to solve the issues related to the volume of data and the lack of annotators. Typically, these practices use inter-annotator agreement as a measure of quality. However, in many domains, such as event detection, there is ambiguity in the data, as well as a multitude of perspectives on the information examples. We present an empirically derived methodology for efficiently gathering ground truth data in a diverse set of use cases covering a variety of domains and annotation tasks. Central to our approach is the use of CrowdTruth metrics that capture inter-annotator disagreement. We show that measuring disagreement is essential for acquiring a high-quality ground truth. We achieve this by comparing the quality of data aggregated with CrowdTruth metrics against data aggregated with majority vote, over a set of diverse crowdsourcing tasks: Medical Relation Extraction, Twitter Event Identification, News Event Extraction and Sound Interpretation. We also show that an increased number of crowd workers leads to growth and stabilization in the quality of annotations, going against the usual practice of employing a small number of annotators.

Comments:	in publication at the Semantic Web Journal
Subjects:	Human-Computer Interaction (cs.HC)
Cite as:	arXiv:1809.08888 [cs.HC]
	(or arXiv:1809.08888v1 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.1809.08888
Related DOI:	https://doi.org/10.3233/SW-200415

Submission history

From: Anca Dumitrache
[v1] Mon, 24 Sep 2018 13:04:56 UTC (2,072 KB)
