Mitigation of Unintended Biases against Non-Native English Texts in Sentiment Analysis
Zhiltsova, Alina ; Caton, Simon ; Mulwa, Catherine
Abstract
Proceedings for the 27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science
NUI Galway, Galway, Ireland, December 5-6th, 2019.
Demand for text analytics is growing as large volumes of textual data are created online. A number of tools are available for natural language processing tasks such as sentiment analysis and text pre-processing. The majority of these tools are trained on unrepresentative data, which may introduce unintended biases that alter the results. Previous research indicates that sentiment analysis tools show gender and race biases, and that word embeddings discriminate against women. This research investigates a previously undefined non-native speaker bias in sentiment analysis, i.e. unintended discrimination against English texts written by non-native speakers of English. Non-native speakers of English tend to use cognates, English words that have their origin in the speaker's language. To measure the non-native speaker bias in 4 lexicon-based sentiment analysis systems, a new Cognate Equity Evaluation Corpus was created, based on previous work in the literature on measuring racial and gender biases. The tools gave significantly different scores to English texts with features of non-native speakers. The bias discovered in the lexicon-based tools was mitigated by updating 4 lexicons with English cognates in 3 languages. This paper proposes a generalisable framework for measuring and mitigating non-native speaker bias.
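The mitigation described in the abstract — updating a sentiment lexicon so that cognates inherit the scores of their standard English equivalents — can be sketched as follows. This is a minimal illustration, not the paper's actual tools or data: the lexicon entries, the cognate pair, and the averaging scorer are all illustrative placeholders.

```python
# Toy sentiment lexicon: word -> valence score in [-1, 1].
# Entries are made up for illustration; real lexicon-based tools
# (e.g. VADER-style systems) use much larger curated word lists.
lexicon = {"good": 0.7, "terrible": -0.8, "fantastic": 0.9}

def sentiment(text, lexicon):
    """Average the valence of known words; 0.0 if no word is in the lexicon."""
    scores = [lexicon[w] for w in text.lower().split() if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

# A non-native writer may use a cognate the lexicon does not cover,
# so an equally positive text is scored as neutral -- the unintended bias.
native = "the film was fantastic"
cognate = "the film was fantastisch"   # hypothetical German-origin cognate

before = sentiment(cognate, lexicon)   # scored 0.0: cognate is unknown

# Mitigation: extend the lexicon so each cognate inherits the score
# of its standard English equivalent.
cognate_map = {"fantastisch": "fantastic"}   # illustrative pair
for cog, eng in cognate_map.items():
    lexicon[cog] = lexicon[eng]

after = sentiment(cognate, lexicon)    # now matches the native phrasing
```

After the update, the cognate-bearing text receives the same score as its native-English counterpart, which is the effect the paper's lexicon updates aim for across 4 lexicons and 3 languages.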
Publication Date
2019-12
Keywords
Fairness in machine learning, Natural language processing, Bias mitigation, Non-native speaker bias, Sentiment analysis