Show simple item record

dc.contributor.author: Zhiltsova, Alina
dc.contributor.author: Caton, Simon
dc.contributor.author: Mulwa, Catherine
dc.date.accessioned: 2023-07-24T12:44:47Z
dc.date.available: 2023-07-24T12:44:47Z
dc.date.issued: 2019-12
dc.identifier.issn: 1613-0073
dc.identifier.uri: http://hdl.handle.net/20.500.13012/162
dc.description.abstract: Proceedings for the 27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science, NUI Galway, Galway, Ireland, December 5-6, 2019. Demand for text analytics is growing as large amounts of textual data are created online. A number of tools are available for various natural language processing tasks such as sentiment analysis and text pre-processing. The majority of these tools are trained on unrepresentative data, which may lead to unintended biases altering the results. Previous research indicates that sentiment analysis tools show gender and race biases, and that word embeddings discriminate against women. This research investigates a previously undefined non-native speaker bias in sentiment analysis, i.e. unintended discrimination against English texts written by non-native speakers of English. Non-native speakers of English tend to use cognates, English words that have an origin in the speaker's language. To measure the non-native speaker bias in 4 lexicon-based sentiment analysis systems, a new Cognate Equity Evaluation Corpus was created, based on previous work in the literature for measuring racial and gender biases. The tools gave significantly different scores to English texts with features of non-native speakers. The bias discovered in lexicon-based tools was mitigated by updating 4 lexicons for English cognates in 3 languages. This paper proposes a generalisable framework for measuring and mitigating non-native speaker bias.
dc.language.iso: en [en_US]
dc.relation.url: https://ceur-ws.org/Vol-2563/ [en_US]
dc.subject: Fairness in machine learning [en_US]
dc.subject: Natural language processing [en_US]
dc.subject: Bias mitigation [en_US]
dc.subject: Non-native speaker bias [en_US]
dc.subject: Sentiment analysis [en_US]
dc.title: Mitigation of Unintended Biases against Non-Native English Texts in Sentiment Analysis [en_US]
dc.type: Article [en_US]
dc.source.journaltitle: CEUR Workshop Proceedings [en_US]
dc.source.volume: 2563 [en_US]
dc.source.beginpage: 317 [en_US]
dc.source.endpage: 328 [en_US]
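The abstract above describes the core measurement idea: a lexicon-based sentiment tool can assign different scores to paired English texts that differ only in the use of a cognate, and the gap can be closed by adding the cognate to the tool's lexicon with the valence of its English counterpart. The following is a minimal, hypothetical sketch of that idea, not the paper's actual pipeline or corpus: it uses VADER as a representative lexicon-based system (the paper evaluates 4 such systems, not necessarily this one), and the sentence pair and the cognate "sympathique" are illustrative assumptions.

# Illustrative sketch only: measure the score gap between a sentence with a
# common English word and the same sentence with a cognate, then mitigate by
# extending the lexicon. VADER stands in for the paper's 4 lexicon-based tools.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

# Paired template sentences: (native-English wording, cognate wording).
# The pair and the cognate choice are hypothetical examples.
pairs = [
    ("The host was friendly and the food was great.",
     "The host was sympathique and the food was great."),
]

for native, cognate in pairs:
    s_native = analyzer.polarity_scores(native)["compound"]
    s_cognate = analyzer.polarity_scores(cognate)["compound"]
    print(f"native={s_native:+.3f}  cognate={s_cognate:+.3f}  "
          f"gap={s_native - s_cognate:+.3f}")

# Mitigation in the spirit of the paper: give the cognate the same valence
# as its English counterpart (fallback value is an assumption in case the
# counterpart is missing from this lexicon).
analyzer.lexicon["sympathique"] = analyzer.lexicon.get("friendly", 2.0)
print(analyzer.polarity_scores(pairs[0][1])["compound"])

Before the lexicon update, the cognate sentence scores lower because "sympathique" is simply absent from the lexicon; after the update, the paired sentences score identically, which is the mitigation effect the paper reports across 4 lexicons and 3 languages.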


Files in this item

Name: aics_30.pdf
Size: 503.5 KB
Format: PDF
