User needs in language archives: Findings from interviews with language archive managers, depositors, and end-users (open access)

This article is an exploratory study that provides empirical data on language archive user needs and supports some anecdotal evidence of known issues facing language archive end-users, depositors, and managers in primarily academic contexts.
Date: April 2022
Creator: Burke, Mary; Zavalina, Oksana; Chelliah, Shobhana Lakshmi & Phillips, Mark Edward
Object Type: Article
System: The UNT Digital Library
Prenominal possessives in Yiddish: mayn khaver versus mayner a khaver (open access)

Article provides a systematic comparison and detailed analysis of two prenominal possessive constructions in Yiddish, the familiar mayn khaver ‘my friend’ and the less well-known mayner a khaver ‘a friend of mine.’
Date: February 21, 2022
Creator: Roehrs, Dorian
Object Type: Article
System: The UNT Digital Library
Synthetic data for annotation and extraction of family history information from clinical text (open access)

This article investigates the use of synthetic data for the annotation and automated extraction of family history information relating to cases of cardiac disease from Norwegian clinical text. This work assesses the validity and applicability of the annotated synthetic corpus using machine learning techniques. The methodology outlined in this article may be useful in other situations where limited availability of clinical text hinders NLP tasks.
Date: July 14, 2021
Creator: Brekke, Pål H.; Kasicheyanula, Taraka; Pilán, Ildikó; Nytrø, Øystein & Øvrelid, Lilja
Object Type: Article
System: The UNT Digital Library
WikiPossessions: Possession Timeline Generation as an Evaluation Benchmark for Machine Reading Comprehension of Long Texts (open access)

Article presents WikiPossessions, a new benchmark corpus for the task of temporally-oriented possession (TOP), or tracking objects as they change hands over time. In addition to the corpus, the authors release evaluation scripts and a baseline model for the task.
Date: May 2020
Creator: Blanco, Eduardo; Palmer, Alexis & Chinnappa, Dhivya
Object Type: Article
System: The UNT Digital Library
Challenges to Representing Personal Names and Language Names in Language Archives: Examples from Northeast India (open access)

Article reviews one particular challenge to data management relevant to South Asia: the complexity of names (of individuals, groups, and languages). It was presented at the 1st International Workshop on Digital Language Archives, held September 30-October 1, 2021, as part of the ACM/IEEE Joint Conference on Digital Libraries 2021.
Date: October 7, 2021
Creator: Burke, Mary & Chelliah, Shobhana Lakshmi
Object Type: Article
System: The UNT Digital Library

Leveraging Digital Library Infrastructure to Build a Language Archive

Presentation describing the ongoing CoRSAL (Computational Resource for South Asian Languages) project, including background on the UNT Digital Library infrastructure and metadata schema, specific fields that have presented issues or areas of discussion for language data records (language, creator/contributor, and relation), and final conclusions about the collaboration so far.
Date: September 30, 2021
Creator: Phillips, Mark Edward & Tarver, Hannah
Object Type: Presentation
System: The UNT Digital Library
What do complexity measures measure? Correlating and validating corpus-based measures of morphological complexity (open access)

Article presents an analysis of eight measures used for quantifying the morphological complexity of natural languages. The measures studied are corpus-based, with varying requirements for corpus annotation.
Date: September 22, 2022
Creator: Çöltekin, Çağrı & Rama, Taraka
Object Type: Article
System: The UNT Digital Library
It’s not a Non-Issue: Negation as a Source of Error in Machine Translation (open access)

Article investigates whether translating negation is an issue for modern MT systems, using 17 translation directions as a test bed, and provides a linguistically motivated analysis that explains the majority of the findings. The authors release their annotations and the code to replicate the analysis at https://github.com/mosharafhossain/negation-mt.
Date: November 2020
Creator: Hossain, Md Mosharaf; Blanco, Eduardo; Palmer, Alexis & Anastasopoulos, Antonios
Object Type: Article
System: The UNT Digital Library
Hierarchical Coding Scheme: Exploring Methods and Techniques for Facilitating Access to Digital Language Archives (open access)

This is the hierarchical coding scheme used for qualitative analysis of interviews with language archive managers, depositors, and end-users as part of the 'Exploring Methods and Techniques for Facilitating Access to Digital Language Archives' project (January 2019-August 2020).
Date: June 2020
Creator: Burke, Mary; Zavalina, Oksana; Chelliah, Shobhana Lakshmi & Phillips, Mark Edward
Object Type: Paper
System: The UNT Digital Library
Neural classification of Norwegian radiology reports: using NLP to detect findings in CT-scans of children (open access)

This article describes training machine learning models to classify Norwegian radiology reports of pediatric CT examinations according to their description of abnormal findings. The developed models are robust with respect to different contexts and may be used in quality assurance processes.
Date: March 4, 2021
Creator: Dahl, Fredrik A.; Rama, Taraka; Hurlen, Petter; Brekke, Pål H.; Husby, Haldor; Gundersen, Tore et al.
Object Type: Article
System: The UNT Digital Library
A test of Generalized Bayesian dating: A new linguistic dating method (open access)

Article addresses whether a new Bayesian framework can be introduced and how subjectivity can be overcome. The authors introduce a new method called Generalized Bayesian Dating (GBD) for inferring dates of language groups from lexical and phonological data. This work has implications for future performance testing in the area of linguistic dating.
Date: August 12, 2020
Creator: Kasicheyanula, Taraka & Wichmann, Søren
Object Type: Article
System: The UNT Digital Library
Phrasal Proper Names in German and Norwegian (open access)

Article discusses the morpho-syntax of phrasal proper names like Deutsche Bahn 'German Railway' and Norske Skog 'Norwegian Forest' in German and Norwegian. The authors document that phrasal proper names may show features of recursivity, evidenced most clearly in Norwegian.
Date: September 9, 2023
Creator: Julien, Marit & Roehrs, Dorian
Object Type: Article
System: The UNT Digital Library