Leveraging Existing Bibliographic Metadata to Improve Automatic Document Identification in Web Archives

Presentation for the IIPC General Assembly and Web Archiving Conference, held May 10-12, 2023, in Hilversum, Netherlands. This presentation describes the results of the 2017 and 2022 IMLS grant projects carried out by the University of North Texas, which leveraged bibliographic metadata from UNT's collections to build better models for document classification.
Date: May 11, 2023
Creator: Phillips, Mark Edward; Caragea, Cornelia & Rikka, Praneeth
System: The UNT Digital Library

Leveraging Machine Learning to Extract Content-Rich Publications from Web Archives

Presentation for the 2019 International Internet Preservation Consortium General Assembly and Web Archiving Conference. This presentation discusses research into leveraging machine learning to identify PDFs relevant to a collection from archived records.
Date: June 6, 2019
Creator: Phillips, Mark Edward; Caragea, Cornelia; Patel, Krutarth & Fox, Nathaniel T.
System: The UNT Digital Library