Extracting "Documents" from Web Archives

Presentation given at the 2019 Texas Conference on Digital Libraries in Austin, Texas. This presentation discusses an IMLS-funded research grant to use machine learning techniques to help identify high-value publications from web archives.
Date: May 22, 2019
Creator: Phillips, Mark Edward; Caragea, Cornelia; Patel, Krutarth & Fox, Nathaniel T.
Object Type: Presentation
System: The UNT Digital Library

Building Specialized Collections from Web Archives

Presentation given at the Artificial Intelligence for Data Discovery and Reuse (AIDR) 2019 conference in Pittsburgh, Pennsylvania. This presentation discusses work on creating datasets of high-value publications and documents from web archives that can be used for machine learning research to help classify these large collections of data.
Date: May 2019
Creator: Caragea, Cornelia & Phillips, Mark Edward
Object Type: Presentation
System: The UNT Digital Library

Facilitating User Access through the Extraction of Documents from Digital Archives

Poster presented at the 2019 Texas Conference on Digital Libraries (TCDL-2019). This poster discusses the University of North Texas' archive of government websites, known as the CyberCemetery. The UNT Libraries have begun to extract documents embedded within the vast collection of web archives. Many of these documents include reports and transcripts from the various committees and agencies found in the collections. Through this project, the UNT Digital Library expands its role as a steward of digital resources in addition to making information easier to find.
Date: May 22, 2019
Creator: Fernandez, Mike & Tarver, Hannah
Object Type: Poster
System: The UNT Digital Library

Leveraging Machine Learning to Extract Content-Rich Publications from Web Archives

Poster presented at the 2019 Texas Conference on Digital Libraries (TCDL-2019). This poster discusses ways of identifying content-rich documents among the wealth of materials available via web archives. This research attempts to answer the following two research questions: 1. What role do web-published documents and publications play in developing collections in the broad categories of institutional repositories, state government documents, and publications from the federal government? 2. What are the characteristics of web-published documents and publications that help content selectors identify them for inclusion in their local collection?
Date: May 22, 2019
Creator: Fox, Nathaniel T. & Phillips, Mark Edward
Object Type: Presentation
System: The UNT Digital Library

Observation Guide: Exploring Methods and Techniques for Facilitating Access to Digital Language Archives

This is an observation guide used as part of the 'Exploring Methods and Techniques for Facilitating Access to Digital Language Archives' project (January 2019-August 2020).
Date: May 2019
Creator: Burke, Mary; Zavalina, Oksana; Chelliah, Shobhana Lakshmi & Phillips, Mark Edward
Object Type: Paper
System: The UNT Digital Library

Classification of the End-of-Term Archive: Extending Collection Development Practices to Web Archives

This presentation is a brief outline of the End-of-Term archiving project done as a collaboration between the Library of Congress, the Internet Archive, the University of North Texas Libraries, and the California Digital Library.
Date: May 3, 2010
Creator: Phillips, Mark Edward
Object Type: Presentation
System: The UNT Digital Library

Interview Guide for Archive Managers: Exploring Methods and Techniques for Facilitating Access to Digital Language Archives

This is an interview guide used as part of the 'Exploring Methods and Techniques for Facilitating Access to Digital Language Archives' project (January 2019-August 2020).
Date: May 2019
Creator: Burke, Mary; Zavalina, Oksana; Chelliah, Shobhana Lakshmi & Phillips, Mark Edward
Object Type: Paper
System: The UNT Digital Library

Interview Guide for End Users: Researchers, Depositors, Language Communities: Exploring Methods and Techniques for Facilitating Access to Digital Language Archives

This is an interview guide used as part of the 'Exploring Methods and Techniques for Facilitating Access to Digital Language Archives' project (January 2019-August 2020).
Date: May 2019
Creator: Burke, Mary; Zavalina, Oksana; Chelliah, Shobhana Lakshmi & Phillips, Mark Edward
Object Type: Paper
System: The UNT Digital Library

CyberCemetery: Archiving Historically Significant Federal Websites

Presentation for the 2015 Society of Southwest Archivists Annual Meeting. This presentation discusses the CyberCemetery and archiving historically significant federal websites.
Date: May 22, 2015
Creator: Phillips, Mark Edward
Object Type: Presentation
System: The UNT Digital Library

Focus Group Discussion Guide

This document is part of the Web-at-Risk project. This is the focus group discussion guide used for the project. The purpose of the questions included in this guide is to create a comfortable atmosphere in which people feel valued for their participation, to establish the context for the discussion, and to provide the facilitator with information about the group.
Date: May 31, 2005
Creator: Murray, Kathleen R.
Object Type: Text
System: The UNT Digital Library

Focus Group Participant Questionnaire

This document is part of the Web-at-Risk project. This is the focus group participant questionnaire and lists seven questions for participants to answer.
Date: May 31, 2005
Creator: Murray, Kathleen R.
Object Type: Text
System: The UNT Digital Library

Needs Assessment Toolkit

This presentation discusses the needs assessment toolkit created for the Web-at-Risk project. This presentation outlines the details related to the web archive development process and the activities related to the needs assessment.
Date: May 2005
Creator: Murray, Kathleen R.
Object Type: Presentation
System: The UNT Digital Library

Web Curation within Institutions: Dealing with Researchers

Presentation for the 2014 International Internet Preservation Consortium (IIPC) General Assembly. This presentation discusses web curation within institutions and dealing with researchers.
Date: May 23, 2014
Creator: Phillips, Mark Edward
Object Type: Presentation
System: The UNT Digital Library

End User Interview Questionnaire

This document is the end user interview questionnaire used for the Web-at-Risk project. It includes instructions for the interviewer, key concepts, and digital archive examples along with the questions to be asked.
Date: May 31, 2005
Creator: Murray, Kathleen R.
Object Type: Text
System: The UNT Digital Library

Needs Assessment Survey

This document is part of the Web-at-Risk project. This is the needs assessment survey for the project. The purpose of this assessment is twofold: (1) to identify curator and end-user needs that impact the collection development process for web archives, and (2) to identify the requirements for the Curator User Interface (CUI) to the web crawler and associated tools in the areas of content crawling, crawl progress monitoring, crawl quality assessment, management and description of crawled content, searching and browsing of crawled content, and preservation of crawled content.
Date: May 31, 2005
Creator: Murray, Kathleen R.
Object Type: Text
System: The UNT Digital Library

Breaking Down the Book: Literature Review and Practice of Book Digitization Methods

Presentation exploring book digitization practices and standards and sharing how they are implemented by the UNT Digital Projects Lab. With a large number of books flowing into their department, the authors decided to conduct research on the methods of scanning using the equipment and software available to them. This presentation reports on their findings and the factors that went into deciding which types of books get scanned where and how. Results include changes to local digitizing standards and clarified workflows based on the types of books to be scanned. It was presented at the 2023 Texas Conference on Digital Libraries (TCDL) held May 16-18, 2023 in Austin, Texas.
Date: May 18, 2023
Creator: McIntosh, Marcia & Kellum, Christina
Object Type: Presentation
System: The UNT Digital Library

URL Nomination Tool

Presentation for the 2014 International Internet Preservation Consortium (IIPC) General Assembly. This presentation discusses the URL nomination tool.
Date: May 24, 2014
Creator: Phillips, Mark Edward
Object Type: Presentation
System: The UNT Digital Library

Investigations Into Using Machine Learning Models to Automate the Sorting of Digitized Texas State Publications.

This poster highlights the development of a machine learning model to automate part of the process of digitizing and archiving documents from the Texas State Depository Program. This particular part of the process is the sorting of documents to facilitate metadata creation. It was presented at the 2023 Texas Conference on Digital Libraries (TCDL) held May 16-18, 2023 in Austin, Texas.
Date: May 16, 2023
Creator: Rikka, Praneeth & Phillips, Mark Edward
Object Type: Poster
System: The UNT Digital Library

Accessible History: Putting a Century of The Chronicles of Oklahoma Online

Presentation sharing the project workflows for digitizing back issues of The Chronicles of Oklahoma. The journal has been published since 1921, and in 2020, the Oklahoma Historical Society partnered with the UNT Digital Library to make the back issues freely available through The Gateway to Oklahoma History. The presentation was given at the 2023 NASIG Conference held May 22-25, 2023 in Pittsburgh, Pennsylvania.
Date: May 25, 2023
Creator: Johnson-Freeman, Whitney R.; Scott, Megan E. & Carroll, Hannah
Object Type: Presentation
System: The UNT Digital Library

Preserving Access to Government Websites: Development and Practice in the CyberCemetery

This paper discusses development and practice in the CyberCemetery. In the late 1990s, online U.S. government information was appearing and disappearing at a rapid pace. In 1999, the University of North Texas Libraries (UNT) formed a partnership with the U.S. Government Printing Office (GPO) to address this issue by archiving electronic government websites. This archive, known as the CyberCemetery, provides permanent public access to the websites and publications of defunct U.S. government agencies and commissions. This partnership between UNT and GPO has expanded to include the National Archives and Records Administration (NARA). This paper covers the CyberCemetery's development and the process of identifying, capturing, and publishing content in the archive.
Date: May 26, 2008
Creator: Hoffman, Starr
Object Type: Paper
System: The UNT Digital Library

Saving the Byrds: Reshaping Digitization Workflows for Photographic Materials

Presentation detailing the customized workflow established for the creation of the Byrd Williams Family Photography Collection in the UNT Digital Library, as well as the unique problems and solutions that arose throughout the course of the project. It was presented at the 2023 Texas Conference on Digital Libraries (TCDL) held May 16-18, 2023 in Austin, Texas.
Date: May 18, 2023
Creator: Ekberg, Samantha
Object Type: Presentation
System: The UNT Digital Library

Open Source Components, Standards Conformance, and UCD: Building Blocks for Successfully Managing and Enhancing an Established Digital Archive

This paper discusses open source components, standards conformance, and UCD as they relate to The Portal to Texas History.
Date: May 2010
Creator: Murray, Kathleen R. & Phillips, Mark Edward
Object Type: Paper
System: The UNT Digital Library

Content Producer Interview Questionnaire

This document is an interview questionnaire for the Web-at-Risk project. The purpose of this interview is to explore the issues information publishers or content producers have regarding web archives. The purpose of this discussion is to elicit the needs and thoughts of the users regarding web archives of materials created by a third party, such as a universal library.
Date: May 31, 2005
Creator: Murray, Kathleen R.
Object Type: Text
System: The UNT Digital Library

IIIF and the UNT Digital Collections

Presentation for the International Image Interoperability Framework (IIIF) Archives Community Group virtual meeting held on May 12, 2020. The presentation demonstrates the use of IIIF in the University of North Texas' digital collections.
Date: May 12, 2020
Creator: Hicks, William & McIntosh, Marcia
Object Type: Presentation
System: The UNT Digital Library