Extracting "Documents" from Web Archives

Presentation given at the 2019 Texas Conference on Digital Libraries in Austin, Texas. This presentation discusses an IMLS-funded research grant to use machine learning techniques to help identify high-value publications from web archives.
Date: May 22, 2019
Creator: Phillips, Mark Edward; Caragea, Cornelia; Patel, Krutarth & Fox, Nathaniel T.
Object Type: Presentation
System: The UNT Digital Library

Building Specialized Collections from Web Archives

Presentation given at the Artificial Intelligence for Data Discovery and Reuse (AIDR) 2019 conference in Pittsburgh, Pennsylvania. This presentation discusses work on creating datasets of high-value publications and documents from web archives that can be used for machine learning research to help classify these large collections of data.
Date: May 2019
Creator: Caragea, Cornelia & Phillips, Mark Edward
Object Type: Presentation
System: The UNT Digital Library

Facilitating User Access through the Extraction of Documents from Digital Archives

Poster presented at the 2019 Texas Conference on Digital Libraries (TCDL-2019). This poster discusses the University of North Texas' archive of government websites, known as the CyberCemetery. The UNT Libraries have begun to extract documents embedded within the vast collection of web archives. Many of these documents include reports and transcripts from the various committees and agencies found in the collections. Through this project, the UNT Digital Library expands its role as a steward of digital resources in addition to making information easier to find.
Date: May 22, 2019
Creator: Fernandez, Mike & Tarver, Hannah
Object Type: Poster
System: The UNT Digital Library

Leveraging Machine Learning to Extract Content-Rich Publications from Web Archives

Poster presented at the 2019 Texas Conference on Digital Libraries (TCDL-2019). This poster discusses ways of identifying content-rich documents among the wealth of materials available via web archives. This research attempts to answer the following two research questions: 1. What role do web-published documents and publications play in developing collections in the broad categories of institutional repositories, state government documents, and publications from the federal government? 2. What are the characteristics of web-published documents and publications that help content selectors identify them for inclusion in their local collection?
Date: May 22, 2019
Creator: Fox, Nathaniel T. & Phillips, Mark Edward
Object Type: Presentation
System: The UNT Digital Library

Entry door to the Lorenzo de Zavala Texas State Library and Archives building

Photograph of the detail of the entry doorway to the Lorenzo de Zavala Texas State Library and Archives Building. A sign above the doorway reads "Lorenzo de Zavala", and a panel of green glass is set into the wall above it.
Date: May 5, 2005
Creator: Belden, Dreanna L.
Object Type: Photograph
System: The Portal to Texas History

Classification of the End-of-Term Archive: Extending Collection Development Practices to Web Archives

This presentation is a brief outline of the End-of-Term archiving project done as a collaboration between the Library of Congress, the Internet Archive, the University of North Texas Libraries, and the California Digital Library.
Date: May 3, 2010
Creator: Phillips, Mark Edward
Object Type: Presentation
System: The UNT Digital Library

Looting and Restitution During World War II: a Comparison Between the Soviet Union Trophy Commission and the Western Allies Monuments, Fine Arts, and Archives Commission

From the earliest civilizations, victorious armies would loot defeated cities or nations. The practice evolved into art theft as a symbol of power. Cultural superiority confirmed a country or empire’s regime. Throughout history, the Greeks and Romans cultivated, Napoleon Bonaparte refined, and Adolf Hitler perfected the practice of plunder. As the tides of the Second World War began to shift in favor of the Allied Powers, special commissions, established to locate the Germans’ hoards of treasure, discovered Nazi art repositories filled with art objects looted from throughout Europe. The Soviet Union Trophy Commission and the Western Allies Monuments, Fine Arts, and Archives Commission competed to discover Nazi war loot. The two organizations not only approached the subject of plunder as a treasure hunt, but the ideology motivating both commissions made uncovering the depositories first a priority. The Soviet trophy brigades’ mission was to dismantle all items of financial worth and ship them eastward to help rebuild a devastated Soviet economy. The Soviet Union wished for the re-compensation of cultural valuables destroyed by the Nazis’ purification practices regarding “inferior” Slavic art and architecture; however, the defeated German nation did not have the ability to reimburse the Soviet State. The trophy brigades implemented …
Date: May 2012
Creator: Zelman, Laura Holsomback
Object Type: Thesis or Dissertation
System: The UNT Digital Library

CyberCemetery: Archiving Historically Significant Federal Websites

Presentation for the 2015 Society of Southwest Archivists Annual Meeting. This presentation discusses the CyberCemetery and archiving historically significant federal websites.
Date: May 22, 2015
Creator: Phillips, Mark Edward
Object Type: Presentation
System: The UNT Digital Library

Focus Group Discussion Guide

This document is part of the Web-at-Risk project. This is the focus group discussion guide used for the project. The purpose of the questions included in this guide is to create a comfortable atmosphere in which people feel valued for their participation, to establish the context for the discussion, and to provide the facilitator with information about the group.
Date: May 31, 2005
Creator: Murray, Kathleen R.
Object Type: Text
System: The UNT Digital Library

Focus Group Participant Questionnaire

This document is part of the Web-at-Risk project. This is the focus group participant questionnaire and lists seven questions for participants to answer.
Date: May 31, 2005
Creator: Murray, Kathleen R.
Object Type: Text
System: The UNT Digital Library

Needs Assessment Toolkit

This presentation discusses the needs assessment toolkit created for the Web-at-Risk project. This presentation outlines the details related to the web archive development process and the activities related to the needs assessment.
Date: May 2005
Creator: Murray, Kathleen R.
Object Type: Presentation
System: The UNT Digital Library

Web Curation within Institutions: Dealing with Researchers

Presentation for the 2014 International Internet Preservation Consortium (IIPC) General Assembly. This presentation discusses web curation within institutions and dealing with researchers.
Date: May 23, 2014
Creator: Phillips, Mark Edward
Object Type: Presentation
System: The UNT Digital Library

End User Interview Questionnaire

This document is the end user interview questionnaire used for the Web-at-Risk project. It includes instructions for the interviewer, key concepts, and digital archive examples along with the questions to be asked.
Date: May 31, 2005
Creator: Murray, Kathleen R.
Object Type: Text
System: The UNT Digital Library

Needs Assessment Survey

This document is part of the Web-at-Risk project. This is the needs assessment survey for the project. The purpose of this assessment is twofold: (1) to identify curator and end-user needs that impact the collection development process for web archives, and (2) to identify the requirements for the Curator User Interface (CUI) to the web crawler and associated tools in the areas of content crawling, crawl progress monitoring, crawl quality assessment, management and description of crawled content, searching and browsing of crawled content, and preservation of crawled content.
Date: May 31, 2005
Creator: Murray, Kathleen R.
Object Type: Text
System: The UNT Digital Library

Designing Archival Collections to Support Language Revitalization: Case Study of the Boro Language Resource

Indigenous communities around the world are losing their languages at accelerating rates to the effects of the climate crisis and global capitalism. To preserve samples of these languages facing endangerment and extinction, samples of language use (e.g., audio-video recordings, photographs, textual transcriptions, translations, and analyses) are created and stored in language archives: repositories intended to provide long-term preservation of and access to language materials. In recent years, archives of all kinds are considering their origins and audiences. With the emergence of the community paradigm of archiving framework, the roles of archivists, communities, and institutions are under re-examination. Language archives too are reflecting this trend, as it becomes more common for speakers of Indigenous languages (also known as language communities) to document and archive their own languages and histories. As the landscape of language archiving expands, we now see increased emphasis on the re-use of archival material, particularly to support language revitalization—efforts to increase and maintain the use of the language. There are calls for language documentation (and, by extension, language archiving) to prioritize revitalization efforts. This dissertation is a case study of one language archive collection: the Boro Language Resource in the Computational Resource for South Asian Languages (CoRSAL) archive. …
Date: May 2023
Creator: Burke, Mary
Object Type: Thesis or Dissertation
System: The UNT Digital Library

URL Nomination Tool

Presentation for the 2014 International Internet Preservation Consortium (IIPC) General Assembly. This presentation discusses the URL nomination tool.
Date: May 24, 2014
Creator: Phillips, Mark Edward
Object Type: Presentation
System: The UNT Digital Library

Investigations Into Using Machine Learning Models to Automate the Sorting of Digitized Texas State Publications.

This poster highlights the development of a machine learning model to automate part of the process of digitizing and archiving documents from the Texas State Depository Program. This particular part of the process is the sorting of documents to facilitate metadata creation. It was presented at the 2023 Texas Conference on Digital Libraries (TCDL) held May 16-18, 2023 in Austin, Texas.
Date: May 16, 2023
Creator: Rikka, Praneeth & Phillips, Mark Edward
Object Type: Poster
System: The UNT Digital Library

Accessible History: Putting a Century of The Chronicles of Oklahoma Online

Presentation sharing the project workflows for digitizing back issues of The Chronicles of Oklahoma. The Chronicles has been published since 1921, and in 2020 the Oklahoma Historical Society partnered with the UNT Digital Library to make the back issues freely available through The Gateway to Oklahoma History. The presentation was given at the 2023 NASIG Conference held May 22-25, 2023 in Pittsburgh, Pennsylvania.
Date: May 25, 2023
Creator: Johnson-Freeman, Whitney R.; Scott, Megan E. & Carroll, Hannah
Object Type: Presentation
System: The UNT Digital Library

Preserving Access to Government Websites: Development and Practice in the CyberCemetery

This paper discusses the development and practice in the CyberCemetery. In the late 1990s, online U.S. government information was appearing and disappearing at a rapid pace. In 1999, the University of North Texas Libraries (UNT) formed a partnership with the U.S. Government Printing Office (GPO) to address this issue by archiving electronic government websites. This archive, known as the CyberCemetery, provides permanent public access to the websites and publications of defunct U.S. government agencies and commissions. This partnership between UNT and GPO has expanded to include the National Archives and Records Administration (NARA). This paper covers the CyberCemetery's development and the process of identifying, capturing, and publishing content in the archive.
Date: May 26, 2008
Creator: Hoffman, Starr
Object Type: Paper
System: The UNT Digital Library

Open Source Components, Standards Conformance, and UCD: Building Blocks for Successfully Managing and Enhancing an Established Digital Archive

This paper discusses open source components, standards conformance, and UCD as they relate to The Portal to Texas History.
Date: May 2010
Creator: Murray, Kathleen R. & Phillips, Mark Edward
Object Type: Paper
System: The UNT Digital Library

Content Producer Interview Questionnaire

This document is an interview questionnaire for the Web-at-Risk project. The purpose of this interview is to explore the issues information publishers or content producers have regarding web archives. The purpose of this discussion is to elicit the needs and thoughts of the users regarding web archives of materials created by a third party, such as a universal library.
Date: May 31, 2005
Creator: Murray, Kathleen R.
Object Type: Text
System: The UNT Digital Library

Texas Borderlands Newspaper Collection: Newspaper Preservation and Access, One Page at a Time

Presentation given at the Texas Conference on Digital Libraries 2019 in Austin, Texas. This presentation discusses the Texas Digital Newspaper Program (TDNP) and the Texas Borderlands Newspaper Collection, a project funded by the TexTreasures program of the Texas State Library and Archives Commission (TSLAC).
Date: May 22, 2019
Creator: Krahmer, Ana & Phillips, Mark Edward
Object Type: Presentation
System: The UNT Digital Library

Building Specialized Collections from Web Archiving

Short paper presented at Artificial Intelligence for Data Discovery and Reuse (AIDR). This short paper presents work on creating datasets of high-value publications and documents from web archives that can be used for machine learning research to help classify these large collections of data.
Date: May 2019
Creator: Caragea, Cornelia & Phillips, Mark Edward
Object Type: Paper
System: The UNT Digital Library

The West Gulf Blockade, 1861-1865: An Evaluation

This investigation resulted from a pilot research paper prepared in conjunction with a graduate course on the Civil War. This study suggested that the Federal blockade of the Confederacy may not have contributed significantly to its defeat. Traditionally, historians had assumed that the Union's Anaconda Plan had effectively strangled the Confederacy. Recent studies which compared the statistics of ships captured to successful infractions of the blockade had somewhat revised these views. While accepting these revisionist findings as broadly valid, this investigation strove to determine specifically the effectiveness of Admiral Farragut's West Gulf Blockading Squadron. Since the British Foreign Office maintained consulates in three blockaded southern ports and in many Caribbean ports through which blockade running was conducted, these consular records were vital for this study. Personal research in Great Britain's Public Record Office disclosed valuable consular reports pertaining to the effectiveness of the Federal blockade. American consular records, found in the National Archives in Washington, D.C., provided excellent comparative reports from those same Gulf ports. Official Confederate reports, contained in the National Archives, various state archives, and in the published Official Records of the Union and Confederate Armies, revealed valuable statistical data on foreign imports. Limited use was made of …
Date: May 1974
Creator: Glover, Robert W.
Object Type: Thesis or Dissertation
System: The UNT Digital Library