
End of Term 2008 Presidential Web Archive: PDF Content Analysis

This presentation discusses the End of Term 2008 Presidential Web Archive. The University of North Texas (UNT) Libraries collaborated with members of the International Internet Preservation Consortium (IIPC) on the End of Term 2008 Presidential Web Harvest from October 2008 to February 2009. During this collaboration the project team archived 160,211,356 URIs, which became the research dataset for an IMLS-funded grant investigating collection development using web archives. The team analyzed the 10,318,073 PDFs and developed a retrieval and exploration system for collection developers interested in acquiring and developing born-digital collections from the End of Term Web Archive.
Date: December 5, 2012
Creator: Phillips, Mark Edward
System: The UNT Digital Library