Labeled PDF Dataset from End of Term (EOT) 2008 Web Archive

This dataset contains a random sample of 2,000 PDF documents from the usda.gov domain in the End of Term (EOT) 2008 Web Archive. These samples were categorized as being of interest for possible inclusion in the Technical Report Archive and Image Library (TRAIL). Each PDF has been sorted into one of two categories: Technical_Report or Not_Technical_Report.
Date: July 2018
Creator: Kirkwood, Patricia; Phillips, Mark Edward & Caldwell, Christopher
System: The UNT Digital Library

Labeled PDF Dataset from Texas Records and Information Locator (TRAIL) Web Archive

This dataset contains a random sample of 2,000 PDF documents from the Texas Records and Information Locator (TRAIL) Web Archive of the Texas State Library and Archives Commission. Each PDF has been sorted into one of two categories: TX_Pub_In_Scope or Not_TX_Pub.
Date: July 2018
Creator: Tarver, Hannah & Phillips, Mark Edward
System: The UNT Digital Library