Dallas Police Shooting Twitter Dataset

This dataset contains Twitter JSON data for several Twitter search queries, collected during the week following the shooting of police officers in Dallas, Texas, on July 7, 2016, using the twarc (https://github.com/edsu/twarc) package, which uses Twitter's search API. A total of 7,146,993 Tweets make up the combined dataset.
Date: 2016-07-05/2016-07-14
Creator: Phillips, Mark Edward
System: The UNT Digital Library

2016 Democratic National Convention in Philadelphia Twitter Dataset

This dataset consists of tweets related to the 2016 Democratic National Convention in Philadelphia, Pennsylvania, which took place on July 25–28, 2016. This dataset was created using the twarc (https://github.com/edsu/twarc) package, which uses Twitter's search API. A total of 15,676 Tweets make up the combined dataset.
Date: 2016-07-15/2016-08-01
Creator: Phillips, Mark Edward
System: The UNT Digital Library

Labeled PDF Dataset from End of Term (EOT) 2008 Web Archive

This dataset contains a random sample of 2,000 PDF documents from the usda.gov domain in the End of Term (EOT) 2008 Web Archive. These samples were categorized according to whether they are of interest for possible inclusion in the Technical Report Archive and Image Library (TRAIL). Each PDF has been sorted into one of two categories: Technical_Report or Not_Technical_Report.
Date: July 2018
Creator: Kirkwood, Patricia; Phillips, Mark Edward & Caldwell, Christopher
System: The UNT Digital Library

John Lewis Twitter Dataset

This dataset contains Twitter JSON data for several Twitter search queries that were collected following the death on July 17, 2020, of American politician and civil-rights leader John Lewis, who served in the United States House of Representatives for Georgia's 5th congressional district from 1987 until his death. This dataset was created using the twarc (https://github.com/DocNow/twarc) package that makes use of Twitter's search API. A total of 6,870,881 Tweets and 42,055 media files make up the combined dataset.
Date: 2020-07-10/2020-08-10
Creator: Phillips, Mark Edward
System: The UNT Digital Library

Hydroxychloroquine Twitter Dataset

This dataset contains Twitter JSON data for several Twitter search queries related to the drug hydroxychloroquine and claims about its effectiveness as a coronavirus treatment. This dataset was created to capture the opinions on Twitter after a group of people calling themselves "America’s Frontline Doctors" released a video sharing misleading claims about the virus and the drug's use as an effective treatment. This dataset was created using the twarc (https://github.com/DocNow/twarc) package, which uses Twitter's search API. A total of 4,187,890 Tweets and 15,779 media files make up the combined dataset.
Date: 2020-07-20/2020-08-11
Creator: Phillips, Mark Edward
System: The UNT Digital Library

[Response Data: Survey of Benchmarks in Metadata Quality]

Complete, anonymized dataset of responses to the Survey of Benchmarks in Metadata Quality. Date, time, IP addresses, and geographic data have been omitted. Responses that included project, organization, and/or repository names were removed from this data, as were potentially identifying names, acronyms, and/or links.
Date: July 2019
Creator: Digital Library Federation. Assessment Interest Group. Metadata Working Group. Benchmarks Sub-Group.
System: The UNT Digital Library

Labeled PDF Dataset from Texas Records and Information Locator (TRAIL) Web Archive

This dataset contains a random sample of 2,000 PDF documents from the Texas Records and Information Locator (TRAIL) Web Archive from the Texas State Library and Archives Commission. Each PDF has been sorted into one of two categories: TX_Pub_In_Scope or Not_TX_Pub.
Date: July 2018
Creator: Tarver, Hannah & Phillips, Mark Edward
System: The UNT Digital Library