SurfKE: A Graph-Based Feature Learning Framework for Keyphrase Extraction

Access: Use of this item is restricted to the UNT Community
Current unsupervised approaches for keyphrase extraction compute a single importance score for each candidate word by considering the number and quality of its associated words in the graph, and they are not flexible enough to incorporate multiple types of information. For instance, nodes in a network may exhibit diverse connectivity patterns that are not captured by graph-based ranking methods. To address this, we present a new approach to keyphrase extraction that represents the document as a word graph and exploits its structure to reveal underlying explanatory factors hidden in the data that may distinguish keyphrases from non-keyphrases. Experimental results show that our model, which uses phrase graph representations in a supervised probabilistic framework, obtains remarkable improvements in performance over previous supervised and unsupervised keyphrase extraction systems.
Date: August 2019
Creator: Florescu, Corina Andreea
System: The UNT Digital Library
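The SurfKE abstract describes two steps: represent a document as a word graph, then feed the graph's structure to a supervised model. The Python sketch below illustrates only that input representation. It assumes window-based co-occurrence edges and three hand-picked structural features: degree, weighted degree and mean neighbor degree. The function names are hypothetical. SurfKE itself learns its features from the graph rather than using this fixed set.

```python
from collections import defaultdict


def build_word_graph(tokens, window=2):
    """Undirected co-occurrence graph: two distinct words are linked when the
    second occurs within `window` positions after the first; the edge weight
    counts how often that happens."""
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word][other] += 1
                graph[other][word] += 1
    return graph


def node_features(graph):
    """Per-word structural features: (degree, weighted degree, mean neighbor degree)."""
    features = {}
    for word, neighbors in graph.items():
        degree = len(neighbors)
        strength = sum(neighbors.values())
        mean_neighbor_degree = sum(len(graph[n]) for n in neighbors) / degree
        features[word] = (degree, strength, mean_neighbor_degree)
    return features


tokens = "graph based keyphrase extraction graph ranking".split()
print(node_features(build_word_graph(tokens, window=1)))
```

Each feature tuple could serve as one input row for a classifier. A real pipeline would first apply stop-word removal and part-of-speech filtering, which this sketch omits.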

Integrating Multiple Deep Learning Models for Disaster Description in Low-Altitude Videos

Computer vision technologies are rapidly improving and becoming more important in disaster response. Most disaster description techniques currently focus either on identifying objects or on categorizing disasters. In this study, we trained multiple deep neural networks on low-altitude imagery with highly imbalanced and noisy labels. We use labeled images from the LADI dataset to formulate a solution to the general problem of disaster classification and object detection. Our research integrated and developed multiple deep learning models that perform both the object detection task and the disaster scene classification task. Our solution is competitive in the TRECVID Disaster Scene Description and Indexing (DSDI) task, demonstrating that it is comparable to other proposed approaches in retrieving disaster-related video clips.
Date: December 2022
Creator: Wang, Haili
System: The UNT Digital Library
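The disaster-description abstract mentions training on highly imbalanced labels. One common remedy is to weight each label's positive term in a multi-label binary cross-entropy by that label's negative-to-positive ratio. This is offered as an illustration, not as the method used in that work. The sketch is plain Python, and `positive_weights` and `weighted_bce` are hypothetical names.

```python
import math


def positive_weights(label_matrix):
    """Per-label weight = (#negatives / #positives), so rare labels count more.
    label_matrix: one row of 0/1 labels per image."""
    n_images = len(label_matrix)
    weights = []
    for k in range(len(label_matrix[0])):
        positives = sum(row[k] for row in label_matrix)
        weights.append((n_images - positives) / positives if positives else 1.0)
    return weights


def weighted_bce(probs, labels, pos_weights, eps=1e-7):
    """Mean binary cross-entropy over the labels of one image, with the
    positive term of each label scaled by its weight."""
    total = 0.0
    for p, y, w in zip(probs, labels, pos_weights):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(w * y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)
```

With this weighting, a missed rare label (for example, a hypothetical "flooding" tag seen in few images) costs more than a missed common one.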
A Smooth-turn Mobility Model for Airborne Networks (open access)

In this article, I introduce a novel airborne network mobility model, called the Smooth Turn Mobility Model, that captures the correlation of acceleration for airborne vehicles across time and spatial coordinates. Effective routing in airborne networks (ANs) relies on suitable mobility models that capture the random movement patterns of airborne vehicles. Airborne vehicles cannot make sharp turns as easily as ground vehicles do. As a result, the mobility models widely used for Mobile Ad Hoc Networks, such as the Random Waypoint and Random Direction models, fail to capture their movement. Our model realistically captures the tendency of airborne vehicles to follow straight trajectories and make smooth turns with large radii, while remaining simple enough for tractable connectivity analysis and routing design.
Date: August 2012
Creator: He, Dayin
System: The UNT Digital Library
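To make the smooth-turn idea concrete, here is a small simulation sketch. It assumes the formulation usually associated with this model:

- Each turn has a signed curvature 1/r drawn from a zero-mean Gaussian, so large radii and near-straight flight dominate.
- Each turn lasts an exponentially distributed time.

Parameter names, default values and the Euler integration step are illustrative choices, not taken from the thesis.

```python
import math
import random


def smooth_turn_trajectory(steps, dt=0.1, speed=50.0, sigma_inv_radius=0.002,
                           mean_turn_time=10.0, seed=0):
    """Simulate one vehicle at constant speed. Each turn has a signed curvature
    1/r ~ N(0, sigma^2), so nearly straight flight is most likely, and lasts an
    exponentially distributed time; positions use simple Euler steps."""
    rng = random.Random(seed)
    x = y = heading = 0.0
    omega = 0.0            # angular velocity v / r, in rad/s
    time_left = 0.0        # remaining duration of the current turn
    path = [(x, y)]
    for _ in range(steps):
        if time_left <= 0.0:
            inv_radius = rng.gauss(0.0, sigma_inv_radius)
            omega = speed * inv_radius
            time_left = rng.expovariate(1.0 / mean_turn_time)
        heading += omega * dt
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        time_left -= dt
        path.append((x, y))
    return path


path = smooth_turn_trajectory(1000)
print(path[-1])
```

Because the heading changes continuously, consecutive steps never show the abrupt direction changes of Random Waypoint or Random Direction.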
Scene Analysis Using Scale Invariant Feature Extraction and Probabilistic Modeling (open access)

Conventional pattern recognition systems have two components: feature analysis and pattern classification. For any object in an image, features can be considered the major characteristics of the object, for either object recognition or object tracking. Features extracted from a training image can be used to identify the object when attempting to locate it in a test image containing many other objects. To perform reliable scene analysis, it is important that the features extracted from the training image remain detectable under changes in image scale, noise, and illumination. Scale-invariant features have wide applications in image processing, such as image classification, object recognition, and object tracking. In this thesis, color features and SIFT (scale-invariant feature transform) features are considered as scale-invariant features. The classification, recognition, and tracking results were evaluated with a novel evaluation criterion and compared with some existing methods. I also studied different types of scale-invariant features for the purpose of solving scene analysis problems. I propose probabilistic models as the foundation for analyzing scene scenarios in images. In order to differentiate the content of images, I develop novel algorithms for the adaptive combination of multiple features extracted from images. I …
Date: August 2011
Creator: Shen, Yao
System: The UNT Digital Library
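The scene-analysis abstract treats color features as scale invariant. A normalized color histogram shows why. Upscaling an image by pixel replication multiplies every bin count by the same factor, and normalization cancels that factor. The sketch below quantizes RGB values and compares histograms with histogram intersection. It is pure Python, uses hypothetical function names, and does not cover SIFT.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized histogram over bins_per_channel**3 quantized RGB colors.
    pixels: iterable of (r, g, b) tuples with channel values in 0..255."""
    step = 256 / bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    count = 0
    for r, g, b in pixels:
        qr, qg, qb = (min(int(c / step), bins_per_channel - 1) for c in (r, g, b))
        hist[(qr * bins_per_channel + qg) * bins_per_channel + qb] += 1.0
        count += 1
    return [h / count for h in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


small = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 255)]
large = [p for p in small for _ in range(4)]  # each pixel replicated, as in a 2x nearest-neighbor upscale
print(histogram_intersection(color_histogram(small), color_histogram(large)))
```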
Procedural Generation of Content for Online Role Playing Games (open access)

Video game players demand a volume of content far in excess of the ability of game designers to create it. For example, a single quest might take a week to develop and test, which means that companies such as Blizzard are spending millions of dollars each month on new content for their games. As a result, both players and developers are frustrated with the inability to meet the demand for new content. By generating content on-demand, it is possible to create custom content for each player based on player preferences. It is also possible to make use of the current world state during generation, something which cannot be done with current techniques. Using developers to create rules and assets for a content generator instead of creating content directly will lower development costs as well as reduce the development time for new game content to seconds rather than days. This work is part of the field of computational creativity, and involves the use of computers to create aesthetically pleasing game content, such as terrain, characters, and quests. I demonstrate agent-based terrain generation, and economic modeling of game spaces. I also demonstrate the autonomous generation of quests for online role playing games, …
Date: August 2014
Creator: Doran, Jonathon
System: The UNT Digital Library
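Agent-based terrain generation can be illustrated with a toy in which "mountain" agents random-walk over a height map and raise each cell they visit. The agent rule, grid size and parameters below are hypothetical. The abstract does not describe the dissertation's actual agents.

```python
import random


def agent_terrain(width, height, n_agents=5, steps=200, seed=0):
    """Toy agent-based height map: each agent random-walks over the grid and
    raises every cell it stands on by one unit, growing ridges along its path."""
    rng = random.Random(seed)
    grid = [[0] * width for _ in range(height)]
    agents = [(rng.randrange(width), rng.randrange(height)) for _ in range(n_agents)]
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    for _ in range(steps):
        moved = []
        for x, y in agents:
            grid[y][x] += 1
            dx, dy = rng.choice(moves)
            # clamp to the grid so agents stay on the map
            moved.append((min(max(x + dx, 0), width - 1),
                          min(max(y + dy, 0), height - 1)))
        agents = moved
    return grid


for row in agent_terrain(8, 4, n_agents=2, steps=20):
    print(" ".join(f"{h:2d}" for h in row))
```

Different agent types (for example, agents that smooth coastlines or carve rivers) would be further rules of the same shape.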
Biomedical Semantic Embeddings: Using Hybrid Sentences to Construct Biomedical Word Embeddings and its Applications (open access)

Word embedding is a useful method that has shown enormous success in various NLP tasks, not only in the open domain but also in the biomedical domain. The biomedical domain provides various domain-specific resources and tools that can be exploited to improve the performance of these word embeddings. However, most research on word embeddings in the biomedical domain focuses on analyzing the model architecture, hyper-parameters, and input text. In this paper, we use SemMedDB to design new sentences called 'Semantic Sentences'. We then use these sentences, in addition to biomedical text, as inputs to the word embedding model. This approach aims to introduce biomedical semantic types defined by UMLS into the vector space of word embeddings. The semantically rich word embeddings presented here rival state-of-the-art biomedical word embeddings on both semantic similarity and relatedness metrics (up to 11%). We also demonstrate how these semantic types in word embeddings can be utilized.
Date: December 2019
Creator: Shaik, Arshad
System: The UNT Digital Library
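The "Semantic Sentences" idea can be sketched as follows. Each SemMedDB-style predication (subject, predicate, object, plus UMLS semantic types) becomes a short token sequence, and these sequences are mixed into the ordinary training text. The `Predication` fields and the sentence template are assumptions for illustration, not the thesis's exact format. The resulting token lists could then be passed to any word2vec-style trainer.

```python
from dataclasses import dataclass


@dataclass
class Predication:
    subject: str
    subject_type: str   # UMLS semantic type abbreviation, e.g. "phsu"
    predicate: str      # relation name, e.g. "TREATS"
    obj: str
    object_type: str    # e.g. "dsyn"


def _token(name):
    """Lower-case a concept name and join multi-word names into one token."""
    return name.lower().replace(" ", "_")


def semantic_sentence(p):
    """Render one predication as a short 'sentence' of tokens (hypothetical template)."""
    return [_token(p.subject), p.subject_type, p.predicate.lower(),
            _token(p.obj), p.object_type]


def build_corpus(text_sentences, predications):
    """Ordinary tokenized sentences followed by semantic sentences."""
    return ([s.lower().split() for s in text_sentences]
            + [semantic_sentence(p) for p in predications])


p = Predication("Aspirin", "phsu", "TREATS", "Tension headache", "dsyn")
print(build_corpus(["Aspirin relieves pain"], [p]))
```

Placing the semantic-type token next to the concept token lets the concept and its type share training contexts.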
General Purpose Programming on Modern Graphics Hardware (open access)

I start with a brief introduction to the graphics processing unit (GPU) as well as general-purpose computation on modern graphics hardware (GPGPU). Next, I explore the motivations for GPGPU programming, and the capabilities of modern GPUs (including advantages and disadvantages). Also, I give the background required for further exploring GPU programming, including the terminology used and the resources available. Finally, I include a comprehensive survey of previous and current GPGPU work, and end with a look at the future of GPU programming.
Date: May 2008
Creator: Fleming, Robert
System: The UNT Digital Library
Modeling Epidemics on Structured Populations: Effects of Socio-demographic Characteristics and Immune Response Quality (open access)

Epidemiologists study the distribution and determinants of health-related states or events in human populations, and ultimately apply that study to prevent and control problems and contingencies associated with the health of the population. Due to the spread of new pathogens and the emergence of new bioterrorism threats, it has become imperative to develop new techniques, and expand existing ones, to equip public health providers with robust tools to predict and control health-related crises. In this dissertation, I explore how differences in individuals' physiology and social/behavioral characteristics affect disease dynamics. Multiple computational and mathematical models were developed to quantify the effect of those factors on spatial and temporal variations of disease epidemics. I developed statistical methods to measure how incorporating heterogeneous demographics and social interactions into the population affects outbreak dynamics. Specifically, I studied the relationship between demographics and the physiological characteristics of an individual when preparing for an infectious disease epidemic.
Date: August 2014
Creator: Reyes Silveyra, Jorge A.
System: The UNT Digital Library
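Individual heterogeneity can enter a compartmental model in two ways in the deterministic SIR sketch below. Each group has its own susceptibility, a stand-in for physiology. Groups mix through a contact matrix, a stand-in for social behavior. The equations (in the docstring) and the parameter names are assumptions for illustration, not the dissertation's models.

```python
def structured_sir(beta, contact, susceptibility, gamma, s0, i0, days, dt=0.1):
    """Euler integration of a grouped SIR model:
        dS_g/dt = -beta * sigma_g * S_g * sum_h C[g][h] * I_h / N_h
        dI_g/dt = -dS_g/dt - gamma * I_g
        dR_g/dt = gamma * I_g
    Returns final (S, I, R) lists, one entry per group."""
    groups = range(len(s0))
    n = [s0[g] + i0[g] for g in groups]        # group sizes stay constant
    s, i = list(s0), list(i0)
    r = [0.0 for _ in groups]
    for _ in range(round(days / dt)):
        force = [beta * susceptibility[g] *
                 sum(contact[g][h] * i[h] / n[h] for h in groups)
                 for g in groups]
        new_inf = [force[g] * s[g] * dt for g in groups]
        new_rec = [gamma * i[g] * dt for g in groups]
        for g in groups:
            s[g] -= new_inf[g]
            i[g] += new_inf[g] - new_rec[g]
            r[g] += new_rec[g]
    return s, i, r


s, i, r = structured_sir(beta=0.3, contact=[[1.0, 0.5], [0.5, 1.0]],
                         susceptibility=[1.0, 0.5], gamma=0.1,
                         s0=[990.0, 990.0], i0=[10.0, 10.0], days=200)
print([round(x) for x in r])
```

Here the less susceptible group ends with a smaller final epidemic size even though both groups mix with each other.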
Modeling Synergistic Relationships Between Words and Images (open access)

Texts and images provide alternative yet orthogonal views of the same underlying cognitive concept. By uncovering synergistic, semantic relationships that exist between words and images, I am working to develop novel techniques that can help improve tasks in natural language processing, as well as effective models for text-to-image synthesis, image retrieval, and automatic image annotation. Specifically, in my dissertation, I will explore the interoperability of features between language and vision tasks. In the first part, I will show how it is possible to apply features generated using evidence gathered from text corpora to solve the image annotation problem in computer vision, without the use of any visual information. In the second part, I will address research in the reverse direction and show how visual cues can be used to improve tasks in natural language processing. Importantly, I propose a novel metric to estimate the similarity of words by comparing the visual similarity of concepts invoked by these words, and show that it can be used to further advance state-of-the-art methods that employ corpus-based and knowledge-based semantic similarity measures. Finally, I attempt to construct a joint semantic space connecting words with images, and synthesize an evaluation framework to quantify cross-modal …
Date: December 2012
Creator: Leong, Chee Wee
System: The UNT Digital Library
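One simple way to realize a visually grounded word similarity is to represent each word by the mean feature vector of images retrieved for it and compare words by cosine similarity. This is assumed here for illustration and is not necessarily the metric proposed in the dissertation. The feature vectors below are made-up placeholders for whatever visual descriptors are available.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equal-length feature vectors."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def visual_word_similarity(word_a, word_b, image_features):
    """image_features maps each word to feature vectors of images retrieved for it."""
    return cosine(mean_vector(image_features[word_a]),
                  mean_vector(image_features[word_b]))


features = {
    "car": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "automobile": [[0.9, 0.1, 0.0]],
    "banana": [[0.0, 0.0, 1.0]],
}
print(visual_word_similarity("car", "automobile", features))
```

A score like this could be combined with corpus-based or knowledge-based similarity scores, for example as an extra feature.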
Secure and Energy Efficient Execution Frameworks Using Virtualization and Light-weight Cryptographic Components (open access)

Security is a primary concern in this era of pervasive computing. Hardware-based security mechanisms facilitate the construction of trustworthy secure systems; however, existing hardware security approaches require modifications to the processor micro-architecture, and such changes are extremely time-consuming and expensive to test and implement. Additionally, they incorporate cryptographic security mechanisms that are computationally intensive and consume excessive energy, which significantly degrades system performance. In this dissertation, I explore the domain of hardware-based security approaches with the objective of overcoming the issues that impede their usability. I have proposed viable solutions to successfully test and implement hardware security mechanisms in real-world computing systems. Moreover, with an emphasis on cryptographic memory integrity verification techniques and embedded systems as the target application, I have presented energy-efficient architectures that considerably reduce the energy consumption of the security mechanisms, thereby improving system performance. The detailed simulation results show that the average energy savings range from 36% to 99% during the memory integrity verification phase, whereas the total power savings of the entire embedded processor are approximately 57%.
Date: August 2014
Creator: Nimgaonkar, Satyajeet
System: The UNT Digital Library
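Memory integrity verification is commonly built on a hash (Merkle) tree whose root is kept in trusted on-chip storage. The sketch below shows only that baseline idea, in software, using SHA-256 from Python's `hashlib`. It rebuilds the whole tree for simplicity, whereas hardware schemes check one root-to-leaf path per access. It does not model the dissertation's energy-efficient architectures.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def merkle_root(blocks):
    """Hash every memory block, then hash pairs upward until one root remains."""
    level = [_h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])   # odd count: pair the last node with itself
        level = [_h(level[k] + level[k + 1]) for k in range(0, len(level), 2)]
    return level[0]


def verify_memory(blocks, trusted_root):
    """Recompute the root from (possibly tampered) memory and compare."""
    return merkle_root(blocks) == trusted_root


blocks = [f"block{i}".encode() for i in range(5)]
root = merkle_root(blocks)            # would live in trusted on-chip storage
print(verify_memory(blocks, root))
```

The hashing work this requires on every verified access is the energy cost that lighter-weight designs try to cut.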

Peptide-based hidden Markov model for peptide fingerprint mapping.

Access: Use of this item is restricted to the UNT Community
Peptide mass fingerprinting (PMF) was the first automated method for protein identification in proteomics, and it remains in common use today because of its simplicity and the low equipment costs for generating fingerprints. However, one of the problems with PMF is its limited specificity and sensitivity in protein identification. Here I present a method that shows potential to significantly enhance the accuracy of peptide mass fingerprinting, using a machine learning approach based on a hidden Markov model (HMM). This method is applied to better differentiate real protein matches from those that occur by chance. The system was trained using 300 examples of combined real and false-positive protein identification results, and 10-fold cross-validation was applied to assess model discrimination. The model achieves 93% accuracy in distinguishing correct protein identification results from false-positive matches. The area under the receiver operating characteristic (ROC) curve for the best model was 0.833.
Date: December 2004
Creator: Yang, Dongmei
System: The UNT Digital Library
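An HMM classifier of this kind can score an observation sequence under two models, one for real matches and one for chance matches. It then compares the log-likelihoods, computed with the forward algorithm. In the sketch below, the symbol encoding (0 = matched peak, 1 = unmatched peak) and the toy one-state models are hypothetical. The abstract does not describe the state structure actually used.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM).
    start[s]: initial probability, trans[p][s]: transition p->s,
    emit[s][o]: probability of emitting symbol o in state s."""
    states = range(len(start))
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[p] * trans[p][s] for p in states) * emit[s][obs[t]]
                     for s in states]
        scale = sum(alpha)           # rescale each step to avoid underflow
        log_like += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_like


def classify(obs, real_model, chance_model):
    """Label a match by the log-likelihood ratio of the two HMMs."""
    llr = (forward_log_likelihood(obs, *real_model)
           - forward_log_likelihood(obs, *chance_model))
    return "real" if llr > 0 else "false-positive"


# Toy one-state models over symbols 0 = matched peak, 1 = unmatched peak.
real_model = ([1.0], [[1.0]], [[0.8, 0.2]])
chance_model = ([1.0], [[1.0]], [[0.2, 0.8]])
print(classify([0, 0, 0, 1], real_model, chance_model))   # real
```

Sweeping the decision threshold on the log-likelihood ratio, instead of fixing it at 0, is what traces out an ROC curve like the one reported above.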
Techniques for Improving Uniformity in Direct Mapped Caches (open access)

Direct-mapped caches are an attractive option for processor designers, as they combine fast lookup times with reduced complexity and area. However, direct-mapped caches are prone to higher miss rates. Each block maps to exactly one cache line, so there is no choice of replacement candidate on a miss, and the data residing in that cache set must be evicted to the next-level cache. Another issue that inhibits cache performance is the non-uniformity of accesses exhibited by most applications: some sets are under-utilized while others receive the majority of accesses. This implies that increasing the size of caches may not lead to proportionally improved hit rates. Several solutions that address cache non-uniformity have been proposed in the literature over the past decade, and each proposal independently claims the benefit of reduced conflict misses. However, because the published results use different benchmarks and different experimental setups, there is no established frame of reference for comparing them. In this work we report a side-by-side comparison of these techniques. Finally, we propose an Adaptive-Partitioned cache for multi-threaded applications. This design limits inter-thread thrashing while dynamically reducing traffic to heavily accessed sets.
Date: May 2011
Creator: Nwachukwu, Izuchukwu Udochi
System: The UNT Digital Library
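The conflict-miss problem is easy to reproduce in a few lines. The sketch below simulates a direct-mapped cache and compares conventional modulo indexing with XOR-folded indexing, one well-known family of uniformity techniques. The abstract does not say whether XOR indexing is among the schemes compared in the thesis, and all names here are hypothetical.

```python
def simulate_direct_mapped(addresses, n_sets=8, block_size=64, index_fn=None):
    """Return (hits, per-set access counts) for a direct-mapped cache.
    The default index is the conventional block address modulo n_sets."""
    if index_fn is None:
        index_fn = lambda block: block % n_sets
    resident = [None] * n_sets       # block currently held by each set
    accesses = [0] * n_sets
    hits = 0
    for addr in addresses:
        block = addr // block_size
        s = index_fn(block)
        accesses[s] += 1
        if resident[s] == block:
            hits += 1
        else:
            resident[s] = block       # only one candidate: evict whatever was there
    return hits, accesses


def xor_index(n_sets):
    """XOR-fold the next-higher block-address bits into the index.
    Assumes n_sets is a power of two."""
    bits = n_sets.bit_length() - 1
    return lambda block: (block ^ (block >> bits)) % n_sets


trace = [0, 512] * 10                 # two blocks that collide under modulo indexing
print(simulate_direct_mapped(trace)[0])                           # 0 hits
print(simulate_direct_mapped(trace, index_fn=xor_index(8))[0])    # 18 hits
```

The per-set access counts returned alongside the hits show the non-uniformity directly. Under modulo indexing, all 20 accesses land in set 0.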

Extracting Dimensions of Interpersonal Interactions and Relationships

People interact with each other through natural language to express feelings, thoughts, intentions, instructions, etc. As a result, these interactions form relationships. Besides names of relationships such as siblings, spouses, and friends, a number of dimensions (e.g., cooperative vs. competitive, temporary vs. enduring, equal vs. hierarchical) can also be used to capture the underlying properties of interpersonal interactions and relationships. More fine-grained descriptors (e.g., angry, rude, nice, supportive) can also be used to indicate the reasons or social acts behind the cooperative vs. competitive dimension. The way people interact with others may also reveal their personal traits, which in turn may indicate their probable future success. The work presented in this dissertation involves creating corpora with fine-grained descriptors of interactions and relationships. We also describe experiments whose results indicate that identifying these dimensions can be automated.
Date: August 2020
Creator: Rashid, Farzana
System: The UNT Digital Library
Traffic Forecasting Applications Using Crowdsourced Traffic Reports and Deep Learning (open access)

Intelligent transportation systems (ITS) are essential tools for traffic planning, analysis, and forecasting that can utilize the huge amount of traffic data available today. In this work, we aggregated six months of detailed traffic flow sensor data, Waze reports, OpenStreetMap (OSM) features, and weather data from the California Bay Area. Using these data, we studied three novel ITS applications using convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The first experiment analyzes the relation between roadway shapes and accident occurrence; results show that the speed limit and number of lanes are significant predictors of major accidents on highways. The second experiment presents a novel method for forecasting congestion severity using crowdsourced data only (Waze, OSM, and weather), without the need for traffic sensor data. The third experiment studies the improvement of traffic flow forecasting using accidents, number of lanes, weather, and time-related features; results show significant performance improvements when the additional features were used.
Date: May 2020
Creator: Alammari, Ali
System: The UNT Digital Library
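Before a CNN or RNN can forecast traffic flow with extra features, the time series must be cut into supervised (window, target) pairs. A minimal sketch of that step follows. The feature layout (flow first, then per-timestep extras such as an accident flag) is an assumption, not the thesis's exact input format.

```python
def make_windows(flow, extra, history=4, horizon=1):
    """Turn a time series into supervised samples for sequence models.
    flow:  traffic-flow readings at a fixed interval.
    extra: per-timestep lists of additional features (e.g. accident flag,
           number of lanes, rainfall), aligned with `flow`.
    Each input is `history` consecutive steps of [flow] + extra features;
    the target is the flow `horizon` steps after the end of the window."""
    inputs, targets = [], []
    for t in range(history, len(flow) - horizon + 1):
        window = [[flow[k]] + list(extra[k]) for k in range(t - history, t)]
        inputs.append(window)
        targets.append(flow[t + horizon - 1])
    return inputs, targets


flow = [10, 20, 30, 40, 50, 60]
extra = [[0], [0], [1], [0], [0], [1]]   # hypothetical accident flag per step
X, y = make_windows(flow, extra, history=3)
print(X[0], y)
```

Dropping the `extra` columns gives the flow-only baseline, so the two windowed datasets can be compared directly.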
Cross Language Information Retrieval for Languages with Scarce Resources (open access)

Our generation has experienced one of the most dramatic changes in how society communicates. Today, we have online information on almost any imaginable topic. However, most of this information is available in only a few dozen languages. In this thesis, I explore the use of parallel texts to enable cross-language information retrieval (CLIR) for languages with scarce resources. To build the parallel text I use the Bible. I evaluate different variables and their impact on the resulting CLIR system, specifically: (1) the CLIR results when using different amounts of parallel text; (2) the role of paraphrasing on the quality of the CLIR output; (3) the impact on accuracy when translating the query versus translating the collection of documents; and finally (4) how the results are affected by the use of different dialects. The results show that all these variables have a direct impact on the quality of the CLIR system.
Date: May 2009
Creator: Loza, Christian
System: The UNT Digital Library
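A minimal way to use Bible verses as parallel text for query translation is to score word pairs by how often they co-occur in aligned verses. The sketch below uses the Dice coefficient, a deliberately simple stand-in for the statistical alignment models a real CLIR system would use. The function names and the toy Spanish-English verses are hypothetical.

```python
from collections import Counter
from itertools import product


def dice_table(parallel_pairs):
    """Dice score 2*c(s,t) / (c(s) + c(t)) over aligned (source, target) verses,
    counting each word at most once per verse."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in parallel_pairs:
        s_set, t_set = set(src), set(tgt)
        src_count.update(s_set)
        tgt_count.update(t_set)
        joint.update(product(s_set, t_set))
    return {(s, t): 2 * c / (src_count[s] + tgt_count[t])
            for (s, t), c in joint.items()}


def translate_query(query, table):
    """Replace each query term with its best-scoring target word;
    terms never seen in the parallel text are dropped."""
    out = []
    for q in query:
        candidates = [(score, t) for (s, t), score in table.items() if s == q]
        if candidates:
            out.append(max(candidates)[1])
    return out


pairs = [(["dios", "amor"], ["god", "love"]),
         (["dios", "luz"], ["god", "light"]),
         (["amor"], ["love"])]
print(translate_query(["dios", "luz"], dice_table(pairs)))
```

Translating the document collection instead of the query would apply the same table in the opposite direction, which is one of the variables the thesis evaluates.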
Extracting Possessions and Their Attributes (open access)

Possession is an asymmetric semantic relation between two entities, where one entity (the possessee) belongs to the other entity (the possessor). Automatically extracting possessions is useful for identifying skills, for recommender systems, and for natural language understanding. Possessions can be found in different communication modalities, including text, images, videos, and audio. In this dissertation, I elaborate on the techniques I used to extract possessions. I begin by extracting possessions at the sentence level, including their type and temporal anchors. Then, I extract the duration of possessions and co-possessions (when multiple possessors possess the same entity). Next, I extract possessions from entire Wikipedia articles, capturing the change of possessors over time. I also extract possessions from social media, including both text and images. Finally, I present dense annotations that generate possession timelines. I present separate datasets, detailed corpus analyses, and machine learning models for each task described above.
Date: May 2020
Creator: Chinnappa, Dhivya Infant
System: The UNT Digital Library

Deep Learning Optimization and Acceleration

The novelty of this dissertation lies in the optimization and acceleration of deep neural networks for real-time prediction with minimal energy consumption. It consists of cross-layer optimization, output-directed dynamic quantization, and opportunistic near-data computation for deep neural network acceleration. The proposed optimization and acceleration frameworks are tested on two datasets (CIFAR-10 and CIFAR-100) using a variety of convolutional neural networks (e.g., LeNet-5, VGG-16, GoogLeNet, DenseNet, ResNet). Experimental results are promising compared with other state-of-the-art deep neural network acceleration efforts in the literature.
Date: August 2022
Creator: Jiang, Beilei
System: The UNT Digital Library
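Quantization, one of the components named in the deep learning optimization abstract, reduces weights or activations to low-bit integers. The sketch below shows plain symmetric uniform quantization with a single scale factor. It does not reproduce the dissertation's output-directed, dynamic policy for choosing precision, and the function names are hypothetical.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers in [-qmax, qmax] using one scale factor,
    where qmax = 2**(num_bits - 1) - 1 and scale = max|v| / qmax."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Approximate reconstruction; the error per value is at most scale / 2."""
    return [x * scale for x in q]


weights = [-1.0, 0.0, 0.5, 1.0]
q, scale = quantize_symmetric(weights)
print(q, dequantize(q, scale))
```

A dynamic scheme would vary `num_bits` per layer or per input. Lowering it shrinks storage and arithmetic cost but raises the reconstruction error bound of scale / 2.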
Distributed Frameworks Towards Building an Open Data Architecture (open access)

Data is everywhere. Current technological advances in digital and social media, together with the ready availability of application services that interact with a variety of systems, are generating tremendous volumes of data. Because of these varied services, data is no longer restricted to structured formats; it also includes unstructured content such as social media posts, videos, and images. Generated data is of no use unless it is stored and analyzed to derive value. Traditional database systems come with limitations on data format schemas, access rates, storage sizes, etc. Hadoop is an Apache open-source distributed framework. It supports reliably storing huge datasets in different formats on its file system, the Hadoop Distributed File System (HDFS), and processing the stored data using the MapReduce programming model. This thesis is about building a data architecture using Hadoop and its related open-source distributed frameworks to support a data flow pipeline on low-cost commodity hardware. The data flow components are data sourcing, storage management on HDFS, and a data access layer. This study also discusses a use case that makes use of the architecture components. Sqoop, a framework …
Date: May 2015
Creator: Venumuddala, Ramu Reddy
System: The UNT Digital Library
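The MapReduce programming model mentioned in the Hadoop abstract has three steps, which can be illustrated in-process:

1. A map step emits key/value pairs.
2. A shuffle groups the values by key.
3. A reduce step aggregates each group.

This is a plain Python simulation of the model, not Hadoop's API. On a cluster, the same map and reduce logic would run as distributed tasks over data stored in HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values by key (the framework does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate the values collected for one key."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


print(word_count(["Data is everywhere", "data is big"]))
```

Because each map call sees only one line and each reduce call sees only one key, both phases can be spread across machines without coordination.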
The Influence of Social Network Graph Structure on Disease Dynamics in a Simulated Environment (open access)

The fight against epidemics/pandemics is one of man versus nature. Technological advances have not only improved existing methods for monitoring and controlling disease outbreaks, but have also provided new means for investigation, such as modeling and simulation. This dissertation explores the relationship between social structure and disease dynamics. Social structures are modeled as graphs, and outbreaks are simulated based on a well-recognized standard, the susceptible-infectious-removed (SIR) paradigm. Two independent but related studies are presented. The first measures the severity of outbreaks as social network parameters are altered. The second investigates the efficacy of various vaccination policies based on social structure. Three disease-related centrality measures are introduced (contact, transmission, and spread centrality), which are related to the previously established centrality measures of degree, betweenness, and closeness, respectively. The results of the experiments presented in this dissertation indicate that reducing the neighborhood size along with outside-of-neighborhood contacts diminishes the severity of disease outbreaks. Vaccination strategies can effectively reduce these parameters. Additionally, vaccination policies that target individuals with high centrality are generally shown to be slightly more effective than a random vaccination policy. These results combined with past and future studies will assist public health officials in their effort to minimize the effects …
Date: December 2010
Creator: Johnson, Tina V.
System: The UNT Digital Library
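The vaccination experiments can be sketched as a discrete-time stochastic SIR process on a contact graph, comparing outbreaks with and without immunizing the highest-centrality nodes. Degree stands in below for the dissertation's contact centrality. The transmission rule and the parameter names are assumptions.

```python
import random


def simulate_sir(adj, beta, gamma, seeds, vaccinated=(), rng=None):
    """Discrete-time SIR on a graph (adj: node -> list of neighbors).
    Each step, every infected node infects each susceptible neighbor with
    probability beta, then recovers with probability gamma.
    Returns the outbreak size (number of removed nodes)."""
    rng = rng or random.Random(0)
    state = {n: "S" for n in adj}
    for n in vaccinated:
        state[n] = "V"                       # immune from the start
    infected = [n for n in seeds if state[n] == "S"]
    for n in infected:
        state[n] = "I"
    while infected:
        newly = []
        for n in infected:
            for m in adj[n]:
                if state[m] == "S" and rng.random() < beta:
                    state[m] = "I"
                    newly.append(m)
        still = []
        for n in infected:
            if rng.random() < gamma:
                state[n] = "R"
            else:
                still.append(n)
        infected = still + newly
    return sum(1 for s in state.values() if s == "R")


def top_degree(adj, k):
    """The k highest-degree nodes: a simple centrality-targeted policy."""
    return sorted(adj, key=lambda n: len(adj[n]), reverse=True)[:k]


star = {0: list(range(1, 11)), **{i: [0] for i in range(1, 11)}}
print(simulate_sir(star, 1.0, 1.0, seeds=[1]),
      simulate_sir(star, 1.0, 1.0, seeds=[1], vaccinated=top_degree(star, 1)))
```

On this star graph, vaccinating the single hub stops the outbreak at the seed node. Averaging many random runs on realistic graphs is how such policies would actually be compared.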
Multilingual Word Sense Disambiguation Using Wikipedia (open access)

Ambiguity is inherent to human language. In particular, word sense ambiguity is prevalent in all natural languages, with a large number of the words in any given language carrying more than one meaning. Word sense disambiguation is the task of automatically assigning the most appropriate meaning to a polysemous word within a given context. In the literature, work on resolving ambiguity has generally revolved around the famous quote “You shall know a word by the company it keeps.” In this thesis, we investigate the role of context in resolving ambiguity through three different approaches. Instead of using a predefined monolingual sense inventory such as WordNet, we use a language-independent framework in which the word senses and sense-tagged data are derived automatically from Wikipedia. Using Wikipedia as a source of sense annotations provides a much-needed solution to the knowledge acquisition bottleneck. To evaluate the viability of Wikipedia-based sense annotations, we cast the task of disambiguating polysemous nouns as a monolingual classification task and experimented on lexical samples from four languages (English, German, Italian, and Spanish). The experiments confirm that the Wikipedia-based sense annotations are reliable and can be used to construct accurate monolingual sense classifiers. …
Date: August 2013
Creator: Dandala, Bharath
System: The UNT Digital Library
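Sense-tagged data can be derived from Wikipedia by treating the anchor text of an internal link as the ambiguous word and the linked article title as its sense label. The regular expression below handles only simple `[[Target|anchor]]` and `[[Target]]` links and uses the whole text as context. A real extractor would need a proper wikitext parser and sentence-level contexts. The function names are hypothetical.

```python
import re

# [[Target]] or [[Target|anchor text]]
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def plain_text(wikitext):
    """Replace each link with its visible anchor text."""
    return LINK.sub(lambda m: m.group(2) or m.group(1), wikitext)


def sense_examples(wikitext, word):
    """(sense, context) pairs for every link whose anchor text is `word`;
    the linked article title serves as the sense label."""
    context = plain_text(wikitext)
    examples = []
    for m in LINK.finditer(wikitext):
        target, anchor = m.group(1), m.group(2) or m.group(1)
        if anchor.lower() == word.lower():
            examples.append((target, context))
    return examples


text = "He passed the [[Bar (law)|bar]] exam and later opened a [[Bar (establishment)|bar]]."
for sense, context in sense_examples(text, "bar"):
    print(sense, "|", context)
```

Because link markup looks the same in every language edition, the same extraction applies unchanged to English, German, Italian or Spanish articles.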
Building Reliable and Cost-Effective Storage Systems for High-Performance Computing Datacenters (open access)

In this dissertation, I first incorporate declustered redundant array of independent disks (RAID) technology into the existing system by maximizing the aggregate recovery I/O and accelerating post-failure remediation. Our analytical model confirms that accelerating the data recovery stage significantly improves storage reliability. Then I present a proactive data protection framework that augments storage availability and reliability. It uses failure prediction methods to rescue data on drives before failures occur, which significantly reduces storage downtime and lowers the risk of nested failures. Finally, I investigate how an active storage system enables energy-efficient computing. I explore an emerging storage device, the Ethernet drive, to offload data-intensive workloads from the host to the drives and process the data there. This not only minimizes data movement and power usage but also enhances data availability and storage scalability. In summary, my dissertation research provides intelligence at the drive, storage node, and system levels to tackle the rising reliability challenge in modern HPC datacenters. The results indicate that this novel storage paradigm cost-effectively improves storage scalability, availability, and reliability.
Date: August 2020
Creator: Qiao, Zhi
System: The UNT Digital Library
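Declustered RAID speeds recovery by spreading each stripe over a subset of a larger disk pool. Rebuild reads are then shared by many surviving disks instead of a few. The sketch below uses random stripe placement, a simplification of the structured layouts real systems use, and XOR parity for reconstruction. All names are hypothetical.

```python
import random
from collections import Counter


def declustered_layout(n_disks, stripe_width, n_stripes, seed=0):
    """Place each stripe's units on a random subset of the disk pool.
    With n_disks == stripe_width this degenerates to a conventional RAID group."""
    rng = random.Random(seed)
    return [rng.sample(range(n_disks), stripe_width) for _ in range(n_stripes)]


def rebuild_reads(layout, failed_disk):
    """Units each surviving disk must read to rebuild the failed one
    (all surviving units of every affected stripe, as with XOR parity)."""
    reads = Counter()
    for stripe in layout:
        if failed_disk in stripe:
            for disk in stripe:
                if disk != failed_disk:
                    reads[disk] += 1
    return reads


def xor_parity(blocks):
    """Bytewise XOR of equal-length blocks; XOR of survivors + parity recovers a lost block."""
    parity = bytes(len(blocks[0]))
    for block in blocks:
        parity = bytes(x ^ y for x, y in zip(parity, block))
    return parity


classic = rebuild_reads(declustered_layout(5, 5, 1000), failed_disk=0)
spread = rebuild_reads(declustered_layout(20, 5, 1000), failed_disk=0)
print(max(classic.values()), max(spread.values()))
```

The maximum per-disk read count approximates the rebuild bottleneck. Lowering it is what shortens the recovery window and, in turn, the exposure to a second failure.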
Radio Resource Control Approaches for LTE-Advanced Femtocell Networks (open access)

The architecture of mobile networks has evolved dramatically to fulfill growing demands for wireless services and data. The radio resources used by current mobile networks are limited, while user demands are increasing substantially. In the future, a tremendous number of Internet applications are expected to be served by mobile networks. Therefore, increasing the capacity of mobile networks has become a vital issue. Heterogeneous networks (HetNets) have been considered a promising paradigm for future mobile networks. Accordingly, the concept of small cells has been introduced to increase the capacity of mobile networks. A femtocell network is a kind of small cell network. Femtocells are deployed within the coverage of macrocells; they cover small areas and operate with low transmission power while providing high capacity. In addition, user equipment (UEs) can be offloaded from macrocells to femtocells, thus increasing capacity. However, this introduces several technical challenges. Interference has become one of the key challenges for deploying femtocells within a macrocell's coverage area, and it can degrade the performance of mobile networks. Therefore, radio resource management mechanisms are needed to address the key challenges of deploying femtocells. The objective of …
Date: August 2018
Creator: Alotaibi, Sultan Radhi
System: The UNT Digital Library
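Femtocell interference is usually quantified through the signal-to-interference-plus-noise ratio (SINR), which in turn bounds achievable throughput. The helpers below compute SINR from received powers in dBm, plus a Shannon-capacity estimate. Two things are assumptions for illustration: the default noise floor of -104 dBm (thermal noise over roughly 10 MHz, ignoring noise figure) and all the function names.

```python
import math


def dbm_to_mw(dbm):
    return 10 ** (dbm / 10)


def sinr_db(serving_rx_dbm, interferer_rx_dbm, noise_dbm=-104.0):
    """SINR in dB from received powers in dBm; interference from co-channel
    macro/femto cells adds linearly (in mW) to the noise floor."""
    signal = dbm_to_mw(serving_rx_dbm)
    interference = sum(dbm_to_mw(p) for p in interferer_rx_dbm)
    return 10 * math.log10(signal / (interference + dbm_to_mw(noise_dbm)))


def shannon_capacity_bps(bandwidth_hz, sinr_db_value):
    """Upper bound on throughput: B * log2(1 + SINR)."""
    return bandwidth_hz * math.log2(1 + 10 ** (sinr_db_value / 10))


# Hypothetical femtocell user: strong serving signal, one nearby macro interferer.
s = sinr_db(-60.0, [-75.0])
print(round(s, 1), round(shannon_capacity_bps(10e6, s) / 1e6, 1), "Mbit/s")
```

Radio resource management schemes can be compared by how they raise this SINR for macro and femto users, for example by assigning interfering cells to different resource blocks.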
Investigation on Segmentation, Recognition and 3D Reconstruction of Objects Based on LiDAR Data Or MRI (open access)

Segmentation, recognition, and 3D reconstruction of objects have been cutting-edge research topics with many applications, ranging from environmental, medical, and geographical applications to intelligent transportation. In this dissertation, I focus on the segmentation, recognition, and 3D reconstruction of objects using LiDAR data/MRI. The three main contributions are as follows. (I) A feature extraction algorithm based on sparse LiDAR data. A novel method has been proposed for feature extraction from sparse LiDAR data. The algorithm and the related principles have been described. I have also tested and discussed the choices and roles of the parameters. By directly using the correlation of neighboring points, the statistical distribution of normal vectors at each point has been used effectively to determine the category of the selected point. (II) Segmentation and 3D reconstruction of objects based on LiDAR/MRI. The proposed method layers the 3D LiDAR data, segments the different categories, and reconstructs the 3D canopy surfaces of individual tree crowns and clusters of trees from LiDAR point data based on a region active contour model. The proposed method allows for natural delineation of the 3D forest canopy from the contours of raw LiDAR point clouds. The proposed model is suitable not …
Date: May 2015
Creator: Tang, Shijun
System: The UNT Digital Library
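The normal-vector feature in part (I) can be illustrated by fitting a least-squares plane to a point's neighbors and measuring how far the resulting normal tilts from vertical. This sketch rests on stated assumptions. The plane model z = ax + by + c cannot represent vertical surfaces; PCA-based normal estimation is a common alternative. The dissertation's neighborhood selection and category rules are not reproduced.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_normal(points):
    """Least-squares plane z = a*x + b*y + c through (x, y, z) neighbors,
    solved by Cramer's rule; returns the unit normal (-a, -b, 1) / norm.
    Needs at least three non-collinear points."""
    n = len(points)
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    det = _det3(A)
    coeffs = []
    for col in range(3):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(_det3(m) / det)
    a, b, _ = coeffs
    norm = math.sqrt(a * a + b * b + 1.0)
    return (-a / norm, -b / norm, 1.0 / norm)


def tilt_degrees(normal):
    """Angle between the normal and the vertical axis (0 = horizontal surface)."""
    return math.degrees(math.acos(max(-1.0, min(1.0, normal[2]))))


roof = [(0, 0, 2.0), (1, 0, 2.5), (0, 1, 2.0), (1, 1, 2.5), (2, 1, 3.0)]
print(round(tilt_degrees(fit_normal(roof)), 2))
```

Collecting these tilts over many overlapping neighborhoods gives a per-point distribution of normals. Flat ground tends to produce tightly clustered normals, while vegetation tends to produce widely scattered ones.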

Secure and Decentralized Data Cooperatives via Reputation Systems and Blockchain

This dissertation focuses on a novel area of secure data management referred to as data cooperatives. A data cooperative solution promises its users better protection and control of their personal data than the traditional handling of such data by data collectors (such as governments, big data companies, and others). However, despite the many benefits that the data cooperative approach can provide its users, it faces several challenges that hinder its development, adoption, and widespread use among data providers and consumers. To address these issues, we divide this dissertation into two parts. In the first part, we identify the existing challenges and propose and implement a decentralized architecture built atop a blockchain system. Our solution leverages the inherent decentralization, tamper resistance, and security properties of the blockchain. We implemented our system on an existing blockchain test network, Ropsten, and our results show that blockchain is an efficient and scalable platform for developing a decentralized data cooperative solution. In the second part of this work, we further address the existing challenges and the limitations of the implementation from the first part. In particular, we address inclusivity, a core …
Date: December 2022
Creator: Salau, Abiola
System: The UNT Digital Library
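The data cooperative design relies on blockchain tamper resistance. That property can be illustrated with a minimal hash-chained ledger in which each block commits to its predecessor's hash, plus a simple average-rating reputation score. This is not an Ethereum or Ropsten contract and not the dissertation's reputation system. The record fields and function names are hypothetical.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 over a canonical JSON encoding of the block."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()


def append_record(chain, record):
    """Append a record that commits to the hash of the previous block."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev_hash, "record": record})


def verify_chain(chain):
    """True if every block still matches the hash its successor committed to.
    Editing the newest block is only caught once a later block (or an external
    anchor) commits to it."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))


def reputation(chain, member):
    """Mean rating received by `member`, or None if unrated."""
    ratings = [b["record"]["rating"] for b in chain if b["record"]["subject"] == member]
    return sum(ratings) / len(ratings) if ratings else None


chain = []
append_record(chain, {"subject": "alice", "rater": "bob", "rating": 4})
append_record(chain, {"subject": "alice", "rater": "carol", "rating": 2})
append_record(chain, {"subject": "bob", "rater": "alice", "rating": 5})
print(verify_chain(chain), reputation(chain, "alice"))
```

A real deployment replaces this single list with a consensus-replicated ledger, so no single party can rewrite the chain and recompute every hash.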