Enhancing Storage Dependability and Computing Energy Efficiency for Large-Scale High Performance Computing Systems

Access: Use of this item is restricted to the UNT Community
With the advent of information explosion age, larger capacity disk drives are used to store data and powerful devices are used to process big data. As the scale and complexity of computer systems increase, we expect these systems to provide dependable and energy-efficient services and computation. Although hard drives are reliable in general, they are the most commonly replaced hardware components. Disk failures cause data corruption and even data loss, which can significantly affect system performance and financial losses. In this dissertation research, I analyze different manifestations of disk failures in production data centers and explore data mining techniques combined with statistical analysis methods to discover categories of disk failures and their distinctive properties. I use similarity measures to quantify the degradation process of each failure type and derive the degradation signature. The derived degradation signatures are further leveraged to forecast when future disk failures may happen. Meanwhile, this dissertation also studies energy efficiency of high performance computers. Specifically, I characterize the power and energy consumption of Haswell processors which are used in multiple supercomputers, and analyze the power and energy consumption of Legion, a data-centric programming model and runtime system, and Legion applications. We find that power and energy …
Date: May 2019
Creator: Huang, Song
System: The UNT Digital Library

Revealing the Positive Meaning of a Negation

Access: Use of this item is restricted to the UNT Community
Negation is a complex phenomenon present in all human languages, allowing for the uniquely human capacities of denial, contradiction, misrepresentation, lying, and irony. It is in the first place a phenomenon of semantical opposition. Sentences containing negation are generally (a) less informative than affirmative ones, (b) morphosyntactically more marked—all languages have negative markers while only a few have affirmative markers, and (c) psychologically more complex and harder to process. Negation often conveys positive meaning. This meaning ranges from implicatures to entailments. In this dissertation, I develop a system to reveal the underlying positive interpretation of negation. I first identify which words are intended to be negated (i.e, the focus of negation) and second, I rewrite those tokens to generate an actual positive interpretation. I identify the focus of negation by scoring probable foci along a continuous scale. One of the obstacles to exploring foci scoring is that no public datasets exist for this task. Thus, to study this problem I create new corpora. The corpora contain verbal, nominal and adjectival negations and their potential positive interpretations along with their scores ranging from 1 to 5. Then, I use supervised learning models for scoring the focus of negation. In order to …
Date: May 2019
Creator: Sarabi, Zahra
System: The UNT Digital Library

Parallel Analysis of Aspect-Based Sentiment Summarization from Online Big-Data

Access: Use of this item is restricted to the UNT Community
Consumer's opinions and sentiments on products can reflect the performance of products in general or in various aspects. Analyzing these data is becoming feasible, considering the availability of immense data and the power of natural language processing. However, retailers have not taken full advantage of online comments. This work is dedicated to a solution for automatically analyzing and summarizing these valuable data at both product and category levels. In this research, a system was developed to retrieve and analyze extensive data from public online resources. A parallel framework was created to make this system extensible and efficient. In this framework, a star topological network was adopted in which each computing unit was assigned to retrieve a fraction of data and to assess sentiment. Finally, the preprocessed data were collected and summarized by the central machine which generates the final result that can be rendered through a web interface. The system was designed to have sound performance, robustness, manageability, extensibility, and accuracy.
Date: May 2019
Creator: Wei, Jinliang
System: The UNT Digital Library

A Performance and Security Analysis of Elliptic Curve Cryptography Based Real-Time Media Encryption

Access: Use of this item is restricted to the UNT Community
This dissertation emphasizes the security aspects of real-time media. The problems of existing real-time media protections are identified in this research, and viable solutions are proposed. First, the security of real-time media depends on the Secure Real-time Transport Protocol (SRTP) mechanism. We identified drawbacks of the existing SRTP Systems, which use symmetric key encryption schemes, which can be exploited by attackers. Elliptic Curve Cryptography (ECC), an asymmetric key cryptography scheme, is proposed to resolve these problems. Second, the ECC encryption scheme is based on elliptic curves. This dissertation explores the weaknesses of a widely used elliptic curve in terms of security and describes a more secure elliptic curve suitable for real-time media protection. Eighteen elliptic curves had been tested in a real-time video transmission system, and fifteen elliptic curves had been tested in a real-time audio transmission system. Based on the performance, X9.62 standard 256-bit prime curve, NIST-recommended 256-bit prime curves, and Brainpool 256-bit prime curves were found to be suitable for real-time audio encryption. Again, X9.62 standard 256-bit prime and 272-bit binary curves, and NIST-recommended 256-bit prime curves were found to be suitable for real-time video encryption.The weaknesses of NIST-recommended elliptic curves are discussed and a more secure new …
Date: December 2019
Creator: Sen, Nilanjan
System: The UNT Digital Library

SurfKE: A Graph-Based Feature Learning Framework for Keyphrase Extraction

Access: Use of this item is restricted to the UNT Community
Current unsupervised approaches for keyphrase extraction compute a single importance score for each candidate word by considering the number and quality of its associated words in the graph and they are not flexible enough to incorporate multiple types of information. For instance, nodes in a network may exhibit diverse connectivity patterns which are not captured by the graph-based ranking methods. To address this, we present a new approach to keyphrase extraction that represents the document as a word graph and exploits its structure in order to reveal underlying explanatory factors hidden in the data that may distinguish keyphrases from non-keyphrases. Experimental results show that our model, which uses phrase graph representations in a supervised probabilistic framework, obtains remarkable improvements in performance over previous supervised and unsupervised keyphrase extraction systems.
Date: August 2019
Creator: Florescu, Corina Andreea
System: The UNT Digital Library

Enhanced Approach for the Classification of Ulcerative Colitis Severity in Colonoscopy Videos Using CNN

Access: Use of this item is restricted to the UNT Community
Ulcerative colitis (UC) is a chronic inflammatory disease characterized by periods of relapses and remissions affecting more than 500,000 people in the United States. To achieve the therapeutic goals of UC, which are to first induce and then maintain disease remission, doctors need to evaluate the severity of UC of a patient. However, it is very difficult to evaluate the severity of UC objectively because of non-uniform nature of symptoms and large variations in their patterns. To address this, in our previous works, we developed two different approaches in which one is using the image textures, and the other is using CNN (convolutional neural network) to measure and classify objectively the severity of UC presented in optical colonoscopy video frames. But, we found that the image texture based approach could not handle larger number of variations in their patterns, and the CNN based approach could not achieve very high accuracy. In this paper, we improve our CNN based approach in two ways to provide better accuracy for the classification. We add more thorough and essential preprocessing, and generate more classes to accommodate large variations in their patterns. The experimental results show that the proposed preprocessing can improve the overall accuracy …
Date: August 2019
Creator: Sure, Venkata Leela
System: The UNT Digital Library

Mining Biomedical Data for Hidden Relationship Discovery

Access: Use of this item is restricted to the UNT Community
With an ever-growing number of publications in the biomedical domain, it becomes likely that important implicit connections between individual concepts of biomedical knowledge are overlooked. Literature based discovery (LBD) is in practice for many years to identify plausible associations between previously unrelated concepts. In this paper, we present a new, completely automatic and interactive system that creates a graph-based knowledge base to capture multifaceted complex associations among biomedical concepts. For a given pair of input concepts, our system auto-generates a list of ranked subgraphs uncovering possible previously unnoticed associations based on context information. To rank these subgraphs, we implement a novel ranking method using the context information obtained by performing random walks on the graph. In addition, we enhance the system by training a Neural Network Classifier to output the likelihood of the two concepts being likely related, which provides better insights to the end user.
Date: August 2019
Creator: Dharmavaram, Sirisha
System: The UNT Digital Library

Spatial Partitioning Algorithms for Solving Location-Allocation Problems

Access: Use of this item is restricted to the UNT Community
This dissertation presents spatial partitioning algorithms to solve location-allocation problems. Location-allocations problems pertain to both the selection of facilities to serve demand at demand points and the assignment of demand points to the selected or known facilities. In the first part of this dissertation, we focus on the well known and well-researched location-allocation problem, the "p-median problem", which is a distance-based location-allocation problem that involves selection and allocation of p facilities for n demand points. We evaluate the performance of existing p-median heuristic algorithms and investigate the impact of the scale of the problem, and the spatial distribution of demand points on the performance of these algorithms. Based on the results from this comparative study, we present guidelines for location analysts to aid them in selecting the best heuristic and corresponding parameters depending on the problem at hand. Additionally, we found that existing heuristic algorithms are not suitable for solving large-scale p-median problems in a reasonable amount of time. We present a density-based decomposition methodology to solve large-scale p-median problems efficiently. This algorithm identifies dense clusters in the region and uses a MapReduce procedure to select facilities in the clustered regions independently and combine the solutions from the subproblems. Lastly, …
Date: December 2019
Creator: Gwalani, Harsha
System: The UNT Digital Library