Boosting for Learning From Imbalanced, Multiclass Data Sets (open access)

In many real-world applications, it is common to have an uneven number of examples among multiple classes. Such data imbalance usually complicates the learning process, especially for the minority classes, and degrades performance. Boosting methods have been proposed to handle the imbalance problem, but they require long training times and need diversity among the classifiers of the ensemble to achieve improved performance. Additionally, extending boosting methods to multi-class data sets is not straightforward. Applications that suffer from imbalanced multi-class data include face recognition, where tens of classes exist, and capsule endoscopy, which suffers from massive imbalance between the classes. This dissertation introduces RegBoost, a new boosting framework that addresses imbalanced, multi-class problems. The method applies a weighted stratified sampling technique and incorporates a regularization term that accommodates multi-class data sets and automatically determines the error bound of each base classifier. The regularization parameter penalizes the classifier when it misclassifies instances that were correctly classified in the previous iteration; it also reduces the bias toward majority classes. Experiments are conducted on 12 diverse data sets with moderate to high imbalance ratios. The results demonstrate superior performance of the proposed method compared …
Date: December 2013
Creator: Abouelenien, Mohamed
System: The UNT Digital Library
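As a rough illustration of the weighted stratified sampling step mentioned in the RegBoost abstract above, the following Python sketch draws the same number of training instances from every class, choosing instances within a class in proportion to their current boosting weights. The function names, the equal per-class quota, and sampling with replacement are assumptions for illustration; the abstract does not give the dissertation's actual sampling scheme or regularization formula. The helper `newly_misclassified` only counts the instances the regularization term is described as targeting (correct in the previous iteration, wrong now); it is not RegBoost's penalty.

```python
import random
from collections import defaultdict


def weighted_stratified_sample(labels, weights, per_class, seed=0):
    """Draw `per_class` instance indices from every class, with replacement.

    Within a class, an index is picked with probability proportional to its
    current boosting weight, so hard instances are revisited more often,
    while each class (minority classes included) contributes equally.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    sample = []
    for y in sorted(by_class):
        idx = by_class[y]
        sample.extend(rng.choices(idx, weights=[weights[i] for i in idx], k=per_class))
    return sample


def newly_misclassified(prev_correct, curr_correct):
    """Count instances classified correctly last round but wrongly now."""
    return sum(1 for p, c in zip(prev_correct, curr_correct) if p and not c)
```

For example, with 8 instances of class `"a"` and 2 of class `"b"`, `weighted_stratified_sample(labels, [1.0] * 10, per_class=5)` returns 5 indices from each class, so the minority class is no longer swamped in the training sample.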
Online Construction of Android Application Test Suites (open access)

Mobile applications play an important role in the dissemination of computing and information resources. They are often used in domains such as mobile banking, e-commerce, and health monitoring, where cost-effective testing techniques are critical. This dissertation contributes novel techniques for the automatic construction of mobile application test suites. In particular, this work provides solutions that address the prohibitively large number of possible event sequences that must be sampled in GUI-based mobile applications. This work makes three major contributions: (1) an automated GUI testing tool, Autodroid, that implements a novel online approach to the automatic construction of Android application test suites; (2) probabilistic and combinatorial-based algorithms that systematically sample the input space of Android applications to generate test suites with GUI/context events; and (3) empirical studies that evaluate the cost-effectiveness of our techniques on real-world Android applications. Our experiments show that our techniques achieve better code coverage and event coverage than random test generation. We demonstrate that our techniques are useful for the automatic construction of Android application test suites in the absence of source code and preexisting abstract models of an Application Under Test (AUT). The insights derived from our empirical studies provide guidance to researchers and practitioners involved …
Date: December 2017
Creator: Adamo, David T., Jr.
System: The UNT Digital Library
3GPP Long Term Evolution LTE Scheduling (open access)

Future generation cellular networks are expected to deliver ubiquitous broadband access to an ever-increasing number of subscribers. Long Term Evolution (LTE) represents a significant milestone toward the wireless networks known as 4G cellular networks. A key feature of LTE is its enhanced Radio Resource Management (RRM) mechanisms, which improve system performance. The structure of LTE networks was simplified by reducing the number of nodes in the core network, and the design of the radio protocol architecture is also distinctive. To achieve high data rates in LTE, the 3rd Generation Partnership Project (3GPP) selected Orthogonal Frequency Division Multiplexing (OFDM) for the downlink. For the uplink, however, Single-Carrier Frequency Division Multiple Access (SC-FDMA) is used because of the peak-to-average power ratio (PAPR) constraint. As part of RRM, LTE packet scheduling plays a primary role in improving the system's data rate and in supporting the various QoS requirements of mobile services. The main function of the LTE packet scheduler is to assign Physical Resource Blocks (PRBs) to mobile User Equipment (UE). In this work, we propose a packet scheduling algorithm. The proposed scheduling algorithm operates based on the number …
Date: December 2013
Creator: Alotaibi, Sultan
System: The UNT Digital Library
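The LTE abstract above describes the scheduler's core job as assigning PRBs to UEs. The proposed algorithm itself is truncated in the abstract, so the sketch below shows only the widely used proportional-fair (PF) rule as a point of reference: each PRB goes to the UE with the highest ratio of instantaneous achievable rate to long-term average throughput. The input layout and function name are hypothetical, and a real scheduler would also update the averages every scheduling interval and respect QoS constraints.

```python
def proportional_fair_assign(inst_rate, avg_throughput):
    """Assign each PRB to the UE with the largest PF metric.

    inst_rate[ue][prb]: achievable rate of a UE on a PRB in this scheduling
        interval (hypothetical input layout).
    avg_throughput[ue]: that UE's long-term average throughput (> 0).
    Returns a list mapping PRB index -> UE index.
    """
    num_ues = len(inst_rate)
    num_prbs = len(inst_rate[0])
    assignment = []
    for prb in range(num_prbs):
        # PF metric: instantaneous rate relative to what the UE usually gets.
        best_ue = max(range(num_ues),
                      key=lambda ue: inst_rate[ue][prb] / avg_throughput[ue])
        assignment.append(best_ue)
    return assignment
```

With rates `[[10, 2], [4, 6]]` and equal averages `[2, 2]`, each UE gets the PRB where its channel is best (`[0, 1]`); with averages `[10, 1]`, the UE that has been starved wins both PRBs (`[1, 1]`), even though UE 0 has the better channel on PRB 0.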
Towards a Unilateral Sensing System for Detecting Person-to-Person Contacts (open access)

The contact patterns among individuals can significantly affect the progress of an infectious outbreak within a population. Gathering data about these interaction and mixing patterns is essential for building and assessing computational models of infectious diseases. Various self-report approaches have been designed in different studies to collect data about contact rates and patterns. Recent advances in sensing technology provide researchers with bilateral automated data collection devices that facilitate contact gathering and overcome the disadvantages of previous approaches. In this study, a novel unilateral wearable sensing architecture is proposed that overcomes the limitations of bilateral sensing. Our unilateral wearable sensing system gathers contact data using hybrid sensor arrays embedded in a wearable shirt. A smartphone application transfers the collected sensor data to the cloud, where a deep learning model estimates the number of human contacts; the results are stored in a cloud database. The deep learning model was developed using hand-labelled data from multiple experiments, and it was tested and evaluated; these results are reported in the study. A sensitivity analysis was performed to choose the most suitable image resolution and format for the model to estimate contacts and to analyze …
Date: December 2018
Creator: Amara, Pavan Kumar
System: The UNT Digital Library
Statistical Strategies for Efficient Signal Detection and Parameter Estimation in Wireless Sensor Networks (open access)

This dissertation investigates data reduction strategies from a signal processing perspective in centralized detection and estimation applications. First, it considers a deterministic source observed by a network of sensors and develops an analytical strategy for ranking sensor transmissions based on the magnitude of their test statistics. The benefit of the proposed strategy is that the decision whether to transmit observations to the fusion center can be made at the sensor level, resulting in significant savings in transmission costs. A sensor network for a target-tracking application is simulated to demonstrate the benefits of the proposed strategy over the unconstrained energy approach. Second, it considers the detection of random signals in noisy measurements and evaluates the performance of eigenvalue-based signal detectors. Due to their computational simplicity, robustness, and performance, these detectors have recently received considerable attention. When the observed random signal is correlated, several researchers claim that the performance of eigenvalue-based detectors exceeds that of the classical energy detector. However, such claims fail to consider the fact that when the signal is correlated, the optimal detector is the estimator-correlator and not the energy detector. In this dissertation, through theoretical analyses and Monte Carlo simulations, eigenvalue-based detectors …
Date: December 2013
Creator: Ayeh, Eric
System: The UNT Digital Library
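To make the comparison in the abstract above concrete, the Python sketch below computes two of the test statistics involved: the classical energy detector (average received power) and a maximum-to-minimum eigenvalue ratio of the sample covariance matrix. It assumes real-valued, zero-mean samples and a smoothing factor of 2 (each covariance row built from two consecutive samples) so that the eigenvalues have a closed form; decision thresholds and the estimator-correlator are omitted. This is a generic illustration, not the dissertation's analysis.

```python
import math
import random


def energy_statistic(x):
    """Classical energy detector statistic: average power of the samples."""
    return sum(v * v for v in x) / len(x)


def max_min_eigen_ratio(x):
    """Max/min eigenvalue ratio of the 2x2 sample covariance matrix.

    Rows are built from pairs of consecutive samples (smoothing factor 2).
    Assumes real, zero-mean samples that are not perfectly correlated;
    otherwise the smallest eigenvalue is 0.
    """
    pairs = [(x[n], x[n + 1]) for n in range(len(x) - 1)]
    m = len(pairs)
    r00 = sum(a * a for a, _ in pairs) / m
    r11 = sum(b * b for _, b in pairs) / m
    r01 = sum(a * b for a, b in pairs) / m
    # Closed-form eigenvalues of the symmetric matrix [[r00, r01], [r01, r11]].
    centre = (r00 + r11) / 2
    radius = math.sqrt(((r00 - r11) / 2) ** 2 + r01 ** 2)
    return (centre + radius) / (centre - radius)


if __name__ == "__main__":
    rng = random.Random(1)
    noise = [rng.gauss(0, 1) for _ in range(5000)]
    # A correlated signal: sum of adjacent noise samples (an MA(1) process).
    correlated = [noise[i] + noise[i + 1] for i in range(len(noise) - 1)]
    print(max_min_eigen_ratio(noise), max_min_eigen_ratio(correlated))
```

For white noise the ratio should come out close to 1, while for the correlated signal it should be close to 3 (the eigenvalues of its covariance `[[2, 1], [1, 2]]` are 3 and 1). This eigenvalue spread is what eigenvalue-based detectors exploit, and it is why the choice of reference detector matters when the signal is correlated.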
Toward Supporting Fine-Grained, Structured, Meaningful and Engaging Feedback in Educational Applications (open access)

Recent advances in machine learning have begun to influence educational technology. Technology is evolving fast and, as people adopt it, schools and universities must also keep up (nearly 70% of primary and secondary schools in the UK now use tablets for various purposes). As these numbers are likely to keep increasing, it is imperative for schools to adapt and benefit from the advantages offered by technology: real-time processing of data, availability of different resources through connectivity, efficiency, and many others. To this end, this work contributes to the growth of educational technology by developing several algorithms and models that ease several tasks for instructors, engage students in deep discussions, and, ultimately, increase their learning gains. First, a novel, fine-grained knowledge representation is introduced that splits phrases into their constituent propositions, which are both meaningful and minimal. An algorithm for automatically extracting these propositions is also introduced. Compared with other fine-grained representations, the extraction model requires no human labor after it is trained, and the results show considerable improvement over two meaningful baselines. Second, a proposition alignment model is created that relies on even finer-grained units of …
Date: December 2018
Creator: Bulgarov, Florin Adrian
System: The UNT Digital Library
A New Look at Retargetable Compilers (open access)

Consumers demand new and innovative personal computing devices every 2 years when their cellular phone service contracts are renewed. Yet a 2-year development cycle for the concurrent development of both hardware and software is nearly impossible. As more components and features are added to the devices, maintaining this 2-year cycle with current tools will become commensurately harder. This dissertation examines the feasibility of simplifying the development of such systems by employing heterogeneous systems on a chip in conjunction with a retargetable compiler such as the hybrid computer retargetable compiler (Hy-C). An example of a simple architecture description of sufficient detail for use with a retargetable compiler like Hy-C is provided. As a software engineer with 30 years of experience, I have witnessed numerous system failures. A plethora of software development paradigms and tools have been employed to prevent software errors, but none have been completely successful. Much of the discussion centers on software development in the military contracting market, as that is my background. The dissertation reviews those tools, as well as some existing retargetable compilers, in an attempt to determine how those errors occurred and how a system like Hy-C could assist in reducing future software errors. In …
Date: December 2014
Creator: Burke, Patrick William
System: The UNT Digital Library
Investigating the Extractive Summarization of Literary Novels (open access)

Due to the vast amount of information we are faced with, summarization has become a critical necessity of everyday human life. Given that a large fraction of the electronic documents available online and elsewhere consist of short texts such as Web pages, news articles, scientific reports, and others, natural language processing techniques to date have focused on automating methods that target short documents. We are, however, witnessing a change: an increasing number of books are becoming available in electronic format. Consequently, language processing techniques able to handle very large documents such as books are becoming increasingly important. This thesis addresses the problem of summarizing novels, which are long and complex literary narratives. While a significant body of research has been carried out on automatic text summarization, most of this work has been concerned with summarizing short documents, with a particular focus on news stories. However, novels differ in both length and genre, and consequently require different summarization techniques. This thesis attempts to close this gap by analyzing a new domain for summarization, and by building unsupervised and supervised systems …
Date: December 2011
Creator: Ceylan, Hakan
System: The UNT Digital Library
Measuring Vital Signs Using Smart Phones (open access)

Smartphones have become increasingly popular with the general public for their diverse capabilities, such as navigation, social networking, and multimedia, to name a few. These phones are equipped with high-end processors, high-resolution cameras, and built-in sensors such as an accelerometer, an orientation sensor, and a light sensor. According to a comScore survey, 25.3% of US adults use smartphones in their daily lives. Motivated by the capabilities of smartphones and their extensive use, I focused on using them for biomedical applications. In this thesis, I present a new smartphone application that quantifies vital signs such as heart rate, respiratory rate, and blood pressure with the help of the phone's built-in sensors. Using the camera and the microphone, I show how blood pressure and heart rate can be determined for a subject. People sometimes encounter minor situations, such as fainting, or fatal accidents, such as car crashes, at unexpected times and places. It would be useful to have a device that can measure all vital signs in such an event. The second part of this thesis demonstrates a new mode of communication for next-generation 9-1-1 calls. In this new architecture, the call-taker will be able to control the …
Date: December 2010
Creator: Chandrasekaran, Vikram
System: The UNT Digital Library
Detection of Ulcerative Colitis Severity and Enhancement of Informative Frame Filtering Using Texture Analysis in Colonoscopy Videos (open access)

Several types of disorders affect the colon's ability to function properly, such as colorectal cancer, ulcerative colitis, diverticulitis, irritable bowel syndrome, and colonic polyps. Automatic detection of these diseases would inform the endoscopist of possible sub-optimal inspection during the colonoscopy procedure and save time during post-procedure evaluation. However, existing systems detect only a few of these disorders, such as colonic polyps. In this dissertation, we address the automatic detection of another important disorder, ulcerative colitis. We propose a novel texture feature extraction technique to detect the severity of ulcerative colitis at the block, image, and video levels. We also enhance current informative frame filtering methods by detecting water and bubble frames using our proposed technique. Our feature extraction algorithm, based on the accumulation of pixel value differences, provides better accuracy at a faster speed than existing methods, making it highly suitable for real-time systems. We also propose a hybrid approach in which our feature method is combined with existing feature method(s) to provide even better accuracy. We extend the block- and image-level detection method to video-level severity score calculation and shot segmentation. Also, the proposed novel feature extraction method can detect water and bubble frames …
Date: December 2015
Creator: Dahal, Ashok
System: The UNT Digital Library
Graph-Based Keyphrase Extraction Using Wikipedia (open access)

Keyphrases describe a document in a coherent and simple way, giving the prospective reader a way to quickly determine whether the document satisfies their information needs. Given the huge amount of information on the Web, and the fact that only a small fraction of documents have had keyphrases extracted, there is a clear need for automatic keyphrase extraction systems. Typically, a document written by a human develops around one or more general concepts or sub-concepts. These concepts or sub-concepts should be structured and semantically related to each other, so that they can form a meaningful representation of the document. Since the phrases or concepts in a document are related to each other, a new approach to keyphrase extraction is introduced that exploits the semantic relations in the document. To measure the semantic relations between concepts or sub-concepts in the document, I present a comprehensive study of using collaboratively constructed semantic resources such as Wikipedia and its link structure. In particular, I introduce a graph-based keyphrase extraction system that exploits the semantic relations in the document and features such as term frequency. I evaluated the proposed system using novel measures, and the results compare favorably with previously published results on established benchmarks.
Date: December 2010
Creator: Dandala, Bharath
System: The UNT Digital Library
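The keyphrase abstract above combines semantic relations between concepts with features such as term frequency in a graph-based ranking. The sketch below shows one generic way to do that, in the style of TextRank: a weighted PageRank over a phrase graph whose teleport distribution is proportional to term frequency. The edge weights stand in for Wikipedia-derived relatedness scores and are supplied by the caller. The abstract does not specify how the dissertation measures relatedness or combines features, so the function name, damping factor, and scoring are assumptions.

```python
def rank_phrases(relatedness, tf, damping=0.85, iterations=50):
    """Rank candidate phrases with a term-frequency-personalized PageRank.

    relatedness: {(phrase_a, phrase_b): positive weight} undirected edges;
        both phrases must appear in `tf`. Weights come from the caller
        (e.g. a Wikipedia-based relatedness score computed elsewhere).
    tf: {phrase: term frequency in the document}.
    Returns phrases sorted from most to least important.
    """
    nodes = sorted(tf)
    neighbors = {n: {} for n in nodes}
    for (a, b), w in relatedness.items():
        neighbors[a][b] = neighbors[a].get(b, 0.0) + w
        neighbors[b][a] = neighbors[b].get(a, 0.0) + w
    strength = {n: sum(neighbors[n].values()) for n in nodes}
    total_tf = sum(tf.values())
    prior = {n: tf[n] / total_tf for n in nodes}  # teleport distribution
    score = dict(prior)
    for _ in range(iterations):
        # Each phrase passes its score to neighbours in proportion to edge weight.
        score = {
            n: (1 - damping) * prior[n]
            + damping * sum(score[m] * w / strength[m] for m, w in neighbors[n].items())
            for n in nodes
        }
    return sorted(nodes, key=lambda n: score[n], reverse=True)
```

A phrase with no semantic links to the rest of the document keeps only its small teleport share, so a frequent but unrelated term can still rank below well-connected concepts.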
Modeling and Analysis of Intentional And Unintentional Security Vulnerabilities in a Mobile Platform (open access)

Mobile phones are an essential part of modern life. Making phone calls is no longer the main purpose of a smartphone, but merely one of many features. Online social networking, chatting, short messaging, web browsing, navigation, and photography are some of the other features users enjoy on modern smartphones, most of which are provided by mobile apps. However, these advances have opened up many security vulnerabilities in these devices. Malicious apps are a major threat to modern smartphones. According to Symantec Corp., by the middle of 2013, about 273,000 Android malware apps had been identified. Protecting everyday users of mobile devices from attacks by technologically competent hackers, illegitimate users, trolls, and eavesdroppers is a complex problem. This dissertation emphasizes the concept of intention identification and then explores ways to use it to enforce security on a mobile phone platform. For instance, a battery monitoring app that requests SMS permissions suggests suspicious intent, because battery monitoring does not usually require SMS permissions. Intention can be either that of the user or that of an app, and it can be identified from behavior or from source code. Regardless …
Date: December 2014
Creator: Fazeen, Mohamed & Issadeen, Mohamed
System: The UNT Digital Library
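The SMS-permission example in the abstract above can be expressed as a simple rule: flag sensitive permissions that an app's declared category would not normally need. The category table and sensitivity list below are hypothetical and only illustrate the idea. The dissertation's intention identification relies on behavior or source code, which this sketch does not analyze. The permission strings are standard Android permission names.

```python
# Hypothetical table: permissions a benign app in each category plausibly needs.
EXPECTED_PERMISSIONS = {
    "battery_monitor": {"android.permission.BATTERY_STATS"},
    "messaging": {
        "android.permission.SEND_SMS",
        "android.permission.READ_SMS",
        "android.permission.RECEIVE_SMS",
    },
}

# Permissions treated as sensitive for this illustration only.
SENSITIVE = {
    "android.permission.SEND_SMS",
    "android.permission.READ_SMS",
    "android.permission.RECEIVE_SMS",
    "android.permission.READ_CONTACTS",
}


def suspicious_permissions(category, requested):
    """Return sensitive permissions the app requests but its category does not need."""
    expected = EXPECTED_PERMISSIONS.get(category, set())
    return sorted((set(requested) - expected) & SENSITIVE)
```

For the battery-monitor case from the abstract, requesting `BATTERY_STATS` plus `SEND_SMS` yields `["android.permission.SEND_SMS"]`, while the same SMS permission requested by a messaging app is not flagged.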
Improving Software Quality through Syntax and Semantics Verification of Requirements Models (open access)

Software defects can frequently be traced to poorly specified requirements. Many software teams manage their requirements using tools such as checklists and databases, which lack a formal semantic mapping to system behavior. Such a mapping can be especially helpful for safety-critical systems. Another limitation of many requirements analysis methods is that much of the analysis must still be done manually. We propose techniques that automate portions of the requirements analysis process and clarify the syntax and semantics of requirements models using a variety of methods, including machine learning tools and our own tool, VeriCCM. The machine learning tools help us identify potential model elements and verify their correctness. VeriCCM, a formalized extension of the causal component model (CCM), uses formal methods to ensure that requirements are well-formed and provides the beginnings of a full formal semantics. We also explore the use of statecharts to identify potential abnormal behaviors from a given set of requirements. At each stage, we perform empirical studies to evaluate the effectiveness of our proposed approaches.
Date: December 2018
Creator: Gaither, Danielle
System: The UNT Digital Library
Simulating the Spread of Infectious Diseases in Heterogeneous Populations with Diverse Interactions Characteristics (open access)

The spread of infectious diseases has been a public concern throughout human history. Historical records document the severity of infectious disease epidemics in different eras. The ancient Greek physician Hippocrates was the first to analyze the correlation between diseases and their environment. Today, health authorities are in charge of planning strategies that guarantee the welfare of citizens. Simulating contagion scenarios contributes to understanding the epidemic behavior of diseases. Computational models facilitate the study of epidemics by integrating disease and population data into the simulation. Using detailed demographic and geographic characteristics allows researchers to construct complex models that better resemble reality, and integrating these attributes helps us understand the rules of interaction. The interaction of individuals with similar characteristics forms synthetic structures that depict clusters of interaction. These synthetic environments facilitate the study of the spread of infectious diseases in diverse scenarios. The characteristics of the population and the disease concurrently affect local and global epidemic progression. Together, the epidemic behaviors of the individual clusters constitute the global epidemic in a clustered population. By understanding the correlation between structured populations and the spread of a disease, this dissertation research makes it possible to identify risk …
Date: December 2013
Creator: Gomez-Lopez, Iris Nelly
System: The UNT Digital Library

Spatial Partitioning Algorithms for Solving Location-Allocation Problems

Access: Use of this item is restricted to the UNT Community
This dissertation presents spatial partitioning algorithms to solve location-allocation problems. Location-allocation problems pertain to both the selection of facilities to serve demand at demand points and the assignment of demand points to the selected or known facilities. In the first part of this dissertation, we focus on the well-known and well-researched location-allocation problem, the "p-median problem", a distance-based location-allocation problem that involves the selection and allocation of p facilities for n demand points. We evaluate the performance of existing p-median heuristic algorithms and investigate the impact of the scale of the problem and the spatial distribution of demand points on the performance of these algorithms. Based on the results of this comparative study, we present guidelines to help location analysts select the best heuristic and corresponding parameters for the problem at hand. Additionally, we found that existing heuristic algorithms are not suitable for solving large-scale p-median problems in a reasonable amount of time. We present a density-based decomposition methodology to solve large-scale p-median problems efficiently. This algorithm identifies dense clusters in the region, uses a MapReduce procedure to select facilities in the clustered regions independently, and combines the solutions from the subproblems. Lastly, …
Date: December 2019
Creator: Gwalani, Harsha
System: The UNT Digital Library
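For readers unfamiliar with the p-median problem described above, the sketch below states its objective (the demand-weighted distance from each demand point to its nearest open facility) and a simple greedy-add heuristic that opens one facility at a time. This is a textbook baseline for illustration. It is not one of the heuristics evaluated in the dissertation, nor its density-based MapReduce decomposition. Euclidean distance and a finite list of candidate sites are assumptions.

```python
import math


def total_cost(demand, weights, facilities):
    """p-median objective: weighted distance from each demand point to its
    nearest open facility."""
    return sum(w * min(math.dist(d, f) for f in facilities)
               for d, w in zip(demand, weights))


def allocate(demand, facilities):
    """Allocation step: index of the nearest facility for each demand point."""
    return [min(range(len(facilities)), key=lambda j: math.dist(d, facilities[j]))
            for d in demand]


def greedy_p_median(demand, weights, candidates, p):
    """Greedy-add heuristic: repeatedly open the candidate site that lowers
    the objective most. Fast, but not guaranteed to be optimal."""
    chosen = []
    for _ in range(p):
        best = min((c for c in candidates if c not in chosen),
                   key=lambda c: total_cost(demand, weights, chosen + [c]))
        chosen.append(best)
    return chosen
```

On four unit-weight demand points at `(0, 0)`, `(1, 0)`, `(10, 0)`, and `(11, 0)`, with `p = 2`, the greedy heuristic opens one facility in each pair, giving a total cost of 2.0, and `allocate` assigns the points to facilities `[0, 0, 1, 1]`. Each greedy step re-evaluates the full objective for every candidate, which hints at why such heuristics scale poorly to the large instances the dissertation targets.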
The Influence of Social Network Graph Structure on Disease Dynamics in a Simulated Environment (open access)

The fight against epidemics/pandemics is one of man versus nature. Technological advances have not only improved existing methods for monitoring and controlling disease outbreaks, but have also provided new means for investigation, such as modeling and simulation. This dissertation explores the relationship between social structure and disease dynamics. Social structures are modeled as graphs, and outbreaks are simulated based on a well-recognized standard, the susceptible-infectious-removed (SIR) paradigm. Two independent, but related, studies are presented. The first measures the severity of outbreaks as social network parameters are altered. The second investigates the efficacy of various vaccination policies based on social structure. Three disease-related centrality measures are introduced: contact, transmission, and spread centrality, which are related to the previously established centrality measures degree, betweenness, and closeness, respectively. The results of the experiments presented in this dissertation indicate that reducing the neighborhood size along with outside-of-neighborhood contacts diminishes the severity of disease outbreaks. Vaccination strategies can effectively reduce these parameters. Additionally, vaccination policies that target individuals with high centrality are generally shown to be slightly more effective than a random vaccination policy. These results, combined with past and future studies, will assist public health officials in their effort to minimize the effects …
Date: December 2010
Creator: Johnson, Tina V.
System: The UNT Digital Library
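The social-network abstract above simulates outbreaks with the susceptible-infectious-removed (SIR) model and compares centrality-targeted vaccination with random vaccination. A minimal discrete-time SIR simulation on a contact graph is sketched below. It uses degree as a stand-in for the dissertation's contact centrality, which the abstract says only that it is related to degree. The transmission and recovery rules, parameter names, and graph representation are assumptions for illustration.

```python
import random


def simulate_sir(adj, beta, gamma, seeds, vaccinated=(), rng_seed=0):
    """Discrete-time SIR on a contact graph.

    adj: dict node -> set of neighbours. At each step, every infectious node
    infects each susceptible neighbour with probability beta, then is removed
    (recovers) with probability gamma. Vaccinated nodes start out removed.
    Returns the number of nodes that were ever infected.
    """
    rng = random.Random(rng_seed)
    removed = set(vaccinated)
    infected = set(seeds) - removed
    ever_infected = set(infected)
    while infected:
        newly = set()
        for u in infected:
            for v in adj[u]:
                susceptible = v not in infected and v not in removed and v not in newly
                if susceptible and rng.random() < beta:
                    newly.add(v)
        recovered = {u for u in infected if rng.random() < gamma}
        removed |= recovered
        infected = (infected - recovered) | newly
        ever_infected |= newly
    return len(ever_infected)


def top_degree(adj, k):
    """The k highest-degree nodes (degree as a proxy for contact centrality)."""
    return sorted(adj, key=lambda n: len(adj[n]), reverse=True)[:k]
```

On a star graph with one hub and ten leaves, a certain-transmission outbreak (`beta=1.0`, `gamma=1.0`) seeded at a leaf reaches all 11 nodes. Vaccinating the single highest-degree node stops it at the seed, an extreme case of the targeted-vaccination effect the abstract reports.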
Analysis and Performance of a Cyber-Human System and Protocols for Geographically Separated Collaborators (open access)

This dissertation provides an innovative mechanism that enables two geographically separated people to collaborate on a physical task, and a novel method to measure the Complexity Index (CI) and calculate the Minimal Complexity Index (MCI) of a collaboration protocol. The protocol is represented as a structure, and its information content is measured in bits to characterize the protocol's complexity. Using these complexity metrics, one can analyze the performance of a collaborative system and a collaboration protocol. Because consumers' security and privacy are vital when seeking remote help, this dissertation also provides a novel authorization framework for dynamic access control of resources on an input-constrained appliance used to complete the physical task. Using the innovative Collaborative Appliance for REmote-help (CARE), with the support of a remotely located expert, fifty-nine subjects with minimal or no prior mechanical knowledge were able to elevate a car to replace a tire in an average time of six minutes and 53 seconds, with an average protocol complexity of 171.6 bits. Moreover, thirty subjects with minimal or no prior plumbing knowledge were able to change the cartridge of a faucet in an average time of ten minutes and with an average protocol complexity …
Date: December 2017
Creator: Jonnada, Srikanth
System: The UNT Digital Library
Ontology Based Security Threat Assessment and Mitigation for Cloud Systems (open access)

A malicious actor often relies on security vulnerabilities of IT systems to launch a cyber attack. Most cloud services are supported by an orchestration of large and complex systems that are prone to vulnerabilities, making threat assessment very challenging. In this research, I developed formal and practical ontology-based techniques that enable automated evaluation of a cloud system's security threats. I use an architecture for threat assessment of cloud systems that leverages a dynamically generated ontology knowledge base. I created an ontology model that represents the components of a cloud system. These ontologies are designed for a set of domains covering aspects of cloud systems and cyber threat data for information technology products. The inputs to our architecture are the configurations of cloud assets and component specifications (which encompass the desired assessment procedures), and the outputs are actionable threat assessment results. The focus of this work is on ways of enumerating, assessing, and mitigating emerging cyber security threats. A research toolkit has been developed to evaluate our architecture. We expect our techniques to be leveraged by any cloud provider or consumer to close the gap in identifying and remediating known or impending security threats facing their cloud assets.
Date: December 2018
Creator: Kamongi, Patrick
System: The UNT Digital Library
Influence of Underlying Random Walk Types in Population Models on Resulting Social Network Types and Epidemiological Dynamics (open access)

Epidemiologists rely on human interaction networks to determine the states and dynamics of disease propagation in populations. However, such networks are empirical snapshots of the past. It would be highly beneficial if human interaction networks could be statistically predicted and dynamically created while an epidemic is in progress. We develop an application framework for generating human interaction networks and running epidemiological processes, drawing on research on human mobility patterns and agent-based modeling. The interaction networks are dynamically constructed by incorporating different types of random walks and human rules of engagement. We explore the characteristics of the created networks and compare them with known theoretical and empirical graphs. The dependencies of epidemic dynamics and their outcomes on the patterns and parameters of human motion and motives are identified and presented in this research. This work specifically describes how the types and parameters of random walks define the properties of the generated graphs. We show that some configurations of the system of agents in random walk can produce network topologies with properties similar to small-world networks. Our goal is to find sets of mobility patterns that lead to empirical-like networks. The possibility of phase transitions in the graphs due to changes in the parameterization of agent …
Date: December 2016
Creator: Kolgushev, Oleg
System: The UNT Digital Library
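The abstract above builds interaction networks from agents moving by random walks. The code below is a minimal sketch of that idea. It assumes a simple (nearest-neighbor) random walk on a wrap-around grid and defines a contact as two agents occupying the same cell at the same time step, then returns the resulting contact edges. Other walk types, such as Lévy flights with heavy-tailed step lengths, would change only the step rule. The grid, contact rule, and parameters are assumptions, not the dissertation's framework.

```python
import random
from itertools import combinations


def contact_network(num_agents, grid_size, steps, seed=0):
    """Agents take simple random walks on a torus grid; agents sharing a cell
    at the same time step are linked. Returns a set of (i, j) edges, i < j."""
    rng = random.Random(seed)
    pos = [(rng.randrange(grid_size), rng.randrange(grid_size))
           for _ in range(num_agents)]
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    edges = set()
    for _ in range(steps):
        # Record contacts between all agents currently in the same cell.
        cells = {}
        for agent, p in enumerate(pos):
            cells.setdefault(p, []).append(agent)
        for occupants in cells.values():
            edges.update(combinations(sorted(occupants), 2))
        # Every agent moves one cell in a uniformly random direction.
        new_pos = []
        for x, y in pos:
            dx, dy = rng.choice(moves)
            new_pos.append(((x + dx) % grid_size, (y + dy) % grid_size))
        pos = new_pos
    return edges
```

With `grid_size=1`, every agent always shares the single cell, so five agents produce the complete graph of 10 edges. Larger grids and fewer steps yield sparser graphs whose degree distribution and clustering can then be compared with small-world or empirical networks.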
Privacy Preserving EEG-based Authentication Using Perceptual Hashing (open access)

The use of the electroencephalogram (EEG), an electrophysiological monitoring method for recording brain activity, for authentication has attracted researchers' interest for over a decade. In addition to exhibiting the qualities of biometric-based authentication, EEG signals are revocable, impossible to mimic, and resistant to coercion attacks. However, EEG signals carry a wealth of information about an individual and can reveal private information about the user. This raises significant privacy issues for EEG-based authentication systems, as they have access to raw EEG signals. This thesis proposes a privacy-preserving EEG-based authentication system that protects the user's privacy by not revealing the raw EEG signals while still authenticating the user accurately. To this end, perceptual hashing is used: instead of raw EEG signals, their perceptually hashed values are used in the authentication process. In addition to describing the authentication process, algorithms to compute the perceptual hash are developed based on two feature extraction techniques. Experimental results show that an authentication system using perceptual hashing can achieve performance comparable to a system that has access to raw EEG signals if enough EEG channels are used in the process. This thesis also presents a security analysis to show that perceptual hashing …
Date: December 2016
Creator: Koppikar, Samir Dilip
System: The UNT Digital Library
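The core idea of the Koppikar abstract, authenticating against a hash of EEG features rather than the raw signal, can be sketched as follows. This assumes a feature vector has already been extracted from the EEG channels (the two feature extraction techniques in the thesis are not specified here), and it uses a simple median-threshold hash with Hamming-distance matching, borrowed from image "average hash" schemes. The hash rule, the threshold `max_distance`, and all function names are hypothetical, not the thesis's algorithms.

```python
from statistics import median


def perceptual_hash(features):
    """Hash a feature vector to bits: 1 where a feature exceeds the vector's median.

    Hypothetical hash: small perturbations of the features usually leave most
    bits unchanged, which is the property a perceptual hash needs.
    """
    m = median(features)
    return [1 if f > m else 0 for f in features]


def hamming_distance(h1, h2):
    """Number of positions at which two equal-length bit lists differ."""
    if len(h1) != len(h2):
        raise ValueError("hashes must have equal length")
    return sum(a != b for a, b in zip(h1, h2))


def authenticate(enrolled_hash, probe_features, max_distance):
    """Accept the probe if its hash is within max_distance bits of the template.

    In this sketch the verifier stores only enrolled_hash, never raw features.
    """
    return hamming_distance(enrolled_hash, perceptual_hash(probe_features)) <= max_distance
```

The design trade-off is the usual one for biometric hashing: a larger `max_distance` tolerates more session-to-session variability in the EEG features but makes impostor acceptance more likely.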
Modeling Synergistic Relationships Between Words and Images (open access)

Texts and images provide alternative, yet orthogonal, views of the same underlying cognitive concept. By uncovering synergistic, semantic relationships that exist between words and images, I am working to develop novel techniques that can help improve tasks in natural language processing, as well as effective models for text-to-image synthesis, image retrieval, and automatic image annotation. Specifically, in my dissertation, I will explore the interoperability of features between language and vision tasks. In the first part, I will show how features generated from evidence gathered in text corpora can be applied to solve the image annotation problem in computer vision, without the use of any visual information. In the second part, I will address research in the reverse direction and show how visual cues can be used to improve tasks in natural language processing. Importantly, I propose a novel metric to estimate the similarity of words by comparing the visual similarity of the concepts these words invoke, and show that it can further advance state-of-the-art methods that employ corpus-based and knowledge-based semantic similarity measures. Finally, I attempt to construct a joint semantic space connecting words with images, and synthesize an evaluation framework to quantify cross-modal …
Date: December 2012
Creator: Leong, Chee Wee
System: The UNT Digital Library
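One simple way to realize a visually grounded word similarity like the one in the Leong abstract, assuming each word has already been mapped to a visual feature vector (for example, a bag-of-visual-words histogram computed from images associated with that word), is cosine similarity between those vectors, optionally interpolated with a corpus-based score. The feature representation, the interpolation weight `alpha`, and the function names are assumptions for illustration; the abstract does not specify the form of the proposed metric.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_similarity(visual_u, visual_v, text_sim, alpha=0.5):
    """Hypothetical linear interpolation of visual and corpus-based similarity.

    alpha = 1.0 uses only the visual score; alpha = 0.0 only the text score.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * cosine(visual_u, visual_v) + (1 - alpha) * text_sim
```

A linear interpolation is only one way to combine the two signals; it keeps the visual metric usable on its own while letting it complement corpus-based or knowledge-based scores.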
Source and Channel Coding Strategies for Wireless Sensor Networks (open access)

In this dissertation, I focus on both source coding and channel coding techniques. I address the challenges in wireless sensor networks (WSNs) by developing (1) a new source coding strategy for erasure channels that achieves better distortion performance than multiple description coding (MDC); (2) a new cooperative channel coding strategy for multiple access channels that achieves better channel outage performance than multiple-input multiple-output (MIMO) schemes; and (3) a new source-channel cooperation strategy for source-to-fusion-center communication that reduces system distortion and improves outage performance. First, I draw a parallel between the 2x2 MDC scheme and Alamouti's space-time block coding (STBC) scheme and observe the commonality in their mathematical models. This commonality reveals a duality between the two diversity techniques. Making use of this duality, I develop an MDC scheme with a pairwise complex correlating transform. Theoretically, I show that this MDC scheme results in: 1) complete elimination of the estimation error when only one descriptor is received; 2) greater efficiency in recovering the stronger descriptor (with larger variance) from the weaker descriptor; and 3) improved performance in terms of minimized distortion as the quantization error is reduced. Experiments are also performed on real images to demonstrate these benefits. Second, I present a …
Date: December 2012
Creator: Li, Li
System: The UNT Digital Library
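For readers unfamiliar with correlating transforms in MDC, the sketch below shows the standard real-valued 45-degree pairwise correlating transform (PCT) from the MDC literature and the linear minimum mean-square-error (LMMSE) estimate of a lost descriptor. It is not the complex-valued, Alamouti-inspired transform developed in the Li dissertation; it assumes two zero-mean, independent coefficients with known variances `var_a` and `var_b`, and the function names are hypothetical.

```python
import math

SQRT_HALF = 1 / math.sqrt(2)


def pct_forward(a, b):
    """Map coefficients (a, b) to correlated descriptors (c, d) by a 45-degree rotation."""
    return SQRT_HALF * (a + b), SQRT_HALF * (a - b)


def pct_inverse(c, d):
    """Invert the transform; the matrix is symmetric and orthogonal, so it is its own inverse."""
    return SQRT_HALF * (c + d), SQRT_HALF * (c - d)


def estimate_from_c(c, var_a, var_b):
    """LMMSE reconstruction of (a, b) when only descriptor c is received.

    Assumes a and b are zero-mean and independent with variances var_a, var_b,
    so cov(c, d) = (var_a - var_b) / 2 and var(c) = (var_a + var_b) / 2.
    The lost descriptor is estimated as d_hat = cov(c, d) / var(c) * c.
    """
    total = var_a + var_b
    if total == 0:
        return 0.0, 0.0  # both coefficients are deterministically zero
    d_hat = (var_a - var_b) / total * c
    return pct_inverse(c, d_hat)
```

With this real transform, a single received descriptor leaves residual estimation error whenever both variances are nonzero, because one scalar cannot determine two independent coefficients; the abstract's first theoretical claim is that the proposed complex transform eliminates this error.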
Event Sequence Identification and Deep Learning Classification for Anomaly Detection and Predication on High-Performance Computing Systems (open access)

High-performance computing (HPC) systems continue to grow in both scale and complexity. These large-scale, heterogeneous systems generate tens of millions of log messages every day. Effective log analysis for understanding system behaviors and identifying system anomalies and failures is highly challenging. Existing log analysis approaches process messages line by line. They are not effective at discovering subtle behavior patterns and their transitions, and thus may overlook some critical anomalies. In this dissertation research, I propose a system log event block detection (SLEBD) method that can accurately and automatically extract the log messages belonging to a component or system event into an event block (EB). At the event level, we can discover new event patterns, the evolution of system behavior, and the interactions among different system components. To find critical event sequences, existing sequence mining methods are mostly based on the Apriori algorithm, which is compute-intensive and runs for a long time. I develop a novel topology-aware sequence mining (TSM) algorithm that efficiently generates sequence patterns from the extracted event block lists. I also train a long short-term memory (LSTM) model to cluster sequences before specific events. With the generated sequence patterns and trained LSTM model, we can predict …
Date: December 2019
Creator: Li, Zongze
System: The UNT Digital Library
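The Li (Zongze) abstract does not specify how SLEBD or TSM work, so the following is only a minimal sketch of the two underlying ideas: grouping a component's consecutive log messages into an event block when they arrive within a time gap `max_gap`, and counting frequent sequences of event types across blocks. The time-gap rule, the `(timestamp, component, template)` message format, and the contiguous n-gram counting (a much simpler substitute for both Apriori-style mining and the dissertation's topology-aware TSM) are all assumptions.

```python
from collections import Counter


def split_event_blocks(messages, max_gap):
    """Group time-sorted (timestamp, component, template) messages into event blocks.

    Hypothetical rule: a message joins its component's current block if it
    arrives within max_gap of that component's previous message; otherwise it
    opens a new block. Blocks are returned in order of their first message.
    """
    blocks = []      # list of (component, [templates])
    current = {}     # component -> index of its open block in `blocks`
    last_seen = {}   # component -> timestamp of its previous message
    for t, comp, template in messages:
        if comp not in last_seen or t - last_seen[comp] > max_gap:
            blocks.append((comp, []))
            current[comp] = len(blocks) - 1
        blocks[current[comp]][1].append(template)
        last_seen[comp] = t
    return blocks


def frequent_sequences(event_types, k, min_support):
    """Count contiguous length-k runs of event types; keep those seen >= min_support times."""
    counts = Counter(tuple(event_types[i:i + k]) for i in range(len(event_types) - k + 1))
    return {seq: n for seq, n in counts.items() if n >= min_support}


# Example: treat each block's template tuple as its event type.
msgs = [(0, "node1", "boot"), (1, "node1", "mount"), (2, "node2", "link down"),
        (10, "node1", "boot"), (11, "node1", "mount")]
blocks = split_event_blocks(msgs, max_gap=5)
types = [tuple(templates) for _, templates in blocks]
```

Working at the block level rather than line by line is what lets recurring multi-message events (here, the repeated boot/mount block on `node1`) show up as a single countable pattern.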
Location Estimation and Geo-Correlated Information Trends (open access)

A tremendous amount of information is shared every day on social media sites such as Facebook, Twitter, and Google+. However, only a small portion of users provide their location information, which can be helpful for targeted advertising and many other services. Current methods for location estimation using social relationships treat social friendship as a simple binary relationship. However, the social closeness between users and the structure of their friendships have strong implications for geographic distance. In the first task, we introduce new measures to evaluate the social closeness between users and the structure of their friends. We then propose models that use these measures for location estimation. Compared with models that treat the friend relation as a binary feature, social closeness helps identify which of a user's friends are more important, and friend structure helps determine the significance level of locations, thus improving the accuracy of the location estimation models. A confidence iteration method is further introduced to improve estimation accuracy and overcome the problem of scarce location information. We evaluate our methods on two different datasets, Twitter and Gowalla. The results show that our model can improve estimation accuracy by 5% to 20% compared with state-of-the-art friend-based models. In the …
Date: December 2017
Creator: Liu, Zhi
System: The UNT Digital Library
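The contrast in the Liu abstract between binary friendship and weighted social closeness can be illustrated with a minimal sketch of friend-based location estimation. It assumes closeness is measured by the Jaccard overlap of two users' friend sets and that the estimate is a weighted vote over friends' known locations. Both the measure and the voting rule are illustrative assumptions, not the measures or models proposed in the dissertation, and the confidence iteration step is omitted.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def estimate_location(user, friends, known_location):
    """Estimate a user's location by a closeness-weighted vote of located friends.

    friends: dict mapping user id -> set of friend ids.
    known_location: dict mapping user id -> location label.
    Hypothetical weighting: each located friend votes for its location with
    weight 1 + Jaccard overlap of the two friend sets, so closer friends count
    more than a plain binary edge. Returns None if no friend has a known location.
    """
    scores = defaultdict(float)
    for f in friends.get(user, set()):
        if f in known_location:
            scores[known_location[f]] += 1.0 + jaccard(friends[user], friends.get(f, set()))
    if not scores:
        return None
    return max(scores, key=scores.get)
```

With purely binary weights (dropping the Jaccard term), two locations each backed by the same number of friends would tie; the overlap term breaks such ties toward friends who share more mutual friends with the user.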