Elicitation of Protein-Protein Interactions from Biomedical Literature Using Association Rule Discovery (open access)

Extracting information from a mass of data is a tedious task, and the scenario is no different in proteomics. Volumes of research papers are published on the study of various proteins in several species, their interactions with other proteins, and the identification of proteins as possible biomarkers of disease. It is a challenging task for biologists to keep track of these developments manually by reading through the literature. Several tools have been developed by computational linguists to assist in the identification and extraction of proteins and protein-protein interactions from biomedical publications and protein databases, and in the generation of hypotheses about them. However, these tools are confronted with the challenges of term variation, term ambiguity, access limited to abstracts, and inconsistencies in the time-consuming manual curation of protein and protein-protein interaction repositories. This work attempts to attenuate these challenges by extracting protein-protein interactions in humans and eliciting possible interactions using association rule mining on full text, abstracts, and figure captions available from publicly available biomedical literature databases. Two such databases are used in our study: the Directory of Open Access Journals (DOAJ) and PubMed Central (PMC). A corpus is built from articles retrieved using search terms. A dataset of more than 38,000 protein-protein interactions from the Human Protein Reference Database (HPRD) …
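
A minimal sketch (not code from the thesis) of how association rule mining over protein co-mentions can work: each sentence or figure caption is treated as a transaction of protein names, and pairs that co-occur often enough yield candidate interaction rules scored by support and confidence. The corpus, protein names, and thresholds below are invented for illustration.

```python
from itertools import combinations
from collections import Counter

# Toy "corpus": each entry is the set of protein mentions found in one
# sentence or figure caption (a real corpus would come from DOAJ/PMC text).
sentences = [
    {"TP53", "MDM2"},
    {"TP53", "MDM2", "EP300"},
    {"BRCA1", "BARD1"},
    {"TP53", "EP300"},
    {"BRCA1", "BARD1", "TP53"},
]

MIN_SUPPORT = 2       # minimum co-occurrence count (assumed threshold)
MIN_CONFIDENCE = 0.5  # minimum rule confidence (assumed threshold)

# Count single mentions and co-mentions of protein pairs.
single = Counter(p for s in sentences for p in s)
pairs = Counter(frozenset(c) for s in sentences for c in combinations(sorted(s), 2))

# Derive association rules A -> B as candidate interactions.
for pair, support in pairs.items():
    if support < MIN_SUPPORT:
        continue
    a, b = tuple(pair)
    for x, y in ((a, b), (b, a)):
        confidence = support / single[x]
        if confidence >= MIN_CONFIDENCE:
            print(f"{x} -> {y}  support={support}  confidence={confidence:.2f}")
```

The HPRD interactions mentioned above could then serve as a reference set against which such mined rules are compared.
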
Date: August 2010
Creator: Samuel, Jarvie John
System: The UNT Digital Library
Information Storage and Retrieval Systems (open access)

This thesis describes the implementation of a general purpose personal information storage and retrieval system. Chapter one contains an introduction to information storage and retrieval. Chapter two contains a description of the features a useful personal information retrieval system should contain. This description forms the basis for the implementation of the personal information storage and retrieval system described in chapter three. The system is implemented in UCSD Pascal on an Apple II microcomputer.
Date: May 1983
Creator: Creech, Teresa Adams
System: The UNT Digital Library
Mediation on XQuery Views (open access)

The major goal of information integration is to provide efficient and easy-to-use access to multiple heterogeneous data sources with a single query. At the same time, one of the current trends is to use standard technologies for implementing solutions to complex software problems. In this dissertation, I used XML and XQuery as the standard technologies and developed an extended projection algorithm to provide a solution to the information integration problem. In order to demonstrate my solution, I implemented a prototype mediation system called Omphalos based on XML-related technologies. The dissertation describes the architecture of the system, its metadata, and the process it uses to answer queries. The system uses XQuery expressions (termed metaqueries) to capture complex mappings between global schemas and data source schemas. The system then applies these metaqueries in order to rewrite a user query on a virtual global database (representing the integrated view of the heterogeneous data sources) to a query (termed an outsourced query) on the real data sources. An extended XML document projection algorithm was developed to increase the efficiency of selecting the relevant subset of data from an individual data source to answer the user query. The system applies the projection algorithm …
Date: December 2006
Creator: Peng, Xiaobo
System: The UNT Digital Library

Mining Biomedical Data for Hidden Relationship Discovery

Access: Use of this item is restricted to the UNT Community
With an ever-growing number of publications in the biomedical domain, it becomes likely that important implicit connections between individual concepts of biomedical knowledge are overlooked. Literature-based discovery (LBD) has been in practice for many years to identify plausible associations between previously unrelated concepts. In this paper, we present a new, completely automatic and interactive system that creates a graph-based knowledge base to capture multifaceted, complex associations among biomedical concepts. For a given pair of input concepts, our system auto-generates a list of ranked subgraphs uncovering possible previously unnoticed associations based on context information. To rank these subgraphs, we implement a novel ranking method using the context information obtained by performing random walks on the graph. In addition, we enhance the system by training a neural network classifier to output the likelihood that the two concepts are related, which provides better insights to the end user.
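
A minimal sketch (not the system described above) of the random-walk idea: concepts visited frequently on walks started at one input concept are treated as context linking it to the other. The toy graph uses the classic fish oil / Raynaud's disease example from the literature-based discovery literature; the edges and scoring are assumptions made for illustration.

```python
import random
from collections import Counter

# Toy undirected concept graph (adjacency lists); in a real system the edges
# would come from relations mined from biomedical text.
graph = {
    "fish oil":             ["omega-3", "blood viscosity"],
    "omega-3":              ["fish oil", "platelet aggregation"],
    "blood viscosity":      ["fish oil", "Raynaud's disease"],
    "platelet aggregation": ["omega-3", "Raynaud's disease"],
    "Raynaud's disease":    ["blood viscosity", "platelet aggregation"],
}

def random_walk_scores(start, steps=10_000, seed=0):
    """Estimate how often each concept is visited on walks from `start`."""
    rng = random.Random(seed)
    visits = Counter()
    node = start
    for _ in range(steps):
        node = rng.choice(graph[node])
        visits[node] += 1
    total = sum(visits.values())
    return {c: n / total for c, n in visits.items()}

# Concepts visited often on walks from "fish oil" are candidate intermediate
# links to a target concept such as "Raynaud's disease".
scores = random_walk_scores("fish oil")
for concept, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{concept:22s} {score:.3f}")
```
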
Date: August 2019
Creator: Dharmavaram, Sirisha
System: The UNT Digital Library
Modeling and Simulation of the Vector-Borne Dengue Disease and the Effects of Regional Variation of Temperature in the Disease Prevalence in Homogenous and Heterogeneous Human Populations (open access)

The history of mitigation programs to contain vector-borne diseases is a story of successes and failures. Due to the complex interplay among the multiple factors that determine disease dynamics, general principles for timely and specific intervention to reduce incidence or eradicate life-threatening diseases have yet to be determined. This research discusses computational methods developed to assist in the understanding of complex relationships affecting vector-borne disease dynamics. A computational framework to assist public health practitioners with exploring the dynamics of vector-borne diseases, such as malaria and dengue, in homogenous and heterogeneous populations has been conceived, designed, and implemented. The framework integrates a stochastic computational model of interactions to simulate horizontal disease transmission. The intent of the computational modeling has been to integrate stochasticity during simulation of the disease progression while reducing the number of interactions necessary to simulate a disease outbreak. While reducing the number of interactions improves computational time, the realization of interactions can remain computationally expensive. Multi-threading was therefore used to improve performance over the original computational model, and experimental results for the multi-threaded version have been tested and reported. In addition to the contact model, the modeling of biological processes specific to …
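
The following is a rough, self-contained sketch (not the dissertation's framework) of a stochastic transmission simulation with a temperature-dependent transmission probability; all parameters and the shape of the temperature curve are invented, and vector population dynamics are ignored.

```python
import random

# Assumed, illustrative parameters -- not values from the dissertation.
POPULATION = 1000
INITIAL_INFECTED = 5
BITES_PER_DAY = 3        # vector contacts per infected person per day
DAYS = 60

def transmission_prob(temp_c):
    """Toy temperature dependence: transmission peaks near 30 C."""
    return max(0.0, 0.35 - 0.01 * abs(temp_c - 30))

def simulate(temp_c, seed=1):
    rng = random.Random(seed)
    infected = set(range(INITIAL_INFECTED))
    susceptible = set(range(INITIAL_INFECTED, POPULATION))
    p = transmission_prob(temp_c)
    for _ in range(DAYS):
        newly = set()
        # Each infected person can seed new infections through vector contacts.
        for _person in infected:
            for _ in range(BITES_PER_DAY):
                if susceptible and rng.random() < p:
                    newly.add(rng.choice(tuple(susceptible)))
        susceptible -= newly
        infected |= newly
    return len(infected)

for temp in (22, 26, 30, 34):
    print(f"{temp} C -> {simulate(temp)} total infected")
```
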
Date: August 2016
Creator: Bravo-Salgado, Angel D
System: The UNT Digital Library
Performance Study of Concurrent Search Trees and Hash Algorithms on Multiprocessor Systems (open access)

This study examines the performance of concurrent algorithms for B-trees and linear hashing. B-trees are widely used as an access method for large, single-key database files stored in lexicographic order on secondary storage devices. Linear hashing is a fast and reliable hash algorithm, suitable for accessing records stored unordered in buckets. This dissertation presents performance results on implementations of concurrent Blink-tree and linear hashing algorithms, using lock-based, partitioned, and distributed methods on the Sequent Symmetry shared-memory multiprocessor system and on a network of distributed processors created with PVM (Parallel Virtual Machine) software. Initial experiments, which started with empty data structures, show good results for the partitioned implementations and lock-based linear hashing, but poor ones for lock-based Blink-trees. A subsequent test, which started with loaded data structures, shows similar results, but with much improved performance for locked Blink-trees. The data also highlighted the high cost of split operations, which reached up to 70% of the total insert time.
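
For readers unfamiliar with the second data structure, a minimal sequential linear hashing table is sketched below (with an assumed bucket capacity and split trigger); none of the dissertation's lock-based, partitioned, or distributed concurrency control is shown.

```python
class LinearHash:
    """Minimal sequential linear hashing table (no concurrency control)."""

    INITIAL_BUCKETS = 2

    def __init__(self, bucket_capacity=4):
        self.capacity = bucket_capacity
        self.level = 0          # current splitting round
        self.next_split = 0     # next bucket to split this round
        self.buckets = [[] for _ in range(self.INITIAL_BUCKETS)]

    def _addr(self, key):
        n = self.INITIAL_BUCKETS * (2 ** self.level)
        b = hash(key) % n
        if b < self.next_split:          # bucket already split this round
            b = hash(key) % (2 * n)
        return b

    def insert(self, key, value):
        b = self._addr(key)
        self.buckets[b].append((key, value))
        if len(self.buckets[b]) > self.capacity:   # overflow triggers a split
            self._split()

    def _split(self):
        n = self.INITIAL_BUCKETS * (2 ** self.level)
        self.buckets.append([])                    # image bucket n + next_split
        old = self.buckets[self.next_split]
        self.buckets[self.next_split] = []
        self.next_split += 1
        if self.next_split == n:                   # round finished: double range
            self.level += 1
            self.next_split = 0
        for k, v in old:                           # rehash the split bucket
            self.buckets[self._addr(k)].append((k, v))

    def lookup(self, key):
        return [v for k, v in self.buckets[self._addr(key)] if k == key]

table = LinearHash()
for k in range(40):
    table.insert(f"key{k}", k)
print(table.lookup("key7"))   # [7]
```
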
Date: May 1996
Creator: Demuynck, Marie-Anne
System: The UNT Digital Library
A Comparison of File Organization Techniques (open access)

This thesis compares the file organization techniques that are implemented on two different types of computer systems, the large-scale and the small-scale. File organizations from representative computers in each class are examined in detail: the IBM System/370 (OS/370) and the Harris 1600 Distributed Processing System with the Extended Communications Operating System (ECOS). In order to establish the basic framework for comparison, an introduction to file organizations is presented. Additionally, the functional requirements for file organizations are described by their characteristics and user demands. Concluding remarks compare file organization techniques and discuss likely future developments of file systems.
Date: August 1977
Creator: Rogers, Roy Lee
System: The UNT Digital Library

Evaluating Stack Overflow Usability Posts in Conjunction with Usability Heuristics

This thesis explores the critical role of usability in software development and uses usability heuristics as a cost-effective and efficient method for evaluating various software functions and interfaces. With the proliferation of software development in the modern digital age, developing user-friendly interfaces that meet the needs and preferences of users has become a complex process. Usability heuristics, a set of guidelines based on principles of human-computer interaction, provide a starting point for designers to create intuitive, efficient, and easy-to-use interfaces that deliver a seamless user experience. The study uses Jakob Nielsen's ten usability heuristics to evaluate the usability of Stack Overflow posts, a popular Q&A website for developers. Through the analysis of 894 posts related to usability, the study identifies common usability problems faced by users and developers, providing valuable insights into the effectiveness of usability guidelines in software development practice. The findings emphasize the need for ongoing evaluation and improvement of software interfaces to ensure a seamless user experience. The thesis concludes by highlighting the potential of usability heuristics in guiding the design of user-friendly software interfaces and improving the overall user experience in software development.
Date: May 2023
Creator: Jalali, Hamed
System: The UNT Digital Library
Higher Compression from the Burrows-Wheeler Transform with New Algorithms for the List Update Problem (open access)

Burrows-Wheeler compression is a three-stage process in which the data is transformed with the Burrows-Wheeler Transform, then transformed with Move-To-Front, and finally encoded with an entropy coder. Move-To-Front, Transpose, and Frequency Count are some of the many algorithms used on the List Update problem. In 1985, competitive analysis first showed the superiority of Move-To-Front over Transpose and Frequency Count for the List Update problem with arbitrary data. Earlier studies due to Bitner assumed independent identically distributed data, and showed that while Move-To-Front adapts to a distribution faster, incurring less overwork, the asymptotic costs of Frequency Count and Transpose are less. The improvements to Burrows-Wheeler compression this work covers are increases in the amount, not the speed, of compression. Best x of 2x-1 is a new family of algorithms created to improve on Move-To-Front's processing of the output of the Burrows-Wheeler Transform, which resembles piecewise independent identically distributed data. Other algorithms for both the middle stage of Burrows-Wheeler compression and the List Update problem, for which overwork, asymptotic cost, and competitive ratios are also analyzed, are several variations of Move One From Front and part of the randomized algorithm Timestamp. The Best x of 2x-1 family includes Move-To-Front, …
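
A short sketch of the middle stage that the new algorithms refine: Move-To-Front encoding and decoding over a byte alphabet. The run-heavy example string stands in for BWT output; the Best x of 2x-1 variants themselves are not reproduced here.

```python
def move_to_front_encode(data: bytes) -> list[int]:
    """Encode each byte as its index in a self-organizing list,
    then move that byte to the front (the List Update policy)."""
    symbols = list(range(256))
    out = []
    for b in data:
        i = symbols.index(b)
        out.append(i)
        symbols.pop(i)
        symbols.insert(0, b)
    return out

def move_to_front_decode(codes: list[int]) -> bytes:
    symbols = list(range(256))
    out = bytearray()
    for i in codes:
        b = symbols.pop(i)
        out.append(b)
        symbols.insert(0, b)
    return bytes(out)

# BWT output tends to contain long runs of a few symbols, so MTF produces
# mostly small indices and zeros, which the final entropy coder compresses well.
example = b"nnbaaaaaaa"            # run-heavy, BWT-like input
codes = move_to_front_encode(example)
print(codes)
assert move_to_front_decode(codes) == example
```
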
Date: August 2001
Creator: Chapin, Brenton
System: The UNT Digital Library
Brain Computer Interface (BCI) Applications: Privacy Threats and Countermeasures (open access)

In recent years, brain computer interfaces (BCIs) have gained popularity in non-medical domains such as the gaming, entertainment, personal health, and marketing industries. A growing number of companies offer various inexpensive consumer-grade BCIs, and some of these companies have recently introduced the concept of BCI "App stores" in order to facilitate the expansion of BCI applications, providing software development kits (SDKs) for other developers to create new applications for their devices. BCI applications have access to users' unique brainwave signals, which consequently allows them to make inferences about users' thoughts and mental processes. Since there are no specific standards that govern the development of BCI applications, their users are at risk of privacy breaches. In this work, we perform the first comprehensive analysis of BCI App stores, including software development kits (SDKs), application programming interfaces (APIs), and BCI applications, with respect to privacy issues. The goal is to understand how brainwave signals are handled by BCI applications and what threats to the privacy of users exist. Our findings show that most applications have unrestricted access to users' brainwave signals and can easily extract private information about their users without them even noticing. We discuss potential privacy threats posed by …
Date: May 2017
Creator: Bhalotiya, Anuj Arun
System: The UNT Digital Library
A Mechanism for Facilitating Temporal Reasoning in Discrete Event Simulation (open access)

This research establishes the feasibility and potential utility of a software mechanism which employs artificial intelligence techniques to enhance the capabilities of standard discrete event simulators. As background, current methods of integrating artificial intelligence with simulation and relevant research are briefly reviewed.
Date: May 1992
Creator: Legge, Gaynor W.
System: The UNT Digital Library
Intelligent Memory Manager: Towards improving the locality behavior of allocation-intensive applications. (open access)

Dynamic memory management required by allocation-intensive (i.e., object-oriented and linked-data-structure) applications has led to a large number of research trends. Memory performance, due to cache misses in these applications, continues to lag in terms of execution cycles as the ever-increasing CPU-memory speed gap continues to grow. Sophisticated prefetching techniques, data relocation, and multithreaded architectures have tried to address memory latency. These techniques are not completely successful, since they require either extra hardware/software in the system or special properties in the applications. Software needed for prefetching and data relocation strategies, aimed at improving cache performance, pollutes the cache so that the technique itself becomes counter-productive. On the other hand, the extra hardware complexity needed in multithreaded architectures decelerates the CPU's clock, since "Simpler is Faster." This dissertation, directed at seeking the cause of the poor locality behavior of allocation-intensive applications, studies allocators and their impact on the cache performance of these applications. Our study concludes that service functions in general, and memory management functions in particular, entangle with the application's code and become the major cause of cache pollution. In this dissertation, we present a novel technique that transfers the allocation and de-allocation functions entirely to a separate processor residing in …
Date: May 2004
Creator: Rezaei, Mehran
System: The UNT Digital Library
Concurrent Pattern Recognition and Optical Character Recognition (open access)

The problem of interest is to develop a general-purpose technique that combines the structural approach with an extension of the Finite Inductive Sequence (FI) technique. FI technology is pre-algebra and deals with patterns for which an alphabet can be formulated.
Date: August 1991
Creator: An, Kyung Hee
System: The UNT Digital Library
Split array and scalar data cache: A comprehensive study of data cache organization. (open access)

Existing cache organizations suffer from the inability to distinguish different types of locality, and they cache all data non-selectively rather than making any attempt to take special advantage of the locality type. This causes unnecessary movement of data among the levels of the memory hierarchy and increases the miss ratio. In this dissertation I propose a split data cache architecture that will group memory accesses as scalar or array references according to their inherent locality and will subsequently map each group to a dedicated cache partition. In this system, because scalar and array references will no longer negatively affect each other, cache interference is diminished, delivering better performance. Further improvement is achieved by the introduction of a victim cache, prefetching, data flattening, and reconfigurability to tune the array and scalar caches for specific applications. The most significant contribution of my work is the introduction of a novel cache architecture for embedded microprocessor platforms. My proposed cache architecture uses reconfigurability coupled with split data caches to reduce the area and power consumed by cache memories while retaining performance gains. My results show excellent reductions in both memory size and memory access times, translating into reduced power consumption. Since there was a huge reduction in miss rates …
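
A toy simulation (not the proposed architecture itself) of the underlying idea: route scalar and array references to separate direct-mapped caches and compare miss counts against a single unified cache of the same total size. The cache sizes and the reference trace are invented.

```python
import random

class DirectMappedCache:
    def __init__(self, num_lines, line_size=4):
        self.num_lines, self.line_size = num_lines, line_size
        self.tags = [None] * num_lines
        self.hits = self.misses = 0

    def access(self, addr):
        line = addr // self.line_size
        idx, tag = line % self.num_lines, line // self.num_lines
        if self.tags[idx] == tag:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[idx] = tag

# Invented reference trace: a few hot scalar variables interleaved with a
# streaming array; a real trace would come from instrumenting an application.
random.seed(0)
trace = []
for i in range(2000):
    trace.append((random.choice([0, 8, 16, 24, 32]), "scalar"))
    trace.append((1024 + (i * 4) % 4096, "array"))

unified = DirectMappedCache(num_lines=64)
scalar_cache = DirectMappedCache(num_lines=32)
array_cache = DirectMappedCache(num_lines=32)

for addr, kind in trace:
    unified.access(addr)
    (scalar_cache if kind == "scalar" else array_cache).access(addr)

total = len(trace)
print("unified miss rate:", unified.misses / total)
print("split miss rate:  ", (scalar_cache.misses + array_cache.misses) / total)
```

In the unified cache the streaming array references periodically evict the hot scalar lines; keeping the two reference groups apart removes that interference, which is the effect the dissertation exploits.
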
Date: August 2007
Creator: Naz, Afrin
System: The UNT Digital Library
A Timescale Estimating Model for Rule-Based Systems (open access)

The purpose of this study was to explore the subject of timescale estimating for rule-based systems. A model for estimating the timescale necessary to build rule-based systems was built and then tested in a controlled environment.
Date: December 1987
Creator: Moseley, Charles Warren
System: The UNT Digital Library
A Platform for Aligning Academic Assessments to Industry and Federal Job Postings (open access)

The proposed tool will provide users with a platform to access a side-by-side comparison of classroom assessments and job posting requirements. Using techniques and methodologies from NLP, machine learning, data analysis, and data mining, the employed algorithm analyzes job postings and classroom assessments, extracts and classifies the skill units within, and then compares sets of skills from the different input volumes. This effectively provides a predicted alignment between academic and career sources, both federal and industrial. The compiled tool results indicate an overall accuracy score of 82% and an alignment score of only 75.5% between the input assessments and the overall job postings. In other words, the 50 UNT assessments and 5,000 industry and federal job postings examined demonstrate a compatibility (alignment) of 75.5%, a measure calculated using a tool operating at an 82% precision rate.
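
As a hypothetical sketch of the final comparison step only (the NLP extraction and classification are assumed to have already produced the skill sets), one simple way to compute an alignment score between an assessment and a set of job postings:

```python
def alignment_score(assessment_skills: set[str], posting_skills: set[str]) -> float:
    """Fraction of job-posting skills covered by the assessments
    (one simple definition of 'alignment'; others are possible)."""
    if not posting_skills:
        return 0.0
    return len(assessment_skills & posting_skills) / len(posting_skills)

# Hypothetical skill sets -- in the real tool these would be extracted and
# classified from assessment text and from scraped job postings.
assessments = {"python", "sql", "data structures", "unit testing", "git"}
postings = {"python", "sql", "cloud computing", "git", "agile", "docker"}

print(f"alignment: {alignment_score(assessments, postings):.1%}")  # 50.0%
```
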
Date: July 2023
Creator: Parks, Tyler J.
System: The UNT Digital Library
Computational Epidemiology - Analyzing Exposure Risk: A Deterministic, Agent-Based Approach (open access)

Many infectious diseases are spread through interactions between susceptible and infectious individuals. Keeping track of where each exposure to the disease took place, when it took place, and which individuals were involved in the exposure can give public health officials important information that they may use to formulate their interventions. Further, knowing which individuals in the population are at the highest risk of becoming infected with the disease may prove to be a useful tool for public health officials trying to curtail the spread of the disease. Epidemiological models are needed to allow epidemiologists to study the population dynamics of transmission of infectious agents and the potential impact of infectious disease control programs. While many agent-based computational epidemiological models exist in the literature, they focus on the spread of disease rather than exposure risk. These models are designed to simulate very large populations, representing individuals as agents, and using random experiments and probabilities in an attempt to more realistically guide the course of the modeled disease outbreak. The work presented in this thesis focuses on tracking exposure risk to chickenpox in an elementary school setting. This setting is chosen due to the high level of detailed information realistically available to …
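
A minimal deterministic sketch of the exposure-tracking idea: during each contact between susceptible and infectious agents, record where and when the exposure occurred rather than immediately simulating infection. The school schedule, student identifiers, and risk ranking below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical school day: each (hour, location) hosts a fixed group of students.
schedule = [
    (1, "classroom_A", ["s1", "s2", "s3", "s4"]),
    (2, "cafeteria",   ["s1", "s2", "s5", "s6"]),   # mixing across classrooms
    (3, "classroom_B", ["s5", "s6", "s7", "s8"]),
]
infectious = {"s1"}

# student -> list of (location, hour) exposure events
exposures = defaultdict(list)

for hour, place, students in schedule:
    if not any(s in infectious for s in students):
        continue
    for s in students:
        if s not in infectious:
            # Record where and when the exposure happened instead of
            # immediately deciding whether infection occurs.
            exposures[s].append((place, hour))

# Students with the most exposure events are at highest risk.
for student, events in sorted(exposures.items(), key=lambda kv: -len(kv[1])):
    print(student, len(events), events)
```
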
Date: August 2009
Creator: O'Neill, Martin Joseph, II
System: The UNT Digital Library
Automated GUI Tests Generation for Android Apps Using Q-learning (open access)

Mobile applications are growing in popularity and pose new problems in the area of software testing. In particular, mobile applications heavily depend upon user interactions and a dynamically changing environment of system events. In this thesis, we focus on user-driven events and use Q-learning, a reinforcement learning algorithm, to generate tests for Android applications under test (AUT). We implement a framework that automates the generation of GUI test cases using our Q-learning approach and compare it to a uniform random (UR) implementation. A novel feature of our approach is that we generate user-driven event sequences through the GUI without the source code or a model of the AUT. Hence, a considerable amount of cost and time is saved by avoiding the need for model generation when generating the tests. Our results show that the systematic path exploration used by Q-learning results in higher average code coverage in comparison to the uniform random approach.
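
A compact sketch of the Q-learning loop over abstract GUI states and events; the toy GUI model, the reward for reaching unseen screens, and the learning parameters are stand-ins, since the thesis framework drives a real Android app rather than a lookup table.

```python
import random

random.seed(0)

# Toy GUI model: state -> {event: next_state}. In the real setting the next
# state comes from executing the event on the running app, not from a table.
gui = {
    "main":     {"tap_menu": "menu", "tap_about": "about"},
    "menu":     {"tap_settings": "settings", "back": "main"},
    "about":    {"back": "main"},
    "settings": {"toggle_dark": "settings", "back": "menu"},
}

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2
Q = {(s, e): 0.0 for s, events in gui.items() for e in events}
visited = set()

def choose(state):
    events = list(gui[state])
    if random.random() < EPSILON:
        return random.choice(events)                  # explore
    return max(events, key=lambda e: Q[(state, e)])   # exploit

for episode in range(50):
    state, test_case = "main", []
    for _ in range(6):                                # bounded event sequence
        event = choose(state)
        nxt = gui[state][event]
        reward = 1.0 if nxt not in visited else 0.0   # reward reaching new screens
        visited.add(nxt)
        best_next = max(Q[(nxt, e)] for e in gui[nxt])
        Q[(state, event)] += ALPHA * (reward + GAMMA * best_next - Q[(state, event)])
        test_case.append(event)
        state = nxt
    # each episode's event sequence is one generated GUI test case

print("sample test case:", test_case)
```
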
Date: May 2017
Creator: Koppula, Sreedevi
System: The UNT Digital Library
Measuring Semantic Relatedness Using Salient Encyclopedic Concepts (open access)

While pragmatics, through its integration of situational awareness and real-world relevant knowledge, offers a high level of analysis that is suitable for real interpretation of natural dialogue, semantics, on the other hand, represents a lower yet more tractable and affordable linguistic level of analysis using current technologies. Generally, the understanding of semantic meaning in the literature has revolved around the famous quote "You shall know a word by the company it keeps." In this thesis we investigate the role of context constituents in decoding the semantic meaning of the engulfing context; specifically, we probe the role of salient concepts, defined as content-bearing expressions which afford encyclopedic definitions, as a suitable source of semantic clues for an unambiguous interpretation of context. Furthermore, we integrate this world knowledge in building a new and robust unsupervised semantic model and apply it to measure semantic relatedness between textual pairs, whether they are words, sentences, or paragraphs. Moreover, we explore the abstraction of semantics across languages and utilize our findings in building a novel multi-lingual semantic relatedness model exploiting information acquired from various languages. We demonstrate the effectiveness and the superiority of our mono-lingual and multi-lingual models through a comprehensive set of evaluations on specialized …
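
A minimal sketch of the general concept-vector approach (not the thesis model): each text is represented as a weighted vector over salient encyclopedic concepts, and relatedness is the cosine between vectors. The concepts and weights below are made up; a real system would derive them from an encyclopedic corpus.

```python
import math

def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse concept-weight vectors."""
    dot = sum(u[c] * v.get(c, 0.0) for c in u)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical concept vectors: word -> {encyclopedic concept: salience weight}.
doctor = {"Physician": 0.9, "Hospital": 0.6, "Medicine": 0.5}
nurse  = {"Nursing": 0.9, "Hospital": 0.7, "Medicine": 0.4}
guitar = {"Guitar": 0.9, "Music": 0.6, "String instrument": 0.5}

print("doctor ~ nurse :", round(cosine(doctor, nurse), 3))   # relatively high
print("doctor ~ guitar:", round(cosine(doctor, guitar), 3))  # zero overlap
```
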
Date: August 2011
Creator: Hassan, Samer
System: The UNT Digital Library
Defensive Programming (open access)

This research explores the concepts of defensive programming as currently defined in the literature. These concepts are then extended and more explicitly defined. The relationship between defensive programming, as presented in this research, and current programming practices is discussed, and several benefits are observed. Defensive programming appears to benefit the entire software life cycle. Four identifiable phases of the software development process are defined, and the relationship between these four phases and defensive programming is shown. In this research, defensive programming is defined as writing programs in such a way that, during execution, the program itself produces communication allowing the programmer and the user to observe its dynamic states accurately and critically. To accomplish this end, the use of defensive programming snapshots is presented as a software development tool.
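
A present-day analogue of the snapshot idea (the thesis predates Python, so this is only an illustration): a decorator that makes a function report its dynamic state, its inputs, result, or failure, while the program runs.

```python
import functools

def snapshot(func):
    """Report arguments, return value, and exceptions during execution,
    so programmer and user can observe the program's dynamic state."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"[snapshot] entering {func.__name__} args={args} kwargs={kwargs}")
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            print(f"[snapshot] {func.__name__} raised {exc!r}")
            raise
        print(f"[snapshot] {func.__name__} returned {result!r}")
        return result
    return wrapper

@snapshot
def average(values):
    return sum(values) / len(values)

average([3, 4, 5])      # snapshots show the inputs and the result
try:
    average([])         # the snapshot pinpoints the failing call
except ZeroDivisionError:
    pass
```
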
Date: May 1980
Creator: Bailey, L. Mark
System: The UNT Digital Library
Quality-of-Service Provisioning and Resource Reservation Mechanisms for Mobile Wireless Networks (open access)

In this thesis, a framework for Quality of Service provisioning in next generation wireless access networks is proposed. The framework aims at providing a differentiated service treatment to real-time (delay-sensitive) and non-real-time (delay-tolerant) multimedia traffic flows at the link layer. Novel techniques such as bandwidth compaction, channel reservation, and channel degradation are proposed. Using these techniques, we develop a call admission control algorithm and a call control block as part of the QoS framework. The performance of the framework is captured through analytical modeling and simulation experiments. By analytical modeling, the average carried traffic and the worst case buffer requirements for real-time and non-real-time calls are estimated. Simulation results show a 21% improvement in call admission probability of real-time calls, and a 17% improvement for non-real-time calls, when bandwidth compaction is employed. The channel reservation technique shows a 12% improvement in call admission probability in comparison with another proposed scheme in the literature.
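
Very loosely, and with invented numbers, the sketch below shows the shape of a call admission decision that reclaims bandwidth from delay-tolerant calls in order to admit a real-time call; it is not the thesis's bandwidth compaction or channel reservation algorithm, and a real controller would roll back the degradation if admission still failed.

```python
CAPACITY = 100        # total link bandwidth units (assumed)
MIN_NRT_BW = 2        # floor to which a non-real-time call may be degraded

calls = [             # active calls: [id, class, allocated bandwidth]
    ["c1", "real_time", 20],
    ["c2", "non_real_time", 10],
    ["c3", "non_real_time", 12],
]

def free_bandwidth():
    return CAPACITY - sum(bw for _, _, bw in calls)

def admit(call_id, call_class, requested_bw):
    """Admit if bandwidth is free; otherwise try degrading non-real-time
    calls (a crude stand-in for compaction/degradation)."""
    shortfall = requested_bw - free_bandwidth()
    if shortfall > 0 and call_class == "real_time":
        for call in calls:
            if shortfall <= 0:
                break
            if call[1] == "non_real_time" and call[2] > MIN_NRT_BW:
                reclaim = min(call[2] - MIN_NRT_BW, shortfall)
                call[2] -= reclaim
                shortfall -= reclaim
    if requested_bw <= free_bandwidth():
        calls.append([call_id, call_class, requested_bw])
        return True
    return False      # call is blocked

print(admit("c4", "real_time", 70))   # admitted after degrading c2 and c3
print(admit("c5", "real_time", 40))   # blocked: nothing left to reclaim
```
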
Date: August 1998
Creator: Jayaram, Rajeev, 1971-
System: The UNT Digital Library

A Netcentric Scientific Research Repository

Access: Use of this item is restricted to the UNT Community
The Internet and networks in general have become essential tools for disseminating information. Search engines have become the predominant means of finding information on the Web and all other data repositories, including local resources. Domain scientists regularly acquire and analyze images generated by equipment such as microscopes and cameras, resulting in complex image files that need to be managed in a convenient manner. This type of integrated environment has been recently termed a netcentric scientific research repository. I developed a number of data manipulation tools that allow researchers to manage their information more effectively in a netcentric environment. The specific contributions are: (1) A unique interface for management of data including files and relational databases. A wrapper for relational databases was developed so that the data can be indexed and searched using traditional search engines. This approach allows data in databases to be searched with the same interface as other data. Furthermore, this approach makes it easier for scientists to work with their data if they are not familiar with SQL. (2) A Web services based architecture for integrating analysis operations into a repository. This technique allows the system to leverage the large number of existing tools by wrapping them …
Date: December 2006
Creator: Harrington, Brian
System: The UNT Digital Library
XML-Based Agent Scripts and Inference Mechanisms (open access)

Natural language understanding has been a persistent challenge to researchers in various computer science fields, in a number of applications ranging from user support systems to entertainment and online teaching. A long-term goal of the Artificial Intelligence field is to implement mechanisms that enable computers to emulate human dialogue. The recently developed ALICEbots, virtual agents built on AIML scripts by the A.L.I.C.E. foundation, use AIML scripts - a subset of XML - as the underlying pattern database for question answering. Their goal is to enable pattern-based, stimulus-response knowledge content to be served, received, and processed over the Web, or offline, in a manner similar to HTML and XML. In this thesis, we describe a system that converts AIML scripts to Prolog clauses and reuses them as part of a knowledge processor. The inference mechanism developed in this thesis is able to successfully match the input pattern with our clause database even if words are missing. We also emulate the pattern deduction algorithm of the original logic deduction mechanism. Our rules, compatible with Semantic Web standards, bring structure to the meaningful content of Web pages and support interactive content retrieval using natural language.
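
A small sketch of the conversion step, assuming a toy AIML fragment and an invented pattern/2 clause format; the partial-matching inference described above is not reproduced.

```python
import xml.etree.ElementTree as ET

aiml = """
<aiml>
  <category>
    <pattern>WHAT IS YOUR NAME</pattern>
    <template>My name is Alice.</template>
  </category>
  <category>
    <pattern>HELLO</pattern>
    <template>Hi there!</template>
  </category>
</aiml>
"""

def to_prolog(aiml_text: str) -> list[str]:
    """Turn each AIML category into a pattern/2 fact: pattern(Words, Reply)."""
    clauses = []
    root = ET.fromstring(aiml_text)
    for cat in root.iter("category"):
        words = cat.findtext("pattern", "").strip().lower().split()
        reply = cat.findtext("template", "").strip()
        word_list = "[" + ", ".join(words) + "]"
        clauses.append(f"pattern({word_list}, '{reply}').")
    return clauses

for clause in to_prolog(aiml):
    print(clause)
# pattern([what, is, your, name], 'My name is Alice.').
# pattern([hello], 'Hi there!').
```
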
Date: August 2003
Creator: Sun, Guili
System: The UNT Digital Library
Quantifying Design Principles in Reusable Software Components (open access)

Software reuse can occur in various places during the software development cycle. Reuse of existing source code is the most commonly practiced form of software reuse. One of the key requirements for software reuse is readability, thus the interest in the use of data abstraction, inheritance, modularity, and aspects of the visible portion of module specifications. This research analyzed the contents of software reuse libraries to answer the basic question of what makes a good reusable software component. The approach taken was to measure and analyze various software metrics as mapped to design characteristics. A related research question investigated the change in the design principles over time. This was measured by comparing sets of Ada reuse libraries categorized into two time periods. It was discovered that recently developed Ada reuse components scored better on readability than earlier developed components. A benefit of this research has been the development of a set of "design for reuse" guidelines. These guidelines address coding practices as well as design principles for an Ada implementation. C++ software reuse libraries were also analyzed to determine if design principles can be applied in a language independent fashion. This research used cyclomatic complexity metrics, software science metrics, and …
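
As an illustration of the kind of metric used, the sketch below computes McCabe's cyclomatic complexity for Python functions by counting decision points in the AST; the dissertation measured Ada and C++ components with its own tooling, so this is only an analogue of the metric, not that tooling.

```python
import ast

# Node types treated as decision points (a common simplification: each
# boolean operator chain counts once).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                  ast.BoolOp, ast.IfExp)

def cyclomatic_complexity(source: str) -> dict:
    """McCabe complexity per function: 1 + number of decision points."""
    tree = ast.parse(source)
    scores = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            decisions = sum(isinstance(n, DECISION_NODES) for n in ast.walk(node))
            scores[node.name] = 1 + decisions
    return scores

sample = """
def clamp(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x
"""
print(cyclomatic_complexity(sample))   # {'clamp': 3}
```
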
Date: December 1995
Creator: Moore, Freeman Leroy
System: The UNT Digital Library