Degree Discipline

Language

An Approach Towards Self-Supervised Classification Using Cyc (open access)

An Approach Towards Self-Supervised Classification Using Cyc

Due to the long duration required to perform manual knowledge entry by human knowledge engineers it is desirable to find methods to automatically acquire knowledge about the world by accessing online information. In this work I examine using the Cyc ontology to guide the creation of Naïve Bayes classifiers to provide knowledge about items described in Wikipedia articles. Given an initial set of Wikipedia articles the system uses the ontology to create positive and negative training sets for the classifiers in each category. The order in which classifiers are generated and used to test articles is also guided by the ontology. The research conducted shows that a system can be created that utilizes statistical text classification methods to extract information from an ad-hoc generated information source like Wikipedia for use in a formal semantic ontology like Cyc. Benefits and limitations of the system are discussed along with future work.
Date: December 2006
Creator: Coursey, Kino High
System: The UNT Digital Library
CLUE: A Cluster Evaluation Tool (open access)

CLUE: A Cluster Evaluation Tool

Modern high performance computing is dependent on parallel processing systems. Most current benchmarks reveal only the high level computational throughput metrics, which may be sufficient for single processor systems, but can lead to a misrepresentation of true system capability for parallel systems. A new benchmark is therefore proposed. CLUE (Cluster Evaluator) uses a cellular automata algorithm to evaluate the scalability of parallel processing machines. The benchmark also uses algorithmic variations to evaluate individual system components' impact on the overall serial fraction and efficiency. CLUE is not a replacement for other performance-centric benchmarks, but rather shows the scalability of a system and provides metrics to reveal where one can improve overall performance. CLUE is a new benchmark which demonstrates a better comparison among different parallel systems than existing benchmarks and can diagnose where a particular parallel system can be optimized.
Date: December 2006
Creator: Parker, Brandon S.
System: The UNT Digital Library
Grid-based Coordinated Routing in Wireless Sensor Networks (open access)

Grid-based Coordinated Routing in Wireless Sensor Networks

Wireless sensor networks are battery-powered ad-hoc networks in which sensor nodes that are scattered over a region connect to each other and form multi-hop networks. These nodes are equipped with sensors such as temperature sensors, pressure sensors, and light sensors and can be queried to get the corresponding values for analysis. However, since they are battery operated, care has to be taken so that these nodes use energy efficiently. One of the areas in sensor networks where an energy analysis can be done is routing. This work explores grid-based coordinated routing in wireless sensor networks and compares the energy available in the network over time for different grid sizes.
Date: December 2006
Creator: Sawant, Uttara
System: The UNT Digital Library

A Language and Visual Interface to Specify Complex Spatial Pattern Mining

Access: Use of this item is restricted to the UNT Community
The emerging interests in spatial pattern mining leads to the demand for a flexible spatial pattern mining language, on which easy to use and understand visual pattern language could be built. It is worthwhile to define a pattern mining language called LCSPM to allow users to specify complex spatial patterns. I describe a proposed pattern mining language in this paper. A visual interface which allows users to specify the patterns visually is developed. Visual pattern queries are translated into the LCSPM language by a parser and data mining process can be triggered afterwards. The visual language is based on and goes beyond the visual language proposed in literature. I implemented a prototype system based on the open source JUMP framework.
Date: December 2006
Creator: Li, Xiaohui
System: The UNT Digital Library
Mediation on XQuery Views (open access)

Mediation on XQuery Views

The major goal of information integration is to provide efficient and easy-to-use access to multiple heterogeneous data sources with a single query. At the same time, one of the current trends is to use standard technologies for implementing solutions to complex software problems. In this dissertation, I used XML and XQuery as the standard technologies and have developed an extended projection algorithm to provide a solution to the information integration problem. In order to demonstrate my solution, I implemented a prototype mediation system called Omphalos based on XML related technologies. The dissertation describes the architecture of the system, its metadata, and the process it uses to answer queries. The system uses XQuery expressions (termed metaqueries) to capture complex mappings between global schemas and data source schemas. The system then applies these metaqueries in order to rewrite a user query on a virtual global database (representing the integrated view of the heterogeneous data sources) to a query (termed an outsourced query) on the real data sources. An extended XML document projection algorithm was developed to increase the efficiency of selecting the relevant subset of data from an individual data source to answer the user query. The system applies the projection algorithm …
Date: December 2006
Creator: Peng, Xiaobo
System: The UNT Digital Library

A Multi-Variate Analysis of SMTP Paths and Relays to Restrict Spam and Phishing Attacks in Emails

Access: Use of this item is restricted to the UNT Community
The classifier discussed in this thesis considers the path traversed by an email (instead of its content) and reputation of the relays, features inaccessible to spammers. Groups of spammers and individual behaviors of a spammer in a given domain were analyzed to yield association patterns, which were then used to identify similar spammers. Unsolicited and phishing emails were successfully isolated from legitimate emails, using analysis results. Spammers and phishers are also categorized into serial spammers/phishers, recent spammers/phishers, prospective spammers/phishers, and suspects. Legitimate emails and trusted domains are classified into socially close (family members, friends), socially distinct (strangers etc), and opt-outs (resolved false positives and false negatives). Overall this classifier resulted in far less false positives when compared to current filters like SpamAssassin, achieving a 98.65% precision, which is well comparable to the precisions achieved by SPF, DNSRBL blacklists.
Date: December 2006
Creator: Palla, Srikanth
System: The UNT Digital Library
Natural Language Interfaces to Databases (open access)

Natural Language Interfaces to Databases

Natural language interfaces to databases (NLIDB) are systems that aim to bridge the gap between the languages used by humans and computers, and automatically translate natural language sentences to database queries. This thesis proposes a novel approach to NLIDB, using graph-based models. The system starts by collecting as much information as possible from existing databases and sentences, and transforms this information into a knowledge base for the system. Given a new question, the system will use this knowledge to analyze and translate the sentence into its corresponding database query statement. The graph-based NLIDB system uses English as the natural language, a relational database model, and SQL as the formal query language. In experiments performed with natural language questions ran against a large database containing information about U.S. geography, the system showed good performance compared to the state-of-the-art in the field.
Date: December 2006
Creator: Chandra, Yohan
System: The UNT Digital Library

A Netcentric Scientific Research Repository

Access: Use of this item is restricted to the UNT Community
The Internet and networks in general have become essential tools for disseminating in-formation. Search engines have become the predominant means of finding information on the Web and all other data repositories, including local resources. Domain scientists regularly acquire and analyze images generated by equipment such as microscopes and cameras, resulting in complex image files that need to be managed in a convenient manner. This type of integrated environment has been recently termed a netcentric sci-entific research repository. I developed a number of data manipulation tools that allow researchers to manage their information more effectively in a netcentric environment. The specific contributions are: (1) A unique interface for management of data including files and relational databases. A wrapper for relational databases was developed so that the data can be indexed and searched using traditional search engines. This approach allows data in databases to be searched with the same interface as other data. Fur-thermore, this approach makes it easier for scientists to work with their data if they are not familiar with SQL. (2) A Web services based architecture for integrating analysis op-erations into a repository. This technique allows the system to leverage the large num-ber of existing tools by wrapping them …
Date: December 2006
Creator: Harrington, Brian
System: The UNT Digital Library
Power-benefit analysis of erasure encoding with redundant routing in sensor networks. (open access)

Power-benefit analysis of erasure encoding with redundant routing in sensor networks.

One of the problems sensor networks face is adversaries corrupting nodes along the path to the base station. One way to reduce the effect of these attacks is multipath routing. This introduces some intrusion-tolerance in the network by way of redundancy but at the cost of a higher power consumption by the sensor nodes. Erasure coding can be applied to this scenario in which the base station can receive a subset of the total data sent and reconstruct the entire message packet at its end. This thesis uses two commonly used encodings and compares their performance with respect to power consumed for unencoded data in multipath routing. It is found that using encoding with multipath routing reduces the power consumption and at the same time enables the user to send reasonably large data sizes. The experiments in this thesis were performed on the Tiny OS platform with the simulations done in TOSSIM and the power measurements were taken in PowerTOSSIM. They were performed on the simple radio model and the lossy radio model provided by Tiny OS. The lossy radio model was simulated with distances of 10 feet, 15 feet and 20 feet between nodes. It was found that by …
Date: December 2006
Creator: Vishwanathan, Roopa
System: The UNT Digital Library