312 Matching Results

Comparison and Evaluation of Existing Analog Circuit Simulator using Sigma-Delta Modulator

Access: Use of this item is restricted to the UNT Community
In the world of VLSI (very large scale integration) technology, there are many different types of circuit simulators that are used to design and predict the circuit behavior before actual fabrication of the circuit. In this thesis, I compared and evaluated existing circuit simulators by considering standard benchmark circuits. The circuit simulators which I evaluated and explored are Ngspice, Tclspice, Winspice (open source) and Spectre® (commercial). I also tested standard benchmarks using these circuit simulators and compared their outputs. The simulators are evaluated using design metrics in order to quantify their performance and identify efficient circuit simulators. In addition, I designed a sigma-delta modulator and its individual components using the analog behavioral language Verilog-A. Initially, I performed simulations of individual components of the sigma-delta modulator and later of the whole system. Finally, CMOS (complementary metal-oxide semiconductor) transistor-level circuits were designed for the differential amplifier, operational amplifier and comparator of the modulator.
Date: December 2006
Creator: Ale, Anil Kumar
Object Type: Thesis or Dissertation
System: The UNT Digital Library

A Language and Visual Interface to Specify Complex Spatial Pattern Mining

Access: Use of this item is restricted to the UNT Community
The emerging interest in spatial pattern mining creates demand for a flexible spatial pattern mining language on which an easy-to-use and easy-to-understand visual pattern language could be built. It is worthwhile to define a pattern mining language, called LCSPM, that allows users to specify complex spatial patterns. I describe a proposed pattern mining language in this paper. A visual interface that allows users to specify the patterns visually is developed. Visual pattern queries are translated into the LCSPM language by a parser, and the data mining process can be triggered afterwards. The visual language is based on and goes beyond the visual language proposed in the literature. I implemented a prototype system based on the open source JUMP framework.
Date: December 2006
Creator: Li, Xiaohui
Object Type: Thesis or Dissertation
System: The UNT Digital Library

A Multi-Variate Analysis of SMTP Paths and Relays to Restrict Spam and Phishing Attacks in Emails

Access: Use of this item is restricted to the UNT Community
The classifier discussed in this thesis considers the path traversed by an email (instead of its content) and the reputation of the relays, features inaccessible to spammers. Groups of spammers and individual behaviors of a spammer in a given domain were analyzed to yield association patterns, which were then used to identify similar spammers. Unsolicited and phishing emails were successfully isolated from legitimate emails using the analysis results. Spammers and phishers are also categorized into serial spammers/phishers, recent spammers/phishers, prospective spammers/phishers, and suspects. Legitimate emails and trusted domains are classified into socially close (family members, friends), socially distinct (strangers, etc.), and opt-outs (resolved false positives and false negatives). Overall, this classifier resulted in far fewer false positives than current filters like SpamAssassin, achieving 98.65% precision, which is comparable to the precision achieved by SPF and DNSRBL blacklists.
Date: December 2006
Creator: Palla, Srikanth
Object Type: Thesis or Dissertation
System: The UNT Digital Library

Design and Optimization of Components in a 45nm CMOS Phase Locked Loop

Access: Use of this item is restricted to the UNT Community
A novel scheme for optimizing the individual components of a phase locked loop (PLL), which is used for stable clock generation and synchronization of signals, is considered in this work. Verilog-A is used for the high level system design of the main components of the PLL, followed by the individual component-wise optimization. The design of experiments (DOE) approach to optimizing the analog, 45nm voltage controlled oscillator (VCO) is presented. A mixed signal analysis using the analog and digital Verilog behavior of components is also studied. Overall, a high level system design of a PLL, a systematic optimization of each of its components, and an analog and mixed signal behavioral design approach have been implemented using Cadence custom IC design tools.
Date: December 2006
Creator: Sarivisetti, Gayathri
Object Type: Thesis or Dissertation
System: The UNT Digital Library

Modeling and reduction of gate leakage during behavioral synthesis of nanoscale CMOS circuits.

Access: Use of this item is restricted to the UNT Community
The major sources of power dissipation in a nanometer CMOS circuit are capacitive switching, short-circuit current, static leakage and gate oxide tunneling. However, with the aggressive scaling of technology, the gate oxide direct tunneling current (gate leakage) is emerging as a prominent component of power dissipation. For sub-65 nm CMOS technology, where the gate oxide (SiO2) thickness is very low, the direct tunneling current is the major form of tunneling. This thesis makes two contributions: analytical modeling of behavioral level components for direct tunneling current and propagation delay, and the reduction of tunneling current during behavioral synthesis. Gate oxides of multiple thicknesses are useful in reducing the gate leakage dissipation. Analytical models from first principles to calculate the tunneling current and the propagation delay of behavioral level components are presented, which are backed by BSIM4/5 models and SPICE simulations. These components are characterized for 45 nm technology, and an algorithm is provided for scheduling of datapath operations such that the overall tunneling current dissipation of a datapath circuit under design is minimal. It is observed that, because the oxide thickness being considered is very low, it may not remain constant during the course of fabrication. Hence …
Date: May 2006
Creator: Velagapudi, Ramakrishna
Object Type: Thesis or Dissertation
System: The UNT Digital Library

A Netcentric Scientific Research Repository

Access: Use of this item is restricted to the UNT Community
The Internet and networks in general have become essential tools for disseminating information. Search engines have become the predominant means of finding information on the Web and all other data repositories, including local resources. Domain scientists regularly acquire and analyze images generated by equipment such as microscopes and cameras, resulting in complex image files that need to be managed in a convenient manner. This type of integrated environment has been recently termed a netcentric scientific research repository. I developed a number of data manipulation tools that allow researchers to manage their information more effectively in a netcentric environment. The specific contributions are: (1) A unique interface for management of data including files and relational databases. A wrapper for relational databases was developed so that the data can be indexed and searched using traditional search engines. This approach allows data in databases to be searched with the same interface as other data. Furthermore, this approach makes it easier for scientists to work with their data if they are not familiar with SQL. (2) A Web services based architecture for integrating analysis operations into a repository. This technique allows the system to leverage the large number of existing tools by wrapping them …
Date: December 2006
Creator: Harrington, Brian
Object Type: Thesis or Dissertation
System: The UNT Digital Library
Keywords in the mist: Automated keyword extraction for very large documents and back of the book indexing. (open access)

This research addresses the problem of automatic keyphrase extraction from large documents and back of the book indexing. The potential benefits of automating this process are far reaching, from improving information retrieval in digital libraries to saving countless man-hours by helping professional indexers create back of the book indexes. The dissertation introduces a new methodology to evaluate automated systems, which allows for a detailed, comparative analysis of several techniques for keyphrase extraction. We introduce and evaluate both supervised and unsupervised techniques, designed to balance the resource requirements of an automated system and the best achievable performance. Additionally, a number of novel features are proposed, including a statistical informativeness measure based on chi statistics; an encyclopedic feature that taps into the vast knowledge base of Wikipedia to establish the likelihood of a phrase referring to an informative concept; and a linguistic feature based on sophisticated semantic analysis of the text using current theories of discourse comprehension. The resulting keyphrase extraction system is shown to outperform the current state of the art in supervised keyphrase extraction by a large margin. Moreover, a fully automated back of the book indexing system based on the keyphrase extraction system was shown to lead to back …
Date: May 2008
Creator: Csomai, Andras
Object Type: Thesis or Dissertation
System: The UNT Digital Library
General Purpose Programming on Modern Graphics Hardware (open access)

I start with a brief introduction to the graphics processing unit (GPU) as well as general-purpose computation on modern graphics hardware (GPGPU). Next, I explore the motivations for GPGPU programming, and the capabilities of modern GPUs (including advantages and disadvantages). Also, I give the background required for further exploring GPU programming, including the terminology used and the resources available. Finally, I include a comprehensive survey of previous and current GPGPU work, and end with a look at the future of GPU programming.
Date: May 2008
Creator: Fleming, Robert
Object Type: Thesis or Dissertation
System: The UNT Digital Library
Exploring Trusted Platform Module Capabilities: A Theoretical and Experimental Study (open access)

Trusted platform modules (TPMs) are hardware modules that are bound to a computer's motherboard and are being included in many desktops and laptops. Augmenting computers with these hardware modules adds powerful functionality in distributed settings, allowing us to reason about the security of these systems in new ways. In this dissertation, I study the functionality of TPMs from a theoretical as well as an experimental perspective. On the theoretical front, I leverage various features of TPMs to construct applications like random oracles that are impossible to implement in a standard model of computation. Apart from random oracles, I construct a new cryptographic primitive which is basically a non-interactive form of the standard cryptographic primitive of oblivious transfer. I apply this new primitive to secure mobile agent computations, where interaction between various entities is typically required to ensure security. I prove these constructions are secure using standard cryptographic techniques and assumptions. To test the practicability of these constructions and their applications, I performed an experimental study, both on an actual TPM and on a software TPM simulator which has been enhanced to make it reflect timings from a real TPM. This allowed me to benchmark the performance of the applications and test …
Date: May 2008
Creator: Gunupudi, Vandana
Object Type: Thesis or Dissertation
System: The UNT Digital Library
Models to Combat Email Spam Botnets and Unwanted Phone Calls (open access)

With the amount of email spam received these days, it is hard to imagine that spammers act individually. Nowadays, most spam emails are sent from a collection of compromised machines controlled by some spammers. These compromised computers are often called bots, and spammers can use them to send massive volumes of spam within a short period of time. The motivation of this work is to understand and analyze the behavior of spammers through a large collection of spam mails. My research examined a data set collected over a 2.5-year period and developed an algorithm which would give the botnet features and then classify them into various groups. Principal component analysis was used to study the association patterns of groups of spammers and the individual behavior of a spammer in a given domain. This is based on the features which capture maximum variance of information we have clustered. Presence information is a growing tool towards more efficient communication and providing new services and features within a business setting and much more. The main contribution in my thesis is to propose the willingness estimator that can estimate the callee's willingness without his/her involvement; the model estimates willingness level based …
Date: May 2008
Creator: Husna, Husain
Object Type: Thesis or Dissertation
System: The UNT Digital Library
General Nathan Twining and the Fifteenth Air Force in World War II (open access)

General Nathan F. Twining distinguished himself in leading the American Fifteenth Air Force during the last full year of World War II in the European Theatre. Drawing on the leadership qualities he had already shown in combat in the Pacific Theatre, he was the only USAAF leader who commanded three separate air forces during World War II. His command of the Fifteenth Air Force gave him his biggest, longest lasting, and most challenging experience of the war, which would be the foundation for the reputation that eventually would win him appointment to the nation's highest military post as Chairman of the Joint Chiefs of Staff during the Cold War.
Date: May 2008
Creator: Hutchins, Brian
Object Type: Thesis or Dissertation
System: The UNT Digital Library
A CAM-Based, High-Performance Classifier-Scheduler for a Video Network Processor. (open access)

Classification and scheduling are key functionalities of a network processor. Network processors are equipped with application specific integrated circuits (ASIC), so that as IP (Internet Protocol) packets arrive, they can be processed directly without using the central processing unit. A new network processor, called the video network processor (VNP), is proposed for real time broadcasting of video streams for IP television (IPTV). This thesis explores the challenge in designing a combined classification and scheduling module for a VNP. I propose and design the classifier-scheduler module which will classify and schedule data for the VNP. The proposed module discriminates between IP packets and video packets. The video packets are further processed for digital rights management (DRM). IP packets which carry regular traffic will traverse without any modification. A basic architecture of the VNP and an architecture of the classifier-scheduler module based on content addressable memory (CAM) and random access memory (RAM) have been proposed. The module has been designed and simulated in Xilinx 9.1i and built in the ISE simulator, with a throughput of 1.79 Mbps and a maximum working frequency of 111.89 MHz at a power dissipation of 33.6 mW. The code has been translated and mapped for the Spartan and Virtex families of devices.
Date: May 2008
Creator: Tarigopula, Srivamsi
Object Type: Thesis or Dissertation
System: The UNT Digital Library
Non-Uniform Grid-Based Coordinated Routing in Wireless Sensor Networks (open access)

Wireless sensor networks are ad hoc networks of tiny battery powered sensor nodes that can organize themselves to form self-organized networks and collect information regarding temperature, light, and pressure in an area. Though the applications of sensor networks are very promising, sensor nodes are limited in their capability due to many factors. The main limitation of these battery powered nodes is energy. Sensor networks are expected to work for long periods of time once deployed, and it becomes important to conserve the battery life of the nodes to extend network lifetime. This work examines a non-uniform grid-based routing protocol as an effort to minimize energy consumption in the network and extend network lifetime. The entire test area is divided into non-uniformly shaped grids. Fixed source and sink nodes with unlimited energy are placed in the network. Sensor nodes with full battery life are deployed uniformly and randomly in the field. The source node floods the network with only the coordinator node active in each grid and the other nodes sleeping. The sink node traces the same route back to the source node through the same coordinators. This process continues till a coordinator node runs out of energy, when new coordinator nodes …
Date: August 2008
Creator: Kadiyala, Priyanka
Object Type: Thesis or Dissertation
System: The UNT Digital Library
Region aware DCT domain invisible robust blind watermarking for color images. (open access)

The multimedia revolution has made a strong impact on our society. The explosive growth of the Internet and the access to this digital information generate new opportunities and challenges. The ease of editing and duplication in the digital domain has created the concern of copyright protection for content providers. Various schemes to embed secondary data in digital media are investigated to preserve copyright and to discourage unauthorized duplication, and digital watermarking is a viable solution. This thesis proposes a novel invisible watermarking scheme: a discrete cosine transform (DCT) domain based watermark embedding and blind extraction algorithm for copyright protection of color images. Testing of the proposed watermarking scheme's robustness and security via different benchmarks proves its resilience to digital attacks. The detector's response, PSNR and RMSE results show that our algorithm has a better security performance than most of the existing algorithms.
Date: December 2008
Creator: Naraharisetti, Sahasan
Object Type: Thesis or Dissertation
System: The UNT Digital Library
Graph-based Centrality Algorithms for Unsupervised Word Sense Disambiguation (open access)

This thesis introduces an innovative methodology of combining some traditional dictionary-based approaches to word sense disambiguation (semantic similarity measures and overlap of word glosses, both based on WordNet) with some graph-based centrality methods, namely the degree of the vertices, PageRank, closeness, and betweenness. The approach is completely unsupervised, and is based on creating graphs for the words to be disambiguated. We experiment with several possible combinations of the semantic similarity measures as the first stage in our experiments. The next stage attempts to score individual vertices in the graphs previously created based on several graph connectivity measures. During the final stage, several voting schemes are applied to the results obtained from the different centrality algorithms. The most important contributions of this work are not only that it is a novel approach and it works well, but also that it has great potential in overcoming the new-knowledge-acquisition bottleneck which has apparently brought research in supervised WSD as an explicit application to a plateau. The type of research reported in this thesis, which does not require manually annotated data, holds promise of a lot of new and interesting things, and our work is one of the first steps, despite being a …
Date: December 2008
Creator: Sinha, Ravi Som
Object Type: Thesis or Dissertation
System: The UNT Digital Library
Direct Online/Offline Digital Signature Schemes. (open access)

Online/offline signature schemes are useful in many situations, and two such scenarios are considered in this dissertation: bursty server authentication and embedded device authentication. In this dissertation, new techniques for online/offline signing are introduced and applied in a variety of ways to create online/offline signature schemes, and five different online/offline signature schemes that are proved secure under a variety of models and assumptions are proposed. Two of the proposed five schemes have the best offline or best online performance of any currently known technique, and are particularly well-suited for the scenarios that are considered in this dissertation. To determine if the proposed schemes provide the expected practical improvements, a series of experiments was conducted comparing the proposed schemes with each other and with other state-of-the-art schemes in this area, both on a desktop class computer and under AVR Studio, a simulation platform for an 8-bit processor that is popular for embedded systems. Under AVR Studio, the proposed SGE scheme, using a typical key size for the embedded device authentication scenario, can complete the offline phase in about 24 seconds and then produce a signature (the online phase) in 15 milliseconds, which is the best offline performance of any known …
Date: December 2008
Creator: Yu, Ping
Object Type: Thesis or Dissertation
System: The UNT Digital Library
Variability-aware low-power techniques for nanoscale mixed-signal circuits. (open access)

New circuit design techniques that accommodate lower supply voltages necessary for portable systems need to be integrated into the semiconductor intellectual property (IP) core. Systems that once worked at 3.3 V or 2.5 V now need to work at 1.8 V or lower, without causing any performance degradation. Also, the fluctuation of device characteristics caused by process variation in nanometer technologies is seen as design yield loss. The numerous parasitic effects induced by layouts, especially for high-performance and high-speed circuits, pose a problem for IC design. Lack of exact layout information during circuit sizing leads to long design iterations involving time-consuming runs of complex tools. There is a strong need for low-power, high-performance, parasitic-aware and process-variation-tolerant circuit design. This dissertation proposes methodologies and techniques to achieve variability, power, performance, and parasitic-aware circuit designs. Three approaches are proposed: the single iteration automatic approach, the hybrid Monte Carlo and design of experiments (DOE) approach, and the corner-based approach. Widely used mixed-signal circuits such as analog-to-digital converter (ADC), voltage controlled oscillator (VCO), voltage level converter and active pixel sensor (APS) have been designed at nanoscale complementary metal oxide semiconductor (CMOS) and subjected to the proposed methodologies. The effectiveness of the proposed methodologies has …
Date: May 2009
Creator: Ghai, Dhruva V.
Object Type: Thesis or Dissertation
System: The UNT Digital Library
Development, Implementation, and Analysis of a Contact Model for an Infectious Disease (open access)

With growing concern about infectious diseases spreading in a population, epidemiology is becoming more important for the future of public health. In the past, epidemiologists used existing data from an outbreak to help them determine how an infectious disease might spread in the future. Now, with computational models, they are able to analyze data produced by these models to help with prevention and intervention plans. This paper looks at the design, implementation, and analysis of a computational model based on the interactions between individuals in the population. The design of the working contact model looks closely at the SEIR model used as the foundation and the two timelines of a disease. The implementation of the contact model is reviewed while looking closely at data structures. The analysis of the experiments provides evidence that this contact model can be used to help epidemiologists study the spread of an infectious disease based on the contact rate of individuals.
Date: May 2009
Creator: Thompson, Brett Morinaga
Object Type: Thesis or Dissertation
System: The UNT Digital Library
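
The abstract above names the SEIR model as the foundation of the contact model. For readers unfamiliar with it, the following is a minimal sketch of a generic, deterministic, discrete-time SEIR update in Python. It is not the thesis's contact model: the function names, the parameters (`beta`, `sigma`, `gamma`) and every numeric value are illustrative assumptions.

```python
def seir_step(s, e, i, r, beta, sigma, gamma):
    """Advance a closed-population SEIR model by one time step.

    beta, sigma and gamma are assumed per-step rates (hypothetical values):
    contact/transmission rate, rate of leaving the latent stage, recovery rate.
    """
    n = s + e + i + r
    new_exposed = beta * s * i / n   # S -> E through contact with infectious individuals
    new_infectious = sigma * e       # E -> I once the latent period ends
    new_recovered = gamma * i        # I -> R once the infectious period ends
    return (s - new_exposed,
            e + new_exposed - new_infectious,
            i + new_infectious - new_recovered,
            r + new_recovered)


def simulate(steps, s0=990.0, e0=0.0, i0=10.0, r0=0.0,
             beta=0.5, sigma=0.2, gamma=0.1):
    """Return the list of (S, E, I, R) states, including the initial state."""
    state = (s0, e0, i0, r0)
    history = [state]
    for _ in range(steps):
        state = seir_step(*state, beta, sigma, gamma)
        history.append(state)
    return history


if __name__ == "__main__":
    for step, (s, e, i, r) in enumerate(simulate(100)):
        if step % 20 == 0:
            print(f"step {step:3d}: S={s:7.1f} E={e:6.1f} I={i:6.1f} R={r:7.1f}")
```

A contact model as described in the abstract would replace the aggregate `beta * s * i / n` term with explicit interactions between individuals; the compartment transitions themselves would keep the same S, E, I, R ordering.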
Social Network Simulation and Mining Social Media to Advance Epidemiology (open access)

Traditional public health decision support can benefit from the Web and social media revolution. This dissertation presents approaches to mining social media that benefit public health epidemiology. Through discovery and analysis of trends in influenza-related blogs, a correlation to Centers for Disease Control and Prevention (CDC) influenza-like-illness patient reporting at sentinel health-care providers is verified. A second approach considers personal beliefs of vaccination in social media. A vaccine for human papillomavirus (HPV) was approved by the Food and Drug Administration in May 2006. The virus is present in nearly all cervical cancers and implicated in many throat and oral cancers. Results from automatic sentiment classification of HPV vaccination beliefs are presented which will enable more accurate prediction of the vaccine's population-level impact. Two epidemic models are introduced that embody the intimate social networks related to HPV transmission. Ultimately, aggregating these methodologies with epidemic and social network modeling facilitates effective development of strategies for targeted interventions.
Date: August 2009
Creator: Corley, Courtney David
Object Type: Thesis or Dissertation
System: The UNT Digital Library
Computational Epidemiology - Analyzing Exposure Risk: A Deterministic, Agent-Based Approach (open access)

Many infectious diseases are spread through interactions between susceptible and infectious individuals. Keeping track of where each exposure to the disease took place, when it took place, and which individuals were involved in the exposure can give public health officials important information that they may use to formulate their interventions. Further, knowing which individuals in the population are at the highest risk of becoming infected with the disease may prove to be a useful tool for public health officials trying to curtail the spread of the disease. Epidemiological models are needed to allow epidemiologists to study the population dynamics of transmission of infectious agents and the potential impact of infectious disease control programs. While many agent-based computational epidemiological models exist in the literature, they focus on the spread of disease rather than exposure risk. These models are designed to simulate very large populations, representing individuals as agents, and using random experiments and probabilities in an attempt to more realistically guide the course of the modeled disease outbreak. The work presented in this thesis focuses on tracking exposure risk to chickenpox in an elementary school setting. This setting is chosen due to the high level of detailed information realistically available to …
Date: August 2009
Creator: O'Neill, Martin Joseph, II
Object Type: Thesis or Dissertation
System: The UNT Digital Library
FPGA Implementation of Low Density Party Check Codes Decoder (open access)

Reliable communication over noisy channels has become one of the major concerns in the field of digital wireless communications. Low density parity check (LDPC) codes have gained a lot of attention recently because of their excellent error-correcting capacity. They were first proposed by Robert G. Gallager in 1960. LDPC codes belong to the class of linear block codes. Near capacity performance is achievable on a large collection of data transmission and storage. In my thesis I have focused on hardware implementation of (3, 6)-regular LDPC codes. A fully parallel decoder requires too much hardware complexity. A partly parallel decoder has the advantage of an effective compromise between decoding throughput and high hardware complexity. The decoding of the codeword follows the belief propagation (also known as probability propagation) algorithm in the log domain. A 9216 bit, (3, 6)-regular LDPC code with code rate ½ was implemented on an FPGA, targeting the Xilinx Virtex 4 XC4VLX80 device with package FF1148. This decoder achieves a maximum throughput of 82 Mbps. The entire model was designed in VHDL in the Xilinx ISE 9.2 environment.
Date: August 2009
Creator: Vijayakumar, Suresh
Object Type: Thesis or Dissertation
System: The UNT Digital Library
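
The abstract above quotes a (3, 6)-regular code with rate ½ and a 9216-bit block length. The following Python sketch shows where that rate comes from and what a parity check is. It is illustrative only: `design_rate`, `syndrome` and the toy 3×6 matrix are hypothetical and are not the thesis's code or its decoder, and the toy matrix is not (3, 6)-regular.

```python
from fractions import Fraction


def design_rate(column_weight, row_weight):
    """Design rate of a regular LDPC code: R = 1 - (column weight / row weight)."""
    return 1 - Fraction(column_weight, row_weight)


def syndrome(parity_check_matrix, word):
    """Evaluate every parity check (row) on a binary word, modulo 2.

    An all-zero syndrome means the word satisfies every check.
    """
    return [sum(h * bit for h, bit in zip(row, word)) % 2
            for row in parity_check_matrix]


# Hypothetical toy parity-check matrix, chosen only to keep the example small.
TOY_H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

if __name__ == "__main__":
    rate = design_rate(3, 6)
    print("design rate:", rate)
    print("information bits in a 9216-bit block at that rate:", int(9216 * rate))
    print("valid word syndrome:   ", syndrome(TOY_H, [1, 1, 0, 0, 1, 1]))
    print("one bit flipped syndrome:", syndrome(TOY_H, [0, 1, 0, 0, 1, 1]))
```

Belief propagation, the decoding algorithm named in the abstract, iterates message passing between bits and checks until the syndrome becomes all zero or an iteration limit is reached; that part is not sketched here.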
Inferring Social and Internal Context Using a Mobile Phone (open access)

This dissertation is composed of research studies that contribute to three research areas including social context-aware computing, internal context-aware computing, and human behavioral data mining. In social context-aware computing, four studies are conducted. First, mobile phone user calling behavioral patterns are characterized in forms of randomness level where relationships among them are then identified. Next, a study is conducted to investigate the relationship between the calling behavior and organizational groups. Third, a method is presented to quantitatively define mobile social closeness and social groups, which are then used to identify social group sizes and scaling ratio. Last, based on the mobile social grouping framework, the significant role of social ties in communication patterns is revealed. In internal context-aware computing, two studies are conducted where the notions of internal context are intention and situation. For intentional context, the goal is to sense the intention of the user in placing calls. A model is thus presented for predicting future calls envisaged as a call predicted list (CPL), which makes use of call history to build a probabilistic model of calling behavior. As an incoming call predictor, CPL is a list of numbers/contacts that are the most likely to be the callers within …
Date: December 2009
Creator: Phithakkitnukoon, Santi
Object Type: Thesis or Dissertation
System: The UNT Digital Library
E‐Shape Analysis (open access)

The motivation of this work is to understand E-shape analysis and how it can be applied to various classification tasks. It has the powerful feature of looking not only at what information is contained but at how that information looks. This new technique gives E-shape analysis the ability to be language independent and, to some extent, size independent. In this thesis, I present a new mechanism to characterize an email without using content or context, called E-shape analysis for email. I explore the applications of the email shape by carrying out a case study, botnet detection, and two possible applications: spam filtering and social-context based fingerprinting. In the second part of this thesis, I apply E-shape analysis to activity recognition of humans. Using the Android platform and a T-Mobile G1 phone, I collect data from the triaxial accelerometer and use it to classify the motion behavior of a subject.
Date: December 2009
Creator: Sroufe, Paul
Object Type: Thesis or Dissertation
System: The UNT Digital Library
The Value of Everything: Ranking and Association with Encyclopedic Knowledge (open access)

This dissertation describes WikiRank, an unsupervised method of assigning relative values to elements of a broad coverage encyclopedic information source in order to identify those entries that may be relevant to a given piece of text. The valuation given to an entry is based not on textual similarity but instead on the links that associate entries, and an estimation of the expected frequency of visitation that would be given to each entry based on those associations in context. This estimation of relative frequency of visitation is embodied in modifications to the random walk interpretation of the PageRank algorithm. WikiRank is an effective algorithm to support natural language processing applications. It is shown to exceed the performance of previous machine learning algorithms for the task of automatic topic identification, providing results comparable to that of human annotators. Second, WikiRank is found useful for the task of recognizing text-based paraphrases on a semantic level, by comparing the distribution of attention generated by two pieces of text using the encyclopedic resource as a common reference. Finally, WikiRank is shown to have the ability to use its base of encyclopedic knowledge to recognize terms from different ontologies as describing the same thing, and thus …
Date: December 2009
Creator: Coursey, Kino High
Object Type: Thesis or Dissertation
System: The UNT Digital Library
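
The WikiRank abstract above builds on the random walk interpretation of PageRank. As background, here is a minimal sketch of the standard PageRank power iteration in Python. It does not include WikiRank's modifications, which the abstract does not detail. The function name, the damping value of 0.85 and the four-entry link graph are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Standard PageRank by power iteration.

    `links` maps each node to the list of nodes it links to; every target
    must also appear as a key. The returned rank of a node approximates the
    long-run visit frequency of a random walker who follows a link with
    probability `damping` and otherwise jumps to a uniformly random node.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node without outgoing links spreads its rank evenly.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical link graph between four encyclopedia entries.
EXAMPLE_LINKS = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

if __name__ == "__main__":
    ranks = pagerank(EXAMPLE_LINKS)
    for entry, score in sorted(ranks.items(), key=lambda item: -item[1]):
        print(f"{entry}: {score:.4f}")
```

In this toy graph, entry C receives links from every other entry and ends up with the highest score, while D, which nothing links to, keeps only the uniform jump share.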