
Privacy Preserving EEG-based Authentication Using Perceptual Hashing

The use of electroencephalogram (EEG), an electrophysiological monitoring method for recording brain activity, for authentication has attracted the interest of researchers for over a decade. In addition to exhibiting the qualities of biometric-based authentication, EEG signals are revocable, impossible to mimic, and resistant to coercion attacks. However, EEG signals carry a wealth of information about an individual and can reveal private information about the user. This raises significant privacy issues for EEG-based authentication systems, since they have access to raw EEG signals. This thesis proposes a privacy-preserving EEG-based authentication system that preserves the privacy of the user by not revealing the raw EEG signals while still allowing the system to authenticate the user accurately. To that end, perceptual hashing is utilized, and instead of raw EEG signals, their perceptually hashed values are used in the authentication process. In addition to describing the authentication process, algorithms to compute the perceptual hash are developed based on two feature extraction techniques. Experimental results show that an authentication system using perceptual hashing can achieve performance comparable to a system that has access to raw EEG signals if enough EEG channels are used in the process. This thesis also presents a security analysis to show that perceptual hashing …
Date: December 2016
Creator: Koppikar, Samir Dilip
System: The UNT Digital Library
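The abstract above does not spell out the hash construction, so the following is only a minimal sketch of the matching idea it describes: enrollment and authentication compare perceptually hashed EEG features with a Hamming-distance threshold rather than exchanging raw signals. The band-power hash, the epoch length, and the threshold value here are illustrative assumptions, not the thesis's actual algorithms.

```python
import numpy as np

def perceptual_hash(eeg_channel, n_bands=16):
    """Toy perceptual hash: split a single-channel EEG epoch into bands of
    spectral power and emit one bit per band (above/below the median)."""
    spectrum = np.abs(np.fft.rfft(eeg_channel)) ** 2
    bands = np.array_split(spectrum, n_bands)
    band_power = np.array([b.mean() for b in bands])
    return (band_power > np.median(band_power)).astype(np.uint8)

def authenticate(enrolled_hash, probe_signal, max_hamming=3):
    """Accept the probe if its hash is within a Hamming-distance threshold
    of the enrolled hash; raw EEG never needs to be revealed."""
    probe_hash = perceptual_hash(probe_signal)
    distance = int(np.sum(enrolled_hash != probe_hash))
    return distance <= max_hamming, distance

rng = np.random.default_rng(0)
enrolled_signal = rng.standard_normal(512)                 # stand-in for one EEG epoch
enrolled = perceptual_hash(enrolled_signal)
probe = enrolled_signal + 0.05 * rng.standard_normal(512)  # same user, slight noise
impostor = rng.standard_normal(512)                        # different user
print(authenticate(enrolled, probe))
print(authenticate(enrolled, impostor))
```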

Online Construction of Android Application Test Suites

Mobile applications play an important role in the dissemination of computing and information resources. They are often used in domains such as mobile banking, e-commerce, and health monitoring. Cost-effective testing techniques in these domains are critical. This dissertation contributes novel techniques for automatic construction of mobile application test suites. In particular, this work provides solutions that focus on the prohibitively large number of possible event sequences that must be sampled in GUI-based mobile applications. This work makes three major contributions: (1) an automated GUI testing tool, Autodroid, that implements a novel online approach to automatic construction of Android application test suites; (2) probabilistic and combinatorial-based algorithms that systematically sample the input space of Android applications to generate test suites with GUI/context events; and (3) empirical studies to evaluate the cost-effectiveness of our techniques on real-world Android applications. Our experiments show that our techniques achieve better code coverage and event coverage compared to random test generation. We demonstrate that our techniques are useful for automatic construction of Android application test suites in the absence of source code and preexisting abstract models of an Application Under Test (AUT). The insights derived from our empirical studies provide guidance to researchers and practitioners involved …
Date: December 2017
Creator: Adamo, David T., Jr.
System: The UNT Digital Library
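As a rough illustration of the probabilistic sampling idea mentioned above, the sketch below draws GUI/context event sequences while biasing selection toward events executed least often so far. The event names and the weighting rule are hypothetical; they are not Autodroid's actual event model or algorithm.

```python
import random

# Hypothetical GUI/context events an Android AUT might expose.
EVENTS = ["tap_login", "tap_menu", "scroll_list", "rotate_screen",
          "toggle_wifi", "enter_text", "press_back"]

def sample_test_case(execution_counts, length=6, rng=random.Random(42)):
    """Draw one event sequence, biasing selection toward events that have
    been executed least often so far (a simple frequency-based heuristic)."""
    sequence = []
    for _ in range(length):
        # Lower execution count -> higher probability of being chosen.
        weights = [1.0 / (1 + execution_counts[e]) for e in EVENTS]
        event = rng.choices(EVENTS, weights=weights)[0]
        execution_counts[event] += 1
        sequence.append(event)
    return sequence

counts = {e: 0 for e in EVENTS}
test_suite = [sample_test_case(counts) for _ in range(3)]
for i, tc in enumerate(test_suite, 1):
    print(f"test case {i}: {tc}")
```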

Modeling and Simulation of the Vector-Borne Dengue Disease and the Effects of Regional Variation of Temperature in the Disease Prevalence in Homogenous and Heterogeneous Human Populations

The history of mitigation programs to contain vector-borne diseases is a story of successes and failures. Due to the complex interplay among the multiple factors that determine disease dynamics, the general principles for timely and specific intervention for incidence reduction or eradication of life-threatening diseases have yet to be determined. This research discusses computational methods developed to assist in the understanding of complex relationships affecting vector-borne disease dynamics. A computational framework to assist public health practitioners with exploring the dynamics of vector-borne diseases, such as malaria and dengue, in homogenous and heterogeneous populations has been conceived, designed, and implemented. The framework integrates a stochastic computational model of interactions to simulate horizontal disease transmission. The intent of the computational modeling has been to integrate stochasticity during simulation of the disease progression while reducing the number of interactions necessary to simulate a disease outbreak. While reducing the number of interactions needed to simulate disease dynamics improves computational time, the realization of interactions can remain computationally expensive. Multi-threading technology was therefore used to improve performance over the original computational model, and multi-threading experimental results have been tested and reported. In addition to the contact model, the modeling of biological processes specific to …
Date: August 2016
Creator: Bravo-Salgado, Angel D
System: The UNT Digital Library
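The stochastic, interaction-reducing transmission model is not given in the abstract, but a minimal discrete-time host-vector simulation conveys the flavor of simulating horizontal transmission with random draws. All parameter values below are placeholders, not calibrated dengue or malaria parameters.

```python
import random

def simulate_outbreak(days=120, humans=1000, mosquitoes=5000,
                      biting_rate=0.3, p_h2m=0.3, p_m2h=0.3,
                      human_recovery=0.1, seed=1):
    """Discrete-time stochastic host<->vector transmission: each day, new
    infections and recoveries are drawn from per-individual Bernoulli trials."""
    rng = random.Random(seed)
    s_h, i_h, r_h = humans - 5, 5, 0      # susceptible/infected/recovered humans
    s_m, i_m = mosquitoes, 0              # susceptible/infected mosquitoes
    history = []
    for _ in range(days):
        bites_per_human = biting_rate * mosquitoes / humans
        # Daily infection probabilities for a susceptible human and mosquito.
        lam_h = 1 - (1 - p_m2h) ** (bites_per_human * i_m / mosquitoes)
        lam_m = 1 - (1 - p_h2m) ** (biting_rate * i_h / humans)
        new_ih = sum(rng.random() < lam_h for _ in range(s_h))
        new_im = sum(rng.random() < lam_m for _ in range(s_m))
        recov = sum(rng.random() < human_recovery for _ in range(i_h))
        s_h, i_h, r_h = s_h - new_ih, i_h + new_ih - recov, r_h + recov
        s_m, i_m = s_m - new_im, i_m + new_im
        history.append(i_h)
    return history

prevalence = simulate_outbreak()
print("peak infected humans:", max(prevalence), "on day", prevalence.index(max(prevalence)))
```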

Simulink Based Modeling of a Multi Global Navigation Satellite System

The objective of this thesis is to design a model for a multi global navigation satellite system (GNSS) using Simulink. It describes a design procedure that includes transmitter and receiver models for two different navigation systems. Because the performance of any positioning system degrades significantly when too few satellites are visible for determining a location, this research makes use of multi-GNSS satellite signals in a single navigation receiver.
Date: December 2016
Creator: Mukka, Nagaraju
System: The UNT Digital Library

Multiomics Data Integration and Multiplex Graph Neural Network Approaches

With increasing data and technology, multiple types of data are generated from the same set of nodes. Since each data modality captures a unique aspect of the underlying mechanisms, multiple datatypes are integrated. In addition to multiple datatypes, networks are important for storing information representing associations between entities, such as genes in a protein-protein interaction network or authors in a citation network. Recently, Graph Neural Networks (GNNs), advanced approaches to graph-structured data that leverage node associations and features simultaneously, have emerged, but they have limitations for integrative approaches. The overall aim of this dissertation is to integrate multiple data modalities on graph-structured data to infer context-specific gene regulation and predict outcomes of interest. To this end, first, we introduce a computational tool named CRINET to infer genome-wide competing endogenous RNA (ceRNA) networks. By properly integrating multiple data types, we obtain a better understanding of gene regulatory circuitry, addressing important drawbacks pertaining to ceRNA regulation. We tested CRINET on breast cancer data and found that ceRNA interactions and groups were significantly enriched in cancer-related genes and processes. CRINET-inferred ceRNA groups supported studies claiming a relation between immunotherapy and cancer. Second, we present SUPREME, a node classification framework, by comprehensively …
Date: May 2023
Creator: Kesimoglu, Ziynet Nesibe
System: The UNT Digital Library

Understanding and Reasoning with Negation

In this dissertation, I start with an analysis of negation in eleven benchmark corpora covering six Natural Language Understanding (NLU) tasks. With a thorough investigation, I first show that (a) these benchmarks contain fewer negations compared to general-purpose English and (b) the few negations they contain are often unimportant. Further, my empirical studies demonstrate that state-of-the-art transformers trained using these corpora obtain substantially worse results with the instances that contain negation, especially if the negations are important. Second, I investigate whether translating negation is also an issue for modern machine translation (MT) systems. My studies find that indeed the presence of negation can significantly impact translation quality, in some cases resulting in reductions of over 60%. In light of these findings, I investigate strategies to better understand the semantics of negation. I start with identifying the focus of negation. I develop a neural model that takes into account the scope of negation, context from neighboring sentences, or both. My best proposed system obtains an accuracy improvement of 7.4% over prior work. Further, I analyze the main error categories of the systems through a detailed error analysis. Next, I explore more practical ways to understand the semantics of negation. I consider …
Date: December 2022
Creator: Hossain, Md Mosharaf
System: The UNT Digital Library
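A small sketch of the kind of corpus measurement described in the first finding (counting how many benchmark instances contain a negation cue) is shown below. The cue list and the toy "benchmark" are illustrative only; the dissertation's corpora and cue inventory are far more extensive.

```python
import re

# A small, illustrative list of English negation cues.
NEGATION_CUES = {"not", "n't", "no", "never", "nothing", "nobody", "none",
                 "neither", "nor", "without"}

def contains_negation(sentence):
    tokens = re.findall(r"[a-z']+", sentence.lower())
    # Split contracted "n't" into its own cue (e.g., "doesn't" -> does + n't).
    expanded = []
    for t in tokens:
        if t.endswith("n't"):
            expanded.extend([t[:-3], "n't"])
        else:
            expanded.append(t)
    return any(t in NEGATION_CUES for t in expanded)

benchmark = [
    "The movie was surprisingly good.",
    "I don't think the plot made any sense.",
    "Nobody expected that ending.",
    "A charming, well-acted film.",
]
rate = sum(contains_negation(s) for s in benchmark) / len(benchmark)
print(f"instances containing negation: {rate:.0%}")
```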

A Dual Dielectric Approach for Performance Aware Reduction of Gate Leakage in Combinational Circuits

Design of systems in the low-end nanometer domain has introduced new dimensions in power consumption and dissipation in CMOS devices. With continued and aggressive scaling, using low-thickness SiO2 for the transistor gates, gate leakage due to gate oxide direct tunneling current has emerged as the major component of leakage in CMOS circuits. Therefore, providing a solution to the issue of gate oxide leakage has become one of the key concerns in achieving low-power and high-performance CMOS VLSI circuits. In this thesis, a new approach is proposed involving dual dielectrics of dual thicknesses (DKDT) for reducing both ON and OFF state gate leakage. It is claimed that the simultaneous utilization of SiON and SiO2, each with multiple thicknesses, is a better approach for gate leakage reduction than the conventional usage of a single gate dielectric (SiO2), possibly with multiple thicknesses. An algorithm is developed for DKDT assignment that minimizes the overall leakage for a circuit without compromising performance. Extensive experiments were carried out on ISCAS'85 benchmarks using 45nm technology, which showed that the proposed approach can reduce the leakage by as much as 98% (89.5% on average) without degrading performance.
Date: May 2006
Creator: Mukherjee, Valmiki
System: The UNT Digital Library
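The DKDT assignment algorithm itself is not reproduced in this abstract; the sketch below shows one plausible greedy reading of the idea, assigning the thicker SiON option to gates whose timing slack can absorb an assumed delay penalty. Leakage numbers, slacks, and the delay penalty are invented for illustration.

```python
# Hypothetical per-gate data: (name, leakage_with_SiO2, leakage_with_SiON, slack_ns).
# Switching a gate to the thicker SiON option saves leakage but adds delay.
GATES = [
    ("g1", 12.0, 2.5, 0.40),
    ("g2", 10.0, 2.0, 0.05),
    ("g3",  8.0, 1.5, 0.30),
    ("g4", 15.0, 3.0, 0.00),
]
EXTRA_DELAY_NS = 0.10  # assumed delay penalty for the thicker dielectric

def assign_dielectrics(gates, extra_delay=EXTRA_DELAY_NS):
    """Greedy sketch: convert gates in order of leakage savings, but only
    where the available timing slack absorbs the added delay."""
    assignment, total_leakage = {}, 0.0
    for name, leak_sio2, leak_sion, slack in sorted(
            gates, key=lambda g: g[1] - g[2], reverse=True):
        if slack >= extra_delay:
            assignment[name] = "SiON (thick)"
            total_leakage += leak_sion
        else:
            assignment[name] = "SiO2 (thin)"
            total_leakage += leak_sio2
    return assignment, total_leakage

assignment, leakage = assign_dielectrics(GATES)
print(assignment)
print("total gate leakage (arbitrary units):", leakage)
```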

Capacity and Throughput Optimization in Multi-cell 3G WCDMA Networks

User modeling enables the computation of the traffic density in a cellular network, which can be used to optimize the placement of base stations and radio network controllers as well as to analyze the performance of resource management algorithms toward meeting the final goal: the calculation and maximization of network capacity and throughput for different data rate services. An analytical model is presented for approximating the user distributions in multi-cell third generation wideband code division multiple access (WCDMA) networks using 2-dimensional Gaussian distributions by determining the means and the standard deviations of the distributions for every cell. This model allows for the calculation of the inter-cell interference and the reverse-link capacity of the network. An analytical model for optimizing capacity in multi-cell WCDMA networks is presented. Capacity is optimized for different spreading factors and for perfect and imperfect power control. Numerical results show that the SIR threshold for the received signals is decreased by 0.5 to 1.5 dB due to the imperfect power control. The results also show that the determined parameters of the 2-dimensional Gaussian model match well with traditional methods for modeling user distribution. A call admission control algorithm is designed that maximizes the throughput in multi-cell …
Date: December 2005
Creator: Nguyen, Son
System: The UNT Digital Library
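A hedged numerical sketch of the 2-dimensional Gaussian user model follows: users of each cell are sampled from a per-cell Gaussian, and reverse-link inter-cell interference at a base station is estimated from a simple path-loss law. The cell geometry, spread, and path-loss exponent are illustrative assumptions, not the dissertation's analytical model.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two cells: base-station position, and the mean/std of the 2-D Gaussian
# approximating its user distribution (all values are illustrative).
cells = [
    {"bs": np.array([0.0, 0.0]), "mean": np.array([0.1, 0.0]), "std": 0.4},
    {"bs": np.array([2.0, 0.0]), "mean": np.array([1.9, 0.1]), "std": 0.4},
]
USERS_PER_CELL, PATH_LOSS_EXP = 200, 4.0

def received_power(user_xy, bs_xy, tx_power=1.0):
    d = max(np.linalg.norm(user_xy - bs_xy), 0.05)   # avoid the singularity at d=0
    return tx_power * d ** (-PATH_LOSS_EXP)

# Reverse-link interference at cell 0 caused by users served by cell 1,
# compared with the power received from cell 0's own users.
other_users = rng.normal(cells[1]["mean"], cells[1]["std"], size=(USERS_PER_CELL, 2))
own_users = rng.normal(cells[0]["mean"], cells[0]["std"], size=(USERS_PER_CELL, 2))
inter_cell = sum(received_power(u, cells[0]["bs"]) for u in other_users)
in_cell = sum(received_power(u, cells[0]["bs"]) for u in own_users)
print(f"other-to-own interference ratio at cell 0: {inter_cell / in_cell:.3f}")
```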

New Computational Methods for Literature-Based Discovery

In this work, we leverage recent developments in computer science to address several of the challenges in current literature-based discovery (LBD) solutions. First, existing LBD solutions either cannot use semantics or are too computationally complex. To solve these problems, we propose a generative model, OverlapLDA, based on topic modeling, which has been shown to be both effective and efficient in extracting semantics from a corpus. We also introduce an inference method for OverlapLDA. We conduct extensive experiments to show the effectiveness and efficiency of OverlapLDA in LBD. Second, we expand LBD to a more complex and realistic setting, in which there can be more than one concept connecting the input concepts, and the connectivity pattern between concepts can be more complex than a chain. Current LBD solutions can hardly complete the LBD task in this new setting. We simplify the hypotheses as concept sets and propose LBDSetNet, based on graph neural networks, to solve this problem. We also introduce different training schemes based on self-supervised learning to train LBDSetNet without relying on comprehensively labeled hypotheses, which are extremely costly to obtain. Our comprehensive experiments show that LBDSetNet outperforms strong baselines on simple hypotheses and addresses complex hypotheses.
Date: May 2022
Creator: Ding, Juncheng
System: The UNT Digital Library
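OverlapLDA is not publicly specified in this abstract, so the sketch below uses an off-the-shelf LDA implementation on a toy corpus to show how topic modeling can surface intermediate terms linking two input concepts (in the spirit of the classic fish-oil/Raynaud discovery). It is an analogy for the approach, not the proposed model.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy "literature": documents mentioning concept A (fish oil), intermediate
# B-terms, and concept C (Raynaud disease).
docs = [
    "fish oil reduces blood viscosity and platelet aggregation",
    "blood viscosity and vascular reactivity are elevated in raynaud disease",
    "platelet aggregation contributes to vascular disorders",
    "dietary fish oil alters lipid metabolism",
    "raynaud patients show abnormal vascular reactivity",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = np.array(vec.get_feature_names_out())
for k, topic in enumerate(lda.components_):
    top = terms[np.argsort(topic)[::-1][:5]]
    print(f"topic {k}: {', '.join(top)}")
# Terms ranking highly in topics shared by both "fish oil" and "raynaud"
# documents (e.g., viscosity, platelet) are candidate linking concepts.
```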

Modeling Epidemics on Structured Populations: Effects of Socio-demographic Characteristics and Immune Response Quality

Epidemiologists study the distribution and determinants of health-related states or events in human populations and apply that knowledge to prevent and control problems and contingencies associated with the health of the population. Due to the spread of new pathogens and the emergence of new bio-terrorism threats, it has become imperative to develop new techniques and expand existing ones to equip public health providers with robust tools to predict and control health-related crises. In this dissertation, I explore the effects on disease dynamics caused by differences in individuals' physiology and social/behavioral characteristics. Multiple computational and mathematical models were developed to quantify the effect of those factors on spatial and temporal variations of disease epidemics. I developed statistical methods to measure the effects on outbreak dynamics of incorporating heterogeneous demographics and social interactions into the individuals of the population. Specifically, I studied the relationship between demographics and the physiological characteristics of an individual when preparing for an infectious disease epidemic.
Date: August 2014
Creator: Reyes Silveyra, Jorge A.
System: The UNT Digital Library

Paradigm Shift from Vague Legal Contracts to Blockchain-Based Smart Contracts

In this dissertation, we address the problem of vagueness in traditional legal contracts by presenting novel methodologies that aid in the paradigm shift from traditional legal contracts to smart contracts. We discuss key enabling technologies that assist in converting a traditional natural-language legal contract, which is full of vague words, phrases, and sentences, into a precise blockchain-based smart contract, including metrics evaluation during our conversion experiment. To address the challenge of this contract-transformation process, we propose four novel proof-of-concept approaches that take vagueness and different possible interpretations into significant consideration, and we experiment with popular vendors' existing vague legal contracts. We show through experiments that our proposed methodologies are able to study the degree of vagueness in every interpretation and demonstrate which vendor's translated smart contract is more accurate, better optimized, and less vague. We also incorporate fuzzy logic inside the blockchain-based smart contract to successfully model the semantics of linguistic expressions. Our experiments and results show that a smart contract with higher degrees of truth can be technically more complex but also more accurate. By using fuzzy logic inside a smart contract, it becomes easier to solve the …
Date: July 2023
Creator: Upadhyay, Kritagya Raj
System: The UNT Digital Library
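To make the fuzzy-logic point concrete, here is a minimal sketch in which a vague contractual phrase ("within a reasonable time") is modeled as a membership function and a penalty scales with its degree of truth. The membership shape, breakpoints, and penalty rule are invented for illustration.

```python
def reasonable_time_membership(days):
    """Degree of truth (0..1) that delivery after `days` is 'within a
    reasonable time'; a simple piecewise-linear membership function."""
    if days <= 7:
        return 1.0
    if days >= 30:
        return 0.0
    return (30 - days) / 23.0

def penalty_clause(days, max_penalty=100.0):
    """A smart-contract-style rule: the less 'reasonable' the delay, the
    larger the payout, instead of an all-or-nothing breach decision."""
    return round((1.0 - reasonable_time_membership(days)) * max_penalty, 2)

for d in (5, 14, 25, 40):
    print(f"{d:>2} days: truth={reasonable_time_membership(d):.2f}, penalty={penalty_clause(d)}")
```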

The Value of Everything: Ranking and Association with Encyclopedic Knowledge

This dissertation describes WikiRank, an unsupervised method of assigning relative values to elements of a broad coverage encyclopedic information source in order to identify those entries that may be relevant to a given piece of text. The valuation given to an entry is based not on textual similarity but instead on the links that associate entries, and an estimation of the expected frequency of visitation that would be given to each entry based on those associations in context. This estimation of relative frequency of visitation is embodied in modifications to the random walk interpretation of the PageRank algorithm. WikiRank is an effective algorithm to support natural language processing applications. It is shown to exceed the performance of previous machine learning algorithms for the task of automatic topic identification, providing results comparable to that of human annotators. Second, WikiRank is found useful for the task of recognizing text-based paraphrases on a semantic level, by comparing the distribution of attention generated by two pieces of text using the encyclopedic resource as a common reference. Finally, WikiRank is shown to have the ability to use its base of encyclopedic knowledge to recognize terms from different ontologies as describing the same thing, and thus …
Date: December 2009
Creator: Coursey, Kino High
System: The UNT Digital Library
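A minimal sketch of the modified random-walk idea follows: ordinary power-iteration PageRank over a toy link graph, with the teleport distribution biased by context so that contextually likely entries accumulate more expected visitation. The graph and bias values are made up; this is not the WikiRank implementation.

```python
import numpy as np

# Toy link graph over encyclopedia entries (row i -> columns it links to).
entries = ["Jaguar (animal)", "Jaguar (car)", "Rainforest", "Automobile"]
links = np.array([
    [0, 0, 1, 0],   # animal article links to Rainforest
    [0, 0, 0, 1],   # car article links to Automobile
    [1, 0, 0, 0],
    [0, 1, 0, 0],
], dtype=float)

def biased_pagerank(adj, teleport, damping=0.85, iters=100):
    """Power iteration where the random surfer teleports according to a
    context-dependent distribution rather than uniformly."""
    outdeg = adj.sum(axis=1, keepdims=True)
    transition = np.divide(adj, outdeg, out=np.zeros_like(adj), where=outdeg > 0)
    rank = np.full(len(adj), 1.0 / len(adj))
    for _ in range(iters):
        rank = damping * rank @ transition + (1 - damping) * teleport
    return rank

# Suppose the text mentions "engine" and "speed": bias teleportation toward
# car-related entries and rank all entries by expected visitation.
context_bias = np.array([0.05, 0.60, 0.05, 0.30])
for name, score in zip(entries, biased_pagerank(links, context_bias)):
    print(f"{score:.3f}  {name}")
```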

Optimization of Massive MIMO Systems for 5G Networks

In the first part of the dissertation, we provide an extensive overview of sub-6 GHz wireless access technology known as massive multiple-input multiple-output (MIMO) systems, highlighting its benefits, deployment challenges, and the key enabling technologies envisaged for 5G networks. We investigate the fundamental issues that degrade the performance of massive MIMO systems, such as pilot contamination, precoding, user scheduling, and signal detection. In the second part, we optimize the performance of the massive MIMO system by proposing several algorithms, system designs, and hardware architectures. To mitigate the effect of pilot contamination, we propose a pilot reuse factor scheme based on the user environment and the number of active users. The results through simulations show that the proposed scheme ensures the system always operates at maximal spectral efficiency and achieves higher throughput. To address the user scheduling problem, we propose two user scheduling algorithms based upon the measured channel gain. The simulation results show that our proposed user scheduling algorithms achieve better error performance, improve sum capacity and throughput, and guarantee fairness among the users. To address the uplink signal detection challenge in massive MIMO systems, we propose four algorithms and their system designs. We show through simulations that the …
Date: August 2020
Creator: Chataut, Robin
System: The UNT Digital Library
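As a toy illustration of channel-gain-based user scheduling, the sketch below serves, in each slot, the users with the largest measured gains and reports a crude sum rate. The fading model and rate formula are simplifications; the dissertation's two algorithms additionally address fairness, which a purely greedy rule does not.

```python
import numpy as np

rng = np.random.default_rng(7)
N_USERS, N_SCHEDULED, N_SLOTS = 20, 8, 3

for slot in range(N_SLOTS):
    # Measured channel gain per user (Rayleigh-fading magnitude, illustrative).
    gains = np.abs(rng.standard_normal(N_USERS) + 1j * rng.standard_normal(N_USERS))
    # Greedy gain-based scheduling: serve the N_SCHEDULED strongest users.
    chosen = np.argsort(gains)[::-1][:N_SCHEDULED]
    # Crude per-user spectral efficiency under equal power allocation.
    rates = np.log2(1 + gains[chosen] ** 2)
    print(f"slot {slot}: users {sorted(chosen.tolist())}, sum rate {rates.sum():.1f} b/s/Hz")
```

A round-robin or proportional-fair variant would trade some sum rate for fairness, which is the trade-off gain-based schedulers must navigate.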

Sensing and Decoding Brain States for Predicting and Enhancing Human Behavior, Health, and Security

The human brain acts as an intelligent sensor by helping in effective signal communication and execution of logical functions and instructions, thus coordinating all functions of the human body. More importantly, it shows the potential to combine prior knowledge with adaptive learning, thus ensuring constant improvement. These qualities help the brain interact efficiently with both the body (brain-body) and the environment (brain-environment). This dissertation attempts to apply brain-body-environment interactions (BBEI) to elevate human existence and enhance our day-to-day experiences. For instance, when one stepped out of the house in the past, one had to carry keys (for unlocking), money (for purchasing), and a phone (for communication). With the advent of smartphones, this scenario changed completely, and today it is often enough to carry just one's smartphone because all the above activities can be performed with a single device. In the future, with advanced research and progress in BBEI interactions, one will be able to perform many activities by dictating them in one's mind without any physical involvement. This dissertation aims to shift the paradigm of existing brain-computer interfaces from just ‘control' to ‘monitor, control, enhance, and restore' in three main areas - healthcare, transportation safety, and cryptography. …
Date: August 2016
Creator: Bajwa, Garima
System: The UNT Digital Library

Reinforcement Learning-Based Test Case Generation with Test Suite Prioritization for Android Application Testing

This dissertation introduces a hybrid strategy for automated testing of Android applications that combines reinforcement learning and test suite prioritization. These approaches aim to improve the effectiveness of the testing process by employing reinforcement learning algorithms, namely Q-learning and SARSA (State-Action-Reward-State-Action), for automated test case generation. The studies provide compelling evidence that reinforcement learning techniques hold great potential in generating test cases that consistently achieve high code coverage; however, the generated test cases may not always be in the optimal order. In this study, novel test case prioritization methods are developed, leveraging pairwise event interactions coverage, application state coverage, and application activity coverage, so as to optimize the rates of code coverage specifically for SARSA-generated test cases. Additionally, test suite prioritization techniques are introduced based on UI element coverage, test case cost, and test case complexity to further enhance the ordering of SARSA-generated test cases. Empirical investigations demonstrate that applying the proposed test suite prioritization techniques to the test suites generated by the reinforcement learning algorithm SARSA improved the rates of code coverage over original orderings and random orderings of test cases.
Date: July 2023
Creator: Khan, Md Khorrom
System: The UNT Digital Library
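A compact sketch of the on-policy SARSA idea for test generation is shown below: states are screens of a hypothetical app abstraction, actions are GUI events, and the reward is newly covered blocks. The app model, reward, and hyperparameters are illustrative, not the dissertation's experimental setup.

```python
import random

# Hypothetical app abstraction: screen -> {event: (next_screen, covered_block)}.
APP = {
    "home":     {"tap_menu": ("menu", "b1"), "tap_about": ("about", "b2")},
    "menu":     {"tap_settings": ("settings", "b3"), "back": ("home", "b1")},
    "about":    {"back": ("home", "b2")},
    "settings": {"toggle_dark": ("settings", "b4"), "back": ("menu", "b3")},
}
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2
Q = {(s, a): 0.0 for s, acts in APP.items() for a in acts}
rng = random.Random(0)

def pick(state):
    """Epsilon-greedy event selection over the current screen's events."""
    acts = list(APP[state])
    if rng.random() < EPSILON:
        return rng.choice(acts)
    return max(acts, key=lambda a: Q[(state, a)])

for episode in range(50):
    covered = set()
    state = "home"
    action = pick(state)
    for _ in range(10):                                 # bounded-length test case
        nxt, block = APP[state][action]
        reward = 1.0 if block not in covered else 0.0   # reward new coverage only
        covered.add(block)
        nxt_action = pick(nxt)
        # SARSA update uses the action actually taken next (on-policy).
        Q[(state, action)] += ALPHA * (reward + GAMMA * Q[(nxt, nxt_action)] - Q[(state, action)])
        state, action = nxt, nxt_action

best = sorted(Q.items(), key=lambda kv: kv[1], reverse=True)[:3]
print("highest-valued (screen, event) pairs:", best)
```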

Toward Leveraging Artificial Intelligence to Support the Identification of Accessibility Challenges

The goal of this thesis is to support the automated identification of accessibility issues in user reviews or bug reports, to help technology professionals prioritize their handling and, thus, to create more inclusive apps. In particular, we propose a model that takes as input user reviews or bug reports and learns their keyword-based features to make a classification decision, for a given review, on whether it is about accessibility or not. Our empirically driven study follows a mixture of qualitative and quantitative methods. We introduce models that can accurately identify accessibility reviews and bug reports and automate their detection. Our models can automatically classify app reviews and bug reports as accessibility-related or not, so that developers can easily detect accessibility issues with their products and use the users' input to make their apps more accessible and inclusive. Our goal is to create sustainable change by including a model in the developer's software maintenance pipeline and raising awareness of existing errors that hinder the accessibility of mobile apps, which is a pressing need. In light of our findings from the Blackboard case study, Blackboard and the course material are not easily accessible to deaf and hard-of-hearing students. Thus, deaf students …
Date: May 2023
Creator: Aljedaani, Wajdi Mohammed R M., Sr.
System: The UNT Digital Library
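The keyword-feature classification idea can be sketched with a standard TF-IDF plus logistic-regression pipeline on a handful of made-up reviews, as below. This is not the thesis's trained model or dataset; it only illustrates the review-to-label decision.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny, made-up training examples (1 = accessibility-related, 0 = not).
reviews = [
    ("Screen reader skips the submit button entirely", 1),
    ("Font is too small and contrast is poor for low vision users", 1),
    ("VoiceOver does not announce the menu labels", 1),
    ("App crashes when I upload a photo", 0),
    ("Login is slow on my network", 0),
    ("Great update, love the new design", 0),
]
texts, labels = zip(*reviews)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

for r in ["captions are missing on videos", "battery drains too fast"]:
    print(r, "->", "accessibility" if clf.predict([r])[0] else "other")
```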

Modeling and Analysis of Intentional And Unintentional Security Vulnerabilities in a Mobile Platform

Mobile phones are one of the essential parts of modern life. Making a phone call is not the main purpose of a smart phone anymore, but merely one of many other features. Online social networking, chatting, short messaging, web browsing, navigating, and photography are some of the other features users enjoy in modern smartphones, most of which are provided by mobile apps. However, with this advancement, many security vulnerabilities have opened up in these devices. Malicious apps are a major threat for modern smartphones. According to Symantec Corp., by the middle of 2013, about 273,000 Android malware apps were identified. It is a complex issue to protect everyday users of mobile devices from the attacks of technologically competent hackers, illegitimate users, trolls, and eavesdroppers. This dissertation emphasizes the concept of intention identification. Then it looks into ways to utilize this intention identification concept to enforce security in a mobile phone platform. For instance, a battery monitoring app requiring SMS permissions indicates suspicious intention as battery monitoring usually does not need SMS permissions. Intention could be either the user's intention or the intention of an app. These intentions can be identified using their behavior or by using their source code. Regardless …
Date: December 2014
Creator: Fazeen, Mohamed & Issadeen, Mohamed
System: The UNT Digital Library
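The battery-monitor/SMS example above suggests a simple rule-of-thumb sketch: score an app by how many of its requested permissions its stated category does not explain. The category-to-permission map below is invented for illustration and is far cruder than the intention-identification methods the dissertation develops.

```python
# Illustrative map from an app's declared purpose to permissions it plausibly
# needs; anything outside this set raises the suspicion score.
EXPECTED = {
    "battery_monitor": {"BATTERY_STATS"},
    "photo_editor":    {"CAMERA", "READ_EXTERNAL_STORAGE"},
    "messenger":       {"SEND_SMS", "READ_CONTACTS", "INTERNET"},
}

def suspicion_score(category, requested_permissions):
    """Fraction of requested permissions that the app's stated category
    does not explain -- a crude proxy for mismatched intention."""
    expected = EXPECTED.get(category, set())
    unexplained = set(requested_permissions) - expected
    return len(unexplained) / max(len(requested_permissions), 1), sorted(unexplained)

print(suspicion_score("battery_monitor", ["BATTERY_STATS", "SEND_SMS", "READ_CONTACTS"]))
print(suspicion_score("messenger", ["SEND_SMS", "INTERNET"]))
```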

Measuring Semantic Relatedness Using Salient Encyclopedic Concepts

While pragmatics, through its integration of situational awareness and real-world relevant knowledge, offers a high level of analysis that is suitable for real interpretation of natural dialogue, semantics, on the other hand, represents a lower yet more tractable and affordable linguistic level of analysis using current technologies. Generally, the understanding of semantic meaning in the literature has revolved around the famous quote "You shall know a word by the company it keeps." In this thesis we investigate the role of context constituents in decoding the semantic meaning of the engulfing context; specifically, we probe the role of salient concepts, defined as content-bearing expressions which afford encyclopedic definitions, as a suitable source of semantic clues for an unambiguous interpretation of context. Furthermore, we integrate this world knowledge in building a new and robust unsupervised semantic model and apply it to measure semantic relatedness between textual pairs, whether they are words, sentences, or paragraphs. Moreover, we explore the abstraction of semantics across languages and utilize our findings in building a novel multi-lingual semantic relatedness model exploiting information acquired from various languages. We demonstrate the effectiveness and the superiority of our mono-lingual and multi-lingual models through a comprehensive set of evaluations on specialized …
Date: August 2011
Creator: Hassan, Samer
System: The UNT Digital Library
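A minimal sketch of the salient-concept idea follows: each text is mapped to a weighted vector over encyclopedic concepts and two texts are compared by cosine similarity. The word-to-concept weights are toy values; the thesis derives such associations from an encyclopedic resource rather than a hand-written table.

```python
import math

# Toy association weights between words and encyclopedic concepts.
CONCEPT_WEIGHTS = {
    "doctor":   {"Medicine": 0.9, "Hospital": 0.7},
    "nurse":    {"Medicine": 0.8, "Hospital": 0.8},
    "guitar":   {"Music": 0.9, "String instrument": 0.8},
    "hospital": {"Hospital": 1.0, "Medicine": 0.6},
}

def concept_vector(text):
    """Sum the concept weights of every known word in the text."""
    vec = {}
    for word in text.lower().split():
        for concept, w in CONCEPT_WEIGHTS.get(word, {}).items():
            vec[concept] = vec.get(concept, 0.0) + w
    return vec

def cosine(u, v):
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

print(cosine(concept_vector("the doctor visited the hospital"),
             concept_vector("a nurse works in medicine")))
print(cosine(concept_vector("the doctor visited the hospital"),
             concept_vector("he plays guitar")))
```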

Inferring Social and Internal Context Using a Mobile Phone

This dissertation is composed of research studies that contribute to three research areas including social context-aware computing, internal context-aware computing, and human behavioral data mining. In social context-aware computing, four studies are conducted. First, mobile phone user calling behavioral patterns are characterized in forms of randomness level where relationships among them are then identified. Next, a study is conducted to investigate the relationship between the calling behavior and organizational groups. Third, a method is presented to quantitatively define mobile social closeness and social groups, which are then used to identify social group sizes and scaling ratio. Last, based on the mobile social grouping framework, the significant role of social ties in communication patterns is revealed. In internal context-aware computing, two studies are conducted where the notions of internal context are intention and situation. For intentional context, the goal is to sense the intention of the user in placing calls. A model is thus presented for predicting future calls envisaged as a call predicted list (CPL), which makes use of call history to build a probabilistic model of calling behavior. As an incoming call predictor, CPL is a list of numbers/contacts that are the most likely to be the callers within …
Date: December 2009
Creator: Phithakkitnukoon, Santi
System: The UNT Digital Library
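The call predicted list (CPL) can be illustrated with a recency-weighted frequency score over call history, as sketched below. The exponential-decay scoring rule is an assumption made for the example, not the probabilistic model built in the dissertation.

```python
from collections import Counter
from datetime import datetime, timedelta

now = datetime(2009, 12, 1, 9, 0)
# Hypothetical call history: (contact, timestamp of incoming call).
history = [
    ("mom",   now - timedelta(days=1)),   ("mom",  now - timedelta(days=2)),
    ("boss",  now - timedelta(hours=20)), ("boss", now - timedelta(days=8)),
    ("pizza", now - timedelta(days=30)),
    ("mom",   now - timedelta(days=40)),
]

def call_predicted_list(history, now, half_life_days=7.0, top_k=3):
    """Score each contact by recency-weighted call frequency and return the
    contacts most likely to call next (an assumed scoring rule)."""
    scores = Counter()
    for contact, ts in history:
        age_days = (now - ts).total_seconds() / 86400.0
        scores[contact] += 0.5 ** (age_days / half_life_days)   # exponential decay
    return [c for c, _ in scores.most_common(top_k)]

print(call_predicted_list(history, now))
```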

Framework for Evaluating Dynamic Memory Allocators Including a New Equivalence Class Based Cache-conscious Allocator

Software applications' performance is hindered by a variety of factors, but most notably by the well-known CPU-memory speed gap (often known as the memory wall). This results in the CPU sitting idle waiting for data to be brought from memory to processor caches. The addressing used by caches causes non-uniform accesses to the various cache sets. The non-uniformity is due to several reasons, including how different objects are accessed by the code and how the data objects are located in memory. Memory allocators determine where dynamically created objects are placed, thus defining addresses and their mapping to cache locations. It is important to evaluate how different allocators behave with respect to the localities of the created objects. Most allocators use a single attribute, the size, of an object in making allocation decisions. Additional attributes, such as the placement with respect to other objects or a specific cache area, may lead to better use of cache memories. In this dissertation, we propose and implement a framework that allows for the development and evaluation of new memory allocation techniques. At the root of the framework is a memory tracing tool called Gleipnir, which provides very detailed information about every memory access and relates it …
Date: August 2013
Creator: Janjusic, Tomislav
System: The UNT Digital Library
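To see why allocator placement matters for caches, the sketch below maps hypothetical allocation addresses to cache set indices under an illustrative cache geometry; one placement spreads objects across sets while another makes them all collide. The geometry and addresses are invented, and Gleipnir's tracing is not shown.

```python
# Illustrative cache geometry: 32 KiB, 64-byte lines, 8-way set associative.
LINE_SIZE, NUM_SETS = 64, (32 * 1024) // (64 * 8)

def cache_set(address):
    """Set index an address maps to: discard the line offset, then take the
    remaining bits modulo the number of sets."""
    return (address // LINE_SIZE) % NUM_SETS

# Two hypothetical allocators placing four 256-byte objects.
bump_allocator   = [0x10000 + i * 256 for i in range(4)]     # densely packed
padded_allocator = [0x10000 + i * 4096 for i in range(4)]    # page-aligned spacing

print("bump   ->", [cache_set(a) for a in bump_allocator])    # spread over sets
print("padded ->", [cache_set(a) for a in padded_allocator])  # all collide in one set
```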

Timing and Congestion Driven Algorithms for FPGA Placement

Placement is one of the most important steps in physical design for VLSI circuits. For field programmable gate arrays (FPGAs), the placement step determines the location of each logic block. I present novel timing- and congestion-driven placement algorithms for FPGAs with minimal runtime overhead. By predicting the post-routing timing-critical edges and estimating congestion accurately, this algorithm is able to simultaneously reduce the critical path delay and the minimum number of routing tracks. The core of the algorithm consists of a criticality-history record of connection edges and a congestion map. This approach is applied to the 20 largest Microelectronics Center of North Carolina (MCNC) benchmark circuits. Experimental results show that, compared with the state-of-the-art FPGA place-and-route package, the Versatile Place and Route (VPR) suite, this algorithm yields an average of 8.1% reduction (maximum 30.5%) in the critical path delay and a 5% reduction in channel width. Meanwhile, the average runtime of the algorithm is only 2.3X that of VPR.
Date: December 2006
Creator: Zhuo, Yue
System: The UNT Digital Library
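One way to picture the criticality-history idea is a placement cost that weights each connection's estimated wirelength by how often it has recently been timing-critical, as in the sketch below. The blocks, edges, criticality values, and blending formula are illustrative; they are not VPR's or the proposed algorithm's actual cost function.

```python
# Hypothetical placement of logic blocks on an FPGA grid: name -> (x, y).
placement = {"A": (0, 0), "B": (3, 1), "C": (1, 4), "D": (5, 5)}
# Connection edges with a running criticality-history value in [0, 1],
# e.g. an exponential average of how often the edge was timing-critical.
edges = [("A", "B", 0.9), ("B", "C", 0.2), ("C", "D", 0.7), ("A", "D", 0.1)]

def wirelength(u, v):
    (x1, y1), (x2, y2) = placement[u], placement[v]
    return abs(x1 - x2) + abs(y1 - y2)          # Manhattan-distance estimate

def placement_cost(edges, timing_weight=0.7):
    """Blend plain wirelength with a criticality-weighted term so that moves
    shortening historically critical edges are rewarded more."""
    wl = sum(wirelength(u, v) for u, v, _ in edges)
    crit = sum(c * wirelength(u, v) for u, v, c in edges)
    return (1 - timing_weight) * wl + timing_weight * crit

print("cost:", placement_cost(edges))
```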

Video Analytics with Spatio-Temporal Characteristics of Activities

As video capturing devices become more ubiquitous from surveillance cameras to smart phones, the demand of automated video analysis is increasing as never before. One obstacle in this process is to efficiently locate where a human operator’s attention should be, and another is to determine the specific types of activities or actions without ambiguity. It is the special interest of this dissertation to locate spatial and temporal regions of interest in videos and to develop a better action representation for video-based activity analysis. This dissertation follows the scheme of “locating then recognizing” activities of interest in videos, i.e., locations of potentially interesting activities are estimated before performing in-depth analysis. Theoretical properties of regions of interest in videos are first exploited, based on which a unifying framework is proposed to locate both spatial and temporal regions of interest with the same settings of parameters. The approach estimates the distribution of motion based on 3D structure tensors, and locates regions of interest according to persistent occurrences of low probability. Two contributions are further made to better represent the actions. The first is to construct a unifying model of spatio-temporal relationships between reusable mid-level actions which bridge low-level pixels and high-level activities. Dense …
Date: May 2015
Creator: Cheng, Guangchun
System: The UNT Digital Library
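The 3D structure tensor mentioned above can be sketched directly from spatio-temporal gradients of a small video volume, as below; its eigenvalues indicate how much motion and texture variation a neighborhood contains. The thresholding and low-probability persistence test used to declare regions of interest are not included.

```python
import numpy as np

rng = np.random.default_rng(0)
video = rng.random((16, 32, 32))              # (t, y, x) toy video volume
video[8:, 10:20, 10:20] += 2.0                # a bright region appearing mid-sequence

# Spatio-temporal gradients along t, y, x.
gt, gy, gx = np.gradient(video)

def structure_tensor_at(t, y, x, r=2):
    """Average the outer product of the gradient over a small 3-D window to
    get the 3x3 structure tensor at one voxel."""
    sl = (slice(t - r, t + r + 1), slice(y - r, y + r + 1), slice(x - r, x + r + 1))
    g = np.stack([gx[sl].ravel(), gy[sl].ravel(), gt[sl].ravel()])
    return g @ g.T / g.shape[1]

tensor = structure_tensor_at(10, 15, 15)
eigvals = np.linalg.eigvalsh(tensor)
print("eigenvalues (small->large):", np.round(eigvals, 4))
# Several comparably large eigenvalues indicate spatio-temporal variation,
# i.e. a candidate region of interest; near-zero ones indicate homogeneity.
```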

Keywords in the mist: Automated keyword extraction for very large documents and back of the book indexing.

This research addresses the problem of automatic keyphrase extraction from large documents and back-of-the-book indexing. The potential benefits of automating this process are far reaching, from improving information retrieval in digital libraries to saving countless man-hours by helping professional indexers create back-of-the-book indexes. The dissertation introduces a new methodology to evaluate automated systems, which allows for a detailed, comparative analysis of several techniques for keyphrase extraction. We introduce and evaluate both supervised and unsupervised techniques, designed to balance the resource requirements of an automated system and the best achievable performance. Additionally, a number of novel features are proposed, including a statistical informativeness measure based on chi statistics; an encyclopedic feature that taps into the vast knowledge base of Wikipedia to establish the likelihood of a phrase referring to an informative concept; and a linguistic feature based on sophisticated semantic analysis of the text using current theories of discourse comprehension. The resulting keyphrase extraction system is shown to outperform the current state of the art in supervised keyphrase extraction by a large margin. Moreover, a fully automated back-of-the-book indexing system based on the keyphrase extraction system was shown to lead to back …
Date: May 2008
Creator: Csomai, Andras
System: The UNT Digital Library
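A sketch of a chi-statistics informativeness measure is given below: a 2x2 chi-square test contrasting a phrase's frequency inside the document with its frequency in a large background corpus. The counts are invented, and this is only one plausible instantiation of the feature named in the abstract.

```python
def chi_square_informativeness(count_in_doc, doc_total, count_in_corpus, corpus_total):
    """2x2 chi-square statistic contrasting a phrase's rate inside the
    document with its rate in a large background corpus."""
    a = count_in_doc                       # phrase occurrences inside the document
    b = doc_total - count_in_doc           # other tokens inside the document
    c = count_in_corpus                    # phrase occurrences in the background corpus
    d = corpus_total - count_in_corpus     # other tokens in the background corpus
    n = a + b + c + d
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    observed = [a, b, c, d]
    return sum((obs - exp) ** 2 / exp for obs, exp in zip(observed, expected))

# "markov chain" occurs 40 times in a 20,000-token book but only 80 times in
# a 10-million-token background corpus -> highly informative for this book.
print(round(chi_square_informativeness(40, 20_000, 80, 10_000_000), 1))
# A common phrase occurring at the background rate scores near zero.
print(round(chi_square_informativeness(600, 20_000, 300_000, 10_000_000), 1))
```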

Modeling Alcohol Consumption Using Blog Data

How do the content and writing style of people who drink alcoholic beverages stand out from those of non-drinkers? How much information can we learn about a person's alcohol consumption behavior by reading text that they have authored? This thesis attempts to extend the methods deployed in authorship attribution and authorship profiling research to the domain of automatically identifying the human action of drinking alcoholic beverages. I examine how a psycholinguistics dictionary (the Linguistic Inquiry and Word Count lexicon, developed by James Pennebaker), together with Kenneth Burke's concept of words as symbols of human action and James Wertsch's concept of mediated action, provides a framework for analyzing meaningful data patterns from the content of blogs written by consumers of alcoholic beverages. The contributions of this thesis to the research field are twofold. First, I show that it is possible to automatically identify blog posts that have content related to the consumption of alcoholic beverages. And second, I provide a framework and tools to model human behavior through text analysis of blog data.
Date: May 2013
Creator: Koh, Kok Chuan
System: The UNT Digital Library
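The dictionary-based analysis can be sketched as counting words from a few lexicon categories in a blog post, as below. The word lists are invented stand-ins (the actual LIWC dictionary is proprietary and much larger), and the "alcohol" topic list is purely illustrative.

```python
# Invented stand-in word lists; the real LIWC dictionary is far larger and
# the alcohol-topic list here is purely illustrative.
CATEGORIES = {
    "alcohol":  {"beer", "wine", "whiskey", "drunk", "bar", "cocktail"},
    "social":   {"friends", "party", "we", "together"},
    "negative": {"hangover", "regret", "tired", "sick"},
}

def category_profile(text):
    """Relative frequency of each category's words in a blog post."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    total = max(len(tokens), 1)
    return {cat: sum(t in words for t in tokens) / total
            for cat, words in CATEGORIES.items()}

post = "Had a beer at the bar with friends, woke up tired with a hangover."
print(category_profile(post))
```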