Scene Analysis Using Scale Invariant Feature Extraction and Probabilistic Modeling (open access)

Scene Analysis Using Scale Invariant Feature Extraction and Probabilistic Modeling

Conventional pattern recognition systems have two components: feature analysis and pattern classification. For any object in an image, features could be considered as the major characteristic of the object either for object recognition or object tracking purpose. Features extracted from a training image, can be used to identify the object when attempting to locate the object in a test image containing many other objects. To perform reliable scene analysis, it is important that the features extracted from the training image are detectable even under changes in image scale, noise and illumination. Scale invariant feature has wide applications such as image classification, object recognition and object tracking in the image processing area. In this thesis, color feature and SIFT (scale invariant feature transform) are considered to be scale invariant feature. The classification, recognition and tracking result were evaluated with novel evaluation criterion and compared with some existing methods. I also studied different types of scale invariant feature for the purpose of solving scene analysis problems. I propose probabilistic models as the foundation of analysis scene scenario of images. In order to differential the content of image, I develop novel algorithms for the adaptive combination for multiple features extracted from images. I …
Date: August 2011
Creator: Shen, Yao
System: The UNT Digital Library
Learning from small data set for object recognition in mobile platforms. (open access)

Learning from small data set for object recognition in mobile platforms.

Did you stand at a door with a bunch of keys and tried to find the right one to unlock the door? Did you hold a flower and wonder the name of it? A need of object recognition could rise anytime and any where in our daily lives. With the development of mobile devices object recognition applications become possible to provide immediate assistance. However, performing complex tasks in even the most advanced mobile platforms still faces great challenges due to the limited computing resources and computing power. In this thesis, we present an object recognition system that resides and executes within a mobile device, which can efficiently extract image features and perform learning and classification. To account for the computing constraint, a novel feature extraction method that minimizes the data size and maintains data consistency is proposed. This system leverages principal component analysis method and is able to update the trained classifier when new examples become available . Our system relieves users from creating a lot of examples and makes it user friendly. The experimental results demonstrate that a learning method trained with a very small number of examples can achieve recognition accuracy above 90% in various acquisition conditions. In …
Date: May 2016
Creator: Liu, Siyuan
System: The UNT Digital Library
Detection and Classification of Heart Sounds Using a Heart-Mobile Interface (open access)

Detection and Classification of Heart Sounds Using a Heart-Mobile Interface

An early detection of heart disease can save lives, caution individuals and also help to determine the type of treatment to be given to the patients. The first test of diagnosing a heart disease is through auscultation - listening to the heart sounds. The interpretation of heart sounds is subjective and requires a professional skill to identify the abnormalities in these sounds. A medical practitioner uses a stethoscope to perform an initial screening by listening for irregular sounds from the patient's chest. Later, echocardiography and electrocardiography tests are taken for further diagnosis. However, these tests are expensive and require specialized technicians to operate. A simple and economical way is vital for monitoring in homecare or rural hospitals and urban clinics. This dissertation is focused on developing a patient-centered device for initial screening of the heart sounds that is both low cost and can be used by the users on themselves, and later share the readings with the healthcare providers. An innovative mobile health service platform is created for analyzing and classifying heart sounds. Certain properties of heart sounds have to be evaluated to identify the irregularities such as the number of heart beats and gallops, intensity, frequency, and duration. Since …
Date: December 2016
Creator: Thiyagaraja, Shanti
System: The UNT Digital Library
Privacy Preserving EEG-based Authentication Using Perceptual Hashing (open access)

Privacy Preserving EEG-based Authentication Using Perceptual Hashing

The use of electroencephalogram (EEG), an electrophysiological monitoring method for recording the brain activity, for authentication has attracted the interest of researchers for over a decade. In addition to exhibiting qualities of biometric-based authentication, they are revocable, impossible to mimic, and resistant to coercion attacks. However, EEG signals carry a wealth of information about an individual and can reveal private information about the user. This brings significant privacy issues to EEG-based authentication systems as they have access to raw EEG signals. This thesis proposes a privacy-preserving EEG-based authentication system that preserves the privacy of the user by not revealing the raw EEG signals while allowing the system to authenticate the user accurately. In that, perceptual hashing is utilized and instead of raw EEG signals, their perceptually hashed values are used in the authentication process. In addition to describing the authentication process, algorithms to compute the perceptual hash are developed based on two feature extraction techniques. Experimental results show that an authentication system using perceptual hashing can achieve performance comparable to a system that has access to raw EEG signals if enough EEG channels are used in the process. This thesis also presents a security analysis to show that perceptual hashing …
Date: December 2016
Creator: Koppikar, Samir Dilip
System: The UNT Digital Library
Content and Temporal Analysis of Communications to Predict Task Cohesion in Software Development Global Teams (open access)

Content and Temporal Analysis of Communications to Predict Task Cohesion in Software Development Global Teams

Virtual teams in industry are increasingly being used to develop software, create products, and accomplish tasks. However, analyzing those collaborations under same-time/different-place conditions is well-known to be difficult. In order to overcome some of these challenges, this research was concerned with the study of collaboration-based, content-based and temporal measures and their ability to predict cohesion within global software development projects. Messages were collected from three software development projects that involved students from two different countries. The similarities and quantities of these interactions were computed and analyzed at individual and group levels. Results of interaction-based metrics showed that the collaboration variables most related to Task Cohesion were Linguistic Style Matching and Information Exchange. The study also found that Information Exchange rate and Reply rate have a significant and positive correlation to Task Cohesion, a factor used to describe participants' engagement in the global software development process. This relation was also found at the Group level. All these results suggest that metrics based on rate can be very useful for predicting cohesion in virtual groups. Similarly, content features based on communication categories were used to improve the identification of Task Cohesion levels. This model showed mixed results, since only Work similarity and …
Date: May 2017
Creator: Castro Hernandez, Alberto
System: The UNT Digital Library
Brain Computer Interface (BCI) Applications: Privacy Threats and Countermeasures (open access)

Brain Computer Interface (BCI) Applications: Privacy Threats and Countermeasures

In recent years, brain computer interfaces (BCIs) have gained popularity in non-medical domains such as the gaming, entertainment, personal health, and marketing industries. A growing number of companies offer various inexpensive consumer grade BCIs and some of these companies have recently introduced the concept of BCI "App stores" in order to facilitate the expansion of BCI applications and provide software development kits (SDKs) for other developers to create new applications for their devices. The BCI applications access to users' unique brainwave signals, which consequently allows them to make inferences about users' thoughts and mental processes. Since there are no specific standards that govern the development of BCI applications, its users are at the risk of privacy breaches. In this work, we perform first comprehensive analysis of BCI App stores including software development kits (SDKs), application programming interfaces (APIs), and BCI applications w.r.t privacy issues. The goal is to understand the way brainwave signals are handled by BCI applications and what threats to the privacy of users exist. Our findings show that most applications have unrestricted access to users' brainwave signals and can easily extract private information about their users without them even noticing. We discuss potential privacy threats posed by …
Date: May 2017
Creator: Bhalotiya, Anuj Arun
System: The UNT Digital Library
Exploring Memristor Based Analog Design in Simscape (open access)

Exploring Memristor Based Analog Design in Simscape

With conventional CMOS technologies approaching their scaling limits, researchers are actively investigating alternative technologies for ever increasing computing and mobile demand. A number of different technologies are currently being studied by different research groups. In the last decade, one-dimensional (1D) carbon nanotubes (CNT), graphene, which is a two-dimensional (2D) natural occurring carbon rolled in tubular form, and zero-dimensional (0D) fullerenes have been the subject of intensive research. In 2008, HP Labs announced a ground-breaking fabrication of memristors, the fourth fundamental element postulated by Chua at the University of California, Berkeley in 1971. In the last few years, the memristor has gained a lot of attention from the research community. In-depth studies of the memristor and its analog behavior have convinced the community that it has the potential in future nano-architectures for optimization of high-density memory and neuromorphic computing architectures. The objective of this thesis is to explore memristors for analog and mixed-signal system design using Simscape. This thesis presents a memristor model in the Simscape language. Simscape has been used as it has the potential for modeling large systems. A memristor based programmable oscillator is also presented with simulation results and characterization. In addition, simulation results of different memristor models …
Date: May 2013
Creator: Gautam, Mahesh
System: The UNT Digital Library
Automated GUI Tests Generation for Android Apps Using Q-learning (open access)

Automated GUI Tests Generation for Android Apps Using Q-learning

Mobile applications are growing in popularity and pose new problems in the area of software testing. In particular, mobile applications heavily depend upon user interactions and a dynamically changing environment of system events. In this thesis, we focus on user-driven events and use Q-learning, a reinforcement machine learning algorithm, to generate tests for Android applications under test (AUT). We implement a framework that automates the generation of GUI test cases by using our Q-learning approach and compare it to a uniform random (UR) implementation. A novel feature of our approach is that we generate user-driven event sequences through the GUI, without the source code or the model of the AUT. Hence, considerable amount of cost and time are saved by avoiding the need for model generation for generating the tests. Our results show that the systematic path exploration used by Q-learning results in higher average code coverage in comparison to the uniform random approach.
Date: May 2017
Creator: Koppula, Sreedevi
System: The UNT Digital Library
Metamodeling-based Fast Optimization of  Nanoscale Ams-socs (open access)

Metamodeling-based Fast Optimization of Nanoscale Ams-socs

Modern consumer electronic systems are mostly based on analog and digital circuits and are designed as analog/mixed-signal systems on chip (AMS-SoCs). the integration of analog and digital circuits on the same die makes the system cost effective. in AMS-SoCs, analog and mixed-signal portions have not traditionally received much attention due to their complexity. As the fabrication technology advances, the simulation times for AMS-SoC circuits become more complex and take significant amounts of time. the time allocated for the circuit design and optimization creates a need to reduce the simulation time. the time constraints placed on designers are imposed by the ever-shortening time to market and non-recurrent cost of the chip. This dissertation proposes the use of a novel method, called metamodeling, and intelligent optimization algorithms to reduce the design time. Metamodel-based ultra-fast design flows are proposed and investigated. Metamodel creation is a one time process and relies on fast sampling through accurate parasitic-aware simulations. One of the targets of this dissertation is to minimize the sample size while retaining the accuracy of the model. in order to achieve this goal, different statistical sampling techniques are explored and applied to various AMS-SoC circuits. Also, different metamodel functions are explored for their …
Date: May 2012
Creator: Garitselov, Oleg
System: The UNT Digital Library
A Driver, Vehicle and Road Safety System Using Smartphones (open access)

A Driver, Vehicle and Road Safety System Using Smartphones

As vehicle manufacturers continue to increase their emphasis on safety with advanced driver assistance systems (ADAS), I propose a ubiquitous device that is able to analyze and advise on safety conditions. Mobile smartphones are increasing in popularity among younger generations with an estimated 64% of 25-34 year olds already using one in their daily lives. with over 10 million car accidents reported in the United States each year, car manufacturers have shifted their focus of a passive approach (airbags) to more active by adding features associated with ADAS (lane departure warnings). However, vehicles manufactured with these sensors are not economically priced while older vehicles might only have passive safety features. Given its accessibility and portability, I target a mobile smartphone as a device to compliment ADAS that can bring a driver assist to any vehicle without regards for any on-vehicle communication system requirements. I use the 3-axis accelerometer of multiple Android based smartphone to record and analyze various safety factors which can influence a driver while operating a vehicle. These influences with respect to the driver, vehicle and road are lane change maneuvers, vehicular comfort and road conditions. Each factor could potentially be hazardous to the health of the driver, …
Date: May 2012
Creator: Gozick, Brandon
System: The UNT Digital Library
GPS CaPPture: a System for GPS Trajectory Collection, Processing, and Destination Prediction (open access)

GPS CaPPture: a System for GPS Trajectory Collection, Processing, and Destination Prediction

In the United States, smartphone ownership surpassed 69.5 million in February 2011 with a large portion of those users (20%) downloading applications (apps) that enhance the usability of a device by adding additional functionality. a large percentage of apps are written specifically to utilize the geographical position of a mobile device. One of the prime factors in developing location prediction models is the use of historical data to train such a model. with larger sets of training data, prediction algorithms become more accurate; however, the use of historical data can quickly become a downfall if the GPS stream is not collected or processed correctly. Inaccurate or incomplete or even improperly interpreted historical data can lead to the inability to develop accurately performing prediction algorithms. As GPS chipsets become the standard in the ever increasing number of mobile devices, the opportunity for the collection of GPS data increases remarkably. the goal of this study is to build a comprehensive system that addresses the following challenges: (1) collection of GPS data streams in a manner such that the data is highly usable and has a reduction in errors; (2) processing and reduction of the collected data in order to prepare it and …
Date: May 2012
Creator: Griffin, Terry W.
System: The UNT Digital Library
Rapid Prototyping and Design of a Fast Random Number Generator (open access)

Rapid Prototyping and Design of a Fast Random Number Generator

Information in the form of online multimedia, bank accounts, or password usage for diverse applications needs some form of security. the core feature of many security systems is the generation of true random or pseudorandom numbers. Hence reliable generators of such numbers are indispensable. the fundamental hurdle is that digital computers cannot generate truly random numbers because the states and transitions of digital systems are well understood and predictable. Nothing in a digital computer happens truly randomly. Digital computers are sequential machines that perform a current state and move to the next state in a deterministic fashion. to generate any secure hash or encrypted word a random number is needed. But since computers are not random, random sequences are commonly used. Random sequences are algorithms that generate a pattern of values that appear to be random but after some time start repeating. This thesis implements a digital random number generator using MATLAB, FGPA prototyping, and custom silicon design. This random number generator is able to use a truly random CMOS source to generate the random number. Statistical benchmarks are used to test the results and to show that the design works. Thus the proposed random number generator will be useful …
Date: May 2012
Creator: Franco, Juan
System: The UNT Digital Library
A Global Stochastic Modeling Framework to Simulate and Visualize Epidemics (open access)

A Global Stochastic Modeling Framework to Simulate and Visualize Epidemics

Epidemics have caused major human and monetary losses through the course of human civilization. It is very important that epidemiologists and public health personnel are prepared to handle an impending infectious disease outbreak. the ever-changing demographics, evolving infrastructural resources of geographic regions, emerging and re-emerging diseases, compel the use of simulation to predict disease dynamics. By the means of simulation, public health personnel and epidemiologists can predict the disease dynamics, population groups at risk and their geographic locations beforehand, so that they are prepared to respond in case of an epidemic outbreak. As a consequence of the large numbers of individuals and inter-personal interactions involved in simulating infectious disease spread in a region such as a county, sizeable amounts of data may be produced that have to be analyzed. Methods to visualize this data would be effective in facilitating people from diverse disciplines understand and analyze the simulation. This thesis proposes a framework to simulate and visualize the spread of an infectious disease in a population of a region such as a county. As real-world populations have a non-homogeneous demographic and spatial distribution, this framework models the spread of an infectious disease based on population of and geographic distance between …
Date: May 2012
Creator: Indrakanti, Saratchandra
System: The UNT Digital Library
Cuff-less Blood Pressure Measurement Using a Smart Phone (open access)

Cuff-less Blood Pressure Measurement Using a Smart Phone

Blood pressure is vital sign information that physicians often need as preliminary data for immediate intervention during emergency situations or for regular monitoring of people with cardiovascular diseases. Despite the availability of portable blood pressure meters in the market, they are not regularly carried by people, creating a need for an ultra-portable measurement platform or device that can be easily carried and used at all times. One such device is the smartphone which, according to comScore survey is used by 26.2% of the US adult population. the mass production of these phones with built-in sensors and high computation power has created numerous possibilities for application development in different domains including biomedical. Motivated by this capability and their extensive usage, this thesis focuses on developing a blood pressure measurement platform on smartphones. Specifically, I developed a blood pressure measurement system on a smart phone using the built-in camera and a customized external microphone. the system consists of first obtaining heart beats using the microphone and finger pulse with the camera, and finally calculating the blood pressure using the recorded data. I developed techniques for finding the best location for obtaining the data, making the system usable by all categories of people. …
Date: May 2012
Creator: Jonnada, Srikanth
System: The UNT Digital Library
Indoor Localization Using Magnetic Fields (open access)

Indoor Localization Using Magnetic Fields

Indoor localization consists of locating oneself inside new buildings. GPS does not work indoors due to multipath reflection and signal blockage. WiFi based systems assume ubiquitous availability and infrastructure based systems require expensive installations, hence making indoor localization an open problem. This dissertation consists of solving the problem of indoor localization by thoroughly exploiting the indoor ambient magnetic fields comprising mainly of disturbances termed as anomalies in the Earth’s magnetic field caused by pillars, doors and elevators in hallways which are ferromagnetic in nature. By observing uniqueness in magnetic signatures collected from different campus buildings, the work presents the identification of landmarks and guideposts from these signatures and further develops magnetic maps of buildings - all of which can be used to locate and navigate people indoors. To understand the reason behind these anomalies, first a comparison between the measured and model generated Earth’s magnetic field is made, verifying the presence of a constant field without any disturbances. Then by modeling the magnetic field behavior of different pillars such as steel reinforced concrete, solid steel, and other structures like doors and elevators, the interaction of the Earth’s field with the ferromagnetic fields is described thereby explaining the causes of the …
Date: December 2011
Creator: Pathapati Subbu, Kalyan Sasidhar
System: The UNT Digital Library
The Design Of A Benchmark For Geo-stream Management Systems (open access)

The Design Of A Benchmark For Geo-stream Management Systems

The recent growth in sensor technology allows easier information gathering in real-time as sensors have grown smaller, more accurate, and less expensive. The resulting data is often in a geo-stream format continuously changing input with a spatial extent. Researchers developing geo-streaming management systems (GSMS) require a benchmark system for evaluation, which is currently lacking. This thesis presents GSMark, a benchmark for evaluating GSMSs. GSMark provides a data generator that creates a combination of synthetic and real geo-streaming data, a workload simulator to present the data to the GSMS as a data stream, and a set of benchmark queries that evaluate typical GSMS functionality and query performance. In particular, GSMark generates both moving points and evolving spatial regions, two fundamental data types for a broad range of geo-stream applications, and the geo-streaming queries on this data.
Date: December 2011
Creator: Shen, Chao
System: The UNT Digital Library
Investigating the Extractive Summarization of Literary Novels (open access)

Investigating the Extractive Summarization of Literary Novels

Abstract Due to the vast amount of information we are faced with, summarization has become a critical necessity of everyday human life. Given that a large fraction of the electronic documents available online and elsewhere consist of short texts such as Web pages, news articles, scientific reports, and others, the focus of natural language processing techniques to date has been on the automation of methods targeting short documents. We are witnessing however a change: an increasingly larger number of books become available in electronic format. This means that the need for language processing techniques able to handle very large documents such as books is becoming increasingly important. This thesis addresses the problem of summarization of novels, which are long and complex literary narratives. While there is a significant body of research that has been carried out on the task of automatic text summarization, most of this work has been concerned with the summarization of short documents, with a particular focus on news stories. However, novels are different in both length and genre, and consequently different summarization techniques are required. This thesis attempts to close this gap by analyzing a new domain for summarization, and by building unsupervised and supervised systems …
Date: December 2011
Creator: Ceylan, Hakan
System: The UNT Digital Library

SurfKE: A Graph-Based Feature Learning Framework for Keyphrase Extraction

Access: Use of this item is restricted to the UNT Community
Current unsupervised approaches for keyphrase extraction compute a single importance score for each candidate word by considering the number and quality of its associated words in the graph and they are not flexible enough to incorporate multiple types of information. For instance, nodes in a network may exhibit diverse connectivity patterns which are not captured by the graph-based ranking methods. To address this, we present a new approach to keyphrase extraction that represents the document as a word graph and exploits its structure in order to reveal underlying explanatory factors hidden in the data that may distinguish keyphrases from non-keyphrases. Experimental results show that our model, which uses phrase graph representations in a supervised probabilistic framework, obtains remarkable improvements in performance over previous supervised and unsupervised keyphrase extraction systems.
Date: August 2019
Creator: Florescu, Corina Andreea
System: The UNT Digital Library

Enhanced Approach for the Classification of Ulcerative Colitis Severity in Colonoscopy Videos Using CNN

Access: Use of this item is restricted to the UNT Community
Ulcerative colitis (UC) is a chronic inflammatory disease characterized by periods of relapses and remissions affecting more than 500,000 people in the United States. To achieve the therapeutic goals of UC, which are to first induce and then maintain disease remission, doctors need to evaluate the severity of UC of a patient. However, it is very difficult to evaluate the severity of UC objectively because of non-uniform nature of symptoms and large variations in their patterns. To address this, in our previous works, we developed two different approaches in which one is using the image textures, and the other is using CNN (convolutional neural network) to measure and classify objectively the severity of UC presented in optical colonoscopy video frames. But, we found that the image texture based approach could not handle larger number of variations in their patterns, and the CNN based approach could not achieve very high accuracy. In this paper, we improve our CNN based approach in two ways to provide better accuracy for the classification. We add more thorough and essential preprocessing, and generate more classes to accommodate large variations in their patterns. The experimental results show that the proposed preprocessing can improve the overall accuracy …
Date: August 2019
Creator: Sure, Venkata Leela
System: The UNT Digital Library
BSM Message and Video Streaming Quality Comparative Analysis Using Wave Short Message Protocol (WSMP) (open access)

BSM Message and Video Streaming Quality Comparative Analysis Using Wave Short Message Protocol (WSMP)

Vehicular ad-hoc networks (VANETs) are used for vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications. The IEEE 802.11p/WAVE (Wireless Access in Vehicular Environment) and with WAVE Short Messaging Protocol (WSMP) has been proposed as the standard protocol for designing applications for VANETs. This communication protocol must be thoroughly tested before reliable and efficient applications can be built using its protocols. In this paper, we perform on-road experiments in a variety of scenarios to evaluate the performance of the standard. We use commercial VANET devices with 802.11p/WAVE compliant chipsets for both BSM (basic safety messages) as well as video streaming applications using WSMP as a communication protocol. We show that while the standard performs well for BSM application in lightly loaded conditions, the performance becomes inferior when traffic and other performance metric increases. Furthermore, we also show that the standard is not suitable for video streaming due to the bursty nature of traffic and the bandwidth throttling, which is a major shortcoming for V2X applications.
Date: August 2019
Creator: Win, Htoo Aung
System: The UNT Digital Library

Mining Biomedical Data for Hidden Relationship Discovery

Access: Use of this item is restricted to the UNT Community
With an ever-growing number of publications in the biomedical domain, it becomes likely that important implicit connections between individual concepts of biomedical knowledge are overlooked. Literature based discovery (LBD) is in practice for many years to identify plausible associations between previously unrelated concepts. In this paper, we present a new, completely automatic and interactive system that creates a graph-based knowledge base to capture multifaceted complex associations among biomedical concepts. For a given pair of input concepts, our system auto-generates a list of ranked subgraphs uncovering possible previously unnoticed associations based on context information. To rank these subgraphs, we implement a novel ranking method using the context information obtained by performing random walks on the graph. In addition, we enhance the system by training a Neural Network Classifier to output the likelihood of the two concepts being likely related, which provides better insights to the end user.
Date: August 2019
Creator: Dharmavaram, Sirisha
System: The UNT Digital Library
Skin Detection in Image and Video Founded in Clustering and Region Growing (open access)

Skin Detection in Image and Video Founded in Clustering and Region Growing

Researchers have been involved for decades in search of an efficient skin detection method. Yet current methods have not overcome the major limitations. To overcome these limitations, in this dissertation, a clustering and region growing based skin detection method is proposed. These methods together with a significant insight result in a more effective algorithm. The insight concerns a capability to define dynamically the number of clusters in a collection of pixels organized as an image. In clustering for most problem domains, the number of clusters is fixed a priori and does not perform effectively over a wide variety of data contents. Therefore, in this dissertation, a skin detection method has been proposed using the above findings and validated. This method assigns the number of clusters based on image properties and ultimately allows freedom from manual thresholding or other manual operations. The dynamic determination of clustering outcomes allows for greater automation of skin detection when dealing with uncertain real-world conditions.
Date: August 2019
Creator: Islam, A B M Rezbaul
System: The UNT Digital Library
Modeling Synergistic Relationships Between Words and Images (open access)

Modeling Synergistic Relationships Between Words and Images

Texts and images provide alternative, yet orthogonal views of the same underlying cognitive concept. By uncovering synergistic, semantic relationships that exist between words and images, I am working to develop novel techniques that can help improve tasks in natural language processing, as well as effective models for text-to-image synthesis, image retrieval, and automatic image annotation. Specifically, in my dissertation, I will explore the interoperability of features between language and vision tasks. In the first part, I will show how it is possible to apply features generated using evidence gathered from text corpora to solve the image annotation problem in computer vision, without the use of any visual information. In the second part, I will address research in the reverse direction, and show how visual cues can be used to improve tasks in natural language processing. Importantly, I propose a novel metric to estimate the similarity of words by comparing the visual similarity of concepts invoked by these words, and show that it can be used further to advance the state-of-the-art methods that employ corpus-based and knowledge-based semantic similarity measures. Finally, I attempt to construct a joint semantic space connecting words with images, and synthesize an evaluation framework to quantify cross-modal …
Date: December 2012
Creator: Leong, Chee Wee
System: The UNT Digital Library
Automated Classification of Emotions Using Song Lyrics (open access)

Automated Classification of Emotions Using Song Lyrics

This thesis explores the classification of emotions in song lyrics, using automatic approaches applied to a novel corpus of 100 popular songs. I use crowd sourcing via Amazon Mechanical Turk to collect line-level emotions annotations for this collection of song lyrics. I then build classifiers that rely on textual features to automatically identify the presence of one or more of the following six Ekman emotions: anger, disgust, fear, joy, sadness and surprise. I compare different classification systems and evaluate the performance of the automatic systems against the manual annotations. I also introduce a system that uses data collected from the social network Twitter. I use the Twitter API to collect a large corpus of tweets manually labeled by their authors for one of the six emotions of interest. I then compare the classification of emotions obtained when training on data automatically collected from Twitter versus data obtained through crowd sourced annotations.
Date: December 2012
Creator: Schellenberg, Rajitha
System: The UNT Digital Library