Comparing Latent Dirichlet Allocation and Latent Semantic Analysis as Classifiers (open access)

In the Information Age, unstructured electronic text documents have proliferated. Processing these documents is a daunting task for humans, who have limited cognitive capacity for handling large volumes of documents that are often extremely lengthy. To address this problem, computer algorithms for text data are being developed. Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) are two such algorithms that have each received much attention in the text data literature for topic extraction studies, but not for document classification or for comparison studies. Since classification is considered an important human function and has been studied in the areas of cognitive science and information science, in this dissertation a research study was performed to compare LDA, LSA and humans as document classifiers. The research questions posed in this study are: R1: How accurate are LDA and LSA in classifying documents in a corpus of textual data over a known set of topics? R2: How accurate are humans in performing the same classification task? R3: How does LDA classification performance compare to LSA classification performance? To address these questions, a classification study involving human subjects was designed where humans were asked to generate and classify documents …
Date: December 2011
Creator: Anaya, Leticia H.
System: The UNT Digital Library
Investigating the relationship between the business performance management framework and the Malcolm Baldrige National Quality Award framework. (open access)

The business performance management (BPM) framework helps an organization continuously adjust and successfully execute its strategies. BPM helps increase flexibility by providing managers with an early alert about changes and, as a result, allows faster response to such changes. The Malcolm Baldrige National Quality Award (MBNQA) framework provides a basis for self-assessment and a systems perspective for managing an organization's key processes for achieving business results. The MBNQA framework is a more comprehensive framework and encapsulates the underlying constructs in the BPM framework. The objectives of this dissertation are fourfold: (1) to validate the underlying relationships presented in the 2008 MBNQA framework, (2) to explore the MBNQA framework at the dimension level, and develop and test constructs measured at that level in a causal model, (3) to validate and create a common general framework for the business performance model by integrating the practitioner literature with basic theory including existing MBNQA theory, and (4) to integrate the BPM framework and the MBNQA framework into a new framework (BPM-MBNQA framework) that can guide organizations in their journey toward achieving and sustaining competitive and strategic advantages. The purpose of this study is to achieve these objectives by means of a combination of methodologies …
Date: August 2009
Creator: Hossain, Muhammad Muazzem
System: The UNT Digital Library
Accuracy and Interpretability Testing of Text Mining Methods (open access)

Extracting meaningful information from large collections of text data is problematic because of the sheer size of the database. However, automated analytic methods capable of processing such data have emerged. These methods, collectively called text mining, first began to appear in 1988. A number of additional text mining methods quickly developed in independent research silos, each based on unique mathematical algorithms. How good each of these methods is at analyzing text is unclear. Method development typically evolves from some research-silo-centric requirement, with the success of the method measured by a custom requirement-based metric. Results of the new method are then compared to another method that was similarly developed. The proposed research introduces an experimentally designed testing method to text mining that eliminates research silo bias and simultaneously evaluates methods from all of the major context-region text mining method families. The proposed research method follows a random block factorial design with two treatments consisting of three and five levels (RBF-35) with repeated measures. The contribution of the research is threefold. First, the users perceived a difference in the effectiveness of the various methods. Second, while still not clear, there are characteristics within the text collection that affect the …
Date: August 2013
Creator: Ashton, Triss A.
System: The UNT Digital Library
A Simulation Study Comparing Various Confidence Intervals for the Mean of Voucher Populations in Accounting (open access)

This research examined the performance of three parametric methods for confidence intervals: the classical, the Bonferroni, and the bootstrap-t method, as applied to estimating the mean of voucher populations in accounting. Usually auditing populations do not follow standard models. The population for accounting audits generally is a nonstandard mixture distribution in which the audit data set contains a large number of zero values and a comparatively small number of nonzero errors. This study assumed a situation in which only overstatement errors exist. The nonzero errors were assumed to be normally, exponentially, and uniformly distributed. Five indicators of performance were used. The classical method was found to be unreliable. The Bonferroni method was conservative for all population conditions. The bootstrap-t method was excellent in terms of reliability, but the lower limit of the confidence intervals produced by this method was unstable for all population conditions. The classical method provided the shortest average width of the confidence intervals among the three methods. This study provided initial evidence as to how the parametric bootstrap-t method performs when applied to the nonstandard distribution of audit populations of line items. Further research should provide a reliable confidence interval for a wider variety of accounting populations.
Date: December 1992
Creator: Lee, Ihn Shik
System: The UNT Digital Library
Links among perceived service quality, patient satisfaction and behavioral intentions in the urgent care industry: Empirical evidence from college students. (open access)

Patient perceptions of health care quality are critical to a health care service provider's long-term success because of the significant influence perceptions have on customer satisfaction and consequently on the organization's financial performance. Patient satisfaction affects not only the outcome of the health care process, such as patient compliance with physician advice and treatment, but also patient retention and favorable word-of-mouth. Accordingly, it is a critical strategy for health care organizations to provide quality service and address patient satisfaction. The urgent care (UC) industry is an integral part of the health care system in the United States and has been experiencing rapid growth. UC provides a wide range of medical services for a large group of patients and now serves an increasing population. UC is becoming popular because of its convenient locations, extended hours, walk-in policy, short waiting times, and accessibility. A closer examination of the current health care research, however, indicates that there is a paucity of research on urgent care providers. Confronted with the emergence of the urgent care industry and the increasing demand for urgent care, it is necessary to understand how patients perceive urgent care providers and what influences patient satisfaction and retention. This dissertation addresses four …
Date: August 2009
Creator: Qin, Hong
System: The UNT Digital Library
Call Option Premium Dynamics (open access)

This study has a twofold purpose: to demonstrate the use of the Marquardt compromise method in estimating the unknown parameters contained in the probability call-option pricing models and to test empirically the following models: the Boness, the Black-Scholes, the Merton proportional dividend, the Ingersoll differential tax, and the Ingersoll proportional dividend and differential tax.
Date: December 1982
Creator: Chen, Jim
System: The UNT Digital Library
The Chi Square Approximation to the Hypergeometric Probability Distribution (open access)

This study compared the results of the chi square test of independence and the corrected chi square statistic against Fisher's exact probability test (the hypergeometric distribution) in connection with sampling from a finite population. Data were collected by advancing the minimum cell size from zero to a maximum which resulted in a tail area probability of 20 percent for sample sizes from 10 to 100 by varying increments. Analysis of the data supported the rejection of the null hypotheses regarding the general rule-of-thumb guidelines concerning sample size, minimum cell expected frequency and the continuity correction factor. It was discovered that the computation using Yates' correction factor resulted in values which were so overly conservative (i.e., tail area probabilities that were 20 to 50 percent higher than Fisher's exact test) that conclusions drawn from this calculation might prove to be inaccurate. Accordingly, a new correction factor was proposed which eliminated much of this discrepancy. Its performance was equally consistent with that of the uncorrected chi square statistic and, at times, even better.
Date: August 1982
Creator: Anderson, Randy J. (Randy Jay)
System: The UNT Digital Library
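As a rough illustration of the comparison described in the abstract above, the following sketch computes the three tail probabilities for a single two-by-two table: the uncorrected chi square test, the chi square test with Yates' continuity correction, and a two-sided Fisher's exact test based on the hypergeometric distribution. The function names, the example table, and the two-sided rule used for Fisher's test (summing all tables no more probable than the observed one) are assumptions made for illustration; they are not taken from the dissertation, and the new correction factor it proposes is not reproduced here.

```python
from math import comb, erfc, sqrt


def chi2_sf_1df(x):
    """Upper-tail probability of a chi square variate with 1 degree of freedom."""
    return erfc(sqrt(x / 2.0))


def chi_square_2x2(a, b, c, d, yates=False):
    """Chi square statistic and p-value for the 2x2 table [[a, b], [c, d]].

    With yates=True, Yates' continuity correction is applied.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Shrink the numerator by n/2, never letting it go below zero.
        diff = max(0.0, diff - n / 2.0)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, chi2_sf_1df(stat)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value from the hypergeometric distribution.

    Assumed two-sided rule: sum the probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    n = a + b + c + d
    row1 = a + b
    col1 = a + c
    denom = comb(n, col1)

    def pmf(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    p_obs = pmf(a)
    return sum(pmf(k) for k in range(lo, hi + 1) if pmf(k) <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    table = (3, 1, 1, 3)  # hypothetical small-sample table
    print("uncorrected:", chi_square_2x2(*table))
    print("Yates:      ", chi_square_2x2(*table, yates=True))
    print("Fisher:     ", fisher_exact_two_sided(*table))
```

Repeating the calculation over many tables and sample sizes would give the kind of side-by-side comparison the abstract reports, although the sampling scheme used in the study itself is not specified in enough detail here to reproduce.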
Validation and Investigation of the Four Aspects of Cycle Regression: A New Algorithm for Extracting Cycles (open access)

The cycle regression analysis algorithm is the most recent addition to a group of techniques developed to detect "hidden periodicities." This dissertation investigates four major aspects of the algorithm. The objectives of this research are: 1. To develop an objective method of obtaining an initial estimate of the cycle period (the present procedure of obtaining this estimate involves considerable subjective judgment); 2. To validate the algorithm's success in extracting cycles from multi-cyclical data; 3. To determine if a consistent relationship exists among the smallest amplitude, the error standard deviation, and the number of replications of a cycle contained in the data; 4. To investigate the behavior of the algorithm in the predictions of major drops.
Date: December 1982
Creator: Mehta, Mayur Ravishanker
System: The UNT Digital Library
Robustness of Parametric and Nonparametric Tests When Distances between Points Change on an Ordinal Measurement Scale (open access)

The purpose of this research was to evaluate the effect on parametric and nonparametric tests using ordinal data when the distances between points changed on the measurement scale. The research examined the performance of Type I and Type II error rates using selected parametric and nonparametric tests.
Date: August 1994
Creator: Chen, Andrew H. (Andrew Hwa-Fen)
System: The UNT Digital Library
The Comparative Effects of Varying Cell Sizes on McNemar's Test with the Χ^2 Test of Independence and T Test for Related Samples (open access)

This study compared the results for McNemar's test, the t test for related measures, and the chi-square test of independence as cell sizes varied in a two-by-two frequency table. In this study, the probability results for McNemar's test, the t test for related measures, and the chi-square test of independence were compared for 13,310 different combinations of cell sizes in a two-by-two design. Several conclusions were reached: With very few exceptions, the t test for related measures and McNemar's test yielded probability results within .002 of each other. The chi-square test seemed to equal the other two tests consistently only when low probabilities less than or equal to .001 were attained. It is recommended that the researcher consider using the t test for related measures as a viable alternative to McNemar's test except when the researcher is certain he/she is only interested in 'changes'. The chi-square test of independence not only tests a different hypothesis than McNemar's test, but it often yields greatly differing results from McNemar's test.
Date: August 1980
Creator: Black, Kenneth U.
System: The UNT Digital Library
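To make the comparison in the abstract above more concrete, here is a minimal sketch of two of the three statistics for one two-by-two table of paired binary outcomes: McNemar's chi-square statistic (without continuity correction) and the t statistic for related measures computed on the 0/1 difference scores. The cell labels `a`, `b`, `c`, `d`, the function names, and the example counts are assumptions for illustration only; the abstract does not state which variant of McNemar's test was used. The t test's p-value needs the Student t distribution, which is outside the Python standard library, so only the statistic is computed.

```python
from math import erfc, sqrt


def mcnemar(b, c):
    """McNemar's statistic and p-value (1 df, no continuity correction).

    b and c are the discordant cells: b = pairs scored (1, 0),
    c = pairs scored (0, 1). Requires b + c > 0.
    """
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square variate with 1 degree of freedom.
    return stat, erfc(sqrt(stat / 2.0))


def paired_t_statistic(a, b, c, d):
    """t statistic for related measures on the same 2x2 table.

    Each pair contributes a difference of +1 (cell b), -1 (cell c),
    or 0 (concordant cells a and d). Returns (t, degrees of freedom).
    """
    n = a + b + c + d
    mean = (b - c) / n
    # Sum of squared differences is b + c because each difference is +/-1 or 0.
    var = ((b + c) - n * mean ** 2) / (n - 1)
    return mean / sqrt(var / n), n - 1


if __name__ == "__main__":
    a, b, c, d = 10, 5, 15, 10  # hypothetical cell counts
    print("McNemar:", mcnemar(b, c))
    print("paired t:", paired_t_statistic(a, b, c, d))
```

Both statistics depend on the difference between the discordant cells, which is consistent with the abstract's finding that the two tests usually give similar probabilities, whereas the chi-square test of independence uses all four cells and addresses a different hypothesis.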
Application of Spectral Analysis to the Cycle Regression Algorithm (open access)

Many techniques have been developed to analyze time series. Spectral analysis and cycle regression analysis represent two such techniques. This study combines these two powerful tools to produce two new algorithms: the spectral algorithm and the one-pass algorithm. This research encompasses four objectives. The first objective is to link spectral analysis with cycle regression analysis to determine an initial estimate of the sinusoidal period. The second objective is to determine the best spectral window and truncation point combination to use with cycle regression for the initial estimate of the sinusoidal period. The third is to determine whether the new spectral algorithm performs better than the old T-value algorithm in estimating sinusoidal parameters. The fourth objective is to determine whether the one-pass algorithm can be used to estimate all significant harmonics simultaneously.
Date: August 1984
Creator: Shah, Vivek
System: The UNT Digital Library
Comparing the Powers of Several Proposed Tests for Testing the Equality of the Means of Two Populations When Some Data Are Missing (open access)

In comparing the means of two normally distributed populations with unknown variance, two tests very often used are the two independent sample t test and the paired sample t test. There is a possible gain in the power of the significance test by using the paired sample design instead of the two independent samples design.
Date: May 1994
Creator: Dunu, Emeka Samuel
System: The UNT Digital Library