The Cluster Hypothesis: A Visual/Statistical Analysis

Access: Use of this item is restricted to the UNT Community
By allowing judgments based on a small number of exemplar documents to be applied to a larger number of unexamined documents, clustered presentation of search results represents an intuitively attractive possibility for reducing the cognitive resource demands on human users of information retrieval systems. However, clustered presentation of search results is sensible only to the extent that naturally occurring similarity relationships among documents correspond to topically coherent clusters. The Cluster Hypothesis posits just such a systematic relationship between document similarity and topical relevance. To date, experimental validation of the Cluster Hypothesis has proved problematic, with collection-specific results both supporting and failing to support this fundamental theoretical postulate. The present study consists of two computational information visualization experiments, representing a two-tiered test of the Cluster Hypothesis under adverse conditions. Both experiments rely on multidimensionally scaled representations of interdocument similarity matrices. Experiment 1 is a term-reduction condition, in which descriptive titles are extracted from Associated Press news stories drawn from the TREC information retrieval test collection. The clustering behavior of these titles is compared to the behavior of the corresponding full text via statistical analysis of the visual characteristics of a two-dimensional similarity map. Experiment 2 is a dimensionality reduction condition, in …
Date: May 2000
Creator: Sullivan, Terry
System: The UNT Digital Library

A Study of Graphically Chosen Features for Representation of TREC Topic-Document Sets

Access: Use of this item is restricted to the UNT Community
Document representation is important for computer-based text processing. Good document representations must include at least the most salient concepts of the document. Documents exist in a multidimensional space that difficult the identification of what concepts to include. A current problem is to measure the effectiveness of the different strategies that have been proposed to accomplish this task. As a contribution towards this goal, this dissertation studied the visual inter-document relationship in a dimensionally reduced space. The same treatment was done on full text and on three document representations. Two of the representations were based on the assumption that the salient features in a document set follow the chi-distribution in the whole document set. The third document representation identified features through a novel method. A Coefficient of Variability was calculated by normalizing the Cartesian distance of the discriminating value in the relevant and the non-relevant document subsets. Also, the local dictionary method was used. Cosine similarity values measured the inter-document distance in the information space and formed a matrix to serve as input to the Multi-Dimensional Scale (MDS) procedure. A Precision-Recall procedure was averaged across all treatments to statistically compare them. Treatments were not found to be statistically the same and …
Date: May 2000
Creator: Oyarce, Guillermo Alfredo
System: The UNT Digital Library

The Validity of Health Claims on the World Wide Web: A Case Study of the Herbal Remedy Opuntia

Access: Use of this item is restricted to the UNT Community
The World Wide Web has become a significant source of medical information for the public, but there is concern that much of the information is inaccurate, misleading, and unsupported by scientific evidence. This study analyzes the validity of health claims on the World Wide Web for the herbal Opuntia using an evidence-based approach, and supports the observation that individuals must critically assess health information in this relatively new medium of communication. A systematic search by means of nine search engines and online resources of Web sites relating to herbal remedies was conducted and specific sites providing information on the cactus herbal remedy from the genus Opuntia were retrieved. Validity of therapeutic health claims on the Web sites was checked by comparison with reports in the scientific literature subjected to two established quality assessment rating instruments. 184 Web sites from a variety of sources were retrieved and evaluated, and 98 distinct health claims were identified. 53 scientific reports were retrieved to validate claims. 25 involved human subjects, and 28 involved animal or laboratory models. Only 33 (34%) of the claims were addressed in the scientific literature. For 3% of the claims, evidence from the scientific reports was conflicting or contradictory. Of …
Date: May 2000
Creator: Veronin, Michael A.
System: The UNT Digital Library