Nonparametric Estimation of Receiver Operating Characteristic Surfaces Via Bernstein Polynomials (open access)

Receiver operating characteristic (ROC) analysis is one of the most widely used methods for evaluating the accuracy of a classification method. It is used in many areas of decision making, such as radiology, cardiology, and machine learning, as well as in many other areas of the medical sciences. The dissertation proposes a novel nonparametric method, based on Bernstein polynomials, for estimating the ROC surface in the three-class classification problem. The proposed ROC surface estimator is shown to be uniformly consistent for the true ROC surface. In addition, the map from which the proposed estimator is constructed is shown to be Hadamard differentiable. The proposed estimator is also shown to yield an explicit expression for the estimated volume under the ROC surface. Moreover, the exact mean squared error of the volume estimator is derived, and some related results for the mean integrated squared error are obtained. To assess the performance and accuracy of the proposed ROC and volume estimators, Monte Carlo simulations are conducted. Finally, the method is applied to the analysis of two real data sets.
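As a rough illustration of the two ingredients named in the abstract, the Python sketch below computes (a) the textbook Bernstein polynomial smoothing of an empirical distribution function for scores on [0, 1] and (b) the plain empirical volume under the ROC surface, taken as the fraction of triples ordered as class 1 < class 2 < class 3. This is not the dissertation's estimator, whose exact construction is not given in the abstract. The function names `bernstein_cdf` and `empirical_vus` and the degree parameter `m` are assumptions made for this example.

```python
import numpy as np
from math import comb


def bernstein_cdf(sample, m):
    """Return a Bernstein-polynomial-smoothed CDF of degree m.

    Assumes the scores lie in [0, 1]; m is a hypothetical tuning
    parameter chosen by the user.
    """
    sample = np.sort(np.asarray(sample, dtype=float))
    k = np.arange(m + 1)
    # Empirical CDF evaluated at the grid points k/m.
    ecdf_at_grid = np.searchsorted(sample, k / m, side="right") / len(sample)
    binom = np.array([comb(m, j) for j in k], dtype=float)

    def smoothed(x):
        x = np.asarray(x, dtype=float)[..., None]
        # Bernstein basis: C(m, k) * x^k * (1 - x)^(m - k)
        basis = binom * x**k * (1.0 - x) ** (m - k)
        return basis @ ecdf_at_grid

    return smoothed


def empirical_vus(x1, x2, x3):
    """Fraction of triples (a, b, c) with a < b < c, one score per class."""
    x1, x2, x3 = (np.asarray(a, dtype=float) for a in (x1, x2, x3))
    below = (x1[:, None] < x2[None, :]).sum(axis=0)  # class-1 scores below each class-2 score
    above = (x2[:, None] < x3[None, :]).sum(axis=1)  # class-3 scores above each class-2 score
    return float((below * above).sum()) / (len(x1) * len(x2) * len(x3))
```

With perfectly separated classes, `empirical_vus` returns 1.0. For a classifier whose scores carry no class information, the expected value is 1/6, since all six orderings of a triple are equally likely.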
Date: December 2012
Creator: Herath, Dushanthi N.
System: The UNT Digital Library
Semi-supervised and Self-evolving Learning Algorithms with Application to Anomaly Detection in Cloud Computing (open access)

Semi-supervised learning (SSL) is the most practical approach to classification among machine learning algorithms. It resembles the way humans learn and thus has important applications in text and image classification, bioinformatics, artificial intelligence, robotics, etc. Labeled data are hard to obtain in real-life experiments and may require human experts with experimental equipment to mark the labels, which can be slow and expensive. Unlabeled data, however, are readily available in the form of web pages, data logs, images, audio, video files, and DNA/RNA sequences. SSL uses a large amount of unlabeled data and a small amount of labeled data to build better classifying functions that achieve higher accuracy and require less human effort. It is therefore of great empirical and theoretical interest. We contribute two SSL algorithms, (i) adaptive anomaly detection (AAD) and (ii) hybrid anomaly detection (HAD), which are self-evolving and very efficient at detecting anomalies in large-scale, complex data distributions. Our algorithms are capable of modifying an existing classifier by both retiring old data and adding new data. This characteristic enables the proposed algorithms to handle massive and streaming datasets on which other existing algorithms fail and run out of memory. As an application to semi-supervised anomaly detection and for experimental illustration, we …
Date: December 2012
Creator: Pannu, Husanbir Singh
System: The UNT Digital Library