Outliers and Regression Models (open access)

Outliers and Regression Models

The mitigation of outliers serves to increase the strength of a relationship between variables. This study defined outliers in three different ways and used five regression procedures to describe the effects of outliers on 50 data sets. This study also examined the relationship among the shape of the distribution, skewness, and outliers.
Date: May 1992
Creator: Mitchell, Napoleon
System: The UNT Digital Library
Determination of the Optimal Number of Strata for Bias Reduction in Propensity Score Matching. (open access)

Determination of the Optimal Number of Strata for Bias Reduction in Propensity Score Matching.

Previous research implementing stratification on the propensity score has generally relied on using five strata, based on prior theoretical groundwork and minimal empirical evidence as to the suitability of quintiles to adequately reduce bias in all cases and across all sample sizes. This study investigates bias reduction across varying number of strata and sample sizes via a large-scale simulation to determine the adequacy of quintiles for bias reduction under all conditions. Sample sizes ranged from 100 to 50,000 and strata from 3 to 20. Both the percentage of bias reduction and the standardized selection bias were examined. The results show that while the particular covariates in the simulation met certain criteria with five strata that greater bias reduction could be achieved by increasing the number of strata, especially with larger sample sizes. Simulation code written in R is included.
Date: May 2010
Creator: Akers, Allen
System: The UNT Digital Library
The Effect of Psychometric Parallelism among Predictors on the Efficiency of Equal Weights and Least Squares Weights in Multiple Regression (open access)

The Effect of Psychometric Parallelism among Predictors on the Efficiency of Equal Weights and Least Squares Weights in Multiple Regression

There are several conditions for applying equal weights as an alternative to least squares weights. Psychometric parallelism, one of the conditions, has been suggested as a necessary and sufficient condition for equal-weights aggregation. The purpose of this study is to investigate the effect of psychometric parallelism among predictors on the efficiency of equal weights and least squares weights. Target correlation matrices with 10,000 cases were simulated so that the matrices had varying degrees of psychometric parallelism. Five hundred samples with six ratios of observation to predictor = 5/1, 10/1, 20/1, 30/1, 40/1, and 50/1 were drawn from each population. The efficiency is interpreted as the accuracy and the predictive power estimated by the weighting methods. The accuracy is defined by the deviation between the population R² and the sample R² . The predictive power is referred to as the population cross-validated R² and the population mean square error of prediction. The findings indicate there is no statistically significant relationship between the level of psychometric parallelism and the accuracy of least squares weights. In contrast, the correlation between the level of psychometric parallelism and the accuracy of equal weights is significantly negative. Under different conditions, the minimum p value of χ² …
Date: May 1996
Creator: Zhang, Desheng
System: The UNT Digital Library
Convergent Validity of Variables Residualized By a Single Covariate: the Role of Correlated Error in Populations and Samples (open access)

Convergent Validity of Variables Residualized By a Single Covariate: the Role of Correlated Error in Populations and Samples

This study examined the bias and precision of four residualized variable validity estimates (C0, C1, C2, C3) across a number of study conditions. Validity estimates that considered measurement error, correlations among error scores, and correlations between error scores and true scores (C3) performed the best, yielding no estimates that were practically significantly different than their respective population parameters, across study conditions. Validity estimates that considered measurement error and correlations among error scores (C2) did a good job in yielding unbiased, valid, and precise results. Only in a select number of study conditions were C2 estimates unable to be computed or produced results that had sufficient variance to affect interpretation of results. Validity estimates based on observed scores (C0) fared well in producing valid, precise, and unbiased results. Validity estimates based on observed scores that were only corrected for measurement error (C1) performed the worst. Not only did they not reliably produce estimates even when the level of modeled correlated error was low, C1 produced values higher than the theoretical limit of 1.0 across a number of study conditions. Estimates based on C1 also produced the greatest number of conditions that were practically significantly different than their population parameters.
Date: May 2013
Creator: Nimon, Kim
System: The UNT Digital Library
Ability Estimation Under Different Item Parameterization and Scoring Models (open access)

Ability Estimation Under Different Item Parameterization and Scoring Models

A Monte Carlo simulation study investigated the effect of scoring format, item parameterization, threshold configuration, and prior ability distribution on the accuracy of ability estimation given various IRT models. Item response data on 30 items from 1,000 examinees was simulated using known item parameters and ability estimates. The item response data sets were submitted to seven dichotomous or polytomous IRT models with different item parameterization to estimate examinee ability. The accuracy of the ability estimation for a given IRT model was assessed by the recovery rate and the root mean square errors. The results indicated that polytomous models produced more accurate ability estimates than the dichotomous models, under all combinations of research conditions, as indicated by higher recovery rates and lower root mean square errors. For the item parameterization models, the one-parameter model out-performed the two-parameter and three-parameter models under all research conditions. Among the polytomous models, the partial credit model had more accurate ability estimation than the other three polytomous models. The nominal categories model performed better than the general partial credit model and the multiple-choice model with the multiple-choice model the least accurate. The results further indicated that certain prior ability distributions had an effect on the accuracy …
Date: May 2002
Creator: Si, Ching-Fung B.
System: The UNT Digital Library
Establishing the utility of a classroom effectiveness index as a teacher accountability system. (open access)

Establishing the utility of a classroom effectiveness index as a teacher accountability system.

How to identify effective teachers who improve student achievement despite diverse student populations and school contexts is an ongoing discussion in public education. The need to show communities and parents how well teachers and schools improve student learning has led districts and states to seek a fair, equitable and valid measure of student growth using student achievement. This study investigated a two stage hierarchical model for estimating teacher effect on student achievement. This measure was entitled a Classroom Effectiveness Index (CEI). Consistency of this model over time, outlier influences in individual CEIs, variance among CEIs across four years, and correlations of second stage student residuals with first stage student residuals were analyzed. The statistical analysis used four years of student residual data from a state-mandated mathematics assessment (n=7086) and a state-mandated reading assessment (n=7572) aggregated by teacher. The study identified the following results. Four years of district grand slopes and grand intercepts were analyzed to show consistent results over time. Repeated measures analyses of grand slopes and intercepts in mathematics were statistically significant at the .01 level. Repeated measures analyses of grand slopes and intercepts in reading were not statistically significant. The analyses indicated consistent results over time for reading …
Date: May 2002
Creator: Bembry, Karen L.
System: The UNT Digital Library
The Supply and Demand of Physician Assistants in the United States: A Trend Analysis (open access)

The Supply and Demand of Physician Assistants in the United States: A Trend Analysis

The supply of non-physician clinicians (NPCs), such as physician assistant (PAs), could significantly influence demand requirements in medical workforce projections. This study predicts supply of and demand for PAs from 2006 to 2020. The PA supply model utilized the number of certified PAs, the educational capacity (at 10% and 25% expansion) with assumed attrition rates, and retirement assumptions. Gross domestic product (GDP) chained in 2000 dollar and US population were utilized in a transfer function trend analyses with the number of PAs as the dependent variable for the PA demand model. Historical analyses revealed strong correlations between GDP and US population with the number of PAs. The number of currently certified PAs represents approximately 75% of the projected demand. At 10% growth, the supply and demand equilibrium for PAs will be reached in 2012. A 25% increase in new entrants causes equilibrium to be met one year earlier. Robust application trends in PA education enrollment (2.2 applicants per seat for PAs is the same as for allopathic medical school applicants) support predicted increases. However, other implications for the PA educational institutions include recruitment and retention of qualified faculty, clinical site maintenance and diversity of matriculates. Further research on factors affecting …
Date: May 2007
Creator: Orcutt, Venetia L.
System: The UNT Digital Library
The Characteristics and Properties of the Threshold and Squared-Error Criterion-Referenced Agreement Indices (open access)

The Characteristics and Properties of the Threshold and Squared-Error Criterion-Referenced Agreement Indices

Educators who use criterion-referenced measurement to ascertain the current level of performance of an examinee in order that the examinee may be classified as either a master or a nonmaster need to know the accuracy and consistency of their decisions regarding assignment of mastery states. This study examined the sampling distribution characteristics of two reliability indices that use the squared-error agreement function: Livingston's k^2(X,Tx) and Brennan and Kane's M(C). The sampling distribution characteristics of five indices that use the threshold agreement function were also examined: Subkoviak's Pc. Huynh's p and k. and Swaminathan's p and k. These seven methods of calculating reliability were also compared under varying conditions of sample size, test length, and criterion or cutoff score. Computer-generated data provided randomly parallel test forms for N = 2000 cases. From this, 1000 samples were drawn, with replacement, and each of the seven reliability indices was calculated. Descriptive statistics were collected for each sample set and examined for distribution characteristics. In addition, the mean value for each index was compared to the population parameter value of consistent mastery/nonmastery classifications. The results indicated that the sampling distribution characteristics of all seven reliability indices approach normal characteristics with increased sample size. The …
Date: May 1988
Creator: Dutschke, Cynthia F. (Cynthia Fleming)
System: The UNT Digital Library
A Comparison of Three Methods of Detecting Test Item Bias (open access)

A Comparison of Three Methods of Detecting Test Item Bias

This study compared three methods of detecting test item bias, the chi-square approach, the transformed item difficulties approach, and the Linn-Harnish three-parameter item response approach which is the only Item Response Theory (IRT) method that can be utilized with minority samples relatively small in size. The items on two tests which measured writing and reading skills were examined for evidence of sex and ethnic bias. Eight sets of samples, four from each test, were randomly selected from the population (N=7287) of sixth, seventh, and eighth grade students enrolled in a large, urban school district in the southwestern United States. Each set of samples, male/female, White/Hispanic, White/Black, and White/White, contained 800 examinees in the majority group and 200 in the minority group. In an attempt to control differences in ability that may have existed between the various population groups, examinees with scores greater or less than two standard deviations from their group's mean were eliminated. Ethnic samples contained equal numbers of each sex. The White/White sets of samples were utilized to provide baseline bias estimates because the tests could not logically be biased against these groups. Bias indices were then calculated for each set of samples with each of the three …
Date: May 1985
Creator: Monaco, Linda Gokey
System: The UNT Digital Library
A State-Wide Survey on the Utilization of Instructional Technology by Public School Districts in Texas (open access)

A State-Wide Survey on the Utilization of Instructional Technology by Public School Districts in Texas

Effective utilization of instructional technology can provide a valuable method for the delivery of a school program, and enable a teacher to individualize according to student needs. Implementation of such a program is costly and requires careful planning and adequate staff development for school personnel. This study examined the degree of commitment by Texas school districts to the use of the latest technologies in their efforts to revolutionize education. Quantitative data were collected by using a survey that included five informational areas: (1) school district background, (2) funding for budget, (3) staff, (4) technology hardware, and (5) staff development. The study included 137 school districts representing the 5 University Interscholastic League (UIL) classifications (A through AAAAA). The survey was mailed to the school superintendents requesting that the persons most familiar with instructional technology be responsible for completing the questionnaires. Analysis of data examined the relationship between UIL classification and the amount of money expended on instructional technology. Correlation coefficients were determined between teachers receiving training in the use of technology and total personnel assigned to technology positions. Coefficients were calculated between a district providing a plan fortechnology and employment of a coordinator for instructional technology. Significance was established at the …
Date: May 1990
Creator: Hiett, Elmer D. (Elmer Donald)
System: The UNT Digital Library
A comparison of the Effects of Different Sizes of Ceiling Rules on the Estimates of Reliability of a Mathematics Achievement Test (open access)

A comparison of the Effects of Different Sizes of Ceiling Rules on the Estimates of Reliability of a Mathematics Achievement Test

This study compared the estimates of reliability made using one, two, three, four, five, and unlimited consecutive failures as ceiling rules in scoring a mathematics achievement test which is part of the Iowa Tests of Basic Skill (ITBS), Form 8. There were 700 students randomly selected from a population (N=2640) of students enrolled in the eight grades in a large urban school district in the southwestern United States. These 700 students were randomly divided into seven subgroups so that each subgroup had 100 students. The responses of all those students to three subtests of the mathematics achievement battery, which included mathematical concepts (44 items), problem solving (32 items), and computation (45 items), were analyzed to obtain the item difficulties and a total score for each student. The items in each subtest then were rearranged based on the item difficulties from the highest to the lowest value. In each subgroup, the method using one, two, three, four, five, and unlimited consecutive failures as the ceiling rules were applied to score the individual responses. The total score for each individual was the sum of the correct responses prior to the point described by the ceiling rule. The correct responses after the ceiling …
Date: May 1987
Creator: Somboon Suriyawongse
System: The UNT Digital Library
A Comparison of Some Continuity Corrections for the Chi-Squared Test in 3 x 3, 3 x 4, and 3 x 5 Tables (open access)

A Comparison of Some Continuity Corrections for the Chi-Squared Test in 3 x 3, 3 x 4, and 3 x 5 Tables

This study was designed to determine whether chis-quared based tests for independence give reliable estimates (as compared to the exact values provided by Fisher's exact probabilities test) of the probability of a relationship between the variables in 3 X 3, 3 X 4 , and 3 X 5 contingency tables when the sample size is 10, 20, or 30. In addition to the classical (uncorrected) chi-squared test, four methods for continuity correction were compared to Fisher's exact probabilities test. The four methods were Yates' correction, two corrections attributed to Cochran, and Mantel's correction. The study was modeled after a similar comparison conducted on 2 X 2 contingency tables and published by Michael Haber.
Date: May 1987
Creator: Mullen, Jerry D. (Jerry Davis)
System: The UNT Digital Library
Willingness of Educators to Participate in a Descriptive Research Study as a Function of a Monetary Incentive (open access)

Willingness of Educators to Participate in a Descriptive Research Study as a Function of a Monetary Incentive

The problem considered involved assessing willingness of educators to participate in a study offering monetary incentives. Determination of willingness was implemented by sending educators a packet requesting return of a postcard to indicate willingness to participate. The purpose was twofold: to determine the effect of a monetary incentive upon willingness of educators to participate in a research study, and to analyze implications for mail questionnaire studies. A sample of 600 educators was chosen from directories of eleven public schools in north Texas. It included equal numbers of male and female teachers and male and female administrators. Subjects were assigned to one of twelve groups. No two from a school were assigned to different levels of the inducement variable.
Date: May 1984
Creator: Pittman, Doyle
System: The UNT Digital Library
Effect of Rater Training and Scale Type on Leniency and Halo Error in Student Ratings of Faculty (open access)

Effect of Rater Training and Scale Type on Leniency and Halo Error in Student Ratings of Faculty

The purpose of this study was to determine if leniency and halo error in student ratings could be reduced by training the student raters and by using a Behaviorally Anchored Rating Scale (BARS) rather than a Likert scale. Two hypotheses were proposed. First, the ratings collected from the trained raters would contain less halo and leniency error than those collected from the untrained raters. Second, within the group of trained raters the BARS would contain less halo and leniency error than the Likert instrument.
Date: May 1987
Creator: Cook, Stuart S. (Stuart Sheldon)
System: The UNT Digital Library
A Comparison of Three Correlational Procedures for Factor-Analyzing Dichotomously-Scored Item Response Data (open access)

A Comparison of Three Correlational Procedures for Factor-Analyzing Dichotomously-Scored Item Response Data

In this study, an improved correlational procedure for factor-analyzing dichotomously-scored item response data is described and tested. The procedure involves (a) replacing the dichotomous input values with continuous probability values obtained through Rasch analysis; (b) calculating interitem product-moment correlations among the probabilities; and (c) subjecting the correlations to unweighted least-squares factor analysis. Two simulated data sets and an empirical data set (Kentucky Comprehensive Listening Test responses) were used to compare the new procedure with two more traditional techniques, using (a) phi and (b) tetrachoric correlations calculated directly from the dichotomous item-response values. The three methods were compared on three criterion measures: (a) maximum internal correlation; (b) product of the two largest factor loadings; and (c) proportion of variance accounted for. The Rasch-based procedure is recommended for subjecting dichotomous item response data to latent-variable analysis.
Date: May 1991
Creator: Fluke, Ricky
System: The UNT Digital Library
An Empirical Investigation of Tukey's Honestly Significant Difference Test with Variance Heterogeneity and Equal Sample Sizes, Utilizing Box's Coefficient of Variance Variation (open access)

An Empirical Investigation of Tukey's Honestly Significant Difference Test with Variance Heterogeneity and Equal Sample Sizes, Utilizing Box's Coefficient of Variance Variation

This study sought to determine boundary conditions for robustness of the Tukey HSD statistic when the assumptions of homogeneity of variance were violated. Box's coefficient of variance variation, C^2 , was utilized to index the degree of variance heterogeneity. A Monte Carlo computer simulation technique was employed to generate data under controlled violation of the homogeneity of variance assumption. For each sample size and number of treatment groups condition, an analysis of variance F-test was computed, and Tukey's multiple comparison technique was calculated. When the two additional sample size cases were added to investigate the large sample sizes, the Tukey test was found to be conservative when C^2 was set at zero. The actual significance level fell below the lower limit of the 95 per cent confidence interval around the 0.05 nominal significance level.
Date: May 1980
Creator: Strozeski, Michael W.
System: The UNT Digital Library
An Empirical Investigation of Tukey's Honestly Significant Difference Test with Variance Heterogeneity and Unequal Sample Sizes, Utilizing Kramer's Procedure and the Harmonic Mean (open access)

An Empirical Investigation of Tukey's Honestly Significant Difference Test with Variance Heterogeneity and Unequal Sample Sizes, Utilizing Kramer's Procedure and the Harmonic Mean

This study sought to determine the effect upon Tukey's Honestly Significant Difference (HSD) statistic of concurrently violating the assumptions of homogeneity of variance and equal sample sizes. Two forms for the unequal sample size problem were investigated. Kramer's form and the harmonic mean approach were the two unequal sample size procedures studied. The study employed a Monte Carlo simulation procedure which varied sample sizes with a heterogeneity of variance condition. Four thousand experiments were generated. Findings of this study were based upon the empirically obtained significance levels. Five conclusions were reached in this study. The first conclusion was that for the conditions of this study the Kramer form of the HSD statistic is not robust at the .05 or .01 nominal level of significance. A second conclusion was that the harmonic mean form of the HSD statistic is not robust at the .05 and .01 nominal level of significance. A general conclusion reached from all the findings formed the third conclusion. It was that the Kramer form of the HSD test is the preferred procedure under combined assumption violations of variance heterogeneity and unequal sample sizes. Two additional conclusions are based on related findings. The fourth conclusion was that for …
Date: May 1976
Creator: McKinney, William Lane
System: The UNT Digital Library
A Comparison of Two Criterion-Referenced Item-Selection Techniques Utilizing Simulated Data with Item Pools that Vary in Degrees of Item Difficulty (open access)

A Comparison of Two Criterion-Referenced Item-Selection Techniques Utilizing Simulated Data with Item Pools that Vary in Degrees of Item Difficulty

The problem of this study was to examine the equivalency of two different types of criterion-referenced item-selection techniques on simulated data as item pools varied in degrees of item difficulty. A pretest-posttest design was employed in which pass-fail scores were randomly generated for item pools of twenty-five items. From the item pools, the two techniques determined which items were to be used to make up twelve-item criterion-referenced tests. The twenty-five items also were rank ordered according to the discrimination power of the two techniques.
Date: May 1974
Creator: Davis, Robbie G.
System: The UNT Digital Library
Comparisons of Improvement-Over-Chance Effect Sizes for Two Groups Under Variance Heterogeneity and Prior Probabilities (open access)

Comparisons of Improvement-Over-Chance Effect Sizes for Two Groups Under Variance Heterogeneity and Prior Probabilities

The distributional properties of improvement-over-chance, I, effect sizes derived from linear and quadratic predictive discriminant analysis (PDA) and from logistic regression analysis (LRA) for the two-group univariate classification were examined. Data were generated under varying levels of four data conditions: population separation, variance pattern, sample size, and prior probabilities. None of the indices provided acceptable estimates of effect for all the conditions examined. There were only a small number of conditions under which both accuracy and precision were acceptable. The results indicate that the decision of which method to choose is primarily determined by variance pattern and prior probabilities. Under variance homogeneity, any of the methods may be recommended. However, LRA is recommended when priors are equal or extreme and linear PDA is recommended when priors are moderate. Under variance heterogeneity, selecting a recommended method is more complex. In many cases, more than one method could be used appropriately.
Date: May 2003
Creator: Alexander, Erika D.
System: The UNT Digital Library
Using Posterior Predictive Checking of Item Response Theory Models to Study Invariance Violations (open access)

Using Posterior Predictive Checking of Item Response Theory Models to Study Invariance Violations

The common practice for testing measurement invariance is to constrain parameters to be equal over groups, and then evaluate the model-data fit to reject or fail to reject the restrictive model. Posterior predictive checking (PPC) provides an alternative approach to evaluating model-data discrepancy. This paper explores the utility of PPC in estimating measurement invariance. The simulation results show that the posterior predictive p (PP p) values of item parameter estimates respond to various invariance violations, whereas the PP p values of item-fit index may fail to detect such violations. The current paper suggests comparing group estimates and restrictive model estimates with posterior predictive distributions in order to demonstrate the pattern of misfit graphically.
Date: May 2017
Creator: Xin, Xin
System: The UNT Digital Library
Structural Validity and Item Functioning of the LoTi Digital-Age Survey. (open access)

Structural Validity and Item Functioning of the LoTi Digital-Age Survey.

The present study examined the structural construct validity of the LoTi Digital-Age Survey, a measure of teacher instructional practices with technology in the classroom. Teacher responses (N = 2840) from across the United States were used to assess factor structure of the instrument using both exploratory and confirmatory analyses. Parallel analysis suggests retaining a five-factor solution compared to the MAP test that suggests retaining a three-factor solution. Both analyses (EFA and CFA) indicate that changes need to be made to the current factor structure of the survey. The last two factors were composed of items that did not cover or accurately measure the content of the latent trait. Problematic items, such as items with crossloadings, were discussed. Suggestions were provided to improve the factor structure, items, and scale of the survey.
Date: May 2011
Creator: Mehta, Vandhana
System: The UNT Digital Library
Spatial Ability, Motivation, and Attitude of Students as Related to Science Achievement (open access)

Spatial Ability, Motivation, and Attitude of Students as Related to Science Achievement

Understanding student achievement in science is important as there is an increasing reliance of the U.S. economy on math, science, and technology-related fields despite the declining number of youth seeking college degrees and careers in math and science. A series of structural equation models were tested using the scores from a statewide science exam for 276 students from a suburban north Texas public school district at the end of their 5th grade year and the latent variables of spatial ability, motivation to learn science and science-related attitude. Spatial ability was tested as a mediating variable on motivation and attitude; however, while spatial ability had statistically significant regression coefficients with motivation and attitude, spatial ability was found to be the sole statistically significant predictor of science achievement for these students explaining 23.1% of the variance in science scores.
Date: May 2011
Creator: Bolen, Judy Ann
System: The UNT Digital Library
Construct Validation and Measurement Invariance of the Athletic Coping Skills Inventory for Educational Settings (open access)

Construct Validation and Measurement Invariance of the Athletic Coping Skills Inventory for Educational Settings

The present study examined the factor structure and measurement invariance of the revised version of the Athletic Coping Skills Inventory (ACSI-28), following adjustment of the wording of items such that they were appropriate to assess Coping Skills in an educational setting. A sample of middle school students (n = 1,037) completed the revised inventory. An initial confirmatory factor analysis led to the hypothesis of a better fitting model with two items removed. Reliability of the subscales and the instrument as a whole was acceptable. Items were examined for sex invariance with differential item functioning (DIF) using item response theory, and five items were flagged for significant sex non-invariance. Following removal of these items, comparison of the mean differences between male and female coping scores revealed that there was no significant difference between the two groups. Further examination of the generalizability of the coping construct and the potential transfer of psychosocial skills between athletic and academic settings are warranted.
Date: May 2017
Creator: Sanguras, Laila Y., 1977-
System: The UNT Digital Library