A Comparison of Three Methods of Detecting Test Item Bias (open access)

This study compared three methods of detecting test item bias: the chi-square approach, the transformed item difficulties approach, and the Linn-Harnish three-parameter item response approach, which is the only Item Response Theory (IRT) method that can be used with relatively small minority samples. The items on two tests which measured writing and reading skills were examined for evidence of sex and ethnic bias. Eight sets of samples, four from each test, were randomly selected from the population (N=7287) of sixth, seventh, and eighth grade students enrolled in a large, urban school district in the southwestern United States. Each set of samples, male/female, White/Hispanic, White/Black, and White/White, contained 800 examinees in the majority group and 200 in the minority group. In an attempt to control differences in ability that may have existed between the various population groups, examinees whose scores fell more than two standard deviations above or below their group's mean were eliminated. Ethnic samples contained equal numbers of each sex. The White/White sets of samples were utilized to provide baseline bias estimates because the tests could not logically be biased against these groups. Bias indices were then calculated for each set of samples with each of the three …
Date: May 1985
Creator: Monaco, Linda Gokey
System: The UNT Digital Library
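
The trimming step mentioned in the abstract above (dropping examinees more than two standard deviations from their group's mean) can be illustrated with a minimal sketch. The function name `trim_outliers` and the choice of the sample standard deviation are assumptions made for illustration; the abstract does not say which estimator the study used.

```python
import statistics


def trim_outliers(scores, k=2.0):
    """Keep only scores within k standard deviations of the group mean.

    Assumption: the sample standard deviation (n - 1 denominator) is used;
    the source abstract does not specify the estimator.
    """
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    lower, upper = mean - k * sd, mean + k * sd
    return [s for s in scores if lower <= s <= upper]


# Hypothetical scores for one group; the value 40 lies far above the rest.
group_scores = [10, 12, 11, 13, 12, 11, 10, 12, 11, 40]
trimmed = trim_outliers(group_scores)
```

The trimming would be applied separately to each group (for example, once for the majority sample and once for the minority sample), since the cutoff is defined relative to each group's own mean.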
A State-Wide Survey on the Utilization of Instructional Technology by Public School Districts in Texas (open access)

Effective utilization of instructional technology can provide a valuable method for the delivery of a school program, and enable a teacher to individualize according to student needs. Implementation of such a program is costly and requires careful planning and adequate staff development for school personnel. This study examined the degree of commitment by Texas school districts to the use of the latest technologies in their efforts to revolutionize education. Quantitative data were collected by using a survey that included five informational areas: (1) school district background, (2) funding for budget, (3) staff, (4) technology hardware, and (5) staff development. The study included 137 school districts representing the 5 University Interscholastic League (UIL) classifications (A through AAAAA). The survey was mailed to the school superintendents requesting that the persons most familiar with instructional technology be responsible for completing the questionnaires. Analysis of data examined the relationship between UIL classification and the amount of money expended on instructional technology. Correlation coefficients were determined between teachers receiving training in the use of technology and total personnel assigned to technology positions. Coefficients were calculated between a district providing a plan for technology and employment of a coordinator for instructional technology. Significance was established at the …
Date: May 1990
Creator: Hiett, Elmer D. (Elmer Donald)
System: The UNT Digital Library
A comparison of the Effects of Different Sizes of Ceiling Rules on the Estimates of Reliability of a Mathematics Achievement Test (open access)

This study compared the estimates of reliability made using one, two, three, four, five, and unlimited consecutive failures as ceiling rules in scoring a mathematics achievement test which is part of the Iowa Tests of Basic Skills (ITBS), Form 8. There were 700 students randomly selected from a population (N=2640) of students enrolled in the eight grades in a large urban school district in the southwestern United States. These 700 students were randomly divided into seven subgroups so that each subgroup had 100 students. The responses of all those students to three subtests of the mathematics achievement battery, which included mathematical concepts (44 items), problem solving (32 items), and computation (45 items), were analyzed to obtain the item difficulties and a total score for each student. The items in each subtest then were rearranged based on the item difficulties from the highest to the lowest value. In each subgroup, the methods using one, two, three, four, five, and unlimited consecutive failures as the ceiling rules were applied to score the individual responses. The total score for each individual was the sum of the correct responses prior to the point described by the ceiling rule. The correct responses after the ceiling …
Date: May 1987
Creator: Somboon Suriyawongse
System: The UNT Digital Library
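
The ceiling-rule scoring described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: responses are coded 1 (correct) or 0 (failure) and are already in the rearranged item order, and the function name `ceiling_score` is hypothetical rather than taken from the study.

```python
def ceiling_score(responses, ceiling=None):
    """Sum correct responses up to the point where the ceiling rule is met.

    responses: sequence of 1 (correct) / 0 (failure), in item order.
    ceiling:   number of consecutive failures that ends scoring;
               None means unlimited (every item is scored).
    """
    score = 0
    consecutive_failures = 0
    for response in responses:
        if response:
            score += 1
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if ceiling is not None and consecutive_failures >= ceiling:
                break  # correct answers after this point are not counted
    return score


# Hypothetical response pattern for one examinee.
pattern = [1, 1, 0, 1, 0, 0, 1, 1]
scores_by_rule = {rule: ceiling_score(pattern, rule) for rule in (1, 2, 3, None)}
```

For this pattern the stricter rules discard later correct answers, which is the mechanism by which different ceiling sizes can lead to different total scores and therefore different reliability estimates.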
A Comparison of Some Continuity Corrections for the Chi-Squared Test in 3 x 3, 3 x 4, and 3 x 5 Tables (open access)

This study was designed to determine whether chi-squared-based tests for independence give reliable estimates (as compared to the exact values provided by Fisher's exact probabilities test) of the probability of a relationship between the variables in 3 X 3, 3 X 4, and 3 X 5 contingency tables when the sample size is 10, 20, or 30. In addition to the classical (uncorrected) chi-squared test, four methods for continuity correction were compared to Fisher's exact probabilities test. The four methods were Yates' correction, two corrections attributed to Cochran, and Mantel's correction. The study was modeled after a similar comparison conducted on 2 X 2 contingency tables and published by Michael Haber.
Date: May 1987
Creator: Mullen, Jerry D. (Jerry Davis)
System: The UNT Digital Library
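
To make the comparison in the abstract above concrete, here is a minimal sketch of the classical chi-squared statistic next to a Yates-style continuity correction for an r x c table. It assumes the correction subtracts 0.5 from each absolute deviation (clipped at zero), which is one common generalization; the abstract does not state the exact form the study used for tables larger than 2 X 2, and the Cochran and Mantel corrections are not shown. The function name and the example table are hypothetical.

```python
def chi_squared(table, yates=False):
    """Pearson chi-squared statistic for a contingency table (list of rows).

    With yates=True, each |observed - expected| is reduced by 0.5 before
    squaring, never going below zero (an assumed generalization of
    Yates' correction to r x c tables).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            deviation = abs(observed - expected)
            if yates:
                deviation = max(deviation - 0.5, 0.0)
            statistic += deviation ** 2 / expected
    return statistic


# Hypothetical 3 x 3 table with a total sample size of 30.
example = [[5, 3, 2],
           [3, 4, 3],
           [2, 3, 5]]
uncorrected = chi_squared(example)
corrected = chi_squared(example, yates=True)
```

With small samples such as these, the corrected statistic is noticeably smaller than the uncorrected one, which is why the choice of correction matters when the result is compared against an exact test.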
Effect of Rater Training and Scale Type on Leniency and Halo Error in Student Ratings of Faculty (open access)

The purpose of this study was to determine if leniency and halo error in student ratings could be reduced by training the student raters and by using a Behaviorally Anchored Rating Scale (BARS) rather than a Likert scale. Two hypotheses were proposed. First, the ratings collected from the trained raters would contain less halo and leniency error than those collected from the untrained raters. Second, within the group of trained raters the BARS would contain less halo and leniency error than the Likert instrument.
Date: May 1987
Creator: Cook, Stuart S. (Stuart Sheldon)
System: The UNT Digital Library
An Empirical Investigation of Tukey's Honestly Significant Difference Test with Variance Heterogeneity and Equal Sample Sizes, Utilizing Box's Coefficient of Variance Variation (open access)

This study sought to determine boundary conditions for robustness of the Tukey HSD statistic when the assumption of homogeneity of variance was violated. Box's coefficient of variance variation, C^2, was utilized to index the degree of variance heterogeneity. A Monte Carlo computer simulation technique was employed to generate data under controlled violation of the homogeneity of variance assumption. For each sample size and number of treatment groups condition, an analysis of variance F-test was computed, and Tukey's multiple comparison technique was calculated. When the two additional sample size cases were added to investigate the large sample sizes, the Tukey test was found to be conservative when C^2 was set at zero. The actual significance level fell below the lower limit of the 95 per cent confidence interval around the 0.05 nominal significance level.
Date: May 1980
Creator: Strozeski, Michael W.
System: The UNT Digital Library
An Empirical Investigation of Tukey's Honestly Significant Difference Test with Variance Heterogeneity and Unequal Sample Sizes, Utilizing Kramer's Procedure and the Harmonic Mean (open access)

This study sought to determine the effect upon Tukey's Honestly Significant Difference (HSD) statistic of concurrently violating the assumptions of homogeneity of variance and equal sample sizes. Two forms for the unequal sample size problem were investigated. Kramer's form and the harmonic mean approach were the two unequal sample size procedures studied. The study employed a Monte Carlo simulation procedure which varied sample sizes with a heterogeneity of variance condition. Four thousand experiments were generated. Findings of this study were based upon the empirically obtained significance levels. Five conclusions were reached in this study. The first conclusion was that for the conditions of this study the Kramer form of the HSD statistic is not robust at the .05 or .01 nominal level of significance. A second conclusion was that the harmonic mean form of the HSD statistic is not robust at the .05 and .01 nominal level of significance. A general conclusion reached from all the findings formed the third conclusion. It was that the Kramer form of the HSD test is the preferred procedure under combined assumption violations of variance heterogeneity and unequal sample sizes. Two additional conclusions are based on related findings. The fourth conclusion was that for …
Date: May 1976
Creator: McKinney, William Lane
System: The UNT Digital Library
A Comparison of Two Criterion-Referenced Item-Selection Techniques Utilizing Simulated Data with Item Pools that Vary in Degrees of Item Difficulty (open access)

The problem of this study was to examine the equivalency of two different types of criterion-referenced item-selection techniques on simulated data as item pools varied in degrees of item difficulty. A pretest-posttest design was employed in which pass-fail scores were randomly generated for item pools of twenty-five items. From the item pools, the two techniques determined which items were to be used to make up twelve-item criterion-referenced tests. The twenty-five items also were rank ordered according to the discrimination power of the two techniques.
Date: May 1974
Creator: Davis, Robbie G.
System: The UNT Digital Library
Comparisons of Improvement-Over-Chance Effect Sizes for Two Groups Under Variance Heterogeneity and Prior Probabilities (open access)

The distributional properties of improvement-over-chance, I, effect sizes derived from linear and quadratic predictive discriminant analysis (PDA) and from logistic regression analysis (LRA) for the two-group univariate classification were examined. Data were generated under varying levels of four data conditions: population separation, variance pattern, sample size, and prior probabilities. None of the indices provided acceptable estimates of effect for all the conditions examined. There were only a small number of conditions under which both accuracy and precision were acceptable. The results indicate that the decision of which method to choose is primarily determined by variance pattern and prior probabilities. Under variance homogeneity, any of the methods may be recommended. However, LRA is recommended when priors are equal or extreme and linear PDA is recommended when priors are moderate. Under variance heterogeneity, selecting a recommended method is more complex. In many cases, more than one method could be used appropriately.
Date: May 2003
Creator: Alexander, Erika D.
System: The UNT Digital Library
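
The improvement-over-chance index named in the abstract above is not defined there. A minimal sketch, assuming the commonly cited form I = (H_o - H_e) / (1 - H_e), where H_o is the observed hit rate and H_e is the hit rate expected by chance, is given below. Taking H_e as the sum of squared prior probabilities (a proportional chance criterion) is a further assumption, and the function name is hypothetical.

```python
def improvement_over_chance(observed_hit_rate, priors):
    """Improvement-over-chance index I = (H_o - H_e) / (1 - H_e).

    Assumption: the chance hit rate H_e is the sum of squared prior
    probabilities (proportional chance criterion).
    """
    chance_hit_rate = sum(q * q for q in priors)
    return (observed_hit_rate - chance_hit_rate) / (1.0 - chance_hit_rate)


# Hypothetical two-group example: 80% of cases classified correctly.
i_equal_priors = improvement_over_chance(0.80, (0.5, 0.5))
i_extreme_priors = improvement_over_chance(0.80, (0.9, 0.1))
```

Under this form, the same observed hit rate yields a lower or even negative index when priors are extreme, because assigning cases in proportion to the priors already achieves a high chance hit rate.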