The Characteristics and Properties of the Threshold and Squared-Error Criterion-Referenced Agreement Indices (open access)

Educators who use criterion-referenced measurement to ascertain an examinee's current level of performance, so that the examinee may be classified as either a master or a nonmaster, need to know the accuracy and consistency of their decisions regarding assignment of mastery states. This study examined the sampling distribution characteristics of two reliability indices that use the squared-error agreement function: Livingston's k^2(X,Tx) and Brennan and Kane's M(C). The sampling distribution characteristics of five indices that use the threshold agreement function were also examined: Subkoviak's Pc, Huynh's p and k, and Swaminathan's p and k. These seven methods of calculating reliability were also compared under varying conditions of sample size, test length, and criterion or cutoff score. Computer-generated data provided randomly parallel test forms for N = 2000 cases. From this, 1000 samples were drawn, with replacement, and each of the seven reliability indices was calculated. Descriptive statistics were collected for each sample set and examined for distribution characteristics. In addition, the mean value for each index was compared to the population parameter value of consistent mastery/nonmastery classifications. The results indicated that the sampling distribution characteristics of all seven reliability indices approach normal characteristics with increased sample size. The …
Date: May 1988
Creator: Dutschke, Cynthia F. (Cynthia Fleming)
System: The UNT Digital Library
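
As a rough illustration of the threshold agreement idea in the abstract above, the sketch below computes the proportion of examinees classified consistently (master or nonmaster) on two parallel forms, together with a chance-corrected kappa. It is a generic two-form calculation under stated assumptions, not the dissertation's procedure and not the single-administration estimators of Subkoviak or Huynh. The function name `agreement_indices`, the score lists, and the cutoff are hypothetical.

```python
from typing import Sequence, Tuple


def agreement_indices(
    form_a: Sequence[float], form_b: Sequence[float], cutoff: float
) -> Tuple[float, float]:
    """Return (p0, kappa) for mastery decisions on two parallel forms.

    Assumption: an examinee is a master on a form when the score is at or
    above `cutoff`. p0 is the observed proportion of consistent decisions;
    kappa corrects p0 for the agreement expected by chance.
    """
    if len(form_a) != len(form_b) or not form_a:
        raise ValueError("forms must be non-empty and of equal length")

    n = len(form_a)
    masters_a = [score >= cutoff for score in form_a]
    masters_b = [score >= cutoff for score in form_b]

    # Observed agreement: same decision on both forms.
    p0 = sum(a == b for a, b in zip(masters_a, masters_b)) / n

    # Chance agreement from the marginal mastery rates of each form.
    rate_a = sum(masters_a) / n
    rate_b = sum(masters_b) / n
    pc = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)

    # Kappa is undefined when chance agreement is already perfect.
    kappa = (p0 - pc) / (1 - pc) if pc < 1 else float("nan")
    return p0, kappa


if __name__ == "__main__":
    # Hypothetical scores for six examinees on two forms, cutoff of 7.
    print(agreement_indices([8, 9, 3, 4, 7, 2], [9, 7, 4, 8, 6, 1], cutoff=7))
```

Drawing repeated samples with replacement and recomputing these two values on each sample would give an empirical sampling distribution in the spirit of the study design, though the details of that design are not given here.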
A Comparison of Three Methods of Detecting Test Item Bias (open access)

This study compared three methods of detecting test item bias: the chi-square approach, the transformed item difficulties approach, and the Linn-Harnish three-parameter item response approach, which is the only Item Response Theory (IRT) method that can be utilized with relatively small minority samples. The items on two tests which measured writing and reading skills were examined for evidence of sex and ethnic bias. Eight sets of samples, four from each test, were randomly selected from the population (N=7287) of sixth, seventh, and eighth grade students enrolled in a large, urban school district in the southwestern United States. Each set of samples, male/female, White/Hispanic, White/Black, and White/White, contained 800 examinees in the majority group and 200 in the minority group. In an attempt to control differences in ability that may have existed between the various population groups, examinees with scores more than two standard deviations above or below their group's mean were eliminated. Ethnic samples contained equal numbers of each sex. The White/White sets of samples were utilized to provide baseline bias estimates because the tests could not logically be biased against these groups. Bias indices were then calculated for each set of samples with each of the three …
Date: May 1985
Creator: Monaco, Linda Gokey
System: The UNT Digital Library
A Comparison of the Effects of Different Sizes of Ceiling Rules on the Estimates of Reliability of a Mathematics Achievement Test (open access)

This study compared the estimates of reliability made using one, two, three, four, five, and unlimited consecutive failures as ceiling rules in scoring a mathematics achievement test which is part of the Iowa Tests of Basic Skills (ITBS), Form 8. There were 700 students randomly selected from a population (N=2640) of students enrolled in the eight grades in a large urban school district in the southwestern United States. These 700 students were randomly divided into seven subgroups so that each subgroup had 100 students. The responses of all those students to three subtests of the mathematics achievement battery, which included mathematical concepts (44 items), problem solving (32 items), and computation (45 items), were analyzed to obtain the item difficulties and a total score for each student. The items in each subtest were then rearranged by item difficulty, from the highest to the lowest value. In each subgroup, ceiling rules of one, two, three, four, five, and unlimited consecutive failures were applied to score the individual responses. The total score for each individual was the sum of the correct responses prior to the point described by the ceiling rule. The correct responses after the ceiling …
Date: May 1987
Creator: Somboon Suriyawongse
System: The UNT Digital Library
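
The ceiling-rule scoring described in the abstract above can be sketched as follows. This is a minimal illustration, assuming responses are coded 1 (correct) or 0 (incorrect) and are already ordered by item difficulty as the abstract describes. The function name `ceiling_score` and the sample response vector are hypothetical, and a ceiling of `None` stands in for the "unlimited" rule.

```python
from typing import Optional, Sequence


def ceiling_score(responses: Sequence[int], ceiling: Optional[int]) -> int:
    """Sum the correct responses that occur before the ceiling is reached.

    `responses` holds 1 for a correct answer and 0 for a failure, in the
    order the items are presented. Scoring stops once `ceiling` consecutive
    failures have been seen; correct answers after that point are ignored.
    A `ceiling` of None means no ceiling, so every correct answer counts.
    """
    score = 0
    consecutive_failures = 0
    for response in responses:
        if response:
            score += 1
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if ceiling is not None and consecutive_failures >= ceiling:
                break  # ceiling reached: discard everything after this item
    return score


if __name__ == "__main__":
    # Hypothetical response pattern for one examinee on eight items.
    pattern = [1, 1, 0, 1, 0, 0, 1, 1]
    for rule in (1, 2, 3, None):
        print(rule, ceiling_score(pattern, rule))
```

With a stricter rule (fewer consecutive failures), later correct responses are more likely to be discarded, which is the effect on scores whose consequences for reliability the study compares.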
A Comparison of Some Continuity Corrections for the Chi-Squared Test in 3 x 3, 3 x 4, and 3 x 5 Tables (open access)

This study was designed to determine whether chi-squared-based tests for independence give reliable estimates (as compared to the exact values provided by Fisher's exact probabilities test) of the probability of a relationship between the variables in 3 X 3, 3 X 4, and 3 X 5 contingency tables when the sample size is 10, 20, or 30. In addition to the classical (uncorrected) chi-squared test, four methods for continuity correction were compared to Fisher's exact probabilities test. The four methods were Yates' correction, two corrections attributed to Cochran, and Mantel's correction. The study was modeled after a similar comparison conducted on 2 X 2 contingency tables and published by Michael Haber.
Date: May 1987
Creator: Mullen, Jerry D. (Jerry Davis)
System: The UNT Digital Library
Willingness of Educators to Participate in a Descriptive Research Study as a Function of a Monetary Incentive (open access)

The problem considered was assessing the willingness of educators to participate in a study offering monetary incentives. Willingness was determined by sending educators a packet requesting return of a postcard to indicate willingness to participate. The purpose was twofold: to determine the effect of a monetary incentive upon willingness of educators to participate in a research study, and to analyze implications for mail questionnaire studies. A sample of 600 educators was chosen from directories of eleven public schools in north Texas. It included equal numbers of male and female teachers and male and female administrators. Subjects were assigned to one of twelve groups. No two from a school were assigned to different levels of the inducement variable.
Date: May 1984
Creator: Pittman, Doyle
System: The UNT Digital Library
Effect of Rater Training and Scale Type on Leniency and Halo Error in Student Ratings of Faculty (open access)

The purpose of this study was to determine whether leniency and halo error in student ratings could be reduced by training the student raters and by using a Behaviorally Anchored Rating Scale (BARS) rather than a Likert scale. Two hypotheses were proposed. First, the ratings collected from the trained raters would contain less halo and leniency error than those collected from the untrained raters. Second, within the group of trained raters, the BARS would contain less halo and leniency error than the Likert instrument.
Date: May 1987
Creator: Cook, Stuart S. (Stuart Sheldon)
System: The UNT Digital Library
An Empirical Investigation of Tukey's Honestly Significant Difference Test with Variance Heterogeneity and Equal Sample Sizes, Utilizing Box's Coefficient of Variance Variation (open access)

This study sought to determine boundary conditions for robustness of the Tukey HSD statistic when the assumption of homogeneity of variance was violated. Box's coefficient of variance variation, C^2, was utilized to index the degree of variance heterogeneity. A Monte Carlo computer simulation technique was employed to generate data under controlled violation of the homogeneity of variance assumption. For each combination of sample size and number of treatment groups, an analysis of variance F-test was computed, and Tukey's multiple comparison technique was calculated. When the two additional sample size cases were added to investigate the large sample sizes, the Tukey test was found to be conservative when C^2 was set at zero. The actual significance level fell below the lower limit of the 95 percent confidence interval around the 0.05 nominal significance level.
Date: May 1980
Creator: Strozeski, Michael W.
System: The UNT Digital Library