58 Matching Results

Determination of the Optimal Number of Strata for Bias Reduction in Propensity Score Matching. (open access)

Previous research implementing stratification on the propensity score has generally relied on five strata, based on prior theoretical groundwork and minimal empirical evidence as to the suitability of quintiles to adequately reduce bias in all cases and across all sample sizes. This study investigates bias reduction across varying numbers of strata and sample sizes via a large-scale simulation to determine the adequacy of quintiles for bias reduction under all conditions. Sample sizes ranged from 100 to 50,000 and strata from 3 to 20. Both the percentage of bias reduction and the standardized selection bias were examined. The results show that, while the particular covariates in the simulation met certain criteria with five strata, greater bias reduction could be achieved by increasing the number of strata, especially with larger sample sizes. Simulation code written in R is included.
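
As a minimal sketch of the stratification step described above (illustrative Python, not the R simulation code the dissertation includes; all names and data are hypothetical), one can stratify on a known propensity score and compare the covariate bias before and after:

import numpy as np

rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(size=n)                 # a single confounding covariate
p = 1 / (1 + np.exp(-x))               # true propensity score
t = rng.binomial(1, p)                 # treatment assignment

raw_bias = x[t == 1].mean() - x[t == 0].mean()

for n_strata in (3, 5, 10, 20):
    edges = np.quantile(p, np.linspace(0, 1, n_strata + 1))
    stratum = np.clip(np.searchsorted(edges, p, side="right") - 1, 0, n_strata - 1)
    diffs, sizes = [], []
    for s in range(n_strata):
        m = stratum == s
        treated, control = x[m & (t == 1)], x[m & (t == 0)]
        if len(treated) and len(control):
            diffs.append(treated.mean() - control.mean())
            sizes.append(m.sum())
    adj = np.average(diffs, weights=sizes)   # size-weighted within-stratum bias
    print(f"{n_strata:2d} strata: {100 * (1 - adj / raw_bias):5.1f}% bias reduction")
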
Date: May 2010
Creator: Akers, Allen
System: The UNT Digital Library
Comparisons of Improvement-Over-Chance Effect Sizes for Two Groups Under Variance Heterogeneity and Prior Probabilities (open access)

The distributional properties of improvement-over-chance, I, effect sizes derived from linear and quadratic predictive discriminant analysis (PDA) and from logistic regression analysis (LRA) for the two-group univariate classification were examined. Data were generated under varying levels of four data conditions: population separation, variance pattern, sample size, and prior probabilities. None of the indices provided acceptable estimates of effect for all the conditions examined. There were only a small number of conditions under which both accuracy and precision were acceptable. The results indicate that the decision of which method to choose is primarily determined by variance pattern and prior probabilities. Under variance homogeneity, any of the methods may be recommended. However, LRA is recommended when priors are equal or extreme and linear PDA is recommended when priors are moderate. Under variance heterogeneity, selecting a recommended method is more complex. In many cases, more than one method could be used appropriately.
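
One common formulation of the improvement-over-chance index (assumed here; the study examines several variants) is I = (H - Hc) / (1 - Hc), where H is the observed classification hit rate and Hc the hit rate expected by chance. A short illustrative computation:

import numpy as np

def improvement_over_chance(y_true, y_pred, priors):
    """I = (H - Hc) / (1 - Hc): proportional improvement of the
    observed hit rate H over the proportional-chance criterion Hc."""
    h = np.mean(y_true == y_pred)                 # observed hit rate
    h_chance = np.sum(np.asarray(priors) ** 2)    # chance hit rate from priors
    return (h - h_chance) / (1 - h_chance)

# Two groups with equal priors: the chance hit rate is 0.5.
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred = np.array([0, 0, 1, 1, 1, 1, 0, 0])
print(improvement_over_chance(y_true, y_pred, priors=[0.5, 0.5]))  # 0.5
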
Date: May 2003
Creator: Alexander, Erika D.
System: The UNT Digital Library
An Application of Ridge Regression to Educational Research (open access)

Behavioral data are frequently plagued with highly intercorrelated variables. Collinearity is an indication of insufficient information in the model or in the data. It therefore contributes to the unreliability of the estimated coefficients. One result of collinearity is that regression weights derived in one sample may lead to poor prediction in another sample. One technique which was developed to deal with highly intercorrelated independent variables is ridge regression. It was first proposed by Hoerl and Kennard in 1970 as a method which would allow the data analyst to both stabilize his estimates and improve upon his squared error loss. The problem of this study was the application of ridge regression in the analysis of data resulting from educational research.
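
The ridge estimator itself is compact: a constant k is added to the diagonal of X'X before solving the normal equations, shrinking and stabilizing the coefficients under collinearity. A minimal sketch (illustrative only; data and names are hypothetical):

import numpy as np

def ridge(X, y, k):
    """Hoerl-Kennard ridge estimator: beta = (X'X + kI)^(-1) X'y.
    Assumes the columns of X have been standardized, as is customary."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Two nearly collinear predictors: OLS (k = 0) weights are unstable,
# while a small ridge constant shrinks them toward stabler values.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)     # highly intercorrelated with x1
X = np.column_stack([x1, x2])
X = (X - X.mean(0)) / X.std(0)
y = x1 + rng.normal(size=200)
print(ridge(X, y, k=0.0), ridge(X, y, k=10.0))
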
Date: December 1980
Creator: Amos, Nancy Notley
System: The UNT Digital Library
An Empirical Comparison of Random Number Generators: Period, Structure, Correlation, Density, and Efficiency (open access)

Random number generators (RNGs) are widely used in conducting Monte Carlo simulation studies, which are important in the field of statistics for comparing power, mean differences, or distribution shapes between statistical approaches. Statistical results, however, may differ when different random number generators are used. Often older methods have been blindly used with no understanding of their limitations. Many random functions supplied with computers today have been found to be comparatively unsatisfactory. In this study, five multiplicative linear congruential generators (MLCGs) were chosen which are provided in the following statistical packages: RANDU (IBM), RNUN (IMSL), RANUNI (SAS), UNIFORM (SPSS), and RANDOM (BMDP). Using a personal computer (PC), an empirical investigation was performed using five criteria: period length before repeating random numbers, distribution shape, correlation between adjacent numbers, density of distributions, and the normal approximation produced when the RNG is used in a normal function. All RNG FORTRAN programs were rewritten in Pascal, a more efficient language for the PC. Sets of random numbers were generated using different starting values. A good RNG should have the following properties: a long enough period; a well-structured pattern in distribution; independence between random number sequences; random and uniform distribution; and a good normal approach in the normal …
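
A multiplicative linear congruential generator follows the recurrence x(n+1) = a * x(n) mod m. The sketch below implements RANDU (a = 65539, m = 2^31), one of the generators named above, and demonstrates the well-known lattice defect that makes it unsatisfactory; this check is illustrative, not the dissertation's test battery:

def mlcg(seed, a, m):
    """Multiplicative linear congruential generator: x <- a*x mod m."""
    x = seed
    while True:
        x = (a * x) % m
        yield x

# RANDU: a = 65539, m = 2**31 (the seed must be odd).
randu = mlcg(seed=1, a=65539, m=2**31)
xs = [next(randu) for _ in range(1000)]

# RANDU's defect: every triple satisfies x[n+2] = 6*x[n+1] - 9*x[n] (mod 2**31),
# so consecutive triples fall on only 15 planes in the unit cube.
assert all((xs[i + 2] - 6 * xs[i + 1] + 9 * xs[i]) % 2**31 == 0
           for i in range(len(xs) - 2))
print("lattice identity holds for all", len(xs) - 2, "triples")
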
Date: August 1995
Creator: Bang, Jung Woong
System: The UNT Digital Library
Establishing the utility of a classroom effectiveness index as a teacher accountability system. (open access)

How to identify effective teachers who improve student achievement despite diverse student populations and school contexts is an ongoing discussion in public education. The need to show communities and parents how well teachers and schools improve student learning has led districts and states to seek a fair, equitable and valid measure of student growth using student achievement. This study investigated a two-stage hierarchical model for estimating teacher effect on student achievement. This measure was entitled a Classroom Effectiveness Index (CEI). Consistency of this model over time, outlier influences in individual CEIs, variance among CEIs across four years, and correlations of second-stage student residuals with first-stage student residuals were analyzed. The statistical analysis used four years of student residual data from a state-mandated mathematics assessment (n=7086) and a state-mandated reading assessment (n=7572) aggregated by teacher. The study identified the following results. Four years of district grand slopes and grand intercepts were analyzed to show consistent results over time. Repeated measures analyses of grand slopes and intercepts in mathematics were statistically significant at the .01 level. Repeated measures analyses of grand slopes and intercepts in reading were not statistically significant. The analyses indicated consistent results over time for reading …
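
The residual-aggregation idea behind such an index can be sketched schematically (a simplified stand-in for the study's two-stage hierarchical model; data and names are hypothetical):

import numpy as np

rng = np.random.default_rng(2)
n_students, n_teachers = 7000, 100
teacher = rng.integers(0, n_teachers, size=n_students)
true_effect = rng.normal(scale=3.0, size=n_teachers)
pretest = rng.normal(50, 10, size=n_students)
posttest = (10 + 0.9 * pretest + true_effect[teacher]
            + rng.normal(scale=5, size=n_students))

# Stage 1: regress current achievement on prior achievement;
# keep the student-level residuals.
X = np.column_stack([np.ones(n_students), pretest])
beta, *_ = np.linalg.lstsq(X, posttest, rcond=None)
resid = posttest - X @ beta

# Stage 2 (schematic): aggregate residuals by teacher as a crude index.
cei = np.array([resid[teacher == t].mean() for t in range(n_teachers)])
print("correlation with true effects:", np.corrcoef(cei, true_effect)[0, 1])
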
Date: May 2002
Creator: Bembry, Karen L.
System: The UNT Digital Library
Spatial Ability, Motivation, and Attitude of Students as Related to Science Achievement (open access)

Understanding student achievement in science is important because the U.S. economy relies increasingly on math, science, and technology-related fields even as the number of youth seeking college degrees and careers in math and science declines. A series of structural equation models were tested using the scores from a statewide science exam for 276 students from a suburban north Texas public school district at the end of their 5th grade year and the latent variables of spatial ability, motivation to learn science and science-related attitude. Spatial ability was tested as a mediating variable on motivation and attitude; however, while spatial ability had statistically significant regression coefficients with motivation and attitude, spatial ability was found to be the sole statistically significant predictor of science achievement for these students, explaining 23.1% of the variance in science scores.
Date: May 2011
Creator: Bolen, Judy Ann
System: The UNT Digital Library
The Effectiveness of a Mediating Structure for Writing Analysis Level Test Items From Text Based Instruction (open access)

This study is concerned with the effect of placing text into a mediated structure form upon the generation of test items for analysis-level domain-referenced test construction. The item writing methodology used is the linguistic (operationally defined) item writing technology developed by Bormuth, Finn, Roid, Haladyna and others. This item writing methodology is compared to 1) the intuitive method based on Bloom's definition of analysis-level test questions and 2) the intuitive method with keywords identified. A mediated structure was developed by coordinating or subordinating sentences in an essay by following five simple grammatical rules. Three test writers each composed a ten-item test using each of the three methodologies based on a common essay. Tests were administered to 102 Composition 1 community college students. Students were asked to read the essay and complete one test form. Test forms, by writer and method, were randomly distributed. Analysis of variance showed no significant differences among either methods or writers. Item analysis showed that no method of item writing produced items of consistent difficulty across test item writers. While the results of this study show no significant difference from the intuitive, traditional methods of item writing, analysis level test …
Date: August 1989
Creator: Brasel, Michael D. (Michael David)
System: The UNT Digital Library
A Comparison of Traditional Norming and Rasch Quick Norming Methods (open access)

The simplicity and ease of use of the Rasch procedure are decided advantages. The test user needs only two numbers: the frequency of persons who answered each item correctly and the Rasch-calibrated item difficulty, usually a part of an existing item bank. Norms can be computed quickly for any specific group of interest. In addition, once the selected items from the calibrated bank are normed, any test, built from the item bank, is automatically norm-referenced. Thus, it was concluded that the Rasch quick norm procedure is a meaningful alternative to traditional classical true score norming for test users who desire normative data.
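
Under the Rasch model the probability of a correct response is P = exp(theta - b) / (1 + exp(theta - b)), so an ability estimate (and hence a norm) for each raw score follows directly from the calibrated item difficulties. A minimal sketch of that step (illustrative; the difficulty values are made up):

import numpy as np

def rasch_ability(raw_score, b, tol=1e-8):
    """ML ability estimate for a raw score, given Rasch item
    difficulties b: solve sum_i P_i(theta) = raw_score by Newton's method."""
    theta = 0.0
    for _ in range(100):
        p = 1 / (1 + np.exp(-(theta - b)))
        step = (raw_score - p.sum()) / (p * (1 - p)).sum()
        theta += step
        if abs(step) < tol:
            break
    return theta

# Hypothetical bank difficulties; abilities for each possible raw score
# (extreme scores 0 and n have no finite ML estimate and are excluded).
b = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])
for r in range(1, len(b)):
    print(r, round(rasch_ability(r, b), 3))
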
Date: August 1993
Creator: Bush, Joan Spooner
System: The UNT Digital Library
Short-to-Medium Term Enrollment Projection Based on Cycle Regression Analysis (open access)

Short-to-medium-term projections were made of student semester credit hour enrollments for North Texas State University and the Texas Public and Senior Colleges and Universities (as defined by the Coordinating Board, Texas College and University System). Undergraduate, Graduate, Doctorate, Total, Education, Liberal Arts, and Business enrollments were projected. Projections were made for the following time periods: Fall + Spring, Fall, Summer I + Summer II, and Summer I. A new regression analysis called "cycle regression," which employs nonlinear regression techniques to extract multifrequential phenomena from time-series data, was used for the analysis of the enrollment data. The heuristic steps employed in cycle regression analysis are similar to those used in fitting polynomial models. A trend line and one or more sine waves (cycles) are simultaneously estimated using a partial F test. The process of adding cycle(s) to the model continues until no more significant terms can be estimated.
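
The heart of cycle regression is a nonlinear least-squares fit of a trend plus one or more sinusoids. A compressed one-cycle sketch (illustrative; the actual heuristic adds cycles iteratively and tests each with a partial F test, which is omitted here):

import numpy as np
from scipy.optimize import curve_fit

def trend_plus_cycle(t, b0, b1, amp, freq, phase):
    """Linear trend plus a single sine cycle."""
    return b0 + b1 * t + amp * np.sin(2 * np.pi * freq * t + phase)

# Synthetic enrollment-like series: upward trend plus a 3-period cycle.
rng = np.random.default_rng(3)
t = np.arange(40, dtype=float)              # e.g., 40 semesters
y = (1000 + 12 * t + 80 * np.sin(2 * np.pi * t / 3 + 0.5)
     + rng.normal(scale=20, size=40))

params, _ = curve_fit(trend_plus_cycle, t, y, p0=[1000, 10, 50, 1 / 3, 0])
print("fitted [b0, b1, amp, freq, phase]:", np.round(params, 3))
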
Date: August 1983
Creator: Chizari, Mohammad
System: The UNT Digital Library
Effect of Rater Training and Scale Type on Leniency and Halo Error in Student Ratings of Faculty (open access)

The purpose of this study was to determine if leniency and halo error in student ratings could be reduced by training the student raters and by using a Behaviorally Anchored Rating Scale (BARS) rather than a Likert scale. Two hypotheses were proposed. First, the ratings collected from the trained raters would contain less halo and leniency error than those collected from the untrained raters. Second, within the group of trained raters the BARS would contain less halo and leniency error than the Likert instrument.
Date: May 1987
Creator: Cook, Stuart S. (Stuart Sheldon)
System: The UNT Digital Library
A Comparison of Two Criterion-Referenced Item-Selection Techniques Utilizing Simulated Data with Item Pools that Vary in Degrees of Item Difficulty (open access)

The problem of this study was to examine the equivalency of two different types of criterion-referenced item-selection techniques on simulated data as item pools varied in degrees of item difficulty. A pretest-posttest design was employed in which pass-fail scores were randomly generated for item pools of twenty-five items. From the item pools, the two techniques determined which items were to be used to make up twelve-item criterion-referenced tests. The twenty-five items also were rank ordered according to the discrimination power of the two techniques.
Date: May 1974
Creator: Davis, Robbie G.
System: The UNT Digital Library
The Characteristics and Properties of the Threshold and Squared-Error Criterion-Referenced Agreement Indices (open access)

Educators who use criterion-referenced measurement to ascertain the current level of performance of an examinee in order that the examinee may be classified as either a master or a nonmaster need to know the accuracy and consistency of their decisions regarding assignment of mastery states. This study examined the sampling distribution characteristics of two reliability indices that use the squared-error agreement function: Livingston's k^2(X,Tx) and Brennan and Kane's M(C). The sampling distribution characteristics of five indices that use the threshold agreement function were also examined: Subkoviak's Pc, Huynh's p and k, and Swaminathan's p and k. These seven methods of calculating reliability were also compared under varying conditions of sample size, test length, and criterion or cutoff score. Computer-generated data provided randomly parallel test forms for N = 2000 cases. From this, 1000 samples were drawn, with replacement, and each of the seven reliability indices was calculated. Descriptive statistics were collected for each sample set and examined for distribution characteristics. In addition, the mean value for each index was compared to the population parameter value of consistent mastery/nonmastery classifications. The results indicated that the sampling distribution characteristics of all seven reliability indices approach normal characteristics with increased sample size. The …
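
The threshold agreement function underlying the second family of indices is the proportion of examinees given the same mastery/nonmastery classification on two randomly parallel forms, optionally corrected for chance as kappa. An illustrative computation (not any one author's single-administration estimator; data are simulated):

import numpy as np

def threshold_agreement(form1, form2, cutoff):
    """p0: proportion of consistent master/nonmaster decisions across two
    parallel forms; kappa: the same agreement corrected for chance."""
    m1, m2 = form1 >= cutoff, form2 >= cutoff
    p0 = np.mean(m1 == m2)
    # chance agreement from the marginal mastery rates
    pc = m1.mean() * m2.mean() + (1 - m1.mean()) * (1 - m2.mean())
    return p0, (p0 - pc) / (1 - pc)

rng = np.random.default_rng(4)
true_score = rng.normal(14, 3, size=2000)
form1 = true_score + rng.normal(scale=1.5, size=2000)
form2 = true_score + rng.normal(scale=1.5, size=2000)
print(threshold_agreement(form1, form2, cutoff=13.0))
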
Date: May 1988
Creator: Dutschke, Cynthia F. (Cynthia Fleming)
System: The UNT Digital Library
Cross Categorical Scoring: An Approach to Treating Sociometric Data (open access)

The purpose of this study was to use a cross categorical scoring method for sociometric data focusing upon those individuals who have made the selections. A cross category selection was defined as choosing an individual on a sociometric instrument who was not within one's own classification. The classifications used for this study were sex, race, and perceived achievement level. A cross category score was obtained by summing the number of cross category selections. The conclusions below are the result of this study. Cross categorical scoring provides a useful method of scoring sociometric data. This method successfully focuses on those individuals who make sociometric choices rather than those who receive them. Each category utilized provides a unique contribution. The categories used in this study were sex, race, and achievement level. These are, however, only reflective of any number of variables which could be used. The categories must be chosen to reflect the needs of the particular study in which they are included. Multiple linear regression analysis can be used to provide the researcher with enough scope to handle numerous nominal and ordinal independent variables simultaneously. The sociometric criterion or question does make a difference in the results on cross …
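
The scoring rule itself is a simple count, as the following sketch shows (hypothetical students and category labels):

# Each student's category (e.g., sex, race, or achievement level) and choices.
category = {"Ann": "A", "Ben": "B", "Cal": "A", "Dee": "B"}
choices = {"Ann": ["Ben", "Cal"], "Ben": ["Dee", "Ann"],
           "Cal": ["Ann"], "Dee": ["Ben", "Ann"]}

# Cross category score: number of selections outside one's own classification.
cross_score = {who: sum(category[c] != category[who] for c in picked)
               for who, picked in choices.items()}
print(cross_score)   # {'Ann': 1, 'Ben': 1, 'Cal': 0, 'Dee': 1}
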
Date: December 1977
Creator: Ernst, Nora Wilford
System: The UNT Digital Library
A Comparison of Three Correlational Procedures for Factor-Analyzing Dichotomously-Scored Item Response Data (open access)

In this study, an improved correlational procedure for factor-analyzing dichotomously-scored item response data is described and tested. The procedure involves (a) replacing the dichotomous input values with continuous probability values obtained through Rasch analysis; (b) calculating interitem product-moment correlations among the probabilities; and (c) subjecting the correlations to unweighted least-squares factor analysis. Two simulated data sets and an empirical data set (Kentucky Comprehensive Listening Test responses) were used to compare the new procedure with two more traditional techniques, using (a) phi and (b) tetrachoric correlations calculated directly from the dichotomous item-response values. The three methods were compared on three criterion measures: (a) maximum internal correlation; (b) product of the two largest factor loadings; and (c) proportion of variance accounted for. The Rasch-based procedure is recommended for subjecting dichotomous item response data to latent-variable analysis.
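Two of the three correlation types are easy to exhibit: phi is the product-moment correlation computed on the raw 0/1 responses, while the proposed procedure computes product-moment correlations on model-implied probabilities instead. A compressed sketch (the probabilities here come from assumed person and item parameters rather than an actual Rasch calibration; tetrachoric estimation is omitted):

import numpy as np

rng = np.random.default_rng(5)
n_persons, n_items = 500, 4
theta = rng.normal(size=n_persons)             # person abilities
b = np.array([-1.0, -0.3, 0.4, 1.1])           # item difficulties

# Rasch response probabilities and the dichotomous responses they generate
P = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
Y = (rng.random((n_persons, n_items)) < P).astype(int)

# (a) phi: product-moment correlations of the raw 0/1 responses
phi = np.corrcoef(Y, rowvar=False)
# (c) Rasch-based: product-moment correlations of the probabilities
rasch_r = np.corrcoef(P, rowvar=False)
print(np.round(phi, 2), np.round(rasch_r, 2), sep="\n")
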
Date: May 1991
Creator: Fluke, Ricky
System: The UNT Digital Library
The Robustness of O'Brien's r Transformation to Non-Normality (open access)

A Monte Carlo simulation technique was employed in this study to determine if the r transformation, a test of homogeneity of variance, affords adequate protection against Type I error over a range of equal sample sizes and number of groups when samples are obtained from normal and non-normal distributions. Additionally, this study sought to determine if the r transformation is more robust than Bartlett's chi-square to deviations from normality. Four populations were generated representing normal, uniform, symmetric leptokurtic, and skewed leptokurtic distributions. For each sample size (6, 12, 24, 48), number of groups (3, 4, 5, 7), and population distribution condition, the r transformation and Bartlett's chi-square were calculated. This procedure was replicated 1,000 times; the actual significance level was determined and compared to the nominal significance level of .05. On the basis of the analysis of the generated data, the following conclusions are drawn. First, the r transformation is generally robust to violations of normality when the size of the samples tested is twelve or larger. Second, in the instances where a significant difference occurred between the actual and nominal significance levels, the r transformation produced (a) conservative Type I error rates if the kurtosis of the parent population …
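The Monte Carlo logic is straightforward: draw repeated samples from a null (equal-variance) population, apply the test, and compare the empirical rejection rate with the nominal .05 level. A sketch using Bartlett's chi-square, for which scipy supplies an implementation (O'Brien's r transformation is not in scipy and is omitted here):

import numpy as np
from scipy.stats import bartlett

rng = np.random.default_rng(6)
k_groups, n_per_group, reps, alpha = 4, 12, 1000, 0.05

def rejection_rate(sampler):
    """Empirical Type I error of Bartlett's test under a true null."""
    hits = 0
    for _ in range(reps):
        groups = [sampler(n_per_group) for _ in range(k_groups)]
        if bartlett(*groups).pvalue < alpha:
            hits += 1
    return hits / reps

print("normal parent:", rejection_rate(lambda n: rng.normal(size=n)))
print("uniform parent:", rejection_rate(lambda n: rng.uniform(-1, 1, size=n)))
print("skewed parent:", rejection_rate(lambda n: rng.exponential(size=n)))
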
Date: August 1985
Creator: Gordon, Carol J. (Carol Jean)
System: The UNT Digital Library
A Comparison of Three Criteria Employed in the Selection of Regression Models Using Simulated and Real Data (open access)

Researchers who make predictions from educational data are interested in choosing the best regression model possible. Many criteria have been devised for choosing a full or restricted model, and also for selecting the best subset from an all-possible-subsets regression. The relative practical usefulness of three of the criteria used in selecting a regression model was compared in this study: (a) Mallows' C_p, (b) Amemiya's prediction criterion, and (c) Hagerty and Srinivasan's method involving predictive power. Target correlation matrices with 10,000 cases were simulated so that the matrices had varying degrees of effect sizes. The amount of power for each matrix was calculated after one or two predictors were dropped from the full regression model, for sample sizes ranging from n = 25 to n = 150. Also, the null case, when one predictor was uncorrelated with the other predictors, was considered. In addition, comparisons for regression models selected using C_p and prediction criterion were performed using data from the National Educational Longitudinal Study of 1988.
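
Mallows' C_p for a subset model with p parameters is C_p = SSE_p / s^2 - (n - 2p), where s^2 is the error mean square of the full model; subsets with C_p near p are preferred. A worked sketch on hypothetical data:

import numpy as np

def mallows_cp(X_sub, X_full, y):
    """Mallows' Cp = SSE_p / s2 - (n - 2p), with s2 the full-model MSE."""
    n = len(y)
    def sse(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ beta
        return r @ r
    s2 = sse(X_full) / (n - X_full.shape[1])   # full-model error variance
    p = X_sub.shape[1]
    return sse(X_sub) / s2 - (n - 2 * p)

rng = np.random.default_rng(7)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X[:, :3] @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)  # x3 irrelevant
for cols in ([0, 1], [0, 1, 2], [0, 1, 2, 3]):
    print(cols, round(mallows_cp(X[:, cols], X, y), 2))  # near p when adequate
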
Date: December 1994
Creator: Graham, D. Scott
System: The UNT Digital Library
Boundary Conditions of Several Variables Relative to the Robustness of Analysis of Variance Under Violation of the Assumption of Homogeneity of Variances (open access)

The purpose of this study is to determine boundary conditions associated with the number of treatment groups (K), the common treatment group sample size (n), and an index of the extent to which the assumption of equality of treatment population variances is violated (Q) with regard to user confidence in application of the one-way analysis of variance F-test for determining equality of treatment population means. The study concludes that the analysis of variance F-test is robust when the number of treatment groups is less than seven and when the extreme ratio of variances is less than 1:5, but when the violation of the assumption is more severe or the number of treatment groups is seven or more, serious discrepancies between actual and nominal significance levels occur. It was also concluded that for seven treatment groups confidence in the application of the analysis of variance should be limited to the values of Q and n so that n is greater than or equal to 10 ln((1/2)Q). For nine treatment groups, it was concluded that confidence be limited to those values of Q and n so that n is greater than or equal to (-2/3) + 12 ln((1/2)Q). No definitive …
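
As a quick arithmetic check of those boundary formulas (assuming natural logarithms and Q expressed as the variance ratio, e.g. Q = 5 for a 1:5 ratio):

from math import ceil, log

for q in (3, 5, 10):
    n7 = 10 * log(q / 2)            # seven groups: n >= 10 ln((1/2)Q)
    n9 = -2 / 3 + 12 * log(q / 2)   # nine groups:  n >= -2/3 + 12 ln((1/2)Q)
    print(f"Q = 1:{q}  seven groups: n >= {ceil(n7)}  nine groups: n >= {ceil(n9)}")
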
Date: December 1977
Creator: Grizzle, Grady M.
System: The UNT Digital Library
Influence of Item Response Theory and Type of Judge on a Standard Set Using the Iterative Angoff Standard Setting Method (open access)

The purpose of this investigation was to determine the influence of item response theory and different types of judges on a standard. The iterative Angoff standard setting method was employed by all judges to determine a cut-off score for a public school district-wide criterion-referenced test. The analysis of variance of the effect of judge type and standard setting method on the central tendency of the standard revealed the existence of an ordinal interaction between judge type and method. Without any knowledge of p-values, one judge group set an unrealistic standard. A significant disordinal interaction was found concerning the effect of judge type and standard setting method on the variance of the standard. A positive covariance was detected between judges' minimum pass level estimates and empirical item information. With both p-values and b-values, judge groups had mean minimum pass levels that were positively correlated (ranging from .77 to .86), regardless of the type of information given to the judges. No differences in correlations were detected between different judge types or different methods. The generalizability coefficients and phi indices for 12 judges included in any method or judge type were acceptable (ranging from .77 to .99). The generalizability coefficient and phi index …
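
In the Angoff method each judge estimates, for every item, the probability that a minimally competent examinee would answer correctly (the minimum pass level); the cut score is the sum of the item means across judges, and the iterative variant repeats the ratings after feedback. A schematic computation with hypothetical ratings:

import numpy as np

# rows = judges, columns = items: estimated probability that a minimally
# competent examinee answers each item correctly (minimum pass levels)
mpl = np.array([[0.6, 0.8, 0.5, 0.9],
                [0.7, 0.7, 0.4, 0.8],
                [0.5, 0.9, 0.6, 0.9]])

cut_score = mpl.mean(axis=0).sum()   # Angoff cut score on the raw-score scale
print(f"cut score: {cut_score:.2f} of {mpl.shape[1]} items")
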
Date: August 1992
Creator: Hamberlin, Melanie Kidd
System: The UNT Digital Library
A State-Wide Survey on the Utilization of Instructional Technology by Public School Districts in Texas (open access)

Effective utilization of instructional technology can provide a valuable method for the delivery of a school program, and enable a teacher to individualize according to student needs. Implementation of such a program is costly and requires careful planning and adequate staff development for school personnel. This study examined the degree of commitment by Texas school districts to the use of the latest technologies in their efforts to revolutionize education. Quantitative data were collected by using a survey that included five informational areas: (1) school district background, (2) funding for budget, (3) staff, (4) technology hardware, and (5) staff development. The study included 137 school districts representing the 5 University Interscholastic League (UIL) classifications (A through AAAAA). The survey was mailed to the school superintendents requesting that the persons most familiar with instructional technology be responsible for completing the questionnaires. Analysis of data examined the relationship between UIL classification and the amount of money expended on instructional technology. Correlation coefficients were determined between teachers receiving training in the use of technology and total personnel assigned to technology positions. Coefficients were calculated between a district providing a plan for technology and employment of a coordinator for instructional technology. Significance was established at the …
Date: May 1990
Creator: Hiett, Elmer D. (Elmer Donald)
System: The UNT Digital Library
Parent Involvement and Science Achievement: A Latent Growth Curve Analysis (open access)

This study examined science achievement growth across elementary and middle school and parent school involvement using the Early Childhood Longitudinal Study – Kindergarten Class of 1998 – 1999 (ECLS-K). The ECLS-K is a nationally representative kindergarten cohort of students from public and private schools who attended full-day or half-day kindergarten class in 1998 – 1999. The present study’s sample (N = 8,070) was based on students who had a sampling weight available from the public-use data file. Students were assessed in science achievement at third, fifth, and eighth grades and parents of the students were surveyed at the same time points. Analyses using latent growth curve modeling with time invariant and varying covariates in an SEM framework revealed a positive relationship between science achievement and parent involvement at eighth grade. Furthermore, there were gender and racial/ethnic differences in parents’ school involvement as a predictor of science achievement. Findings indicated that students with lower initial science achievement scores had a faster rate of growth across time. The achievement gap between low and high achievers in earth, space and life sciences lessened from elementary to middle school. Parents’ involvement with school usually tapers off after elementary school, but due to parent school …
Date: August 2011
Creator: Johnson, Ursula Yvette
System: The UNT Digital Library
Stratified item selection and exposure control in unidimensional adaptive testing in the presence of two-dimensional data. (open access)

It is not uncommon to use unidimensional item response theory (IRT) models to estimate ability in multidimensional data. Therefore it is important to understand the implications of summarizing multiple dimensions of ability into a single parameter estimate, especially if effects are confounded when applied to computerized adaptive testing (CAT). Previous studies have investigated the effects of different IRT models and ability estimators by manipulating the relationships between item and person parameters. However, in all cases, the maximum information criterion was used as the item selection method. Because maximum information is heavily influenced by the item discrimination parameter, investigating a-stratified item selection methods is tenable. The current Monte Carlo study compared maximum information, a-stratification, and a-stratification with b blocking item selection methods, alone, as well as in combination with the Sympson-Hetter exposure control strategy. The six testing conditions were conditioned on three levels of interdimensional item difficulty correlations and four levels of interdimensional examinee ability correlations. Measures of fidelity, estimation bias, error, and item usage were used to evaluate the effectiveness of the methods. Results showed that either stratified item selection strategy is warranted if the goal is to obtain precise estimates of ability when using unidimensional CAT in the presence of …
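
The a-stratified strategy partitions the pool by the discrimination parameter a and draws early items from the low-a strata, reserving highly discriminating items for later in the test when the ability estimate is more trustworthy. A simplified selection sketch (hypothetical pool; b blocking, ability updating, and Sympson-Hetter control omitted):

import numpy as np

rng = np.random.default_rng(8)
pool_a = rng.lognormal(mean=0.0, sigma=0.3, size=120)   # discriminations
pool_b = rng.normal(size=120)                           # difficulties

# Partition the pool into 4 strata by ascending a.
order = np.argsort(pool_a)
strata = np.array_split(order, 4)

def select_item(step, test_length, theta, used):
    """Pick the unused item whose b is closest to theta, drawn from the
    stratum matching the test's progress (low-a early, high-a late)."""
    s = min(step * 4 // test_length, 3)
    candidates = [i for i in strata[s] if i not in used]
    return min(candidates, key=lambda i: abs(pool_b[i] - theta))

used = set()
theta = 0.0   # provisional ability (would be re-estimated after each item)
for step in range(12):
    item = select_item(step, 12, theta, used)
    used.add(item)
    print(step, f"a={pool_a[item]:.2f}", f"b={pool_b[item]:+.2f}")
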
Date: August 2009
Creator: Kalinowski, Kevin E.
System: The UNT Digital Library
A comparison of traditional and IRT factor analysis. (open access)

This study investigated the item parameter recovery of two methods of factor analysis. The methods researched were a traditional factor analysis of tetrachoric correlation coefficients and an IRT approach to factor analysis which utilizes marginal maximum likelihood estimation using an EM algorithm (MMLE-EM). Dichotomous item response data were generated under the 2-parameter normal ogive model (2PNOM) using PARDSIM software. Examinee abilities were sampled from both the standard normal and uniform distributions. True item discrimination, a, was normal with a mean of .75 and a standard deviation of .10. True b, item difficulty, was specified as uniform [-2, 2]. The two distributions of abilities were completely crossed with three test lengths (n= 30, 60, and 100) and three sample sizes (N = 50, 500, and 1000). Each of the 18 conditions was replicated 5 times, resulting in 90 datasets. PRELIS software was used to conduct a traditional factor analysis on the tetrachoric correlations. The IRT approach to factor analysis was conducted using BILOG 3 software. Parameter recovery was evaluated in terms of root mean square error, average signed bias, and Pearson correlations between estimated and true item parameters. ANOVAs were conducted to identify systematic differences in error indices. Based on many …
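
Under the two-parameter normal ogive model the probability of a correct response is P = Phi(a(theta - b)). A compact generator matching the specifications above (illustrative, not the PARDSIM run itself):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(9)
N, n = 500, 30                                   # examinees, items
theta = rng.normal(size=N)                       # abilities
a = rng.normal(0.75, 0.10, size=n)               # discriminations
b = rng.uniform(-2, 2, size=n)                   # difficulties

# 2PNOM: P(correct) = Phi(a * (theta - b))
P = norm.cdf(a[None, :] * (theta[:, None] - b[None, :]))
Y = (rng.random((N, n)) < P).astype(int)         # dichotomous responses
print(Y.shape, Y.mean(axis=0)[:5])               # per-item p-values
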
Date: December 2004
Creator: Kay, Cheryl Ann
System: The UNT Digital Library
The Generalization of the Logistic Discriminant Function Analysis and Mantel Score Test Procedures to Detection of Differential Testlet Functioning (open access)

Two procedures for detection of differential item functioning (DIF) for polytomous items were generalized to detection of differential testlet functioning (DTLF). The methods compared were the logistic discriminant function analysis procedure for uniform and non-uniform DTLF (LDFA-U and LDFA-N), and the Mantel score test procedure. Further analysis included comparison of results of DTLF analysis using the Mantel procedure with DIF analysis of individual testlet items using the Mantel-Haenszel (MH) procedure. Over 600 chi-squares were analyzed and compared for rejection of null hypotheses. Samples of 500, 1,000, and 2,000 were drawn by gender subgroups from the NELS:88 data set, which contains demographic and test data from over 25,000 eighth graders. Three types of testlets (totaling 29) from the NELS:88 test were analyzed for DTLF. The first type, the common passage testlet, followed the conventional testlet definition: items grouped together by a common reading passage, figure, or graph. The other two types were based upon common content and common process, as outlined in the NELS test specification.
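
For the item-level MH analysis, examinees are matched on total score and the odds of a correct response are compared between reference and focal groups within each score stratum. A bare-bones computation of the MH common odds ratio (continuity correction and the chi-square test omitted; data are hypothetical):

import numpy as np

def mh_odds_ratio(item, group, total):
    """Mantel-Haenszel common odds ratio for one item, with examinees
    matched on total test score; group: 0 = reference, 1 = focal."""
    num = den = 0.0
    for k in np.unique(total):
        m = total == k
        A = np.sum(m & (group == 0) & (item == 1))   # reference correct
        B = np.sum(m & (group == 0) & (item == 0))   # reference incorrect
        C = np.sum(m & (group == 1) & (item == 1))   # focal correct
        D = np.sum(m & (group == 1) & (item == 0))   # focal incorrect
        n_k = A + B + C + D
        if n_k:
            num += A * D / n_k
            den += B * C / n_k
    return num / den       # 1.0 indicates no uniform DIF

rng = np.random.default_rng(10)
group = rng.integers(0, 2, size=1000)
total = rng.integers(0, 6, size=1000)             # matching (score) strata
item = (rng.random(1000) < (0.3 + 0.1 * total)).astype(int)
print(round(mh_odds_ratio(item, group, total), 3))   # near 1: no DIF built in
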
Date: August 1994
Creator: Kinard, Mary E.
System: The UNT Digital Library
A Comparison of IRT and Rasch Procedures in a Mixed-Item Format Test (open access)

This study investigated the effects of test length (10, 20 and 30 items), scoring schema (proportion of dichotomous and polytomous scoring) and item analysis model (IRT and Rasch) on the ability estimates, test information levels and optimization criteria of mixed item format tests. Polytomous item responses to 30 items for 1000 examinees were simulated using the generalized partial-credit model and SAS software. Portions of the data were re-coded dichotomously over 11 structured proportions to create 33 sets of test responses including mixed item format tests. MULTILOG software was used to calculate the examinee ability estimates, standard errors, item and test information, reliability and fit indices. A comparison of IRT and Rasch item analysis procedures was made using SPSS software across ability estimates and standard errors of ability estimates using a 3 x 11 x 2 fixed factorial ANOVA. Effect sizes and power were reported for each procedure. Scheffe post hoc procedures were conducted on significant factors. Test information was analyzed and compared across the range of ability levels for all 66 design combinations. The results indicated that both test length and the proportion of items scored polytomously had a significant impact on the amount of test information produced by mixed item …
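
Category probabilities under the generalized partial-credit model take the form P_x(theta) proportional to exp(sum over v <= x of a(theta - b_v)), with an empty sum for x = 0. A small sketch of simulating one polytomous item (hypothetical parameters; the study's dichotomous re-coding would simply threshold such responses):

import numpy as np

def gpcm_probs(theta, a, b_steps):
    """Generalized partial-credit category probabilities for one item:
    P_x proportional to exp(sum_{v<=x} a*(theta - b_v)), empty sum for x = 0."""
    steps = np.concatenate([[0.0], np.cumsum(a * (theta - np.asarray(b_steps)))])
    z = np.exp(steps - steps.max())               # stabilized softmax
    return z / z.sum()

rng = np.random.default_rng(11)
a, b_steps = 1.2, [-0.8, 0.2, 1.0]                # 4 response categories
theta = rng.normal(size=1000)
resp = np.array([rng.choice(4, p=gpcm_probs(t, a, b_steps)) for t in theta])
print(np.bincount(resp, minlength=4) / len(resp)) # category proportions
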
Date: August 2003
Creator: Kinsey, Tari L.
System: The UNT Digital Library