
58 Matching Results


Ability Estimation Under Different Item Parameterization and Scoring Models (open access)

A Monte Carlo simulation study investigated the effect of scoring format, item parameterization, threshold configuration, and prior ability distribution on the accuracy of ability estimation given various IRT models. Item response data on 30 items from 1,000 examinees was simulated using known item and ability parameters. The item response data sets were submitted to seven dichotomous or polytomous IRT models with different item parameterizations to estimate examinee ability. The accuracy of the ability estimation for a given IRT model was assessed by the recovery rate and the root mean square errors. The results indicated that polytomous models produced more accurate ability estimates than the dichotomous models, under all combinations of research conditions, as indicated by higher recovery rates and lower root mean square errors. For the item parameterization models, the one-parameter model out-performed the two-parameter and three-parameter models under all research conditions. Among the polytomous models, the partial credit model had more accurate ability estimation than the other three polytomous models. The nominal categories model performed better than the general partial credit model and the multiple-choice model, with the multiple-choice model the least accurate. The results further indicated that certain prior ability distributions had an effect on the accuracy …
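The accuracy criteria named above can be illustrated with a minimal sketch (hypothetical function names; the two-parameter logistic form stands in for the dichotomous models compared in the study):

```python
import math

def p_correct(theta, a, b):
    """Two-parameter logistic item response function: probability of a
    correct answer given ability theta, discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def rmse(true_thetas, est_thetas):
    """Root mean square error between true and estimated abilities,
    one of the accuracy criteria used in the study."""
    n = len(true_thetas)
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(true_thetas, est_thetas)) / n)
```

Lower RMSE between generating and recovered abilities is what "more accurate ability estimates" means operationally here.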
Date: May 2002
Creator: Si, Ching-Fung B.
System: The UNT Digital Library
The Analysis of the Accumulation of Type II Error in Multiple Comparisons for Specified Levels of Power to Violation of Normality with the Dunn-Bonferroni Procedure: a Monte Carlo Study (open access)

The study seeks to determine the degree of accumulation of Type II error rates, while violating the assumptions of normality, for different specified levels of power among sample means. The study employs a Monte Carlo simulation procedure with three different specified levels of power, methodologies, and population distributions. On the basis of the comparisons of actual and observed error rates, the following conclusions appear to be appropriate. 1. Under the strict criteria for evaluation of the hypotheses, Type II experimentwise error does accumulate at such a rate that the probability of accepting at least one null hypothesis in a family of tests, when in theory all of the alternate hypotheses are true, is high, precluding valid tests at the beginning of the study. 2. The Dunn-Bonferroni procedure of setting the critical value based on the beta value per contrast did not significantly reduce the probability of committing a Type II error in a family of tests. 3. The use of an adequate sample size and orthogonal contrasts, or limiting the number of pairwise comparisons to the number of means, is the best method to control for the accumulation of Type II errors. 4. The accumulation of Type II error is irrespective …
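The accumulation described in conclusion 1 can be sketched under an independence assumption (a simplification of the simulated conditions; function names are illustrative only):

```python
def bonferroni_alpha(family_alpha, m):
    """Dunn-Bonferroni per-contrast significance level for m contrasts."""
    return family_alpha / m

def familywise_type2(beta_per_test, m):
    """Probability of at least one Type II error across m tests,
    assuming independent tests with a common per-test beta."""
    return 1.0 - (1.0 - beta_per_test) ** m
```

With beta = .20 per test, ten tests already give a familywise Type II rate near .89, which is the kind of accumulation the study documents.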
Date: August 1989
Creator: Powers-Prather, Bonnie Ann
System: The UNT Digital Library
An Application of Ridge Regression to Educational Research (open access)

Behavioral data are frequently plagued with highly intercorrelated variables. Collinearity is an indication of insufficient information in the model or in the data. It, therefore, contributes to the unreliability of the estimated coefficients. One result of collinearity is that regression weights derived in one sample may lead to poor prediction in another model. One technique which was developed to deal with highly intercorrelated independent variables is ridge regression. It was first proposed by Hoerl and Kennard in 1970 as a method which would allow the data analyst to both stabilize his estimates and improve upon his squared error loss. The problem of this study was the application of ridge regression in the analysis of data resulting from educational research.
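Ridge regression stabilizes estimates by adding a constant k to the diagonal of X'X before inversion. A one-predictor sketch (hypothetical function name) makes the shrinkage mechanism visible:

```python
def ridge_slope(x, y, k):
    """Ridge slope for a single centered predictor:
    beta(k) = Sxy / (Sxx + k). k = 0 reproduces ordinary least squares;
    larger k shrinks the estimate toward zero."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / (sxx + k)
```

Trading a small amount of bias for a large reduction in variance is what "improve upon his squared error loss" refers to.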
Date: December 1980
Creator: Amos, Nancy Notley
System: The UNT Digital Library
Attenuation of the Squared Canonical Correlation Coefficient Under Varying Estimates of Score Reliability (open access)

Research pertaining to the distortion of the squared canonical correlation coefficient has traditionally been limited to the effects of sampling error and associated correction formulas. The purpose of this study was to compare the degree of attenuation of the squared canonical correlation coefficient under varying conditions of score reliability. Monte Carlo simulation methodology was used to fulfill the purpose of this study. Initially, data populations with various manipulated conditions were generated (N = 100,000). Subsequently, 500 random samples were drawn with replacement from each population, and data was subjected to canonical correlation analyses. The canonical correlation results were then analyzed using descriptive statistics and an ANOVA design to determine under which condition(s) the squared canonical correlation coefficient was most attenuated when compared to population Rc2 values. This information was analyzed and used to determine what effect, if any, the different conditions considered in this study had on Rc2. The results from this Monte Carlo investigation clearly illustrated the importance of score reliability when interpreting study results. As evidenced by the outcomes presented, the more measurement error (lower reliability) present in the variables included in an analysis, the more attenuation experienced by the effect size(s) produced in the analysis, in this …
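The mechanism under study is classical attenuation: an observed correlation is damped by the reliabilities of the scores. A bivariate sketch (the study's Rc2 case generalizes this principle to sets of variables):

```python
import math

def attenuated_r(true_r, rel_x, rel_y):
    """Classical attenuation formula: the observed correlation equals the
    true correlation scaled by the square roots of the score reliabilities."""
    return true_r * math.sqrt(rel_x * rel_y)
```

Lower reliability yields a smaller observed effect, mirroring the pattern reported above: more measurement error, more attenuation.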
Date: August 2010
Creator: Wilson, Celia M.
System: The UNT Digital Library
Bias and Precision of the Squared Canonical Correlation Coefficient under Nonnormal Data Conditions (open access)

This dissertation: (a) investigated the degree to which the squared canonical correlation coefficient is biased in multivariate nonnormal distributions and (b) identified formulae that adjust the squared canonical correlation coefficient (Rc2) such that it most closely approximates the true population effect under normal and nonnormal data conditions. Five conditions were manipulated in a fully-crossed design to determine the degree of bias associated with Rc2: distribution shape, variable sets, sample size to variable ratios, and within- and between-set correlations. Very few of the condition combinations produced acceptable amounts of bias in Rc2, but those that did were all found with first function results. The sample size to variable ratio (n:v) was determined to have the greatest impact on the bias associated with the Rc2 for the first, second, and third functions. The variable set condition also affected the accuracy of Rc2, but for the second and third functions only. The kurtosis levels of the marginal distributions (b2), and the between- and within-set correlations demonstrated little or no impact on the bias associated with Rc2. Therefore, it is recommended that researchers use n:v ratios of at least 10:1 in canonical analyses, although greater n:v ratios have the potential to produce even less bias. …
Date: August 2006
Creator: Leach, Lesley Ann Freeny
System: The UNT Digital Library
Boundary Conditions of Several Variables Relative to the Robustness of Analysis of Variance Under Violation of the Assumption of Homogeneity of Variances (open access)

The purpose of this study is to determine boundary conditions associated with the number of treatment groups (K), the common treatment group sample size (n), and an index of the extent to which the assumption of equality of treatment population variances is violated (Q) with regard to user confidence in application of the one-way analysis of variance F-test for determining equality of treatment population means. The study concludes that the analysis of variance F-test is robust when the number of treatment groups is less than seven and when the extreme ratio of variances is less than 1:5, but when the violation of the assumption is more severe or the number of treatment groups is seven or more, serious discrepancies between actual and nominal significance levels occur. It was also concluded that for seven treatment groups confidence in the application of the analysis of variance should be limited to the values of Q and n so that n is greater than or equal to 10 ln[(1/2)Q]. For nine treatment groups, it was concluded that confidence be limited to those values of Q and n so that n is greater than or equal to (-2/3) + 12 ln[(1/2)Q]. No definitive …
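The two boundary rules quoted above can be written directly (natural logarithm; Q is the index of the extreme variance ratio; function names are illustrative only):

```python
import math

def min_n_k7(Q):
    """Minimum common group size for K = 7 groups: n >= 10 ln[(1/2)Q]."""
    return 10.0 * math.log(0.5 * Q)

def min_n_k9(Q):
    """Minimum common group size for K = 9 groups: n >= -2/3 + 12 ln[(1/2)Q]."""
    return -2.0 / 3.0 + 12.0 * math.log(0.5 * Q)
```

As Q grows, the required sample size grows logarithmically, and the nine-group bound rises faster than the seven-group bound.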
Date: December 1977
Creator: Grizzle, Grady M.
System: The UNT Digital Library
The Characteristics and Properties of the Threshold and Squared-Error Criterion-Referenced Agreement Indices (open access)

Educators who use criterion-referenced measurement to ascertain the current level of performance of an examinee in order that the examinee may be classified as either a master or a nonmaster need to know the accuracy and consistency of their decisions regarding assignment of mastery states. This study examined the sampling distribution characteristics of two reliability indices that use the squared-error agreement function: Livingston's k^2(X,Tx) and Brennan and Kane's M(C). The sampling distribution characteristics of five indices that use the threshold agreement function were also examined: Subkoviak's Pc, Huynh's p and k, and Swaminathan's p and k. These seven methods of calculating reliability were also compared under varying conditions of sample size, test length, and criterion or cutoff score. Computer-generated data provided randomly parallel test forms for N = 2000 cases. From this, 1000 samples were drawn, with replacement, and each of the seven reliability indices was calculated. Descriptive statistics were collected for each sample set and examined for distribution characteristics. In addition, the mean value for each index was compared to the population parameter value of consistent mastery/nonmastery classifications. The results indicated that the sampling distribution characteristics of all seven reliability indices approach normal characteristics with increased sample size. The …
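The threshold agreement function underlying several of these indices counts identical mastery/nonmastery classifications across parallel forms. A minimal sketch of observed agreement and its chance correction (illustrative function names):

```python
def p_observed(class_a, class_b):
    """Proportion of examinees given the same mastery classification
    (1 = master, 0 = nonmaster) on two randomly parallel forms."""
    return sum(1 for a, b in zip(class_a, class_b) if a == b) / len(class_a)

def kappa(p_o, p_chance):
    """Chance-corrected agreement: k = (p_o - p_c) / (1 - p_c)."""
    return (p_o - p_chance) / (1.0 - p_chance)
```

The p-style indices report raw agreement; the k-style indices report agreement beyond what chance classification would produce.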
Date: May 1988
Creator: Dutschke, Cynthia F. (Cynthia Fleming)
System: The UNT Digital Library
A Comparison of IRT and Rasch Procedures in a Mixed-Item Format Test (open access)

This study investigated the effects of test length (10, 20 and 30 items), scoring schema (proportion of dichotomous and polytomous scoring) and item analysis model (IRT and Rasch) on the ability estimates, test information levels and optimization criteria of mixed item format tests. Polytomous item responses to 30 items for 1000 examinees were simulated using the generalized partial-credit model and SAS software. Portions of the data were re-coded dichotomously over 11 structured proportions to create 33 sets of test responses including mixed item format tests. MULTILOG software was used to calculate the examinee ability estimates, standard errors, item and test information, reliability and fit indices. A comparison of IRT and Rasch item analysis procedures was made using SPSS software across ability estimates and standard errors of ability estimates using a 3 x 11 x 2 fixed factorial ANOVA. Effect sizes and power were reported for each procedure. Scheffe post hoc procedures were conducted on significant factors. Test information was analyzed and compared across the range of ability levels for all 66 design combinations. The results indicated that both test length and the proportion of items scored polytomously had a significant impact on the amount of test information produced by mixed item …
Date: August 2003
Creator: Kinsey, Tari L.
System: The UNT Digital Library
Comparison of Methods for Computation and Cumulation of Effect Sizes in Meta-Analysis (open access)

This study examined the statistical consequences of employing various methods of computing and cumulating effect sizes in meta-analysis. Six methods of computing effect size, and three techniques for combining study outcomes, were compared. Effect size metrics were calculated with one-group and pooled standardizing denominators, corrected for bias and for unreliability of measurement, and weighted by sample size and by sample variance. Cumulating techniques employed as units of analysis the effect size, the study, and an average study effect. In order to determine whether outcomes might vary with the size of the meta-analysis, mean effect sizes were also compared for two smaller subsets of studies. An existing meta-analysis of 60 studies examining the effectiveness of computer-based instruction was used as a data base for this investigation. Recomputation of the original study data under the six different effect size formulas showed no significant difference among the metrics. Maintaining the independence of the data by using only one effect size per study, whether a single or averaged effect, produced a higher mean effect size than averaging all effect sizes together, although the difference did not reach statistical significance. The sampling distribution of effect size means approached that of the population of 60 studies …
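The contrast between one-group and pooled standardizing denominators is the core computational difference among the metrics compared. A sketch of the standardized mean difference under both choices (illustrative function name):

```python
import math

def effect_size(mean_t, mean_c, sd_t, sd_c, n_t, n_c, pooled=True):
    """Standardized mean difference between treatment and control.
    With pooled=True the denominator is the pooled standard deviation;
    otherwise the control-group SD alone (the one-group standardizer)."""
    if pooled:
        s = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2)
                      / (n_t + n_c - 2))
    else:
        s = sd_c
    return (mean_t - mean_c) / s
```

When group variances differ, the two standardizers diverge, which is one source of the metric differences the study examined.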
Date: December 1987
Creator: Ronco, Sharron L. (Sharron Lee)
System: The UNT Digital Library
A Comparison of Some Continuity Corrections for the Chi-Squared Test in 3 x 3, 3 x 4, and 3 x 5 Tables (open access)

This study was designed to determine whether chi-squared based tests for independence give reliable estimates (as compared to the exact values provided by Fisher's exact probabilities test) of the probability of a relationship between the variables in 3 X 3, 3 X 4, and 3 X 5 contingency tables when the sample size is 10, 20, or 30. In addition to the classical (uncorrected) chi-squared test, four methods for continuity correction were compared to Fisher's exact probabilities test. The four methods were Yates' correction, two corrections attributed to Cochran, and Mantel's correction. The study was modeled after a similar comparison conducted on 2 X 2 contingency tables and published by Michael Haber.
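Yates' correction, the simplest of the four corrections compared, reduces each |O - E| deviation by 0.5 before squaring. A sketch for an r x c table (illustrative function name; the other three corrections differ in how they adjust the deviations):

```python
def chi_squared(table, yates=False):
    """Pearson chi-squared statistic for an r x c contingency table,
    optionally applying Yates' continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            dev = abs(observed - expected)
            if yates:
                dev = max(dev - 0.5, 0.0)  # continuity correction
            stat += dev * dev / expected
    return stat
```

The correction always lowers the statistic, making the corrected test more conservative than the classical one.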
Date: May 1987
Creator: Mullen, Jerry D. (Jerry Davis)
System: The UNT Digital Library
A comparison of the Effects of Different Sizes of Ceiling Rules on the Estimates of Reliability of a Mathematics Achievement Test (open access)

This study compared the estimates of reliability made using one, two, three, four, five, and unlimited consecutive failures as ceiling rules in scoring a mathematics achievement test which is part of the Iowa Tests of Basic Skills (ITBS), Form 8. There were 700 students randomly selected from a population (N=2640) of students enrolled in the eight grades in a large urban school district in the southwestern United States. These 700 students were randomly divided into seven subgroups so that each subgroup had 100 students. The responses of all those students to three subtests of the mathematics achievement battery, which included mathematical concepts (44 items), problem solving (32 items), and computation (45 items), were analyzed to obtain the item difficulties and a total score for each student. The items in each subtest then were rearranged based on the item difficulties from the highest to the lowest value. In each subgroup, the methods using one, two, three, four, five, and unlimited consecutive failures as the ceiling rules were applied to score the individual responses. The total score for each individual was the sum of the correct responses prior to the point described by the ceiling rule. The correct responses after the ceiling …
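The ceiling-rule scoring described above (credit correct responses until a fixed run of consecutive failures on difficulty-ordered items) can be sketched as:

```python
def ceiling_score(responses, max_run):
    """Number of correct responses credited before the ceiling: scoring
    stops once max_run consecutive failures occur. Items are assumed
    ordered from easiest to hardest; responses are 1 (correct) or
    0 (incorrect). max_run = len(responses) approximates the
    'unlimited consecutive failures' rule."""
    score, run = 0, 0
    for r in responses:
        if r:
            score += 1
            run = 0  # a correct answer resets the failure run
        else:
            run += 1
            if run == max_run:
                break
    return score
```

Stricter rules (smaller max_run) truncate more responses, which is what drives the differences among the reliability estimates compared.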
Date: May 1987
Creator: Somboon Suriyawongse
System: The UNT Digital Library
A Comparison of Three Correlational Procedures for Factor-Analyzing Dichotomously-Scored Item Response Data (open access)

In this study, an improved correlational procedure for factor-analyzing dichotomously-scored item response data is described and tested. The procedure involves (a) replacing the dichotomous input values with continuous probability values obtained through Rasch analysis; (b) calculating interitem product-moment correlations among the probabilities; and (c) subjecting the correlations to unweighted least-squares factor analysis. Two simulated data sets and an empirical data set (Kentucky Comprehensive Listening Test responses) were used to compare the new procedure with two more traditional techniques, using (a) phi and (b) tetrachoric correlations calculated directly from the dichotomous item-response values. The three methods were compared on three criterion measures: (a) maximum internal correlation; (b) product of the two largest factor loadings; and (c) proportion of variance accounted for. The Rasch-based procedure is recommended for subjecting dichotomous item response data to latent-variable analysis.
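Of the two traditional baselines, the phi coefficient is the product-moment correlation applied directly to 0/1 responses. A sketch from the 2 x 2 joint frequencies (illustrative function name):

```python
import math

def phi(x, y):
    """Phi coefficient between two dichotomously scored items,
    computed from the 2 x 2 table of joint response frequencies."""
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)          # both correct
    b = sum(1 for xi, yi in zip(x, y) if xi and not yi)
    c = sum(1 for xi, yi in zip(x, y) if not xi and yi)
    d = sum(1 for xi, yi in zip(x, y) if not xi and not yi)  # both incorrect
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0
```

The Rasch-based procedure instead correlates continuous response probabilities, avoiding the range restriction that phi suffers when item difficulties differ.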
Date: May 1991
Creator: Fluke, Ricky
System: The UNT Digital Library
A Comparison of Three Criteria Employed in the Selection of Regression Models Using Simulated and Real Data (open access)

Researchers who make predictions from educational data are interested in choosing the best regression model possible. Many criteria have been devised for choosing a full or restricted model, and also for selecting the best subset from an all-possible-subsets regression. The relative practical usefulness of three of the criteria used in selecting a regression model was compared in this study: (a) Mallows' C_p, (b) Amemiya's prediction criterion, and (c) Hagerty and Srinivasan's method involving predictive power. Target correlation matrices with 10,000 cases were simulated so that the matrices had varying degrees of effect sizes. The amount of power for each matrix was calculated after one or two predictors were dropped from the full regression model, for sample sizes ranging from n = 25 to n = 150. Also, the null case, when one predictor was uncorrelated with the other predictors, was considered. In addition, comparisons for regression models selected using C_p and prediction criterion were performed using data from the National Educational Longitudinal Study of 1988.
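Of the three criteria, Mallows' C_p has the simplest closed form; a subset model that fits about as well as the full model yields C_p near p. A sketch (illustrative function name):

```python
def mallows_cp(sse_subset, mse_full, n, p):
    """Mallows' C_p for a subset model with p parameters (including
    the intercept): C_p = SSE_p / MSE_full - (n - 2p).
    Values close to p suggest little bias from the dropped predictors."""
    return sse_subset / mse_full - (n - 2 * p)
```

Dropping a predictor that carries real information inflates SSE_p and pushes C_p well above p, flagging the restricted model.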
Date: December 1994
Creator: Graham, D. Scott
System: The UNT Digital Library
A Comparison of Three Item Selection Methods in Criterion-Referenced Tests (open access)

This study compared three methods of selecting the best discriminating test items and the resultant test reliability of mastery/nonmastery classifications. These three methods were (a) the agreement approach, (b) the phi coefficient approach, and (c) the random selection approach. Test responses from 1,836 students on a 50-item physical science test were used, from which 90 distinct data sets were generated for analysis. These 90 data sets contained 10 replications of the combination of three different sample sizes (75, 150, and 300) and three different numbers of test items (15, 25, and 35). The results of this study indicated that the agreement approach was an appropriate method to be used for selecting criterion-referenced test items at the classroom level, while the phi coefficient approach was an appropriate method to be used at the district and/or state levels. The random selection method did not have similar characteristics in selecting test items and produced the lowest reliabilities, when compared with the agreement and the phi coefficient approaches.
Date: August 1988
Creator: Lin, Hui-Fen
System: The UNT Digital Library
A Comparison of Three Methods of Detecting Test Item Bias (open access)

This study compared three methods of detecting test item bias, the chi-square approach, the transformed item difficulties approach, and the Linn-Harnish three-parameter item response approach which is the only Item Response Theory (IRT) method that can be utilized with minority samples relatively small in size. The items on two tests which measured writing and reading skills were examined for evidence of sex and ethnic bias. Eight sets of samples, four from each test, were randomly selected from the population (N=7287) of sixth, seventh, and eighth grade students enrolled in a large, urban school district in the southwestern United States. Each set of samples, male/female, White/Hispanic, White/Black, and White/White, contained 800 examinees in the majority group and 200 in the minority group. In an attempt to control differences in ability that may have existed between the various population groups, examinees with scores greater or less than two standard deviations from their group's mean were eliminated. Ethnic samples contained equal numbers of each sex. The White/White sets of samples were utilized to provide baseline bias estimates because the tests could not logically be biased against these groups. Bias indices were then calculated for each set of samples with each of the three …
Date: May 1985
Creator: Monaco, Linda Gokey
System: The UNT Digital Library
A comparison of traditional and IRT factor analysis. (open access)

This study investigated the item parameter recovery of two methods of factor analysis. The methods researched were a traditional factor analysis of tetrachoric correlation coefficients and an IRT approach to factor analysis which utilizes marginal maximum likelihood estimation using an EM algorithm (MMLE-EM). Dichotomous item response data was generated under the 2-parameter normal ogive model (2PNOM) using PARDSIM software. Examinee abilities were sampled from both the standard normal and uniform distributions. True item discrimination, a, was normal with a mean of .75 and a standard deviation of .10. True b, item difficulty, was specified as uniform [-2, 2]. The two distributions of abilities were completely crossed with three test lengths (n= 30, 60, and 100) and three sample sizes (N = 50, 500, and 1000). Each of the 18 conditions was replicated 5 times, resulting in 90 datasets. PRELIS software was used to conduct a traditional factor analysis on the tetrachoric correlations. The IRT approach to factor analysis was conducted using BILOG 3 software. Parameter recovery was evaluated in terms of root mean square error, average signed bias, and Pearson correlations between estimated and true item parameters. ANOVAs were conducted to identify systematic differences in error indices. Based on many …
Date: December 2004
Creator: Kay, Cheryl Ann
System: The UNT Digital Library
A Comparison of Traditional Norming and Rasch Quick Norming Methods (open access)

The simplicity and ease of use of the Rasch procedure are decided advantages. The test user needs only two numbers: the frequency of persons who answered each item correctly and the Rasch-calibrated item difficulty, usually a part of an existing item bank. Norms can be computed quickly for any specific group of interest. In addition, once the selected items from the calibrated bank are normed, any test, built from the item bank, is automatically norm-referenced. Thus, it was concluded that the Rasch quick norm procedure is a meaningful alternative to traditional classical true score norming for test users who desire normative data.
Date: August 1993
Creator: Bush, Joan Spooner
System: The UNT Digital Library
A Comparison of Two Criterion-Referenced Item-Selection Techniques Utilizing Simulated Data with Item Pools that Vary in Degrees of Item Difficulty (open access)

The problem of this study was to examine the equivalency of two different types of criterion-referenced item-selection techniques on simulated data as item pools varied in degrees of item difficulty. A pretest-posttest design was employed in which pass-fail scores were randomly generated for item pools of twenty-five items. From the item pools, the two techniques determined which items were to be used to make up twelve-item criterion-referenced tests. The twenty-five items also were rank ordered according to the discrimination power of the two techniques.
Date: May 1974
Creator: Davis, Robbie G.
System: The UNT Digital Library
A Comparison of Two Differential Item Functioning Detection Methods: Logistic Regression and an Analysis of Variance Approach Using Rasch Estimation (open access)

Differential item functioning (DIF) detection rates were examined for the logistic regression and analysis of variance (ANOVA) DIF detection methods. The methods were applied to simulated data sets of varying test length (20, 40, and 60 items) and sample size (200, 400, and 600 examinees) for both equal and unequal underlying ability between groups as well as for both fixed and varying item discrimination parameters. Each test contained 5% uniform DIF items, 5% non-uniform DIF items, and 5% combination DIF (simultaneous uniform and non-uniform DIF) items. The factors were completely crossed, and each experiment was replicated 100 times. For both methods and all DIF types, a test length of 20 was sufficient for satisfactory DIF detection. The detection rate increased significantly with sample size for each method. With the ANOVA DIF method and uniform DIF, there was a difference in detection rates between discrimination parameter types, which favored varying discrimination and decreased with increased sample size. The detection rate of non-uniform DIF using the ANOVA DIF method was higher with fixed discrimination parameters than with varying discrimination parameters when relative underlying ability was unequal. In the combination DIF case, there was a three-way interaction among the experimental factors discrimination type, …
Date: August 1995
Creator: Whitmore, Marjorie Lee Threet
System: The UNT Digital Library
Comparisons of Improvement-Over-Chance Effect Sizes for Two Groups Under Variance Heterogeneity and Prior Probabilities (open access)

The distributional properties of improvement-over-chance, I, effect sizes derived from linear and quadratic predictive discriminant analysis (PDA) and from logistic regression analysis (LRA) for the two-group univariate classification were examined. Data were generated under varying levels of four data conditions: population separation, variance pattern, sample size, and prior probabilities. None of the indices provided acceptable estimates of effect for all the conditions examined. There were only a small number of conditions under which both accuracy and precision were acceptable. The results indicate that the decision of which method to choose is primarily determined by variance pattern and prior probabilities. Under variance homogeneity, any of the methods may be recommended. However, LRA is recommended when priors are equal or extreme and linear PDA is recommended when priors are moderate. Under variance heterogeneity, selecting a recommended method is more complex. In many cases, more than one method could be used appropriately.
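The I index compared above is the proportional reduction in classification error relative to chance (the form commonly attributed to Huberty; the function name is illustrative):

```python
def improvement_over_chance(hit_rate, chance_rate):
    """I = (observed hit rate - chance hit rate) / (1 - chance hit rate):
    the proportion of classification errors eliminated beyond what
    chance assignment alone would achieve."""
    return (hit_rate - chance_rate) / (1.0 - chance_rate)
```

Because the chance rate depends on the prior probabilities, the index is sensitive to the priors condition, consistent with the findings reported above.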
Date: May 2003
Creator: Alexander, Erika D.
System: The UNT Digital Library
Construct Validation and Measurement Invariance of the Athletic Coping Skills Inventory for Educational Settings (open access)

The present study examined the factor structure and measurement invariance of the revised version of the Athletic Coping Skills Inventory (ACSI-28), following adjustment of the wording of items such that they were appropriate to assess coping skills in an educational setting. A sample of middle school students (n = 1,037) completed the revised inventory. An initial confirmatory factor analysis led to the hypothesis of a better fitting model with two items removed. Reliability of the subscales and the instrument as a whole was acceptable. Items were examined for sex invariance with differential item functioning (DIF) using item response theory, and five items were flagged for significant sex non-invariance. Following removal of these items, comparison of the mean differences between male and female coping scores revealed that there was no significant difference between the two groups. Further examination of the generalizability of the coping construct and the potential transfer of psychosocial skills between athletic and academic settings are warranted.
Date: May 2017
Creator: Sanguras, Laila Y., 1977-
System: The UNT Digital Library
Convergent Validity of Variables Residualized By a Single Covariate: the Role of Correlated Error in Populations and Samples (open access)

This study examined the bias and precision of four residualized variable validity estimates (C0, C1, C2, C3) across a number of study conditions. Validity estimates that considered measurement error, correlations among error scores, and correlations between error scores and true scores (C3) performed the best, yielding no estimates that were practically significantly different than their respective population parameters, across study conditions. Validity estimates that considered measurement error and correlations among error scores (C2) did a good job in yielding unbiased, valid, and precise results. Only in a select number of study conditions were C2 estimates unable to be computed or produced results that had sufficient variance to affect interpretation of results. Validity estimates based on observed scores (C0) fared well in producing valid, precise, and unbiased results. Validity estimates based on observed scores that were only corrected for measurement error (C1) performed the worst. Not only did C1 fail to produce reliable estimates even when the level of modeled correlated error was low, it also produced values higher than the theoretical limit of 1.0 across a number of study conditions. Estimates based on C1 also produced the greatest number of conditions that were practically significantly different than their population parameters.
Date: May 2013
Creator: Nimon, Kim
System: The UNT Digital Library
Cross Categorical Scoring: An Approach to Treating Sociometric Data (open access)

The purpose of this study was to use a cross categorical scoring method for sociometric data focusing upon those individuals who have made the selections. A cross category selection was defined as choosing an individual on a sociometric instrument who was not within one's own classification. The classifications used for this study were sex, race, and perceived achievement level. A cross category score was obtained by summing the number of cross category selections. The conclusions below are the result of this study. Cross categorical scoring provides a useful method of scoring sociometric data. This method successfully focuses on those individuals who make sociometric choices rather than those who receive them. Each category utilized provides a unique contribution. The categories used in this study were sex, race, and achievement level. These are, however, only reflective of any number of variables which could be used. The categories must be chosen to reflect the needs of the particular study in which they are included. Multiple linear regression analysis can be used in order to provide the researcher with enough scope to handle numerous nominal and ordinal independent variables simultaneously. The sociometric criterion or question does make a difference in the results on cross …
Date: December 1977
Creator: Ernst, Nora Wilford
System: The UNT Digital Library
Determination of the Optimal Number of Strata for Bias Reduction in Propensity Score Matching. (open access)

Previous research implementing stratification on the propensity score has generally relied on using five strata, based on prior theoretical groundwork and minimal empirical evidence as to the suitability of quintiles to adequately reduce bias in all cases and across all sample sizes. This study investigates bias reduction across varying number of strata and sample sizes via a large-scale simulation to determine the adequacy of quintiles for bias reduction under all conditions. Sample sizes ranged from 100 to 50,000 and strata from 3 to 20. Both the percentage of bias reduction and the standardized selection bias were examined. The results show that while the particular covariates in the simulation met certain criteria with five strata that greater bias reduction could be achieved by increasing the number of strata, especially with larger sample sizes. Simulation code written in R is included.
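Stratification on the propensity score assigns units to roughly equal-sized quantile groups (quintiles correspond to n_strata = 5). A rank-based sketch of the assignment step (illustrative function name; the study's simulation code was written in R):

```python
def stratify(scores, n_strata):
    """Assign each unit a stratum index (0 .. n_strata-1) based on the
    rank of its estimated propensity score, producing near-equal-sized
    strata. Treatment/control outcomes are then compared within strata."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, idx in enumerate(order):
        strata[idx] = min(rank * n_strata // len(scores), n_strata - 1)
    return strata
```

More strata mean more homogeneous propensity scores within each stratum, which is why bias reduction improved beyond five strata at larger sample sizes.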
Date: May 2010
Creator: Akers, Allen
System: The UNT Digital Library