Outliers and Regression Models (open access)

The mitigation of outliers serves to increase the strength of a relationship between variables. This study defined outliers in three different ways and used five regression procedures to describe the effects of outliers on 50 data sets. This study also examined the relationship among the shape of the distribution, skewness, and outliers.
Date: May 1992
Creator: Mitchell, Napoleon
System: The UNT Digital Library
Determination of the Optimal Number of Strata for Bias Reduction in Propensity Score Matching. (open access)

Previous research implementing stratification on the propensity score has generally relied on using five strata, based on prior theoretical groundwork and minimal empirical evidence as to the suitability of quintiles to adequately reduce bias in all cases and across all sample sizes. This study investigates bias reduction across varying numbers of strata and sample sizes via a large-scale simulation to determine the adequacy of quintiles for bias reduction under all conditions. Sample sizes ranged from 100 to 50,000 and strata from 3 to 20. Both the percentage of bias reduction and the standardized selection bias were examined. The results show that, while the particular covariates in the simulation met certain criteria with five strata, greater bias reduction could be achieved by increasing the number of strata, especially with larger sample sizes. Simulation code written in R is included.
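The stratification-and-bias-reduction computation can be sketched in Python (the dissertation's included code is in R; everything below, including the simulated confounded covariate, is a hypothetical illustration rather than the study's design):

```python
import math
import random

random.seed(0)

def quantile_strata(scores, n_strata):
    """Assign each propensity score to a quantile-based stratum (0..n_strata-1)."""
    sorted_s = sorted(scores)
    cuts = [sorted_s[int(len(scores) * k / n_strata)] for k in range(1, n_strata)]
    def stratum(s):
        for i, c in enumerate(cuts):
            if s < c:
                return i
        return n_strata - 1
    return [stratum(s) for s in scores]

def percent_bias_reduction(x, treat, ps, n_strata):
    """Percent reduction in the treated-control covariate mean difference,
    weighting within-stratum differences by stratum size."""
    mean = lambda v: sum(v) / len(v)
    raw = mean([xi for xi, t in zip(x, treat) if t]) - \
          mean([xi for xi, t in zip(x, treat) if not t])
    strata = quantile_strata(ps, n_strata)
    diffs, weights = [], []
    for s in range(n_strata):
        xt = [xi for xi, t, st in zip(x, treat, strata) if t and st == s]
        xc = [xi for xi, t, st in zip(x, treat, strata) if (not t) and st == s]
        if xt and xc:  # skip strata with only one group
            diffs.append(mean(xt) - mean(xc))
            weights.append(len(xt) + len(xc))
    adjusted = sum(d * w for d, w in zip(diffs, weights)) / sum(weights)
    return 100 * (1 - abs(adjusted) / abs(raw))

# simulated confounding: treatment probability rises with the covariate x
n = 4000
x = [random.gauss(0, 1) for _ in range(n)]
ps = [1 / (1 + math.exp(-xi)) for xi in x]
treat = [random.random() < p for p in ps]
pbr5 = percent_bias_reduction(x, treat, ps, 5)
pbr20 = percent_bias_reduction(x, treat, ps, 20)
```

With a strongly confounded covariate, quintiles already remove most of the raw bias; pushing `n_strata` higher trims the remainder, which is the pattern the simulation examines at scale.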
Date: May 2010
Creator: Akers, Allen
System: The UNT Digital Library
The Generalization of the Logistic Discriminant Function Analysis and Mantel Score Test Procedures to Detection of Differential Testlet Functioning (open access)

Two procedures for detection of differential item functioning (DIF) for polytomous items were generalized to detection of differential testlet functioning (DTLF). The methods compared were the logistic discriminant function analysis procedure for uniform and non-uniform DTLF (LDFA-U and LDFA-N), and the Mantel score test procedure. Further analysis included comparison of results of DTLF analysis using the Mantel procedure with DIF analysis of individual testlet items using the Mantel-Haenszel (MH) procedure. Over 600 chi-squares were analyzed and compared for rejection of null hypotheses. Samples of 500, 1,000, and 2,000 were drawn by gender subgroups from the NELS:88 data set, which contains demographic and test data from over 25,000 eighth graders. Three types of testlets (totalling 29) from the NELS:88 test were analyzed for DTLF. The first type, the common passage testlet, followed the conventional testlet definition: items grouped together by a common reading passage, figure, or graph. The other two types were based upon common content and common process, as outlined in the NELS test specification.
Date: August 1994
Creator: Kinard, Mary E.
System: The UNT Digital Library
The Effect of Psychometric Parallelism among Predictors on the Efficiency of Equal Weights and Least Squares Weights in Multiple Regression (open access)

There are several conditions for applying equal weights as an alternative to least squares weights. Psychometric parallelism, one of the conditions, has been suggested as a necessary and sufficient condition for equal-weights aggregation. The purpose of this study is to investigate the effect of psychometric parallelism among predictors on the efficiency of equal weights and least squares weights. Target correlation matrices with 10,000 cases were simulated so that the matrices had varying degrees of psychometric parallelism. Five hundred samples were drawn from each population at each of six observation-to-predictor ratios (5/1, 10/1, 20/1, 30/1, 40/1, and 50/1). The efficiency is interpreted as the accuracy and the predictive power estimated by the weighting methods. The accuracy is defined by the deviation between the population R² and the sample R². The predictive power is indexed by the population cross-validated R² and the population mean square error of prediction. The findings indicate there is no statistically significant relationship between the level of psychometric parallelism and the accuracy of least squares weights. In contrast, the correlation between the level of psychometric parallelism and the accuracy of equal weights is significantly negative. Under different conditions, the minimum p value of χ² …
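A toy version of the comparison can be sketched in Python (hypothetical parameters throughout; the study's target matrices and sample-size ratios are not reproduced here). Two predictors built from a common factor are approximately psychometrically parallel, so equal weights should lose little relative to least squares:

```python
import math
import random
import statistics

random.seed(0)

def corr(a, b):
    """Pearson correlation."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

def standardize(v):
    m, s = statistics.fmean(v), statistics.pstdev(v)
    return [(x - m) / s for x in v]

# two predictors sharing a common factor are (approximately) parallel
n = 1000
f = [random.gauss(0, 1) for _ in range(n)]
x1 = [fi + 0.3 * random.gauss(0, 1) for fi in f]
x2 = [fi + 0.3 * random.gauss(0, 1) for fi in f]
y = [fi + 0.5 * random.gauss(0, 1) for fi in f]

r1, r2, r12 = corr(x1, y), corr(x2, y), corr(x1, x2)

# least squares R^2 for two predictors (standard closed form)
r2_ls = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)

# equal weights: standardize, sum, correlate the composite with the criterion
composite = [a + b for a, b in zip(standardize(x1), standardize(x2))]
r2_eq = corr(composite, y) ** 2
```

Because least squares maximizes the sample R² over all linear combinations, `r2_eq` can never exceed `r2_ls`; the degree of parallelism governs how close it comes.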
Date: May 1996
Creator: Zhang, Desheng
System: The UNT Digital Library
Measurement Disturbance Effects on Rasch Fit Statistics and the Logit Residual Index (open access)

The effects of random guessing as a measurement disturbance on Rasch fit statistics (unweighted total, weighted total, and unweighted ability between) and the Logit Residual Index (LRI) were examined through simulated data sets of varying sample sizes, test lengths, and distribution types. Three test lengths (25, 50, and 100), three sample sizes (25, 50, and 100), two item difficulty distributions (normal and uniform), and three levels of guessing (no guessing (0%), 25%, and 50%) were used in the simulations, resulting in 54 experimental conditions. The mean logit person ability for each experiment was +1. Each experimental condition was simulated once in an effort to approximate what could happen on the single administration of a four option per item multiple choice test to a group of relatively high ability persons. Previous research has shown that varying item and person parameters have no effect on Rasch fit statistics. Consequently, these parameters were used in the present study to establish realistic test conditions, but were not interpreted as effect factors in determining the results of this study.
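The guessing disturbance can be sketched as a mixture process (a minimal sketch with hypothetical parameters, not the study's 54 conditions):

```python
import math
import random

random.seed(1)

def rasch_p(theta, b):
    """Rasch (1PL) probability of a correct response."""
    return 1 / (1 + math.exp(-(theta - b)))

def simulate_responses(thetas, difficulties, guess_rate):
    """With probability guess_rate a response is a blind guess on a
    four-option item (P(correct) = .25); otherwise it follows the Rasch model."""
    data = []
    for theta in thetas:
        row = []
        for b in difficulties:
            p = 0.25 if random.random() < guess_rate else rasch_p(theta, b)
            row.append(1 if random.random() < p else 0)
        data.append(row)
    return data

# high-ability group (mean logit ability +1, as in the study) on 50 items
thetas = [random.gauss(1.0, 1.0) for _ in range(200)]
difficulties = [random.gauss(0.0, 1.0) for _ in range(50)]

def mean_score(data):
    return sum(map(sum, data)) / (len(data) * len(data[0]))

clean = mean_score(simulate_responses(thetas, difficulties, 0.0))
noisy = mean_score(simulate_responses(thetas, difficulties, 0.5))
```

For a high-ability group, replacing model-based responses with blind four-option guesses pulls scores down; that departure from the model is the disturbance whose signature the fit statistics and the LRI are meant to detect.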
Date: August 1997
Creator: Mount, Robert E. (Robert Earl)
System: The UNT Digital Library
Convergent Validity of Variables Residualized By a Single Covariate: the Role of Correlated Error in Populations and Samples (open access)

This study examined the bias and precision of four residualized variable validity estimates (C0, C1, C2, C3) across a number of study conditions. Validity estimates that considered measurement error, correlations among error scores, and correlations between error scores and true scores (C3) performed the best, yielding no estimates that were practically significantly different from their respective population parameters across study conditions. Validity estimates that considered measurement error and correlations among error scores (C2) did a good job of yielding unbiased, valid, and precise results. Only in a select number of study conditions were C2 estimates either unable to be computed or variable enough to affect the interpretation of results. Validity estimates based on observed scores (C0) fared well in producing valid, precise, and unbiased results. Validity estimates based on observed scores that were corrected only for measurement error (C1) performed the worst. Not only did C1 fail to reliably produce estimates even when the level of modeled correlated error was low, it also produced values higher than the theoretical limit of 1.0 across a number of study conditions. Estimates based on C1 also produced the greatest number of conditions that were practically significantly different from their population parameters.
Date: May 2013
Creator: Nimon, Kim
System: The UNT Digital Library
Ability Estimation Under Different Item Parameterization and Scoring Models (open access)

A Monte Carlo simulation study investigated the effect of scoring format, item parameterization, threshold configuration, and prior ability distribution on the accuracy of ability estimation given various IRT models. Item response data on 30 items from 1,000 examinees were simulated using known item parameters and ability estimates. The item response data sets were submitted to seven dichotomous or polytomous IRT models with different item parameterizations to estimate examinee ability. The accuracy of the ability estimation for a given IRT model was assessed by the recovery rate and the root mean square errors. The results indicated that polytomous models produced more accurate ability estimates than the dichotomous models, under all combinations of research conditions, as indicated by higher recovery rates and lower root mean square errors. For the item parameterization models, the one-parameter model out-performed the two-parameter and three-parameter models under all research conditions. Among the polytomous models, the partial credit model had more accurate ability estimation than the other three polytomous models. The nominal categories model performed better than the general partial credit model and the multiple-choice model, with the multiple-choice model being the least accurate. The results further indicated that certain prior ability distributions had an effect on the accuracy …
Date: May 2002
Creator: Si, Ching-Fung B.
System: The UNT Digital Library
Establishing the utility of a classroom effectiveness index as a teacher accountability system. (open access)

How to identify effective teachers who improve student achievement despite diverse student populations and school contexts is an ongoing discussion in public education. The need to show communities and parents how well teachers and schools improve student learning has led districts and states to seek a fair, equitable, and valid measure of student growth using student achievement. This study investigated a two-stage hierarchical model for estimating teacher effect on student achievement. This measure was entitled a Classroom Effectiveness Index (CEI). Consistency of this model over time, outlier influences in individual CEIs, variance among CEIs across four years, and correlations of second stage student residuals with first stage student residuals were analyzed. The statistical analysis used four years of student residual data from a state-mandated mathematics assessment (n=7086) and a state-mandated reading assessment (n=7572) aggregated by teacher. The study identified the following results. Four years of district grand slopes and grand intercepts were analyzed to show consistent results over time. Repeated measures analyses of grand slopes and intercepts in mathematics were statistically significant at the .01 level. Repeated measures analyses of grand slopes and intercepts in reading were not statistically significant. The analyses indicated consistent results over time for reading …
Date: May 2002
Creator: Bembry, Karen L.
System: The UNT Digital Library
Attenuation of the Squared Canonical Correlation Coefficient Under Varying Estimates of Score Reliability (open access)

Research pertaining to the distortion of the squared canonical correlation coefficient has traditionally been limited to the effects of sampling error and associated correction formulas. The purpose of this study was to compare the degree of attenuation of the squared canonical correlation coefficient under varying conditions of score reliability. Monte Carlo simulation methodology was used to fulfill the purpose of this study. Initially, data populations with various manipulated conditions were generated (N = 100,000). Subsequently, 500 random samples were drawn with replacement from each population, and the data were subjected to canonical correlation analyses. The canonical correlation results were then analyzed using descriptive statistics and an ANOVA design to determine under which condition(s) the squared canonical correlation coefficient was most attenuated when compared to population Rc2 values. This information was analyzed and used to determine what effect, if any, the different conditions considered in this study had on Rc2. The results from this Monte Carlo investigation clearly illustrated the importance of score reliability when interpreting study results. As evidenced by the outcomes presented, the more measurement error (lower reliability) present in the variables included in an analysis, the more attenuation experienced by the effect size(s) produced in the analysis, in this …
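The classical result that motivates the design is the bivariate attenuation formula (textbook classical test theory, not the study's canonical-correlation simulation):

```python
import math

def attenuated_r(r_true, rel_x, rel_y):
    """Observed-score correlation under classical test theory:
    r_observed = r_true * sqrt(rel_x * rel_y), where rel_x and rel_y
    are the score reliabilities of the two measures."""
    return r_true * math.sqrt(rel_x * rel_y)
```

Squaring both sides shows why lower score reliability shrinks squared effect sizes such as Rc2: a true correlation of .60 measured with reliabilities of .80 and .90 is observed at about .51.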
Date: August 2010
Creator: Wilson, Celia M.
System: The UNT Digital Library
The Supply and Demand of Physician Assistants in the United States: A Trend Analysis (open access)

The supply of non-physician clinicians (NPCs), such as physician assistants (PAs), could significantly influence demand requirements in medical workforce projections. This study predicts supply of and demand for PAs from 2006 to 2020. The PA supply model utilized the number of certified PAs, the educational capacity (at 10% and 25% expansion) with assumed attrition rates, and retirement assumptions. Gross domestic product (GDP), chained in 2000 dollars, and US population were utilized in a transfer function trend analysis with the number of PAs as the dependent variable for the PA demand model. Historical analyses revealed strong correlations between GDP and US population with the number of PAs. The number of currently certified PAs represents approximately 75% of the projected demand. At 10% growth, the supply and demand equilibrium for PAs will be reached in 2012. A 25% increase in new entrants causes equilibrium to be met one year earlier. Robust application trends in PA education enrollment (2.2 applicants per seat for PAs is the same as for allopathic medical school applicants) support predicted increases. However, other implications for the PA educational institutions include recruitment and retention of qualified faculty, clinical site maintenance and diversity of matriculates. Further research on factors affecting …
Date: May 2007
Creator: Orcutt, Venetia L.
System: The UNT Digital Library
A Monte Carlo Study of the Robustness and Power of Analysis of Covariance Using Rank Transformation to Violation of Normality with Restricted Score Ranges for Selected Group Sizes (open access)

The study seeks to determine the robustness and power of parametric analysis of covariance and analysis of covariance using rank transformation to violation of the assumption of normality. The study employs a Monte Carlo simulation procedure with varying conditions of population distribution, group size, equality of group size, scale length, regression slope, and Y-intercept. The procedure was performed on raw data and ranked data with untied ranks and tied ranks.
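The rank transformation itself (replace each score by its midrank, then run the parametric ANCOVA on the ranks) reduces to a short routine; this sketch assumes the Conover-Iman style of ranking, which the abstract's "untied ranks and tied ranks" conditions suggest:

```python
def rank_transform(values):
    """Replace scores by their ranks, assigning midranks to ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based rank positions
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    return ranks
```

The parametric ANCOVA is then applied to `rank_transform(y)` with the covariate ranked the same way; the robustness question is how that composite procedure behaves under non-normality and restricted score ranges.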
Date: December 1984
Creator: Wongla, Ruangdet
System: The UNT Digital Library
The Effectiveness of a Mediating Structure for Writing Analysis Level Test Items From Text Based Instruction (open access)

This study is concerned with the effect of placing text into a mediated structure form upon the generation of test items for analysis level domain referenced test construction. The item writing methodology used is the linguistic (operationally defined) item writing technology developed by Bormuth, Finn, Roid, Haladyna and others. This item writing methodology is compared to (1) the intuitive method based on Bloom's definition of analysis level test questions and (2) the intuitive with keywords identified method of item writing. A mediated structure was developed by coordinating or subordinating sentences in an essay by following five simple grammatical rules. Three test writers each composed a ten-item test using each of the three methodologies based on a common essay. Tests were administered to 102 Composition 1 community college students. Students were asked to read the essay and complete one test form. Test forms by writer and method were randomly delivered. Analysis of variance showed no significant differences among either methods or writers. Item analysis showed that no method of item writing resulted in items of consistent difficulty across test item writers. While the results of this study show no significant difference from the intuitive, traditional methods of item writing, analysis level test …
Date: August 1989
Creator: Brasel, Michael D. (Michael David)
System: The UNT Digital Library
A Hierarchical Regression Analysis of the Relationship Between Blog Reading, Online Political Activity, and Voting During the 2008 Presidential Campaign (open access)

The advent of the Internet has increased access to information and impacted many aspects of life, including politics. The present study utilized Pew Internet & American Life survey data from the November 2008 presidential election time period to investigate the degree to which political blog reading predicted online political discussion, online political participation, whether or not a person voted, and voting choice, over and above the prediction that could be explained by demographic measures of age, education level, gender, income, marital status, race/ethnicity, and region. Ordinary least squares hierarchical regression revealed that political blog reading was positively and statistically significantly related to online political discussion and online political participation. Hierarchical logistic regression analysis indicated that the odds of a political blog reader voting were 1.98 times the odds of a nonreader voting, but vote choice was not predicted by reading political blogs. These results are interpreted within the uses and gratifications framework and the understanding that blogs add an interpersonal communication aspect to a mass medium. As more people use blogs and the nature of the blog-reading audience shifts, continuing to track and describe the blog audience with valid measures will be important for researchers and practitioners alike. Subsequent potential effects …
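For intuition, an odds ratio of 1.98 corresponds to a simple 2 x 2 comparison like the following (the counts are hypothetical; the study's figure came from a hierarchical logistic regression that controlled for demographics, not from an unadjusted table):

```python
def odds_ratio(a, b, c, d):
    """Unadjusted odds ratio from a 2x2 table:
    a = blog readers who voted,    b = readers who did not,
    c = nonreaders who voted,      d = nonreaders who did not."""
    return (a / b) / (c / d)

# hypothetical counts chosen to yield an OR of 1.98
or_example = odds_ratio(198, 100, 100, 100)
```

Here the readers' odds of voting (198/100) are 1.98 times the nonreaders' odds (100/100).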
Date: December 2010
Creator: Lewis, Mitzi
System: The UNT Digital Library
A Monte Carlo Analysis of Experimentwise and Comparisonwise Type I Error Rate of Six Specified Multiple Comparison Procedures When Applied to Small k's and Equal and Unequal Sample Sizes (open access)

The problem of this study was to determine the differences in experimentwise and comparisonwise Type I error rate among six multiple comparison procedures when applied to twenty-eight combinations of normally distributed data. These were the Least Significant Difference, the Fisher-protected Least Significant Difference, the Student Newman-Keuls Test, the Duncan Multiple Range Test, the Tukey Honestly Significant Difference, and the Scheffe Significant Difference. The Spjøtvoll-Stoline and Tukey-Kramer HSD modifications were used for unequal n conditions. A Monte Carlo simulation was used for twenty-eight combinations of k and n. The scores were normally distributed (µ=100; σ=10). Specified multiple comparison procedures were applied under two conditions: (a) all experiments and (b) experiments in which the F-ratio was significant (0.05). Error counts were maintained over 1000 repetitions. The FLSD held experimentwise Type I error rate to nominal alpha for the complete null hypothesis. The FLSD was more sensitive to sample mean differences than the HSD while protecting against experimentwise error. The unprotected LSD was the only procedure to yield comparisonwise Type I error rate at nominal alpha. The SNK and MRT error rates fell between the FLSD and HSD rates. The SSD error rate was the most conservative. Use of the harmonic mean of …
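The error-counting design can be sketched as follows (a simplified stand-in using known-sigma z comparisons rather than the six studied procedures; µ=100 and σ=10 follow the abstract, the rest is hypothetical):

```python
import random
import statistics

random.seed(2)

def one_experiment(k=4, n=10, mu=100, sigma=10, crit=1.96):
    """Count rejected pairwise comparisons among k equal-mean groups
    (complete null), using a known-sigma z criterion per comparison."""
    means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
             for _ in range(k)]
    se = sigma * (2 / n) ** 0.5  # standard error of a mean difference
    return sum(1 for i in range(k) for j in range(i + 1, k)
               if abs(means[i] - means[j]) / se > crit)

reps = 2000
counts = [one_experiment() for _ in range(reps)]
n_comparisons = 6  # k*(k-1)/2 with k = 4
comparisonwise = sum(counts) / (reps * n_comparisons)   # per-comparison rate
experimentwise = sum(c > 0 for c in counts) / reps      # >= 1 rejection
```

Even at the nominal .05 comparisonwise rate, the chance of at least one false rejection per experiment is several times higher; that experimentwise/comparisonwise gap is what the six procedures trade off differently.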
Date: December 1985
Creator: Yount, William R.
System: The UNT Digital Library
The Analysis of the Accumulation of Type II Error in Multiple Comparisons for Specified Levels of Power to Violation of Normality with the Dunn-Bonferroni Procedure: a Monte Carlo Study (open access)

The study seeks to determine the degree of accumulation of Type II error rates, while violating the assumptions of normality, for different specified levels of power among sample means. The study employs a Monte Carlo simulation procedure with three different specified levels of power, methodologies, and population distributions. On the basis of the comparisons of actual and observed error rates, the following conclusions appear to be appropriate. 1. Under the strict criteria for evaluation of the hypotheses, Type II experimentwise error accumulates such that the probability of accepting at least one null hypothesis in a family of tests, when in fact all of the alternative hypotheses are true, is high, precluding valid tests at the beginning of the study. 2. The Dunn-Bonferroni procedure of setting the critical value based on the beta value per contrast did not significantly reduce the probability of committing a Type II error in a family of tests. 3. The use of an adequate sample size and orthogonal contrasts, or limiting the number of pairwise comparisons to the number of means, is the best method to control for the accumulation of Type II errors. 4. The accumulation of Type II error is irrespective …
Date: August 1989
Creator: Powers-Prather, Bonnie Ann
System: The UNT Digital Library
The Characteristics and Properties of the Threshold and Squared-Error Criterion-Referenced Agreement Indices (open access)

Educators who use criterion-referenced measurement to ascertain the current level of performance of an examinee in order that the examinee may be classified as either a master or a nonmaster need to know the accuracy and consistency of their decisions regarding assignment of mastery states. This study examined the sampling distribution characteristics of two reliability indices that use the squared-error agreement function: Livingston's k^2(X,Tx) and Brennan and Kane's M(C). The sampling distribution characteristics of five indices that use the threshold agreement function were also examined: Subkoviak's Pc, Huynh's p and k, and Swaminathan's p and k. These seven methods of calculating reliability were also compared under varying conditions of sample size, test length, and criterion or cutoff score. Computer-generated data provided randomly parallel test forms for N = 2000 cases. From this, 1000 samples were drawn, with replacement, and each of the seven reliability indices was calculated. Descriptive statistics were collected for each sample set and examined for distribution characteristics. In addition, the mean value for each index was compared to the population parameter value of consistent mastery/nonmastery classifications. The results indicated that the sampling distribution characteristics of all seven reliability indices approach normal characteristics with increased sample size. The …
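A minimal threshold-agreement sketch (observed agreement and a generic chance-corrected kappa; the studied indices differ in how chance agreement and squared-error agreement are modeled, so this is an illustration of the idea rather than any one of the seven):

```python
def classification_agreement(form1, form2, cutoff):
    """Proportion of examinees given the same mastery decision on two
    randomly parallel forms (observed agreement p_o), plus a
    chance-corrected kappa."""
    n = len(form1)
    m1 = [score >= cutoff for score in form1]
    m2 = [score >= cutoff for score in form2]
    p_o = sum(a == b for a, b in zip(m1, m2)) / n
    p1, p2 = sum(m1) / n, sum(m2) / n
    # chance agreement from the marginal mastery rates
    p_chance = p1 * p2 + (1 - p1) * (1 - p2)
    kappa = (p_o - p_chance) / (1 - p_chance) if p_chance < 1 else 1.0
    return p_o, kappa
```

Threshold indices reward consistent master/nonmaster decisions regardless of how far scores sit from the cutoff; squared-error indices such as Livingston's k^2 additionally weight distance from the criterion.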
Date: May 1988
Creator: Dutschke, Cynthia F. (Cynthia Fleming)
System: The UNT Digital Library
A Comparison of Three Methods of Detecting Test Item Bias (open access)

This study compared three methods of detecting test item bias: the chi-square approach, the transformed item difficulties approach, and the Linn-Harnish three-parameter item response approach, which is the only Item Response Theory (IRT) method that can be utilized with relatively small minority samples. The items on two tests which measured writing and reading skills were examined for evidence of sex and ethnic bias. Eight sets of samples, four from each test, were randomly selected from the population (N=7287) of sixth, seventh, and eighth grade students enrolled in a large, urban school district in the southwestern United States. Each set of samples, male/female, White/Hispanic, White/Black, and White/White, contained 800 examinees in the majority group and 200 in the minority group. In an attempt to control differences in ability that may have existed between the various population groups, examinees with scores greater or less than two standard deviations from their group's mean were eliminated. Ethnic samples contained equal numbers of each sex. The White/White sets of samples were utilized to provide baseline bias estimates because the tests could not logically be biased against these groups. Bias indices were then calculated for each set of samples with each of the three …
Date: May 1985
Creator: Monaco, Linda Gokey
System: The UNT Digital Library
The Effects of the Ratio of Utilized Predictors to Original Predictors on the Shrinkage of Multiple Correlation Coefficients (open access)

This study dealt with the shrinkage observed when multiple correlation coefficients computed from sample data are compared with the corresponding population coefficients, and with the effect of the ratio of utilized predictors to original predictors on the shrinkage in R square. The study sought to provide the rationale for selection of the shrinkage formula when the correlations between the predictors and the criterion are known, and to determine which of the three shrinkage formulas (Browne, Darlington, or Wherry) will yield the R square from sample data that is closest to the R square for the population data.
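Wherry's estimator is the simplest of the three to state (the Browne and Darlington formulas involve additional terms and are not sketched here):

```python
def wherry_adjusted_r2(r2_sample, n, k):
    """Wherry's shrinkage estimate of the population R^2, given a sample
    R^2 computed from n observations and k predictors."""
    return 1 - (1 - r2_sample) * (n - 1) / (n - k - 1)
```

For example, a sample R² of .50 with n = 51 and k = 5 shrinks to about .444, and the penalty grows as k approaches n, which is why the utilized-to-original predictor ratio matters.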
Date: August 1983
Creator: Petcharat, Prataung Parn
System: The UNT Digital Library
Comparison of Methods for Computation and Cumulation of Effect Sizes in Meta-Analysis (open access)

This study examined the statistical consequences of employing various methods of computing and cumulating effect sizes in meta-analysis. Six methods of computing effect size, and three techniques for combining study outcomes, were compared. Effect size metrics were calculated with one-group and pooled standardizing denominators, corrected for bias and for unreliability of measurement, and weighted by sample size and by sample variance. Cumulating techniques employed as units of analysis the effect size, the study, and an average study effect. In order to determine whether outcomes might vary with the size of the meta-analysis, mean effect sizes were also compared for two smaller subsets of studies. An existing meta-analysis of 60 studies examining the effectiveness of computer-based instruction was used as a data base for this investigation. Recomputation of the original study data under the six different effect size formulas showed no significant difference among the metrics. Maintaining the independence of the data by using only one effect size per study, whether a single or averaged effect, produced a higher mean effect size than averaging all effect sizes together, although the difference did not reach statistical significance. The sampling distribution of effect size means approached that of the population of 60 studies …
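Two of the compared computational choices, the standardizing denominator and the small-sample bias correction, look like this in outline (standard formulas; the sample-size and sample-variance weighting schemes are not shown):

```python
import math
import statistics

def glass_delta(treat, control):
    """One-group standardizer: control-group standard deviation only."""
    return ((statistics.fmean(treat) - statistics.fmean(control))
            / statistics.stdev(control))

def cohens_d(treat, control):
    """Pooled standardizer: weighted average of both group variances."""
    n1, n2 = len(treat), len(control)
    sp = math.sqrt(((n1 - 1) * statistics.variance(treat)
                    + (n2 - 1) * statistics.variance(control)) / (n1 + n2 - 2))
    return (statistics.fmean(treat) - statistics.fmean(control)) / sp

def hedges_g(treat, control):
    """Cohen's d corrected for small-sample bias."""
    n1, n2 = len(treat), len(control)
    return cohens_d(treat, control) * (1 - 3 / (4 * (n1 + n2) - 9))
```

When group variances are equal the one-group and pooled denominators coincide; the bias correction always pulls small-sample estimates slightly toward zero, which is one source of the metric differences the study compares.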
Date: December 1987
Creator: Ronco, Sharron L. (Sharron Lee)
System: The UNT Digital Library
A State-Wide Survey on the Utilization of Instructional Technology by Public School Districts in Texas (open access)

Effective utilization of instructional technology can provide a valuable method for the delivery of a school program, and enable a teacher to individualize according to student needs. Implementation of such a program is costly and requires careful planning and adequate staff development for school personnel. This study examined the degree of commitment by Texas school districts to the use of the latest technologies in their efforts to revolutionize education. Quantitative data were collected by using a survey that included five informational areas: (1) school district background, (2) funding for budget, (3) staff, (4) technology hardware, and (5) staff development. The study included 137 school districts representing the 5 University Interscholastic League (UIL) classifications (A through AAAAA). The survey was mailed to the school superintendents requesting that the persons most familiar with instructional technology be responsible for completing the questionnaires. Analysis of data examined the relationship between UIL classification and the amount of money expended on instructional technology. Correlation coefficients were determined between teachers receiving training in the use of technology and total personnel assigned to technology positions. Coefficients were calculated between a district providing a plan for technology and employment of a coordinator for instructional technology. Significance was established at the …
Date: May 1990
Creator: Hiett, Elmer D. (Elmer Donald)
System: The UNT Digital Library
An Empirical Investigation of Marascuilo's Ú₀ Test with Unequal Sample Sizes and Small Samples (open access)

The study seeks to determine the effect upon the Marascuilo Ú₀ statistic of violating the small sample assumption. The study employed a Monte Carlo simulation technique to vary the degree of sample size and unequal sample sizes within experiments to determine the effect of such conditions. Twenty-two simulations, with 1200 trials each, were used. The following conclusion appeared to be appropriate: The Marascuilo Ú₀ statistic should not be used with small sample sizes, and it is recommended that the statistic be used only if sample sizes are larger than ten.
Date: August 1976
Creator: Milligan, Kenneth W.
System: The UNT Digital Library
Short-to-Medium Term Enrollment Projection Based on Cycle Regression Analysis (open access)

Short-to-medium projections were made of student semester credit hour enrollments for North Texas State University and the Texas Public and Senior Colleges and Universities (as defined by the Coordinating Board, Texas College and University System). Undergraduate, Graduate, Doctorate, Total, Education, Liberal Arts, and Business enrollments were projected. Fall + Spring, Fall, Summer I + Summer II, and Summer I were the time periods for which projections were made. A new regression analysis called "cycle regression," which employs nonlinear regression techniques to extract multifrequential phenomena from time-series data, was employed for the analysis of the enrollment data. The heuristic steps employed in cycle regression analysis are similar to those used in fitting polynomial models. A trend line and one or more sine waves (cycles) are simultaneously estimated using a partial F test. The process of adding cycle(s) to the model continues until no more significant terms can be estimated.
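The heuristic can be sketched directly: for each trial period, the sine and cosine terms enter linearly, so a trend-plus-cycle model is an ordinary least squares fit, and the period is chosen by grid search (a simplified stand-in for the partial-F stepwise procedure, with hypothetical data):

```python
import math
import random

random.seed(3)

def lstsq(X, y):
    """Solve the normal equations X'X b = X'y by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(X[r][i] * X[r][j] for r in range(len(X))) for j in range(k)]
         for i in range(k)]
    b = [sum(X[r][i] * y[r] for r in range(len(X))) for i in range(k)]
    for i in range(k):                       # elimination with partial pivoting
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            A[r] = [a - f * c for a, c in zip(A[r], A[i])]
            b[r] -= f * b[i]
    coef = [0.0] * k
    for i in range(k - 1, -1, -1):           # back substitution
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, k))) / A[i][i]
    return coef

# hypothetical enrollment-like series: trend plus a 12-period cycle plus noise
t = list(range(40))
y = [100 + 2.0 * ti + 8.0 * math.sin(2 * math.pi * ti / 12) + random.gauss(0, 1)
     for ti in t]

def fit_trend_plus_cycle(t, y, periods):
    """Grid-search the cycle period; amplitude and phase enter linearly
    through paired sin and cos columns."""
    best = None
    for p in periods:
        X = [[1.0, ti,
              math.sin(2 * math.pi * ti / p),
              math.cos(2 * math.pi * ti / p)] for ti in t]
        coef = lstsq(X, y)
        sse = sum((yi - sum(c * xi for c, xi in zip(coef, row))) ** 2
                  for row, yi in zip(X, y))
        if best is None or sse < best[0]:
            best = (sse, p, coef)
    return best

sse, period, coef = fit_trend_plus_cycle(t, y, range(4, 21))
```

The fitted amplitude is `sqrt(coef[2]**2 + coef[3]**2)`; in the full heuristic, a cycle is retained only if the partial F test on its pair of terms is significant, and the search then repeats on the residuals.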
Date: August 1983
Creator: Chizari, Mohammad
System: The UNT Digital Library
A Comparison of the Effects of Different Sizes of Ceiling Rules on the Estimates of Reliability of a Mathematics Achievement Test (open access)

This study compared the estimates of reliability made using one, two, three, four, five, and unlimited consecutive failures as ceiling rules in scoring a mathematics achievement test which is part of the Iowa Tests of Basic Skills (ITBS), Form 8. There were 700 students randomly selected from a population (N=2640) of students enrolled in the eighth grade in a large urban school district in the southwestern United States. These 700 students were randomly divided into seven subgroups so that each subgroup had 100 students. The responses of all those students to three subtests of the mathematics achievement battery, which included mathematical concepts (44 items), problem solving (32 items), and computation (45 items), were analyzed to obtain the item difficulties and a total score for each student. The items in each subtest were then rearranged based on the item difficulties from the highest to the lowest value. In each subgroup, the methods using one, two, three, four, five, and unlimited consecutive failures as ceiling rules were applied to score the individual responses. The total score for each individual was the sum of the correct responses prior to the point described by the ceiling rule. The correct responses after the ceiling …
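After the items are reordered by difficulty, the ceiling-rule scoring reduces to a short loop (a sketch of the rule as described, with `None` standing in for the unlimited-failures condition):

```python
def ceiling_score(responses, max_run=None):
    """Sum correct responses (1 = correct, 0 = incorrect) up to the point
    at which the examinee has missed max_run items in a row; max_run=None
    means no ceiling is applied."""
    score, run = 0, 0
    for r in responses:
        if r:
            score += 1
            run = 0          # a correct answer resets the failure run
        else:
            run += 1
            if max_run is not None and run == max_run:
                break        # ceiling reached; later responses are ignored
    return score
```

On the pattern 1,1,0,1,0,0,1,1 a two-failure ceiling stops scoring after the back-to-back misses (score 3), while the unlimited rule counts all five correct responses; tightening the rule is what shifts the reliability estimates being compared.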
Date: May 1987
Creator: Somboon Suriyawongse
System: The UNT Digital Library
A Comparison of Some Continuity Corrections for the Chi-Squared Test in 3 x 3, 3 x 4, and 3 x 5 Tables (open access)

This study was designed to determine whether chi-squared based tests for independence give reliable estimates (as compared to the exact values provided by Fisher's exact probabilities test) of the probability of a relationship between the variables in 3 X 3, 3 X 4, and 3 X 5 contingency tables when the sample size is 10, 20, or 30. In addition to the classical (uncorrected) chi-squared test, four methods for continuity correction were compared to Fisher's exact probabilities test. The four methods were Yates' correction, two corrections attributed to Cochran, and Mantel's correction. The study was modeled after a similar comparison conducted on 2 X 2 contingency tables and published by Michael Haber.
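The uncorrected statistic and a Yates-style cellwise adjustment can be sketched as follows (applying the |O - E| - 0.5 adjustment beyond 2 X 2 tables is an illustrative assumption here; the Cochran and Mantel corrections differ and are not shown):

```python
def chi_squared(table, yates=False):
    """Pearson chi-squared statistic for an r x c table of observed counts;
    with yates=True, each |O - E| is reduced by 0.5 (floored at zero)
    before squaring."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            diff = abs(obs - exp)
            if yates:
                diff = max(diff - 0.5, 0.0)
            stat += diff * diff / exp
    return stat
```

For the 2 X 2 table [[10, 20], [20, 10]] the correction lowers the statistic from about 6.67 to 5.40, making the test more conservative, which is exactly the behavior the study benchmarks against Fisher's exact probabilities in small samples.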
Date: May 1987
Creator: Mullen, Jerry D. (Jerry Davis)
System: The UNT Digital Library