Application of Alternative IRT models to IRT Assumption Violation

  • Authors

    • Yoon Jiyoung
    • Lee Yoonsun
  • DOI: https://doi.org/10.14419/ijet.v8i1.4.25454
  • Keywords: IRT assumption, MIRT, bi-factor model, testlet-based model, second-order IRT model.
  • Abstract: The purpose of this study is to identify the most appropriate alternative IRT parameter estimation model among the bi-factor model, the testlet-based model, and the second-order IRT model when the IRT assumptions are not met. A simulation study was conducted to compare these alternative models under assumption violations. First, when the simulated data violated the unidimensionality assumption, the bi-factor IRT model showed the best fit. Second, when the local independence assumption was violated, the testlet-based model showed the best fit. These results indicate that analysts should anticipate possible violations of the IRT assumptions, based on the test format and the content domains involved, and estimate alternative IRT models accordingly. Such a process provides a basis for applying IRT more precisely when estimating item and person characteristics.
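
    As a reader aid, the three alternative models named in the abstract can be written in their standard two-parameter logistic forms. The sketch below uses our own notation, not the authors'; it follows the formal relations among the models described by Rijmen [30].

    ```latex
    % Bi-factor model: item j loads on a general trait \theta_{i0} and on
    % exactly one specific trait \theta_{is(j)}:
    P(X_{ij}=1 \mid \boldsymbol{\theta}_i)
      = \frac{1}{1+\exp\bigl[-\bigl(a_{j0}\theta_{i0}+a_{js(j)}\theta_{is(j)}+c_j\bigr)\bigr]}

    % Testlet model: a constrained bi-factor model in which the random
    % testlet effect \gamma_{i\,d(j)} absorbs the local dependence among
    % items belonging to the same testlet d(j):
    P(X_{ij}=1 \mid \theta_i,\gamma_{i\,d(j)})
      = \frac{1}{1+\exp\bigl[-a_j\bigl(\theta_i-b_j-\gamma_{i\,d(j)}\bigr)\bigr]}

    % Second-order model: first-order traits \theta_{is} are generated by
    % a single second-order trait \xi_i:
    \theta_{is} = \lambda_s\,\xi_i + \varepsilon_{is}
    ```

    Rijmen [30] shows that the testlet model is a bi-factor model with proportionality constraints on the within-testlet slopes, and that the second-order model is formally equivalent to the testlet model; this nesting of the three models is what allows model fit to discriminate among them when unidimensionality or local independence fails.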

  • References

      [1] Adams, R. J., Wilson, M., & Wang, W. C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21(1), 1-23.

      [2] Bolt, D. (1999). Evaluating the effects of multidimensionality on IRT true-score equating. Applied Measurement in Education, 12, 383–407.

      [3] Cai, L. (2012). flexMIRT: Flexible multilevel item factor analysis and test scoring [Computer software]. Seattle, WA: Vector Psychometric Group.

      [4] Cai, L. (2013). Lord-Wingersky algorithm version 2.0 for hierarchical item factor models with applications in test scoring, scale alignment, and model fit testing (CRESST Report 830). Los Angeles, CA: University of California, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).

      [5] Choi, S. I. (2010). An application of full information item factor analysis to the reading comprehension of a TOEIC practice test. Journal of Educational Evaluation, 23(3), 709-734.

      [6] de Ayala, R. J. (2009). The theory and practice of item response theory. New York: Guilford Press.

      [7] De Champlain, A. F. (1996). The effect of multidimensionality on IRT true-score equating for subgroups of examinees. Journal of Educational Measurement, 33, 181–201.

      [8] DeMars, C. E. (2006). Application of the bi-factor multidimensional item response theory model to testlet-based tests. Journal of Educational Measurement, 43(2), 145-168.

      [9] DeMars, C. E. (2010). Item response theory. New York: Oxford University Press.

      [10] Gibbons, R. D., & Hedeker, D. (1992). Full-information item bi-factor analysis. Psychometrika, 57, 423-436.

      [11] Goldstein, H. (1980). Dimensionality, bias, independence, and measurement. British Journal of Mathematical and Statistical Psychology, 33, 234-246.

      [12] Jang, E. E. & Roussos, L. (2007). An investigation into the dimensionality of TOEFL using conditional covariance-based nonparametric approach. Journal of Educational Measurement, 44(1), 1-21.

      [13] Jannarone, R. J. (1986). Conjunctive item response theory kernels. Psychometrika, 51(3), 357-373.

      [14] Janssen, R., Tuerlinckx, F., Meulders, M., & de Boeck, P. (2000). A hierarchical IRT model for criterion-referenced measurement. Journal of Educational and Behavioral Statistics, 25(3), 285-306.

      [15] Jiao, H., Kamata, A., Wang, S., & Jin, Y. (2012). Multilevel testlet model for dual local dependence. Journal of Educational Measurement, 49(1), 82-100.

      [16] Jiao, H., Wang, S., & He, W. (2013). Estimation methods for one-parameter testlet models. Journal of Educational Measurement, 50(2), 186-203.

      [17] Li, Y., Bolt, D. M., & Fu, J. (2006). A comparison of alternative models for testlets. Applied Psychological Measurement, 30, 3-21.

      [18] Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

      [19] Mair, P., & Hatzinger, R. (2007). Extended Rasch modeling: The eRm package for the application of IRT models in R. Journal of Statistical Software, 20(9), 1-20.

      [20] McDonald, R. P. (2000). A basis for multidimensional item response theory. Applied Psychological Measurement, 24, 99-114.

      [21] Md Desa, Z. N. (2012). Bi-factor multidimensional item response theory modeling for subscores estimation, reliability, and classification. Unpublished doctoral dissertation, University of Kansas.

      [22] Orchard, T., & Woodbury, M. A. (1972). A missing information principle: Theory and application. In L. M. LeCam, J. Neyman, & E. L. Scott (Eds.), Proceedings of the sixth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 697-715). Berkeley: University of California Press.

      [23] Oshima, T. C., & Miller, M. (1992). Multidimensionality and item bias in item response theory. Applied Psychological Measurement, 16, 237-248.

      [24] Paek, I., & Cai, L. (2013). A comparison of item parameter standard error estimation procedures for unidimensional and multidimensional item response theory modeling. Educational and Psychological Measurement, XX(X), 1-19.

      [25] Park, C. (2010). A comparative study of IRT models for locally dependent reading test items by ESL learners. Journal of Educational Evaluation, 23(2), 529-546.

      [26] Pommerich, M., & Segall, D. O. (2008). Local dependence in an operational CAT: Diagnosis and implications. Journal of Educational Measurement, 45(3), 201-223.

      [27] R Development Core Team. (2012). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria: R Foundation for Statistical Computing. Retrieved October 10, 2014, from http://www.R-project.org

      [28] Reckase, M. D. (1985). The difficulty of test items that measure more than one ability. Applied Psychological Measurement, 9(4), 401-412.

      [29] Reckase, M. D. (2009). Multidimensional item response theory. New York: Springer.

      [30] Rijmen, F. (2010). Formal relations and an empirical comparison among the bi-factor, the testlet, and a second-order multidimensional IRT model. Journal of Educational Measurement, 47(3), 361-372.

      [31] Schmid, J., & Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1), 53-61.

      [32] Schedl, M., Gordon, A., Carey, P. A., & Tang, K. L. (1996). An analysis of the dimensionality of TOEFL reading comprehension items (TOEFL Research Report No. 53). Princeton, NJ: Educational Testing Service.

      [33] Sireci, S. G., Thissen, D., & Wainer, H. (1991). On the reliability of testlet-based tests. Journal of Educational Measurement, 28, 237-247.

      [34] Tao, W. (2008). Using the score-based testlet method to handle local item dependence. Unpublished doctoral dissertation, Boston College, Department of Educational Research, Measurement, and Evaluation.

      [35] Tate, R. (2004). Implications of multidimensionality for total score and subscore performance. Applied Measurement in Education, 17, 89–112.

      [36] Thurstone, L. L. (1947). Multiple-factor analysis. Chicago, IL: University of Chicago Press.

      [37] Wainer, H. (1995). Precision and differential item functioning on a testlet-based test: The 1991 Law School Admissions Test as an example. Applied Measurement in Education, 8, 157-186.

      [38] Wainer, H., & Wang, C. (2000). Using a new statistical model for testlets to score TOEFL. Journal of Educational Measurement, 37, 203-220.

      [39] Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. New York, NY: Cambridge University Press.

      [40] Wang, X., Bradlow, E. T., & Wainer, H. (2002). A general Bayesian model for testlets: Theory and applications (ETS RR-02-02). Princeton, NJ: Educational Testing Service.

      [41] Wiberg, M. (2012). Can a multidimensional test be evaluated with unidimensional item response theory? Educational Research and Evaluation: An International Journal on Theory and Practice, 18(4), 307-320.

      [42] Wilson, K. (2000). An exploratory dimensionality assessment of the TOEIC test (ETS RR-00-14). Princeton, NJ: Educational Testing Service.

      [43] Yen, W. M. (1993). Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement, 30, 187-213.

      [44] Yi, H. S. (2005). A method for estimating classification consistency of alternate forms under equating situations. Unpublished doctoral dissertation, University of Iowa, Department of Educational Measurement and Statistics.

      [45] Yoon, J. Y. (2017). Comparing alternative IRT parameter estimation models based on IRT assumption. International Journal of Internet of Things and Big Data, 2(2), 1-6. (Proceedings paper)

  • How to Cite

    Yoon, J., & Lee, Y. (2019). Application of Alternative IRT models to IRT Assumption Violation. International Journal of Engineering & Technology, 8(1.4), 437-447. https://doi.org/10.14419/ijet.v8i1.4.25454