Severity-Leniency in Writing Assessment and Its Causal Factors

Abstract

Objectivity in writing assessment is essential because it determines the validity and reliability of the assessment itself. Subjectivity, however, is unavoidable: rater subjectivity can produce differences in scoring, that is, rater variability (severity-leniency), which ultimately reduces the validity and reliability of the assessment. The concepts of rater variability and severity-leniency, together with their causal factors, therefore need to be understood so that efforts to minimize such variability can be implemented. This study aims to provide a brief description of rater variability, specifically severity-leniency, and of its causal factors: the rater's background, the assessment criteria, the assessment method, and the rating scale. The method used was a literature study whose sources were scientific journal articles and books related to the research topic, namely rater variability (severity-leniency) and its causal factors. The discussion shows that selecting experienced raters, using coherent and cohesive assessment criteria, choosing a method appropriate to the purpose of the assessment, and employing a concise rating scale can help minimize rater variability (severity-leniency).
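As an illustrative aside, one common way the rater-effects literature quantifies severity-leniency is the Many-Facet Rasch model, in which each rater is assigned a severity parameter estimated jointly with examinee ability and task difficulty; the formulation below is a standard sketch of that model, not a method developed in this study:

$$\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k$$

Here $P_{nijk}$ is the probability that examinee $n$ receives category $k$ on task $i$ from rater $j$, $\theta_n$ is the examinee's ability, $\delta_i$ the task difficulty, $\alpha_j$ the severity of rater $j$, and $\tau_k$ the threshold of category $k$ relative to category $k-1$. A larger estimated $\alpha_j$ marks a systematically severe rater and a smaller one a lenient rater, which is the sense in which severity-leniency is treated as rater variability in this paper.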



Keywords: leniency, rubric, severity, variability, writing assessment
