Severity-Leniency in Writing Assessment and Its Causal Factors

Abstract

Objectivity in writing assessment is essential because it determines the validity and reliability of the assessment itself. Subjectivity, however, is unavoidable: rater subjectivity can produce differences in scoring, that is, rater variability (severity-leniency), which ultimately reduces the validity and reliability of the assessment. The concepts of rater variability and severity-leniency, together with their causal factors, therefore need to be understood so that efforts to minimize such variability can be implemented. This study aims to provide a brief description of rater variability, specifically severity-leniency, and of its causal factors: the rater's background, the assessment criteria, the assessment method, and the rating scale. The method used was a literature study whose sources were scientific journal articles and books related to the research topic, namely rater variability (severity-leniency) and its causal factors. The discussion shows that selecting experienced raters, using coherent and cohesive assessment criteria, choosing a method appropriate to the purpose of the assessment, and employing a concise rating scale can help minimize rater variability (severity-leniency).
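As an illustrative aside, one common way the rater-effects literature quantifies severity-leniency is the Many-Facet Rasch model, in which each rater is assigned a severity parameter estimated jointly with examinee ability and task difficulty; the formulation below is a standard sketch of that model, not a method developed in this study:

$$\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k$$

Here $P_{nijk}$ is the probability that examinee $n$ receives category $k$ on task $i$ from rater $j$, $\theta_n$ is the examinee's ability, $\delta_i$ the task difficulty, $\alpha_j$ the severity of rater $j$, and $\tau_k$ the threshold of category $k$ relative to category $k-1$. A larger estimated $\alpha_j$ marks a systematically severe rater and a smaller one a lenient rater, which is the sense in which severity-leniency is treated as rater variability in this paper.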



Keywords: leniency, rubric, severity, variability, writing assessment
