MODIFICATION OF SCORING SCHEMES USING DECOMPOSITION PROCEDURES ON STATISTICAL DATA

This paper presents a method of modifying original scores to obtain independent random variables and analyzes the consequences of using such a method. The paper also describes the mathematical background of the method in detail and discusses its possible use in identifying student or participant assessments that are over- or underrated. The method distinguishes the performances of students and assesses their written solutions using a scoring scheme. In this study, it is used to analyze the competence of participants in the Physics Olympiad competition. A scoring scheme that is appropriately set by the author of a physics problem presents the participant scores as independent random variables. The assessed solutions are analyzed using analytical tools (such as the covariance matrix) for the dependence of random variables. The evaluators of the participants’ solutions were highly qualified professionals. Nevertheless, the study found statistical evidence of minor distortion in the evaluations, though this was found to only marginally affect the ranking of participants. UDC Classification: 37.01/.02; DOI: http://dx.doi.org/10.12955/cbup.v5.997


Introduction
Assessment of students plays an important role in physics education, as it provides essential feedback for all its participants, including instructors and students, as well as participants in various physics competitions. One of the most often used assessment methods is evaluating written solutions of physics problems and tasks. According to Gaigher (2007), problem-solving is considered a reliable way to demonstrate conceptual understanding in physics for purposes of evaluation. The Physics Olympiad (PhO) is a worldwide competition where PhO participants solve physics problems, and their performance is evaluated according to their written solutions. Its history in Slovakia started in the school year 1958-59, and it has been an important part of Slovakia's education system since then. The main objectives of the PhO in Slovakia, as defined by the Committee of PhO, are to develop the problem-solving and experimental skills of primary and secondary school students talented in physics, create competitive environments for them, and encourage them to study physics or a related science (Slovak Committee of Physics Olympiad, 2010). The assessment of each PhO participant is crucial, as it influences the participant's results. Currently, the PhO participants are rated by points (a maximum of 10 for a physics problem), which they achieve according to the scoring scheme proposed by the authors of the physics problems. Each physics problem is divided into tasks that are scored separately. For example, the score for the solution of the j-th task is defined as a random variable X^j and scored as either 0 for an incorrect or 1 for a correct answer. In the general case, X^j ∈ [0, 1]; for example, one can assign the value 0.5 to a partially correct solution that is neither absolutely incorrect nor absolutely correct. This scaling is used rather than specific scores because of its universality.
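As a minimal illustration of such a [0, 1] scoring scale (the task count and score values below are hypothetical, not data from the paper), partial credit can be recorded per task and participant:

```python
import numpy as np

# Hypothetical scores of 4 participants on the 3 tasks of one physics problem.
# Each entry lies in [0, 1]: 0 = incorrect, 1 = correct, 0.5 = partially correct.
scores = np.array([
    [1.0, 0.5, 0.0],
    [1.0, 1.0, 0.5],
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 1.0],
])

# every score must lie on the universal [0, 1] scale
assert ((scores >= 0) & (scores <= 1)).all()
print(scores.mean(axis=0))  # per-task average score across participants
```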
This paper describes the basics of a statistical method to quantify participant assessments using data and results of solutions provided by the participants of the competition, PhO.

Data and Methodology
The first step was to analyze a case using a proposed modification to the scoring of a single physics problem. That is, we analyzed the effect on the scores and rankings of a particular PhO participant with regard to solving one specific physics problem. The modified overall scores gained for solving all four physics problems in each category were determined as the sum of the modified scores gained for the particular physics problems. We defined the scores X^j for solving the j-th task (upper index) from a set of s tasks as a random (column) vector variable X, where the element X_k^j is a random variable for solving the j-th task by the k-th participant (lower index). We assumed that the random variables X^j were defined on the same probability space (Ω, 𝒜, P), where Ω denoted the set of all possible outcomes (containing all possible solutions of the task, including all incorrect, incomplete, and correct solutions). Therefore, it was accepted that Ω was the same for all tasks and all participants. The set of subsets (events) of Ω was defined as 𝒜, i.e., an event was a particular solution of the task. The measure of probability, P, mapped any event into the interval [0, 1], i.e., P(ℱ) ∈ [0, 1] for any ℱ ∈ 𝒜. In practice, the participants were a limited ensemble with their solutions assessed by scores, where X_k^j was P(ℱ_k^j), with ℱ_k^j the event representing the solution of the j-th task by the k-th participant, and P the probability measure set by the author of the scoring scheme and the evaluators. The probability measure was distorted by inadequacies of the author's scoring scheme and by unsatisfactory decisions of the evaluators. In the case of the PhO, this influence was relatively small due to the professionality of the authors and evaluators. Nevertheless, mathematical statistical tools identified strong evidence of limitations in the author's scoring scheme and, in some cases, probable anomalies in the assessments of the evaluators (Hanáková & Teleki, 2017).
The basic concept was that the random vector X, with independent components, provided objective scoring for PhO participants with the same knowledge or skills. However, Pearson's correlation coefficients (Evans, 1996; Markechová, Stehlíková, & Tirpáková, 2011; Spiegel, 1998) of the random vectors showed that many were dependent variables. Following this, a covariance matrix, Σ, was created to analyze the variables X^i and X^j. This method was selected in place of Pearson's correlation coefficients to avoid nonlinear equations in the optimization procedure. In the optimization procedure, the scoring scheme was modified to obtain independent random variables, Y^i. The modified random variables corresponded to modified tasks. However, examining these modified tasks in their explicit forms was a highly complex problem and beyond the purpose of this article. The covariance matrix, Σ, was defined by its matrix elements as follows:

Σ_ij = E[(X^i − E[X^i])(X^j − E[X^j])],   (1)

where E[X^j] denoted the expected value of X^j. We estimated E[X^j] by the average (1/N) ∑_{k=1}^{N} X_k^j, and therefore, the covariance matrix elements were estimated using Equation 3:

Σ_ij ≈ (1/N) ∑_{k=1}^{N} (X_k^i − E[X^i])(X_k^j − E[X^j]),   (3)
where N is the number of assessed pupils and students. The covariance matrix Σ is a real and symmetric matrix, i.e., it is a Hermitian matrix (Horn & Johnson, 1985), and therefore it is diagonalizable by an orthogonal transformation A:

D = A^T Σ A.   (4)

The transformation A was not unique, in the manner described below, and defined the above-mentioned new probability vectors Y^i (Equation 5a):

Y^i = ∑_{j=1}^{s} A_ji X^j.   (5a)

The inverse transformation was A^{−1} = A^T, with (A^T A)_{ij} = δ_ij, where δ_ij is the Kronecker delta (δ_ii = 1 and δ_ij = 0 for i ≠ j).
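A sketch of the covariance estimate described above, assuming the scores are stored as an N × s matrix (participants as rows, tasks as columns); the data and function name are illustrative:

```python
import numpy as np

def covariance_matrix(X):
    """Estimate the covariance matrix of task scores.

    X is an N x s array: N participants (rows), s tasks (columns).
    E[X^j] is estimated by the per-task average, as in the text.
    """
    N = X.shape[0]
    mean = X.mean(axis=0)             # estimate of E[X^j]
    centered = X - mean
    return centered.T @ centered / N  # Sigma_ij with 1/N normalization

# hypothetical score data, 4 participants x 3 tasks
X = np.array([[1.0, 0.5, 0.0],
              [1.0, 1.0, 0.5],
              [0.0, 0.5, 0.5],
              [1.0, 0.0, 1.0]])
Sigma = covariance_matrix(X)
print(np.allclose(Sigma, Sigma.T))  # real and symmetric, hence diagonalizable
```

The 1/N normalization (rather than the unbiased 1/(N−1)) follows the estimate as written in the text.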
It was convenient to keep a distinct notation for the matrix A and its inverse A^{−1} = A^T, and to show that the inverse transformation recovers the original scores, X^j = ∑_{i=1}^{s} A_ji Y^i. The diagonalization procedure defined the new random vectors, Y^i, with vanishing mutual covariances (Σ'_ij = 0 for all i ≠ j); strictly, this means the Y^i were uncorrelated, and it was assumed that they were independent random vectors.
The transformation matrix was obtained by the Schur decomposition procedure (Horn & Johnson, 1985), which solved the eigenvalue equation

Σ a^i = λ_i a^i,   (9)

where a^i was a normalized eigenvector (|a^i| = 1) associated with the eigenvalue λ_i. The set of eigenvectors formed an orthonormal basis in the s-dimensional vector space. Multiplying Equation 9 by (a^j)^T provided the matrix elements of the diagonal covariance matrix (Equation 10):

D_ij = (a^j)^T Σ a^i = λ_i δ_ij.   (10)
The transformation matrix A was then defined by arranging the eigenvectors as its columns, A = (a^1, a^2, …, a^s). The normalized eigenvectors are given only up to a multiplicative factor c (|c| = 1), and the order of the eigenvectors is arbitrary.
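For a real symmetric matrix, the Schur decomposition coincides with the spectral (eigen)decomposition, so the procedure can be sketched with NumPy's `eigh` (an assumed stand-in for the paper's implementation; the covariance values are hypothetical):

```python
import numpy as np

# hypothetical real symmetric covariance matrix of three task scores
Sigma = np.array([[ 0.20, 0.05, -0.08],
                  [ 0.05, 0.10,  0.02],
                  [-0.08, 0.02,  0.15]])

# eigh returns eigenvalues and an orthogonal matrix whose columns
# are the orthonormal eigenvectors a^i
eigvals, A = np.linalg.eigh(Sigma)

# A^T Sigma A is diagonal, with the eigenvalues on the diagonal
D = A.T @ Sigma @ A
print(np.allclose(D, np.diag(eigvals)))

# A is orthogonal: A^T A equals the identity (the Kronecker delta)
print(np.allclose(A.T @ A, np.eye(3)))
```

The sign ambiguity of each eigenvector and the ordering of the eigenvalues mentioned in the text correspond to `eigh` fixing one particular (ascending-eigenvalue) convention.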
Fixing the sign and the order of the eigenvectors made the transformation matrix and its inverse unambiguous.
The new score of the k-th participant gained in the i-th modified task was calculated as (Equation 16):

Y_k^i = ∑_{j=1}^{s} A_ji X_k^j.   (16)

The full score, Ξ_k, for the k-th participant was formed from Equation 17:

Ξ_k = ∑_{i=1}^{s} Y_k^i.   (17)
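Given the transformation matrix from the diagonalization, the modified scores of Equation 16 and the full score of Equation 17 can be sketched as follows (the score data are hypothetical):

```python
import numpy as np

# hypothetical original scores: N = 4 participants x s = 3 tasks
X = np.array([[1.0, 0.5, 0.0],
              [1.0, 1.0, 0.5],
              [0.0, 0.5, 0.5],
              [1.0, 0.0, 1.0]])

Sigma = np.cov(X.T, bias=True)   # covariance of the original scores
_, A = np.linalg.eigh(Sigma)     # columns of A are the eigenvectors

Y = X @ A                        # Y_k^i = sum_j A_ji X_k^j  (Equation 16)
full_scores = Y.sum(axis=1)      # Xi_k = sum_i Y_k^i        (Equation 17)

# the modified scores are uncorrelated: their covariance matrix is diagonal
cov_Y = np.cov(Y.T, bias=True)
print(np.allclose(cov_Y, np.diag(np.diag(cov_Y))))
```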
Notably, the probability vectors Y^i were scalable in the following sense: the new probability vectors Z^i, as defined by Equation 18, also had a diagonal covariance matrix, cov(Z^i, Z^j), and where the Y^i-s represented independent probability variables, the Z^i-s represented independent probability variables too:

Z^i = c_i Y^i.   (18)

This scaling property was critical in forming the new scoring schemes. A given scoring scheme can motivate participants (or students) by lowering the maximal value of the scoring in tasks that were excessively difficult for the participants. Another scoring scheme could emphasize the participants with the highest competency. The above-described tool was tested successfully on participants of the PhO in 2016, and the rescaling of Y^i was defined as Z^i = (10/Ξ_max) Y^i. In this equation, 10/Ξ_max was a scaling factor, with 10 the maximum original score for the complete and accurate solving of a physics problem. The appropriate choice of the value Ξ_max guaranteed the same maximum of 10 for the modified assessment score calculated using the Z^i-s (i.e., ∑_i Z^i_max = 10).
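The rescaling step can be sketched as follows; `xi_max` stands for the normalization value Ξ_max, which is assumed here (for illustration) to be the maximum achievable modified full score:

```python
import numpy as np

def rescale(Y, xi_max):
    """Scale modified scores so the maximum full score is 10.

    Multiplying each column by a constant keeps the covariance matrix
    diagonal, so the rescaled variables Z^i remain uncorrelated.
    """
    return (10.0 / xi_max) * Y

# hypothetical modified scores: 3 participants x 2 modified tasks
Y = np.array([[1.2, 0.4],
              [2.0, 0.5],
              [0.8, 1.2]])
xi_max = Y.sum(axis=1).max()   # assumed choice of the normalization value
Z = rescale(Y, xi_max)
print(Z.sum(axis=1).max())     # maximum modified full score is now 10.0
```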
The influence of the modification on the overall ranking of the PhO participants was analyzed by calculating the differences between the original and the modified scores (ΔΞ_k) and rankings (Δr_k) of the PhO participants. Linear correlations between the random variables X^j and X^k were identified from the values of the covariance matrix in Equation 1. The described modification procedure was applied to obtain independent random variables (modified scores of the PhO participants). The effects of the modification on the results of particular PhO participants were analyzed to observe:
▪ the differences in scores (ΔΞ_k) and ranking (order) (Δr_k);
▪ whether the maximum differences in scores corresponded to the maximum differences in ranking;
▪ the frequency of modified cases with non-zero differences in scores, ν₁(ΔΞ), or ranking, ν₁(Δr), per participant.
Figure 1 shows the maximum differences in scores, the corresponding maximum differences in ranking, and these frequencies.
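The score and ranking differences can be computed as in the sketch below (the data and the helper function are hypothetical); ranks are assigned so that rank 1 denotes the best participant, matching the ranking convention described in the results:

```python
import numpy as np

def ranks(scores):
    """Rank participants by score: rank 1 = best (highest score)."""
    order = np.argsort(-scores)   # participant indices from best to worst
    r = np.empty_like(order)
    r[order] = np.arange(1, len(scores) + 1)
    return r

# hypothetical original and modified overall scores of 4 participants
original = np.array([8.0, 6.5, 9.0, 5.0])
modified = np.array([8.2, 9.5, 9.0, 5.5])

delta_scores = modified - original               # score differences
delta_ranks = ranks(original) - ranks(modified)  # positive = improvement
print(delta_scores)
print(delta_ranks)
```

Here participant 2 gains 3.0 points and moves up two places, while the others shift down or stay, illustrating how score changes propagate into ranking changes.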

Results and Discussion
Quantitative results of the characteristics of the analysis for the four physics problems solved in the D and E categories are presented in Figure 1 and Figure 2; more detail can be found in Hanáková and Teleki (2017). The PhO participants were ranked according to their original overall scores, with a rank of one denoting the best. The maximum difference in scores in the E category is ΔΞ = +3, and the maximum difference in ranking (Figure 2) is Δr = +20 (which indicates an improvement for the PhO participant after the modification of scores). The frequencies characterized above have the values ν₁(ΔΞ ≠ 0) = 2.8 and ν₁(Δr ≠ 0) = 1.7. Non-zero differences in ranking were observed for each physics problem. These results show that in the observed cases a higher frequency of differences in the scores, ν₁(ΔΞ ≠ 0), resulted in a higher frequency of differences in the ranking, ν₁(Δr ≠ 0). The modified overall scores of the PhO participants in the D and E categories increased in all cases when compared with the original scores (Figure 3). We assume that this could be explained by the non-zero linear correlations between the original scores (random variables).

Conclusion
The performance of the PhO participants was quantified and then compared according to the scoring scheme applied for assessing their written solutions to the physics problems. The main objective of this article was to underline the need to improve the scoring scheme and to present and describe a statistical tool that could provide a more objective assessment of participant solutions to the physics problems. We found non-zero linear correlations between the probability vectors, which were determined using a covariance matrix. This result was considered relevant for proposing a modification of the scores to provide independent random variables. The study identified certain cases of original scores changing after modification in the range of −0.4 to +3.0 points for individual physics problems. The ranking of the PhO participants also changed after modification, in the range of −12 to +20 places for particular physics problems. As the modification by way of the proposed tool was focused on the results of the PhO participants, the interpretation of the modified random variables remains a matter for future research. This paper describes, in detail, a proposed statistical tool as a basis for developing a suitable method of modifying scores as well as analyzing the over- or underrating of performance in solving physics problems. Finally, because of its universality, this tool can be applied not only to the Physics Olympiad but also to other cases where solutions to physics problems are assessed.