STATISTICAL PERFORMANCE OF X̄ AND R CONTROL CHARTS FOR SKEWED DISTRIBUTION-CASE STUDY

The purpose of the article is to determine the Type I error and Average Run Length (ARL type A) values for X̄ and R charts whose control limits have been determined using the Skewness Correction method (SC method), when the probability distribution of the quality characteristic being tested is unknown. For this purpose, measurement data from a company producing car seat frames were used. The study also used Monte Carlo simulation, in which two sampling methods were used to obtain random input scenarios: fitting theoretical distributions (selected skewed distributions) and bootstrap resampling of the company's measurement data. The article is a continuation of Czabak-Górska's (2016) research. The case study showed that the chart determined using the skewness correction method works better for data described by the gamma or log-normal distribution. This, in turn, suggests that an appropriate distribution was selected for the data, which makes it possible to determine the course and nature of the process; this is important for its further analysis, e.g. in terms of process capability. UDC Classification: 338.3; DOI: http://dx.doi.org/10.12955/cbup.v6.1293


Introduction
Control charts remain a simple and effective tool in Statistical Process Control (SPC). The charts most commonly used in industry are still the mean (X̄) and range (R) charts. Traditional control charts assume that the controlled feature/characteristic follows a Gaussian (normal) distribution. Under this assumption, with control limits set at ±3σ (the natural variability of the process), the Type I error for an in-control process amounts, according to Chan and Cui (2003), to 0.27% for the X̄ chart and 0.46% for the R chart with five-element samples. However, Karagöz and Hamurkaroglu (2012) pointed out that when the examined feature/characteristic has a skewed distribution, using control limits calculated from the formulas proposed by Shewhart increases the Type I error, defined as the probability of a false signal that an in-control process has become unstable. This, in turn, may lead to unnecessary corrective actions, which may themselves destabilize the process. The purpose of the article is to determine the values of the Type I error and the Average Run Length (ARL) for X̄ and R charts whose control limits have been set using the Skewness Correction method (SC method), when the probability distribution of the examined quality characteristic is unknown. The study also uses Monte Carlo simulation, in which two sampling methods were used to obtain random input scenarios: fitting theoretical distributions (selected skewed distributions) and bootstrap resampling of measurement data from a production company. The article continues Czabak-Górska's research on SPC for skewed measurement data (Czabak-Górska, 2016).
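The 0.27% figure for the X̄ chart can be reproduced from the normal distribution alone: for ±3σ limits, α = 2(1 - Φ(3)) = erfc(3/√2). A minimal Python check follows (the function name is illustrative); the 0.46% for the R chart depends on the distribution of the sample range and is not reproduced here.

```python
import math


def normal_two_sided_tail(z: float) -> float:
    """Return P(|Z| > z) for a standard normal variable Z."""
    # P(|Z| > z) = 2 * (1 - Phi(z)) = erfc(z / sqrt(2))
    return math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    alpha_xbar = normal_two_sided_tail(3.0)
    print(f"{alpha_xbar:.4f}")  # prints 0.0027
```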

Assumptions and study method
The study was carried out in an enterprise that manufactures car seat frames. Measurements of the analyzed characteristic, the length of the pipe after forming, were taken in accordance with internal procedures over six consecutive days (for more details on the course of the process analysis, see Czabak-Górska, 2016). In the first stage of the study, the measurement data were checked for whether they could be described by a normal distribution, and characteristics showing a skewed distribution were selected. Next, control limits were determined using the skewness correction method, following the stabilization method described, for example, by Łuczak and Matuszak-Flejszman (2007), which ensured that the process was statistically stable (no assignable causes, only natural causes of variation). In addition, a single rule was adopted to signal possible destabilization of the process: a point crossing the control limits (other rules have been described, e.g., by Jamali and Jinlin (2006)). Next, a Monte Carlo simulation (MMC) was performed, which included the following steps:
1. Fitting theoretical distributions to the measurement data: gamma, Burr, log-normal.
2. Randomly generating series of measurements (120 measurements) from the fitted distributions, using the parameters obtained in step 1.
3. Determining the sample means X̄ and sample ranges R (with k = 3 observations in each sample).
4. Counting the signals, i.e. points exceeding the pre-determined control limits, on the X̄ and R charts.
5. Determining the probability that a point exceeds the control limits (Type I error).
6. Repeating steps 2-5 20,000 times.
7. Determining the average Type I error.
8. Determining the ARL value.
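The simulation steps above can be sketched in Python as follows. This is a minimal illustration, not the author's Matlab implementation: the distribution sampler `draw`, the control limits and the gamma parameters in the usage part are hypothetical placeholders, and the per-replication Type I error is taken as the share of plotted samples falling outside fixed limits.

```python
import random
import statistics
from typing import Callable, List, Tuple

Limits = Tuple[float, float]  # (LCL, UCL)
Draw = Callable[[random.Random], float]


def simulate_type1_error(
    draw: Draw,
    xbar_limits: Limits,
    r_limits: Limits,
    n_measurements: int = 120,
    k: int = 3,
    reps: int = 20_000,
    seed: int = 1,
) -> Tuple[float, float]:
    """Monte Carlo estimate of the Type I error of fixed X-bar and R limits.

    Returns the average share of plotted samples outside the limits on the
    X-bar chart and on the R chart.
    """
    rng = random.Random(seed)
    n_samples = n_measurements // k  # e.g. 120 measurements -> 40 samples of 3
    xbar_rates: List[float] = []
    r_rates: List[float] = []
    for _ in range(reps):  # step 6: repeat steps 2-5
        # Step 2: generate a series of measurements from the fitted distribution.
        data = [draw(rng) for _ in range(n_samples * k)]
        samples = [data[i:i + k] for i in range(0, len(data), k)]
        # Step 3: sample means and sample ranges.
        means = [statistics.fmean(s) for s in samples]
        ranges = [max(s) - min(s) for s in samples]
        # Step 4: count points outside the pre-determined limits.
        x_out = sum(not (xbar_limits[0] <= m <= xbar_limits[1]) for m in means)
        r_out = sum(not (r_limits[0] <= r <= r_limits[1]) for r in ranges)
        # Step 5: share of plotted points that signal in this replication.
        xbar_rates.append(x_out / n_samples)
        r_rates.append(r_out / n_samples)
    # Step 7: average Type I error over all replications.
    return statistics.fmean(xbar_rates), statistics.fmean(r_rates)


def arl(alpha: float) -> float:
    """Step 8: ARL as the reciprocal of the Type I error."""
    return float("inf") if alpha == 0 else 1.0 / alpha


def hypothetical_draw(rng: random.Random) -> float:
    """Hypothetical gamma-shaped measurements; not fitted to the article's data."""
    return 400.0 + rng.gammavariate(2.0, 2.5)


if __name__ == "__main__":
    # Made-up limits for illustration only.
    a_x, a_r = simulate_type1_error(hypothetical_draw, (401.0, 409.0), (0.0, 14.0), reps=2_000)
    print(a_x, arl(a_x), a_r, arl(a_r))
```

Passing the sampler as a function keeps the simulation loop independent of the fitted distribution, so the same loop can be run for the gamma, Burr and log-normal fits in turn.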

According to Mielczarek (2007), the Monte Carlo simulation method (MMC) is one of the most popular ways of building a stochastic simulation model used to study the behavior of a real process. It is also worth stressing that in the analyzed processes (e.g. production), some elements behave randomly, so there is no certainty that the further course of the phenomenon or process under investigation will be consistent with the adopted model. Mielczarek (2007) emphasizes that the key element in constructing a stochastic simulation is a proper (most probable) selection of the input scenario. The sampling method (fitting a theoretical distribution, quasi-random sampling, bootstrap resampling, etc.) may also affect the simulation result. In quality applications, bootstrapping may be a useful alternative to fitting a theoretical distribution, on the assumption that the process will behave as it did in the past. According to Kuhl et al. (2006), this approach amounts to generating further data directly from historical data, which avoids problems related to, for example, choosing the number of classes or finding the best fit. However, despite these advantages of bootstrap sampling over fitting theoretical distributions, the approach proves ineffective when a representative sample has not been collected (as indicated by the results of the simulation carried out by the author). Because the available measurement data do not meet this criterion, it was decided to use fitted theoretical distributions in the subsequent analysis.
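Bootstrap resampling in the sense of Kuhl et al. (2006) simply draws new values, with replacement, from the historical measurements. A minimal sketch using Python's standard `random` module; the function name and the sample history are hypothetical.

```python
import random
from typing import List, Sequence


def bootstrap_series(history: Sequence[float], n: int, rng: random.Random) -> List[float]:
    """Draw n values with replacement from historical measurements.

    Only values that were actually observed can appear, so an unrepresentative
    history is reproduced as-is, which is the weakness discussed above.
    """
    if not history:
        raise ValueError("history must contain at least one measurement")
    return rng.choices(list(history), k=n)


if __name__ == "__main__":
    # Hypothetical historical lengths (mm), not the article's data.
    past = [405.1, 404.9, 405.0, 405.3, 405.0, 404.8]
    print(bootstrap_series(past, 12, random.Random(7)))
```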
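Chan and Cui (2003) suggest that the performance of control charts can be evaluated using the Type I error, i.e. the probability that a single point of the X̄ or R chart exceeds the predetermined control limits (a signal that the process is not in statistical control) although the process is in fact in control:
$$\alpha_{\bar{X}} = P(\bar{X} < LCL_{\bar{X}}) + P(\bar{X} > UCL_{\bar{X}}), \qquad \alpha_{R} = P(R < LCL_{R}) + P(R > UCL_{R}) \qquad (1)$$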

Type I error and Average Run Length
According to Govindaraju (2005), for Shewhart control charts with control limits at ±3σ, the probability α is assumed to be constant, unlike, e.g., the CUSUM chart, for which small fluctuations of α are acceptable. However, α should not vary greatly whatever procedure is used to determine the control limits: for an in-control process, α should be small and, if possible, constant. Montgomery (2009) indicates that it is also possible to determine the average number of samples taken before a point signals, referred to as the ARL; for an in-control process this is the average number of samples until a false alarm, expressed as the reciprocal of the Type I error:
$$ARL = \frac{1}{\alpha} \qquad (2)$$
Research carried out by Govindaraju (2005) shows that expression (2) applies to control charts used in long and very long production runs and has been referred to as ARL type B, whereas for short and medium production runs, the average number of samples until a signal should be determined as ARL type A, which additionally depends on s, the length of the production run.
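Expression (2) and the effect of a finite run can be sketched as follows. The second function is an illustrative assumption, not taken from the article or from Govindaraju (2005): it is the standard mean of a geometric run length truncated after s samples, E[min(N, s)] = (1 - (1 - α)^s)/α, and may differ from the ARL type A formula used in the study.

```python
def arl_type_b(alpha: float) -> float:
    """ARL as the reciprocal of the Type I error, expression (2)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return 1.0 / alpha


def mean_run_length_truncated(alpha: float, s: int) -> float:
    """ASSUMPTION (not the article's formula): E[min(N, s)] for a geometric
    run length N with per-sample signal probability alpha, i.e.
    (1 - (1 - alpha)**s) / alpha.
    """
    if s < 1:
        raise ValueError("s must be a positive number of samples")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return (1.0 - (1.0 - alpha) ** s) / alpha


if __name__ == "__main__":
    for a in (0.0156, 0.0058):
        print(a, arl_type_b(a), mean_run_length_truncated(a, 3000))
```

With s = 3,000 and α = 0.0156, (1 - α)^s is of the order of 10^-21, so under this assumption truncation has no practical effect; it matters only when s is comparable to or smaller than 1/α.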

Case study
The analysis began by examining whether the measurements follow a normal or approximately normal distribution (the distribution of the data studied is unknown). For this purpose, the Shapiro-Wilk test at a significance level α = 0.05 was used (Table 1). The test results presented in Table 1 clearly indicate that the length of the pipe after forming cannot be described by a normal distribution (p < α). For this reason, the control limits were determined using the skewness correction method (Figure 2), described, among others, by Karagöz and Hamurkaroglu (2012). Basic descriptive statistics were also calculated (Table 2).

Source: Authors
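Table 2 reports a skewness coefficient. A minimal sketch of the usual moment-based estimator follows; statistical packages differ in whether they apply a small-sample adjustment (the `bias_corrected` flag is an illustrative parameter), and the text does not say which variant was used.

```python
import math
from typing import Sequence


def sample_skewness(x: Sequence[float], bias_corrected: bool = True) -> float:
    """Moment-based skewness g1 = m3 / m2**1.5, optionally adjusted to
    G1 = sqrt(n * (n - 1)) / (n - 2) * g1. Positive values indicate right skew."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in x) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1 if bias_corrected else g1
```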
The skewness coefficient in Table 2 indicates that the distribution is right-skewed, which suggests that in most cases the length of the pipe after forming is below the mean (405.06 mm). The control chart in Figure 1 has two points exceeding the control limits (measurement 17 above the upper limit and measurement 38 below the lower limit). Under the adopted rule for assessing process stability, the analysis of the process using the two control charts (Figure 1) shows that the process should be considered unstable (more on the interpretation of control charts can be found in Greber, 2000). Since only single signals occurred, the measurements exceeding the control limits were eliminated, following the approach presented by Łuczak and Matuszak-Flejszman (2007), and the limits were recalculated (Figure 2). In this way, control limits were obtained for the process in a state of statistical control. Figure 2: X̄ and R charts for the length of the pipe after forming determined using the SC method after removing the signals (Matlab)

Source: Authors
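The elimination-and-recalculation step described above can be sketched as an iterative loop. The limit formula is passed in as a function; the example function uses the classical Shewhart constants for n = 3 (A2 = 1.023, D3 = 0, D4 = 2.574) only as a stand-in, since the SC-method limit formulas are not reproduced in this section. All names are hypothetical.

```python
from statistics import fmean
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[float]
# ((X-bar LCL, X-bar UCL), (R LCL, R UCL))
ChartLimits = Tuple[Tuple[float, float], Tuple[float, float]]


def shewhart_limits_n3(samples: Sequence[Sample]) -> ChartLimits:
    """Classical Shewhart X-bar/R limits for n = 3 (placeholder, NOT the SC method)."""
    a2, d3, d4 = 1.023, 0.0, 2.574  # standard table constants for n = 3
    grand_mean = fmean(fmean(s) for s in samples)
    r_bar = fmean(max(s) - min(s) for s in samples)
    return ((grand_mean - a2 * r_bar, grand_mean + a2 * r_bar), (d3 * r_bar, d4 * r_bar))


def stabilize(
    samples: Sequence[Sample],
    compute_limits: Callable[[Sequence[Sample]], ChartLimits],
    max_iter: int = 10,
) -> Tuple[ChartLimits, List[Sample]]:
    """Drop samples that signal on either chart and recompute the limits
    until no sample signals (or max_iter passes are used up)."""
    current = list(samples)
    for _ in range(max_iter):
        (lx, ux), (lr, ur) = compute_limits(current)
        kept = [s for s in current
                if lx <= fmean(s) <= ux and lr <= max(s) - min(s) <= ur]
        if len(kept) == len(current):
            break  # no signals left: limits describe the stable process
        if not kept:
            raise ValueError("every sample signalled; cannot stabilize this way")
        current = kept
    return compute_limits(current), current
```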
The next stage of the analysis was fitting theoretical distributions to the empirical data, i.e. the measurement results (pipe length after forming). Because the measurement data show the nature of a skewed distribution, the gamma, Burr and log-normal distributions were used. The probability densities of the fitted distributions, together with the mean (μ) and standard deviation (σ), are presented in Figure 3. It should also be emphasized that the gamma and log-normal density curves overlap and provide the best fit, since their location (μ) and spread (σ) agree with the histogram. Next, using MMC, further series of 120 measurements, grouped into three-element samples, were generated separately for the gamma, Burr and log-normal distributions. The sample means and ranges were then calculated. Next, the number of points exceeding the control limits on the X̄ and R charts was recorded, the probability of a false signal on each control chart was calculated, and the ARL type A value was determined for s = 3,000 (Table 3). It is worth mentioning that, in line with Govindaraju (2005), the values in Table 3 remain constant when the simulation is repeated several times. The gamma and log-normal distributions gave practically the same results (the slight differences result from rounding all determined values to two decimal places), which means that the measurement data may follow either of these distributions. For the Burr distribution, the Type I error for the X̄ chart differed only slightly from the values obtained for the other distributions, whereas the Type I error for the R chart differed more.
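Figure 3 characterizes the fitted distributions by their mean and standard deviation. The text does not state how the parameters were estimated, so the sketch below assumes simple method-of-moments matching for the gamma and log-normal cases (maximum-likelihood fits would generally give somewhat different parameters); the Burr distribution has no closed-form moment match and is omitted. Function names are hypothetical.

```python
import math
import random
from typing import Callable, Tuple


def gamma_from_moments(mean: float, sd: float) -> Tuple[float, float]:
    """Shape and scale of a gamma distribution with the given mean and sd."""
    if mean <= 0 or sd <= 0:
        raise ValueError("mean and sd must be positive")
    # shape * scale = mean, shape * scale**2 = sd**2
    return (mean / sd) ** 2, sd ** 2 / mean


def lognormal_from_moments(mean: float, sd: float) -> Tuple[float, float]:
    """mu and sigma of log(X) for a log-normal X with the given mean and sd."""
    if mean <= 0 or sd <= 0:
        raise ValueError("mean and sd must be positive")
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def gamma_sampler(mean: float, sd: float) -> Callable[[random.Random], float]:
    shape, scale = gamma_from_moments(mean, sd)
    return lambda rng: rng.gammavariate(shape, scale)


def lognormal_sampler(mean: float, sd: float) -> Callable[[random.Random], float]:
    mu, sigma = lognormal_from_moments(mean, sd)
    return lambda rng: rng.lognormvariate(mu, sigma)
```

For a two-parameter gamma fit the skewness equals 2/√shape, so data whose mean is large relative to their spread come out nearly symmetric; reproducing marked skewness in such data would require a shifted (three-parameter) variant.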
Table 3 shows, for example, that for the gamma distribution the Type I error of the mean chart is 0.0156, which means that a false signal will appear on average about 15-16 times per 1000 plotted samples. The ARL value for this chart indicates that, on average, 64 three-element samples will be taken before a signal appears. For the range chart, the Type I error is 0.0058, so a false signal will appear about 5-6 times per 1000 samples, and on average 172 three-element samples will be taken before a signal appears. It can be concluded that the control chart presented in the paper works better when the data can be described using a gamma or log-normal distribution, which can serve as an additional criterion confirming that an appropriate distribution was selected for the data. This, in turn, gives a more complete picture of the course and nature of the process, making deeper analyses possible, e.g. in terms of process capability.
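The quoted figures can be checked directly: the expected number of false signals per 1000 plotted samples is 1000α, and the reciprocal 1/α reproduces the reported ARL values:

$$1000 \times 0.0156 = 15.6, \qquad \frac{1}{0.0156} \approx 64.1 \quad (\bar{X}\ \text{chart})$$
$$1000 \times 0.0058 = 5.8, \qquad \frac{1}{0.0058} \approx 172.4 \quad (R\ \text{chart})$$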

Conclusion
The purpose of the article was to determine the Type I error value and ARL type A for X̄ and R charts whose control limits were determined using the skewness correction method. For this purpose, measurement data on the length of the pipe after forming from a company producing car seat frames were used, together with fits of selected skewed distributions (gamma, Burr and log-normal), which made it possible to simulate the further course of the process. For the gamma distribution, the following Type I error and ARL type A values were obtained: