Statistical Modelling of Extreme Data of Air Pollution in Pekanbaru City

Air pollution is a frequently discussed phenomenon, especially with regard to air quality in urban areas. It has become a major contributor to health problems and environmental issues in Asian countries such as Indonesia, and particularly in Riau Province. Forest fires are among the recurring events in Indonesia, especially in Riau Province, that have harmed the population of Indonesia and neighboring countries. Forest fires generally occur when the season shifts toward drought, and they can occur in fire-prone areas. Therefore, it is necessary to know the distribution model of air pollution by particulate matter (PM10) in Pekanbaru City. This study aims to obtain the distribution model for daily PM10 air pollution in Pekanbaru City from 2014 to February 2015. Data were taken from three stations: Sukajadi, Tampan, and Kulim. Four distributions are tested: the Log Pearson III, Gumbel, Generalized Pareto, and Generalized Extreme Value (GEV) distributions. We test the goodness of fit of these distributions using the Kolmogorov-Smirnov and Anderson-Darling tests. The results show that the GEV distribution fits better than the Log Pearson III, Gumbel, and Generalized Pareto distributions for modeling the air pollution data of Pekanbaru City at the three stations, namely Sukajadi, Tampan, and Kulim.


INTRODUCTION
Air is the most important substance after water in providing life on earth [1]. Allah the Almighty has created the air between the winds, as explained in the Quran Surah Ar-Rum (48): "Allah sends the winds that stir up clouds and then He spreads them in the sky as He pleases and splits them into different fragments, whereafter you see drops of rain pouring down from them. He then causes the rain to fall on whomsoever of His servants He pleases, and lo, they rejoice at it".
Air pollution can cause various diseases in humans, such as lung disease, asthma, and anemia. The main air pollutants include carbon monoxide, carbon dioxide, nitrogen oxides, nitrogen dioxide, and particulate matter (PM10). PM10 is ash or dust less than 10 µm in diameter, which can pose a more serious risk to human, animal, and plant health than larger particles. It is generally produced by sources such as vehicles (vehicle exhaust) [2].
BNPB data also state that as many as 49,591 people in Riau suffered from smoke-related diseases such as upper respiratory tract infections (URTI), pneumonia, asthma, and eye and skin irritation. In addition to causing illness, the haze in Riau Province, especially in Pekanbaru City, disrupted community activities. For example, all educational activities in Riau Province, especially in Pekanbaru City, were stopped because of this pollution. One of the universities that stopped academic activities was Sultan Syarif Kasim Riau State Islamic University, which closed for 4 days. In addition, visibility on the highway was only about 200 meters, which can hinder drivers.
Other impacts of air pollution include ozone depletion, smoke, acid rain, and global warming [3]. In addition, ash, smoke, fog, steam, and other materials produced by air pollution can obstruct eyesight. Besides the impact on humans, negative impacts also occur on plants and animals. In plants, air pollution stunts growth because it obstructs sunlight from reaching the leaves, which reduces photosynthesis and the uptake of carbon dioxide. In animals, it can interfere with the respiratory system. Animals that eat fluoride-contaminated grass and leaves may develop abnormal bone shapes [2].
The problem of air pollution is a frequently discussed phenomenon. Air quality in the urban environment is steadily declining and is a major contributor to health problems and environmental issues in Asia. Mathematical models have been widely used to determine the patterns of movement in air pollution data. In this paper, we explore statistical models based on four distribution functions, i.e. the Log Pearson III, Gumbel, Generalized Pareto, and Generalized Extreme Value (GEV) distributions, to find the one that is appropriate for describing the patterns of movement of PM10 in Pekanbaru City.

METHOD
The data used in this study are air pollution data, specifically data on particulate matter (PM10). The data were collected daily from January 2014 to February 2015 by the Environmental Services of Pekanbaru City at three monitoring stations, i.e. Sukajadi, Kulim, and Tampan. We use the Log Pearson III, Gumbel, Generalized Pareto, and Generalized Extreme Value (GEV) distributions to model the distribution of particulate matter air pollution, using the EasyFit software. The steps in conducting this research are summarized as follows:
I. Select Log Pearson III, Gumbel, Generalized Pareto, and Generalized Extreme Value (GEV) distributions:
The Log Pearson III distribution is a member of the Pearson family of distributions and is derived from the Gamma distribution. Like the Generalized Extreme Value distribution, it has three parameters, i.e. scale, location, and shape parameters [4]. Its probability density function takes the form given in [5].
The Gumbel distribution was first introduced by Emil Gumbel, a mathematician from Germany. It is a special case of the extreme value distribution in which the shape parameter is equal to zero [7]. Its probability density function takes the form given in [6].
The Generalized Pareto distribution is a continuous distribution with a probability density function. Its density function is defined by Muraleedharan and Soares [7].
In general, the GEV distribution is used to model extreme data taken as the maximum over specific periods, such as daily, monthly, or yearly maxima. In practice, data on extreme maxima are useful as a reference for anticipating future extreme values [8]. The GEV distribution has three parameters: scale (σ), location (µ), and shape (ξ). Its probability density function is expressed in terms of these three parameters.
A model is considered to fit the data best when its Kolmogorov-Smirnov and Anderson-Darling test statistics are the smallest among the candidate models [9].
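For reference, the standard textbook forms of the four densities can be written as below. This is a sketch under an assumed parameterization: the symbols α, β, and γ for the Log Pearson III distribution, and the reuse of µ, σ, and ξ for the Gumbel and Generalized Pareto distributions, are assumptions introduced here and may differ from the notation of [5]-[7].

```latex
% Assumed notation: z = (x - \mu)/\sigma; \alpha > 0 (shape), \beta \neq 0 (scale),
% \gamma (location of \ln x) for Log Pearson III; \sigma > 0 and \xi \neq 0 elsewhere.
\begin{align*}
\text{Log Pearson III:}\quad
  f(x) &= \frac{1}{x\,|\beta|\,\Gamma(\alpha)}
          \left(\frac{\ln x-\gamma}{\beta}\right)^{\alpha-1}
          \exp\!\left(-\frac{\ln x-\gamma}{\beta}\right),
  \qquad \frac{\ln x-\gamma}{\beta}>0, \\
\text{Gumbel:}\quad
  f(x) &= \frac{1}{\sigma}\exp\!\left(-z-e^{-z}\right),
  \qquad z=\frac{x-\mu}{\sigma}, \\
\text{Generalized Pareto:}\quad
  f(x) &= \frac{1}{\sigma}\left(1+\xi z\right)^{-1/\xi-1},
  \qquad x\ge\mu,\ 1+\xi z>0, \\
\text{GEV:}\quad
  f(x) &= \frac{1}{\sigma}\left(1+\xi z\right)^{-1/\xi-1}
          \exp\!\left(-\left(1+\xi z\right)^{-1/\xi}\right),
  \qquad 1+\xi z>0.
\end{align*}
```

In this notation, letting ξ tend to zero in the GEV density recovers the Gumbel density.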

Descriptive Statistics for Air Pollution Data in Pekanbaru City
The level of air pollution can rise and fall at any time. From January 2014 to February 2015, the level of air pollution in Pekanbaru City varied from month to month. Figure 1 displays the daily air pollution levels in Pekanbaru City from January 2014 to February 2015 for the Sukajadi, Tampan, and Kulim stations, and Table 1 presents a summary of the descriptive statistics. Table 1 indicates that both the lowest value, 0.01, and the highest value, 617.24, were recorded at Sukajadi station. Furthermore, the mean air pollution levels at the Sukajadi, Tampan, and Kulim stations are 58.55, 28.996, and 86.401, with corresponding standard deviations of 82.409, 68.977, and 78.42, respectively. Together, Figure 1 and Table 1 give an overall description of the air pollution data in Pekanbaru City.
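A summary of the kind reported in Table 1 could be computed along the following lines. This is a minimal sketch, not the procedure used in the study: the readings are hypothetical placeholders rather than the Pekanbaru data, the helper `describe` is introduced here for illustration, and the use of the sample (n - 1) standard deviation is an assumption.

```python
import statistics


def describe(values):
    """Return the minimum, maximum, mean, and sample standard deviation."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1); an assumption
    }


# Hypothetical daily PM10 readings, NOT the Pekanbaru data.
stations = {
    "Sukajadi": [12.0, 35.5, 58.0, 140.2, 47.3],
    "Tampan": [8.1, 22.4, 19.9, 95.0, 30.6],
    "Kulim": [40.0, 66.7, 81.2, 210.5, 73.1],
}

summary = {name: describe(series) for name, series in stations.items()}
for name, stats in summary.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```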

Parameter Estimation
The parameter estimates obtained by the maximum likelihood method for the Log Pearson III, Gumbel, Generalized Pareto, and Generalized Extreme Value (GEV) distributions are shown in Table 2.
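To illustrate what maximum likelihood estimation involves, the sketch below fits the Gumbel distribution, the simplest of the four, using only the Python standard library. It is not the EasyFit procedure used in the study: the function `gumbel_mle`, the bisection solver, and the synthetic sample (Gumbel quantiles with assumed parameters 50 and 20) are all assumptions made for illustration.

```python
import math


def gumbel_mle(data, tol=1e-10):
    """Maximum likelihood estimates (mu, sigma) of the Gumbel (maximum) distribution.

    Assumes the data contain at least two distinct values.
    """
    n = len(data)
    mean = sum(data) / n
    shift = min(data)  # subtracting the minimum keeps exp() arguments <= 0

    def g(sigma):
        # Likelihood equation for sigma:
        #   sigma - mean(x) + sum(x * w) / sum(w) = 0,  with w = exp(-x / sigma)
        w = [math.exp(-(x - shift) / sigma) for x in data]
        return sigma - mean + sum(x * wi for x, wi in zip(data, w)) / sum(w)

    # g is increasing in sigma, negative near zero and positive at mean - min,
    # so bisection on this bracket finds its single root.
    hi = mean - shift
    lo = hi * 1e-6
    while g(lo) >= 0.0:
        lo /= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    sigma = 0.5 * (lo + hi)

    # Likelihood equation for mu, given sigma.
    mu = shift - sigma * math.log(
        sum(math.exp(-(x - shift) / sigma) for x in data) / n
    )
    return mu, sigma


# Synthetic sample: Gumbel quantiles at evenly spaced probabilities (not real data).
true_mu, true_sigma, n = 50.0, 20.0, 200
sample = [
    true_mu - true_sigma * math.log(-math.log((i - 0.5) / n))
    for i in range(1, n + 1)
]
mu_hat, sigma_hat = gumbel_mle(sample)
print(f"mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
```

The three-parameter distributions have no comparably simple likelihood equations and are normally fitted with a general numerical optimizer.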

Modelling Air Pollution in Pekanbaru City
Based on Table 2, a Log Pearson III air pollution model is obtained for each of the three stations. We depict the Log Pearson III distribution for the three stations in Figure 2. Figure 2 shows the histogram of the PM10 data, and the line represents the Log Pearson III distribution for the three stations based on equation (1).
We depict the Gumbel distribution for the three stations in Figure 3. Figure 3 shows the histogram of the PM10 data, and the line represents the Gumbel distribution for the three stations based on equations (4)-(6). From these figures, we can see that the line of the Gumbel model lies below the histogram of the PM10 data.
We depict the Generalized Pareto distribution for the three stations in Figure 4. Figure 4 shows the histogram of the PM10 data, and the line represents the Generalized Pareto distribution for the three stations based on equation (7).
We depict the GEV distribution for the three stations in Figure 5. Figure 5 shows the histogram of the PM10 data, and the line represents the GEV distribution for the three stations based on equations (10)-(12). From these figures, we can see that the line of the GEV model lies above the histogram of the PM10 data.
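The comparison between a histogram and a fitted density curve, as in Figures 2 to 5, can be sketched numerically as follows. The sample values, the GEV parameters, and the helper names `gev_pdf` and `histogram_density` are hypothetical and chosen only for illustration. They are not the fitted values of Table 2.

```python
import math


def gev_pdf(x, mu, sigma, xi):
    """GEV density with location mu, scale sigma, shape xi (xi = 0 gives Gumbel)."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-z - math.exp(-z)) / sigma
    t = 1.0 + xi * z
    if t <= 0.0:
        return 0.0  # x lies outside the support of the distribution
    return t ** (-1.0 / xi - 1.0) * math.exp(-t ** (-1.0 / xi)) / sigma


def histogram_density(data, n_bins):
    """Return (bin_centers, densities) scaled so that the bar areas sum to 1.

    Assumes the data contain at least two distinct values.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        k = min(int((x - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[k] += 1
    centers = [lo + (k + 0.5) * width for k in range(n_bins)]
    densities = [c / (len(data) * width) for c in counts]
    return centers, densities


# Hypothetical PM10 sample and GEV parameters, for illustration only.
sample = [12.0, 18.5, 22.1, 25.0, 30.4, 33.3, 41.7, 47.9,
          55.2, 63.8, 78.0, 96.5, 140.2, 210.5]
mu, sigma, xi = 30.0, 20.0, 0.3

n_bins = 5
width = (max(sample) - min(sample)) / n_bins
centers, densities = histogram_density(sample, n_bins)
for c, d in zip(centers, densities):
    print(f"bin centre {c:7.2f}  histogram {d:.5f}  GEV {gev_pdf(c, mu, sigma, xi):.5f}")
```

Both columns are on the density scale, so a fitted curve that tracks the histogram heights closely indicates a good visual fit.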

Goodness of Fit Test
To determine which of the four distribution models best fits the observed data, we use the Kolmogorov-Smirnov and Anderson-Darling tests. Using the EasyFit software, we obtain the goodness of fit results presented in Table 3, Table 4, and Table 5, which report the Kolmogorov-Smirnov and Anderson-Darling test statistics. Based on these statistics, the GEV distribution has the smallest values among the four distributions. Therefore, we conclude that the best-fitting distribution model for air pollution in Pekanbaru City is the GEV distribution.
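The two test statistics can be computed directly from a fitted cumulative distribution function, as in the sketch below. It uses the standard definitions of the Kolmogorov-Smirnov and Anderson-Darling statistics and is not the EasyFit implementation. The sample, the candidate parameters, and the function names are hypothetical.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function (xi = 0 gives the Gumbel distribution)."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        return 0.0 if xi > 0.0 else 1.0  # below or above the support
    return math.exp(-t ** (-1.0 / xi))


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov statistic: largest gap between empirical and model CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ad_statistic(data, cdf):
    """Anderson-Darling statistic A^2, which gives more weight to the tails."""
    xs = sorted(data)
    n = len(xs)
    eps = 1e-12  # keep the logarithms finite if the CDF returns exactly 0 or 1
    f = [min(max(cdf(x), eps), 1.0 - eps) for x in xs]
    s = sum(
        (2 * i - 1) * (math.log(f[i - 1]) + math.log(1.0 - f[n - i]))
        for i in range(1, n + 1)
    )
    return -n - s / n


# Hypothetical sample and candidate models, for illustration only.
sample = [12.0, 18.5, 22.1, 25.0, 30.4, 33.3, 41.7, 47.9,
          55.2, 63.8, 78.0, 96.5, 140.2, 210.5]
candidates = {
    "GEV": lambda x: gev_cdf(x, 30.0, 20.0, 0.3),
    "Gumbel": lambda x: gev_cdf(x, 35.0, 30.0, 0.0),
}
for name, cdf in candidates.items():
    print(name, round(ks_statistic(sample, cdf), 4), round(ad_statistic(sample, cdf), 4))
```

Under the criterion stated in the Method section, the candidate with the smaller statistics is preferred.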