Estimation Parameter d in Autoregressive Fractionally Integrated Moving Average Model in Predicting Wind Speed

Abstract
Wind speed is one of the most important weather factors during aircraft landing and takeoff because it affects the aircraft's lift. A model for predicting wind speed in a given area is therefore needed. This research discusses wind speed forecasting with an ARIMA model whose differencing parameter is fractional, known as the ARFIMA model. Two approaches to estimating the differencing parameter are considered: parametric and semiparametric. Exact Maximum Likelihood (EML) is used as the parametric method, and four semiparametric estimation methods are used, i.e., Geweke and Porter-Hudak (GPH), Smoothed GPH (Sperio), Local Whittle, and Rescaled Range (R/S). The results show that, in this case, the best estimation method is GPH, with ARFIMA(2,0.334,0) as the selected model.
Keywords: ARFIMA, Parametric Method, Semiparametric Method.


INTRODUCTION
Wind is one of the weather elements that plays an important role in determining the weather and climate conditions of a particular area. The benefits obtainable from wind energy depend on the wind speed and geographical conditions of an area. Several studies have been conducted to determine the [...]

METHODS

1. Long Memory Process
A time series is said to be a long-memory process if its autocorrelation function decays slowly to zero, meaning that observations far apart in time are still strongly correlated [12]. Long-term memory can be detected from the Hurst value ($H$), which is obtained from the R/S statistic [12]. For a series $Z_1, \dots, Z_T$, the Hurst value is determined by computing the mean $\bar{Z} = \frac{1}{T}\sum_{t=1}^{T} Z_t$, the adjusted mean $Z_t^{adj} = Z_t - \bar{Z}$, the cumulative deviation $Z_t^{*} = \sum_{i=1}^{t} Z_i^{adj}$, the range of cumulative deviation $R_T = \max(Z_1^{*}, \dots, Z_T^{*}) - \min(Z_1^{*}, \dots, Z_T^{*})$, the standard deviation $S_T$ of the series, and finally $H = \log(R_T / S_T) / \log(T)$. If the computed $H$ equals 0.5, the series is random; if $0 < H < 0.5$, the series shows short-term memory; and if $0.5 < H < 1$, the series shows long-term memory.
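The R/S steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `hurst_rs` is hypothetical, the single-window form $H = \log(R/S)/\log(T)$ is taken from the value reported in the Results, and the use of the population standard deviation is an assumption.

```python
import numpy as np

def hurst_rs(z):
    """Single-window rescaled-range estimate H = log(R/S) / log(T)."""
    z = np.asarray(z, dtype=float)
    T = z.size
    adjusted = z - z.mean()              # adjusted mean series
    cum_dev = np.cumsum(adjusted)        # cumulative deviation
    R = cum_dev.max() - cum_dev.min()    # range of cumulative deviation
    S = z.std()                          # population standard deviation (assumption)
    return np.log(R / S) / np.log(T)

if __name__ == "__main__":
    # A steadily trending series is strongly persistent, so H should exceed 0.5.
    print(hurst_rs(np.arange(100.0)))
```

A value above 0.5 from such a function would be read as an indication of long-term memory, in line with the interpretation given in the text.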

2. ARFIMA Model
The Autoregressive Fractionally Integrated Moving Average (ARFIMA) model, developed by Granger and Joyeux [9] and by Hosking [7], is one of the most appropriate models for time series data with long-term memory. The ARFIMA($p$, $d$, $q$) model can be expressed as follows [13]:
$$\phi(B)(1 - B)^{d} Z_t = \theta(B) a_t,$$
where $\{a_t\}$ is a white noise process, $\phi(B)$ is the AR polynomial of order $p$, $\theta(B)$ is the MA polynomial of order $q$, and $(1 - B)^{d}$ is the fractional difference operator.
According to Hosking [7], the fractional difference operator in ARFIMA($p$, $d$, $q$) is a generalization of an infinite binomial series [14]:
$$(1 - B)^{d} = \sum_{k=0}^{\infty} \binom{d}{k} (-1)^{k} B^{k},$$
where $B$ is the backward shift operator, $\Gamma(\cdot)$ is the gamma function, and $\binom{d}{k} = \frac{\Gamma(d+1)}{\Gamma(k+1)\,\Gamma(d-k+1)}$ is a binomial coefficient. Several characteristics of fractionally integrated series for various values of $d$ are as follows [15]:
a. If $d = 0$, the process has an exponentially decaying autocorrelation function, as in an ARMA process.
b. If $d \in (0, 0.5)$, the series has long memory with positive dependency between distant observations, shown by positive and slowly decaying autocorrelations, and has a moving average representation of infinite order.
c. If $d \in (-0.5, 0)$, the series has long memory with negative dependency, shown by negative and slowly decaying autocorrelations, and has an autoregressive representation of infinite order.
d. If $|d| \ge 0.5$, the process is not stationary.
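The binomial expansion of the fractional difference operator can be evaluated with a simple recursion on its coefficients. The sketch below is illustrative only; the names `frac_diff_weights` and `frac_diff` are hypothetical, and truncating the infinite series at the available sample length is an assumption made for practicality.

```python
import numpy as np

def frac_diff_weights(d, n_terms):
    """Coefficients w_k of (1 - B)^d = sum_k w_k B^k, with w_k = (-1)^k * C(d, k)."""
    w = np.empty(n_terms)
    w[0] = 1.0
    for k in range(1, n_terms):
        # Ratio of consecutive binomial terms: w_k = w_{k-1} * (k - 1 - d) / k
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w

def frac_diff(z, d):
    """Apply the truncated filter: y_t = sum_{k=0}^{t} w_k * z_{t-k}."""
    z = np.asarray(z, dtype=float)
    w = frac_diff_weights(d, z.size)
    return np.convolve(z, w)[: z.size]

if __name__ == "__main__":
    print(frac_diff_weights(0.334, 4))
```

For $d = 1$ the weights reduce to $1, -1, 0, 0, \dots$, so the filter becomes ordinary first differencing, which is a convenient sanity check on the recursion.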

3. Estimation of Fractional Difference Parameter with Parametric Method
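The parametric method estimates all parameters of the ARFIMA model in one step [16]. In this study, the parametric method used is the Exact Maximum Likelihood (EML) method introduced by Sowell (1992). This method uses the likelihood principle to estimate $\phi$, $\theta$, and $d$ in the ARFIMA model. Given the general form of the ARFIMA($p$, $d$, $q$) model:
$$\phi(B)(1 - B)^{d} Z_t = \theta(B) a_t,$$
where $a_t \sim N(0, \sigma_a^2)$. The probability density function of $Z = (Z_1, Z_2, \dots, Z_T)$ is defined as:
$$f(Z \mid \Sigma) = (2\pi)^{-T/2}\, |\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2} Z' \Sigma^{-1} Z\right),$$
where $\Sigma$ is the $T \times T$ covariance matrix of $Z$. The log-likelihood function can be written as follows:
$$\ln L(\phi, \theta, d, \sigma_a^2) = -\frac{T}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma| - \frac{1}{2} Z' \Sigma^{-1} Z. \qquad (1)$$
Estimates of $\phi$, $\theta$, $d$, and $\sigma_a^2$ are obtained by maximizing equation (1); this is referred to as maximum likelihood estimation [4].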
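To make the exact Gaussian likelihood concrete, the sketch below evaluates it for the simplest case, ARFIMA(0, $d$, 0) with zero mean, whose autocovariances have a known closed form. This is an illustrative assumption-laden example rather than Sowell's full algorithm: the function names are hypothetical, AR and MA terms are omitted, and the covariance matrix is built densely, which is only practical for short series.

```python
import math
import numpy as np

def arfima0d0_autocov(d, sigma2, n):
    """Autocovariances gamma(0), ..., gamma(n-1) of ARFIMA(0, d, 0), for d < 0.5."""
    gamma = np.empty(n)
    gamma[0] = sigma2 * math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2
    for k in range(1, n):
        # Recursion implied by rho(k) = rho(k-1) * (k - 1 + d) / (k - d)
        gamma[k] = gamma[k - 1] * (k - 1 + d) / (k - d)
    return gamma

def exact_loglik(z, d, sigma2):
    """Exact Gaussian log-likelihood of a zero-mean ARFIMA(0, d, 0) sample."""
    z = np.asarray(z, dtype=float)
    T = z.size
    gamma = arfima0d0_autocov(d, sigma2, T)
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    Sigma = gamma[lags]                       # Toeplitz covariance matrix
    _, logdet = np.linalg.slogdet(Sigma)
    quad = z @ np.linalg.solve(Sigma, z)      # z' Sigma^{-1} z
    return -0.5 * (T * np.log(2 * np.pi) + logdet + quad)

if __name__ == "__main__":
    print(exact_loglik([0.2, -0.1, 0.4, 0.3], d=0.3, sigma2=1.0))
```

An estimate of $d$ would then be the value that maximizes `exact_loglik` over the admissible range, for example with a grid search or a numerical optimizer.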

4. Estimation of Fractional Difference Parameter with Semiparametric Methods
Estimation of the fractional difference parameter with semiparametric methods is carried out in two steps. The first step estimates the fractional difference parameter ($d$), and the second step estimates the AR and MA parameters [16]. The most popular semiparametric method is that of Geweke and Porter-Hudak (GPH). The GPH method starts from the spectral density function $f(\omega)$ of the ARFIMA model, which forms a Fourier transform pair with the autocovariance series, and builds a spectral regression equation with the log-periodogram $\ln I(\omega_j)$ as the dependent variable:
$$\ln I(\omega_j) = \beta_0 - d \ln\left(4 \sin^2(\omega_j / 2)\right) + e_j, \quad j = 1, \dots, m,$$
where $\omega_j = 2\pi j / T$ are the Fourier frequencies and $m$ is the number of low frequencies used. The second step of the GPH method is to build an ARMA model with the Box-Jenkins method once the estimated fractional difference parameter ($\hat{d}_{GPH}$) has been obtained.
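The first GPH step, a least-squares regression of the log-periodogram on a function of the low Fourier frequencies, can be sketched as follows. The function name `gph_estimate` is hypothetical, and the bandwidth choice $m = \lfloor T^{0.5} \rfloor$ is a common convention assumed here, not one stated in the text.

```python
import numpy as np

def gph_estimate(z, alpha=0.5):
    """Log-periodogram regression estimate of d using the first m = floor(T**alpha) frequencies."""
    z = np.asarray(z, dtype=float)
    T = z.size
    m = int(np.floor(T ** alpha))                       # number of low frequencies (assumed rule)
    j = np.arange(1, m + 1)
    omega = 2 * np.pi * j / T                           # Fourier frequencies
    dft = np.fft.fft(z - z.mean())
    periodogram = np.abs(dft[1 : m + 1]) ** 2 / (2 * np.pi * T)
    x = np.log(4 * np.sin(omega / 2) ** 2)              # regressor
    y = np.log(periodogram)                             # dependent variable
    slope = np.polyfit(x, y, 1)[0]
    return -slope                                       # d is minus the regression slope

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(gph_estimate(rng.standard_normal(4096)))      # white noise: estimate should be near 0
```

The resulting estimate would then be used to fractionally difference the series before fitting the ARMA part in the second step.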
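The next semiparametric method is the Sperio method introduced by Reisen and Lopes (1999). It modifies the GPH method by replacing the periodogram with a smoothed estimate of the spectral density. Reisen and Lopes (1999) proposed using a Blackman-Tukey type estimator of the spectral density [17]:
$$\hat{f}_s(\omega) = \frac{1}{2\pi} \sum_{k=-M}^{M} \lambda\!\left(\frac{k}{M}\right) \hat{\gamma}(k) \cos(k\omega),$$
where $\hat{\gamma}(k)$ is the sample autocovariance at lag $k$ and $\lambda(\cdot)$ is a lag window with truncation point $M$. This smoothed periodogram estimate is denoted by $\hat{f}_s$.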
The third semiparametric method is Local Whittle estimation, which is also commonly used to estimate the fractional difference parameter. This method was proposed by Kuensch (1987) and modified by Robinson (1995). The Local Whittle estimate of the fractional difference parameter, denoted by $\hat{d}_{LW}$, is obtained by maximizing the local Whittle log-likelihood over the Fourier frequencies near zero, which is equivalent to minimizing [18]:
$$R(d) = \ln\left(\frac{1}{m} \sum_{j=1}^{m} \omega_j^{2d} I(\omega_j)\right) - \frac{2d}{m} \sum_{j=1}^{m} \ln \omega_j.$$
The last semiparametric method considered in this study is the Rescaled Range (R/S) statistic, often called the Hurst test. Besides indicating long-term memory in time series data, the R/S statistic can also be used to estimate the fractional difference parameter with the following equation:
$$\hat{d}_{R/S} = H - 0.5.$$
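A Local Whittle estimate can be sketched by evaluating Robinson's objective function on a grid of candidate values of $d$ and keeping the minimizer. The function name `local_whittle`, the bandwidth $m = \lfloor T^{0.5} \rfloor$, and the grid bounds are all assumptions chosen for illustration; a numerical optimizer could replace the grid search.

```python
import numpy as np

def local_whittle(z, alpha=0.5, grid=None):
    """Grid-search minimizer of the local Whittle objective R(d)."""
    z = np.asarray(z, dtype=float)
    T = z.size
    m = int(np.floor(T ** alpha))                        # number of low frequencies (assumed rule)
    omega = 2 * np.pi * np.arange(1, m + 1) / T          # Fourier frequencies near zero
    dft = np.fft.fft(z - z.mean())
    periodogram = np.abs(dft[1 : m + 1]) ** 2 / (2 * np.pi * T)
    if grid is None:
        grid = np.linspace(-0.49, 0.99, 297)             # candidate d values (assumed range)

    def objective(d):
        return np.log(np.mean(omega ** (2 * d) * periodogram)) - 2 * d * np.mean(np.log(omega))

    values = np.array([objective(d) for d in grid])
    return grid[np.argmin(values)]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    print(local_whittle(rng.standard_normal(4096)))      # white noise: estimate should be near 0
```

Because the objective depends on the data only through the periodogram at the lowest frequencies, the short-run ARMA dynamics do not need to be specified at this stage.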

5. Model Diagnostic Checking
Diagnostic checking assesses the adequacy of the fitted model for the observed data in order to reveal model inadequacies and guide model improvement. It is done by checking whether the model residuals follow a white noise process: independence of the residuals is checked with the Ljung-Box test [4], and normality of the residuals is checked with the Jarque-Bera test [16].
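Both diagnostic statistics are simple to compute from the residuals. The sketch below uses their standard textbook definitions; the function names are hypothetical, and only the Jarque-Bera p-value is computed, using the closed-form tail probability of a chi-square distribution with 2 degrees of freedom. The Ljung-Box statistic would be compared against a chi-square critical value separately.

```python
import numpy as np

def ljung_box_q(resid, lags):
    """Ljung-Box Q statistic based on the first `lags` residual autocorrelations."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    n = e.size
    denom = np.sum(e ** 2)
    total = 0.0
    for k in range(1, lags + 1):
        r_k = np.sum(e[k:] * e[:-k]) / denom     # lag-k sample autocorrelation
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total

def jarque_bera(resid):
    """Jarque-Bera statistic and its chi-square(2) p-value."""
    e = np.asarray(resid, dtype=float)
    n = e.size
    c = e - e.mean()
    s2 = np.mean(c ** 2)
    skew = np.mean(c ** 3) / s2 ** 1.5
    kurt = np.mean(c ** 4) / s2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = np.exp(-jb / 2.0)                  # survival function of chi-square with 2 df
    return jb, p_value

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    residuals = rng.standard_normal(500)
    print(ljung_box_q(residuals, lags=10), jarque_bera(residuals))
```

Small values of both statistics, equivalently large p-values, are consistent with residuals that behave like Gaussian white noise.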

6. Selection of Best ARFIMA Model
The best fitted model can be selected with the Akaike Information Criterion (AIC) [19]. The AIC takes into account both how well the model fits the observed data and the number of parameters used in the fitted model. It is computed with the following formula:
$$AIC = -2 \ln L + 2k,$$
where $L$ is the maximized likelihood, $k = p + q + 1$ if the model contains an intercept, and $k = p + q$ if it does not [19]. A good model is expected to fit the in-sample data well and, at the same time, to forecast out-of-sample data well. The Mean Absolute Percentage Error (MAPE) is one of many criteria for validating the fitted model and is used in this study. It is defined as the mean of the absolute deviations between predicted and observed values, each divided by the observed value [20]:
$$MAPE = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{Z_t - \hat{Z}_t}{Z_t} \right| \times 100\%,$$
where $Z_t$ is the actual series, $\hat{Z}_t$ is the predicted series, and $N$ is the number of data points.

Figure 1 displays the daily wind speed at Soekarno-Hatta airport. The series is not stationary in variance, as the fluctuations of the data change over time rather than staying constant. A formal check is performed with the Box-Cox transformation to evaluate whether a transformation is needed to make the variance stationary over time. Figure 2 indicates that the rounded value of the optimal $\lambda$ is not close to 1 and that the interval between its lower and upper limits does not contain 1. According to this plot, the data need a square root transformation ($\sqrt{Z_t}$). Afterwards, stationarity in the mean is tested with the Augmented Dickey-Fuller (ADF) test. The result gives strong evidence to reject the null hypothesis of non-stationarity, since the p-value is less than 0.05 (p = 0.01). Therefore, we conclude that the wind speed data are already stationary in the mean. To identify whether there is long-term dependency, the Hurst ($H$) statistic is calculated for the observed data.
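The two model selection criteria defined earlier in this subsection, AIC and MAPE, can be computed as in the short sketch below. The function names are hypothetical, and the AIC is written in the common form $-2 \ln L + 2k$, which is an assumption about the exact variant used in [19].

```python
import numpy as np

def aic(loglik, p, q, intercept=True):
    """AIC = -2 * log-likelihood + 2 * k, with k = p + q (+ 1 if an intercept is present)."""
    k = p + q + (1 if intercept else 0)
    return -2.0 * loglik + 2.0 * k

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

if __name__ == "__main__":
    print(aic(-50.0, p=2, q=0, intercept=True))
    print(mape([100.0, 200.0], [110.0, 180.0]))
```

Note that MAPE is undefined when an observed value is zero, so it is only suitable for series, such as wind speed after filtering calm periods, whose observations stay away from zero.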

RESULTS
The computed $H = \log(R/S) / \log(T) = 0.738$ indicates that the transformed wind speed data have long-term dependency; thus ARFIMA($p$, $d$, $q$) is the most appropriate model to fit to the observed data.
Diagnostic checking is performed on the candidate models by evaluating whether the residuals follow a normal distribution and whether they are independent. The Jarque-Bera test reported in Table 5 shows that the residuals do not violate the normality assumption, since the p-values are greater than 0.05. The Ljung-Box test of independence indicates that the residuals are not autocorrelated, since the p-value is greater than 0.05. Thus, the residuals follow a white noise process.

Table 4. Comparison of ARFIMA models using semiparametric methods.

Diagnostic checking reveals that the candidate models from the parametric and semiparametric methods all fit well, since none of the assumptions is violated. Next, we compare these five models in terms of accuracy using MAPE. Table 6 shows that the smallest MAPE belongs to ARFIMA(2, $d$, 0) with $\hat{d}_{GPH} = 0.334$. This model has a MAPE of 17.760, showing relatively good forecasting ability. In this model, $(1 - B)^{0.334}$ can be written as:
$$(1 - B)^{0.334} = 1 - 0.334B - 0.1112B^{2} - 0.0618B^{3} - \dots$$
The forecasts of wind speed at Soekarno-Hatta airport for the period from December 1, 2018 to December 14, 2018 using ARFIMA(2,0.334,0) can be seen in Table 7.
From the above equation, it can be seen that the wind speed at Soekarno-Hatta airport has long-term memory. This might be due to the tendency of wind cycles to repeat over time. The forecasts for the next 14 days at the beginning of December 2018 show very little increase in wind speed.