The modelling of time-series and the evaluation of forecasts for the future: the case of the number of persons per physician in Turkey between 1928 and 2010
Accepted Date: April 03, 2016
Objectives: Health professionals are very important for improving the health status of the society and maintaining a healthy life. The aim of the present study is to model the number of persons per physician via Box-Jenkins and exponential smoothing methods and trend models, to compare these models, and to make estimations for the future. ARIMA or Box-Jenkins models are the combinations of AR and MA models administered to the series differenced at degree d.
Methods: The research material consists of data regarding the number of persons per physician between 1928 and 2010. The data were obtained from the Statistical Indicators journal published by the Turkish Statistical Institute. The 1928-2010 series was modeled with ARIMA, exponential smoothing, and moving average methods, and model performance was evaluated for forecasts extending to 2020.
Results: The goodness of fit criteria of the relevant models were compared. The ARIMA (0,1,0) model has the best values on every criterion except Mean Absolute Percentage Error (MAPE) and Mean Absolute Error (MAE), for which the Holt model has the lower error.
Conclusion: All administrative works and functions such as planning, organization, management, and rearrangement of healthcare services should be based on the data/evidence to be provided by this institution. Likewise, the problems and effects of healthcare services should be evaluated based on the data of this institution.
Keywords: Time series, Forecasting consistency, ARIMA models, Exponential smoothing methods, Estimation trend.
Abbreviations: ARIMA: Autoregressive Integrated Moving Average; RMSE: Root Mean Square Error; MAPE: Mean Absolute Percentage Error; MAE: Mean Absolute Error; MaxAPE: Maximum Absolute Percentage Error; MaxAE: Maximum Absolute Error; Norm. BIC: Normalized Bayesian Information Criterion.
Health professionals are very important for improving the health status of society and maintaining a healthy life. Therefore, the number of employees working in the field of health, their education, the place where they receive that education, and the units or departments where they provide service are of great importance. Effective and productive healthcare services require a sufficient number of health professionals, training that meets contemporary criteria, and a balanced distribution of these professionals across the country through good planning.
According to the Statistical Indicators (1923-2011) data published by the Turkish Statistical Institute (TUİK), considerable progress has been made in healthcare services and in society’s health status in Turkey from the founding of the Republic to the present. While there were 6437 beds in 86 establishments with beds in the first year of the Republic, by 2005 the number of establishments with beds had risen to 1198 and the number of beds to 192685. In other words, there were 5.1 beds per 10000 people in 1923 and 26.7 beds per 10000 people in 2005.
Data regarding the number of persons per physician, which is an important indicator of development in healthcare services, indicate that there has been a steady fall in this number in Turkey (i.e. healthcare services in Turkey have been changing for the better both quantitatively and qualitatively). The number of persons per physician across Turkey was 12,841 in 1928, 12,217 in 1930, 2799 in 1960, 1088 in 1990, 693 in 2002, 591 in 2010, and 573 in 2013 (Table 1). The distribution of physicians by province shows that Istanbul has the largest number of physicians. Of the 95,190 physicians in Turkey in 2002, 20.2% served in Istanbul, 12.5% in Ankara, and 9% in Izmir. The number of physicians in these three provinces was 39,684. That is to say, 41.7% of the physicians in Turkey served in these provinces. The provinces with the fewest physicians were Bayburt, Hakkari, Tunceli, Ardahan, Iğdır, Şırnak, and Kilis, and the physicians in each of these provinces made up a very low percentage, around 1%, of the total number of physicians across the country.
Table 1: The number of persons per physician between 1928 and 2010.
The aim of the present study is to model the number of persons per physician via Box-Jenkins and exponential smoothing methods and trend models, to compare these models, and to make estimations for the future.
Material and Methods
The research material consists of data regarding the number of persons per physician between 1928 and 2010. The data were obtained from the Statistical Indicators journal published by the Turkish Statistical Institute.
The Autoregressive Integrated Moving Average (ARIMA) method, which is used for forecasting time series, was developed by Box and Jenkins. The ARIMA modeling approach is limited by the assumption that the relationship between the variables is linear. Researchers have therefore developed alternative modeling approaches for forecasting time series where the linearity assumption does not hold.
ARIMA or Box-Jenkins models are combinations of AR and MA models applied to the series differenced d times. The essence of the Box-Jenkins method is to choose, among various candidate models, the ARIMA model that best suits the structure of the current data while containing a limited number of parameters. As a whole, these non-seasonal models are represented as ARIMA (p, d, q).
In these models,
p: Degree of the autoregressive model,
q: Order of the moving average model,
d: Degree of non-seasonal differencing.
The ARIMA (p, d, q) model can be expressed as indicated in equation (1):
$$Z_t = \phi_1 Z_{t-1} + \dots + \phi_p Z_{t-p} + a_t - \theta_1 a_{t-1} - \dots - \theta_q a_{t-q} \qquad (1)$$
Here, $\phi_p$: Parameter values for the autoregressive operator; $a_t$: Error terms; $\theta_q$: Parameter values for the moving average operator; $Z_t$: The original time series differenced $d$ times.
The first differences series is defined as given in equation (2):
$$W_t = Y_t - Y_{t-1} \qquad (2)$$
$W_t$ = The first differences series,
$Y_t$ = The observation of the original time series at time $t$.
If the first differences series is not stationary, it is differenced again and stationarity is rechecked. This is modeled as given in equation (3):
$$W_t - W_{t-1} = Y_t - 2Y_{t-1} + Y_{t-2} \qquad (3)$$
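As a rough illustration of the first and second differencing described above, the following minimal Python sketch applies the operation repeatedly. The function name `difference` and the toy series are assumptions made for illustration; they are not taken from the study or its data.

```python
def difference(series, order=1):
    """Difference a series `order` times, each pass computing W_t = Y_t - Y_{t-1}."""
    result = list(series)
    for _ in range(order):
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


# Hypothetical toy series with a quadratic trend (not data from the study).
y = [1, 4, 9, 16, 25, 36]
first = difference(y, order=1)    # still trending: [3, 5, 7, 9, 11]
second = difference(y, order=2)   # constant after a second pass: [2, 2, 2, 2]
print(first)
print(second)
```

In this toy case the first differences still show a trend, so a second pass is needed before the series looks stationary, which mirrors the check described for equation (3).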
When the degree of differencing is d=0 (that is, the original series is stationary), the ARIMA model reduces to an AR, MA, or ARMA model. Because of this feature, it can be said that ARIMA models incorporate all of the Box-Jenkins models.
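Seasonal Box-Jenkins models are represented as ARIMA (p,d,q)(P,D,Q)s. Here, P is the degree of the Seasonal Autoregressive (SAR) model; D is the number of seasonal differencing operations; Q is the order of the Seasonal Moving Average (SMA) model; and s is the period. In a combined autoregressive moving average model, the future value of a variable is assumed to be a linear function of past observations and random errors. The relationship between seasonal ARIMA (p,d,q)(P,D,Q)s models, known as SARIMA models, and ARIMA (p,d,q) models is represented in equation (4):
$$\phi_p(B)\,\Phi_P(B^s)\,(1-B)^d\,(1-B^s)^D\,Y_t = \theta_q(B)\,\Theta_Q(B^s)\,a_t \qquad (4)$$
Here, $B$ is the backshift operator ($B Y_t = Y_{t-1}$); $\phi_p(B) = 1 - \phi_1 B - \dots - \phi_p B^p$ and $\theta_q(B) = 1 - \theta_1 B - \dots - \theta_q B^q$ are the non-seasonal autoregressive and moving average operators; and $\Phi_P(B^s)$ and $\Theta_Q(B^s)$ are the corresponding seasonal operators.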
The model establishment process involves certain repetitive steps. These steps are indicated in the flow chart given in Figure 1.
In determining the model, a tentative model is selected from model classes such as AR, MA, ARMA, ARIMA, and SARIMA.
Then the parameters of the tentative model are estimated with efficient statistical techniques, and the standard errors of the coefficients are calculated to test whether or not they are significant. In the last stage, the adequacy of the model for forecasting is checked. To this end, the autocorrelation function of the errors of the tentative model is examined by plotting their autocorrelation coefficients. If this function displays a particular pattern, it is concluded that the errors are not random. Such a finding means that the tentative model is not adequate. Therefore, one returns to the second step, and the process is repeated with a new tentative model until an adequate model is found. The model that passes the adequacy check is ready to be used for forecasting.
Methods based on exponential smoothing and moving averages are also used in forecasting. The simple exponential smoothing method was derived from moving averages and is expressed as indicated in Equation (5):
$$F_{t+1} = \alpha Y_t + (1-\alpha) F_t \qquad (5)$$
Here, $F_{t+1}$ refers to the forecast value for the forthcoming period; α refers to the smoothing coefficient (it takes a value in the range of 0 to 1); $Y_t$ refers to the true value in period t, i.e. the new observation; and $F_t$ refers to the former smoothed value. The most important point to consider here is to determine α so that the mean square error is minimized. The seasonal ARIMA model or SARIMA model is an expanded form of ARIMA, which allows for seasonal factors to be reflected [10-13]. Holt’s two-parameter linear exponential smoothing method is indicated in Equation (6).
In addition, β and 1-β are the parameters of the method and take values between 0 and 1.
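The smoothing recursions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used in the study: the initial values, the grid search over α, and all function names are hypothetical choices, and the Holt recursion is written in its common textbook form (a level and a trend updated with α and β), which is assumed rather than taken from the source.

```python
def simple_exponential_smoothing(y, alpha):
    """One-step-ahead forecasts: F[t+1] = alpha * Y[t] + (1 - alpha) * F[t]."""
    forecasts = [y[0]]  # assumption: the first forecast equals the first observation
    for obs in y:
        forecasts.append(alpha * obs + (1 - alpha) * forecasts[-1])
    return forecasts  # forecasts[t] targets y[t]; the last entry targets the next period


def mean_square_error(y, alpha):
    """Mean square one-step-ahead error, skipping the initial forecast."""
    forecasts = simple_exponential_smoothing(y, alpha)
    errors = [obs - fc for obs, fc in zip(y[1:], forecasts[1:])]
    return sum(e * e for e in errors) / len(errors)


def best_alpha(y, steps=99):
    """Grid search for the alpha in (0, 1) that minimises the mean square error."""
    grid = [i / (steps + 1) for i in range(1, steps + 1)]
    return min(grid, key=lambda a: mean_square_error(y, a))


def holt_forecast(y, alpha, beta, horizon):
    """Holt's linear method in its usual textbook form (assumed here)."""
    level, trend = y[0], y[1] - y[0]  # assumption: a simple initialisation
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + m * trend for m in range(1, horizon + 1)]


# Hypothetical, perfectly linear toy series (not data from the study).
toy = [10, 8, 6, 4, 2]
print(best_alpha(toy))
print(holt_forecast(toy, alpha=0.5, beta=0.5, horizon=3))  # [0.0, -2.0, -4.0]
```

On a steadily falling series, simple smoothing lags behind the data, so the search pushes α toward 1, whereas the Holt recursion follows the trend directly. This is one way to see why a trend-aware method can suit a series such as persons per physician.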
Brown’s exponential smoothing method is expressed as follows:
Here, $Y_t$ is the value observed at time $t$; a seasonal component is defined for each time $t$; $b_t$ is the smoothed trend component at time $t$; L is the number of periods in a season; $F_{t+m}$ is the forecast m periods ahead; m is the number of forecasted periods; α is the smoothing parameter; β is the seasonal smoothing parameter; and γ is the smoothing parameter of the trend.
The goodness of fit criteria of the obtained models are evaluated by comparing them with one another. R2, also known as the coefficient of determination, is a commonly used goodness of fit criterion for the linear model. It lies in the range 0-1, and smaller values indicate that the model does not fit the data well. Stationary R2 is a criterion that compares the stationary part of the model with the basic model. It is preferred when there is a trend or seasonal pattern. RMSE is the square root of the mean square error. It indicates how far the dependent series is from the level forecasted by the model; smaller values show that the model forecasts better. MAPE refers to the mean absolute percentage error; it is independent of the units of the series and thus can be used to compare different series. MAE refers to the mean absolute error and is expressed in the series’ own units. MaxAPE is the maximum absolute percentage error. It indicates the largest error among the forecasted values, is expressed as a percentage, and is thus unit independent. It is a measure that can be used for the worst-case scenario among the forecasts. MaxAE is the maximum absolute error and is expressed in the same unit as the dependent series. Norm. BIC (Normalized Bayesian Information Criterion) is a general measure of the overall fit of the model. It is used for comparing different models of the same series, and smaller values indicate a better model [16-18].
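The error-based criteria described above can be computed directly from observed and fitted values. The following is a minimal sketch: the function name `fit_metrics` and the numbers are hypothetical, and the RMSE here divides by the number of observations, whereas statistical packages may adjust the divisor for the number of estimated parameters.

```python
import math


def fit_metrics(observed, fitted):
    """Return RMSE, MAE, MAPE, MaxAE, and MaxAPE for paired observed and fitted values."""
    errors = [o - f for o, f in zip(observed, fitted)]
    abs_errors = [abs(e) for e in errors]
    # Percentage errors are relative to the observed value (assumes no zero observations).
    pct_errors = [100 * abs(e) / abs(o) for e, o in zip(errors, observed)]
    n = len(errors)
    return {
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs_errors) / n,
        "MAPE": sum(pct_errors) / n,
        "MaxAE": max(abs_errors),
        "MaxAPE": max(pct_errors),
    }


# Hypothetical values for illustration only (not results from the study).
observed = [100, 200, 400]
fitted = [110, 190, 400]
for name, value in fit_metrics(observed, fitted).items():
    print(f"{name}: {value:.3f}")
```

In this toy case MAE and MaxAE are in the units of the series, while MAPE (5%) and MaxAPE (10%) are unit free, which is why the percentage measures are the ones suited to comparing different series.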
The MAPE values in Tables 2 and 3 show that, among the exponential smoothing and trend methods, the best forecasting model is the Holt model.
| Years | The Observed Number of Patients | Simple Exponential Smoothing | Holt Model | Brown Model | Linear Trend | Quadratic Trend | Exponential Trend | S curve |
Table 2: Exponential smoothing methods forecasts about the number of persons per physician between 2011 and 2020.
| Model | Simple Exponential Smoothing | Holt Model | Brown Model | Linear Trend | Quadratic Trend | Exponential Trend | S curve |
Table 3: The comparison of exponential smoothing and trend models.
Table 2, Table 3, and Figure 2 demonstrate that the forecast values based on the simple exponential smoothing method are appropriate for the period between 2010 and 2015, but the reliability of the forecasts falls for the period between 2016 and 2020.
Figure 3 shows that the series is non-stationary in both variance and mean. Figure 4 shows the log-transformed data, which still fail to achieve stationarity in either variance or mean. Figure 5 shows the first differences of the log-transformed data, which are stationary, though only partly, in both mean and variance.
In this instance, the autocorrelation (acf) graph (Figure 6) is used to identify the order of the MA model, while the partial autocorrelation (pacf) graph (Figure 7) is used to identify the order of the AR model.
Since the acf and pacf graphs show no spikes outside the confidence limits, the orders of the AR and MA models may be taken as 0.
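The identification reasoning above can be sketched as follows: transform the series with logarithms and first differences, compute the sample autocorrelations, and flag the lags that fall outside approximate 95% limits. This is an illustrative sketch only. The limits ±1.96/√n are a common large-sample approximation assumed here, the function names and toy series are hypothetical, and the partial autocorrelations would be checked in the same way.

```python
import math


def log_diff(series):
    """First differences of the natural logarithm of a positive series."""
    logs = [math.log(v) for v in series]
    return [b - a for a, b in zip(logs, logs[1:])]


def acf(x, max_lag):
    """Sample autocorrelation coefficients for lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)  # assumes the series is not constant
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def significant_lags(x, max_lag):
    """Lags whose autocorrelation lies outside the approximate 95% limits."""
    limit = 1.96 / math.sqrt(len(x))
    return [k for k, r in enumerate(acf(x, max_lag), start=1) if abs(r) > limit]


# Hypothetical alternating series with strong autocorrelation (not study data).
toy = [1, -1, 1, -1, 1, -1, 1, -1]
print(acf(toy, 3))               # [-0.875, 0.75, -0.625]
print(significant_lags(toy, 3))  # [1, 2]
```

When `significant_lags` returns an empty list for both the autocorrelations and the partial autocorrelations of the transformed series, no AR or MA terms are suggested, which corresponds to choosing an ARIMA (0,1,0) model.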
Table 4, Table 5, and Figure 8 indicate that the forecast values of the Box-Jenkins model are appropriate for the years 2010 to 2016, but the confidence interval becomes wider for the years 2017 to 2020, which reduces the reliability of the forecasts.
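The widening of the confidence interval follows from the structure of an ARIMA (0,1,0) model, in which the forecast error variance grows in proportion to the forecast horizon. The sketch below illustrates this under stated assumptions: the model is fitted to the logarithm of the series, it includes a constant (drift) term, and the uncertainty in the estimated drift is ignored. The function name and the input values are hypothetical and are not the study's forecasts.

```python
import math


def random_walk_forecast(series, horizon, z=1.96):
    """Forecasts and approximate 95% limits from a random walk with drift on the log scale."""
    logs = [math.log(v) for v in series]
    diffs = [b - a for a, b in zip(logs, logs[1:])]
    drift = sum(diffs) / len(diffs)
    # Sample standard deviation of the differences around the drift.
    sigma = math.sqrt(sum((d - drift) ** 2 for d in diffs) / (len(diffs) - 1))
    rows = []
    for h in range(1, horizon + 1):
        centre = logs[-1] + h * drift
        half_width = z * sigma * math.sqrt(h)  # standard error grows with sqrt(h)
        rows.append((h,
                     math.exp(centre),
                     math.exp(centre - half_width),
                     math.exp(centre + half_width)))
    return rows


# Hypothetical recent values of a falling series (not the study's data).
toy = [700, 680, 650, 640, 610, 591]
for h, point, lower, upper in random_walk_forecast(toy, horizon=5):
    print(f"h={h}: forecast={point:.1f}, lower={lower:.1f}, upper={upper:.1f}")
```

Because the half-width on the log scale is proportional to the square root of the horizon, the ratio of the upper to the lower limit increases with every additional year, which is consistent with the lower reliability noted for the later forecast years.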
Table 4: ARIMA (0,1,0) Model fit statistics.
| Years | The number of observed patients | The number of persons per physician | Lower limit | Upper limit |
Table 5: Forecast values regarding the number of persons per physician by year for the Box-Jenkins model.
Table 6 summarizes the goodness of fit criteria of the relevant models. The ARIMA (0,1,0) model has the best values except for mean absolute percentage error (MAPE) and mean absolute error (MAE). Figure 9 shows the observed number of persons per physician for three years together with the forecasts for those years from the Holt model of the exponential smoothing method and the ARIMA (0,1,0) model of the Box-Jenkins method.
Table 6: The Goodness of fit criteria of the exponential smoothing and ARIMA (0,1,0) models.
Figure 9 indicates that the forecasts of the ARIMA (0,1,0) model of the Box-Jenkins method are closer to the observed number of persons per physician than the forecasts of the Holt model of the exponential smoothing method. However, since the difference is not large, it can be said that the forecasts of both methods are good.
This study aims to forecast the number of persons per physician in the future through predictive analysis by producing different models and determining the best ones. Among the exponential smoothing models, the best predictive one was seen to be the Holt model. Among the Box-Jenkins models, the best predictive one was seen to be the ARIMA (0,1,0) model.
These best predictive models were compared with one another on two grounds: the closeness of their forecast values to the observed number of persons per physician in the first 3 years (2011, 2012, and 2013), and the goodness of fit criteria. The Holt exponential smoothing model was found to be the best predictive model in terms of goodness of fit criteria in that it had a lower mean error than the other model. On the other hand, the ARIMA (0,1,0) model was seen to be the best predictive model in terms of the closeness of the forecasts for the first 3 years to reality. The fact that the forecasts are quite close to the observed values shows that these techniques can be used for forecasting the patient volume of hospitals and the sufficiency of staff. In this regard, more accurate and reliable policies may be developed for the health care industry through forecasts for the future.
According to the activity report of the Ministry of Health for the year 2012, 124,219 physicians worked in Turkey at the end of 2012. Of these physicians, 68,262 were specialist physicians; 35,739 were practicing physicians; and 20,218 were physician assistants. The Health Transformation Programme has facilitated patients’ access to doctors. The number of physician consultations per person was 3.2 in 2002 but had risen to 8.3 by 2012. In 2012, the number of physicians per one hundred thousand people was 165; that of practicing physicians was 48; that of specialist physicians was 90; that of dentists was 27; that of pharmacists was 34; and that of midwives and nurses was 232. Medical faculties have 45,732 registered students and 10,440 faculty members. The number of students per faculty member is 4.3. While the number of physicians per 1000 people is 3.3 on average in Europe, this figure is around 1.6 in Turkey, which implies a considerable physician shortage. On the other hand, the quotas of medical faculties in Turkey have been increasing rapidly. The total quota was 6492 in 2008 but went up to 8453 in 2012. Proper steps should be taken to plan the quantity and quality of physicians in Turkey correctly.
Another important recommendation is that an independent institution should be responsible for the country-wide organization of the health information system. All public and private institutions, organizations, and people operating or working in the field of health should ensure data flow to this institution. Officials should be appointed and units should be set up to ensure such data flow in institutions and organizations above a specified size. Legislative regulations should be introduced to accelerate bureaucratic processes in this matter. All administrative works and functions such as planning, organization, management, and rearrangement of healthcare services should be based on the data/evidence to be provided by this institution. Likewise, the problems and effects of healthcare services should be evaluated based on the data of this institution.
- Box GEP, Jenkins GM. Time Series Analysis, Forecasting and Control. San Francisco: Holden-Day; 1976.
- Işığıçok E. Causality Tests in Search of Relationships between Variables and an Application Testing. Doctoral Thesis. Uludağ University Institute of Social Sciences; 1993.
- Box GEP, Jenkins GM, Reinsel GC. Time series analysis: Forecasting and control, 3rd ed. Englewood Cliffs, NJ: Prentice Hall; 1994.
- Brockwell PJ, Davis RA. Time Series: Theory and Methods, 2nd ed.: Springer-Verlag; 1991.
- Makridakis SG, Wheelwright SC, Hyndman RJ. Forecasting: Methods and applications, 3rd ed. New York: John Wiley and Sons; 1997.
- Yaman K, Sarucan A, Atak M, Aktürk N. Preparation of Data for Dynamic Scheduling Using Image Processing and ARIMA Models. Gazi University J Faculty Eng Arch 2001; 16: 19-40.
- Box GEP, Jenkins GM, Reinsel GC, Liu LM. Time Series Analysis, 4th ed. Pearson Education; 2009.
- Kam HJ, Sung JO, Park RW. Prediction of Daily Patient Numbers for a Regional Emergency Medical Center using Time Series Analysis. Healthc Inform Res 2010; 16: 158-165.
- Moosazadeh M, Nasehi M, Bahrampour A, Khanjani N, Sharafi S, Ahmadi S. Forecasting Tuberculosis Incidence In Iran Using Box-Jenkins Models. Iran Red Crescent Med J 2014; 16: e11779.
- Soni K, Kapoor S, Parmar KS, Kaskaoutis DG. Statistical analysis of aerosols over the Gangetic–Himalayan region using ARIMA model based on long-term MODIS observations. Atmos Res 2014; 149: 174-219.
- Soni K, Parmar KS, Kapoor S. Time series model prediction and trend variability of aerosol optical depth over coal mines in India. Environ Sci Pollut Res 2015; 22: 3652-3671.
- Commandeur JJF, Koopman SJ. Introduction to State Space Time Series Analysis. Oxford University Press; 2007.
- Gardner ES. Exponential smoothing: The state of the art. J Forecasting 1985; 4: 1-28.
- Irmak S, Köksal CD, Asilkan Ö. Predicting Future Patient Volumes of the Hospitals by Using Data Mining Methods. Int J Alanya Faculty Business 2012; 4: 101-114.
- SPSS. Clementine 11.1 User’s Guide. Chicago, IL: Integral Solutions Limited; 2007.
- SPSS. Clementine 11.1 Node Reference. Chicago, IL: Integral Solutions Limited; 2007.
- Helfenstein U. Box-Jenkins modelling in medical research. Statistic Meth Med Res 1996; 5: 3.