IMPUTATION OF MISSING DATA WITH DIFFERENT MISSINGNESS MECHANISM
DOI:
https://doi.org/10.11113/jt.v57.1523
Keywords:
Missing data, expectation maximization, mean imputation
Abstract
This paper presents a study on the estimation of missing data. Data sets with different missingness mechanisms, namely Missing Completely At Random (MCAR), Missing At Random (MAR) and Missing Not At Random (MNAR), are simulated. The expectation maximization (EM) algorithm and mean imputation (MI) are applied to these data sets, and their performance is compared using the mean absolute error (MAE) and root mean square error (RMSE). The results show that EM estimates the missing data with smaller errors than MI under all three missingness mechanisms. However, the graphical results show that EM fails to estimate the missing values in the missing quadrants when the mechanism is MNAR.
References
E. Acuna, C. Rodriguez. 2004. The Treatment of Missing Values and its Effect in the Classifier Accuracy. In Classification, Clustering and Data Mining Application. 639-648.
M. Janssen, A. R. T. Donders, F. E. Harrell, Y. Vergouwe, et al. 2009. Missing Covariate Data in Medical Research: To Impute is Better than to Ignore. Journal of Clinical Epidemiology. 63: 721-727.
R. Presti, E. Barca, G. Passarella. 2010. A Methodology for Treating Missing Data Applied to Daily Rainfall Data in the Candelaro River Basin (Italy). Environ. Monit. Assess. 160: 1-22.
M. Firat, F. Dikbas, A. C. Koc, M. Gungor. 2010. Missing Data Analysis and Homogeneity Test for Turkish Precipitation Series. Sadhana. 35(6): 707-720.
F. V. Nelwamondo, S. Mohamed, T. Marwala. 2007. Missing Data: A Comparison of Neural Network and Expectation Maximization Techniques. Current Science. 93(11): 1514-1520.
M. Nakai. 2011. Analysis of Imputation Methods for Missing Data in AR(1) Longitudinal Dataset. Int. Journal of Math. Analysis. 5(45): 2217-2227.
R. J. A. Little, D. B. Rubin. 1987. Statistical Analysis with Missing Data. United States of America: John Wiley & Sons Inc.
I. Mohamad. 2003. Data Analysis in the Presence of Missing Data. PhD Thesis, Lancaster University.
W. Y. Chin, Z. M. Khalid, M. K. Ho. 2011. Analysis of Repeated Measures via Simulation. Simposium Kebangsaan Sains Matematik ke-19 (SKSM 19), UiTM Pulau Pinang, 9-11 November 2011.
Y. L. Xia, P. Fabian, A. Stohl, M. Winterhalter. 1999. Forest Climatology: Estimation of Missing Values for Bavaria, Germany. Agricultural and Forest Meteorology. 96: 131-144.
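The comparison described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes a bivariate normal pair (x, y) with correlation 0.8 in which only y can be missing, a 30% missing rate, and simple threshold rules for MAR and MNAR. All function names (`make_mask`, `mean_impute`, `em_impute`, `run_comparison`) and parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def mae(true, est):
    """Mean absolute error between true and estimated values."""
    return float(np.mean(np.abs(true - est)))

def rmse(true, est):
    """Root mean square error between true and estimated values."""
    return float(np.sqrt(np.mean((true - est) ** 2)))

def make_mask(x, y, mechanism, rng, rate=0.3):
    """Return a boolean mask that is True where y is deleted (assumed rules)."""
    if mechanism == "MCAR":   # deletion unrelated to x or y
        return rng.random(len(y)) < rate
    if mechanism == "MAR":    # deletion depends only on the observed x
        return x > np.quantile(x, 1 - rate)
    if mechanism == "MNAR":   # deletion depends on the unobserved y itself
        return y > np.quantile(y, 1 - rate)
    raise ValueError(f"unknown mechanism: {mechanism}")

def mean_impute(y, miss):
    """Replace missing y values with the mean of the observed y values."""
    out = y.copy()
    out[miss] = y[~miss].mean()
    return out

def em_impute(x, y, miss, n_iter=500, tol=1e-10):
    """EM for a bivariate normal with x complete and y partly missing."""
    obs = ~miss
    mu_x, s_xx = x.mean(), x.var()                 # fixed: x is fully observed
    mu_y, s_yy = y[obs].mean(), y[obs].var()       # starting values
    s_xy = np.mean((x[obs] - x[obs].mean()) * (y[obs] - mu_y))
    ey = y.copy()
    for _ in range(n_iter):
        # E-step: conditional mean and variance of missing y given x
        beta = s_xy / s_xx
        cond_var = max(s_yy - beta * s_xy, 0.0)
        ey[miss] = mu_y + beta * (x[miss] - mu_x)
        ey2 = ey ** 2
        ey2[miss] += cond_var
        # M-step: update the parameters from the expected sufficient statistics
        new = (ey.mean(), np.mean(x * ey) - mu_x * ey.mean(),
               ey2.mean() - ey.mean() ** 2)
        change = max(abs(new[0] - mu_y), abs(new[1] - s_xy), abs(new[2] - s_yy))
        mu_y, s_xy, s_yy = new
        if change < tol:
            break
    ey[miss] = mu_y + (s_xy / s_xx) * (x[miss] - mu_x)
    return ey

def run_comparison(seed=0, n=2000, rho=0.8):
    """Simulate each mechanism; return MAE and RMSE for MI and EM."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    results = {}
    for mech in ("MCAR", "MAR", "MNAR"):
        miss = make_mask(x, y, mech, rng)
        y_obs = np.where(miss, np.nan, y)          # hide the true values
        mi = mean_impute(y_obs, miss)
        em = em_impute(x, y_obs, miss)
        results[mech] = {
            "MI": (mae(y[miss], mi[miss]), rmse(y[miss], mi[miss])),
            "EM": (mae(y[miss], em[miss]), rmse(y[miss], em[miss])),
        }
    return results

for mech, scores in run_comparison().items():
    print(mech, {k: tuple(round(v, 3) for v in s) for k, s in scores.items()})
```

Errors are computed only over the deleted entries, since the observed entries are left unchanged by both methods. Under the MNAR rule used here, both methods are fitted on a truncated sample, so the EM estimates can still be biased even when their errors are smaller than those of mean imputation.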
License
Copyright of articles that appear in Jurnal Teknologi belongs exclusively to Penerbit Universiti Teknologi Malaysia (Penerbit UTM Press). This copyright covers the rights to reproduce the article, including reprints, electronic reproductions, or any other reproductions of similar nature.