Abstract—In the medical field, missing data are common and may bias research results. This study therefore proposes a new imputation method: a nearest-neighbor method that uses a distance threshold to impute missing values. The study makes two contributions: (1) it uses a distance threshold to select the optimal nearest neighborhood for estimating missing values, and (2) it compares the proposed method with other imputation methods on medical data with missing values. The proposed method was verified on the stroke dataset from the International Stroke Trial (IST). The results show that it outperforms the other imputation methods, which suggests that it can be used effectively on practical medical datasets.
Index Terms—Missing value, imputation, nearest neighborhood, stroke disease.
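The idea of a distance-threshold neighborhood can be illustrated with a minimal sketch. The abstract does not specify the distance metric, how the threshold is chosen, or how neighbor values are combined, so the following are assumptions made for illustration only: Euclidean distance over the observed features, a user-supplied threshold, the mean of the neighbors' values as the estimate, and a fallback to the single nearest complete record when no record lies within the threshold. The function name `threshold_knn_impute` is hypothetical and is not taken from the paper.

```python
import numpy as np


def threshold_knn_impute(X, threshold):
    """Fill NaNs using complete rows that lie within `threshold` of each incomplete row.

    Illustrative sketch only: the metric, aggregation, and fallback rule
    are assumptions, not the authors' published procedure.
    """
    X = np.asarray(X, dtype=float)
    # Donor pool: rows with no missing values.
    complete = X[~np.isnan(X).any(axis=1)]
    if complete.shape[0] == 0:
        raise ValueError("at least one complete row is required")

    out = X.copy()
    for i, row in enumerate(X):
        missing = np.isnan(row)
        if not missing.any():
            continue
        observed = ~missing
        if not observed.any():
            # Nothing to measure distance on: fall back to column means.
            out[i, missing] = complete[:, missing].mean(axis=0)
            continue

        # Euclidean distance computed on the observed features only (assumption).
        diffs = complete[:, observed] - row[observed]
        dists = np.sqrt((diffs ** 2).sum(axis=1))

        # The neighborhood is every complete row within the threshold,
        # so its size adapts to the local density of the data.
        neighbors = dists <= threshold
        if not neighbors.any():
            # Assumed fallback: use the nearest complete row(s).
            neighbors = dists == dists.min()

        # Estimate each missing feature as the mean over the neighborhood.
        out[i, missing] = complete[neighbors][:, missing].mean(axis=0)
    return out


if __name__ == "__main__":
    data = [[1.0, 2.0], [1.1, 2.2], [5.0, 10.0], [1.05, np.nan]]
    print(threshold_knn_impute(data, threshold=0.5))
```

Unlike a fixed-k approach, the number of neighbors here varies per record, which is one plausible reading of "adjusting the optimal nearest neighborhood"; in practice the threshold would need to be tuned, for example by masking known values and measuring imputation error.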
Ching-Hsue Cheng and Hao-Hsuan Huang are with the Department of Information Management, National Yunlin University of Science & Technology, 123, Section 3, University Road, Touliu, Yunlin, 640, Taiwan (e-mail: chcheng@yuntech.edu.tw, liang50641@gmail.com).
Cite: Ching-Hsue Cheng and Hao-Hsuan Huang, "A Distance-Threshold kNN Method for Imputing Medical Data Missing Values," Journal of Advances in Computer Networks, vol. 7, no. 1, pp. 13-17, 2019.