Abstract—This paper presents the analysis of the effect of
clustering the training and test data on the classification
efficiency of a Naive Bayes classifier. The KDD Cup 99 benchmark
dataset is used in this research. The training set is clustered
into 5 clusters using the k-means clustering algorithm. Then 8800
samples are taken from the clusters to form the training and
test sets. The results are compared with those of two Naive Bayes
classifiers trained on randomly sampled data containing 8800
and 17600 instances, respectively. The main contribution of this
paper is empirical evidence that a training set
derived from clusters generated by the k-means clustering
algorithm improves the classification efficiency of the Naive
Bayes classifier. The results show that the accuracy of the Naive
Bayes classifier trained on clustered instances is 94.4%, while
the accuracies of the two classifiers trained on randomly sampled
instances are 85.41% and 89.26%.
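
The sampling step described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes that cluster labels have already been produced by k-means, and that samples are drawn from each cluster in proportion to its size (the abstract does not state the allocation rule). The function name `sample_from_clusters` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def sample_from_clusters(cluster_labels, n_total, seed=0):
    """Return sorted indices of n_total instances drawn from the clusters.

    Assumption (not stated in the abstract): each cluster contributes
    samples in proportion to its size, using largest-remainder rounding.
    """
    n = len(cluster_labels)
    if not 0 <= n_total <= n:
        raise ValueError("n_total must be between 0 and the number of instances")

    # Group instance indices by their (precomputed) cluster label.
    members = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        members[label].append(idx)

    # Integer arithmetic: base share per cluster plus the remainder for ranking.
    alloc = {c: (n_total * len(m)) // n for c, m in members.items()}
    remainder = {c: (n_total * len(m)) % n for c, m in members.items()}

    # Hand out the leftover slots to the clusters with the largest remainders.
    leftover = n_total - sum(alloc.values())
    for c in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        alloc[c] += 1

    # Draw without replacement inside each cluster.
    rng = random.Random(seed)
    chosen = []
    for c, m in members.items():
        chosen.extend(rng.sample(m, alloc[c]))
    return sorted(chosen)
```

With five clusters and `n_total=8800`, the returned indices would select the instances used to train the classifier; the randomly sampled baselines correspond to drawing the same number of indices without regard to cluster membership.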
Index Terms—Network security, machine learning, classifier
evaluation, anomaly intrusion detection.
Uma Subramanian and Hang See Ong are with the Department of
Electronics and Communication Engineering, College of Engineering,
Universiti Tenaga Nasional, Jalan IKRAM-UNITEN, Kajang, Selangor,
43000, Malaysia (Corresponding author: Uma Subramanian, e-mail:
umas746@gmail.com).
Cite: Uma Subramanian and Hang See Ong, "Analysis of the Effect of Clustering the Training Data in Naive Bayes Classifier for Anomaly Network Intrusion Detection," Journal of Advances in Computer Networks, vol. 2, no. 1, pp. 85-88, 2014.