Data Normalization Techniques on Intrusion Detection for Dataset Applications
An Intrusion Detection System (IDS) is an important security tool for safeguarding a network from both internal and external threats. Conventional IDSs employ signature-based or anomaly-based methods, which rely on datasets for training and testing the system; KDD CUP 99 is one such widely used dataset. Algorithms from artificial neural networks (ANN), machine learning, data mining, evolutionary computing, statistical methods, computational intelligence, and related fields make use of the KDD CUP 99 dataset for testing. The dataset consists of symbolic, binary, numeric, and continuous features scattered across different ranges of values. In statistical methods such as Euclidean distance, features with larger values dominate the distance measurement; in clustering algorithms, larger values shift the cluster centers. Such disadvantages can be overcome by bringing uniformity to the dataset while retaining the exactness of the mapped features, which is achieved through a process known as normalization. Data normalization is a data preprocessing stage that maps data from different ranges onto a common scale. In this paper, a detailed analysis of the various existing data normalization techniques applicable to the KDD CUP 99 dataset is presented, along with illustrations. From the analysis, it was found that different normalization techniques suit different subsets of the KDD CUP 99 dataset. The problem under investigation is to show that the new dataset generated by applying the various normalization techniques exhibits the same characteristics as the original KDD CUP 99 dataset. The effects of the Min-max, Z-score, Log, and Sigmoid normalization techniques on a neural network algorithm were also compared in terms of detection rate and false alarms, and it was experimentally found that the Log and Sigmoid normalization techniques yield better detection rates.
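The four normalization techniques compared in the abstract can be sketched in standard textbook form. The following Python sketch uses the common definitions of Min-max, Z-score, Log, and Sigmoid normalization; the exact variants and parameters used in the paper may differ, and the sample feature values are hypothetical, merely imitating a wide-range KDD CUP 99 numeric field such as `src_bytes`.

```python
import math

def min_max(values, lo=0.0, hi=1.0):
    """Min-max: linearly rescale values into [lo, hi]."""
    vmin, vmax = min(values), max(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]

def z_score(values):
    """Z-score: shift to zero mean and scale to unit standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

def log_norm(values):
    """Log: compress large magnitudes; log1p keeps zero values finite."""
    return [math.log1p(v) for v in values]

def sigmoid_norm(values):
    """Sigmoid: squash (here, z-scored) values into the open interval (0, 1)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in z_score(values)]

# Hypothetical wide-range feature, loosely modeled on src_bytes in KDD CUP 99:
feature = [0.0, 12.0, 340.0, 5000.0, 125000.0]
print(min_max(feature))      # every value now falls in [0, 1]
print(log_norm(feature))     # large values compressed, ordering preserved
```

Note how the un-normalized `125000.0` would dominate any Euclidean distance computed over this feature, which is precisely the problem normalization addresses.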