IMPROVING DETECTION FOR INTRUSION USING DEEP LONG SHORT-TERM MEMORY WITH HYBRID FEATURE SELECTION METHOD

Intrusion detection systems play an important supporting role in network security, and we seek to increase their efficiency through deep learning. However, intrusion detection algorithms still struggle to classify traffic and to determine the presence and type of an attack, which lowers the detection rate, increases the number of false alarms, and reduces system performance. A main cause is the large number of redundant and irrelevant features in the dataset. To solve this problem, we propose a hybrid feature selection algorithm that chooses the best and most important features. It integrates three methods to reduce the number of features, deleting static features and features that contribute little information. This step is applied as preprocessing to the CSE-CIC-IDS2018 dataset before training the LSTM deep learning model, which improves system performance by reducing processing time and increasing the detection rate and accuracy. The experimental results showed a high accuracy of 99%.


I. INTRODUCTION
The wide growth in technology and applications and the emergence of the Internet of Things have been accompanied by difficulties and security breaches [1]. This makes it necessary to monitor the network, detect attacks, and prevent intrusions; an intrusion detection system (IDS) plays an important role in overcoming security breaches [2] by analyzing traffic and identifying and preventing suspicious activities [3]. There are different methods for detecting intrusions in a network. The most common are signature-based detection, which matches traffic against the signatures of known attacks, and anomaly-based detection, which models the behavior of the network under normal, attack-free conditions and flags activities that deviate from it [4]. Anomaly-based detection is preferred for its ability to monitor traffic and detect new attacks, serving as a second level of protection in addition to firewalls and authentication methods that prevent unauthorized access to the system [5]. An intrusion is detected by extracting useful features from the traffic and classifying it as normal or attack with a model based on a machine learning algorithm [6].
Because Internet of Things data are large in scale, they call for deep learning, which handles large datasets more efficiently [7]. The LSTM model is especially suitable because it solves the vanishing gradient problem that appears in traditional RNNs and can link related communication records [8]. A large number of features affects both the accuracy and the detection time of the model, because it widens the search space of the IDS [9]. Feature selection methods are an effective way to exclude less relevant, constant, repetitive, and useless features and so reduce the dimensions of the data used to train and evaluate the intrusion detection model, which helps improve performance [10], especially with data that contain a large number of features, as in CSE-CIC-IDS2018. Feature selection models are of three main types: wrapper, filter, and embedded. The wrapper model evaluates learning models to find the ideal features; it takes a long computation time and increases the risk of overfitting [11].
In the embedded model, features are selected in each iteration during the training phase; it does not take long to compute and helps reduce overfitting [12]. In the filter model, a dedicated measure is used to form a subset of the features; it is fast to compute, and the risk of overfitting is low [13]. A filter method based on the ratio of information gain to intrinsic information is one of the best ways to improve deep learning models; in this work, dimensionality is reduced using constant features, quasi-constant features, and mutual information.
This is an open access article under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).

II. LITERATURE SURVEY
Given the importance of network traffic analysis and breach detection, several recently published studies have addressed this topic, as well as improving the detection model through data reduction and feature selection. This section lists the most important of these works.
Ismael R. et al. [3] proposed a system based on a deep neural network (DNN) combined with a feature selection method known as Binary Particle Swarm Optimization (BPSO) to improve the performance of the model. The model was tested on the CSE-CIC-IDS2018 dataset and showed an accuracy of 95%, faster processing, a good detection rate, and fewer false alarms. Farhan, R. I. et al. [14] proposed a DNN-based intrusion detection system combined with a hybrid feature selection algorithm that includes binary particle swarm optimization (BPSO) and correlation-based feature selection (CFS) to improve model performance.
It helped to select features efficiently, with 95% accuracy, little processing time, and a high detection rate. Alahmed S. I. et al. [15] proposed an intrusion detection system based on generative adversarial networks (GANs) that helps provide better protection against adversarial perturbations. A Random Forest classifier was used, and feature selection methods such as principal component analysis (PCA) and recursive feature elimination (RFE) were used to reduce data dimensions and enhance system resilience. The model was tested on the CSE-CIC-IDS2018 dataset, with an accuracy of 99.9%. Laghrissi F. et al. [16] presented the use of LSTM as a deep learning model that detects attacks, with feature selection and dimensionality reduction by PCA and Mutual Information (MI) to improve the performance of the model. They concluded that, on the KDD99 data, PCA achieved a high accuracy rate in the training and testing phases of both binary and multiclass classification.

Megantara, A. et al. [17] proposed a hybrid machine learning model that combines a supervised model for feature selection and an unsupervised model for data reduction. It identifies the important features that are strongly relevant to the data using a decision tree, which helps to remove redundant features, and applies the local outlier factor technique. The model was tested on the NSL-KDD dataset and achieved a high accuracy of 99.89% in detecting R2L attacks. Ashiku, L. et al. [18] suggested an adaptive and flexible system for intrusion detection and classification. They focused on exploiting deep neural networks (DNNs) to facilitate IDS and to discover both previously known and previously undiscovered features. The aim was to close entry points to intruders and reduce penetration, and the system was tested on the UNSW-NB15 dataset, which represents real-time data, to prove the accuracy of the model. Lin et al. [19] proposed a system to detect anomalies using long short-term memory (LSTM) and an Attention Mechanism (AM) to increase network training performance. The CSE-CIC-IDS2018 dataset was used to train the proposed model, and the reported results were an accuracy of 96.22%, a detection rate of 15%, and a recall of 96%.

Yuyang Zhou et al. [20] proposed a framework for intrusion detection that combines learning and feature selection. A proposed CFS-BA algorithm was used to reduce the dimensions. Tested on three datasets, CIC-IDS2017, NSL-KDD, and the Aegean WiFi Intrusion Dataset (AWID), it reported accuracy rates of 99.9% and 99.5%.

ISSN: 2222-758X, e-ISSN: 2789-7362

III. METHODOLOGY
This section introduces the design of a hybrid model for network intrusion detection that uses deep learning together with feature selection methods as a preprocessing stage, namely constant features, quasi-constant features, and mutual information. This preprocessing overcomes the high dimensionality of network traffic data and reduces the features in the CSE-CIC-IDS2018 dataset used to test the model. It helps reduce training time while preserving the accuracy of the single LSTM model without preprocessing, which is 99.83%.

A. Long short-term memory (LSTM)
Deep learning is more effective and accurate for detection than traditional machine learning models, especially with large and complex datasets [21]. It uses multiple hidden layers for processing, which helps increase accuracy and reduce cost by extracting features automatically instead of relying on the manual feature engineering of machine learning (ML) [22].
LSTM is one of the most important deep learning models because it solves the long-term dependency problem that appears in the RNN model with sequential data: it distinguishes the current traffic of the network from past traffic and remembers it for a long time [23]. The hidden layer in an RNN is simple, consisting of a single tanh layer, whereas an LSTM cell has four layers, as in Fig. 1: an input gate layer, an output gate layer, and a forget gate layer in addition to the main layer, and it has feedback connections [24], [25]. Table I shows the hyperparameters of the proposed LSTM model that help to avoid overfitting.
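For reference, the four layers of the LSTM cell described above are commonly written as follows. This is the standard textbook formulation, not an equation taken from this paper, and the notation is assumed here: x_t is the input at step t, h_t the hidden state, c_t the cell state, W and b the weights and biases of each layer, \sigma the logistic sigmoid, and \odot the element-wise product.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{(main tanh layer)} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The additive update of c_t is what lets information, and gradients, persist over many steps, which is the property the text attributes to LSTM when it contrasts it with the single tanh layer of a plain RNN.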
... to use a hybrid method to reduce the dimensions and choose the features that are most relevant to the dataset; it includes constant features, quasi-constant features, and mutual information.

C. Constant Feature
Removing constant features is one of the easiest filter methods. A constant feature has the same value in every record of the dataset, so it carries no information that could separate the classes. Such features are isolated and deleted because they are not useful in the training process and only add to the training time. This is an easy and efficient way to reduce the dimensions of the dataset and improve performance. Applying this method in our model to the CSE-CIC-IDS2018 dataset excluded 10 features.
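The constant-feature filter described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are held as a plain dictionary of columns, and the function and column names are hypothetical.

```python
from typing import Dict, List, Sequence


def constant_features(columns: Dict[str, Sequence]) -> List[str]:
    """Return the names of columns whose value is identical in every record."""
    return [name for name, values in columns.items() if len(set(values)) <= 1]


def drop_features(columns: Dict[str, Sequence], to_drop: List[str]) -> Dict[str, Sequence]:
    """Return a copy of the column dictionary without the listed features."""
    drop = set(to_drop)
    return {name: values for name, values in columns.items() if name not in drop}


# Toy data with hypothetical column names (not the real dataset).
toy = {
    "flow_duration": [10, 250, 31, 7],
    "bwd_psh_flags": [0, 0, 0, 0],      # same value everywhere: constant
    "fwd_pkt_len": [60, 60, 1500, 40],
}

removed = constant_features(toy)
reduced = drop_features(toy, removed)
print(removed)        # ['bwd_psh_flags']
print(list(reduced))  # ['flow_duration', 'fwd_pkt_len']
```

Because the check looks only at the feature itself and never at the class label, it can be run once before any training.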

D. Quasi-Constant Feature
Removing quasi-constant features is another of the easiest filter methods for reducing dimensions. Quasi-constant features take the same value in almost all records, so they are not useful for classification; which features count as quasi-constant depends on a threshold value. With the threshold set to 0.01 in our model applied to the CSE-CIC-IDS2018 dataset, only one feature was omitted, because 99% of its records had the same value. The threshold can be changed, since it determines how similar the records of a feature must be for it to be treated as quasi-constant and deleted. When the threshold was changed to 0.98 in our model, eight features were identified and omitted, as their values were considered nearly the same. The number of omitted features changes with the threshold, which sets the measure of similarity. The quasi-constant feature method is an easy and effective way to reduce the dimensions of a dataset and improve performance.
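The effect of the threshold can be illustrated with a small sketch. It assumes that the threshold is read as the share of records that hold the most frequent value (0.99 and 0.98 below); the paper does not state how its threshold is computed, so this reading, along with the function and column names, is an assumption.

```python
from collections import Counter
from typing import Dict, List, Sequence


def dominant_share(values: Sequence) -> float:
    """Fraction of records that hold the most frequent value of a feature."""
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values)


def quasi_constant_features(columns: Dict[str, Sequence], similarity: float) -> List[str]:
    """Names of features whose most frequent value covers at least `similarity` of the records."""
    return [name for name, values in columns.items()
            if dominant_share(values) >= similarity]


# Toy data with 100 records per feature; names are hypothetical.
toy = {
    "urg_flag_count": [0] * 99 + [1],      # 99% identical
    "cwe_flag_count": [0] * 98 + [1, 1],   # 98% identical
    "pkt_len_mean": list(range(100)),      # all values differ
}

print(quasi_constant_features(toy, similarity=0.99))  # ['urg_flag_count']
print(quasi_constant_features(toy, similarity=0.98))  # ['urg_flag_count', 'cwe_flag_count']
```

Lowering the similarity requirement widens the set of removed features, which matches the behaviour reported in the text: more features are omitted when the threshold is relaxed.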

E. Mutual Information (MI)
Mutual information measures the statistical dependence between two variables and quantifies how much information each feature in the dataset carries about the outcome. It helps identify the important features that affect the outcome, which are the features with a high MI score. MI represents how much knowing one variable reduces the uncertainty about another: the higher the MI, the lower the uncertainty. Features with a low or zero MI score are excluded, because a zero score means there is no relationship between the variables, which lowers the value of the feature. MI is similar to correlation but more general, in that it is not limited to linear dependence. The method works best with discrete classes and values. MI is related to the entropy of a random variable, which expresses the amount of information expected in that variable. It is one of the basic ideas in information theory; for two random variables Z and W it is denoted I(Z; W). Cover and Thomas defined it as [26]:

$$I(Z; W) = \sum_{z} \sum_{w} p_{ZW}(z, w) \log \frac{p_{ZW}(z, w)}{p_Z(z)\, p_W(w)} = E_{P_{ZW}}\left[\log \frac{P_{ZW}}{P_Z P_W}\right]$$

where p_Z(z) and p_W(w) are the marginals:

$$p_Z(z) = \sum_{w} p_{ZW}(z, w), \qquad p_W(w) = \sum_{z} p_{ZW}(z, w)$$

... to improve performance, the features are encoded and normalized, and static and useless features are removed using the feature selection methods (Constant Feature, Quasi-Constant Feature, and Mutual Information), which reduce the dimensions and the number of features in the dataset and thereby increase efficiency and reduce time. Fig. 2 shows the flowchart for the proposed model.
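The mutual information stage of this preprocessing can be sketched directly from the definition of I(Z; W) for discrete variables. This is a minimal illustration, not the authors' code: probabilities are estimated from counts, the natural logarithm is used, and the feature names and the "keep if the score is above zero" rule are assumptions made for the example.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def mutual_information(z: Sequence, w: Sequence) -> float:
    """Empirical I(Z; W) in nats for two discrete sequences of equal length."""
    n = len(z)
    joint = Counter(zip(z, w))   # counts for p_ZW(z, w)
    count_z = Counter(z)         # counts for the marginal p_Z(z)
    count_w = Counter(w)         # counts for the marginal p_W(w)
    mi = 0.0
    for (zi, wi), count in joint.items():
        p_zw = count / n
        mi += p_zw * math.log(p_zw / ((count_z[zi] / n) * (count_w[wi] / n)))
    return mi


def rank_by_mutual_information(columns: Dict[str, Sequence],
                               labels: Sequence) -> List[Tuple[str, float]]:
    """Score every feature against the class label, highest MI first."""
    scores = {name: mutual_information(values, labels)
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: one feature that determines the label and one that is independent of it.
labels = [0, 0, 1, 1]
toy = {
    "port_parity": [0, 1, 0, 1],   # independent of the label
    "flag_count": [0, 0, 1, 1],    # identical to the label
}

ranking = rank_by_mutual_information(toy, labels)
selected = [name for name, score in ranking if score > 0]
print(ranking)
print(selected)  # ['flag_count']
```

In the toy data the informative feature scores log 2 and the independent one scores zero, so only the first is kept. Continuous features would first need to be discretized, or scored with an estimator designed for continuous data.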

G. Real Dataset (CSE-CIC-IDS2018)
The CSE-CIC-IDS2018 dataset is one of the most important real datasets used in the field of intrusion detection and represents a shift from static datasets such as NSL-KDD to a dynamic dataset. The data were generated on the Amazon Web Services (AWS) platform by the Communications Security Establishment (CSE) and the Canadian Institute for Cybersecurity (CIC) and represent real-time network traffic [27]. It is considered one of the most reliable data sources for evaluating anomaly-based network intrusion detection models [14]. The data contain 16,000,000 instances collected over ten days and cover recent attacks in ten classes, listed according to their share of the data as shown in Table II: Benign, Bot, FTP-BruteForce, SSH-Bruteforce, DDOS attack-HOIC, DDOS attack-LOIC-UDP, DoS attacks-GoldenEye, DoS attacks-Slow HTTP Test, Intrusion, and Web attacks [28]. The original dataset contained 80 features. Some features have little effect on interpreting whether the behavior of the traffic is normal or not. Therefore, features such as the timestamp and IP addresses, which do not help in training the neural network to detect intrusions, are deleted, so we use 78 of the original features. The data fall into two types: one is normal traffic, and the other comprises the nine attack types listed above.
... for training and 35% for testing, and then evaluating the results and performance of the model using evaluation metrics.
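A hold-out split of the kind mentioned above can be sketched as follows. Only the 35% test share comes from the text; using all remaining records for training, shuffling before the split, and the function name and seed are assumptions made for this illustration.

```python
import random
from typing import List, Sequence, Tuple


def train_test_split(records: Sequence, test_fraction: float = 0.35,
                     seed: int = 0) -> Tuple[List, List]:
    """Shuffle record indices and hold out `test_fraction` of them for testing."""
    rng = random.Random(seed)            # fixed seed so the split is repeatable
    indices = list(range(len(records)))
    rng.shuffle(indices)
    n_test = round(len(records) * test_fraction)
    test = [records[i] for i in indices[:n_test]]
    train = [records[i] for i in indices[n_test:]]
    return train, test


records = list(range(20))                # stand-in for 20 traffic records
train, test = train_test_split(records)
print(len(train), len(test))             # 13 7
```

With heavily imbalanced attack classes, a stratified split that preserves the class ratios in both parts is often preferred; that refinement is not shown here.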

A. Evaluation Metrics
This paper used multiple metrics to evaluate the performance of the model, namely accuracy, loss, precision, recall, the confusion matrix, and the F1 score, which represent the model's ability to classify samples correctly. They are defined in terms of four outcomes:
1) True Positive (TP) is the correct classification of an attack as an attack.
2) True Negative (TN) is the correct classification of a normal input as a normal input.
3) False Positive (FP) is the incorrect classification of a normal input as an attack.
4) False Negative (FN) is the incorrect classification of an attack as a normal input.
Accuracy is the ratio of correctly classified samples to the total number of samples. In general, the higher the accuracy, the lower the false alarm rate (FAR). Fig. 3 shows the accuracy in the training and testing phases. The loss function measures the difference between the expected and the actual output. Fig. 4 shows the loss in the training and testing phases. Precision is the ratio of correctly predicted positive samples to all samples predicted as positive. Recall is the ratio of correctly predicted positive samples to all samples that are actually positive. The confusion matrix is a tabular representation that summarizes the performance of the classification process, illustrating the true and false positive and negative counts for normal and attack packets in network traffic and giving an idea of the errors the model makes and how to correct them; each row of the matrix represents an expected class and each column an actual class. The F1 score combines recall and precision and is an important measure for data whose classes are unevenly represented. Fig. 5 shows the confusion matrix.
The hybrid method is used as preprocessing to reduce the features and choose the most important ones: 50 features were selected out of the 80 features in the dataset, which reduces training time.
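The metrics above follow directly from the four confusion-matrix counts, as the short sketch below shows. It uses the standard definitions for a binary attack/normal setting; the function names and the toy labels are made up for the example and are not results from the paper.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count TP, TN, FP, FN with 1 = attack and 0 = normal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, F1 score, and false alarm rate from the counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
        "false_alarm_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Toy labels: 4 attacks and 4 normal records.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
print(counts)   # (3, 3, 1, 1)
print(metrics)
```

For the multiclass setting of the paper, the same counts would be taken per class (one class against the rest) and then averaged; the averaging scheme is not specified in the text.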
Our model showed an accuracy of 99.83%, which is higher than that of the previous models. The hybrid approach, which consists of three feature selection methods, helped improve the model. The learning model is LSTM, which is considered one of the best learning models.

VI. CONCLUSION
In this paper, a network intrusion detection system using deep learning is proposed, in which the LSTM model was applied to build the neural network. To improve and support the performance of the model, a preprocessing stage was used that consists of a hybrid method combining three feature selection methods (Constant Feature, Quasi-Constant Feature, and MI). It identifies the most relevant and non-redundant features that support the detection method applied to the real CSE-CIC-IDS2018 dataset. This helped to maintain a good accuracy of up to 99.83%, reduce errors, and speed up the training process by reducing the number of features from 80 to 50. In future work, we plan to use a multi-layer model to increase detection and to test the model on another dataset to support its validity and suitability.

Figure 2: Flowchart for the proposed model

$$\text{F1 Score} = \frac{2 \times TP}{2 \times TP + FP + FN}$$

True Positive (TP) is the correct classification of an attack as an attack.

Figure 3: Accuracy in the training and testing stages

Figure 4: Loss in the training and testing stages

TABLE II: Volume of data points in each attack class and its ratio

TABLE III: Comparison between our model and other methods