EXPLAINABLE AI FOR NETWORK INTRUSION DETECTION: UNVEILING GRANULAR ATTACK PATTERNS AND INSIGHTS

24/3/2025

Abstract-Network intrusion attacks pose a significant threat to computer networks and cybersecurity. Detecting and understanding these attacks is crucial for maintaining secure systems. In this study, we use explainable AI (XAI) to enhance the interpretability of deep learning models for network intrusion detection at the neuron level, obtaining more granular insights. Utilizing the CIC-IDS 2017 dataset, we train a deep neural network (DNN) to detect malicious network activities. We then analyze the activations of critical neurons to achieve a finer separation of attack patterns, leading to a more nuanced intrusion detection system. This approach then correlates attack features with layer-specific neurons, identifying the most and least relevant features for different attack types. This method differs from current works by investigating different underlying mechanisms of model prediction of a single class at the neuron level and then attributing features to these sub-clusters, as opposed to macro or global explanation methods which fail to capture the intricacies of complex predictions.

Index Terms—Explainable AI, Network Intrusion Detection, Deep Neural Networks

I. INTRODUCTION

Network intrusion attacks are a significant threat to the security of computer systems, with potentially severe consequences if not deterred [9]. Over the years cyber threats have become increasingly sophisticated, increasing the need for robust intrusion detection systems (IDS). Machine learning (ML) based IDS methods have been shown to outperform traditional IDS methods [1]; however, the black-box nature of deep neural networks means these systems lack the interpretability required to fully understand and mitigate complex attacks. This lack of explainability and transparency poses challenges to obtaining trust and large-scale adoption.

To address these issues, the area of explainable AI (XAI) has become ever more important, offering methods for interpretability and insight into the internal mechanisms of complex models [6]. Specifically in the area of network intrusion detection, XAI has the ability to reveal details of how attacks are detected, improving reliability and trustworthiness. While the use of XAI for IDS holds immense promise, current works perform interpretation solely at the macro level, either on the model layers or on the input [14], [12], [4], leaving much to be desired with regard to understanding model predictions.

In this paper, we propose a novel approach that uses XAI to improve the interpretability of deep neural networks (DNNs) used for network intrusion detection. We use the CIC-IDS 2017 dataset [18], which contains a variety of attack types, and train an artificially coarse DNN to distinguish between benign and malicious network activities, utilizing the true class label to verify the obtained granular insights. Given the binary task of this model, we investigate critical neuron activations, uncovering detailed patterns specific to different types of attacks.

Deep learning model layers typically have many neurons; however, it has been shown that only a few of them participate in the decision-making process [7]. Investigating only the critical neurons reduces complexity and streamlines mechanism discovery. Our methodology identifies critical neurons using Model-X Knockoffs [5] and analyzes their activations through t-SNE [19] dimensionality reduction. This reveals clusters that correspond to specific attack types, demonstrating how more granular insights can be derived from a coarser model architecture. Utilizing DBSCAN [15], we automate the clustering task, where each cluster corresponds to an attack type. After finding the most relevant critical neurons for each cluster via the random forest feature importance technique [3], we use the activation maximization technique to determine the most influential features for those neurons and correlate these features with specific attack types, enhancing the interpretability of this intrusion detection system. We obtain insights into the critical neurons for each layer of the model.

As described in Figure 1, this paper contains the following components:

  1. Identify Critical Neurons: We utilize Model-X Knockoffs, a statistical tool, to find the critical neurons within each layer of the DNN. These neurons play an important role in the model’s predictive process.

  2. Uncover Underlying Mechanisms: By analyzing the activations of the identified critical neurons for various test samples, we use t-SNE for dimensionality reduction and DBSCAN for clustering in order to reveal the underlying mechanisms and patterns within the model’s predictions for a single class.

  3. Neuron Relevancy Analysis: We employ random forest feature importance to identify the key critical neurons for each cluster corresponding to each true-label attack type.

  4. Attribute Features per Cluster: Activation maximization is used to determine the most and least relevant features for the chosen critical neurons leading to insights on which features from the CIC-IDS 2017 dataset are most and least relevant for each attack type.

  5. This method not only advances the current state of XAI in network intrusion detection, but additionally provides a novel framework for obtaining increased granularity and neuron level understanding for deep learning-based network intrusion detection.


Figure 1: End-to-end framework of the XAI process on deep learning models for network intrusion detection

II. RELATED WORK

Intrusion detection systems monitor network traffic for malicious activities and play a key role in cybersecurity. IDS can operate in various ways, including those based on signatures, which recognize predefined patterns [17], and those that learn a baseline of normal activity and detect anomalies. Recently, with the rise of deep learning, machine-learning-based methods have been constructed that utilize previous data to predict potential intrusions [13], [10]. Current works use XAI solely to identify key features for model predictions. Methods such as SHAP (SHapley Additive exPlanations) [11] and LIME (Local Interpretable Model-agnostic Explanations) [16] are used to determine which features are key to model predictions [14], [12], [4]. Other XAI techniques for network IDS are based on rule extraction: these techniques create human-readable ’rules’ from the more complex AI model computations [2], simplifying the understanding of the model’s activity.

Identifying and analyzing critical neurons is an important step towards understanding model decision-making processes. Critical neurons are those that play a major role in the model’s decision process, and their activations across data samples provide a glimpse into the underlying mechanisms of the model. Yosinski et al. [21] investigate live neuron activations from video to gather insights into how convolutional networks work, while Vu et al. [20] use Model-X Knockoffs to investigate DNNs’ underlying mechanisms from an information-theoretic viewpoint.

III. METHODOLOGY

A. Data and Model Training

The CIC-IDS 2017 dataset [18] is a comprehensive and commonly used dataset for IDS analysis, containing over 2 million total samples. It is designed to mimic real network traffic and contains up-to-date attacks, including DDoS, Brute Force FTP, Brute Force SSH, DoS, and Web Attack. Captured over the course of 5 days, the dataset contains traffic based on the activities of 25 users utilizing protocols such as HTTP, HTTPS, FTP, SSH, and email. Samples with missing features were removed, and a total of 78 network-traffic features obtained from CICFlowMeter [8] were normalized and used. To enhance the clarity of the analysis, in addition to the original outputs we categorized the 78 features into the following subgroups: Packet Properties, Flow Properties, Flag Counts, Header/Payload Sizes, Bulk Transfer Properties, TCP Window Size, and Activation Periods. For model training, 80% of the dataset was used, while the remaining 20% was used for the analysis.
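The preprocessing described above (dropping samples with missing features, normalizing the 78 features, and an 80/20 train/analysis split) might be sketched as follows. The `Label` column name and min-max scaling are assumptions for illustration; the paper only states that features were normalized.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

def preprocess(df: pd.DataFrame, label_col: str = "Label"):
    """Drop samples with missing features, normalize the flow features,
    and make an 80/20 train/analysis split (Sec. III-A). The label column
    name is a placeholder; real CICFlowMeter headers differ."""
    df = df.replace([np.inf, -np.inf], np.nan).dropna()
    y = (df[label_col] != "BENIGN").astype(int).to_numpy()  # binary target
    X = df.drop(columns=[label_col]).to_numpy(dtype=np.float32)
    X = MinMaxScaler().fit_transform(X)                     # scale to [0, 1]
    return train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)
```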

The model being analyzed is a deep neural network with 4 hidden layers of 256, 128, 64, and 32 neurons respectively. It has ReLU activations for the hidden layers and a sigmoid output activation. Its purpose is binary classification with the intent to detect malicious network activity, encompassing all attacks from the CIC-IDS 2017 dataset. Although the DNN was trained on the entire dataset, this paper specifically investigates Portscan, DDoS, FTP-Patator, and SSH-Patator attacks due to their distinguishable nature and sizable sample counts.
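A minimal PyTorch sketch of the described architecture (256/128/64/32 ReLU hidden layers, sigmoid output) that also exposes per-layer activations for the later critical-neuron analysis. The training procedure (loss, optimizer) is not specified here and is omitted; the returned-activations interface is our own convenience, not part of the paper.

```python
import torch
import torch.nn as nn

class IntrusionDNN(nn.Module):
    """4-hidden-layer DNN from Sec. III-A: 256/128/64/32 ReLU units and a
    single sigmoid output for benign-vs-malicious classification. Hidden
    activations are returned alongside the output so the analysis can read
    the per-layer activations z^(l)."""
    def __init__(self, n_features: int = 78):
        super().__init__()
        sizes = [n_features, 256, 128, 64, 32]
        self.hidden = nn.ModuleList(
            nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU())
            for d_in, d_out in zip(sizes, sizes[1:])
        )
        self.out = nn.Sequential(nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, x):
        activations = []              # z^(1) ... z^(4)
        for layer in self.hidden:
            x = layer(x)
            activations.append(x)
        return self.out(x).squeeze(-1), activations
```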

B. Critical Neurons

The first step of this framework is to identify the critical neurons of each layer of the DNN. This is important, as the majority of the model’s decision-making process is determined by the computations of these neurons. To identify the critical neurons in the DNN we follow the NeuCEPT-discover method [20].


Figure 2: (Top) The DNN model architecture in terms of layers, neurons per layer, and activations. (Bottom) The number of samples per attack type in the CIC-IDS dataset.

The neural network f(·) has L layers. For each layer l, l ∈ {1, 2, . . . , L}, let n_l denote the number of neurons in the l-th layer and N_l the set of neurons in the l-th layer. The model’s task is binary classification, where its output y = f(x) is rounded such that y ∈ {0, 1}. The activations of layer l are denoted z^(l), where z^(l) = f_l(z^(l-1)). The random-variable representation of the activations at layer l is Z^(l), and the activations of a subset S of neurons are denoted Z_S^(l). As computation propagates through the network, the activations at each layer l are determined by the activations of the previous layer l - 1; thus we have Z^(l) = f_l(Z^(l-1)). The propagation of activations through this DNN can therefore be modeled as a Markov chain, where each state represents the activations at a specific layer, obtained from computations on the previous layer’s activations:

Z^(l) → Z^(l+1),   l ∈ {1, 2, . . . , L - 1}

Mutual information (MI) can be used to quantify the amount of information a set of activations Z_S^(l) provides about the activations of the following layer Z_S'^(l+1). The critical neurons are those that have the highest mutual information with the output:

S_l* = arg max_{S ⊆ N_l} I(Z_S^(l); Z^(l+1))   (1)

In order to find the critical neurons per layer, we can utilize a series of MI operations:

S_l* = arg max_{S ⊆ N_l} I(Z_S^(l); Z_{S_{l+1}*}^(l+1)),   l ∈ {1, 2, . . . , L}

As this computational process is NP-hard, it is simplified by using layer-activation and model-output pairs (Z_S^(l), y) rather than (Z_S^(l), Z_S'^(l+1)) when computing each layer’s critical neurons. For this, we use the concept of a Markov blanket. The Markov blanket of a node is the set of nodes that can be used to explain all impacts on that node, isolating it from any effects from the rest of the network. Given a response variable T, the set M_l(T) ⊆ N_l is the Markov blanket of T, i.e., the set of all neurons in layer l that explain T.
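The simplified per-layer criterion, ranking a layer's neurons by mutual information with the model output y, can be sketched with scikit-learn's off-the-shelf MI estimator. Note this plain ranking is only an illustrative approximation: the actual selection uses Model-X Knockoffs with FDR control.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def rank_neurons_by_mi(Z_l: np.ndarray, y: np.ndarray, k: int = 10):
    """Rank the neurons of one layer by estimated mutual information
    I(Z_j^(l); y) with the rounded model output and return the top-k
    indices. A simplified stand-in for the knockoff-based selection."""
    mi = mutual_info_classif(Z_l, y, random_state=0)
    return np.argsort(mi)[::-1][:k], mi
```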

We use Model-X Knockoffs to perform variable selection while controlling the false discovery rate. Given a large set of variables, the Model-X Knockoffs technique finds the subset of those variables most responsible for the response. For a matrix X and response variable Y, a knockoff matrix X̃ is constructed to mimic the correlation structure of X while being conditionally independent of Y. The construction of knockoffs follows two key properties: swapping any subset of variables in X with the corresponding variables in X̃ should not change the joint distribution of (X, X̃), and the knockoff variables X̃ should be conditionally independent of Y given X. The importance of each variable is then determined by a statistic W_j = f(X_j, X̃_j; Y) = u_j - ũ_j, where u_j and ũ_j are the importance scores of the original and knockoff variables; large positive W_j identifies the significant variables.
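A hedged sketch of the knockoff filter's selection step: given an already-constructed knockoff matrix X̃, it forms W_j from coefficient magnitudes and applies the knockoff+ threshold. Constructing valid Model-X knockoffs is nontrivial and omitted here; the logistic-regression importance measure and FDR level are illustrative choices, not the paper's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def knockoff_select(X, X_tilde, y, fdr=0.1):
    """Fit a model on [X, X_tilde], form W_j = u_j - u~_j from coefficient
    magnitudes, and apply the knockoff+ threshold to control the FDR at
    level `fdr`. X_tilde must be a valid knockoff matrix; building one
    (e.g. Gaussian knockoffs) is out of scope for this sketch."""
    p = X.shape[1]
    clf = LogisticRegression(max_iter=1000).fit(np.hstack([X, X_tilde]), y)
    u = np.abs(clf.coef_[0])
    W = u[:p] - u[p:]                  # real importance minus knockoff importance
    # knockoff+ threshold: smallest t with (1 + #{W <= -t}) / #{W >= t} <= fdr
    for t in np.sort(np.abs(W[W != 0])):
        if (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t)) <= fdr:
            return np.flatnonzero(W >= t), W
    return np.array([], dtype=int), W
```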

C. Activation Analysis

After identifying the critical neurons per layer, we use t-SNE for dimensionality reduction. For each layer we choose the 10 highest-impact neurons and use t-SNE to reduce the 10-dimensional activations Z_S^(l) to 2 dimensions for each sample p_i of the test set. t-SNE works by minimizing the Kullback-Leibler divergence between the joint probabilities of the high-dimensional and low-dimensional data:

KL(P ∥ Q) = Σ_{i ≠ j} p_ij log(p_ij / q_ij)

p_ij and q_ij represent the joint probabilities of the high-dimensional and low-dimensional data points being neighbors. The resulting 2D visualization is color-coded based on the true-label attack type, revealing distinct clusters. This demonstrates the underlying mechanisms of the model’s processing of malicious activity, despite the model being trained for binary classification. This is extended to each layer l, demonstrating the progression of the model’s internal representation.

Figure 3: The t-SNE-reduced CN activations for each layer of the DNN. This includes the benign samples in addition to DoS Hulk, FTP-Patator, Portscan, and SSH-Patator.

Figure 4: The t-SNE-reduced CN activations for each layer of the DNN, not including benign samples. This includes samples from the DoS Hulk, FTP-Patator, Portscan, and SSH-Patator classes.

We automate the cluster identification process by applying the DBSCAN algorithm to the 2D t-SNE representations of the activations Z_S^(l) for layer l, resulting in clusters C_1, C_2, . . . , C_k. DBSCAN classifies points into clusters by computing the ϵ-neighborhood N(p_i) = {p_j : ‖p_i - p_j‖ ≤ ϵ} for each point p_i. It then identifies the core points, those with at least the minimum number of neighbors required to form a dense region. By applying DBSCAN to the t-SNE-reduced activations of each layer, we can automatically identify clusters that correspond to different attack types.
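The t-SNE reduction and DBSCAN clustering steps can be sketched as follows. The `eps` and `min_samples` values are illustrative assumptions, not the paper's settings; they would need tuning per layer.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import DBSCAN

def embed_and_cluster(Z_S: np.ndarray, eps: float = 3.0, min_samples: int = 20):
    """Reduce the top-10 critical-neuron activations Z_S^(l) to 2-D with
    t-SNE, then cluster the embedding with DBSCAN (Sec. III-C).
    Hyperparameters here are placeholders."""
    emb = TSNE(n_components=2, random_state=0).fit_transform(Z_S)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(emb)
    return emb, labels        # label -1 marks DBSCAN noise points
```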

In order to attribute features to each attack type we need to analyze the importance of neurons to each DBSCAN obtained cluster. For each cluster Ci, we construct a binary classification problem:

  • Positive samples: (X_Ci^+, y_Ci^+), where X_Ci^+ = {z_j : z_j ∈ C_i} and y_Ci^+ = 1.

  • Negative samples: (X_Ci^-, y_Ci^-), where X_Ci^- = {z_j : z_j ∉ C_i} and y_Ci^- = 0.

We combine the positive and negative samples into a single dataset and train a random forest classifier RFCi on this dataset to predict whether a sample belongs to cluster Ci. The importance of each neuron (feature) in distinguishing samples in Ci from those not in Ci is given by the feature importance ICi of the random forest classifier:

ICi = FeatureImportances(RFCi).

The feature importance ICi quantifies the relevance of each neuron in the cluster Ci, providing insights into the critical neurons for different attack types identified by the clustering algorithm.
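The per-cluster relevance computation described above can be sketched with scikit-learn. The one-vs-rest construction follows the positive/negative sample definition; `n_estimators` is an assumed hyperparameter.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def neuron_importance_for_cluster(Z_S, cluster_labels, cluster_id):
    """One-vs-rest random forest for DBSCAN cluster C_i: positives are
    activation vectors z_j in C_i, negatives are all other samples. The
    forest's feature_importances_ give the relevance I_Ci of each
    critical neuron for that cluster."""
    y = (cluster_labels == cluster_id).astype(int)
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(Z_S, y)
    return rf.feature_importances_
```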

After determining the most important neurons per cluster, we use activation maximization to determine which features and patterns these neurons are most reactive towards. For each layer l, the activation maximization algorithm determines which input features maximize the activations of the critical neurons Nl. Formally, for a neuron nj in layer l, we aim to find the input x that maximizes the activation zj:

x* = arg max_x z_j(x)

Given the maximized activations, the impact of each feature x_k (where k indexes the features) is determined by examining how changes in x_k influence Z^(l). This is computed for all critical neurons in N_l. Because analyzing 78 features per attack is complex, we additionally group features with similar characteristics and report the average per group. This allows us to identify which features have the most and least impact on the activations of critical neurons per layer, thereby providing insight into the network’s decision-making process for different types of attacks.
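A minimal gradient-ascent sketch of activation maximization. The paper does not specify the optimizer, step count, or input constraints, so Adam, a fixed step budget, and clamping to the normalized [0, 1] feature range are assumptions here; `act_fn` is a hypothetical handle returning the chosen critical neuron's scalar activation z_j for an input x.

```python
import torch

def activation_maximization(act_fn, n_features=78, steps=200, lr=0.1):
    """Find an input x in [0, 1]^d that maximizes one neuron's activation
    via gradient ascent. `act_fn(x)` must return the scalar activation
    z_j of the chosen critical neuron for input x."""
    x = torch.full((1, n_features), 0.5, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        (-act_fn(x)).backward()        # ascend the activation
        opt.step()
        with torch.no_grad():
            x.clamp_(0.0, 1.0)         # stay in the normalized feature range
    return x.detach()
```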

IV. RESULTS

The DNN trained for binary classification of network data achieves an accuracy of 98.7% and an F1 score of 0.966, indicating high performance and reliability.

A. Granularity Clustering

To visualize the different mechanisms by which the model processes malicious samples, we used t-SNE to reduce the dimensionality of the critical-neuron activations to 2 dimensions. As previously mentioned, for experimentation we focus on the following attacks: DoS Hulk, Portscan, FTP-Patator, and SSH-Patator. The results including the benign samples can be observed in Figure 3, while in Figure 4 we remove the benign samples and repeat the computation. For each class we use 1,000 testing samples; in these plots each point represents a network-activity sample colored according to its true attack type. Even though the DNN was trained solely on binary classification (benign vs. malicious), the model’s internal representations reveal a higher granularity for samples of the same class. This granularity is noticeable and can be observed as distinct clustering patterns that align with specific types of network attacks.

B. Feature Impact

For the feature analysis we focus on the fourth layer of the DNN, since this layer has the most compact clustering and the most discriminative ability. We observe that for some of the attacks, 2 distinct clusters are formed, representing possible variations of these attack types. In order to avoid noisy or insignificant clusters (those with fewer samples), we selected the 4 most populated clusters for the analysis, each corresponding to an attack type. After obtaining the t-SNE-created clusters we apply the DBSCAN algorithm to automatically identify clusters, making the cluster identification process automated and reproducible. Figure 5 shows the DBSCAN clustering result. To further evaluate the similarity between the DBSCAN clusters and the true attack-type labels, we calculated the Adjusted Rand Index (ARI) and obtained a score of 0.5177.

Figure 5: The DBSCAN algorithm output clusters obtained from t-SNE dimensionality reduction on layer 4.

After obtaining the DBSCAN-created clusters, we apply the random forest feature importance technique to the set of neurons corresponding to each cluster. The results can be seen in Figure 6, which shows each critical neuron’s importance for each cluster.


Figure 6: Most relevant critical neuron for each cluster obtained by the random forest feature importance algorithm.

To identify the most and least impactful features for each attack type, we apply activation maximization on the critical neurons identified by the random forest feature importance technique. This analysis reveals significant changes in feature importance across different attack types. It is performed on the individual features, as shown in Figure 8, and on the grouped features for a higher-level understanding, as shown in Figure 7. Next we highlight the results for 2 of the attacks. For Portscan we observe that the most important features are Flow Bytes per Second, Flow IAT Min, and Average Packet Size, while the least important features are Backward Packet Length Max, Forward IAT Min, and Backward IAT Min. For the DoS Hulk attack we observe that the most important features are Flow Bytes per Second, PSH Flag Count, and Active Mean, while the least important features are Backward URG Flags, Average Packet Size, and Active Std.

Figure 7: Most and least relevant feature group corresponding to each attack type.

V. DISCUSSION

Even though the DNN was trained on a binary classification task (benign vs. malicious network activity), the t-SNE visualizations clearly demonstrate distinct clusters representing different types of network attacks. As the layers progress, the clustering becomes more compact, revealing the DNN’s increasing ability to differentiate between types of attacks. The ability to obtain more granular insights from coarse model neuron activations enables a better understanding of the different mechanisms by which the model handles the same class (malicious). In the context of network intrusion detection systems, this allows for a deeper understanding of the roles features play in DNN decision-making and enables the clustering of attack types to break down similarities and differences, even within the same class and potentially the same attack type.

This pipeline begins with the identification of critical neurons using Model-X Knockoffs, an important step that reduces the complexity of subsequent computations and enables the attribution of features to cluster types via the random forest feature importance method. For the analysis of neuron activations, we utilized t-SNE for dimensionality reduction and DBSCAN for clustering. This finding is important because it demonstrates that even a coarsely trained model can uncover detailed, granular insights about the nature of various attacks. The clustering patterns indicate that the model implicitly learns to differentiate between attack types, providing a deeper understanding of the underlying mechanisms. To evaluate the quality of the clusters produced by DBSCAN, we use the ARI, obtaining a score of 0.5177 in the fourth layer. This score suggests similarity between the clustering results and the true labels; however, because multiple clusters form for the same attack type, the overall score is lowered.


Figure 8: For each feature, the activation maximization score is presented, with the 3 highest colored orange and the 3 lowest colored blue.

The findings from this research underscore the potential of explainable AI in enhancing the interpretability of network intrusion detection systems. By uncovering the internal workings of a DNN trained on a binary classification task, we were able to achieve a granular understanding of various network attacks. This approach not only aids in improving model transparency but also provides valuable insights for refining intrusion detection systems.

VI. CONCLUSION

With the threat that network intrusion attacks pose to computer systems, reliable and transparent intrusion detection systems are necessary. Although deep learning methods have demonstrated considerable success in detecting these attacks, there remains a gap in understanding the inner workings of the decision-making process. This study proposes a novel explainable AI pipeline to enhance the interpretability of DNNs used for IDS by revealing more granular insights and patterns of sub-class attack types. By creating sub-clusters of various classes, we can better understand the mechanisms leading to these sub-classes and improve our knowledge of their characteristics. This study not only advances the current state of XAI in network intrusion detection but also provides a novel framework for obtaining increased granularity and neuron-level understanding in deep learning-based IDS.

REFERENCES

[1] Raisa Abedin and Sajjad Waheed. Performance analysis of machine learning models for intrusion detection system using gini impurity-based weighted random forest (giwrf) feature selection technique. Cybersecurity, 5, 12 2022. doi: 10.1186/s42400-021-00103-8.

[2] Jesse Ables, Nathaniel Childers, William Anderson, Sudip Mittal, Shahram Rahimi, Ioana Banicescu, and Maria Seale. Eclectic rule extraction for explainability of deep neural network based intrusion detection systems, 2024. URL https://arxiv.org/abs/2401.10207.

[3] Abhineet Agarwal, Ana Kenney, Yan Tan, Tiffany Tang, and Bin Yu. Mdi+: A flexible random forest-based feature importance framework, 07 2023.

[4] Pieter Barnard, Nicola Marchetti, and Luiz Silva. Robust network intrusion detection through explainable artificial intelligence (xai). IEEE Networking Letters, 4:1–1, 09 2022. doi: 10.1109/LNET.2022.3186589.

[5] Emmanuel Candes, Yingying Fan, Lucas Janson, and Jinchi Lv. Panning for gold: Model-free knockoffs for high-dimensional controlled variable selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80, 10 2016. doi: 10.1111/rssb.12265.

[6] Finale Doshi-Velez and Been Kim. Towards a rigorous science of interpretable machine learning, 2017. URL https://arxiv.org/abs/1702.08608.

[7] Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks, 2019. URL https://arxiv.org/abs/1803.03635.

[8] Arash Habibi Lashkari, Gerard Draper Gil, Mohammad Mamun, and Ali Ghorbani. Characterization of tor traffic using time based features. pages 253–262, 01 2017. doi: 10.5220/0006105602530262.

[9] Ansam Khraisat, Iqbal Gondal, Peter Vamplew, and Joarder Kamruzzaman. Survey of intrusion detection systems: techniques, datasets and challenges. Cybersecurity, 2, 12 2019. doi: 10.1186/s42400-019-0038-7.

[10] Hongyu Liu and Bo Lang. Machine learning and deep learning methods for intrusion detection systems: A survey. Applied Sciences, 9:4396, 10 2019. doi: 10.3390/app9204396.

[11] Scott Lundberg and Su-In Lee. A unified approach to interpreting model predictions, 2017. URL https://arxiv.org/abs/1705.07874.

[12] Shraddha Mane and Dattaraj Rao. Explaining network intrusion detection system using explainable ai framework, 03 2021.

[13] Asadullah Momand, Sana Ullah Jan, and Naeem Ramzan. A systematic and comprehensive survey of recent advances in intrusion detection systems using machine learning: Deep learning, datasets, and attack taxonomy. Journal of Sensors, 2023, 02 2023. doi: 10.1155/2023/6048087.

[14] Shruti Patil, Vijayakumar Varadarajan, Siddiqui Mazhar, Abdulwodood Sahibzada, Nihal Ahmed, Onkar Sinha, Satish Kumar V C, Kailash Shaw, and Ketan Kotecha. Explainable artificial intelligence for intrusion detection system. Electronics, 11:3079, 09 2022. doi: 10.3390/electronics11193079.

[15] Anant Ram, Jalal Sunita, Anand Jalal, and Kumar Manoj. A density based algorithm for discovering density varied clusters in large spatial databases. International Journal of Computer Applications, 3, 06 2010. doi: 10.5120/739-1038.

[16] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. “Why should I trust you?”: Explaining the predictions of any classifier, 2016. URL https://arxiv.org/abs/1602.04938.

[17] Max Schroetter, Andreas Niemann, and Bettina Schnor. A comparison of neural network-based intrusion detection against signature-based detection in iot networks, 02 2024.

[18] Iman Sharafaldin, Arash Habibi Lashkari, and Ali Ghorbani. Toward generating a new intrusion detection dataset and intrusion traffic characterization. pages 108–116, 01 2018. doi: 10.5220/0006639801080116.

[19] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 11 2008.

[20] Minh N. Vu, Truc D. Nguyen, and My T. Thai. NeuCEPT: Locally discover neural networks’ mechanism via critical neurons identification with precision guarantee, 2022. URL https://arxiv.org/abs/2209.08448.

[21] Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson. Understanding neural networks through deep visualization, 2015. URL https://arxiv.org/abs/1506.06579.