Android malware detection: An in-depth investigation of the impact of the use of imbalance datasets on the efficiency of machine learning models

Sawadogo, Zakaria; Dembele, Jean-Marie; Mendy, Gervais; Ouya, Samuel


dc.contributor.author	Sawadogo, Zakaria
dc.contributor.author	Dembele, Jean-Marie
dc.contributor.author	Mendy, Gervais
dc.contributor.author	Ouya, Samuel
dc.date.accessioned	2023-11-21T10:36:06Z
dc.date.available	2023-11-21T10:36:06Z
dc.date.issued	2023-03-29
dc.identifier.uri	https://repository.rsif-paset.org/xmlui/handle/123456789/306
dc.description	Full text: https://ieeexplore.ieee.org/document/10079245	en_US
dc.description.abstract	Machine learning techniques have become an essential part of research into the detection and classification of malicious applications. Many algorithms learn from existing data to predict classes. Machine learning principles recommend balanced classes in the training dataset, but in practice most datasets used for malicious application detection are unbalanced. Class imbalance is a common problem in classification tasks because it degrades classifier performance, and it is especially significant in Android malware detection and classification. To our knowledge, there is little work on the effects of unbalanced datasets in this field. Our contribution examines the impact of unbalanced datasets on the performance of different algorithms, the relevance of the evaluation metrics used in Android malware detection, and the state of the databases from which researchers typically draw datasets. We show that some classification algorithms are not suitable for unbalanced datasets in malicious application detection. We also show that some of the most widely used performance evaluation metrics in the literature (Accuracy, Precision, Recall) are not well suited to unbalanced datasets, whereas Balanced Accuracy and Geometric Mean are more suitable. These results were obtained by evaluating the performance of eleven classification algorithms and the adequacy of eight evaluation metrics (Accuracy, Recall, Precision, F1 score, Balanced Accuracy, Matthews correlation coefficient, Geometric Mean, Fowlkes-Mallows index). Finally, not all databases are accessible to researchers, and many of them are not kept up to date.	en_US
dc.publisher	IEEE Xplore	en_US
dc.subject	Imbalanced dataset, Android malware detection, Malware classification, Artificial intelligence, Machine learning	en_US
dc.title	Android malware detection: An in-depth investigation of the impact of the use of imbalance datasets on the efficiency of machine learning models	en_US
dc.type	Presentation	en_US
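
The abstract's point about metrics can be illustrated with a minimal sketch. It is not taken from the paper: the 95/5 class split, the labels (0 = benign, 1 = malware), the assumption that malware is the minority class, and the helper names `confusion_counts` and `imbalance_metrics` are all hypothetical. It scores a trivial classifier that labels every app as benign.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def imbalance_metrics(y_true, y_pred):
    """Compute four of the metrics named in the abstract."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    # True positive rate (recall on malware) and true negative rate.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "balanced_accuracy": (tpr + tnr) / 2,
        "geometric_mean": math.sqrt(tpr * tnr),
        # Convention assumed here: MCC is reported as 0 when undefined.
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
    }


# Hypothetical imbalanced dataset: 95 benign apps (0), 5 malware apps (1).
y_true = [0] * 95 + [1] * 5
# A useless classifier that never flags malware.
always_benign = [0] * 100

scores = imbalance_metrics(y_true, always_benign)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, accuracy looks strong even though no malware is detected, while balanced accuracy falls to chance level and the geometric mean collapses to zero, which matches the direction of the abstract's claim.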

This item appears in the following Collection(s)

ICTs including Big Data and Artificial Intelligence [74]
