dc.description.abstract |
Artificial intelligence has revolutionized many areas of research, including the detection and classification of malicious applications. Many current approaches learn from existing data to predict the classes of new samples. Machine learning principles recommend a balanced class distribution in the training dataset, but the reality in the field is quite different: the majority of datasets used for malicious application detection are imbalanced. Class imbalance degrades classifier performance and is therefore a recurring problem in classification tasks, and it is particularly pronounced in Android malware detection and classification. To our knowledge, few studies have examined the effects of imbalanced datasets in this field. Our contribution focuses on the impact of imbalanced datasets on the performance of different algorithms and on the suitability of common evaluation metrics for Android malware detection. We show that, for malicious application detection, some classification algorithms are not suitable for imbalanced datasets. We also show that some of the performance evaluation metrics most frequently used in the literature (accuracy, precision, recall) are poorly adapted to imbalanced datasets, whereas balanced accuracy and the geometric mean are better suited. These results were obtained by evaluating the performance of eleven classification algorithms and the adequacy of eight evaluation metrics (accuracy, recall, precision, F1 score, balanced accuracy, Matthews correlation coefficient, geometric mean, Fowlkes-Mallows index). |
en_US |