Abstract:
Machine learning classification algorithms have been widely used to address user
authentication challenges. Nonetheless, most solutions categorize users into three classes,
whereas adaptive authentication scenarios require more than three classes. The
rationale behind this limitation has not been thoroughly explored. The current study applied the
Naïve Bayes classifier to user authentication in order to assess the risk associated with login
attempts. The Naïve Bayes machine learning algorithm and its Gaussian, Categorical, and Bernoulli
variants were applied to both weighted and unweighted datasets to estimate risk
levels and categorize them into six classes. Additionally, the classification task was executed using
alternative algorithms. Cross-validation and comparative analyses showed that
performance was good for up to three classes, after which it declined for certain
Naïve Bayes and SVM classifiers. Within the Naïve Bayes family, the Bernoulli NB algorithm performed
best but was surpassed by Decision Trees, SVM, XGB, and Random Forests.
Notably, results on the weighted dataset were consistently better than those on the unweighted one,
and the allocation of weights significantly influenced algorithmic performance. The 80:20 split yielded
the most favorable outcomes compared with the 70:30 and 60:40 splits, although no significant
differences were detected during cross-validation. Non-Naïve Bayes algorithms demonstrated superior performance
compared to Naïve Bayes algorithms. For Naïve Bayes, optimal performance is achieved with three
classes, highlighting its utility in conditional risk calculation, a problem inherently compatible
with conditional probabilities, while non-Naïve Bayes multi-class algorithms are more suitable for
the classification task itself. In conclusion, the
characteristics of the data, the use of weights, and the data splitting methodology significantly
influence the accuracy of machine learning algorithms in multi-class user classification.
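
To make the idea of conditional risk calculation concrete, the following is a minimal sketch of a Bernoulli Naïve Bayes risk estimator written from the textbook definition. It is not the study's implementation: the binary features (new device, unusual location, odd hour), the toy login records, and the use of three risk classes instead of six are all assumptions made purely for illustration.

```python
import math

def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate class priors and per-feature Bernoulli parameters.

    alpha is the Laplace smoothing constant; it keeps every
    probability strictly between 0 and 1.
    """
    classes = sorted(set(y))
    n_features = len(X[0])
    counts = {c: 0 for c in classes}
    ones = {c: [0] * n_features for c in classes}
    for row, label in zip(X, y):
        counts[label] += 1
        for j, value in enumerate(row):
            ones[label][j] += value
    priors = {c: counts[c] / len(y) for c in classes}
    theta = {
        c: [(ones[c][j] + alpha) / (counts[c] + 2 * alpha)
            for j in range(n_features)]
        for c in classes
    }
    return priors, theta

def predict_proba(priors, theta, row):
    """Return P(class | row) for each class using Bayes' rule in log space."""
    log_post = {}
    for c in priors:
        lp = math.log(priors[c])
        for p, value in zip(theta[c], row):
            lp += math.log(p if value else 1.0 - p)
        log_post[c] = lp
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_post.values())
    unnorm = {c: math.exp(lp - top) for c, lp in log_post.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}

# Hypothetical login records: [new_device, unusual_location, odd_hour]
X = [
    [0, 0, 0], [0, 0, 1], [0, 0, 0],   # labelled low risk
    [1, 0, 0], [0, 1, 0], [1, 0, 1],   # labelled medium risk
    [1, 1, 1], [1, 1, 0], [1, 1, 1],   # labelled high risk
]
y = ["low"] * 3 + ["medium"] * 3 + ["high"] * 3

priors, theta = fit_bernoulli_nb(X, y)
posterior = predict_proba(priors, theta, [1, 1, 1])
risk_class = max(posterior, key=posterior.get)
print(risk_class, posterior)
```

The posterior returned by the hypothetical `predict_proba` helper can be read directly as a per-class risk estimate for one login attempt, which is the sense in which Naïve Bayes lends itself to conditional risk calculation.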