Abstract:
Nowadays, there is growing interest in data mining and information retrieval applications built on Knowledge Graphs (KGs). However, KGs suffer from several data quality problems, including inaccuracy, incompleteness, and various kinds of errors. DBpedia has several such data quality issues; among them, we focus on the following: many entities are assigned to classes they do not belong to. For instance, a query for all entities of the class Person also returns group entities, which should instead belong to the class Group. We call such entities "outliers." Discovering these outliers is crucial for learning and understanding classes. This paper proposes a new outlier detection method that finds these entities. We define a semantic measure that assigns positive values to the genuine entities of a class (inliers) and negative values to outliers, and we refine it using the discovery of frequent and rare itemsets. Our measure outperforms the FPOF (Frequent Pattern Outlier Factor) measure. Experiments show the efficiency of our approach.
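For readers unfamiliar with the FPOF baseline mentioned above, the following is a minimal sketch of the standard FPOF score, not the semantic measure proposed in this paper. It assumes each entity is represented as a transaction, i.e. the set of properties it uses; the entity names, property names, and the `min_support` and `max_len` values are hypothetical. FPOF of an entity is the sum of the supports of the frequent itemsets it contains, divided by the total number of frequent itemsets, so entities that share few frequent patterns with the rest of the class get low scores. Frequent itemsets are found here by brute-force enumeration instead of Apriori or FP-Growth, which is only practical for toy data.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_len=3):
    """Brute-force enumeration of frequent itemsets (toy data only).

    Returns a dict mapping each frequent itemset (frozenset) to its
    relative support in [0, 1].
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, max_len + 1):
        found = False
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                result[candidate] = support
                found = True
        if not found:
            # Anti-monotonicity: no frequent k-itemsets implies none of size k+1.
            break
    return result


def fpof(transaction, frequent):
    """FPOF score: summed support of contained frequent itemsets / |frequent|."""
    if not frequent:
        return 0.0
    total = sum(s for itemset, s in frequent.items() if itemset <= transaction)
    return total / len(frequent)


# Hypothetical entities typed as Person, described by the properties they use.
# "g1" stands for a group entity wrongly typed as Person.
entities = {
    "p1": frozenset({"birthDate", "birthPlace", "occupation"}),
    "p2": frozenset({"birthDate", "birthPlace", "spouse"}),
    "p3": frozenset({"birthDate", "birthPlace", "occupation"}),
    "g1": frozenset({"formationYear", "bandMember", "genre"}),
}

freq = frequent_itemsets(list(entities.values()), min_support=0.5)
scores = {name: fpof(t, freq) for name, t in entities.items()}

# Lowest FPOF first: these are the most likely outliers.
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: {score:.3f}")
```

In this toy example the group entity shares no frequent property pattern with the persons and receives a score of zero. Note that FPOF only ranks entities on a non-negative scale; it does not separate inliers from outliers by sign, which is the property the measure described in this abstract is designed to provide.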