Information Theoretic, Probabilistic And Maximum Partial Substructure Algorithms For Discovering Graph-based Anomalies
Abstract
The ability to mine data represented as a graph has become important in several domains for detecting various structural patterns. One important area of data mining is anomaly detection, particularly for fraud detection. However, comparatively little work has addressed anomaly detection in graph-based data. Some previous work has used statistical metrics and conditional entropy measurements, but its results have been limited to certain types of anomalies and to specific domains. In this work we present graph-based approaches to uncovering anomalies in domains where the anomalies consist of unexpected entity/relationship alterations that closely resemble non-anomalous behavior. We have developed three anomaly detection algorithms, each of which first uses the minimum description length (MDL) principle to discover the normative substructure. Once this common pattern is known, each algorithm takes a different approach to discovering particular types of anomalies. Using synthetic and real-world data, we evaluate the effectiveness of each algorithm. Our approach demonstrates the usefulness of examining a graph-based representation of data for detecting fraud, where an individual or entity attempts to cloak illegal activities by closely resembling legitimate transactions.
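The MDL step that all three algorithms share can be illustrated with a small sketch. It assumes a simplified, size-based proxy for description length (vertex count plus edge count) rather than a bit-level graph encoding, and it assumes that the instances of a candidate substructure do not overlap. A candidate S is scored by how well it compresses the graph G: DL(G) / (DL(S) + DL(G|S)). Higher values mean S is the better candidate for the normative pattern. The functions `description_length`, `compression_value` and `pick_normative` are hypothetical names, and the graph and candidate sizes are made-up illustrative numbers, not the paper's encoding or data.

```python
from typing import Dict, Tuple


def description_length(num_vertices: int, num_edges: int) -> int:
    """Hypothetical size-based proxy for description length: |V| + |E|."""
    return num_vertices + num_edges


def compression_value(g_v: int, g_e: int, s_v: int, s_e: int, instances: int) -> float:
    """Score substructure S against graph G as DL(G) / (DL(S) + DL(G|S)).

    Assumes the `instances` occurrences of S in G do not overlap. Each one is
    collapsed into a single vertex. Edges linking an instance to the rest of G
    are kept, so only the vertices and edges inside S are removed.
    """
    dl_g = description_length(g_v, g_e)
    dl_s = description_length(s_v, s_e)
    dl_g_given_s = dl_g - instances * dl_s + instances
    return dl_g / (dl_s + dl_g_given_s)


def pick_normative(g_v: int, g_e: int,
                   candidates: Dict[str, Tuple[int, int, int]]) -> str:
    """Return the candidate (vertices, edges, instances) that compresses G best."""
    return max(candidates, key=lambda name: compression_value(g_v, g_e, *candidates[name]))


if __name__ == "__main__":
    # Illustrative graph with 20 vertices and 25 edges, plus two candidate substructures.
    candidates = {"A": (3, 2, 5), "B": (2, 1, 8)}
    print(compression_value(20, 25, 3, 2, 5))  # → 1.5
    print(compression_value(20, 25, 2, 1, 8))  # → 1.40625
    print(pick_normative(20, 25, candidates))  # → A
```

In this example the smaller substructure B occurs more often, but the larger substructure A removes more of the graph per instance and therefore scores higher. This trade-off between frequency and size is what an MDL-style criterion balances when it selects a normative pattern. The anomaly-detection stages would then look for close but imperfect matches to that pattern.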