MLN-Subdue: Decoupling Approach-Based Substructure Discovery In Multilayer Networks (MLNs)
Abstract
Substructure discovery is well-researched for single graphs (both simple and attribute) as it is an important component of knowledge discovery for many applications
such as finding the core substructure in a protein, important concept in a large graph, etc. However, multilayer networks or MLNs (instead of attribute graphs) have been shown to be better for modeling complex data sets that have multiple entity and feature types. This model provides more clarity on semantics of data sets as
well as the ability to use an arbitrary subset of layers for analysis. However, the challenge is that many algorithms such as community and centrality detection as well as substructure discovery need to be extended to MLN representation.
With the representation of complex data sets as MLNs brings new challenges in terms of fi nding substructures in MLNs or a subset of MLNs. A naive approach would be to collapse (or aggregate) all (or a subset of) layers into a single attribute
graph and use extant algorithms. There are a number of substructure discovery algorithms ranging from memory-based, disk-based, SQL-based, and partitioned using
map/reduce framework.
While substructure discovery has been widely used for the analysis of single
networks, attribute graphs, and forests, the development of an efficient substructure discovery methods for multilayer networks without aggregation is currently not available.
This thesis proposes a new decoupling approach-based substructure discovery algorithm for homogeneous MLNs (or HoMLNs). HoMLNs are MLNs where each layer has the same set of nodes but different connectivity in each layer.
In this dissertation, we propose a decoupling approach-based algorithm for HoMLNs where aggregation is not needed. Further, the algorithm has been implemented using the Map/Reduce framework in order to handle arbitrary number of
layers and improving the response time through parallelism. Each layer is processed individually/separately in parallel, but the substructures generated for each layer
are combined after each iteration to identify substructures across layers (or MLN). The focus is on correctness of the algorithm and resource utilization with respect to number of layers. The proposed algorithm is validated analytically as well through
extensive experimental analysis on large real-world and synthetic graphs.