Analyzing Bank Fraud Dataset using Neo4j

Samarth Goyal
Apr 19, 2023

Task: perform online analytical processing using an appropriate distributed clustering package for Neo4j.

Community detection is a crucial task in network analysis, with uses in many domains, including social, biological, and financial networks. Its aim is to find clusters of nodes that are more densely connected to one another than to the rest of the network. These clusters are called communities, and they can represent groups of people with shared interests, genes with related roles, or any other grouping that can be inferred from the network structure.

One popular community detection algorithm is the Louvain algorithm, a hierarchical clustering method that greedily maximises modularity. Modularity is a quality function that measures how much more densely nodes are connected within their communities than would be expected if the edges were placed at random. The algorithm first assigns each node to its own community. It then repeatedly moves individual nodes into the neighbouring community that gives the largest increase in modularity, collapses each resulting community into a single node, and repeats both steps on the smaller graph until no further improvement can be made.
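
To make the modularity score concrete, here is a minimal Python sketch that computes it for an undirected, unweighted graph and a given assignment of nodes to communities. The graph (two triangles joined by one edge) and all names are illustrative assumptions, not data from the bank fraud dataset, and this only evaluates a partition; it is not the Louvain algorithm itself.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of an undirected, unweighted graph for a given partition.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both endpoints in the same community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1

    total_degree = defaultdict(int)  # sum of node degrees per community
    for node, d in degree.items():
        total_degree[community_of[node]] += d

    # Q = sum over communities of (internal / m) - (total_degree / 2m)^2
    return sum(
        internal[c] / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


# Toy graph (hypothetical): two triangles connected by the single edge C-D.
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"),
         ("C", "D")]
split = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
merged = {node: 0 for node in split}

q_split = modularity(edges, split)    # one community per triangle
q_merged = modularity(edges, merged)  # everything in a single community
```

Putting each triangle in its own community scores higher than lumping all six nodes together, which is the kind of improvement Louvain searches for.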

Implementing the Louvain algorithm in Neo4j can be done using the Graph Data Science Library. The Graph Data Science Library provides a suite of algorithms for graph analysis and includes an implementation of the Louvain algorithm for community detection. To demonstrate the implementation of the Louvain algorithm in Neo4j, we will use an example of a financial network consisting of bank accounts and transfers between them.
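
As a rough sketch of what calling the library could look like, the Python snippet below holds two Cypher statements for Graph Data Science 2.x: one projects the accounts and transfers into an in-memory graph, the other streams Louvain community ids. The graph name 'accounts', the 'name' property on 'Account' nodes, and the helper function are assumptions made for illustration; only the 'Account' label, the 'TRANSFER' relationship and the 'amount' property come from the example in this article.

```python
# Cypher for GDS 2.x. The graph name and the 'name' property are assumptions;
# 'Account', 'TRANSFER' and 'amount' are taken from the article's example.
PROJECT_GRAPH = """
CALL gds.graph.project(
  $graphName,
  'Account',
  {TRANSFER: {orientation: 'UNDIRECTED', properties: 'amount'}}
)
"""

STREAM_LOUVAIN = """
CALL gds.louvain.stream($graphName, {relationshipWeightProperty: 'amount'})
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).name AS account, communityId
"""


def louvain_communities(session, graph_name="accounts"):
    """Project the transfer graph, run Louvain, and group accounts by community.

    `session` is any object with a run(query, parameters) method that returns
    an iterable of records readable as record["key"], for example a session
    from the official neo4j Python driver.
    """
    params = {"graphName": graph_name}
    session.run(PROJECT_GRAPH, params)
    communities = {}
    for record in session.run(STREAM_LOUVAIN, params):
        communities.setdefault(record["communityId"], []).append(record["account"])
    return communities
```

The projected graph stays in memory after the call, so in practice it would be removed with gds.graph.drop once the analysis is finished.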

The code provided in the prompt is a Cypher query that groups accounts into communities; the paragraphs below walk through it clause by clause.

The query begins with a ‘MATCH’ clause that finds all pairs of ‘Account’ nodes connected by a ‘TRANSFER’ relationship and binds them to the variables ‘a’ and ‘b’. The variable ‘t’ refers to the ‘TRANSFER’ relationship itself.

The ‘WITH’ clause then takes the results of the ‘MATCH’ clause, sums the ‘amount’ property of the matched ‘TRANSFER’ relationships, and assigns the result to the ‘totalAmount’ variable. This variable is used later to set a ‘totalAmount’ property on a community node.
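
The aggregation described above can be mimicked in plain Python. This sketch assumes the ‘WITH’ clause keeps ‘a’ and ‘b’ as grouping keys, so the sum is taken per pair of accounts; the original query is not reproduced here, and the transfer records are made-up sample data.

```python
from collections import defaultdict

# Hypothetical transfers as (sender, receiver, amount) tuples.
transfers = [
    ("acc-1", "acc-2", 100.0),
    ("acc-1", "acc-2", 50.0),
    ("acc-2", "acc-3", 25.0),
]


def total_per_pair(transfers):
    """Sum the transferred amount for every (sender, receiver) pair."""
    totals = defaultdict(float)
    for sender, receiver, amount in transfers:
        totals[(sender, receiver)] += amount
    return dict(totals)


totals = total_per_pair(transfers)
```

Each key of the result plays the role of one (‘a’, ‘b’) row, and each value the role of ‘totalAmount’.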

The next step creates a new ‘Community’ node, or finds an existing one, and creates ‘BELONGS_TO’ relationships between the ‘a’ and ‘b’ accounts and the community node. If a community node that both ‘a’ and ‘b’ already belong to exists, that node is reused.

This line sets a ‘totalAmount’ property on the ‘Community’ node that ‘a’ and ‘b’ belong to, using the ‘totalAmount’ variable calculated earlier.

This line sorts the ‘Community’ nodes by their ‘totalAmount’ property in descending order.

This line collects all of the ‘Community’ nodes into a list, which is assigned to the ‘communities’ variable.

The code then gives every ‘Community’ node a ‘rank’ attribute based on its position in the ‘communities’ list: the community with the highest ‘totalAmount’ gets rank 1, the next highest gets rank 2, and so on.
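
The sort, collect and rank steps amount to the following Python sketch. The community names and totals are invented for illustration, and the function name is hypothetical.

```python
def rank_communities(total_by_community):
    """Rank communities by total amount, largest first, starting at rank 1."""
    ordered = sorted(
        total_by_community.items(),
        key=lambda item: item[1],
        reverse=True,  # descending order, like ORDER BY ... DESC
    )
    return {name: rank for rank, (name, _) in enumerate(ordered, start=1)}


# Hypothetical community totals.
ranks = rank_communities({"c1": 200.0, "c2": 500.0, "c3": 50.0})
```

Here ‘c2’ has the largest total and receives rank 1, followed by ‘c1’ and then ‘c3’.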

Finally, the query locates all ‘Account’ nodes that are connected to a ‘Community’ node and returns each account together with the name of its community.

In summary, the code performs a series of data transformations on a graph database that represents financial transactions between accounts. It creates ‘Community’ nodes that group accounts based on their transfer activity, sets a ‘totalAmount’ property on each community node from the amount transferred between its accounts, sorts the communities by that amount and assigns each a ‘rank’ property based on its position in the sorted list, and finally returns every account with the name of the community it belongs to. Used this way, the query acts as a community detection step on a financial transaction network, identifying groups of accounts that are highly connected to each other.
