Problem: Given a network \(G=(V,E)\) where \(V=\{1,2,\ldots,n\}\) and \(|E|=m\text{,}\) find a vertex partition \(V=V_1\uplus V_2\) (where one of the two sets may be empty) that maximizes the modularity function
Many general-purpose approaches have been developed for this type of discrete optimization problem, such as simulated annealing, genetic algorithms, and extremal optimization. In practice, it is very difficult to design an algorithm that is both efficient and guaranteed to find the true maximizer. Instead, most algorithms for this optimization problem produce only approximate maximizers -- this is usually good enough in practice.
We will consider a simple algorithm, specific to modularity maximization rather than a general-purpose discrete optimization method, that produces an approximate solution to this problem. The method was introduced by Newman in the 2006 paper "Modularity and community structure in networks" (https://www.pnas.org/doi/10.1073/pnas.0601602103). It is most effective as one part of a two-stage method: the first stage uses "spectral methods" (eigenvector calculations for the modularity matrix \(B=[B_{i,j}]\)) to find an initial approximate solution, and the node-moving algorithm described below serves as the second stage, making small adjustments to improve the solution found by the spectral stage.
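To give a rough idea of what the spectral stage can look like (the details are in Newman's paper), the leading-eigenvector heuristic computes an eigenvector of the modularity matrix \(B\) for its largest eigenvalue and splits the vertices by the signs of that eigenvector's entries. Here is a minimal Python sketch of that idea using networkx and numpy; the karate club graph is just a convenient test graph, not part of the method above.

```python
import networkx as nx
import numpy as np

G = nx.karate_club_graph()          # example graph with nodes labeled 0..n-1
# Modularity matrix: B_ij = A_ij - k_i * k_j / (2m)
B = np.asarray(nx.modularity_matrix(G))
eigvals, eigvecs = np.linalg.eigh(B)  # eigh, since B is symmetric
v = eigvecs[:, -1]                    # eigenvector for the largest eigenvalue
# Split vertices by the sign of their eigenvector entry
V1 = {i for i in G.nodes() if v[i] >= 0}
V2 = {i for i in G.nodes() if v[i] < 0}
print(nx.community.modularity(G, [V1, V2]))
```

This sketch assumes the nodes are labeled \(0,\ldots,n-1\) so they can index the eigenvector directly; for other labelings you would pass an explicit `nodelist` to `nx.modularity_matrix`.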
At the completion of one pass, go through all of the partitions obtained during that pass (including the original partition) and select the partition \(V=W_1\uplus W_2\) with the maximum modularity.
If the partition \(V_1\uplus V_2\) is the same as \(W_1\uplus W_2\text{,}\) then the pass produced no improvement in modularity, and the algorithm terminates with \(W_1\uplus W_2\) as output. Otherwise, perform another pass, this time starting from \(W_1\uplus W_2\text{.}\)
For anyone who is interested in seeing how this looks in code, here is a sample of SageMath code that you can run at https://sagecell.sagemath.org/. This code snippet runs the node-moving algorithm on a random graph from the configuration model with a degree sequence of your choice.
import networkx as nx
def nodemove(V,i):
    # Return the partition obtained from V by moving vertex i to the other part.
    if i in V[0]:
        return [[j for j in V[0] if j != i], [i]+V[1]]
    else:
        return [[i]+V[0], [j for j in V[1] if j != i]]
def mostdiffnode(G,V,checked):
    # Among the vertices not yet moved in this pass, find the one whose move
    # changes the modularity the most (even if every available move decreases it).
    base = N(nx.community.modularity(G,V))
    maxvert = None
    maxdiff = None
    for i in Graph(G).vertices():
        if i not in checked:
            diff = N(nx.community.modularity(G,nodemove(V,i))) - base
            if maxdiff is None or diff > maxdiff:
                maxvert = i
                maxdiff = diff
    return maxvert
def nodepasspartitions(G,V):
    # One pass of the algorithm: move the best vertex n times, each vertex at
    # most once, recording every intermediate partition (including V itself).
    n = len(Graph(G).vertices())
    parts = [V]
    checked = []
    while n > len(checked):
        t = mostdiffnode(G,parts[-1],checked)
        parts.append(nodemove(parts[-1],t))
        checked.append(t)
    return parts
def largestmodpart(G,parts):
    # Return the partition with the largest modularity among those in parts.
    mod = None
    largestpart = None
    for W in parts:
        q = N(nx.community.modularity(G,W))
        if mod is None or q > mod:
            largestpart = W
            mod = q
    return largestpart
# Degree sequence: blocks of degree-1 vertices mixed with blocks of random degrees
D = 5*[1]+[ZZ.random_element(1,20) for _ in range(20)]+10*[1]+[ZZ.random_element(1,20) for _ in range(10)]+10*[1]+[ZZ.random_element(1,20) for _ in range(10)]+5*[1]
if sum(D) % 2 == 1:
    D[0] += 1  # the configuration model requires an even degree sum
G = nx.configuration_model(D)
# Initial partition: the first half of the vertices versus the second half
V = [[i for i in range(len(D)//2)],[i for i in range(len(D)//2,len(D))]]
W = largestmodpart(G,nodepasspartitions(G,V))
count = 1  # one pass has already been performed above
# Repeat passes until a pass no longer changes the partition
while sorted([sorted(V[0]),sorted(V[1])]) != sorted([sorted(W[0]),sorted(W[1])]):
    W1 = largestmodpart(G,nodepasspartitions(G,W))
    print('modularity update: '+str(N(nx.community.modularity(G,W1))))
    V = W
    W = W1
    count += 1
    print(str(count)+' passes')
Graph(G).show(layout='circular')
print(sorted([sorted(W[0]),sorted(W[1])]))
print('estimated max modularity is '+str(N(nx.community.modularity(G,W))))
print('number of passes: '+str(count))
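The two-stage combination mentioned earlier (a spectral split followed by node-moving refinement) can also be sketched in plain Python, outside of Sage. The helper names `spectral_split` and `node_moving_pass` below are my own, and the karate club graph is just a convenient test case; this is an illustration of the idea under those assumptions, not Newman's exact implementation.

```python
import networkx as nx
import numpy as np

def spectral_split(G):
    # Stage 1: split by the signs of the leading eigenvector of the
    # modularity matrix. Assumes nodes are labeled 0..n-1.
    B = np.asarray(nx.modularity_matrix(G))
    _, eigvecs = np.linalg.eigh(B)
    v = eigvecs[:, -1]
    return [{i for i in G if v[i] >= 0}, {i for i in G if v[i] < 0}]

def node_moving_pass(G, parts):
    # Stage 2 (one pass): repeatedly move the not-yet-moved vertex whose
    # switch changes modularity the most, then return the best partition seen.
    cur = [set(parts[0]), set(parts[1])]
    best_q = nx.community.modularity(G, cur)
    best = [cur[0].copy(), cur[1].copy()]
    moved = set()
    for _ in range(G.number_of_nodes()):
        cand = None
        for i in set(G) - moved:
            side = 0 if i in cur[0] else 1
            trial = [cur[0].copy(), cur[1].copy()]
            trial[side].discard(i)
            trial[1 - side].add(i)
            if not trial[0] or not trial[1]:
                continue  # skip moves that would empty a part
            q = nx.community.modularity(G, trial)
            if cand is None or q > cand[0]:
                cand = (q, i, trial)
        if cand is None:
            break
        q, i, cur = cand
        moved.add(i)
        if q > best_q:
            best_q, best = q, [cur[0].copy(), cur[1].copy()]
    return best

G = nx.karate_club_graph()
V = spectral_split(G)
W = node_moving_pass(G, V)
print(nx.community.modularity(G, V), nx.community.modularity(G, W))
```

Because the pass keeps the best partition it sees (including the starting one), the refined partition's modularity is never lower than that of the spectral split it starts from.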
Note that there is no guarantee that the result of this algorithm is actually a maximizer for modularity -- it is only guaranteed to produce a partition with modularity at least as high as the one you started with. There are many other interesting algorithms for maximizing modularity, and many approaches to community detection beyond modularity altogether. So, you should view what we have done as an invitation to further study rather than as a comprehensive survey.