Skip to main content

MA 415G course notes

Section 5.7 An Algorithm to (Approximately) Maximize Modularity

We have now developed a well-defined problem to solve in order to optimally split a network into one or two communities.
Problem: Given a network \(G=(V,E)\) where \(V=\{1,2,\ldots,n\}\) and \(|E|=m\text{,}\) find the vertex partition \(V=V_1\uplus V_2\) (where it is possible for one of these sets to be empty) that maximizes the modularity function
\begin{equation*} Q := \frac{1}{2m} \sum_{1\leq i,j \leq n} \left( A_{i,j} - \frac{d_i d_j}{2m} \right) \delta_{g_i,g_j} \, . \end{equation*}
over the set of all such vertex partitions.
There are many general approaches that have been developed for this type of discrete optimization problem, such as simulated annealing, genetic algorithms, extremal optimization, etc. In practice, it is very difficult to develop an algorithm that is both efficient and produces the actual maximizer. Instead, most algorithms to solve this optimization problem only provide approximate maximizers -- this is usually good enough in practice.
We will consider a simple algorithm to provide an approximate solution to this optimization problem that is specific to modularity maximization, rather than general-purpose discrete optimization algorithms. The method we will consider was introduced by Newman in 2006 in the paper "Modularity and community structure in networks" https://www.pnas.org/doi/10.1073/pnas.0601602103. This method is most effective when combined into a two-stage method, where the first stage uses "spectral methods" involving eigenvector calculations for the modularity matrix \(B=[B_{i,j}]\) to find an initial approximate solution, and the node-moving algorithm described below is the second step to make small adjustments to improve the solution found with the spectral method.

Definition 5.7.1. Node-Moving Algorithm.

Let \(G\) be a network.
  1. Select a random partition of the vertices \(V=V_1\uplus V_2\) into roughly equally-sized groups.
  2. A pass of the node-moving algorithm is defined as follows.
    1. For each node in the network, calculate the change in the modularity if that node were moved to the other set in the partition.
    2. Select the node that most increases (or least decreases) the modularity and move it.
    3. Repeat the first two steps, however, you are not allowed to move a node once it has been moved during the pass.
    4. After all nodes have been moved, the pass ends.
  3. At the completion of one pass, go through all of the partitions obtained during that pass (including the original partition) and select the partition \(V=W_1\uplus W_2\) with the maximum modularity.
    1. If the partition \(V_1\uplus V_2\) is different than \(W_1\uplus W_2\text{,}\) repeat another pass using \(W_1\uplus W_2\) as the starting partition.
    2. If the partition \(V_1\uplus V_2\) is the same as \(W_1\uplus W_2\text{,}\) then we cannot improve the modularity with a pass, and the algorithm terminates with \(W_1\uplus W_2\) as output.
For anyone who is interested in seeing how this looks in code, here is a sample of Sagemath code that you can use in https://sagecell.sagemath.org/. This code snippet will run the node-moving algorithm on a random graph from the configuration model with a degree sequence of your choice.
import networkx as nx

def nodemove(V,i):
    W = []
    if i in V[0]:
        W=[]
        W.append([j for j in V[0] if j != i])
        W.append([i]+V[1])
    else:
        W=[]
        W.append([i]+V[0])
        W.append([j for j in V[1] if j != i])
    return(W)

def mostdiffnode(G,V,checked):
    maxdiff = -100
    for i in Graph(G).vertices():
        if i not in checked:
            if N(nx.community.modularity(G,nodemove(V,i))) - N(nx.community.modularity(G,V)) > maxdiff:
                maxvert = i
                maxdiff = N(nx.community.modularity(G,nodemove(V,i))) - N(nx.community.modularity(G,V))
    return(maxvert)

def nodepasspartitions(G,V):
    n = len(Graph(G).vertices())
    parts = [V]
    checked = []
    while n > len(checked):
        t = mostdiffnode(G,parts[-1],checked)
        parts.append(nodemove(parts[-1],t))
        checked.append(t)
    return(parts)

def largestmodpart(G,parts):
    mod = -100
    for W in parts:
        if N(nx.community.modularity(G,W)) > mod:
            largestpart = W
            mod = N(nx.community.modularity(G,W))
    return(largestpart)

D = 5*[1]+[ZZ.random_element(1,20) for _ in range(20)]+10*[1]+[ZZ.random_element(1,20) for _ in range(10)]+10*[1]+[ZZ.random_element(1,20) for _ in range(10)]+5*[1]

G = nx.configuration_model(D)

V = [[i for i in range(floor(len(D)/2))],[i for i in range(floor(len(D)/2),len(D))]]

W = largestmodpart(G,nodepasspartitions(G,V))

count = 0

while sorted([sorted(V[0]),sorted(V[1])]) != sorted([sorted(W[0]),sorted(W[1])]):
    W1 = largestmodpart(G,nodepasspartitions(G,W))
    print('modularity update: '+str(N(nx.community.modularity(G,W1))))
    V = W
    W = W1
    count += 1
    print(str(count)+' passes')

Graph(G).show(layout='circular')
print(sorted([sorted(W[0]),sorted(W[1])]))
print('estimated max modularity is '+str(N(nx.community.modularity(G,W))))
print('number of passes: '+str(count))

Checkpoint 5.7.2.

Discuss the algorithm above. Does it make sense?
Run this algorithm on the graph below starting from the partition \(V_1=\{0,1\}, V_2=\{3,4\}\text{.}\)
Figure 5.7.3. Graph for the node-moving algorithm checkpoint.
You should find that the algorithm terminates at the partition \(W_1=\{0,2\}, W_2=\{1,3\}\text{.}\)
Note that there is no guarantee that the result of this algorithm is actually a maximizer for modularity -- however, it does provide a way to generate a partition with higher modularity than you started with. There are many interesting algorithms for maximizing modularity, and there are many other approaches beyond modularity to the study of community detection. So, you should view what we have done as an invitation to further study rather than as a comprehensive survey.