The Configuration Model for Random Networks

Section 5.5 The Configuration Model for Random Networks

The first challenge when attempting to identify communities in a network is to define what a "community" within the network actually is. One intuitive definition is that a community is a group of nodes that are more densely connected to each other than to the rest of the network. However, this definition is somewhat vague, as it does not specify how dense the connections within the community should be, nor does it clarify what it means to be "more densely connected" to each other than to the rest of the network.

🔗

Another idea, which is the basis for modularity, is that a community is a set of vertices that have more internal connections than would be expected in a "similar" random network. Thus, for example, if we had a network and we wanted to see if it naturally split into two communities, we could consider all possible ways to partition the vertices into two groups and, for each partition, determine how far the number of edges within the two groups are above what would be expected in a "similar" random network. Then, the partition that maximizes this distance above the expected value would be considered the "best" division of the network into two communities.

🔗

The first problem that we therefore have to solve is to identify a class of networks that are "similar" to the network under consideration, and then to determine how to compute the expected number of edges within a subset of vertices in such a network. This means that we need to do three things:

identify a random graph model that matches some of the properties of the graph under consideration,
🔗

🔗
assign a probability distribution to this model, and finally
🔗

🔗
compute the expected number of edges within a subset of vertices for these networks.
🔗

🔗

🔗

We will be able to achieve these goals by using the configuration model as our random graph model. The configuration model uses a broader class of networks than only the finite simple graphs we have been exploring so far. Instead, it involves the class of looped multigraphs with a given degree sequence.

🔗

Definition 5.5.1.

A looped multigraph is a graph that is allowed to have multiple edges between the same pair of vertices as well as edges that connect a vertex to itself, i.e., loops. The degree of a vertex in a looped multigraph is defined as the number of edges incident to that vertex, with loops contributing two to the degree count.

🔗

Example 5.5.2.

Consider the following looped multigraph on ten vertices where every vertex has degree seven.

🔗

🔗

The configuration model is a random graph model that creates a (usually non-uniform) distribution on all looped multigraphs with a fixed degree sequence. When thinking about the configuration model, it is often helpful to use the "stub" or "half-edge" representation of a looped multigraph. In this representation, each vertex \(i\) is assigned \(d_i\) "stubs", or partial edges incident to \(i\text{,}\) and the looped multigraph is formed by randomly pairing these stubs to create edges. Thus, every looped multigraph in the configuration model is generated by a perfect matching of the set of all stubs.

🔗

Checkpoint 5.5.4.

Consider the looped multigraph in the figure below, which is the same as we considered in the previous example.

Imagine cutting each edge in half to create stubs. How many stubs are there in total? (Remember that every vertex has degree seven.)
🔗

🔗
If you cut each of the loops and reconnect the resulting stubs to form edges, can you create a different multigraph with the same degree sequence that has no loops?
🔗

🔗

🔗

🔗

Definition 5.5.6.

Fix a sequence of non-negative integers \((d_1, d_2, \ldots, d_n)\) such that the sum \(d_1 + d_2 + \cdots + d_n\) is even. The configuration model with degree sequence \((d_1, d_2, \ldots, d_n)\) is the probability space where:

the event set consists of all perfect matchings of the set of stubs formed on the vertex set \(\{1,2,\ldots,n\}\) where vertex \(i\) has \(d_i\) stubs, and
🔗

🔗
the uniform distribution is used for the set of these perfect matchings of stubs.
🔗

🔗

Each such perfect matching of stubs corresponds to a looped multigraph on vertex set \(\{1, 2, \ldots, n\}\) where vertex \(i\) has degree \(d_i\text{.}\) Thus, the configuration model induces a probability distribution on the set of looped multigraphs with degree sequence \((d_1, d_2, \ldots, d_n)\text{.}\)

🔗

Note that it is possible for two different perfect matchings of stubs to correspond to the same looped multigraph. Thus the configuration model is not the uniform distribution on looped multigraphs with the given degree sequence, rather it is the uniform distribution on perfect matchings of stubs, and this induces a non-uniform probability distribution on the looped multigraphs. The following example illustrates this point.

🔗

Example 5.5.7.

Consider the configuration model on two vertices with degree sequence \((2,2)\text{.}\) Let’s denote the two vertices as \(u\) and \(v\text{.}\) Each vertex has two stubs, so we have a total of four stubs: \(u_1, u_2\) for vertex \(u\) and \(v_1, v_2\) for vertex \(v\text{.}\) The possible perfect matchings of these stubs are:

\(\displaystyle \{(u_1, v_1), (u_2, v_2)\}\)
🔗

🔗
\(\displaystyle \{(u_1, v_2), (u_2, v_1)\}\)
🔗

🔗
\(\displaystyle \{(u_1, u_2), (v_1, v_2)\}\)
🔗

🔗

The first two matchings correspond to the same simple graph with two edges between \(u\) and \(v\text{,}\) while the third matching corresponds to a graph with two loops, one on each vertex. Therefore, in the configuration model with degree sequence \((2,2)\text{,}\) the network with two edges between \(u\) and \(v\) has a probability of \(2/3\text{,}\) while the network with two loops has a probability of \(1/3\text{.}\)

🔗

Let’s check this experimentally: using Sagemath with the python networkx library, we can simulate the configuration model with degree sequence \((2,2)\) multiple times and observe the frequencies of the resulting graphs. Here is a sample code snippet to perform this simulation where we randomly sample from the configuration model 10000 times.

🔗

import networkx as nx

D = {}

# Simulate the configuration model 10000 times
for _ in range(10000):
    G = Graph(nx.configuration_model([2,2])).copy(immutable=True)
    if G in D.keys():
        D[G] += 1
    else:
        D[G] = 1

# Display the results
for g in D.keys():
    print(D[g])
    show(g)
    print()

L = [D[g] for g in D.keys()]
s = sum(L)
for i in L:
    print(N(i/s))

You can copy and paste this code into https://sagecell.sagemath.org/ to test it yourself, and you will see that the results align with our theoretical calculations -- the graph with two parallel edges appears approximately twice as often as the graph with two loops.

🔗

Checkpoint 5.5.8.

What is the event set for the configuration model with degree sequence \((1,2,3)\) on three vertices? (Hint: there are fifteen perfect matchings, hence fifteen events.)
🔗

🔗
What is the probability of each looped multigraph in this configuration model? (Hint: there are three looped multigraphs with degree sequence \((1,2,3)\text{.}\))
🔗

🔗
Modify the Sagemath code snippet in the previous example to simulate the configuration model with degree sequence \((1,2,3)\) and verify your answer to question 2 experimentally.
🔗

🔗

🔗

We can compute the total number of perfect matchings of the stubs in the configuration model with degree sequence \((d_1, d_2, \ldots, d_n)\text{.}\)

🔗

Proposition 5.5.9.

Let \((d_1, d_2, \ldots, d_n)\) be a sequence of non-negative integers such that \(2m = d_1 + d_2 + \cdots + d_n\) is even, where \(m=|E|\) is the number of edges in any looped multigraph with degree sequence \((d_1, d_2, \ldots, d_n)\text{.}\) The total number of perfect matchings of the set of \(2m\) stubs in the configuration model with degree sequence \((d_1, d_2, \ldots, d_n)\) is given by

\begin{equation*} \frac{(2m)!}{2^{m} (m)!} = (2m - 1)!! \end{equation*}

where \((2m - 1)!!\) denotes the double factorial of \(2m - 1\text{,}\) defined as the product of all odd integers from \(1\) to \(2m - 1\text{.}\)

🔗

Proof.

List the pairs in the perfect matching one at a time. There are \(2m\) choices for the first stub in the first pair and \(2m - 1\) choices for the second stub in the first pair. After choosing the first pair, there are \(2m - 2\) stubs left, so there are \(2m - 2\) choices for the first stub in the second pair and \(2m - 3\) choices for the second stub in the second pair. Continuing in this way, we see that the total number of ways to list the pairs is

\begin{equation*} (2m) \cdot (2m - 1) \cdot (2m - 2) \cdot (2m - 3) \cdots 3 \cdot 2 \cdot 1 = (2m)! \end{equation*}

However, in this count, each perfect matching is counted multiple times. Specifically, each perfect matching is counted \(2^{m} \cdot (m)!\) times, since:

each of the \(m\) pairs can be listed in two different orders, and
🔗

🔗
the \(m\) pairs themselves can be listed in any order.
🔗

🔗

Therefore, the total number of distinct perfect matchings is

\begin{equation*} \frac{(2m)!}{2^{m} (m)!}\, . \end{equation*}

Some additional algebraic manipulation shows that this expression is equal to \((2m - 1)!!\text{.}\)

🔗

Checkpoint 5.5.10.

Discuss the proof above. Does it make sense? Why or why not?

🔗

We next use this fact to determine the expected number of edges between two vertices in this model.

🔗

Theorem 5.5.11.

Let \((d_1, d_2, \ldots, d_n)\) be a sequence of non-negative integers such that \(2m = d_1 + d_2 + \cdots + d_n\text{.}\) In the configuration model with degree sequence \((d_1, d_2, \ldots, d_n)\text{,}\) the expected number of edges between vertices \(i\) and \(j\) is given by

\begin{equation*} \frac{d_i d_j}{2m - 1}\, . \end{equation*}

🔗

Proof.

We will use indicator functions to compute the expected number of edges between vertices \(i\) and \(j\text{.}\) Let \(X\) be the random variable that counts the number of edges between vertices \(i\) and \(j\text{.}\) We can express \(X\) as the sum of indicator functions:

\begin{equation*} X = \sum_{a=1}^{d_i} \sum_{b=1}^{d_j} X_{a,b} \end{equation*}

where \(X_{a,b}\) is the indicator function that is \(1\) if the \(a\)-th stub of vertex \(i\) is matched with the \(b\)-th stub of vertex \(j\text{,}\) and \(0\) otherwise. By the linearity of expectation, we have:

\begin{equation*} \mathbb{E}[X] = \mathbb{E}\left[\sum_{a=1}^{d_i} \sum_{b=1}^{d_j} X_{a,b}\right] = \sum_{a=1}^{d_i} \sum_{b=1}^{d_j} \mathbb{E}[X_{a,b}]\, . \end{equation*}

Since \(X_{a,b}\) is an indicator function, its expectation is equal to the probability that the \(a\)-th stub of vertex \(i\) is matched with the \(b\)-th stub of vertex \(j\text{.}\) The number of perfect matchings where these two stubs are matched together is equal to the total number of perfect matchings of the remaining \(2m - 2\) stubs. By our preceding result, this number is \((2m - 3)!!\text{.}\) The total number of perfect matchings of all \(2m\) stubs is \((2m - 1)!!\text{.}\) Thus, the probability that the \(a\)-th stub of vertex \(i\) is matched with the \(b\)-th stub of vertex \(j\) is:

\begin{equation*} \mathbb{P}(X_{a,b} = 1) = \frac{(2m - 3)!!}{(2m - 1)!!} = \frac{1}{2m - 1}\, . \end{equation*}

Substituting this back into our expression for the expectation, we get:

\begin{equation*} \mathbb{E}[X] = \sum_{a=1}^{d_i} \sum_{b=1}^{d_j} \frac{1}{2m - 1} = \frac{d_i d_j}{2m - 1}\, . \end{equation*}

🔗

Checkpoint 5.5.12.

Discuss the proof above. Does it make sense? Why or why not?

🔗

We have now achieved all three of our goals for developing the configuration model and computing the expected number of edges between a pair of vertices. Our task in the next section will be to use the expected number of edges computed above to define the modularity of a partition of a network.

🔗

Prev Top Next