Making Connections: Networks {#Networks}

Sanjiv R. Das

Networks are beautiful

Small Worlds

http://academic.research.microsoft.com

Graphs

What is a graph? It is a picture of a network, a diagram consisting of relationships between entities. We call the entities as vertices or nodes (set $V$) and the relationships are called the edges of a graph (set $E$). Hence a graph $G$ is defined as

\begin{equation} G = (V,E) \end{equation}

Types of graphs

If the edges $e \in E$ of a graph are not tipped with arrows implying some direction or causality, we call the graph an "undirected" graph. If there are arrows of direction then the graph is a "directed" graph.

If the connections (edges) between vertices $v \in V$ have weights on them, then we call the graph a "weighted graph" else it's "unweighted". In an unweighted graph, for any pair of vertices $(u,v)$, we have

\begin{equation} w(u,v) = \left\{ \begin{array}{ll} w(u,v) = 1, & \mbox{ if } (u,v) \in E \\ w(u,v) = 0, & \mbox{ if } (u,v) \ni E \end{array} \right. \end{equation}

In a weighted graph the value of $w(u,v)$ is unrestricted, and can also be negative.

Directed graphs can be cyclic or acyclic. In a cyclic graph there is a path from a source node that leads back to the node itself. Not so in an acyclic graph. The term dag is used to connote a "directed acyclic graph". The binomial option pricing model in finance that you have learnt is an example of a dag.

Adjacency Matrix

A graph may be represented by its adjacency matrix. This is simply the matrix $A = \{w(u,v)\}, \forall u,v$. You can take the transpose of this matrix as well, which in the case of a directed graph will simply reverse the direction of all edges.

While it was long believed that the majority of networks are scale-free, recent evidence from a large-scale study shows that this is not necessarily the case, see @BroidoClauset. Of course, the debate rages on.^[See: https://www.quantamagazine.org/scant-evidence-of-power-laws-found-in-real-world-networks-20180215/.]

And, a 53-Year-Old Network Coloring Conjecture Is Disproved: https://www.quantamagazine.org/mathematician-disproves-hedetniemis-graph-theory-conjecture-20190617/

igraph package

Graph Attributes

Dijkstra's Shortest Paths Algorithm

This is one of the most well-known algorithms in theoretical computer science. Given a source vertex on a weighted, directed graph, it finds the shortest path to all other nodes from source $s$. The weight between two vertices is denoted $w(u,v)$ as before. Dijkstra's algorithm works for graphs where $w(u,v) \geq 0$. For negative weights, there is the Bellman-Ford algorithm.

For an excellent exposition of the algorithm in copious detail, see: https://medium.com/basecs/finding-the-shortest-path-with-a-little-help-from-dijkstra-613149fbdc8e

The algorithm is as follows.

D3 plots

D3 is a well known framework for plotting spring graphs. The following plot shows how one may use javascript in R, using the html widgets framework. See: http://www.htmlwidgets.org/

Centrality

Centrality is a property of vertices in the network. Given the adjacency matrix $A=\{w(u,v)\}$, we can obtain a measure of the "influence" of all vertices in the network. Let $x_i$ be the influence of vertex $i$. Then the column vector $x$ contains the influence of each vertex. What is influence? Think of a web page. It has more influence the more links it has both, to the page, and from the page to other pages. Or think of a alumni network. People with more connections have more influence, they are more "central".

It is possible that you might have no connections yourself, but are connected to people with great connections. In this case, you do have influence. Hence, your influence depends on your own influence and that which you derive through others. Hence, the entire system of influence is interdependent, and can be written as the following matrix equation

\begin{equation} x = A\;x \end{equation}

Now, we can just add a scalar here to this to get

\begin{equation} \xi \; x = A x \end{equation}

an eigensystem. Decompose this to get the principle eigenvector, and its values give you the influence of each member. In this way you can find the most influential people in any network. There are several applications of this idea to real data. This is eigenvector centrality is exactly what Google trademarked as PageRank, even though they did not invent eigenvector centrality.

Betweenness

Another concept of centrality is known as "betweenness". Betweenness centrality of a node $v$ is the sum of the fraction of all-pairs shortest paths that pass through $v$. This is the proportion of shortest paths that go through a node relative to all paths that go through the same node, summed over all start and end nodes for the paths passing through $v$. This may be expressed as

\begin{equation} B(v) = \sum_{a \neq v \neq b} \left[\frac{n_{a,b}(v)}{n_{a,b}}\right] \end{equation}

where $n_{a,b}$ is the number of shortest paths from node $a$ to node $b$, and $n_{a,b}(v)$ are the number of those paths that traverse through vertex $v$. Here is an example from an earlier directed graph.

See: https://neo4j.com/docs/graph-algorithms/current/algorithms/betweenness-centrality/

References:

Communities

Community detection methods partition nodes into clusters that tend to interact together. It is useful to point out the considerable flexibility and realism built into the definition of our community clusters. We do not require all nodes to belong to communities. Nor do we fix the number of communities that may exist at a time, and we also allow each community to have different size.

With this flexibility, the key computational challenge is to find the "best" partition because the number of possible partitions of the nodes is extremely large. Community detection methods attempt to determine a set of clusters that are internally tight-knit. Mathematically, this is equivalent to finding a partition of clusters to maximize the observed number of connections between cluster members minus what is expected conditional on the connections within the cluster, aggregated across all clusters. More formally, we choose partitions with high modularity $Q$, where

\begin{equation} Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{d_i \times d_j}{2m} \right] \cdot \delta(i,j) \end{equation}

$A_{ij}$ is the $(i,j)$-th entry in the adjacency matrix, i.e., the number of connections in which $i$ and $j$ jointly participated, $d_i=\sum_j A_{ij}$ is the total number of transactions that node $i$ participated in (or, the degree of $i$) and $m = \frac{1}{2} \sum_{ij} A_{ij}$ is the sum of all edge weights in matrix $A$.

The function $\delta(i,j)$ is an indicator equal to 1.0 if nodes $i$ and $j$ are from the same community, and zero otherwise. $Q$ is bounded in [-1, +1]. If $Q > 0$, intra-community connections exceed the expected number given deal flow.

Consider a network of five nodes $\{A,B,C,D,E\}$, where the edge weights are as follows: $A:B=6$, $A:C=5$, $B:C=2$, $C:D=2$, and $D:E=10$. Assume that a community detection algorithm assigns $\{A,B,C\}$ to one community and $\{D,E\}$ to another, i.e., only two communities. The adjacency matrix for this graph is given by matrix $A$ below.

Using NetworkX in Python

An alternate to the igraph package is the networkx package, more popular among python users.

https://networkx.github.io/

Undirected graphs

Directed Graphs

Degree, Eigenvalue, Betweenness Centrality

Financial Applications

Risk Networks

Example

Overall Risk Score

Risk Decomposition

Centrality

Risk Increment

Criticality

Cross Risk

Risk Scaling: Spillovers

Risk Scaling with Increased Connectivity

Too Big To Fail

The change in risk score ${S}$ as the number of nodes increases, while keeping the average number of connections between nodes constant. This mimics the case where banks are divided into smaller banks, each of which then contains part of the transacting volume of the previous bank. The plot shows how the risk score increases as the number of nodes increases from 10 to 100, while expected number of total edges in the network remains the same. A compromise vector is also generated with equally likely values $\{0,1,2\}$. This is repeated 5000 times for each fixed number of nodes and the mean risk score across 5000 simulations.

Systemic Risk in Indian Banks

Systemic Risk Portals

http://www.systemic-risk.org/

http://www.systemic-risk-hub.org/risk_centers.php

Shiny application

The example above may also be embedded in a shiny application for which the code is provided below. The screen will appear as follows.

The files below also require the data file systemicR.csv or an upload.