Chapter 26. Hierarchical random graphs

1. Introduction
2. Representing HRGs
3. Fitting HRGs
4. HRG sampling
5. Conversion to and from igraph graphs
6. Predicting missing edges

1.  Introduction

A hierarchical random graph is an ensemble of undirected graphs with n vertices. It is defined via a binary tree with n leaf and n-1 internal vertices, where the internal vertices are labeled with probabilities. The probability that two vertices are connected in the random graph is given by the probability label at their closest common ancestor.

Please read the following two articles for more about hierarchical random graphs: A. Clauset, C. Moore, and M.E.J. Newman. Hierarchical structure and the prediction of missing links in networks. Nature 453, 98 - 101 (2008); and A. Clauset, C. Moore, and M.E.J. Newman. Structural Inference of Hierarchies in Networks. In E. M. Airoldi et al. (Eds.): ICML 2006 Ws, Lecture Notes in Computer Science 4503, 1-13. Springer-Verlag, Berlin Heidelberg (2007).

igraph contains functions for fitting HRG models to a given network (igraph_hrg_fit), for generating networks from a given HRG ensemble (igraph_hrg_game, igraph_hrg_sample), converting an igraph graph to a HRG and back (igraph_hrg_create, igraph_hrg_dendrogram), for calculating a consensus tree from a set of sampled HRGs (igraph_hrg_consensus) and for predicting missing edges in a network based on its HRG models (igraph_hrg_predict).

The igraph HRG implementation is heavily based on the code published by Aaron Clauset, at his website, http://tuvalu.santafe.edu/~aaronc/hierarchy/

2. Representing HRGs

2.1. igraph_hrg_t — Data structure to store a hierarchical random graph

typedef struct igraph_hrg_t {
    igraph_vector_t left, right, prob, edges, vertices;
} igraph_hrg_t;

A hierarchical random graph (HRG) can be given as a binary tree, where the internal vertices are labeled with real numbers.

Note that you don't necessarily have to know this internal representation for using the HRG functions, just pass the HRG objects created by one igraph function, to another igraph function.

It has the following members:

Values: 

left:

Vector that contains the left children of the internal tree vertices. The first vertex is always the root vertex, so the first element of the vector is the left child of the root vertex. Internal vertices are denoted with negative numbers, starting from -1 and going down, i.e. the root vertex is -1. Leaf vertices are denoted by non-negative number, starting from zero and up.

right:

Vector that contains the right children of the vertices, with the same encoding as the left vector.

prob:

The connection probabilities attached to the internal vertices, the first number belongs to the root vertex (i.e. internal vertex -1), the second to internal vertex -2, etc.

edges:

The number of edges in the subtree below the given internal vertex.

vertices:

The number of vertices in the subtree below the given internal vertex, including itself.

2.2. igraph_hrg_init — Allocate memory for a HRG.

int igraph_hrg_init(igraph_hrg_t *hrg, int n);

This function must be called before passing an igraph_hrg_t to an igraph function.

Arguments: 

hrg:

Pointer to the HRG data structure to initialize.

n:

The number of vertices in the graph that is modeled by this HRG. It can be zero, if this is not yet known.

Returns: 

Error code.

Time complexity: O(n), the number of vertices in the graph.

2.3. igraph_hrg_destroy — Deallocate memory for an HRG.

void igraph_hrg_destroy(igraph_hrg_t *hrg);

The HRG data structure can be reinitialized again with an igraph_hrg_destroy call.

Arguments: 

hrg:

Pointer to the HRG data structure to deallocate.

Time complexity: operating system dependent.

2.4. igraph_hrg_size — Returns the size of the HRG, the number of leaf nodes.

int igraph_hrg_size(const igraph_hrg_t *hrg);

Arguments: 

hrg:

Pointer to the HRG.

Returns: 

The number of leaf nodes in the HRG.

Time complexity: O(1).

2.5. igraph_hrg_resize — Resize a HRG.

int igraph_hrg_resize(igraph_hrg_t *hrg, int newsize);

Arguments: 

hrg:

Pointer to an initialized (see igraph_hrg_init) HRG.

newsize:

The new size, i.e. the number of leaf nodes.

Returns: 

Error code.

Time complexity: O(n), n is the new size.

3. Fitting HRGs

3.1. igraph_hrg_fit — Fit a hierarchical random graph model to a network

int igraph_hrg_fit(const igraph_t *graph,
                   igraph_hrg_t *hrg,
                   igraph_bool_t start,
                   int steps);

Arguments: 

graph:

The igraph graph to fit the model to. Edge directions are ignored in directed graphs.

hrg:

Pointer to an initialized HRG, the result of the fitting is stored here. It can also be used to pass a HRG to the function, that can be used as the starting point of the Markov Chain Monte Carlo fitting, if the start argument is true.

start:

Logical, whether to start the fitting from the given HRG.

steps:

Integer, the number of MCMC steps to take in the fitting procedure. If this is zero, then the fitting stop is a convergence criteria is fulfilled.

Returns: 

Error code.

Time complexity: TODO.

3.2. igraph_hrg_consensus — Calculate a consensus tree for a HRG.

int igraph_hrg_consensus(const igraph_t *graph,
                         igraph_vector_t *parents,
                         igraph_vector_t *weights,
                         igraph_hrg_t *hrg,
                         igraph_bool_t start,
                         int num_samples);

The calculation can be started from the given HRG (hrg), or (if start is false), a HRG is first fitted to the given graph.

Arguments: 

graph:

The input graph.

parents:

An initialized vector, the results are stored here. For each vertex, the id of its parent vertex is stored, or -1, if the vertex is the root vertex in the tree. The first n vertex ids (from 0) refer to the original vertices of the graph, the other ids refer to vertex groups.

weights:

Numeric vector, counts the number of times a given tree split occured in the generated network samples, for each internal vertices. The order is the same as in parents.

hrg:

A hierarchical random graph. It is used as a starting point for the sampling, if the start argument is true. It is modified along the MCMC.

start:

Logical, whether to use the supplied HRG (in hrg) as a starting point for the MCMC.

num_samples:

The number of samples to generate for creating the consensus tree.

Returns: 

Error code.

Time complexity: TODO.

4. HRG sampling

4.1. igraph_hrg_sample — Sample from a hierarchical random graph model

int igraph_hrg_sample(const igraph_t *input_graph,
                      igraph_t *sample,
                      igraph_vector_ptr_t *samples,
                      igraph_integer_t no_samples,
                      igraph_hrg_t *hrg,
                      igraph_bool_t start);

Sample from a hierarchical random graph ensemble. The ensemble can be given as a graph (input_graph), or as a HRG object (hrg). If a graph is given, then first an MCMC optimization is performed to find the optimal fitting model; then the MCMC is used to sample the graph(s).

Arguments: 

input_graph:

An igraph graph, or a null pointer. If not a null pointer, then a HRG is first fitted to the graph, possibly starting from the given HRG, if the start argument is true. If is is a null pointer, then the given HRG is used as a starting point, to find the optimum of the Markov chain, before the sampling.

sample:

Pointer to an uninitialized graph, or a null pointer. If only one sample is requested, and it is not a null pointer, then the sample is stored here.

samples:

An initialized vector of pointers. If more than one samples are requested, then they are stored here. Note that to free this data structure, you need to call igraph_destroy() on each graph first, then igraph_free() on all pointers, and finally igraph_vector_ptr_destroy.

no_samples:

The number of samples to generate.

hrg:

A HRG. It is modified during the sampling.

start:

Logical, whether to start the MCMC from the given HRG.

Returns: 

Error code.

Time complexity: TODO.

4.2. igraph_hrg_game — Generate a hierarchical random graph

int igraph_hrg_game(igraph_t *graph,
                    const igraph_hrg_t *hrg);

This function is a simple shortcut to igraph_hrg_sample. It creates a single graph, from the given HRG.

Arguments: 

graph:

Pointer to an uninitialized graph, the new graph is created here.

hrg:

The hierarchical random graph model to sample from. It is modified during the MCMC process.

Returns: 

Error code.

Time complexity: TODO.

5. Conversion to and from igraph graphs

5.1. igraph_hrg_dendrogram — Create a dendrogram from a hierarchical random graph.

int igraph_hrg_dendrogram(igraph_t *graph,
                          const igraph_hrg_t *hrg);

Creates the igraph graph equivalent of an igraph_hrg_t data structure.

Arguments: 

graph:

Pointer to an uninitialized graph, the result is stored here.

hrg:

The hierarchical random graph to convert.

Returns: 

Error code.

Time complexity: O(n), the number of vertices in the graph.

5.2. igraph_hrg_create — Create a HRG from an igraph graph.

int igraph_hrg_create(igraph_hrg_t *hrg,
                      const igraph_t *graph,
                      const igraph_vector_t *prob);

Arguments: 

hrg:

Pointer to an initialized igraph_hrg_t. The result is stored here.

graph:

The igraph graph to convert. It must be a directed binary tree, with n-1 internal and n leaf vertices. The root vertex must have in-degree zero.

prob:

The vector of probabilities, this is used to label the internal nodes of the hierarchical random graph. The values corresponding to the leaves are ignored.

Returns: 

Error code.

Time complexity: O(n), the number of vertices in the tree.

6. Predicting missing edges

6.1. igraph_hrg_predict — Predict missing edges in a graph, based on HRG models

int igraph_hrg_predict(const igraph_t *graph,
                       igraph_vector_t *edges,
                       igraph_vector_t *prob,
                       igraph_hrg_t *hrg,
                       igraph_bool_t start,
                       int num_samples,
                       int num_bins);

Samples HRG models for a network, and estimated the probability that an edge was falsely observed as non-existent in the network.

Arguments: 

graph:

The input graph.

edges:

The list of missing edges is stored here, the first two elements are the first edge, the next two the second edge, etc.

prob:

Vector of probabilies for the existence of missing edges, in the order corresponding to edges.

hrg:

A HRG, it is used as a starting point if start is true. It is also modified during the MCMC sampling.

start:

Logical, whether to start the MCMC from the given HRG.

num_samples:

The number of samples to generate.

num_bins:

Controls the resolution of the edge probabilities. Higher numbers result higher resolution.

Returns: 

Error code.

Time complexity: TODO.