Chapter 19. Graph motifs, dyad census and triad census

1. igraph_dyad_census — Calculating the dyad census as defined by Holland and Leinhardt.
2. igraph_triad_census — Triad census, as defined by Davis and Leinhardt
3. Finding triangles
4. Graph motifs

This section deals with functions which find small induced subgraphs in a graph. These were first defined for subgraphs of two and three vertices by Holland and Leinhardt, and named dyad census and triad census.

1. igraph_dyad_census — Calculating the dyad census as defined by Holland and Leinhardt.

int igraph_dyad_census(const igraph_t *graph, igraph_integer_t *mut,
                       igraph_integer_t *asym, igraph_integer_t *null);

Dyad census means classifying each pair of vertices of a directed graph into three categories: mutual (there is at least one edge from a to b and also from b to a); asymmetric (there is at least one edge either from a to b or from b to a, but not the other way) and null (no edges between a and b in either direction).

Holland, P.W. and Leinhardt, S. (1970). A Method for Detecting Structure in Sociometric Data. American Journal of Sociology, 70, 492-513.

Arguments: 

graph:

The input graph. For an undirected graph, there are no asymmetric connections.

mut:

Pointer to an integer, the number of mutual dyads is stored here.

asym:

Pointer to an integer, the number of asymmetric dyads is stored here.

null:

Pointer to an integer, the number of null dyads is stored here. In case of an integer overflow (i.e. too many null dyads), -1 will be returned.

Returns: 

Error code.

See also: 

Time complexity: O(|V|+|E|), the number of vertices plus the number of edges.

2. igraph_triad_census — Triad census, as defined by Davis and Leinhardt

int igraph_triad_census(const igraph_t *graph, igraph_vector_t *res);

Calculating the triad census means classifying every triple of vertices in a directed graph. A triple can be in one of 16 states:

003

A, B, C, the empty graph.

012

A->B, C, a graph with a single directed edge.

102

A<->B, C, a graph with a mutual connection between two vertices.

021D

A<-B->C, the binary out-tree.

021U

A->B<-C, the binary in-tree.

021C

A->B->C, the directed line.

111D

A<->B<-C.

111U

A<->B->C.

030T

A->B<-C, A->C.

030C

A<-B<-C, A->C.

201

A<->B<->C.

120D

A<-B->C, A<->C.

120U

A->B<-C, A<->C.

120C

A->B->C, A<->C.

210

A->B<->C, A<->C.

300

A<->B<->C, A<->C, the complete graph.

See also Davis, J.A. and Leinhardt, S. (1972). The Structure of Positive Interpersonal Relations in Small Groups. In J. Berger (Ed.), Sociological Theories in Progress, Volume 2, 218-251. Boston: Houghton Mifflin.

This function calls igraph_motifs_randesu() which is an implementation of the FANMOD motif finder tool, see igraph_motifs_randesu() for details. Note that the order of the triads is not the same for igraph_triad_census() and igraph_motifs_randesu().

Arguments: 

graph:

The input graph. A warning is given for undirected graphs, as the result is undefined for those.

res:

Pointer to an initialized vector, the result is stored here in the same order as given in the list above. Note that this order is different than the one used by igraph_motifs_randesu().

Returns: 

Error code.

See also: 

Time complexity: TODO.

3. Finding triangles

3.1. igraph_adjacent_triangles — Count the number of triangles a vertex is part of.

int igraph_adjacent_triangles(const igraph_t *graph,
                              igraph_vector_t *res,
                              const igraph_vs_t vids);

Arguments: 

graph:

The input graph. Edge directions and multiplicities are ignored.

res:

Initiliazed vector, the results are stored here.

vids:

The vertices to perform the calculation for.

Returns: 

Error mode.

See also: 

igraph_list_triangles() to list them.

Time complexity: O(d^2 n), d is the average vertex degree of the queried vertices, n is their number.

3.2. igraph_list_triangles — Find all triangles in a graph.

int igraph_list_triangles(const igraph_t *graph,
                          igraph_vector_int_t *res);

Arguments: 

graph:

The input graph, edge directions are ignored. Multiple edges are ignored.

res:

Pointer to an initialized integer vector, the result is stored here, in a long list of triples of vertex ids. Each triple is a triangle in the graph. Each triangle is listed exactly once.

Returns: 

Error code.

See also: 

igraph_transitivity_undirected() to count the triangles, igraph_adjacent_triangles() to count the triangles a vertex participates in.

Time complexity: O(d^2 n), d is the average degree, n is the number of vertices.

4. Graph motifs

4.1. igraph_motifs_randesu — Count the number of motifs in a graph.

int igraph_motifs_randesu(const igraph_t *graph, igraph_vector_t *hist,
                          int size, const igraph_vector_t *cut_prob);

Motifs are small weakly connected induced subgraphs of a given structure in a graph. It is argued that the motif profile (i.e. the number of different motifs in the graph) is characteristic for different types of networks and network function is related to the motifs in the graph.

This function is able to find directed motifs of sizes three and four and undirected motifs of sizes three to six (i.e. the number of different subgraphs with three to six vertices in the network).

In a big network the total number of motifs can be very large, so it takes a lot of time to find all of them, a sampling method can be used. This function is capable of doing sampling via the cut_prob argument. This argument gives the probability that a branch of the motif search tree will not be explored. See S. Wernicke and F. Rasche: FANMOD: a tool for fast network motif detection, Bioinformatics 22(9), 1152--1153, 2006 for details. https://doi.org/10.1093/bioinformatics/btl038

Set the cut_prob argument to a zero vector for finding all motifs.

Directed motifs will be counted in directed graphs and undirected motifs in undirected graphs.

Arguments: 

graph:

The graph to find the motifs in.

hist:

The result of the computation, it gives the number of motifs found for each isomorphism class. See igraph_isoclass() for help about isomorphism classes. Note that this function does not count isomorphism classes that are not connected and will report NaN (more precisely IGRAPH_NAN) for them.

size:

The size of the motifs to search for. For directed graphs, only 3 and 4 are implemented, for undirected, 3 to 6. The limitation is not in the motif finding code, but the graph isomorphism code.

cut_prob:

Vector of probabilities for cutting the search tree at a given level. The first element is the first level, etc. Supply all zeros here (of length size) to find all motifs in a graph.

Returns: 

Error code.

See also: 

igraph_motifs_randesu_estimate() for estimating the number of motifs in a graph, this can help to set the cut_prob parameter; igraph_motifs_randesu_no() to calculate the total number of motifs of a given size in a graph; igraph_motifs_randesu_callback() for calling a callback function for every motif found; igraph_subisomorphic_lad() for finding subgraphs on more than 4 (directed) or 6 (undirected) vertices.

Time complexity: TODO.

Example 19.1.  File examples/simple/igraph_motifs_randesu.c

#include <igraph.h>

/* This is a callback function suitable for use with igraph_motifs_randesu_callback().
 * It prints each motif it is calld with. */
igraph_bool_t print_motif(const igraph_t *graph, igraph_vector_t *vids,
                          int isoclass, void* extra) {
    printf("Found isoclass %2d:  ", isoclass);
    igraph_vector_print(vids);
    return 0; /* Return 'false': do not interrupt the search. */
}

int main() {

    igraph_t graph;
    igraph_vector_t hist;
    igraph_real_t zeros[] = { 0.0, 0.0, 0.0, 0.0, 0.0 };
    igraph_vector_t cut_prob;

    /* Compute the 4-motif distritbuion in Zachary's karate club network. */

    igraph_famous(&graph, "Zachary");
    igraph_vector_init(&hist, 0);

    igraph_motifs_randesu(&graph, &hist, 4, igraph_vector_view(&cut_prob, zeros, 4));

    /* Compute the total number of motifs (connected 4-vertex subgraphs)
     * so that we can print the normalized distribution. */
    igraph_real_t sum = 0.0;
    igraph_integer_t n = igraph_vector_size(&hist);
    for (igraph_integer_t i=0; i < n; i++) {
        if (!igraph_is_nan(VECTOR(hist)[i])) {
            sum += VECTOR(hist)[i];
        }
    }
    printf("4-motif distribution:\n");
    for (igraph_integer_t i=0; i < n; i++) {
        /* Print NaN values in a platform-independent manner: */
        igraph_real_printf(VECTOR(hist)[i] / sum);
        printf(" ");
    }
    printf("\n\n");

    igraph_vector_destroy(&hist);
    igraph_destroy(&graph);

    /* Identify the vertices of each three-motif in a small Kautz graph. */

    igraph_kautz(&graph, 2, 1);
    igraph_motifs_randesu_callback(&graph, 3, igraph_vector_view(&cut_prob, zeros, 3), &print_motif, NULL);
    igraph_destroy(&graph);

    return 0;
}


4.2. igraph_motifs_randesu_no — Count the total number of motifs in a graph.

int igraph_motifs_randesu_no(const igraph_t *graph, igraph_integer_t *no,
                             int size, const igraph_vector_t *cut_prob);

This function counts the total number of motifs in a graph, i.e. the number of of (weakly) connected triplets or quadruplets, without assigning isomorphism classes to them.

Arguments: 

graph:

The graph object to study.

no:

Pointer to an integer type, the result will be stored here.

size:

The size of the motifs to count.

cut_prob:

Vector giving the probabilities that a branch of the search tree will be cut at a given level.

Returns: 

Error code.

See also: 

Time complexity: TODO.

4.3. igraph_motifs_randesu_estimate — Estimate the total number of motifs in a graph.

int igraph_motifs_randesu_estimate(const igraph_t *graph, igraph_integer_t *est,
                                   int size, const igraph_vector_t *cut_prob,
                                   igraph_integer_t sample_size,
                                   const igraph_vector_t *parsample);

This function estimates the total number of weakly connected induced subgraphs, called motifs, of a fixed number of vertices. For example, an undirected complete graph on n vertices will have one motif of size n, and n motifs of size n - 1. As another example, one triangle and a separate vertex will have zero motifs of size four.

This function is useful for large graphs for which it is not feasible to count all the different motifs, because there are very many of them.

The total number of motifs is estimated by taking a sample of vertices and counts all motifs in which these vertices are included. (There is also a cut_prob parameter which gives the probabilities to cut a branch of the search tree.)

Directed motifs will be counted in directed graphs and undirected motifs in undirected graphs.

Arguments: 

graph:

The graph object to study.

est:

Pointer to an integer type, the result will be stored here.

size:

The size of the motifs to look for.

cut_prob:

Vector giving the probabilities to cut a branch of the search tree and omit counting the motifs in that branch. It contains a probability for each level. Supply size zeros here to count all the motifs in the sample.

sample_size:

The number of vertices to use as the sample. This parameter is only used if the parsample argument is a null pointer.

parsample:

Either pointer to an initialized vector or a null pointer. If a vector then the vertex ids in the vector are used as a sample. If a null pointer then the sample_size argument is used to create a sample of vertices drawn with uniform probability.

Returns: 

Error code.

See also: 

Time complexity: TODO.

4.4. igraph_motifs_randesu_callback — Finds motifs in a graph and calls a function for each of them.

int igraph_motifs_randesu_callback(const igraph_t *graph, int size,
                                   const igraph_vector_t *cut_prob, igraph_motifs_handler_t *callback,
                                   void* extra);

Similarly to igraph_motifs_randesu(), this function is able to find directed motifs of sizes three and four and undirected motifs of sizes three to six (i.e. the number of different subgraphs with three to six vertices in the network). However, instead of counting them, the function will call a callback function for each motif found to allow further tests or post-processing.

The cut_prob argument also allows sampling the motifs, just like for igraph_motifs_randesu(). Set the cut_prob argument to a zero vector for finding all motifs.

Arguments: 

graph:

The graph to find the motifs in.

size:

The size of the motifs to search for. Only three and four are implemented currently. The limitation is not in the motif finding code, but the graph isomorphism code.

cut_prob:

Vector of probabilities for cutting the search tree at a given level. The first element is the first level, etc. Supply all zeros here (of length size) to find all motifs in a graph.

callback:

A pointer to a function of type igraph_motifs_handler_t. This function will be called whenever a new motif is found.

extra:

Extra argument to pass to the callback function.

Returns: 

Error code.

Time complexity: TODO.

Example 19.2.  File examples/simple/igraph_motifs_randesu.c

#include <igraph.h>

/* This is a callback function suitable for use with igraph_motifs_randesu_callback().
 * It prints each motif it is calld with. */
igraph_bool_t print_motif(const igraph_t *graph, igraph_vector_t *vids,
                          int isoclass, void* extra) {
    printf("Found isoclass %2d:  ", isoclass);
    igraph_vector_print(vids);
    return 0; /* Return 'false': do not interrupt the search. */
}

int main() {

    igraph_t graph;
    igraph_vector_t hist;
    igraph_real_t zeros[] = { 0.0, 0.0, 0.0, 0.0, 0.0 };
    igraph_vector_t cut_prob;

    /* Compute the 4-motif distritbuion in Zachary's karate club network. */

    igraph_famous(&graph, "Zachary");
    igraph_vector_init(&hist, 0);

    igraph_motifs_randesu(&graph, &hist, 4, igraph_vector_view(&cut_prob, zeros, 4));

    /* Compute the total number of motifs (connected 4-vertex subgraphs)
     * so that we can print the normalized distribution. */
    igraph_real_t sum = 0.0;
    igraph_integer_t n = igraph_vector_size(&hist);
    for (igraph_integer_t i=0; i < n; i++) {
        if (!igraph_is_nan(VECTOR(hist)[i])) {
            sum += VECTOR(hist)[i];
        }
    }
    printf("4-motif distribution:\n");
    for (igraph_integer_t i=0; i < n; i++) {
        /* Print NaN values in a platform-independent manner: */
        igraph_real_printf(VECTOR(hist)[i] / sum);
        printf(" ");
    }
    printf("\n\n");

    igraph_vector_destroy(&hist);
    igraph_destroy(&graph);

    /* Identify the vertices of each three-motif in a small Kautz graph. */

    igraph_kautz(&graph, 2, 1);
    igraph_motifs_randesu_callback(&graph, 3, igraph_vector_view(&cut_prob, zeros, 3), &print_motif, NULL);
    igraph_destroy(&graph);

    return 0;
}


4.5. igraph_motifs_handler_t — Callback type for igraph_motifs_randesu_callback

typedef igraph_bool_t igraph_motifs_handler_t(const igraph_t *graph,
        igraph_vector_t *vids,
        int isoclass,
        void* extra);

igraph_motifs_randesu_callback() calls a specified callback function whenever a new motif is found during a motif search. This callback function must be of type igraph_motifs_handler_t. It has the following arguments:

Arguments: 

graph:

The graph that that algorithm is working on. Of course this must not be modified.

vids:

The IDs of the vertices in the motif that has just been found. This vector is owned by the motif search algorithm, so do not modify or destroy it; make a copy of it if you need it later.

isoclass:

The isomorphism class of the motif that has just been found. Use igraph_isoclass or igraph_isoclass_subgraph to find out which isomorphism class belongs to a given motif.

extra:

The extra argument that was passed to igraph_motifs_randesu_callback().

Returns: 

A logical value, if TRUE (=non-zero), that is interpreted as a request to stop the motif search and return to the caller.

See also: