{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# NetworKit Randomization Tutorial" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The randomization module implements algorithms to perturb existing graphs. This is commonly used to obtain a null-model for hypothesis testing in network analysis (see below for an example). Intuitively one tests whether an observation in original also appears in *similar* graphs. By doing so one can quantify the statistical significance of the observation. The ensemble of *similar* networks is commonly chosen as the set of (simple) graphs that have the same degree sequence. \n", "\n", "Edge Switching is a commonly used Markov Chain approach for this purpose. In every step it selects two edges uniformly at random and exchanges one endpoint of each. While no practical rigorous bounds on the mixing time are known, in literature typically 10 to 100 times the number of edges is used.\n", "\n", "The Curveball algorithms creates a random sample with the same degree sequence, by repeatedly selecting two random nodes and randomly exchanging their neighbourhoods (this is referred to as a trade). This corresponds to a random walk on the space of all graphs in the ensemble. The more trades are carried out, the less correlated input and output are. While there a no practical lower bounds, one typically choose 10 to 100 times the number n of nodes.\n", "\n", "GlobalCurveball is a related algorithm that is typically faster than Curveball. Also it's implementation is more versatile (e.g., it support directed and undirected graphs). In each step it carries out n/2 trades involving all nodes. A typical choice of the number of GlobalTrades is 10 to 100. **We recommend the usage of GlobalCurveball over Curveball for performance reasons**." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import networkit as nk" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Global Curveball" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Global Curveball class is an implementation of EM-GCB proposed in `Parallel and I/O-efficient Randomisation of Massive Networks using Global Curveball Trades\", Carstens et al., ESA 2018`. The algorithm perturbs an input graph, by iteratively randomizing the neighbourhoods of node pairs. For a large number of global trades this process is shown to produce a uniform sample from the set of all graphs with the same degree sequence as the input graph.\n", "\n", "The `GlobalCurveball(G, number_of_global_rounds=20, allowSelfLoops=False, degreePreservingShufflePreprocessing=True)` constructor expecs a graph, followed by the parameter `number_of_global_rounds` which dictactes the number of global rounds to carry out. To accelerate the perturbation process we recommend to set degreePreservingShufflePreprocessing to true. This jump-starts the randomization process and yields faster convergence of the algorithm (see \"Smaller Universes for Uniform Sampling of 0,1-matrices with fixed row and column sums\" * by Annabell Berger, Corrie Jacobien Carstens [https://arxiv.org/abs/1803.02624]).\n", "For directed simple graphs the algorithm will not yield a uniform sample unless the preprocessing is enabled." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Read graph\n", "G = nk.graphio.readGraph(\"../input/karate.graph\", nk.Format.METIS)\n", "print(G.numberOfNodes(), G.numberOfEdges())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize algorithm\n", "globalCurve = nk.randomization.GlobalCurveball(G)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run algorithm\n", "globalCurve.run()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get randomized graph\n", "randomGlobalG = globalCurve.getGraph()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Verify\n", "print(randomGlobalG.numberOfNodes(), randomGlobalG.numberOfEdges())\n", "assert(randomGlobalG.numberOfNodes() == G.numberOfNodes())\n", "for u in range (randomGlobalG.upperNodeIdBound()):\n", " assert(randomGlobalG.degree(u) == G.degree(u))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Application example\n", "\n", "Our hypothesis is that Hyperbolic Graphs (RHGs) have a high clustering. To test the hypothesis\n", "we first obtain an RHG and compute its local clustering coefficient. Then, we compute five random graphs with the same degree sequence and observe that their mean LCC score is much lower.\n", "This gives empirical support towards our hypothesis." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def llcScore(G):\n", " \"\"\"Compute average local clustering coefficient\"\"\"\n", " return sum(nk.centrality.LocalClusteringCoefficient(G).run().scores()) / G.numberOfNodes()\n", "\n", "# Hyperbolic graphs are known to have an above average clustering\n", "G = nk.generators.HyperbolicGenerator(1000, 10).generate()\n", "print(\"Avg. clustering of input: %.3f\" % llcScore(G))\n", "\n", "for i in range(5):\n", " # Take 5 random graphs with the same degree sequence\n", " sampledGraph = nk.randomization.GlobalCurveball(G).run().getGraph()\n", " print(\"Avg. clustering of random sample: %.3f\" % llcScore(sampledGraph))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Curveball" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Curveball class in an implementation of IM-CB proposed in `Parallel and I/O-efficient Randomisation of Massive Networks using Global Curveball Trades\", Carstens et al., ESA 2018`. The algorithm perturbs an undirected and unweighted input graph, by iteratively randomizing the neighbourhoods of node pairs.\n", "\n", "The `Curveball(G)` constructor expects an undirected, unweighted graph. This class does not support the `run()` method; it instead provides the `run(trades)` method. The parameter `trades` is a vector of pairs of nodes. The `run` method can be called multiple times.\n", "\n", "Following is an example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Read graph\n", "G = nk.graphio.readGraph(\"../input/celegans_metabolic.thrill\", nk.Format.ThrillBinary)\n", "print(G.numberOfNodes(), G.numberOfEdges())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize algorithm\n", "curve = nk.randomization.Curveball(G)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `CurveballUniformTradeGenerator(num_trades, num_nodes)` class can be used to generate trades. `num_trades` is the number of trades to generate while `num_nodes` is the number of node indices to draw from. It generates a trade sequence consisting of `num_trades` many single trades. Each trade contains two different node indices drawn uniformly at random from the interval [0, num_nodes)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate trade sequence\n", "utg = nk.randomization.CurveballUniformTradeGenerator(G.numberOfNodes(), G.numberOfNodes())\n", "trades = utg.generate()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run algorithm\n", "curve.run(trades)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get randomized graph\n", "randomCurveG = curve.getGraph()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Verify\n", "print(randomCurveG.numberOfNodes(), randomCurveG.numberOfEdges())\n", "assert(randomCurveG.numberOfNodes() == G.numberOfNodes())\n", "for u in range (randomCurveG.upperNodeIdBound()):\n", " assert(randomCurveG.degree(u) == G.degree(u))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## DegreePreservingShuffle" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "DegreePreservingShuffle is available as a standalone module. It will relabel nodes while keeping their degrees but wont change the topology of the graph. The constructor `DegreePreservingShuffle(G)` expects a directed or undirected graph." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate graph\n", "G = nk.generators.ErdosRenyiGenerator(200, 0.2).generate()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialze algorithm\n", "dps = nk.randomization.DegreePreservingShuffle(G)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run algorithm\n", "dps.run()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "randomG = dps.getGraph()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Verify\n", "for u in range(G.upperNodeIdBound()):\n", " assert(G.degree(u) == randomG.degree(u))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Edge Switching Markov Chain\n", "\n", "Edge Switching Markov Chain [\"The markov chain simulation method for generating connected power law random graphs\", Mihail and Zegura] perturbs simple directed or undirected graphs while preserving the node degrees. In each step, two edges are selected uniformly at random, and two endpoints exchanged. Swaps that introduce multi-edges or self-loops are rejected *without* replacement -- this is necessary to allow uniform sampling [see \"Switching edges to randomize networks: what goes wrong and how to fix it\", Carstens and Horadam]. \n", "\n", "In general, simple edge switching does not yield a uniform distribution on simple **directed** graphs because the orientation of a directed triangles cannot be changed. Using DegreePreservingShuffle as a preprocessing step overcomes this limitation. For convenience this is done by default for any graph to heuristically accelerate mixing." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate graph\n", "G = nk.generators.ErdosRenyiGenerator(200, 0.2).generate()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialze algorithm\n", "es = nk.randomization.EdgeSwitching(G)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run algorithm\n", "es.run()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "randomG = es.getGraph()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Verify\n", "for u in range(G.upperNodeIdBound()):\n", " assert(G.degree(u) == randomG.degree(u))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inplace Edge Switching\n", "\n", "We offer an in-place variant of `nk.randomization.EdgeSwitching` which is more efficient if the original graph is not needed anymore.\n", "The wrapper also takes ownership of the graph, so it's safe to let the original graph name go out of scope.\n", "\n", "In the following we pick up the previous Local Clustering Coefficient example.\n", "This time, however, we are not interested in the average value, but rather in the progress the made as we carry out an increasing number of Markov Chain steps." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nk.generators.HyperbolicGenerator(1000, 10).generate()\n", "alg = nk.randomization.EdgeSwitchingInPlace(G, 0.1)\n", "\n", "for i in range(10):\n", " if i > 0: \n", " # do a few switches\n", " alg.run() # this will update G directly\n", " \n", " score = sum(nk.centrality.LocalClusteringCoefficient(G).run().scores()) / G.numberOfNodes()\n", " print(\"After % 5d switches the llc score is %.3f\" % (500 * i, score))\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.6" } }, "nbformat": 4, "nbformat_minor": 4 }