{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# NetworKit Sparsification Tutorial" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook covers the NetworKit sparsification module, which provides algorithms to compute edge scores and to sparsify an input graph.\n", "\n", "Sparsification algorithms rely on edge scores, therefore graph edges must be indexed. Call the [indexEdges()](https://networkit.github.io/dev-docs/python_api/networkit.html?highlight=indexedges#networkit.Graph.indexEdges) function to do so. \n", "\n", "Every sparsification algorithm computing edge scores in NetworKit provides a `scores()` function that returns the edge attributes maximum parameter value such that the edge is contained in the sparsified graph.\n", "\n", "The [getSparsifiedGraph(G, parameter, attribute)](https://networkit.github.io/dev-docs/python_api/sparsification.html?highlight=getsparsif#networkit.sparsification.Sparsifier.getSparsifiedGraph) or [getSparsifiedGraphOfSize(G, edgeRatio, attribute)](https://networkit.github.io/dev-docs/python_api/sparsification.html?highlight=getsparsif#networkit.sparsification.Sparsifier.getSparsifiedGraphOfSize) functions return the sparsified graph. `parameter` determines the degree of sparsification, while `edgeRatio` is the target edge ratio of the specified graph. `attribute` is an optional parameter representing a previously calculated edge attribute." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "import networkit as nk" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "G = nk.readGraph(\"../input/jazz.graph\", nk.Format.METIS)\n", "G.indexEdges()\n", "G.numberOfNodes(), G.numberOfEdges()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All sparsification algorithms need an `edgeRatio` parameter. We use the same `edgeRatio` in all our examples." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "targetRatio = 0.2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Forest Fire " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Forest Fire sparsifier implements a variant of the Forest Fire sparsification approach that is based on random walks." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Edge Scores" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [ForestFireScore(G, pf, tebr)]() constructor expects as inputs a graph, the probability `pf` that the neighbor nodes will burn as well, and the target burn ratio which states that forest fire will burn until `tebr * m` edges have been burnt (where `m` is the number of edges of `G`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# Initialize the algorithm\n", "ffs = nk.sparsification.ForestFireScore(G, 0.6, 5.0)\n", "\n", "# Run\n", "ffs.run()\n", "\n", "# Get edge scores\n", "attributes = ffs.scores()\n", "for attribute in attributes[:5]:\n", " print(\"{:.3f}\".format(attribute))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sparsification" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [ForestFireSparsifier(burnProbability, targetBurntRatio)]() constructor expects as inputs the probability `burnProbability` that the neighbor nodes will burn as well, and the target burn ratio which states that forest fire will burn until `targetBurntRatio * m` edges have been burnt." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize the algorithm\n", "fireSparsifier = nk.sparsification.ForestFireSparsifier(0.6, 5.0)\n", "\n", "# Get sparsified graph\n", "fireGraph = fireSparsifier.getSparsifiedGraphOfSize(G, targetRatio)\n", "G.numberOfEdges(), fireGraph.numberOfEdges()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Global Threshold Filter " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Global Threshold Filter calculates a sparsified graph by filtering globally using a constant threshold value and a given edge attribute.\n", "\n", "The [GlobalThresholdFilter(G, attribute, e, above)](https://networkit.github.io/dev-docs/python_api/sparsification.html?highlight=globalth#networkit.sparsification.GlobalThresholdFilter) constructor expects as inputs a graph, a list of edge attributes, a threshold value `e` and a boolean value `above`. If `above` is set to `True`, all edges with an attribute value greater than or equal `e` will be kept in the sparsified graph. The `calculate` method returns the sparsified graph." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sparsification" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# Initialize the algorithm\n", "gtf = nk.sparsification.GlobalThresholdFilter(G, attributes, 0.2, False)\n", "\n", "# Run\n", "newG = gtf.calculate()\n", "G.numberOfEdges(), newG.numberOfEdges()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Local Degree" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The local degree sparsification strategy is based on the idea of hub nodes. For each edge of the graph, it determines the maximum parameter value such that the edge is still contained in the sparsified graph." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Edge Scores" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [LocalDegreeScore(G)](https://networkit.github.io/dev-docs/python_api/sparsification.html?highlight=local%20degree#networkit.sparsification.LocalDegreeScore) constructor expects a graph as input." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# Initialize the algorithm\n", "lds = nk.sparsification.LocalDegreeScore(G)\n", "\n", "# Run\n", "lds.run()\n", "\n", "# Get edge scores\n", "ldsScores = lds.scores()\n", "for score in ldsScores[:5]:\n", " print(\"{:.3f}\".format(score))" ] }, { "cell_type": "markdown", "metadata": { "scrolled": false }, "source": [ "### Sparsification" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# Initialize the algorithm\n", "localDegSparsifier = nk.sparsification.LocalDegreeSparsifier()\n", "\n", "# Get sparsified graph\n", "localDegGraph = localDegSparsifier.getSparsifiedGraphOfSize(G, targetRatio)\n", "G.numberOfEdges(), localDegGraph.numberOfEdges()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Local Similarity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Edge Scores" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [LocalSimilarityScore(G, triangles)](https://networkit.github.io/dev-docs/python_api/sparsification.html?highlight=local#networkit.sparsification.LocalSimilarityScore) constructor expects a graph and previously calculated edge triangle counts of the graph.\n", "\n", "The edge triangles can be computed using the [TriangleEdgeScore(G)](https://networkit.github.io/dev-docs/python_api/sparsification.html?highlight=triangle#networkit.sparsification.TriangleEdgeScore) algorithm." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# Compute triangles in G\n", "e_triangles = nk.sparsification.TriangleEdgeScore(G)\n", "e_triangles.run()\n", "triangles = e_triangles.scores()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Initialize the algorithm\n", "lss = nk.sparsification.LocalSimilarityScore(G, triangles)\n", "\n", "# Run\n", "lss.run()\n", "\n", "# Get edge scores\n", "scores = lss.scores()\n", "for score in scores[:5]:\n", " print(\"{:.3f}\".format(score))" ] }, { "cell_type": "markdown", "metadata": { "scrolled": true }, "source": [ "### Sparsification" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Initialize the algorithm\n", "similaritySparsifier = nk.sparsification.LocalSimilaritySparsifier()\n", "\n", "# Get sparsified graph\n", "similarityGraph = similaritySparsifier.getSparsifiedGraphOfSize(G, targetRatio)\n", "G.numberOfEdges(), similarityGraph.numberOfEdges()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Random Edge Score" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This strategy assigns to each edge a random value in [0,1]." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Edge Scores" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [RandomEdgeScore(G)](https://networkit.github.io/dev-docs/python_api/sparsification.html?highlight=randomedge#networkit.sparsification.RandomEdgeScore) constructor expects a graph as input." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize\n", "res = nk.sparsification.RandomEdgeScore(G)\n", "\n", "# Run\n", "res.run()\n", "\n", "# Get edge scores\n", "randomEdgeScores = res.scores()\n", "for score in randomEdgeScores[:5]:\n", " print(\"{:.3f}\".format(score))" ] }, { "cell_type": "markdown", "metadata": { "scrolled": true }, "source": [ "### Sparsification" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize the algorithm\n", "randomEdgeSparsifier = nk.sparsification.RandomEdgeSparsifier()\n", "\n", "# Get sparsified graph\n", "randomGraph = randomEdgeSparsifier.getSparsifiedGraphOfSize(G, targetRatio)\n", "G.numberOfEdges(), randomGraph.numberOfEdges()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Random Node Edge Score" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This attributizer returns edge attributes where each value is selected uniformly at random from [0,1]." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Edge Scores" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [RandomNodeEdgeScore(G)](https://networkit.github.io/dev-docs/python_api/sparsification.html?highlight=randomnode#networkit.sparsification.RandomNodeEdgeScore) constructor expects a graph as input." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize\n", "rn = nk.sparsification.RandomNodeEdgeScore(G)\n", "\n", "# Run\n", "rn.run()\n", "\n", "# Get edge scores\n", "randomNodeScores = rn.scores()\n", "for score in randomNodeScores[:5]:\n", " print(\"{:.3f}\".format(score))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sparsification" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize the algorithm\n", "randomNodeEdgeSparsifier = nk.sparsification.RandomNodeEdgeSparsifier()\n", "\n", "# Get sparsified graph\n", "randomNodeGraph = randomNodeEdgeSparsifier.getSparsifiedGraphOfSize(G, targetRatio)\n", "G.numberOfEdges(), randomNodeGraph.numberOfEdges()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SCAN Structural Similarity Score" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This algorithm is a Structural Clustering Algorithm for Networks (SCAN) whose goal is to find clusters, hubs, and outliers in large networks." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Edge Scores" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [SCANStructuralSimilarityScore(G, triangles)](https://networkit.github.io/dev-docs/python_api/sparsification.html?highlight=scan#networkit.sparsification.SCANStructuralSimilarityScore) constructor expects as inputs a graph and previously calculated edge triangle counts of the graph." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize the algorithm\n", "scan = nk.sparsification.SCANStructuralSimilarityScore(G, triangles)\n", "\n", "# Run\n", "scan.run()\n", "\n", "# Get edge scores\n", "scanScores = scan.scores()\n", "for score in scanScores[:5]:\n", " print(\"{:.3f}\".format(score))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sparsification" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize\n", "scanSparsifier = nk.sparsification.SCANSparsifier()\n", "\n", "# Get sparsified graph\n", "scanGraph = scanSparsifier.getSparsifiedGraphOfSize(G, targetRatio)\n", "G.numberOfEdges(), scanGraph.numberOfEdges()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simmelian Overlap Score" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is an implementation of the parametric variant of Simmelian Backbones. It calculates for each edge the minimum parameter value such that the edge is still contained in the sparsified graph. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Edge Scores" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [SimmelianOverlapScore(G, triangles, maxRank)](https://networkit.github.io/dev-docs/python_api/sparsification.html?highlight=simmelian#networkit.sparsification.SimmelianOverlapScore) constructor expects as inputs a graph, triangles and the maximum rank that is considered for overlap calculation." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize the algorithm\n", "sos = nk.sparsification.SimmelianOverlapScore(G, triangles, 5)\n", "\n", "# Run\n", "sos.run()\n", "\n", "# Get edge scores\n", "sosScores = sos.scores()\n", "for score in sosScores[:5]:\n", " print(\"{:.3f}\".format(score))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sparsification" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize the algorithm\n", "simmelianSparsifier = nk.sparsification.SimmelianSparsifierNonParametric()\n", "\n", "# Get sparsified graph\n", "simmelieanGraph = simmelianSparsifier.getSparsifiedGraphOfSize(G, targetRatio)\n", "G.numberOfEdges(), simmelieanGraph.numberOfEdges()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.5" } }, "nbformat": 4, "nbformat_minor": 2 }