{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# NetworKit Distance Tutorial" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NetworKit provides several graph traversal and pathfinding algorithms within the `distance` module. This notebook covers most of these algorithms, and shows how to use them." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import networkit as nk" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this tutorial we will use the same graph, and the same source and target node. We will indext the edges of the graph because some algorithms require the edges to be indexed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Read a graph\n", "G = nk.readGraph(\"../input/foodweb-baydry.konect\", nk.Format.KONECT)\n", "GDir = G\n", "G = nk.graphtools.toUndirected(G)\n", "source = 0\n", "target = 27\n", "G.indexEdges()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Algebraic Distance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Algebraic distance assigns a distance value to pairs of nodes according to their structural closeness in the graph. Algebraic distances will become small within dense subgraphs.\n", "\n", "The [AlgebraicDistance(G, numberSystems=10, numberIterations=30, omega=0.5, norm=0, withEdgeScores=False)](https://networkit.github.io/dev-docs/python_api/distance.html?highlight=alg#networkit.distance.AlgebraicDistance) constructor expects a graph followed by the number of systems to use for algebraic iteration and the number of iterations in each system. `omega` is the overrelaxation parameter while `norm` is the norm factor of the extended algebraic distance. Set `withEdgeScores` to true if the array of scores for edges {u,v} that equal ad(u,v) should be calculated." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize algorithm\n", "ad = nk.distance.AlgebraicDistance(G, 10, 100, 1, 1, True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run\n", "ad.preprocess()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The algebraic distance between the source and target node\n", "ad.distance(source, target)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## All-Pairs Shortest-Paths (APSP)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The APSP algorithm computes all pairwise shortest-path distances in a given graph. It is implemented running Dijkstra’s algorithm from each node, or BFS if the graph is unweighted.\n", "\n", "The constructor [APSP(G)](https://networkit.github.io/dev-docs/python_api/distance.html?highlight=apsp#networkit.distance.APSP) expects a graph." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize algorithm\n", "apsp = nk.distance.APSP(G)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run\n", "apsp.run()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The distance from source to target node\n", "print(apsp.getDistance(source, target))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pruned Landmark Labeling\n", "\n", "Pruned Landmark Labeling is an alternative to APSP. It computes distance labels by performing a *pruned* BFS from each node in the graph. Distance labels are then used to quickly compute shortest-path distances between node pairs. This algorithm only works for unweighted graphs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize the algorithm - in case of weighted graphs, edge weights are ignored\n", "pll = nk.distance.PrunedLandmarkLabeling(G)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run - this step computes the distance labels\n", "pll.run()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Retrieve the shortest-path distance\n", "print(pll.query(source, target))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Some-Pairs Shortest-Paths (SPSP)\n", "\n", "SPSP is an alternative to APSP, it computes the shortest-path distances from a set of user-specified source nodes to all the other nodes of the graph.\n", "\n", "The constructor `SPSP(G, sources` takes as input a graph and a list of source nodes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize the algorithm\n", "sources = [0, 1, 2]\n", "\n", "spsp = nk.distance.SPSP(G, sources)\n", "\n", "# Run\n", "spsp.run()\n", "\n", "# Print the distances from the selected sources to the target\n", "for source in sources:\n", " print(\"Distance from {:d} to {:d}: {:.3e}\".format(source, target, spsp.getDistance(source, target)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " A* is an informed search algorithm , as it uses information about path cost and also uses heuristics to find the shortest path.\n", "\n", "The [AStar(G, heu, source, target, storePred=True)](https://networkit.github.io/dev-docs/python_api/distance.html?highlight=astar#networkit.distance.AStar) constructor expects a graph, the source and target nodes as mandatory parameters. The algorithm will also store the predecessors and reconstruct a shortest path from the source and the target if `storePred` is true. `heu` is a list of lower bounds of the distance of each node to the target." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we do not have any prior knowledge about the graph we choose all zeros as a heuristic because zero is always a lower bound of the distance between two nodes. In this case, the A* algorithm is equivalent to Dijkstra." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize algorithm\n", "heuristic = [0 for _ in range(G.upperNodeIdBound())]\n", "astar = nk.distance.AStar(G, heuristic, source, target)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run\n", "astar.run()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The distance from source to target node\n", "print(astar.getDistance())\n", "# The path from source to target node\n", "print(astar.getPath())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Breadth-First Search (BFS) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "BFS is an algorithm for traversing a graph which starts from the source node `u`, and explores all of the u's neighbors nodes at the present depth before moving on to the nodes at the next depth level. BFS finds the shortest paths from a source to all the reachable nodes of an unweighted graph.\n", "\n", "The [BFS(G, source, storePaths=True, storeNodesSortedByDistance=False, target=none)](https://networkit.github.io/dev-docs/python_api/distance.html?highlight=bfs#networkit.distance.BFS) constructor expects a graph and a source node as mandatory parameters. If the paths should be stored, set `storedPaths` to true. If `storeNodesSortedByDistance` is set, a vector of nodes ordered in increasing distance from the source is stored. `target` is the target node." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize algorithm\n", "bfs = nk.distance.BFS(G, source, True, False, target)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run\n", "bfs.run()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The distance from source to target node\n", "print(bfs.distance(target))\n", "# The number of shortest paths between the source node\n", "print(bfs.numberOfPaths(target))\n", "# Returns a shortest path from source to target\n", "print(bfs.getPath(target))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bidirectional BFS" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Bidirectional BFS algorithm explores the graph from both the source and target nodes until the two explorations meet. This version of BFS is more efficient than BFS when the target node is known.\n", "\n", "The [BidirectionalBFS(G, source, target, storePred=True)](https://networkit.github.io/dev-docs/python_api/distance.html?highlight=bidirec#networkit.distance.BidirectionalBFS) constructor expects a graph, the source and target nodes as mandatory parameters. The algorithm will also store the predecessors and reconstruct a shortest path from the source and the target if `storePred` is true. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize algorithm\n", "biBFS = nk.distance.BidirectionalBFS(G, source, target)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run\n", "biBFS.run()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unlike BFS, the `getPath` method does not include the source at the beginning, and the target at the end of the the returned list." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The distance from source to target node\n", "print(biBFS.getHops())\n", "print(biBFS.getPath())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dijkstra " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dijkstra's algorithm finds the shortest path from a source node a target node. This algorithm creates a tree of shortest paths from the source to all other nodes in the graph. Dijkstra's algorithm finds the shortest paths from a source to all the reachable nodes of a weighted graph.\n", "\n", "The [Dijkstra(G, source, storePaths=True, storeNodesSortedByDistance=False, target=none)](https://networkit.github.io/dev-docs/python_api/distance.html?highlight=dij#networkit.distance.Dijkstra) constructor expects a graph and a source node as mandatory parameters. If the paths should be stored, set `storedPaths` to true. If `storeNodesSortedByDistance` is set, a vector of nodes ordered in increasing distance from the source is stored. `target` is the target node." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize algorithm\n", "dijkstra = nk.distance.Dijkstra(G, source, True, False, target)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run\n", "dijkstra.run()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The distance from source to target node\n", "print(dijkstra.distance(target))\n", "# The number of shortest paths between the source node\n", "print(dijkstra.numberOfPaths(target))\n", "# Returns a shortest path from source to target\n", "print(dijkstra.getPath(target))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bidirectional Dijkstra" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Bidirectional Dijkstra algorithm explores the graph from both the source and target nodes until the two explorations meet. This version of Dijkstra is more efficient than the convential Dijkstra when the target node is known.\n", "\n", "The [BidirectionalDijkstra(G, source, target, storePred=True)](https://networkit.github.io/dev-docs/python_api/distance.html?highlight=bidirec#networkit.distance.BidirectionalDijkstra) constructor expects a graph, the source and target nodes as mandatory parameters. The algorithm will also store the predecessors and reconstruct a shortest path from the source and the target if `storePred` is true." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize algorithm\n", "biDij = nk.distance.BidirectionalDijkstra(G, source, target)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run\n", "biDij.run()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unlike Dijkstra, the `getPath` method does not include the source at the beginning, and the target at the end of the the returned list." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The distance from source to target node\n", "print(biDij.getDistance())\n", "# The path from source to target node\n", "print(biDij.getPath())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Commute Time Distance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This class computes the Euclidean Commute Time Distance between each pair of nodes for an undirected unweighted graph.\n", "\n", "The [CommuteTimeDistance(G, tol=0.1)](https://networkit.github.io/dev-docs/python_api/distance.html?highlight=commute#networkit.distance.CommuteTimeDistance) constructor expects a graph as a mandatory parameter. The optional parameter `tol` is the tolerance parameter used for approximation." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize algorithm\n", "ctd = nk.distance.CommuteTimeDistance(G)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run\n", "ctd.run()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The distance from source to target node\n", "print(ctd.distance(source, target))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If one wants to compute the commute time distance between two nodes, then they should use [runSinglePair(u, v)](https://networkit.github.io/dev-docs/python_api/distance.html?highlight=runsingle#networkit.distance.CommuteTimeDistance.runSinglePair) method." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ctd.runSinglePair(source,target)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Diameter " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This algorithm gives an estimation of the diameter of a given graph. The algorithm is based on the ExactSumSweep algorithm presented in Michele Borassi, Pierluigi Crescenzi, Michel Habib, Walter A. Kosters, Andrea Marino, Frank W. Takes: http://www.sciencedirect.com/science/article/pii/S0304397515001644.\n", "\n", "The [Diameter(G, algo=DiameterAlgo.AUTOMATIC, error=1.0, nSamples=0)](https://networkit.github.io/dev-docs/python_api/distance.html?highlight=diameter#networkit.distance.Diameter) constructor expects a graph as mandatory parameter. `algo` specifies the choice of diameter algorithm while `error` is the maximum allowed relative error. Set to 0 for the exact diameter. `nSamples`is the number of samples to be used. `algo` can be chosen between from \n", " 0. automatic\n", " 1. exact\n", " 2. estimatedRange\n", " 3. estimatedSamples\n", " 4. estimatedPedantic\n", "\n", "Note that the input graph must be connected, otherwise the resulting diameter will be infinite. As the graph we are using is not connected, we shall extract the largest connected component from it and then compute the diameter of the resulting graph." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Extract largest connect component\n", "newGraph = nk.components.ConnectedComponents.extractLargestConnectedComponent(G, True)\n", "newGraph.numberOfNodes()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize algorithm to compute the exact diameter of the input graph\n", "diam = nk.distance.Diameter(newGraph,algo=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run\n", "diam.run()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get diameter of graph\n", "diam.getDiameter()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The return value of `getDiameter` is a pair of integers, i.e., the lower bound and upper bound of the diameter. In the case, that we computed the exact diameter, the diameter is the first value of the pair." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Eccentricity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The eccentricity of a node `u` is defined as the distance to the farthest node from node u. In other words, it is the longest shortest-path starting from node `u`.\n", "\n", "The eccentricity of a graph can be computed by calling the [getValue(G, v)]() method, and passing a graph and a node. The method returns the node farthest from v, and the length of the shortest path between `v` and the farthest node." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run\n", "nk.distance.Eccentricity.getValue(G, source)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Effective Diameter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The effective diameter is defined as the number of edges on average to reach a given ratio of all other nodes.\n", "\n", "The [EffectiveDiameter(G, ratio=0.9)](https://networkit.github.io/dev-docs/python_api/distance.html?highlight=effective#networkit.distance.EffectiveDiameter) constructor expects an undirected graph and the ratio of nodes that should be connected. The ratio must be between in the interval (0,1]." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize algorithm\n", "ed = nk.distance.EffectiveDiameter(G)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run\n", "ed.run()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get effective diameter\n", "ed.getEffectiveDiameter()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Effective Diameter Approximation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This class approximates the effective diameter according to the algorithm presented in the \"A Fast and Scalable Tool for Data Mining in Massive Graphs\" by [Palmer, Gibbons and Faloutsos](http://www.cs.cmu.edu/~christos/PUBLICATIONS/kdd02-anf.pdf).\n", "\n", "The [EffectiveDiameter(G, ratio=0.9, k=64, r=7)]() constructor expects an undirected graph, the ratio of nodes that should be connected, the number of parallel approximations `k` to get a more robust results, and the number of bits `r` that should be added to the bitmask. The more bits are added to the bitmask, the higher the accuracy. The ratio must be between in the interval (0,1]." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize algorithm\n", "eda = nk.distance.EffectiveDiameterApproximation(G)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run\n", "eda.run()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get effective diameter\n", "eda.getEffectiveDiameter()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reverse BFS" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This class does a reverse breadth-first search (following the incoming edges of a node) on a directed graph from a given source node.\n", "\n", "The [ReverseBFS(G, source, storePaths=True, storeNodesSortedByDistance=False, target=none)](https://networkit.github.io/dev-docs/python_api/distance.html?highlight=bfs#networkit.distance.BFS) constructor expects a graph and a source node as mandatory parameters. If the paths should be stored, set `storedPaths` to true. If `storeNodesSortedByDistance` is set, a vector of nodes ordered in increasing distance from the source is stored. `target` is the target node." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize algorithm\n", "rbfs = nk.distance.ReverseBFS(G, source, True, False, target)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run\n", "rbfs.run()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The distance from source to target node\n", "print(rbfs.distance(target))\n", "# The number of shortest paths between source and target\n", "print(rbfs.numberOfPaths(target))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }