{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Link Prediction with NetworKit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Link prediction is concerned with estimating the probability of the existence of edges between nodes in a graph. The `linkprediction` module in NetworKit provides sampling algorithms as well link prediction algorithms.\n", "\n", "This notebook introduces a several link prediction algorithms available in NetworKit. It shows how to calculate link prediction measures, and how to use the sampling algorithms in combination with link prediction algorithms." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import networkit as nk" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Link prediction algorithms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adamic/Adar Index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Adamic/Adar Index predicts links in a graph according to the amount of shared links between two nodes. The index sums up the reciprocals of the logarithm of the degree of all common neighbors between two nodes `u` and `v`.\n", "\n", "The constructor, [AdamicAdarIndex(G)](https://networkit.github.io/dev-docs/python_api/linkprediction.html?highlight=adamic#networkit.linkprediction.AdamicAdarIndex) expects a graph as input. The `run(u, v)` method takes a pair of nodes `(u, v)` as input and returns the Adamic/Adar Index of the given pair of nodes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Read graph\n", "G = nk.graphio.readGraph(\"../input/karate.graph\", nk.Format.METIS)\n", "\n", "# Initialize algorithm\n", "aai = nk.linkprediction.AdamicAdarIndex(G)\n", "\n", "# Get Adamic/Adar Index of two nodes\n", "aai.run(14, 32)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Algebraic Distance Index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Algebraic Distance Index assigns a distance value to pairs of nodes according to their structural closeness in the graph.\n", "\n", "The constructor [AlgebraicDistanceIndex(G, numberSystems, numberIterations, omega=0.5, norm= 2)](https://networkit.github.io/dev-docs/python_api/linkprediction.html?highlight=algeb#networkit.linkprediction.AlgebraicDistanceIndex) expects as inputs a graph, the number of systems to use for algebraic iteration, and the number of iterations in each system. `omega` is the over-relaxation parameter and `norm` is the norm factor of the extended algebraic distance. Maximum norm is realized by setting `norm` to 0.\n", "\n", "After initialization, call the `preprocess()` method. Afterwards, call the `run` method: it takes a pair of nodes `(u, v)` as input and returns the Algebraic Distance Index of the given pair of nodes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize the algorithm\n", "adi = nk.linkprediction.AlgebraicDistanceIndex(G, 30, 200)\n", "adi.preprocess()\n", "\n", "# Get the algebraic distance index of two nodes\n", "adi.run(1, 32)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Common Neighbors Index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Common Neighbors Index calculates the number of common neighbors of a pair of nodes in a graph. \n", "\n", "The constructor [CommonNeighborsIndex(G)](https://networkit.github.io/dev-docs/python_api/linkprediction.html?highlight=common#networkit.linkprediction.CommonNeighborsIndex), expects a graph as input. The `run(u, v)` method takes as input a pair of nodes `(u, v)` and returns the number of common neighbors between `u` and `v`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize the algorithm\n", "cni = nk.linkprediction.CommonNeighborsIndex(G)\n", "\n", "# Calculate common neighbors between two nodes\n", "cni.run(14, 15)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Neighbors Measure Index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Neighbors Measure Index returns the number of connections between neighbors of the given nodes `u` and `v`.\n", "\n", "The constructor [NeighborsMeasureIndex(G)](https://networkit.github.io/dev-docs/python_api/linkprediction.html?highlight=neighborsme#networkit.linkprediction.NeighborsMeasureIndex) expects a graph as input. The `run(u, v)` takes a pair of nodes `(u, v)` as input and returns the neighbors measure index between `u` and `v`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize the algorithm\n", "nmi = nk.linkprediction.NeighborsMeasureIndex(G)\n", "\n", "# Calculate the neighbors measure index between two nodes\n", "nmi.run(14, 32)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preferential Attachment Index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Preferential Attachment Index suggests that the more connected a node is, the more likely it is to receive new links.\n", "\n", "The constructor [PreferentialAttachmentIndex(G)](https://networkit.github.io/dev-docs/python_api/linkprediction.html?highlight=preferential#networkit.linkprediction.PreferentialAttachmentIndex) expects a graph as input. The `run(u, v)` method takes a pair of nodes `(u, v)` as input and returns the product of the cardinalities of the neighborhoods of nodes `u` and `v`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize the algorithm\n", "pai = nk.linkprediction.PreferentialAttachmentIndex(G)\n", "\n", "# Calculate the preferential attachment index between two nodes\n", "pai.run(14, 32)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Resource Allocation Index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The constructor [ResourceAllocationIndex(G)](https://networkit.github.io/dev-docs/python_api/linkprediction.html?highlight=resource#networkit.linkprediction.ResourceAllocationIndex) expects a graph as input. The `run(u, v)` method takes a pair of nodes `(u, v)` as input and returns the Resource Allocation Index between `u` and `v`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize the algorithm\n", "rai = nk.linkprediction.ResourceAllocationIndex(G)\n", "\n", "# Calculate the resource allocation index between two nodes\n", "rai.run(14, 32)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Same Community Index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Same Community Index determines whether two nodes `u` and `v` are in the same community.\n", "\n", "The constructor [SameCommunityIndex(G)](https://networkit.github.io/dev-docs/python_api/linkprediction.html?highlight=samecommunity#networkit.linkprediction.SameCommunityIndex) expects a graph as input. The `run(u, v)` method takes a pair of nodes `(u, v)` as input and returns `1` if `u` and `v` are in the same community, `0` otherwise." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize the algorithm\n", "sni = nk.linkprediction.SameCommunityIndex(G)\n", "\n", "# Compute the Same Community Index between two pairs of nodes\n", "print(sni.run(14, 32))\n", "print(sni.run(0, 32))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Total Neighbors Index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Total Neighbors Index returns the number of nodes in the neighborhood-union of nodes `u` and `v`.\n", "\n", "The constructor [TotalNeighborsIndex(G)](https://networkit.github.io/dev-docs/python_api/linkprediction.html?highlight=totalneighb#networkit.linkprediction.TotalNeighborsIndex) expects a graph as input. The `run(u, v)` method takes a pair of nodes `(u, v)` as input and returns the Total Neighbors Index between `u` and `v`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize the algorithm\n", "tni = nk.linkprediction.TotalNeighborsIndex(G)\n", "\n", "# Calculate the Total Neighbors Index between two nodes\n", "tni.run(14, 32)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Link sampling and link prediction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This section shows how to use the training algorithms in combination with link prediction algorithms. In this example, we use the Random Link Sampler, the Missing Links Finder and the Katz Index algorithms." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Katz index assigns a similarity score to a pair of nodes. This score is based on the weighted sum of the number of paths with length $l$, where $l$ is smaller than a given limit.\n", "\n", "[KatzIndex(G=None, maxPathLength=5, dampingValue=0.005)](https://networkit.github.io/dev-docs/python_api/linkprediction.html?highlight=katzindex#networkit.linkprediction.KatzIndex) takes as inputs a graph (optional), the maximum length of paths to consider, and the damping value.\n", " \n", "[RandomLinkSampler(G, numLinks)](https://networkit.github.io/dev-docs/python_api/linkprediction.html?highlight=randomlinksampler#networkit.linkprediction.RandomLinkSampler) provides methods to randomly sample a number of edges from a given graph. `numLinks` is the number of edges the returned graph should have.\n", "\n", "[MissingLinksFinder(G)](https://networkit.github.io/dev-docs/python_api/linkprediction.html?highlight=missing#networkit.linkprediction.MissingLinksFinder) finds the missing edges in the given graph. The `findAtDistance(k)` function returns all missing links in the graph. The absent links to find are narrowed down by providing a distance that the nodes of the missing links should have. For example in case of distance 2 only node-pairs that would close a triangle in the given graph get returned." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Read graph\n", "G = nk.graphio.readGraph(\"../input/jazz.graph\", nk.Format.METIS)\n", "\n", "# Sample graph\n", "trainingGraph = nk.linkprediction.RandomLinkSampler.byPercentage(G, 0.7)\n", "\n", "# Find missing links\n", "missingLinks = nk.linkprediction.MissingLinksFinder(trainingGraph).findAtDistance(5)\n", "\n", "# Run link prediticion\n", "predictions = nk.linkprediction.KatzIndex(G).runOn(missingLinks)\n", "\n", "# Print the first 5 predictions\n", "for p in predictions[:5]:\n", " print(p)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.5" } }, "nbformat": 4, "nbformat_minor": 2 }