{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Community Detection with NetworKit " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook we will cover some community detection algorithms implemented in the `community` module of NetworKit. Community detection is concerned with identifying groups of nodes which are significantly more densely connected to each other than to the rest of the network. As a first step we import NetworKit:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import networkit as nk" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `community` module provides a top-level function, [detectCommunities(G, algo=None, inspect=True)](https://networkit.github.io/dev-docs/python_api/community.html?highlight=detect#networkit.community.detectCommunities) to perform community detection of a given graph with a suitable algorithm, and print some statistics about the result. If no algorithm is specified via the `algo` parameter, community detection is performed using the [PLM](https://networkit.github.io/dev-docs/python_api/community.html?highlight=plm#networkit.community.PLM) algorithm.\n", "\n", "This function can be used as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Read graph\n", "G = nk.readGraph(\"../input/karate.graph\", nk.Format.METIS)\n", "communities = nk.community.detectCommunities(G)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following sections cover two popular community detection algorithms, `PLM` and `PLP`, and will illustrate how to use them." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## PLM" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NetworKit provides a parallel implementation of the well-known Louvain method, which can be found in the [PLM](https://networkit.github.io/dev-docs/python_api/community.html?highlight=plm#networkit.community.PLM) class. It yields a high-quality solution at reasonably fast running times. The constructor `PLM(Graph, refine=False, gamma=0.1, par='balance', maxIter=32, turbo=True, recurse=True)` expects a [networkit.Graph](https://networkit.github.io/dev-docs/python_api/networkit.html?highlight=graph#networkit.Graph) as a mandatory parameter. If the parameter `refine` is set to true, the algorithm performs a second move phase to refine the communities. The parameter `gamma` defines the multi-resolution modularity parameter. The string `par` defines the openmp parallelization strategy. `maxIter` is the maximum number of iterations for move phase. When `turbo` is set to true, the algorithm is faster but uses O(n) additional memory per thread. Set `recurse`to true in order to use recursive coarsening. Refer to [this]( http://journals.aps.org/pre/abstract/10.1103/PhysRevE.89.049902) for more details on recursive coarsening.\n", "\n", "In the example below we run PLM with `refine` set to true while leaving the rest of the parameters to their default values.\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Choose and initialize algorithm \n", "plmCommunities = nk.community.detectCommunities(G, algo=nk.community.PLM(G, True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output of the `detectCommunities` function is a partition of the nodes of the graph. It is represented by the [Partition](https://networkit.github.io/dev-docs/python_api/networkit.html?highlight=partition#networkit.Partition) data structure, which provides several methods for inspecting and manipulating a partition of a set of elements." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"{0} elements assigned to {1} subsets\".format(plmCommunities.numberOfElements(),\n", " plmCommunities.numberOfSubsets()))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"the biggest subset has size {0}\".format(max(plmCommunities.subsetSizes())))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The contents of a partition object can be written to file in a simple format, in which the `i`-th line contains an integer representing the subset id of node `i`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nk.community.writeCommunities(plmCommunities, \"output/communtiesPLM.partition\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## PLP " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Label Propagation algorithm is an algorithm for finding communities in a graph. NetworKit provides a parallel implementation, [PLP(G, updateThreshold=none, maxIterations=none)](https://networkit.github.io/dev-docs/python_api/community.html?highlight=plp#networkit.community.PLP). The constructor expects a [networkit.Graph](https://networkit.github.io/dev-docs/python_api/networkit.html?highlight=graph#networkit.Graph) as a mandatory parameter. The parameter `updateThreshold` dictates the number of nodes that have to be changed in each iteration so that a new iteration starts, and `maxIterations` is the maximum number of iterations. `none` is NetworKit constant set to the maximum value of a 64-bit integer." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Read graph\n", "G = nk.readGraph(\"../input/jazz.graph\", nk.Format.METIS)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Choose and initialize algorithm \n", "plpCommunities = nk.community.detectCommunities(G, algo=nk.community.PLP(G))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"{0} elements assigned to {1} subsets\".format(plpCommunities.numberOfElements(),\n", " plpCommunities.numberOfSubsets()))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"the biggest subset has size {0}\".format(max(plpCommunities.subsetSizes())))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nk.community.writeCommunities(plpCommunities, \"output/communtiesPLP.partition\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }