{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Group Centrality Tutorial"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook covers the group centrality algorithms implemented in NetworKit. Group centrality measures the relative importance of a group of nodes within a graph. \n",
    "\n",
    "The documentation of NetworKit group centrality algorithms is available in the [centrality](https://networkit.github.io/dev-docs/python_api/centrality.html) module."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import networkit as nk"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Read graph\n",
    "G = nk.readGraph(\"../input/karate.graph\", nk.Format.METIS)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Group Betweenness"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Group betweenness measures the importance of a group $S$ as the fraction of shortest paths connecting pairs of non-group members that pass through $S$.\n",
    "\n",
    "The constructor [ApproxGroupBetweenness(G, groupSize, epsilon)](https://networkit.github.io/dev-docs/python_api/centrality.html?highlight=approxgr#networkit.centrality.ApproxGroupBetweenness) expects as inputs an undirected graph, the desired group size `groupSize`, and the accuracy of the approximation `epsilon`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Initalize algorithm\n",
    "btwn = nk.centrality.ApproxGroupBetweenness(G, 10, 0.1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Run \n",
    "btwn.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After running the algorithm, we can extract some information about the group betweenness centrality, e.g. the group with the highest approximate betweenness centrality score. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Group with highest approximate betweenness centrality\n",
    "btwn.groupMaxBetweenness()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One may also have a group of nodes and be interested in the group's betweeness centrality score. The function [scoreOfGroup(group, normalized=False)]() returns the betweenness centrality score of the list of nodes in `group`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Create an arbitrary group of nodes\n",
    "group = [7, 8, 11, 20]\n",
    "\n",
    "# Get betweenness centrality score of group\n",
    "btwn.scoreOfGroup(group)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Group Closeness"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "GroupCloseness measures the importance of a group $S$ as the inverse of the sum of the distances from all the nodes outside $S$ to their nearest node in $S$.\n",
    "\n",
    "In NetworKit, GroupCloseness implements an heuristic greedy algorithm to compute a group of nodes with high group closeness centrality. The algorithm was introduced in \"[Scaling up Group Closeness Maximization](https://arxiv.org/abs/1710.01144)\" Bergamini et al., ALENEX 2018.\n",
    "\n",
    "The constructor [GroupCloseness(G, k=1, H=0)](https://networkit.github.io/dev-docs/python_api/centrality.html?highlight=group#networkit.centrality.GroupCloseness) expects an unweighted graph and the desired group size `k`. If `H > 0`, all BFSs will explore the graph up to distance `H`. This can speed up the algorithm, at the cost of a lower solution quality."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initalize algorithm\n",
    "close = nk.centrality.GroupCloseness(G, 10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run\n",
    "close.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[groupMaxCloseness()](https://networkit.github.io/dev-docs/python_api/centrality.html?highlight=groupma#networkit.centrality.GroupCloseness.groupMaxCloseness) returns the group of `k` nodes computed by the algorithm."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Computed group of nodes\n",
    "close.groupMaxCloseness()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[scoreOfGroup(group)](https://networkit.github.io/dev-docs/python_api/centrality.html?highlight=scoreof#networkit.centrality.GroupCloseness.scoreOfGroup) takes a group of nodes as input, and returns the group's closeness centrality score. The score is returned as a value between 0 and 1: a score of 1 indicates maximum group closeness centrality."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create an arbitrary group of nodes\n",
    "group = [7, 8, 11, 20]\n",
    "\n",
    "# Get closeness centrality score of group\n",
    "close.scoreOfGroup(group)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Local Search for Group Closeness\n",
    "\n",
    "A faster local search algorithm for group closeness is the Grow-Shrink algorithm introduced in [Local Search for Group Closeness Maximization on Big Graphs](https://ieeexplore.ieee.org/document/9006206), Angriman et al., IEEE BigData 2019.\n",
    "\n",
    "The algorithm starts from an arbitrary group of vertices and performs local improvements until a local optimum is reached. You need to pass as input a graph and an initial set of nodes. Further, you can specify whether or not to use the extended version of the algorithm (use the `extended` parameter) and how many insertions/removals to perform at each iteration (`insertion` parameter). See the paper for further details about those parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def closenessOfGroup(group):\n",
    "    distances = []\n",
    "    nk.traversal.Traversal.BFSfrom(G, group, lambda u, dist: distances.append(dist))\n",
    "    return 1. / sum(distances)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a random group of vertices\n",
    "k, group = 5, set()\n",
    "while (len(group) < k):\n",
    "    group.add(nk.graphtools.randomNode(G))\n",
    "\n",
    "print(\"Initial group\", list(group), \"has score {:.3f}\".format(closenessOfGroup(group)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gs = nk.centrality.GroupClosenessGrowShrink(G, group).run()\n",
    "group = gs.groupMaxCloseness()\n",
    "print(\"Final group\", group, \"has score {:.3f}\".format(closenessOfGroup(group)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The LS-Restrict algorithm for group closeness only allows swaps between vertices in the group and their neighbors outside. Because it considers less candidates for a swap it is faster than Grow-Shrink, but might yield results with lower quality."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a random group of vertices\n",
    "k, group = 5, set()\n",
    "while (len(group) < k):\n",
    "    group.add(nk.graphtools.randomNode(G))\n",
    "\n",
    "print(\"Initial group\", list(group), \"has score {:.3f}\".format(closenessOfGroup(group)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gs = nk.centrality.GroupClosenessLocalSwaps(G, group).run()\n",
    "group = gs.groupMaxCloseness()\n",
    "print(\"Final group\", group, \"has score {:.3f}\".format(closenessOfGroup(group)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a random group of vertices\n",
    "k, group = 5, set()\n",
    "while (len(group) < k):\n",
    "    group.add(nk.graphtools.randomNode(G))\n",
    "\n",
    "print(\"Initial group\", list(group), \"has score {:.3f}\".format(close.scoreOfGroup(group)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gs = nk.centrality.GroupClosenessLocalSearch(G, group).run()\n",
    "group = gs.groupMaxCloseness()\n",
    "print(\"Final group\", group, \"has score {:.3f}\".format(close.scoreOfGroup(group)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`GroupClosenessLocalSearch` evaluates all possible single-swaps between vertices in the group and the vertices outside. Its time complexity is $O(k \\cdot (n - k))$, compared to the aforementioned algorithms for group closeness it is slower, but it yields groups with higher group closeness score."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Group Harmonic Closeness\n",
    "\n",
    "Group Harmonic Closeness interprets the importance of a group $S$ as the harmonic sum of the distances from the nodes outside $S$ to their nearest node in $S$.\n",
    "\n",
    "In NetworKit, GroupHarmonicCloseness implements an heuristic approximation algorithm for group harmonic closeness centrality. The algorithm was introduced in \"[Group-Harmonic and Group-Closeness Maximization – Approximation and Engineering](https://epubs.siam.org/doi/10.1137/1.9781611976472.12)\" Angriman et al., ALENEX 2021."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a random group of vertices\n",
    "k, group = 5, set()\n",
    "while (len(group) < k):\n",
    "    group.add(nk.graphtools.randomNode(G))\n",
    "\n",
    "print(\"Initial group\", list(group), \"has score {:.3f}\".format(close.scoreOfGroup(group)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hClose = nk.centrality.GroupHarmonicCloseness(G, 10).run()\n",
    "hClose.groupMaxHarmonicCloseness()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Group Degree"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Group Degree measures the importance of a group of nodes $S$ as the number of non group members that can be reached from $S$ in one hop. The algorithm implemented in NetworKit is a greedy algorithm that find an approximation of the group with the highest group degree centrality.\n",
    "\n",
    "The constructor [GroupDegree(G, k = 1, countGroupNodes = True)](https://networkit.github.io/dev-docs/python_api/centrality.html?highlight=groupdegree#networkit.centrality.GroupDegree)  expects a graph, the desired group size `k`, and a boolean `countGroupNodes` stating if nodes inside the group should be counted in the centrality score. By default, the nodes are included in the centrality score.\n",
    "Note that, in order to have a fixed $(1 - 1/e)$ approximation guarantee, `countGroupNodes` must be set to `False`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Initalize algorithm\n",
    "degree = nk.centrality.GroupDegree(G, 10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run\n",
    "degree.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[groupMaxDegree()](https://networkit.github.io/dev-docs/python_api/centrality.html?highlight=groupma#networkit.centrality.GroupDegree.groupMaxDegree) returns the group of `k` nodes computed by the algorithm."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Group with high degree centrality\n",
    "degree.groupMaxDegree()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[scoreOfGroup(group)](https://networkit.github.io/dev-docs/python_api/centrality.html?highlight=scoreof#networkit.centrality.GroupDegree.scoreOfGroup) takes a group of nodes as input parameter, and returns the group's degree centrality score."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create an arbitrary group of nodes\n",
    "group = [7, 8, 11, 20]\n",
    "\n",
    "# Get degree centrality score of group\n",
    "degree.scoreOfGroup(group)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}