{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# NetworKit graph I/O tutorial\n", "\n", "This notebook will guide you through reading and writing graphs to files using NetworKit, i.e. the following will be covered,\n", "- Reading and writing graphs graph from a file\n", " - Using the different file formats supported by NetworKit\n", "- Converting a graph to a specific format " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import networkit as nk" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading a graph from a file\n", "\n", "NetworKit supports several graph formats which can be found via [graphio.Format](https://networkit.github.io/dev-docs/python_api/graphio.html?highlight=format#networkit.graphio.Format).\n", "Although most graphs support both reading and writing, a few do not support both alike. If you will be reading large graphs often, it is recommended to convert your graph to the ```NetworkitBinaryGraph``` format as this is currently the fastest reader available in NetworKit. For information on how to convert graphs between formats in NetworKit, see the section on [converting graphs](#converting)\n", "\n", "### SNAP file format\n", "\n", "The (optional) first line of a file denotes the problem line p <0 or 1-indexed>.\n", "\n", "The problem line is followed by a list of exactly m edges. \n", "The format is `` for a weighted graph, and `` for an unweighted graph.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [SNAPGraphReader(directed = False, remapNodes = True, nodeCount = 0)](https://networkit.github.io/dev-docs/python_api/graphio.html?highlight=snap#networkit.graphio.SNAPGraphReader) constructor expects 3 optional values, i.e., ```directed``` which is true if the graph is directed, ```remapNodes``` indicates whether nodes should be remapped to other node ids in order to create consecutive node ids and the number of nodes in the graph as ```nodeCount``` which is used to preallocate memory for the number of nodes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Reading a file in SNAP using the default constructor values can done like this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nk.readGraph(\"../input/wiki-Vote.txt\", nk.Format.SNAP)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can now access the graph object via ```G```. Alternatively, you can explicitly use the ```SNAPGraphReader``` class as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nk.graphio.SNAPGraphReader().read(\"../input/wiki-Vote.txt\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Passing other values to the ```SNAPGraphReader``` can be done by creating a ```SNAPGraphReader``` object and then calling the ```read``` method on it like is done below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "snapReader = nk.graphio.SNAPGraphReader(True, False, 7115)\n", "G = snapReader.read(\"../input/wiki-Vote.txt\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want to write ```G``` to a file, we can use [writeGraph()](https://networkit.github.io/dev-docs/python_api/networkit.html?highlight=writegraph#networkit.writeGraph), and pass ```G```, the ```path``` to the file the graph should be written to and the format to the method." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "if not os.path.isdir('./output/'):\n", " os.makedirs('./output')\n", "nk.writeGraph(G,\"./output/wikiSNAP\", nk.Format.SNAP)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### EdgeList file format" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [EdgeList](https://networkit.github.io/dev-docs/python_api/networkit.html?highlight=edgelist#networkit.Format.EdgeList) file format is a simple format that stores each node's adjacency array in a seperate line. The ```EdgeList``` file format has several variations, all differing in the character used to seperate nodes in an edge list or the ID of the first node. The constructor [EdgeListReader(separator, firstNode, commentPrefix = \"#\", continuous = True, directed = False)](https://networkit.github.io/dev-docs/python_api/graphio.html?highlight=edgelist#networkit.graphio.EdgeListReader) expects 5 parameters that dictate the exact format of the edge lists. NetworKit provides five standard ```EdgeListReader```s: \n", "\n", "1. ```EdgeListSpaceZero``` with ```seperator``` being a whitespace and ```firstNode```'s ID is 0.\n", "2. ```EdgeListSpaceOne``` with ```seperator``` being a whitespace and ```firstNode```'s ID is 1.\n", "3. ```EdgeListTabZero``` with ```seperator``` being a tab and ```firstNode```'s ID is 0.\n", "4. ```EdgeListTab0ne``` with ```seperator``` being a tab and ```firstNode```'s ID is 1.\n", "5. ```EdgeListCommaOne``` with ```seperator```being a comma and ```firstNode```'s ID is 1.\n", "\n", "Reading can be done in the same way as in the previous example. You can specify a different format for the ```EdgeListReader``` by calling its constructor, and passing the values to it. Assuming we want to use a '$' as a seperator, the first node is 0, and comments are prefixed by a semi-colon, we can do the following:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Specify seperator, firstNode and commentPrefix for the EdgeListReader\n", "edgeListReader = nk.graphio.EdgeListReader('$', 0, ';')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Reading a file with one of the exisiting ```EdgeListReader```s, e.g. ```EdgeListTabOne``` can be done by calling the ```readGraph``` method and specifying the format as ```EdgeListTabOne``` " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nk.readGraph(\"../input/example.edgelist\", nk.Format.EdgeListTabOne)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can write ```G``` to a file by calling the ```writeGraph()``` method, and passing ```G```, the ```path``` to the file the graph should be written to, and the format to ```writeGraph()```." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "if not os.path.isdir('./output'):\n", " os.makedirs('./output')\n", "nk.writeGraph(G, './output/example.edgelist.TabOne', nk.Format.EdgeListTabOne)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### METIS file format\n", "The [METIS](https://networkit.github.io/dev-docs/python_api/networkit.html?highlight=metis#networkit.Format.METIS) format stores a graph of N nodes is stored in a file of N+1 lines. The first line lists the number of nodes and the number of edges seperated by a whitespace. If the first line contains more than two values, the extra values indicate the weights. Each line then contains a node's adjacency list. Comment lines begin with a \"%\" sign. A file in ```METIS``` format can be read using the ```readGraph``` method or by explicitly using the [METISGraphReader](https://networkit.github.io/dev-docs/python_api/graphio.html?highlight=metis#networkit.graphio.METISGraphReader) class:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nk.readGraph(\"../input/celegans_metabolic.graph\", nk.Format.METIS)\n", "# Alternative:\n", "metisReader = nk.graphio.METISGraphReader()\n", "G = metisReader.read(\"../input/celegans_metabolic.graph\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Writing a file in ```METIS``` format is the same as for the other formats we have seen so far:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "if not os.path.isdir('./output/'):\n", " os.makedirs('./output')\n", "nk.writeGraph(G,\"./output/celegans_metabolicMETIS\", nk.Format.METIS)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### GraphML format\n", "\n", "The [GML](https://networkit.github.io/dev-docs/python_api/graphio.html?highlight=gml#networkit.graphio.Format.GML) format is an XML-based file format for graphs. For me details, please refer to the [GML format specification.](http://www.fim.uni-passau.de/fileadmin/files/lehrstuhl/brandenburg/projekte/gml/gml-technical-report.pdf) Reading a file in ```GML``` is done in the same way as is reading other formats." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nk.readGraph(\"../input/jazz2_directed.gml\", nk.Format.GML)\n", "# Alternative:\n", "gmlReader = nk.graphio.GMLGraphReader()\n", "G = gmlReader.read(\"../input/jazz2_directed.gml\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Writing a file in ```GML``` format is the same as for the other formats we have seen so far:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "if not os.path.isdir('./output/'):\n", " os.makedirs('./output')\n", "nk.writeGraph(G,\"./output/jazz2_directedGML\", nk.Format.GML)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### GraphViz/ DOT file format\n", "\n", "NetworKit currently only supports writing of the ```DOT``` file format. More information on the ```DOT``` file format can be found [here](https://www.graphviz.org/doc/info/lang.html). We can read a graph in any format, and then write it in ```DOT``` as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "# Read graph in GML \n", "G = nk.readGraph(\"../input/jazz2_directed.gml\", nk.Format.GML)\n", "\n", "if not os.path.isdir('./output/'):\n", " os.makedirs('./output')\n", "# Write G in GraphViz/DOT\n", "nk.writeGraph(G,\"./output/jazz2_directedGraphViz\", nk.Format.GraphViz)\n", "# Write G in DOT\n", "nk.writeGraph(G,\"./output/jazz2_directedDOT\", nk.Format.DOT)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### LFR \n", "\n", "Graphs in [LFR](https://networkit.github.io/dev-docs/python_api/graphio.html?highlight=lfr#networkit.graphio.Format.LFR) are identical to those in the ```EdgeListTabOne``` format. Therefore, in order to read a graph in ```LFR```, the ```EdgeListTabOne``` reader is used. Refer to the section about the ```EdgeListTabOne``` reader [here.](#edgelist)\n", "Alternatively, you can also read the graph by specifying ```LFR``` as the format. In this case, NetworKit calls the ```EdgeListTabOne```reader internally. The same goes for writing a graph to a file in the ```LFR``` file format." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nk.readGraph(\"../input/network_overlapping.dat\", nk.Format.LFR)\n", "\n", "import os\n", "if not os.path.isdir('./output/'):\n", " os.makedirs('./output')\n", "nk.writeGraph(G,\"./output/network_overlapping.dat\", nk.Format.LFR)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### KONECT file format\n", "\n", "The reader [KONECTGraphReader(remapNodes = False, handlingmethod = DISCARD_EDGES)](https://networkit.github.io/dev-docs/python_api/graphio.html?highlight=konect#networkit.graphio.KONECTGraphReader) expects two parameters; Node ids are remapped to consecutive ids if```remapNodes```is set to true. If your graph contains multiple edges between nodes, ```handlingmethod``` specifies how NetworKit should handle the multiple edges. ```handlingmethod``` can take any of the following three values:\n", " - DISCARD_EDGES = 0, //Reads and selects the first edge which occurs and discards all following\n", " - SUM_WEIGHTS_UP = 1, //If an edge occurs again, the weight of it is added to the existing edge\n", " - KEEP_MINIUM_WEIGHT = 2 //The edge with the lowest weight is kept\n", "\n", "In order to read a graph with multiple edges in while summing the weights of the multiple edges, you can pass the parameters to the ```KONECTGraphReader``` as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "konectReader = nk.graphio.KONECTGraphReader(True, 1)\n", "G = konectReader.read(\"../input/foodweb-baydry.konect\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NetworKit currently only supports reading of the ```KONECT``` file format. If you want to write your graph to a file, you can write the graph in another format of your choice." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### GraphToolBinary file format\n", "\n", "The [GraphToolBinaryReader](https://networkit.github.io/dev-docs/python_api/graphio.html?highlight=graphtool#networkit.graphio.GraphToolBinaryReader) reads graphs written in the binary format described [here](https://graph-tool.skewed.de/static/doc/gt_format.html). The graph's properties are stored in the file, and therefore, no constructor arguments are passed to the ```reader```. Reading a graph from a file in ```GraphToolBinaryReader```can be done like this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nk.readGraph(\"../input/power.gt\", nk.Format.GraphToolBinary)\n", "# Alternative:\n", "graphToolReader = nk.graphio.GraphToolBinaryReader()\n", "G = graphToolReader.read(\"../input/power.gt\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When writing the graph to the file, the writer [GraphToolBinaryWriter(littleEndianness = True)](https://networkit.github.io/dev-docs/python_api/graphio.html?highlight=graphtool#networkit.graphio.GraphToolBinaryWriter) expects a Boolean value indicating the endianness of the machine. Set ```littleEndianness``` to true if you are running a little endian machine. The example below shows how you can pass the endianness to the ```GraphToolBinaryWriter```." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "if not os.path.isdir('./output/'):\n", " os.makedirs('./output')\n", "nk.writeGraph(G,\"./output/power.gt\", nk.Format.GraphToolBinary, littleEndianness=True)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### ThrillBinary file format\n", "\n", "The [ThrillBinaryReader(n)](https://networkit.github.io/dev-docs/python_api/graphio.html?highlight=thrill#networkit.graphio.ThrillGraphBinaryReader) reads a graph format consisting of a serialized DIA of vector from the [Thrill format](http://project-thrill.org/). The constructor optionally takes a 64-but unsigned integer ```n``` which is the number of nodes in the graph. Reading is more efficient if the ```ThrillBinaryReader``` knows the number of nodes. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nk.readGraph(\"../input/celegans_metabolic.thrill\", nk.Format.ThrillBinary)\n", "# Alternative:\n", "thrillBinaryReader = nk.graphio.ThrillGraphBinaryReader()\n", "G = thrillBinaryReader.read(\"../input/celegans_metabolic.thrill\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the [ThrillBinaryWriter](https://networkit.github.io/dev-docs/python_api/graphio.html?highlight=thrill#networkit.graphio.ThrillGraphBinaryWriter), writing is similar to the other writers: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "if not os.path.isdir('./output/'):\n", " os.makedirs('./output')\n", "nk.writeGraph(G,\"./output/foodweb-baydry.thrill\", nk.Format.ThrillBinary)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### NetworkitBinaryGraph file format\n", "\n", "The [NetworkitBinaryGraph](https://networkit.github.io/dev-docs/python_api/graphio.html?highlight=binary#networkit.graphio.NetworkitBinaryReader) is a custom binary NetworKit file format for reading and writing graphs. It is not only much faster than existing formats, it is also compressed. The graph properties are stored directly in the file." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nk.readGraph(\"../input/foodweb-baydry.networkit\", nk.Format.NetworkitBinary)\n", "#Alternative:\n", "networkitBinaryReader = nk.graphio.NetworkitBinaryReader()\n", "G = networkitBinaryReader.read(\"../input/foodweb-baydry.networkit\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [NetworkitBinaryWriter(chunks = 32, weightsType = NetworkitBinaryWeights::AUTO_DETECT)](https://networkit.github.io/dev-docs/python_api/graphio.html?highlight=binary#networkit.graphio.NetworkitBinaryWriter) constructor takes two optional parameters. The ```NetworkitBinaryWriter``` groups nodes in to ```chunks``` which reduces the space needed to save a graph. Futhermore, it takes the type of weights as an optional parameter. If none is passed, the ```NetworkitBinaryWriter``` detects the type of weights automatically. ```weightsType``` can be any of the following options: \n", " - none = 0, // The graph is not weighted\n", "\t- unsignedFormat = 1, //The weights are unsigned integers\n", "\t- signedFormat = 2, //The weights are signed integers\n", "\t- doubleFormat = 3, //The weights are doubles\n", "\t- floatFormat = 4, //The weights are floats\n", "\t- autoDetect \n", "You can pass the number of chunks and type of weights to the writer as follows(assuming signed weights):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "if not os.path.isdir('./output/'):\n", " os.makedirs('./output')\n", "nk.writeGraph(G,\"./output/foodweb-baydry.networkit\", nk.Format.NetworkitBinary, chunks=16, NetworkitBinaryWeights=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Convert graphs to other formats\n", "Not all graph formats support reading and writing alike, and therefore, one may want to convert a graph to a different format.\n", "For example, if you want to convert `'../input/wiki-Vote.txt'` in the SNAP format to GML and save it in the `./output` directory, you can either use the [convertGraph(fromFormat, toFormat, fromPath, toPath=None)](https://networkit.github.io/dev-docs/python_api/graphio.html?highlight=convert#networkit.graphio.convertGraph) function from ```graphio```:\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nk.graphio.convertGraph(nk.Format.SNAP, nk.Format.GML, \"../input/wiki-Vote.txt\", \"output/example.gml\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or you can pass the new format to the [writeGraph](https://networkit.github.io/dev-docs/python_api/graphio.html?highlight=writegraph#networkit.graphio.writeGraph) method: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "if not os.path.isdir('./output/'):\n", " os.makedirs('./output')\n", "nk.writeGraph(G,\"./output/example.gml\", nk.Format.GML)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }