Why a new format?
=================

While working on libzim, I discovered a few "mistakes" in the zim format that
make reading and creating files harder than necessary.

- A dirent has no size information. The size of a dirent depends on the size of
  the url and the title, and neither size is stored in the header.
  So you have to parse the dirent to know its size. You cannot read the title
  directly because you don't know where it starts (you have to search for
  '\\0' to find the end of the url).
- A cluster has no size information. You cannot directly know the size of a
  cluster. For an uncompressed cluster you can find the size quite easily, as the
  header is not compressed. But for a compressed cluster, you have to decompress
  the data (and you know neither the size of the compressed data nor that of the
  uncompressed data) to be able to read it.
- At creation time, the size of the "header" data is not known until you know
  all the content of the zim file. So you cannot start writing the content
  directly into the zim file. You have to write things to temporary files and
  keep data structures in memory. As a result, you cannot create big zim files
  on a computer with little RAM.

We also want to make a series of improvements to the zim format:

- No more namespaces. The separation between the article namespace (A) and
  the image namespace (I) is totally useless. The (B) namespace is not used at all.
  Only the metadata (M) namespace is really used.
  The (X) namespace for indexes is used by only one article (the xapian database).
  It could be merged somewhere else, in the M namespace or directly in the header.
  See https://github.com/openzim/libzim/issues/15
- We want content signing. See https://github.com/openzim/libzim/issues/40
- Category handling. See https://github.com/openzim/libzim/issues/75
- We want to be able to split zim files efficiently.
- We want to have zim extensions. Given a small "base" zim file, we may want
  extensions that add new content: images if the base zim file has no images,
  or new articles if the base zim is a selection of articles.
- We may want to have different kinds of extensions, for example low and high
  resolution images.
- We want to handle zim updates. A new version of a zim file could come as an
  update to a previous zim. This way, the user does not have to download all
  the content again.
- Zim updates should be easy to create. When displaying Wikimedia content, a
  client application may allow the user to change the content of an article
  (as Wikipedia does), and store the change as a zim update.

While all these improvements concern the Kiwix use case, I also want to explore
new use cases for an advanced archive format. For example:

- Classical file system archive
- Backup
- Software distribution
- Packaging
- ...

This work is done independently of the Kiwix and openZIM organizations.
For now this is more of an essay than a real project to implement.
That may change in the future, but for now there is absolutely no plan or promise
that I (or anyone else) will implement this format.