natural-xml-diff

Crates.io: natural-xml-diff
lib.rs: natural-xml-diff
version: 0.2.0
source: src
created_at: 2022-12-19 15:53:36.03268
updated_at: 2023-01-03 14:44:35.984734
description: Natural diffing between XML documents
homepage: https://github.com/faassen/natural-xml-diff
repository: http://github.com/faassen/natural-xml-diff
max_upload_size:
id: 741383
size: 201,038
Martijn Faassen (faassen)

documentation

https://docs.rs/natural-xml-diff

README

natural-xml-diff


The natural-xml-diff crate implements a diffing algorithm that attempts to produce correct and human readable differences between two XML documents.

API Documentation

Algorithm

The algorithm implemented by this library is based on the paper "Bridging the gap between tracking and detecting changes on XML". It is also implemented by the Java-based jndiff library.

Work in progress

This is still a work in progress!

Credits

Paligo

Structural diffing

Let's consider the following XML document, taken from the "Bridging the Gap" paper:

<?xml version="1.0"?>
<book>
  <chapter>
    <title>Text 1</title>
    <para>Text 2</para>
  </chapter>
  <chapter>
    <title>Text 4</title>
    <para>Text 5</para>
  </chapter>
  <chapter>
    <title>Text 6</title>
    <para>Text 7<img/>Text 8</para>
  </chapter>
  <chapter>
    <title>Text 9</title>
    <para>Text 10</para>
  </chapter>
  <chapter>
    <para>Text 11</para>
    <para>Text 12</para>
  </chapter>
</book>

We'll call that "document A", the "before" of the diffing. Here's the "after", "document B":

<?xml version="1.0"?>
<book>
  <chapter>
    <para>Text 2</para>
  </chapter>
  <chapter>
    <title>Text 4</title>
    <para>Text 25</para>
    <para>Text 11</para>
  </chapter>
  <chapter>
    <title>Text 6</title>
    <para>Text 7<img/>Text 8</para>
  </chapter>
  <chapter>
    <title>Text 9</title>
    <para>Text 10</para>
  </chapter>
  <chapter>
    <para>Text 12</para>
  </chapter>
</book>

Let's present both as trees with numbered nodes (the root node, 0, is not shown). Here's document A:

graph TD;
    1[1 book]-->2
	  2[2 chapter]-->3
	  2-->5
	  3[3 title]-->4
	  4[4 Text 1]
	  5[5 para]-->6
	  6[6 Text 2]
	  1-->7
	  7[7 chapter] --> 8
	  8[8 title] --> 9
	  9[9 Text 4]
	  7 --> 10
	  10[10 para] --> 11
	  11[11 Text 5]
	  1 --> 12
	  12[12 chapter] --> 13
	  13[13 title] --> 14
	  14[14 Text 6]
	  12-->15
	  15[15 para] --> 16
	  15 --> 17
	  15 --> 18
	  16[16 Text 7]
	  17[17 img]
	  18[18 Text 8]
	  1 --> 19
	  19[19 chapter]
	  19 --> 20
	  20[20 title] --> 21
	  21[21 Text 9]
	  19 --> 22
	  22[22 para] --> 23
	  23[23 Text 10]
	  1 --> 24
	  24[24 chapter] --> 25
	  25[25 para] --> 26
	  26[26 Text 11]
	  24 --> 27
	  27[27 para] --> 28
	  28[28 Text 12]
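
The node numbers in the tree above follow document order (a pre-order walk), starting at 1 because 0 is reserved for the root node that is not shown. The following is a minimal sketch of that numbering, not the crate's own data structure: the `Node` enum, the helper functions and `document_a` are hypothetical names made up for illustration, and whitespace-only text nodes are assumed to be ignored.

```rust
/// A minimal XML-like tree: elements with children, or text leaves.
/// (Hypothetical type for illustration; not part of natural-xml-diff.)
enum Node {
    Element(&'static str, Vec<Node>),
    Text(&'static str),
}

fn el(name: &'static str, children: Vec<Node>) -> Node {
    Node::Element(name, children)
}

fn text(content: &'static str) -> Node {
    Node::Text(content)
}

/// Walk the tree in document (pre-)order, appending (number, label) pairs.
/// A node is numbered before any of its children.
fn number_nodes(node: &Node, out: &mut Vec<(usize, String)>) {
    // Numbering starts at 1; 0 is reserved for the root node that is not shown.
    let id = out.len() + 1;
    match node {
        Node::Element(name, children) => {
            out.push((id, name.to_string()));
            for child in children {
                number_nodes(child, out);
            }
        }
        Node::Text(content) => out.push((id, content.to_string())),
    }
}

/// Document A from the example, without whitespace-only text nodes.
fn document_a() -> Node {
    let chapter = |title: &'static str, para: Vec<Node>| {
        el("chapter", vec![el("title", vec![text(title)]), el("para", para)])
    };
    el(
        "book",
        vec![
            chapter("Text 1", vec![text("Text 2")]),
            chapter("Text 4", vec![text("Text 5")]),
            chapter("Text 6", vec![text("Text 7"), el("img", vec![]), text("Text 8")]),
            chapter("Text 9", vec![text("Text 10")]),
            el(
                "chapter",
                vec![
                    el("para", vec![text("Text 11")]),
                    el("para", vec![text("Text 12")]),
                ],
            ),
        ],
    )
}

fn main() {
    let mut numbered = Vec::new();
    number_nodes(&document_a(), &mut numbered);
    for (id, label) in &numbered {
        println!("{id} {label}");
    }
}
```

Because a parent is always numbered before its children, a subtree occupies a contiguous range of numbers. The third chapter, for example, covers nodes 12 through 18.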

Maintaining the tests

Some tests use test_generator to generate tests from the testdata directory. However, new tests in that directory aren't picked up automatically; you have to force a recompile of the .rs files that run the tests. You can do this by making an insignificant whitespace edit in each .rs file that uses test_generator and saving it. I hope there's a better solution.
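
One possible alternative to the whitespace edits, untested against this repository, is a Cargo build script that declares the testdata directory as a build input. This sketch assumes a hypothetical `build.rs` at the crate root, and assumes that re-running the build script is enough for Cargo to rebuild the test targets that use test_generator. The helper name `rerun_directive` is made up for illustration.

```rust
// build.rs (hypothetical file at the crate root)

/// Build the Cargo directive that re-runs this script when `path` changes.
fn rerun_directive(path: &str) -> String {
    format!("cargo:rerun-if-changed={}", path)
}

fn main() {
    // When the path is a directory, Cargo scans it for modifications,
    // so adding or editing a file under testdata should be noticed.
    println!("{}", rerun_directive("testdata"));
}
```

Cargo picks up a `build.rs` next to `Cargo.toml` automatically, so no manifest change should be needed for this sketch.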

