natural-xml-diff

Crates.io	natural-xml-diff
lib.rs	natural-xml-diff
version	0.2.0
created_at	2022-12-19 15:53:36.03268+00
updated_at	2023-01-03 14:44:35.984734+00
description	Natural diffing between XML documents
homepage	https://github.com/faassen/natural-xml-diff
repository	http://github.com/faassen/natural-xml-diff

Martijn Faassen (faassen)

documentation

https://docs.rs/natural-xml-diff

README

natural-xml-diff

The natural-xml-diff crate implements a diffing algorithm that attempts to produce correct, human-readable differences between two XML documents.

API Documentation

Algorithm

The algorithm implemented by this library is based on the paper "Bridging the gap between tracking and detecting changes on XML". It is also implemented by the Java-based jndiff library.

Work in progress

This is still a work in progress!

Credits

Paligo

Structural diffing

Let's consider the following XML document, taken from the "Bridging the Gap" paper:

<?xml version="1.0"?>
<book>
  <chapter>
    <title>Text 1</title>
    <para>Text 2</para>
  </chapter>
  <chapter>
    <title>Text 4</title>
    <para>Text 5</para>
  </chapter>
  <chapter>
    <title>Text 6</title>
    <para>Text 7<img/>Text 8</para>
  </chapter>
  <chapter>
    <title>Text 9</title>
    <para>Text 10</para>
  </chapter>
  <chapter>
    <para>Text 11</para>
    <para>Text 12</para>
  </chapter>
</book>

We'll call that "document A", the "before" of the diffing. Here's the "after", "document B":

<?xml version="1.0"?>
<book>
  <chapter>
    <para>Text 2</para>
  </chapter>
  <chapter>
    <title>Text 4</title>
    <para>Text 25</para>
    <para>Text 11</para>
  </chapter>
  <chapter>
    <title>Text 6</title>
    <para>Text 7<img/>Text 8</para>
  </chapter>
  <chapter>
    <title>Text 9</title>
    <para>Text 10</para>
  </chapter>
  <chapter>
    <para>Text 12</para>
  </chapter>
</book>

Let's present both as trees with numbered nodes (the root node, 0, is not shown). Here's document A:

graph TD;
    1[1 book]-->2
	  2[2 chapter]-->3
	  2-->5
	  3[3 title]-->4
	  4[4 Text 1]
	  5[5 para]-->6
	  6[6 Text 2]
	  1-->7
	  7[7 chapter] --> 8
	  8[8 title] --> 9
	  9[9 Text 4]
	  7 --> 10
	  10[10 para] --> 11
	  11[11 Text 5]
	  1 --> 12
	  12[12 chapter] --> 13
	  13[13 title] --> 14
	  14[14 Text 6]
	  12-->15
	  15[15 para] --> 16
	  15 --> 17
	  15 --> 18
	  16[16 Text 7]
	  17[17 img]
	  18[18 Text 8]
	  1 --> 19
	  19[19 chapter]
	  19 --> 20
	  20[20 title] --> 21
	  21[21 Text 9]
	  19 --> 22
	  22[22 para] --> 23
	  23[23 Text 10]
	  1 --> 24
	  24[24 chapter] --> 25
	  25[25 para] --> 26
	  26[26 Text 11]
	  24 --> 27
	  27[27 para] --> 28
	  28[28 Text 12]
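
The numbering in the tree above follows document order: each node gets the next number as it is first visited, parents before children. As a minimal sketch of that numbering (not the crate's actual API; the `Node` type and the `el`, `text`, `document_a` and `numbered` helpers are hypothetical names made up for this illustration, and whitespace-only text nodes are assumed to be ignored), document A can be built by hand and numbered like this:

```rust
/// A minimal XML-like tree: an element with children, or a text node.
/// Hypothetical type for illustration only, not part of natural-xml-diff.
enum Node {
    Element(&'static str, Vec<Node>),
    Text(&'static str),
}

fn el(name: &'static str, children: Vec<Node>) -> Node {
    Node::Element(name, children)
}

fn text(value: &'static str) -> Node {
    Node::Text(value)
}

/// Document A from the text, without whitespace-only text nodes.
fn document_a() -> Node {
    el("book", vec![
        el("chapter", vec![
            el("title", vec![text("Text 1")]),
            el("para", vec![text("Text 2")]),
        ]),
        el("chapter", vec![
            el("title", vec![text("Text 4")]),
            el("para", vec![text("Text 5")]),
        ]),
        el("chapter", vec![
            el("title", vec![text("Text 6")]),
            el("para", vec![text("Text 7"), el("img", vec![]), text("Text 8")]),
        ]),
        el("chapter", vec![
            el("title", vec![text("Text 9")]),
            el("para", vec![text("Text 10")]),
        ]),
        el("chapter", vec![
            el("para", vec![text("Text 11")]),
            el("para", vec![text("Text 12")]),
        ]),
    ])
}

/// Visit the tree in pre-order, recording (number, label) for each node.
fn visit(node: &Node, next: &mut usize, out: &mut Vec<(usize, String)>) {
    let id = *next;
    *next += 1;
    match node {
        Node::Element(name, children) => {
            out.push((id, name.to_string()));
            for child in children {
                visit(child, next, out);
            }
        }
        Node::Text(value) => out.push((id, value.to_string())),
    }
}

/// Number all nodes starting at 1; 0 is reserved for the document root.
fn numbered(root: &Node) -> Vec<(usize, String)> {
    let mut out = Vec::new();
    let mut next = 1;
    visit(root, &mut next, &mut out);
    out
}

fn main() {
    for (id, label) in numbered(&document_a()) {
        println!("{id} {label}");
    }
}
```

Running this lists the nodes in the same order as the tree, from `1 book` through `28 Text 12`.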

Maintaining the tests

Some tests use test_generator to generate test cases from the testdata directory. New files in that directory are not picked up automatically, however: you have to force a recompile of the .rs files that run the tests. One way to do this is to make an insignificant whitespace edit in each .rs file that uses test_generator and save it. I hope there's a better solution.
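
One possible alternative, offered only as an untested sketch, is a Cargo build script that declares the testdata directory as a rebuild trigger. The `cargo:rerun-if-changed` directive is a standard Cargo feature, and Cargo scans a directory given to it for modifications. Whether this reliably makes test_generator regenerate its tests has not been verified for this crate, and the `rerun_directive` helper is a hypothetical name used here only to keep the example testable.

```rust
// build.rs (a sketch; assumes the test data lives in `testdata`
// next to Cargo.toml, as described above)
use std::path::Path;

/// Build the Cargo directive that requests a rebuild when `path` changes.
fn rerun_directive(path: &Path) -> String {
    format!("cargo:rerun-if-changed={}", path.display())
}

fn main() {
    // Cargo reads directives from the build script's standard output.
    println!("{}", rerun_directive(Path::new("testdata")));
}
```

If this works as hoped, adding or changing a file under testdata would rebuild the package without any manual whitespace edits.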
