Crates.io | natural-xml-diff |
lib.rs | natural-xml-diff |
version | 0.2.0 |
source | src |
created_at | 2022-12-19 15:53:36.03268 |
updated_at | 2023-01-03 14:44:35.984734 |
description | Natural diffing between XML documents |
homepage | https://github.com/faassen/natural-xml-diff |
repository | http://github.com/faassen/natural-xml-diff |
id | 741383 |
size | 201,038 |
The natural-xml-diff crate implements a diffing algorithm that attempts to produce correct and human-readable differences between two XML documents.
The algorithm implemented by this library is based on the paper "Bridging the gap between tracking and detecting changes on XML". It is also implemented by the Java-based jndiff library.
This is still a work in progress!
Let's consider the following XML document, taken from the "Bridging the Gap" paper:
<?xml version="1.0"?>
<book>
<chapter>
<title>Text 1</title>
<para>Text 2</para>
</chapter>
<chapter>
<title>Text 4</title>
<para>Text 5</para>
</chapter>
<chapter>
<title>Text 6</title>
<para>Text 7<img/>Text 8</para>
</chapter>
<chapter>
<title>Text 9</title>
<para>Text 10</para>
</chapter>
<chapter>
<para>Text 11</para>
<para>Text 12</para>
</chapter>
</book>
We'll call that "document A", the "before" of the diffing. Here's the "after", "document B":
<?xml version="1.0"?>
<book>
<chapter>
<para>Text 2</para>
</chapter>
<chapter>
<title>Text 4</title>
<para>Text 25</para>
<para>Text 11</para>
</chapter>
<chapter>
<title>Text 6</title>
<para>Text 7<img/>Text 8</para>
</chapter>
<chapter>
<title>Text 9</title>
<para>Text 10</para>
</chapter>
<chapter>
<para>Text 12</para>
</chapter>
</book>
Let's present both as trees with numbered nodes (the root node, 0, is not shown). Here's document A:
graph TD;
1[1 book]-->2
2[2 chapter]-->3
2-->5
3[3 title]-->4
4[4 Text 1]
5[5 para]-->6
6[6 Text 2]
1-->7
7[7 chapter] --> 8
8[8 title] --> 9
9[9 Text 4]
7 --> 10
10[10 para] --> 11
11[11 Text 5]
1 --> 12
12[12 chapter] --> 13
13[13 title] --> 14
14[14 Text 6]
12-->15
15[15 para] --> 16
15 --> 17
15 --> 18
16[16 Text 7]
17[17 img]
18[18 Text 8]
1 --> 19
19[19 chapter]
19 --> 20
20[20 title] --> 21
21[21 Text 9]
19 --> 22
22[22 para] --> 23
23[23 Text 10]
1 --> 24
24[24 chapter] --> 25
25[25 para] --> 26
26[26 Text 11]
24 --> 27
27[27 para] --> 28
28[28 Text 12]
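The numbering in the tree above appears to follow document order (a pre-order walk), with 0 reserved for the document root. Here is a minimal sketch of that numbering, applied to the first chapter of document A. The `Node` type and the functions here are hypothetical and purely illustrative; they are not this crate's API.

```rust
/// A minimal tree node: an element name or a text value, plus its children.
/// (Hypothetical type for illustration; not part of natural-xml-diff.)
struct Node {
    label: String,
    children: Vec<Node>,
}

impl Node {
    fn new(label: &str, children: Vec<Node>) -> Self {
        Node { label: label.to_string(), children }
    }

    fn leaf(label: &str) -> Self {
        Node::new(label, Vec::new())
    }
}

/// Visit `node` before its children, handing out consecutive numbers.
fn number_nodes(node: &Node, next: &mut usize, out: &mut Vec<(usize, String)>) {
    out.push((*next, node.label.clone()));
    *next += 1;
    for child in &node.children {
        number_nodes(child, next, out);
    }
}

/// Number a whole tree, starting at 1 because 0 is the (unshown) document root.
fn numbered(root: &Node) -> Vec<(usize, String)> {
    let mut out = Vec::new();
    let mut next = 1;
    number_nodes(root, &mut next, &mut out);
    out
}

/// The `book` element with only the first chapter of document A.
fn sample_book() -> Node {
    Node::new(
        "book",
        vec![Node::new(
            "chapter",
            vec![
                Node::new("title", vec![Node::leaf("Text 1")]),
                Node::new("para", vec![Node::leaf("Text 2")]),
            ],
        )],
    )
}

fn main() {
    for (number, label) in numbered(&sample_book()) {
        println!("{number} {label}");
    }
}
```

The resulting pairs line up with nodes 1 through 6 in the graph for document A.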
Some tests use test_generator to generate tests from the testdata directory.
New tests in that directory aren't automatically picked up, however; you have to force a recompile of the .rs files that run the tests. You can do this by making a non-significant whitespace edit in each .rs file that uses test_generator and saving it. I hope there's a better solution.
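One possible approach, untested with this crate, is a build script that asks Cargo to watch the testdata directory. The sketch below assumes a hypothetical build.rs at the package root; `cargo:rerun-if-changed` is a standard Cargo build-script directive, and when it points at a directory Cargo scans that directory for modifications. Rerunning the build script should mark the package as needing a rebuild, which in turn should make the test_generator macros expand again, but that last step is an assumption worth verifying.

```rust
// build.rs (hypothetical; this file is an assumption, not something the
// crate is known to ship)

/// Build the Cargo directive that re-runs this script when `path` changes.
fn rerun_directive(path: &str) -> String {
    format!("cargo:rerun-if-changed={path}")
}

fn main() {
    // Cargo reads directives from the build script's standard output.
    println!("{}", rerun_directive("testdata"));
}
```

If this works, adding a file under testdata would trigger a rebuild on the next `cargo test` without touching any .rs file.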