# roxmltree parsing strategy
XML parsing is hard. Everyone knows that. The other problem is that a parsed
document can be represented in very different ways:

- You can preserve comments or ignore them completely or partially.
- You can represent text data as a separate node or embed it into the element node.
- You can keep CDATA as a separate node or merge it into the text node.
- You can preserve the XML declaration or ignore it completely.
- ... and many more.

This document explains how *roxmltree* parses and represents an XML document.
## XML declaration
The [XML declaration](https://www.w3.org/TR/xml/#NT-XMLDecl) is completely ignored,
mostly because it doesn't contain any valuable information for us:
- `version` is expected to be `1.*`. Otherwise an error will occur.
- `encoding` is irrelevant since we are parsing only valid UTF-8 strings.
- And no one really follows the `standalone` constraints.
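
The `version` rule above can be sketched as a simple check on the version string. This is only an illustration: `check_version` is a hypothetical helper written for this document, not a function from *roxmltree*, and the real parser may validate the value differently. It assumes the `'1.' [0-9]+` shape that the XML specification gives for a version number.

```rust
/// Accepts any `1.*` version string (for example `1.0` or `1.1`) and
/// rejects everything else.
/// Hypothetical helper for illustration; not part of the roxmltree API.
fn check_version(version: &str) -> Result<(), String> {
    match version.strip_prefix("1.") {
        // Require at least one character after `1.`, and only ASCII digits.
        Some(minor) if !minor.is_empty() && minor.bytes().all(|b| b.is_ascii_digit()) => Ok(()),
        _ => Err(format!("unsupported XML version: {}", version)),
    }
}
```

With this sketch, `check_version("1.0")` and `check_version("1.1")` succeed, while `check_version("2.0")` returns an error.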
## DTD
Only `ENTITY` declarations are resolved. Everything else is ignored
at the moment.
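```xml
<!DOCTYPE test [
    <!ENTITY a 'text<p/>text'>
]>
<e>&a;</e>
```

it will be parsed as `<e>text<p/>text</e>`.

## Whitespace

Whitespace inside an element is preserved:

```xml
<p>
    text
</p>
```

it will be parsed as `\n␣␣␣␣text\n`.

The same goes for escaped whitespace:

```xml
<p>&#x20;&#x20;text&#x20;&#x20;</p>
```

it will be parsed as `␣␣text␣␣`.

## CDATA

CDATA will be embedded into the text node:

```xml
<p>t<![CDATA[e x]]>t</p>
```

it will be parsed as `te xt`.

## Text

Text will be unescaped. All entity references will be resolved.

```xml
<!DOCTYPE test [
    <!ENTITY b 'Some&#x20;text'>
]>
<p>&b;</p>
```

it will be parsed as `Some text`.

## Attribute-Value Normalization

[Attribute-Value Normalization](https://www.w3.org/TR/xml/#AVNormalize) works as explained in the spec.

## Namespaces resolving

*roxmltree* has complete support for XML namespaces.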
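
To give a rough idea of what resolving namespaces involves, here is a minimal sketch of prefix lookup over nested `xmlns` declarations. It is not how *roxmltree* is implemented: `Scope` and `resolve_prefix` are hypothetical names introduced only for this example, and the sketch assumes each element's declarations have already been collected into a map.

```rust
use std::collections::HashMap;

/// One element's `xmlns` declarations: prefix -> URI.
/// The empty prefix "" stands for the default namespace (`xmlns="..."`).
/// Hypothetical type for illustration; not part of the roxmltree API.
type Scope<'a> = HashMap<&'a str, &'a str>;

/// Resolves `prefix` against a stack of scopes ordered from the root
/// element (first) to the current element (last).
/// Hypothetical helper for illustration; not part of the roxmltree API.
fn resolve_prefix<'a>(scopes: &[Scope<'a>], prefix: &str) -> Option<&'a str> {
    // The `xml` prefix is always bound and never needs a declaration.
    if prefix == "xml" {
        return Some("http://www.w3.org/XML/1998/namespace");
    }
    scopes
        .iter()
        // Inner declarations shadow outer ones, so search innermost first.
        .rev()
        .find_map(|scope| scope.get(prefix).copied())
        // An empty URI (`xmlns=""`) removes the binding instead of setting one.
        .filter(|uri| !uri.is_empty())
}
```

In this sketch the scope stack grows by one entry per open element, so a child element that redeclares a prefix hides the parent's binding only for its own subtree.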