Crates.io | litua |
lib.rs | litua |
version | 2.0.0 |
source | src |
created_at | 2023-01-27 10:39:58.507073 |
updated_at | 2023-04-12 09:18:56.58043 |
description | Read a text document, receive its tree in Lua and manipulate it before representing it as string |
repository | https://github.com/typho/litua |
id | 769541 |
size | 1,429,172 |
Read a text document, receive its tree in Lua and manipulate it before representing it as string.
Text documents occur in many contexts. We like them as a simple means to document ideas and concepts, and they help us communicate. But sometimes we want to transform them into other text formats or process their content. litua helps with that in a particular way.
You can write a text document like this:
In olden times when wishing still helped one, there lived a king whose daughters were all beautiful; and the youngest was so beautiful that the sun itself, which has seen so much, was astonished whenever it shone in her face.
But this text is boring. You usually care about markup. Markup consists of special instructions that annotate text:
In olden times when wishing still helped one, there lived a {bold king} whose daughters were all {italic beautiful}; and the youngest was so beautiful that the sun itself, which has seen so much, was astonished whenever it shone in her face.
In this case, the texts {bold X} and {italic Y} have some special meaning. For example, it could mean that the text is represented with a special style (e.g. X in a bold font and Y in cursive script). In general, we define litua input syntax in the following manner:
{element[attr1=value1][attribute2=val2] text content of element}
And finally, I will tell you a secret: value1, val2, and text content of element need not be text, but can also be an element itself. Thus, the following is permitted in litua input syntax:
{bold[font-face=Bullshit Sans] {italic Blockchain managed information density}}
In this sense, litua input syntax is very similar to XML (<element attr1="value1" attribute2="val2">text content of element</element>), LISP (e.g. (element :attr1 "value1" :attribute2 "val2" "text content of element")), and markup languages in general. By the way, if you literally need a { or } in your document, you can escape these semantics by writing {left-curly-brace} or {right-curly-brace} respectively instead. litua input syntax files must always be encoded in UTF-8.
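If you generate litua input from existing text, the escaping rule above can be applied mechanically. The following is a minimal sketch in plain Lua; escape_braces is a hypothetical helper name of mine and not part of litua:

```lua
-- Hypothetical helper (not part of litua): replace literal curly braces
-- by the escape elements described above.
local function escape_braces(text)
  -- With a table as replacement, gsub looks up each matched character.
  -- The result is built in one pass, so the braces inside the
  -- inserted escape elements are not escaped again.
  local escaped = text:gsub("[{}]", {
    ["{"] = "{left-curly-brace}",
    ["}"] = "{right-curly-brace}",
  })
  return escaped
end

print(escape_braces("a set {1, 2}"))
-- prints: a set {left-curly-brace}1, 2{right-curly-brace}
```

Only apply this to text that should appear literally; running it over markup you actually want would escape that markup as well.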
Let us put the element-example in litua input syntax into a text document (doc.lit). Then we can invoke litua:
bash$ litua doc.lit
The output is in the file with extension out: doc.out. And it is super-boring: It is exactly the input:
bash$ cat doc.out
{element[attr1=value1][attribute2=val2] text content of element}
It becomes interesting once I tell you that there is a representation of this element in Lua:
local node = {
    -- the string giving the node type
    ["call"] = "element",
    -- the key-value pairs of arguments.
    -- values are sequences of strings or nodes
    ["args"] = { ["attr1"] = { [1] = "value1" }, ["attribute2"] = { [1] = "val2" } },
    -- the sequence of elements occurring in the body of a node.
    -- the items of content can be strings or nodes themselves
    ["content"] = {
        [1] = "text content of element"
    },
}
For example, node.call allows you to access the name of the markup element. node.content[1] allows you to access the string which is the first and only content member of element in Lua. Remember that in Lua, the first element in a collection type is stored at index 1 (not 0 as in the majority of programming languages).
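Since content items can be strings or nodes themselves, code that reads a node usually has to recurse. Here is a small sketch in plain Lua for the nested bold/italic example from above. The table is built by hand and assumes that nested nodes use the same call/args/content layout; plain_text is a hypothetical helper name, and the exact table litua produces (for example regarding whitespace) may differ:

```lua
-- Hand-built stand-in for the nested example; litua's actual table may differ.
local nested = {
  call = "bold",
  args = { ["font-face"] = { "Bullshit Sans" } },
  content = {
    {
      call = "italic",
      args = {},
      content = { "Blockchain managed information density" },
    },
  },
}

-- Hypothetical helper: concatenate all strings found in the content,
-- descending into child nodes and ignoring the arguments.
local function plain_text(item)
  if type(item) == "string" then
    return item
  end
  local parts = {}
  for _, child in ipairs(item.content) do
    parts[#parts + 1] = plain_text(child)
  end
  return table.concat(parts)
end

print(plain_text(nested))
-- prints: Blockchain managed information density
```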
Now create a Lua file hooks.lua in the same directory (the name must start with hooks and must end with .lua) with the following content:
Litua.convert_node_to_string("element", function (node)
    return "The " .. tostring(node.call) .. " said: " .. tostring(node.content[1])
end)
Now let us invoke litua again:
bash$ litua doc.lit
[…]
bash$ cat doc.out
The element said: text content of element
Wow, we just modified how the document is processed 😍
In fact, we used a concept called hook to modify the behavior. We register a hook with convert_node_to_string to trigger the hook whenever litua tries to convert a node to a string. A hook is a Lua function. Let us read the Lua syntax:
Litua.convert_node_to_string("element", function (node)
    return "The " .. tostring(node.call) .. " said: " .. tostring(node.content[1])
end)
Litua.convert_node_to_string is a function, which is defined by litua whenever you run litua.
The first argument is "element", which tells litua when to call the second argument.
The second argument starts with the keyword function and ends with the keyword end. This is the hook. It is a function and takes one argument called node. It can run arbitrary code and specifically it returns a string in the end which is built from the data in the node variable. .. is the string concatenation operator in Lua and tostring is a builtin Lua function which converts any value into a string object.
The complete set of hooks is given here:
Litua.on_setup: set up the Litua.global variable as you need it
Litua.modify_initial_string: runs after on_setup and is meant to optionally pre-process the source code of the text document
Litua.read_new_node
Litua.modify_node: runs after read_new_node and allows you to actually modify a node
Litua.read_modified_node: runs after modify_node. It allows you to look at some node after modifying it
Litua.convert_node_to_string
Litua.modify_final_string
Litua.on_teardown: use Litua.global as you need it
Be aware that the document always lives within one invisible top-level node called document. So if you use a document element in your input file and define a hook for the element document as well, don't be surprised about the additional invocation of this hook.
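To connect the convert_node_to_string hook with the {bold king} markup from the beginning, here is a sketch of a hook that wraps the content of a bold element in an HTML tag. It only relies on the registration call shown above. The name bold_to_html is mine, and I assume here that calling tostring on each content item gives something useful, which may not hold for nested nodes. The registration is guarded so that the file also runs in a plain Lua interpreter where the Litua table does not exist:

```lua
-- Hypothetical hook body: join all content items and wrap them in <b>…</b>.
-- Assumption: every content item can be turned into text with tostring.
local function bold_to_html(node)
  local parts = {}
  for _, item in ipairs(node.content) do
    parts[#parts + 1] = tostring(item)
  end
  return "<b>" .. table.concat(parts) .. "</b>"
end

-- Litua is only defined when litua runs this file; skip registration otherwise.
if Litua then
  Litua.convert_node_to_string("bold", bold_to_html)
end

-- Trying the function on a hand-built node, outside of litua:
print(bold_to_html({ call = "bold", args = {}, content = { "king" } }))
-- prints: <b>king</b>
```

Keeping the hook as a named function makes it easy to try it on hand-built nodes before running litua on a real document.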
I highly recommend going through the examples in this order to get an idea of how to use the hooks:
Litua is a simple text processing utility for text documents with a hierarchical structure. It is reminiscent of tools like XSLT, but people often complain about XSLT being too foreign to common programming languages. As an alternative, I provide litua with a parser for the litua input syntax, a mapping of data from Rust to Lua, a runtime in Lua, and a writer for text files.
This is a single static executable. It only depends on basic system libraries like pthread, math and libc. It ships the entire Lua 5.4 interpreter with the executable. I expect it to work out-of-the-box on your operating system.
Call the litua executable with -h to get information about additional arguments:
litua -h
The following document defines the syntax (see also design/litua-lexer-state-diagram.jpg):
Node       = (Text | RawString | Function){0,…}
Text       = (NOT the symbols "{" or "}"){1,…}
RawString  = "{<" Whitespace (NOT the string Whitespace-and-">}") Whitespace ">}"
           | "{<<" Whitespace (NOT the string Whitespace-and-">>}") Whitespace ">>}"
           | "{<<<" Whitespace (NOT the string Whitespace-and-">>>}") Whitespace ">>>}"
           … continue up to 126 "<" characters
Function   = "{" Call "}"
           | "{" Call Whitespace "}"
           | "{" Call Whitespace Node "}"
           | "{" Call ( "[" Key "=" Node "]" ){1,…} "}"
           | "{" Call ( "[" Key "=" Node "]" ){1,…} Whitespace "}"
           | "{" Call ( "[" Key "=" Node "]" ){1,…} Whitespace Node "}"
Call       = (NOT the symbols "}", "[" or "<")(NOT the symbols "[" or "<"){0,…}
Key        = (NOT the symbol "="){1,…}
Whitespace = any of the 25 Unicode Whitespace characters
In essence, don't use "<" or "[" in function call names, or "=" in argument keys. Keep the number of opening and closing braces balanced (though this is not enforced by the syntax).
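Because balanced braces are not enforced by the syntax, a quick check before invoking litua can save some confusion. The following is a rough sketch in plain Lua; braces_balanced is a hypothetical helper and not part of litua. It deliberately ignores the RawString rule, so a document with unbalanced braces inside a raw string is reported as unbalanced even though it may be fine:

```lua
-- Hypothetical helper: check that every "{" has a matching "}".
-- Simplification: raw strings are not treated specially.
local function braces_balanced(text)
  local depth = 0
  for ch in text:gmatch("[{}]") do
    if ch == "{" then
      depth = depth + 1
    else
      depth = depth - 1
    end
    -- a closing brace without a preceding opening brace
    if depth < 0 then
      return false
    end
  end
  return depth == 0
end

print(braces_balanced("{bold[font-face=Sans] {italic text}}"))  -- prints: true
print(braces_balanced("{bold king"))                            -- prints: false
```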
The following parts can be improved:
.parsed.expected files are not checked in the testsuite, because Rust's HashMap representation is not consistent across builds.
The source code is available at GitHub.
See the LICENSE file (Hint: MIT license).
Please report any issues on the GitHub issues page.