; A CONL document is valid UTF-8, parsed line by line. ; An empty document is valid and represents "no value" ; which can be coerced to an empty map or list as appropriate. ; ; In accordance with tradition, a newline may be specified with ; either a newline (U+000A) or carriage return (U+000D), or both: newline = "(\r|\n|\r\n)" ; Within a line, you can use tabs (U+0009) or spaces (U+0020) for blanks. Other ; unicode spaces (e.g. U+200B) are treated as any other character (so that ; parsing is not dependent on Unicode version or multibyte characters). blank = "[ \t]" ; A comment begins with a ; (U+003B), and continues until the next newline. ; Any lines containing only blanks or only comments are ignored. comment = ";[^\r\n]*" ; The indent level of a line is the string of tab and space characters at the start. ; Lines that contain no non-blank characters, or only blanks followed by a comment, ; are assumed to have the same indentation as the previous line. ; ; After a newline, outdents are generated until the indent level matches a preceding line. ; (one outdent per indent token generated since that line). ; After which, if the indent is longer than the previous line, an indent token is generated. after_newline = outdent* indent? ; A quoted scalar begins with a double quote (U+0022) and ends with the same character. Quoted scalars provide escape sequences for hard to type characters quoted_scalar = "/\"([^\\\\\"\\r\\n]|\\\\(\\\\|\"|t|r|n|\\{[09a-fA-F]{1,8}\\}))*\"/" ; The escape sequences are: escape_sequences \\ = "\\" ; '\' \" = "\"" ; '"' \t = "\t" ; tab \r = "\r" ; carriage return \n = "\n" ; newline \{[09a-fA-F]{1,8}} ; unicode codepoint from U+0000 to U+10FFFF (unpaired surrogates are not allowed). ; A multiline scalar begins with three double quotes (U+0022) and ends at the ; next non-blank line at which the indent level is less than or equual to the ; current one. They may optionally contain a hint for syntax highlighters, but ; this is not part of the value. The hint may contain any character except ;/". ; As with values leading/trailing blanks are ignored but internal blanks are ; preserved. ; When parsing a multiline scalar: ; * leading and trailing whitespace and blanks are removed. ; * carraige returns are normalized so each line ends with just \n. ; * any blank lines that contain less indent than the first line of the value are treated as \n. ; * The ; character is treated as a literal semi-colon (not a comment). multiline_scalar = """ """ blank? multiline_hint blank? comment? newline indent (.*) outdent multiline_hint = "/[^\";\r\n \t]([^\r\n;\"]*[^;\r\n \t])?/" ; A map key can contain any character except ;, =, \r, \n; and cannot start with ; a blank or quote, or end with a blank. It can also be a quoted scalar: map_key = "/[^\";=\r\n \t]([^\r\n=;]*[^=;\r\n \t])?/" = quoted_scalar ; A list item is denoted by a single = sign. list_item = "=" ; A scalar value can either be a quoted scalar, a multiline scalar, or a sequence of any characters ; except ;, \r, \n. As with keys they cannot start with a blank or quote, or end with a blank. scalar = "/[^\";\r\n \t]([^\r\n;]*[^;\r\n \t])?/" = quoted_scalar = multiline_scalar ; A map section consists of a series of key value pairs, separated by newlines. map_section = map_key blank* "=" blank* scalar blank* comment? newline = map_key blank* "="? blank* comment? indent section outdent list_section = list_item blank* scalar blank* comment? newline = list_item blank* comment? indent section outdent section = map_section = list_section