1. Status of this document
This document is an early draft.
2. Introduction
2.1. Background
This section is non-normative.
MNX-Generic is a general format for representing musical scores in terms of linked graphical media, audio media and performance data.
In contrast to MNX-Common, there is no attempt to represent semantics directly in MNX-Generic. Thus, MNX-Generic can be described as a low-level, literal format that represents instances of scores, rather than their semantic content. MNX-Generic is intended to support applications which must be able to faithfully execute a visual and/or audible rendition of a score, with an awareness of the relationship between what is seen and what is heard.
MNX-Generic can be employed as a target format for applications that render semantic notation into media. And even though MNX-Generic is not a semantic format, MNX-Generic elements may cross-reference elements in a semantic source document that was rendered into MNX-Generic. This supports a connection between the original semantic markup and an MNX-Generic rendering of same.
Given MNX-Generic’s characteristics as a target format, some features of MNX-Generic are employed within MNX-Common to provide literal descriptions of rendering where semantic information does not suffice to yield the desired musical result.
The only constraints on the nature of an MNX-Generic score are:
- The visual content of the score must be encoded in SVG.
- The audible content of the score must be encoded either as audio media or performance data.
2.2. Use cases
This section is non-normative.
A companion document details a set of known use cases for music notation.
2.3. Audience
This section is non-normative.
This specification is intended for authors of documents and applications that use the features defined in this specification, implementors of tools that operate on documents that use the features defined in this specification, and individuals wishing to establish the correctness of documents or implementations with respect to the requirements of this specification.
This document is probably not suited to readers who do not already have at least a passing familiarity with XML technologies. In places it sacrifices clarity for precision, and brevity for completeness. More approachable tutorials and authoring guides can provide a gentler introduction to the topic.
2.4. Design notes
This section is non-normative.
Some general principles regarding the design of this specification follow.
- Address literal encoding, not semantic encoding.
- MNX includes two separate approaches to encoding music: high-level semantic encodings described by MNX-Common (and other future modules), and low-level literal encodings described by MNX-Generic. The literal encoding attempts to eliminate cultural and semantic assumptions within its scope, while still allowing linkage between the literal and semantic layers.
- Leverage existing value in the world
- The ecosystem of the Web is broad and valuable. MNX attempts to exploit this by making use of existing patterns and tooling. Examples include the reuse of many CSS concepts, and the ability to employ completely standard SVG documents within MNX-Generic without need of alteration.
2.4.1. Extensibility
This section is non-normative.
Content TBD
2.5. Structure of this specification
This section is non-normative.
This specification is divided into the following major sections:
- § 2 Introduction: Non-normative materials providing a context for this specification.
- § 3 Infrastructure: Scaffolding material on which the remainder of the specification relies.
- § 4 Document structure: The elements that make up the MNX-Generic format.
2.5.1. How to read this specification
As described in the conformance requirements section below, this specification describes conformance criteria for a variety of conformance classes. In particular, there are conformance requirements that apply to producers, for example authors and the documents they create, and there are conformance requirements that apply to consumers, for example Web browsers. They can be distinguished by what they are requiring: a requirement on a producer states what is allowed, while a requirement on a consumer states how software is to act.
For example, "the foo attribute’s value must be a valid integer" is a requirement on producers, as it lays out the allowed values; in contrast, the requirement "the foo attribute’s value must be parsed using the rules for parsing integers" is a requirement on consumers, as it describes how to process the content. Requirements on producers have no bearing whatsoever on consumers.
2.5.2. Typographic conventions
This is a note.
This is a warning.
/* this is a CSS fragment */
The defining instance of a term is marked up like this. Uses of that term are marked up like this or like this.
The defining instance of an element, attribute, or API is marked up like this. References to that element, attribute, or API are marked up like this.
Other code fragments are marked up like this.
Byte sequences with bytes in the range 0x00 to 0x7F, inclusive, are marked up like this.
Variables are marked up like this.
In some cases, requirements are given in the form of lists with conditions and corresponding requirements. In such cases, the requirements that apply to a condition are always the first set of requirements that follow the condition, even in the case of there being multiple sets of conditions for those requirements. Such cases are presented as follows:
- This is a condition
- This is another condition
- This is the requirement that applies to the conditions above.
- This is a third condition
- This is the requirement that applies to the third condition.
2.6. Suggested reading
This section is non-normative.
The following documents might be of interest to readers of this specification.
3. Infrastructure
3.1. Terminology
3.1.1. Notational idioms
A notational idiom is a set of rules in the world for encoding music as some set of visual markings, which can be interpreted by musicians to produce an audible performance.
3.1.1.1. Conventional Western music notation (CWMN)
This notational idiom comprises a set of notational rules common to (but not limited to) Western European music from circa 1600 to the present day.
3.1.2. Score profiles
A score profile is a set of constraints on the rules in a notational idiom. Score profiles are designed to narrow the set of constructs that can be produced or consumed in MNX-Generic to a practical scope.
3.2. Common syntaxes
There are various places in MNX-Generic that accept particular data types, such as note values, numbers or durations. This section describes the conformance criteria for content in those formats, and how to parse them.
3.2.1. Common parser idioms
The space characters, for the purposes of this specification, are U+0020 SPACE, U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED (FF), and U+000D CARRIAGE RETURN (CR).
The White_Space characters are those that have the Unicode property "White_Space" in the Unicode PropList.txt data file. [UNICODE]
This should not be confused with the "White_Space" value (abbreviated "WS") of the "Bidi_Class" property in the Unicode.txt data file.
The control characters are those whose Unicode "General_Category" property has the value "Cc" in the Unicode UnicodeData.txt data file. [UNICODE]
The uppercase ASCII letters are the characters in the range U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL LETTER Z.
The lowercase ASCII letters are the characters in the range U+0061 LATIN SMALL LETTER A to U+007A LATIN SMALL LETTER Z.
The ASCII letters are the characters that are either uppercase ASCII letters or lowercase ASCII letters.
The ASCII digits are the characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9).
The alphanumeric ASCII characters are those that are either uppercase ASCII letters, lowercase ASCII letters, or ASCII digits.
The ASCII hex digits are the characters in the ranges U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), U+0041 LATIN CAPITAL LETTER A to U+0046 LATIN CAPITAL LETTER F, and U+0061 LATIN SMALL LETTER A to U+0066 LATIN SMALL LETTER F.
The uppercase ASCII hex digits are the characters in the ranges U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9) and U+0041 LATIN CAPITAL LETTER A to U+0046 LATIN CAPITAL LETTER F only.
The lowercase ASCII hex digits are the characters in the ranges U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9) and U+0061 LATIN SMALL LETTER A to U+0066 LATIN SMALL LETTER F only.
Some of the micro-parsers described below follow the pattern of having an input variable that holds the string being parsed, and having a position variable pointing at the next character to parse in input.
For parsers based on this pattern, a step that requires the consumer to collect a sequence of characters means that the following algorithm must be run, with characters being the set of characters that can be collected:
- Let input and position be the same variables as those of the same name in the algorithm that invoked these steps.
- Let result be the empty string.
- While position doesn’t point past the end of input and the character at position is one of the characters, append that character to the end of result and advance position to the next character in input.
- Return result.
The step skip white space means that the consumer must collect a sequence of characters that are space characters. The collected characters are not used.
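This example is non-normative. The collection pattern above can be sketched as a small helper that takes the string, the current position, and a predicate describing the collectable characters. The names collect_sequence, skip_white_space and SPACE_CHARACTERS are illustrative assumptions, not part of this specification.

```python
# The space characters as listed in this section: SPACE, TAB, LF, FF, CR.
SPACE_CHARACTERS = " \t\n\f\r"


def collect_sequence(input_str, position, predicate):
    """Collect characters while `predicate` accepts them.

    Returns the collected string and the advanced position, since Python
    cannot share a mutable `position` variable with the caller.
    """
    result = []
    while position < len(input_str) and predicate(input_str[position]):
        result.append(input_str[position])
        position += 1
    return "".join(result), position


def skip_white_space(input_str, position):
    """Collect space characters and discard them, returning the new position."""
    _, position = collect_sequence(
        input_str, position, lambda c: c in SPACE_CHARACTERS
    )
    return position
```

Returning the new position alongside the result is one way to model the shared input and position variables; a small parser object holding both would work equally well.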
When a consumer is to strip line breaks from a string, the consumer must remove any U+000A LINE FEED (LF) and U+000D CARRIAGE RETURN (CR) characters from that string.
When a consumer is to strip leading and trailing white space from a string, the consumer must remove all space characters that are at the start or end of the string.
When a consumer is to strip and collapse white space in a string, it must replace any sequence of one or more consecutive space characters in that string with a single U+0020 SPACE character, and then strip leading and trailing white space from that string.
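This example is non-normative. The three stripping operations above might be sketched as follows; the function names are illustrative assumptions. Note that the set of space characters is passed explicitly, because Python's default str.strip() also removes characters that this specification does not class as space characters.

```python
import re

# The space characters as listed in this section: SPACE, TAB, LF, FF, CR.
SPACE_CHARACTERS = " \t\n\f\r"


def strip_line_breaks(s):
    """Remove every LF and CR character from the string."""
    return s.replace("\n", "").replace("\r", "")


def strip_leading_and_trailing_white_space(s):
    """Remove space characters at the start and end of the string."""
    return s.strip(SPACE_CHARACTERS)


def strip_and_collapse_white_space(s):
    """Collapse each run of space characters to one SPACE, then trim."""
    collapsed = re.sub(r"[ \t\n\f\r]+", " ", s)
    return strip_leading_and_trailing_white_space(collapsed)
```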
When a consumer has to strictly split a string on a particular delimiter character delimiter, it must use the following algorithm:
- Let input be the string being parsed.
- Let position be a pointer into input, initially pointing at the start of the string.
- Let tokens be an ordered list of tokens, initially empty.
- While position is not past the end of input:
  - Collect a sequence of characters that are not the delimiter character.
  - Append the string collected in the previous step to tokens.
  - Advance position to the next character in input.
- Return tokens.
For the special cases of splitting a string on spaces and on commas, this algorithm does not apply (those algorithms also perform white space trimming).
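This example is non-normative. A minimal sketch of the strict splitting algorithm follows; the name strictly_split is an illustrative assumption. It differs from Python's built-in str.split in that a trailing delimiter does not produce a final empty token, and an empty input produces no tokens at all.

```python
def strictly_split(input_str, delimiter):
    """Split `input_str` on a single delimiter character, step by step."""
    tokens = []
    position = 0
    while position < len(input_str):
        # Collect a sequence of characters that are not the delimiter.
        start = position
        while position < len(input_str) and input_str[position] != delimiter:
            position += 1
        tokens.append(input_str[start:position])
        # Advance past the delimiter (or past the end of the input).
        position += 1
    return tokens
```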
3.2.2. Numbers
3.2.2.1. Rational numbers
A string is a rational number if it is either an integer, or a pair of integers separated by a U+002F SLASH whose second element is nonzero.
The rules for parsing rational numbers are as given in the following algorithm. When invoked, the steps must be followed in the order given, aborting at the first step that returns a value. This algorithm will return a pair of integers, one for the numerator and one for the denominator which must be nonzero, or an error.
- Let input be the string being parsed.
- Let position be a pointer into input, initially pointing at the start of the string.
- Let fraction be an initially empty list of integers.
- Collect a sequence of characters that are space characters. These are skipped.
- While position is not past the end of input, and fraction contains fewer than two elements:
  - Collect a sequence of characters that are not space characters, ASCII digits, U+002D HYPHEN-MINUS or U+002F SLASH characters. This skips past leading garbage.
  - Collect a sequence of characters that are not space characters or U+002F SLASH, and let unparsed number be the result.
  - Let number be the result of parsing unparsed number using the rules for parsing signed integers.
  - If number is an error, set number to zero.
  - Append number to fraction.
  - Collect a sequence of characters that are space characters, or U+002F SLASH.
- If fraction has no elements, return zero.
- If fraction has only one element, append 1 to fraction.
- Return the first element of fraction as the numerator and the second element of fraction as the denominator.
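This example is non-normative. The steps above can be sketched as shown below, under several assumptions that go beyond the text: the rules for parsing signed integers are not defined in this section, so parse_signed_integer is a hypothetical stand-in that accepts an optional sign followed by ASCII digits; "return zero" is represented as the pair (0, 1); and a zero denominator is reported as an error after the loop, since the steps do not say where that error arises.

```python
import re

SPACE_CHARACTERS = " \t\n\f\r"
ASCII_DIGITS = "0123456789"


def collect(input_str, position, predicate):
    """Collect characters accepted by `predicate`; return (text, new position)."""
    start = position
    while position < len(input_str) and predicate(input_str[position]):
        position += 1
    return input_str[start:position], position


def parse_signed_integer(text):
    """Hypothetical stand-in for the rules for parsing signed integers."""
    match = re.match(r"[+-]?[0-9]+", text)
    return int(match.group()) if match else None  # None models "error"


def parse_rational(input_str):
    """Return (numerator, denominator), or raise ValueError on a zero denominator."""
    position = 0
    fraction = []
    _, position = collect(input_str, position, lambda c: c in SPACE_CHARACTERS)
    while position < len(input_str) and len(fraction) < 2:
        # Skip leading garbage.
        _, position = collect(
            input_str, position,
            lambda c: c not in SPACE_CHARACTERS
            and c not in ASCII_DIGITS and c not in "-/",
        )
        unparsed, position = collect(
            input_str, position,
            lambda c: c not in SPACE_CHARACTERS and c != "/",
        )
        number = parse_signed_integer(unparsed)
        if number is None:
            number = 0
        fraction.append(number)
        _, position = collect(
            input_str, position,
            lambda c: c in SPACE_CHARACTERS or c == "/",
        )
    if not fraction:
        return (0, 1)  # assumption: "zero" expressed as 0/1
    if len(fraction) == 1:
        fraction.append(1)
    if fraction[1] == 0:
        raise ValueError("denominator must be nonzero")  # assumed error point
    return (fraction[0], fraction[1])
```

The result is deliberately left unreduced, so "2/4" yields (2, 4) rather than (1, 2); whether consumers should normalize is not stated here.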
3.2.3. Element locations
An element location constitutes a reference to a specific element in the document. It consists of the character #, immediately followed by the XML ID of the referenced element.
3.2.4. Style property lists
MNX-Generic supports a simple and compact style property list syntax, allowing a map of key-value pairs to be represented in a single string where the keys are names of style properties.
To parse a style property list:
- Let input be the string being parsed.
- Let defs be the result of strictly splitting the string input using U+003B SEMICOLON as a delimiter.
- Let properties be an empty map.
- While defs is not empty:
  - Let definition be the first element of defs, and remove it from defs.
  - Collect a sequence of characters from definition that are not U+003A COLON, and let property name be the result after stripping leading and trailing white space.
  - If property name is empty, return an error.
  - If the next character of definition is not U+003A COLON, return an error.
  - Skip the next character of definition.
  - Let property value be the remaining characters of definition, after stripping leading and trailing white space.
  - Add a new entry to properties with key property name and value property value.
- Return properties.
Examples include:
- color: red
  - A definition of the property color as having the value red.
- color: green;
  - A definition of the property color as having the value green. Note that a terminal ; is provided in this case, but has no effect.
- smufl-font: Bravura; color: red;
  - A definition of two properties: smufl-font with value Bravura, and color with value red.
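This example is non-normative. The style property list algorithm can be sketched as follows, reusing a strict split on U+003B SEMICOLON. The function names and the use of ValueError to model "return an error" are illustrative assumptions.

```python
SPACE_CHARACTERS = " \t\n\f\r"


def strictly_split(input_str, delimiter):
    """Strict split: a trailing delimiter yields no extra empty token."""
    tokens = []
    position = 0
    while position < len(input_str):
        start = position
        while position < len(input_str) and input_str[position] != delimiter:
            position += 1
        tokens.append(input_str[start:position])
        position += 1
    return tokens


def parse_style_property_list(input_str):
    """Parse 'name: value; name: value' into a dict, raising on malformed input."""
    properties = {}
    for definition in strictly_split(input_str, ";"):
        colon = definition.find(":")
        name_end = colon if colon != -1 else len(definition)
        name = definition[:name_end].strip(SPACE_CHARACTERS)
        if not name:
            raise ValueError("empty property name")
        if colon == -1:
            raise ValueError("missing ':' after property name")
        value = definition[colon + 1:].strip(SPACE_CHARACTERS)
        properties[name] = value
    return properties
```

Applied to the three examples above, this sketch yields one entry for each of the first two strings and two entries for the third.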
3.3. Content models and categories
Each element in MNX-Generic falls into zero or more categories that group elements with similar characteristics together. Examples of content categories include event content and sequence content, among many others.
3.3.1. Element definitions
Each element in this specification has a definition that includes the following information:
- Contexts: A non-normative description of where the element can be used. This information is redundant with the content models of elements that allow this one as a child, and is provided only as a convenience.
- Content model: A normative description of what content must be included as children and descendants of the element.
- Attributes: A normative list of attributes that may be specified on the element (except where otherwise disallowed), along with non-normative descriptions of those attributes. (The content to the left of the dash is normative, the content to the right of the dash is not.)
- Style properties: A normative list of style properties that may be specified on the element (except where otherwise disallowed), along with non-normative descriptions of those properties. Where these properties may be inherited from ancestor elements, this is indicated.
This is then followed by a description of what the element represents, along with any additional normative conformance criteria that may apply to producers and consumers and implementations. Examples are sometimes also included.
4. Document structure
4.1. Root structure and metadata
4.1.1. The mnx element
- Contexts:
  - None: this is the top-level element.
- Content Model:
  - A single, required head element.
  - Either a collection or a score element.
- Attributes:
  - None.
The mnx element encloses the document as a whole.
4.1.2. The head element
- Contexts:
  - Any.
- Content Model:
  - Metadata content.
  - stylesheet
- Attributes:
  - None.
The head element supplies overall descriptive information for an MNX-Generic document, such as document-scoped metadata or stylesheet definitions.
4.1.3. The collection element
- Contexts:
  - mnx, collection
- Content Model:
  - Any combination of collection and score elements.
- Attributes:
  - type - The type of the collection
The collection element describes a collection, which is an ordered sequence of elements that make up a compound musical document. Each child element of the collection may itself be either a collection or a score.
The type attribute determines the nature of the collection.
Valid collection type values include:
- movements - Each element comprises a movement of a work.
- sections - Each element comprises a section of a work, or of a movement.
- parts - Each element comprises a description of the same music, organized for different parts.
Metadata content or style properties may be included at any level of the resulting structure, causing them to apply only to those parts of the document.
4.1.4. The score element
- Contexts:
  - mnx, collection.
- Content Model:
  - Metadata content
  - Zero or one musical body elements.
- Attributes:
  - src - optional relative path to an external source file
The score element encloses a self-contained description of the score for a portion or the entirety of a musical work.
If the src attribute is provided, it specifies a relative path where the score’s musical body lives. Otherwise, the body must be provided within the content of the score element.
4.1.5. Metadata content
Metadata content may be included in many elements to supply bibliographic data and other descriptive information.
Many elements TBD. Need to harmonize with existing metadata and bibliographic standards.
4.1.6. The title element
- Contexts:
  - Any.
- Content Model:
  - Text
- Attributes:
  - None.
The title element assigns a title to its parent element in the context of the document as a whole.
4.1.7. The mnx-generic element
- Contexts:
  - Wherever a musical body is expected.
- Content Model:
  - Metadata content.
  - One or more score-view elements.
  - Performance content.
- Attributes:
  - None.
The mnx-generic element is a musical body that describes an MNX-Generic score as a whole.
The following example illustrates an entire MNX-Generic document; the elements are described individually in the remainder of this section.
<mnx-generic>
  <score-view id="page1" view="score.svg#page1"/>
  <score-view id="page2" view="score.svg#page2"/>
  <score-view id="page3" view="score.svg#page3"/>
  <performance-audio>
    <performance-audio-media src="score.mp4"/>
    <performance-mapping>
      <performance-region start="0" end="0.72" view="page1" region="m1"/>
      <performance-region start="0.72" end="1.43" view="page1" region="m2"/>
      <performance-region start="1.43" end="2.99" view="page1" region="m3"/>
      <performance-region start="2.99" end="3.65" view="page1" region="m4"/>
    </performance-mapping>
  </performance-audio>
  <performance-data>
    <performance-tempo beat="/4" bpm="80"/>
    <performance-mapping>
      <performance-region start="0" end="1" view="page1" region="m1"/>
      <performance-region start="1" end="2" view="page1" region="m2"/>
      <performance-region start="2" end="3" view="page1" region="m3"/>
      <performance-region start="3" end="4" view="page1" region="m4"/>
    </performance-mapping>
    <performance-part>
      <instrument-sound>strings.violin</instrument-sound>
      <performance-event start="0" duration="1/4" pitch="C4" dynamics="100"/>
      <performance-event start="1/4" duration="1/4" pitch="D4" dynamics="100"/>
      ...following events...
    </performance-part>
  </performance-data>
</mnx-generic>
4.2. Graphics media
4.2.1. The score-view element
- Contexts:
  - mnx-generic
- Content Model:
  - Any number of score-mapping elements.
- Attributes:
  - view - link to an SVG view of the score
The score-view element references a specific view within a separate SVG document, via the URL provided in the view attribute. This URL must follow the rules for linking into SVG content.
Each score-view element represents a single page of the score. A default sequence of pages is established by the order of occurrence of score-view elements within the document.
The sequence of page presentation in conjunction with performance content may differ from the default sequence, according to the mapping between performance and graphics.
4.2.2. The score-mapping element
- Contexts:
  - score-view
- Content Model:
  - None.
- Attributes:
  - graphics - an element ID within the SVG content described by the parent score-view
  - semantics - one or more optional IDs of corresponding element(s) within source semantic documents
The score-mapping element supplies information on the correspondence between an SVG element in a score-view, and sets of other semantic elements in this or other documents.
The graphics attribute is required, and gives a single ID of an element in the score view’s SVG content. There is no restriction on the nature or structure of this element, nor on its relationship to other elements.
The optional semantics attribute supplies one or more IDs of elements in a semantic source document, for example an MNX-Common document. This asserts that each of the referenced semantic source elements is considered as generating the SVG content described by graphics.
Note: While this element describes only a single SVG element, it is commonly the case that multiple SVG graphics may be associated with the same semantic source.
4.3. Performance content
The category of performance content includes both of the following:
- audio media supplying a performance of an MNX-Generic score in an audio file format
- performance data, describing a performance of an MNX-Generic score in terms of discrete, parameterized sonic events
4.3.1. The performance-audio element
- Contexts:
  - mnx-generic
- Content Model:
  - Metadata content.
  - Zero or more performance-tempo elements.
  - Zero or one performance-mapping elements.
  - One or more performance-audio-media elements.
The performance-audio element defines one or more audio media files that constitute a single performance of the score, and whose contents are presumed to be temporally synchronized with each other.
Additionally, performance-tempo elements may establish a proportional mapping between an arbitrary notated time unit and a time interval. This mapping may change throughout the course of the performance. If no such elements occur, the notated time unit is defined as equal to 1 second of performance time.
A set of optional performance-mapping elements, if given, may establish a mapping between the performance data and the graphical score.
4.3.2. The performance-audio-media element
- Contexts:
  - performance-audio
- Content Model:
  - Metadata content.
- Attributes:
  - src - URL of an audio file of the score
The performance-audio-media element includes an audio media file, via the URL provided in the src attribute.
4.3.3. The performance-data element
- Contexts:
  - mnx-generic
- Content Model:
  - Metadata content.
  - One or more performance-part elements.
  - Zero or more performance-tempo elements.
  - Zero or one performance-mapping elements.
- Attributes:
The performance-data element provides performance data in the form of discrete sonic events suitable for synthesis or analysis.
It consists of some number of parts, plus optional mappings between performance time and regions of graphical media.
Additionally, performance-tempo elements may establish a proportional mapping between an arbitrary notated time unit and a time interval. This mapping may change throughout the course of the performance. If no such elements occur, the notated time unit is defined as equal to 1 second of performance time.
A set of optional performance-mapping elements, if given, may establish a mapping between the performance data and the graphical score.
4.3.4. The performance-part element
- Contexts:
  - performance-data
- Content Model:
  - Metadata content.
  - Zero or more performance-event elements.
- Attributes:
  - instrument-sound - the sound ID of the instrument for this part.
The performance-part element organizes a list of performance-event elements, within a given performance.
The instrument-sound attribute gives the MusicXML sound ID of the instrument for this part.
Note: performance-part elements do not necessarily correspond to MNX-Common
4.3.5. The performance-tempo element
- Contexts:
  - performance-data, performance-audio, interpret
- Content Model:
  - None.
- Attributes:
  - start - start time of this performance tempo
  - beat - notated time units per beat
  - bpm - number of beats per minute
The performance-tempo element describes a proportional relationship between time and an arbitrary notated time unit that may be used by the score.
This relationship applies to a time range beginning at the time in seconds specified by start and continuing until the next performance-tempo element. The default value is 0.
The beat attribute is a note value which establishes a beat as some fraction or multiple of a notated time unit (which in CWMN is a whole note by convention). The default value is 1.
The bpm attribute establishes a tempo, expressed as a valid floating-point number giving the number of beats per minute. The default value is 60.
NOTE: The defaults for both of the above attributes establish a notated time unit as equal to 1 second. Thus, if no attribute values are provided, score time is equal to real performance time.
NOTE: The set of performance-tempo elements establishes a variable-rate progression of a scoring time unit relative to performance time, similar to a MIDI tempo track.
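This example is non-normative. One possible reading of the tempo rules is sketched below: a beat spans beat notated time units and bpm beats occur per minute, so notated time advances at beat * bpm / 60 units per second. The function names, the tuple representation of a tempo, and the choice to apply the default rate before the first tempo's start are all assumptions made for illustration.

```python
from fractions import Fraction


def notated_units_per_second(beat=Fraction(1), bpm=60.0):
    """Rate of notated time: `beat` units per beat, `bpm` beats per minute."""
    return float(beat) * bpm / 60.0


def seconds_to_notated(t, tempos):
    """Convert performance time `t` (seconds) to notated time units.

    `tempos` is a list of (start_seconds, beat, bpm) tuples sorted by start.
    Each tempo applies from its start until the next tempo's start.
    """
    elapsed = 0.0
    rate = notated_units_per_second()  # assumed default before any tempo
    previous = 0.0
    for start, beat, bpm in tempos:
        if start >= t:
            break
        elapsed += (start - previous) * rate
        rate = notated_units_per_second(beat, bpm)
        previous = start
    return elapsed + (t - previous) * rate
```

For instance, with a single tempo of beat 1/4 and bpm 80 starting at 0, three seconds of performance time correspond to one notated time unit under this reading.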
4.3.6. The performance-event element
- Contexts:
  - performance-part, interpret
- Content Model:
  - None.
- Attributes:
  - start - start time of this event
  - duration - duration of this event
  - pitch - pitch of this event
  - dynamics - dynamics for this event
  - techniques - set of performance techniques for this event
  - view - optional element ID of the score-view containing graphics for this event
  - graphics - optional SVG elements for specific event graphics
The performance-event element describes a single musical event in terms of its performance parameters.
All times given are in notated time units, whose relationship to performance time is described by performance-tempo elements. These times may be expressed in the following forms which are syntactically distinct:
start gives the starting time of the event. This specifies the actual start time, not a notated start time to be interpreted by a performer. The default value is zero.
duration gives the duration of the event. This specifies the actual duration to be performed, not a notated duration subject to interpretation by a performer.
pitch gives the pitch of the event expressed as either a valid floating-point number providing a frequency in Hertz, or a chromatic pitch.
Note: The interpretation of pitch at the event level needs to be much more carefully nailed down. Issues include how to control unpitched instruments, the temperament (if any) applied to chromatic pitches, and no doubt more.
dynamics gives the dynamics of the event expressed on a scale from 0 to 127. This scale needs to be better defined; the existing MusicXML definition as "percentage of forte" is hard to interpret clearly.
techniques gives a set of performance techniques applying to the event as an unordered set of space-separated tokens.
Note: These presumably correspond to articulatory variations of the instrument sound. Proper definition remains TBD.
If present, the view and graphics attributes together define a set of SVG graphics in a score-view element which comprise the visual representation corresponding to this event. Other than the fact of this correspondence, no other information about the graphics is encoded.
4.3.7. The performance-mapping element
- Contexts:
  - performance-audio, performance-data
- Content Model:
  - Zero or more performance-region elements.
- Attributes:
The performance-mapping element defines a sequence of piecewise, non-overlapping ranges in notated time which correspond to piecewise regions within score graphics media views. In essence, it is a timeline that correlates a performance with elements within a series of views of the score from which that performance is derived.
The performance-region elements in a mapping provide the detailed descriptions of these ranges. The elements must occur in forward time order, and the end value of each region must be less than or equal to the start value of the next region.
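This example is non-normative. The ordering constraint on performance-region elements could be checked as sketched below, with each region reduced to a (start, end) pair in document order. The function names are illustrative, and treating each range as half-open when looking up a time is an assumption; this specification does not state which region owns a shared boundary.

```python
def regions_are_valid(regions):
    """Check forward time order and non-overlap of (start, end) pairs."""
    for (start, end), (next_start, _next_end) in zip(regions, regions[1:]):
        if start > next_start or end > next_start:
            return False
    return True


def find_region(regions, t):
    """Return the index of the region containing time `t`, or None.

    Assumes half-open ranges [start, end), so a boundary time belongs
    to the later region.
    """
    for index, (start, end) in enumerate(regions):
        if start <= t < end:
            return index
    return None
```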
4.3.8. The performance-region element
- Contexts:
  - performance-mapping
- Content Model:
  - None.
- Attributes:
  - start - start time of the time region being mapped
  - end - end time of the time region
  - view - the element ID of the score-view containing the visual region
  - region - the definition of the visual region itself
  - cursor-start - a starting line segment for a cursor
  - cursor-end - an ending line segment for a cursor
The performance-region element describes the relationship between a performance time region expressed in notated time units, and a visual region of a score page. This allows consumers to understand the correspondence between regions of the graphical score and regions of one or more audio performances.
start gives the start of the time region.
end gives the end of the time region.
view identifies a view of some section of the score, by providing the XML ID of its score-view element.
region identifies the visual region for the mapping using a fragment identifier in accordance with linking into SVG content. The fragment identifier refers to the same document identified by the view attribute.
If the pair of attributes cursor-start and cursor-end are defined, then a mapping is defined between points in performance time and line segments in the visual region. Each attribute supplies an ordered set of space-separated tokens giving the cursor’s endpoints as successive X/Y pairs in user coordinates applicable to the region.
The special tokens left, right, top and bottom may be used here to define both endpoints of a cursor in terms of the corresponding edge of the region’s SVG bounding box.
Under this mapping, a time t in the time region corresponds to a line segment in the visual region connecting two points given by the respective formulae of:
- cursor-start.p1 + (cursor-end.p1 - cursor-start.p1) * (t - start) / (end - start).
- cursor-start.p2 + (cursor-end.p2 - cursor-start.p2) * (t - start) / (end - start).
If either or both of cursor-start and cursor-end are undefined, then the entire time region corresponds to the entire visual region, with no further decomposition.
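This example is non-normative. The two interpolation formulae can be sketched as a linear interpolation of each cursor endpoint. The function name and the representation of a cursor as a pair of (x, y) tuples are assumptions for illustration; resolving the special tokens left, right, top and bottom to coordinates is not shown, and the sketch assumes end differs from start.

```python
def interpolate_cursor(t, start, end, cursor_start, cursor_end):
    """Return the cursor segment ((x1, y1), (x2, y2)) for time `t`.

    `cursor_start` and `cursor_end` are each ((x1, y1), (x2, y2)),
    standing for the p1 and p2 endpoints named in the formulae.
    """
    fraction = (t - start) / (end - start)

    def lerp(a, b):
        # Interpolate one point, coordinate by coordinate.
        return (
            a[0] + (b[0] - a[0]) * fraction,
            a[1] + (b[1] - a[1]) * fraction,
        )

    p1 = lerp(cursor_start[0], cursor_end[0])
    p2 = lerp(cursor_start[1], cursor_end[1])
    return p1, p2
```

With a vertical cursor sweeping from x = 0 to x = 100 over a region lasting from time 1 to time 2, the cursor at time 1.5 sits at x = 50.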
Note: To more easily support cursor motion through curved arcs, non-parallel start and end cursors could be considered as segments of two rays whose common origin lies at the point of intersection between these cursors. Interpolation would then be performed in radial coordinates, smoothly sweeping both the angle and the distances from the origin to move the cursor’s endpoints along roughly circular arcs. Straight-line motion would be merely a special case in which the intersection lies at infinity.