#+TITLE: NeuroML2 From Whole Cloth Note: This file also exists in Markdown format in this repo, generated at an unspecified point in time with =pandoc -f org -t gfm -o nml.md nml.org=. * Introduction This document describes my current knowledge of NeuroML2 and in a way aims to replace its missing developer documentation. It is likely flawed and constantly evolving. However, this text is needed as the actual projects' documentation is sparse and sometimes confusing and, most importantly, thin on the high concepts. NeuroML2 is an effort to provide portable descriptions of neuro-scientific simulations, including networks, and morphologically detailled neurons. NeuroML2 supersedes NeuroML1 and is not a simple iteration. Therefore, NeuroML will be used synonymously with NeuroML2 and abbreviated as NML or NML2. Despite its flaws, NeuroML2 is a valiant effort and an important feature to support. * NeuroML2 and LEMS *TL;DR*: NeuroML2 is a library of definitions written in LEMS, which is written in XML. All NeuroML2 is LEMS. Here I lay out the overall look and feel of NeuroML2, links to in-depth sections follow. NeuroML2 is -- in contrast to v1 -- implemented in LEMS[fn:: Low-entropy modelling language. Using the moniker 'low-entropy' for an XML based format must be an attempt at humor.], modelling language for chemical, physical, and biological processes. Furthermore, NML2 has a kind of symbiotic relationship with LEMS in the large, which will be explored in depth later. Both NML2 and LEMS are written in XML, making the human interaction slightly awkward. Schemata (XSD) exist in the relevant git repositories. These form the basis of our exploration of NML2. The fundamental relationship between LEMS and NML2 is this: every functional item in NML2 has a matching definition in LEMS. So, to interpret a NML2 file, we need to look up these definitions in the corresponding LEMS files. An example might look like this - LEMS definition #+begin_src xml #+end_src - NML2 file #+begin_src xml #+end_src The tag ~~ introduces a new type, which can be instanced by using ~~ or the shorthand ~~, the latter being the idiomatic use, especially in NML2. For brevity, I skipped the definition of units and dimensions. Because of this model, no discussion (or implementation) can proceed without constantly switching between LEMS and NML2. Both LEMS and NML2 follow an inheritance model akin to object-oriented programming. This means - A ~ComponentType~ may specify its 'base class' using ~extends="base"~. - Whenever a value of type ~base~ is required, a derived type may be substituted. - /Thankfully/ we only have to deal with single inheritance Again an example - LEMS definition #+begin_src xml #+end_src - NML2 #+begin_src xml #+end_src There is also the concept of public and private data members. In particular access is mediated by 'exposures', which declare externally readable members. LEMS: #+begin_src xml #+end_src Here we also introduced the tags ~Parameter~ (externally settable value) and ~DerivedVariable~ (dependent quantity). ~Parameter~ values are set using XML attributes. The internal variable ~q10_fix~ is only readable through the exposure ~q10~, even in enclosing scopes. NML2: #+begin_src xml #+end_src Again we encounter two new bits of related functionality: paths and select. The rules are as follows: If a ~ComponentType~ declares a ~Child~ of type ~type~ with an exposed value ~name~, that value can be used via ~select~ by addressing the /path/ ~type/name~. ~Child~ implies a single instance of type ~type~ to be used. However, multiple instances can be part of a ~ComponentType~, which is declared using ~Children~. Addressing values from children is done slightly differently. A /single/ child's value is addressed using ~select="child_id/name"~. It is possible to accumulate over children using ~select="children[*]/name reduce="multiply"~ or ~select="children[*]/name reduce="add"~. More complicated selection is possible, see later. Example, assume the definitions from above being available LEMS: #+begin_src xml #+end_src NML2: #+begin_src xml #+end_src Here, ~prod_q10=64~ and ~first_q10=1~, although is general there is no way of knowing ~fixed_1~ being present, unless enforced otherwise. * Inheritance As alluded to above, the LEMS/NML2 model for describing relations between components is similar to the 'is-a' type inheritance model used in Python or C++. To illustrate, we use the HH model from the NML2 'stdlib found in ~Channels.xml~ #+begin_src xml #+end_src Note the following consequences of the 'is-a' model - derived items have access to their bases' ~Parameter~ values. - types can have a ~Requirement~ on the presence of certain variable in the surrounding scope. - ~HHExpRate~ can be inserted instead of a ~baseHHRate~ *or* ~baseVoltageDepRate~. *QUESTION* from the examples and discussions I have the impression that having a ~Dynamics~ item in both base and derived items will cause the item from base to overwritten. Is this correct and where is it documented? *ANSWER* Correct, but nowhere written. * Recap So Far As we have learned in the sections above, NML2 is /written in/ LEMS, where the salient definitions can be found in the LEMS files provided within the NeuroML2 repositories. This means each XML tag in an NML document, eg ~~, can be looked up in the corresponding LEMS file, in this case ~Channels.xml~. In addition, both LEMS and NML2 have -- minimal, ie a working NML2/LEMS document will likely validate, but not every validating document is working NML2/LEMS -- XSD schemata attached.[fn:: For example the use of ~sequence~ in XSD implies order, but I am pretty sure this is not upheld everywhere. Also, most often ~sequence~ is used together with ~count=unbounded~, which is semantically incorrect in many cases] To compose a concrete component from an NML2 document we will be required to - parse the related LEMS files - build an inheritance tree (since requirements are in terms of 'base classes') - extract dynamics and other items from the LEMS descriptions - obtain parameters and similar from the NML2 XML node - recursively instantiate children of the object From there, we can interpret the structure and integrate the equations of motion or produce an optimised represe first and interpret the result or export it to another format like native code or NMODL. * A Larger Example We are almost in a position to decode our first practical example #+begin_src xml #+end_src To do so, we have to find the LEMS definitions, of which will we use simplified versions. First, we take a look at the ion channel. #+begin_src xml #+end_src Recall the definitions for the ~rate~ hierarchy above. Finally, we need to inspect the ~gates~ #+begin_src xml #+end_src Finally, we can put this together to extract the meaning of ~gateHHrates~ #+begin_latex q(t=0) &= \frac{a}{a + b}\\ q'(t) &= a - \frac{q}{a + b} #+end_latex where $a$ and $b$ are defined by the ~forwardRate~ and ~reverseRate~ items. Now, we can compose this into ~ionChannelHH~ while simplifying the equations and renaming quantities #+begin_latex m(t=0) &= \frac{r_{lin}}{r_{lin} + r_{exp}}\\ m'(t) &= r_{lin} - \frac{m}{r_{lin} + r_{exp}}\\ h(t=0) &= \frac{r_{exp}}{r_{exp} + r_{sig}}\\ h(t) &= r_{exp} - \frac{h}{r_{exp} + r_{sig}}\\ g &= h m^3 \gamma #+end_latex NB. while we /are/ using the same symbols here, $r_{exp}$ signifies two independent quantities in the equations for $m$ and $h$. #+begin_latex r_{exp} &= \rho_{exp}\exp((U - \mu_{exp})/\sigma_{exp})\\\\ r_{sig} &= \frac{\rho_{sig}}{1 + \exp((\mu_{sig} - U)/\sigma_{sig})}\\ x_{lin} &= \frac{\mu_{lin} - U}{\sigma_{lin}} \\ r_{lin} &= \frac{\rho_{sig} x}{1 - \exp((- x)}} #+end_latex As notational convenience, we used greek letters for parameters. Unsurprisingly, we find the HH neuron model here. * How to Interpret NML2 In the last part we saw that /intuitively/ NML2/LEMS is quite straightforward to interpret. However, we will need to formalise the process to be able to deal with NML2 in general. Starting with this section, the document will delve into the technical details. Our current goal will be to write a tool able to translate at least the ~ionChannelHH~ model into a symbolic description. Although NML2 can be extended using LEMS, we will use the static schema provided in the NML2 repository as is. Here is a list of invariants we need to uphold - Each type must be aware of its base type, eg ~gateHHrates <- gate~, required to - check and sort ~Children~, eg given ~~ we need to collect everything deriving from ~gate~ into an array ~gates~. - compose the derived items parameters and variables from the inherited items - we cannot shorten ~A <- B <- C~ to ~A <- C~ since ~C~ might be used as either - Each item must be aware of its enclosing items #+begin_example NaConductance - gates +- m +- forwardRate | +- reverseRate | +- h +- forwardRate +- reverseRate #+end_example This is needed to facilitate path-based selection. We need to situationally account for these cases - ~A~ declares a ~~ #+begin_src xml #+end_src or #+begin_src xml #+end_src *QUESTION* is this acceptable as well? #+begin_src xml #+end_src - ~A~ declares a ~~ #+begin_src xml #+end_src The name ~children~ now relates to a collection of things deriving from ~B~, ie ~\forall T \sup B: String -> T~, akin to dynamic polymorphism/existentials. *NOTE* the dichotomy between ~~ and ~~ for ~Child~ and ~Children~ is a bit annoying. The possible instantiations of ~ionChannel~ are - ~baseIonChannel~ :: the base class - ~ionChannel~ / ~ionChannelHH~ :: the are identical, but both present for convenience - ~ionChannelKS~ :: ion channel with a ~gate~ based on a kinetic scheme - ~ionChannelVShift~ :: ion channel with a voltage offset. Practically, we only expect to support ~ionChannel~ for now and voltage shifted channels being a trivial extension later. Kinetic schemata are out of scope and ~baseIonChannel~ is of no practical use. However, it infeasible to generate a fixed set of prebuilt implementations as ~ionChannel~ instances can have an arbitrary number of ~gate~ instances which implement most of the functionality. This invalidates previous plans for implementation following that strategy. *QUESTION* Is the ~id~ field always allowed? *ANSWER* Yes, ~id~ is implied. ** Scoping - derived variables: - local: visible - enclosing: exposure - enclosed: exposure - state variables: - local: visible - enclosing: exposure - enclosed: exposure - parameters - local: visible - enclosing: private - enclosed: private - constants: - local: visible - enclosing: private - enclosed: private ** Flattening Instances In order to produce efficient representations, we will need to collapse instances and types into a flat format, eg #+begin_src xml #+end_src should be flattened into #+begin_src xml #+end_src * Synapses Next, we consider synapses, which have a different hierarchy, which is reproduced in the following, again, simplified for ease of reading [fn:: Note the comment, which we ignore is if it was fixed and fix in our own code.] #+begin_src xml #+end_src There's some additions to our knowledge of NML2 here: - ~Property~ :: Paraphrasing the LEMS documentation: 'like ~Parameter~, but may be different for each instance'. So far, we did not encounter the ~Instance~ concept in LEMS and as it is used here for the synapse weight, we are going to largely ignore it. [fn:: Arbor adds this on the library side to connections.] - ~OnEvent~ :: Contains assignments to state variables, similar to ~OnStart~, but now triggered upon incoming events, ie spikes. May choose a port. Finally, we need to discuss ~EventPorts~. In contrast to simulators like Arbor and Neuron NML2 can differentiate between sources of incoming events and selectively emit events to channels. In pratice, however, only three such ports seem to be used in NML2 [fn:: Confirmed via `rg -Io 'port="[^"]+"' | sort -u`] - in :: Receive spikes - relay :: Forward spikes to sub-components - spike :: Emit spikes *** Implementation Details - ports :: Of these we only need to support ~in~ and ~relay~, the former is covered, the latter currently not. - weight :: As Arbor uses this construction for NMODL input #+begin_example net_receive(weight) { ... } #+end_example we need to use our prior knowledge about the semantics of the parameter and hard-code it into our NMODL translator. - vpeer :: Similar to ~weight~, sometimes needed for ~gradedSynapse~, can be modelled in Arbor's NMODL dialect. * Gap Junctions Gap junctions are similar to synapses and have the following hierarchy #+begin_example xml #+end_example The issue here with our scheme is that neither ~peer~ nor ~vpeer~ exist in our implementation of NML2. However, Arbor's NMODL dialect exposes ~v_peer~ as a global property. * NMODL Export While we have hinted at NMODL and Arbor a few times so far, our implementation of NML2 has been almost completely generic. To export to NMODL though, we need to bridge a significant gap. After we have processed the instantiations into either an instance or a collapsed instance, we edit the resulting model based on special cases. At the moment, let us focus on current based models only, ie all our instances need to produce an ionic transmembrane current, dubbed ~iX~ in NMODL. Later, we might extend this to concentration models. For synapses and gap junctions this is defined directly. Ion channels produce only a conductance value, ~g~, so we add a derived variable ~iX =g (v - eX)~ where ~X~ is replaced by the ionic species. Based on the respective sections on implementation details, we edit the syntax trees to eliminate explicit definitions of ~vpeer~ and similar values, replacing them with Arbor's built-in variables. We then build individual blocks of the NMODL language from the AST by inspecting dependencies of the state variables and building the appropriate chains of expressions. All intermediates are stored as ~LOCAL~ to minimise memory accesses. The style defined by NML2, especially the * Kinetic Schemes Kinetic schemes describe ion channels by a set of discrete states and transition probabilities between those states. Again we reproduce a simplified version of the ~ionChannelKS~ hierarchy. #+begin_src xml #+end_src Apart from the type of ~gates~ this is functionally identical to ~ionChannelHH~. Now we delve into the definitions of ~gateKS~ #+begin_src xml #+end_src Again, not much is new here, apart from ~KineticScheme~, a list of states is given in ~states~ and their transition rates in ~transitions~. Next, we look into the description of these transitons #+begin_src xml #+end_src Again not much of a surprise here, but we do need to figure out what ~Link~ means. The final component is the linked state #+begin_src xml #+end_src Now, let us consider a simple example from the NML2 sources. #+begin_src xml #+end_src we can interpret this as - define two populations transitioning between states open ~o1~ and closed ~c1~ - call these fractions ~o1_occupancy~ and ~c1_occupancy~ - conserve ~o1_occupancy + c1_occupancy = 1~ for all times - transition ~c1 -> o1~ with rate ~ft/rate/r~ - transition ~o1 -> c1~ with rate ~rt/rate/r~ - calculate - ~ft/rate/r~ :: as prescribed by ~HHExpLinearRate~ - ~rt/rate/r~ :: as prescribed by ~HHExpRate~ In NMODL we would like to generate a ~KINETIC~ block like this #+begin_example KINETIC scheme { : ... snip ... gates_n_states_o1_to_c1 = gates_n_transitions_ft_rr + gates_n_transitions_rt_rr gates_n_states_c1_to_o1 = gates_n_transitions_ft_rf + gates_n_transitions_rt_rf gates_m_states_o1_to_c1 = gates_m_transitions_ft_rr + gates_m_transitions_rt_rr gates_m_states_c1_to_o1 = gates_m_transitions_ft_rf + gates_m_transitions_rt_rf ~ gates_n_states_o1_occupancy <-> gates_n_states_c1_occupancy (gates_n_states_o1_to_c1, gates_n_states_c1_to_o1) ~ gates_m_states_o1_occupancy <-> gates_m_states_c1_occupancy (gates_m_states_o1_to_c1, gates_m_states_c1_to_o1) } #+end_example where multiple rates between the same two states were merged. Note that - ~gates_n_transitions_ft_rr = 0~ - ~gates_n_transitions_ft_rf = 0~ - ~gates_m_transitions_ft_rr = 0~ - ~gates_m_transitions_ft_rf = 0~ ** Question ~KineticScheme~ ~KineticScheme~ seems to break the abstractions put in place. Looking at ~nodes~ and ~edges~ which _must_ be of kind ~children~, otherwise the semantics would not work out. Similarly, both must be located directly under the scheme in the hierarchy. However, a ~select~ statement would have expressed the same thing more idiomatically?! #+begin_src xml #+end_src * The Final Building Block So far, we have dealt exclusively with channels, but NML2 requires some more levels above this for completely specilying the dynamics - Cell :: defines dynamics of membrane potential in terms of currents (mediated by channels) - BioPhys Properties :: specifies capacitance, resistivities. - Membrane Properties :: initial potential, channels. - Channel Density :: gives the area density of a channel. In order to interact with cable models like Arbor or Neuron, we need to pull these layers into the channel descriptions -- partially at least. In the end we would like to automatically compose Arbor simulations from NML2 files, but for now, we have to extract those values manually. Working our way up #+begin_src xml #+end_src As noted under [[NMODL Export]] we compute ~iX = g(E - U)~ where ~g~ is the ~ionChannel~'s conductance, see above ~g = fopen * conductance~. The parameter ~condDensity~ added in ~channelDensityCond~ is identical to ~conductance~. Thus, we need to retain ~conductance~ and set it to the value of ~condDensity~. * Networks Networks are important for us for both actual network support and simulation inputs. We note the following about the network layout in NML2. Instances for input and synapse appear at top-level, next to cells and networks. Thus any =.nml= file is eligible to define them. Networks are defined in terms of _populations_ and _projections_, see below. *NOTE* Connections in NML2 use zero delay by default. This will break Arbor's performance. ** Populations Populations can be written in two ways inside a ~network~ - ~~ :: ~n~ duplicates of ~X~; using ~MultiInstantiate~ which seems to have no particular definition. - ~~ :: give children as list of ~~ with a 3D position, all have type ~X~. For us, there seems to be no practical difference between them. In both cases ~X~ names a ~ComponentType~ deriving ~baseCell~. ** Projections Projections specify pre- and post-synaptic populations and a connection type, eg a synapse. Then, a list of connections is given as tuples =(pre: location, post: location)= where =location= takes the form =(cell: id, segment: id, fraction: [0, 1])= Target cells are _selected_ by the relative paths and an index, eg ~target=../pop0[0]~ addresses the population ~pop0~, one tag up from here, and picks cell zero from there. If ~pop0~ has a single member, the index is optional. Alternatively it seems (?) we can address using ~pop//Cell~. *** Questions - Does ~[ix]~ query by id or offset? It queries by offset into a flat array, like in ~population~. - What's the difference between ~pop//Cell~ and ~/pop[]~? - ~pop//Cell~ :: for ~populationList~ only. NB. in ~projection/connection~ entries the path is relative to ~projection~ *not* ~connection~. - ~/pop[]~ :: see above - Why do connections have to traverse the tree and do not work relative to their pre/post populations? Seems counterintuitive. It is, but ~jLEMS~ just is that way. - How does Arbor translate segments into its internal rep? Can we directly use it like ~branch~? We have ~(segment )~ and can use ~(on-components (segment ) )~ to target locations. *** Example #+begin_src xml #+end_src ** Inputs - inputs are similar to (one-sided) connections - targets are written as - id, default = 0 - fraction, default = 0.5 - current stimuli derive =basePointCurrent= - pulse, sine, cosine - these we need to rewrite into ~i_clamp~ - relevant tags: - =explicitInput= :: - =inputList= :: - /spike/ inputs on the other hand, become ~event_generator~ - these derive from ~baseSpikeSource~ *** Example #+begin_src xml #+end_src ** Connectivity: From NML2 to Arbor In Arbor, we need these items to produce a spike-based connection - Source :: a spike detector bound to a locset, eg ~d.place('(root)', A.spike_detector(-10), 'detector'))~ - Target :: a synapse bound to a locset, eg ~d.place('(terminal)', A.synapse('expsyn'), 'synapse')~ - Connection :: Returned by the ~connections_on(here)~ callback, eg ~[A.connection((there, 'detector'), 'synapse', weight, delay)]~ * Concentration Models First, duplicate the ~ComponentType~ for reference with the usual simplifications #+begin_src xml #+end_src We remove the hard-coded ~iCa~ requirement and replace it ad-hoc with ~iX~ where ~X~ is given in ~species~. Currently, concentration models *do not work in Arbor*, since they would require access to ~area~, which is not exposed. We work around this by approximating ~A = 4 pi r^2~ which assumes a given CV is roughly spherical or a cylinder (we count only ever the mantle, not the contact faces to adjacent CVs) that has the same diameter and height. The latter works out to be the same formula as a sphere of the same radius.