--- jupytext: formats: md:myst text_representation: extension: .md format_name: myst kernelspec: display_name: Python 3 language: python name: python3 substitutions: example: | ``````{card} :class-header: sd-bg-dark sd-text-light Example LABEL ^^^ `````{tab-set} ````{tab-item} YAML :class-label: pt-0 ```{literalinclude} ../examples/tutorial/example_LABEL.yaml :language: yaml ``` ```` ````{tab-item} Drawing :class-label: pt-0 %DRAWING_TAG ```{glue:} example_LABEL ``` ```` ````` `````` --- ```{code-cell} :tags: ["remove-cell"] import pathlib import demes import demesdraw from IPython.display import set_matplotlib_formats from myst_nb import glue set_matplotlib_formats("svg") examples = pathlib.Path(".").parent.resolve() / ".." / "examples" / "tutorial" def size_max(graph): return max( max(epoch.start_size, epoch.end_size) for deme in graph.demes for epoch in deme.epochs ) ``` (sec_tutorial)= # Tutorial ## YAML A Demes model is written as a [YAML](https://en.wikipedia.org/wiki/YAML) file. A YAML file contains `property: value` pairs, and Demes imposes restrictions by allowing only properties with specific names, which have specific types of values. YAML files have the extension `.yaml`, or sometimes `.yml`. Don't worry if you've never heard of YAML before, as the details of YAML aren't particularly important. The example below gives an indication of what a Demes file looks like, but we'll explain each component gradually using additional examples. For now, select the "Drawing" tab to see a diagrammatic overview of the demographic model. ```{code-cell} :tags: ["remove-cell"] graph = demes.load(examples / "example_01.yaml") w = 1.5 * size_max(graph) positions = dict(X=0, A=-w, B=w) ax = demesdraw.tubes(graph, positions=positions, seed=1) glue("example_01", ax.figure, display=False) ``` {{ example.replace("LABEL", "01") }} ```{note} The Demes specification defines a data model, not the YAML format itself. A demographic model that uses the Demes data model can be easily converted to any desired data-serialisation language. ``` ## Terminology We use the word "Demes" with a capital D to refer to the Demes data model or the Demes specification, and "deme" or "demes" with a lowercase d when we refer to a collection of individuals (a deme) or a property in the Demes data model. ## What is a deme? A deme is a collection of individuals that share a set of population parameters at any given time. Consider the simplest population model: a single deme with constant population size. ```{code-cell} :tags: ["remove-cell"] graph = demes.load(examples / "example_02.yaml") ax = demesdraw.size_history(graph, inf_label=True, invert_x=True, annotate_epochs=True) glue("example_02", ax.figure, display=False) ``` {{ example.replace("LABEL", "02").replace("%DRAWING_TAG", ":selected:") }} This deme exists now (zero generations ago), and also exists as far back in time as we're interested in looking (towards infinity generations ago). In Demes, we say that the deme has an infinite `start_time`. Select the "YAML" tab above the drawing to see how this model was implemented. ```{note} In YAML, infinity is spelled `.inf`, not `inf`. ``` ```{note} Each Demes model must define at least one deme with an infinite `start_time`. ``` ### Default values Having a deme's `start_time` property be infinite is very common. So for demes without ancestors, the `start_time` may be omitted, and will default to `.inf`. Similarly, the `end_time` property of the final epoch may be omitted, and will default to `0`. The following Demes file describes an equivalent model. ```{code-cell} :tags: ["remove-cell"] graph = demes.load(examples / "example_03.yaml") ax = demesdraw.size_history(graph, inf_label=True, invert_x=True, annotate_epochs=True) glue("example_03", ax.figure, display=False) ``` {{ example.replace("LABEL", "03") }} ## What is an epoch? We partition a deme's interval of existence into distinct `epochs`. A deme always has at least one epoch. The population parameters for a deme are allowed to change over time, but are fixed within an epoch. Consider a single deme that suffered a bottleneck 50 generations ago. ```{code-cell} :tags: ["remove-cell"] graph = demes.load(examples / "example_04.yaml") ax = demesdraw.size_history( graph, inf_ratio=0.5, inf_label=True, invert_x=True, annotate_epochs=True ) glue("example_04", ax.figure, display=False) ``` {{ example.replace("LABEL", "04").replace("%DRAWING_TAG", ":selected:") }} To specify the change in the population size, we've introduced a second epoch into the YAML file. We now need three time values to define the epoch boundaries: the `start_time` of the deme, the `end_time` of epoch 0, and the `end_time` of epoch 1. All epochs must be listed in time-descending order (from the past towards the present). The same model can be written using default values for the deme's `start_time` and the final epoch's `end_time`. ```{code-cell} :tags: ["remove-cell"] graph = demes.load(examples / "example_05.yaml") ax = demesdraw.size_history( graph, inf_ratio=0.5, inf_label=True, invert_x=True, annotate_epochs=True ) glue("example_05", ax.figure, display=False) ``` {{ example.replace("LABEL", "05") }} ### Why don't epochs have a `start_time` too? The start time of an epoch can be inferred indirectly, by looking at the deme's `start_time` (for epoch 0), or by looking at the previous epoch's `end_time` (for epoch 1, 2, etc.). ## Exponential size changes In the previous model, the population size was constant in each epoch. But what if we wanted to model a period of exponential population size growth, or an exponential decay? In any given epoch, there are actually two population size parameters, `start_size` and `end_size`, which correspond to the size at the start and end of the epoch. If both parameters have the same value, then the size is constant over the epoch. However, If an epoch's `start_size` and `end_size` properties are not equal, then the epoch is defined to have an exponentially-changing population size over the epoch's time interval. Two important implications of this system are: * We don't need to specify a rate parameter for the exponential. If this really is needed, it can be calculated from the `start_size`, `end_size`, `start_time` and `end_time` values for an epoch. * Infinitely long epochs must have a constant population size. So if a deme has an infinite `start_time`, then `start_size` and `end_size` must be equal for epoch 0. To make these ideas more concrete, let's look at an implementation of the well-known ZigZag model from [Schiffels & Durbin (2014)](https://doi.org/10.1038/ng.3015). ```{code-cell} :tags: ["remove-cell"] graph = demes.load(examples / "example_06.yaml") ax = demesdraw.size_history( graph, inf_label=True, invert_x=True, annotate_epochs=True, log_time=True, ) glue("example_06", ax.figure, display=False) ``` {{ example.replace("LABEL", "06") }} We've introduced several new features to implement this model, so let's step through it from top to bottom. * The `doi` property is a list of strings corresponding to the DOI(s) for publication(s) in which the model was described. By convention, the elements of the `doi` list are URLs, but any string value can be used. * A compact-form syntax is used for each epoch in the `epochs` list (with curly braces `{` and `}`), rather than using multiple lines. This is known as "flow style" in YAML parlance. * Epoch 0 has a `start_size`, but no `end_size`. Because this epoch has an infinite time span, the population size must be constant, so the epoch's `end_size` will be the same as the `start_size`. * Epoch 1 has an `end_size`, but no `start_size`. So the `start_size` is inherited from the `end_size` of the previous epoch. This means the `end_size` and `start_size` for this epoch are different, and there will be exponential population growth over the epoch. * Epochs 2 through 5 also inherit their `start_size` from the previous epoch, and in each case these are different from the `end_size` provided. * Epochs 2 and 4 have exponential decay, whereas epochs 3 and 5 have exponential growth. * The final epoch has a constant size. ```{warning} Other modelling frameworks may use the terms "epoch" or "epochs" to refer to time intervals that partition an entire model. However, in Demes, epochs are a deme-specific property, and each deme has its own list of epochs which do not apply to the other demes in the model. ``` ## Multiple demes ### A split event Suppose we're interested in modelling two demes, `A` and `B`. The two demes are related by a common ancestor, from which they split 1000 generations ago. In addition to `A` and `B`, we'll model their common ancestor as an additional deme, `X`. Here, we introduce the `ancestors` property of a deme, which is a list of deme names. ```{code-cell} :tags: ["remove-cell"] graph = demes.load(examples / "example_07.yaml") w = 1.5 * size_max(graph) positions = dict(X=0, A=-w, B=w) ax = demesdraw.tubes(graph, positions=positions, seed=1, inf_ratio=0.4) glue("example_07", ax.figure, display=False) ``` {{ example.replace("LABEL", "07") }} When a deme has an ancestor, its `start_time` does **not** default to `.inf`. In this case, the `start_time` for the deme is inherited from the `end_time` of the ancestor. I.e. for the model above, both `A` and `B` have a `start_time` 1000 generations ago. ```{note} A deme cannot appear in the demes list before its ancestor(s). This means models must be written in a "top down" manner, starting with the ancestral (root) deme(s), and followed by increasingly recent demes. ``` By convention, we use a more compact form for the ancestors list. Lists can be written more compactly with YAML "flow style", which uses square brackets (`[` and `]`, with a comma separating list items). The following Demes file describes an equivalent model. ```{code-cell} :tags: ["remove-cell"] graph = demes.load(examples / "example_08.yaml") w = 1.5 * size_max(graph) positions = dict(X=0, A=-w, B=w) ax = demesdraw.tubes(graph, positions=positions, seed=1, inf_ratio=0.4) glue("example_08", ax.figure, display=False) ``` {{ example.replace("LABEL", "08") }} ### A branch event An alternative way of modelling a population split is for the ancestral deme to remain alive after the split. We will refer to this as a *branch* event, rather than a split event. In the model below, deme `A` has `X` as an ancestor like the previous model, except here `X` continues to exist until 0 generations ago (recall that 0 is the default value for the final epoch's `end_time`). Now that `A`'s ancestor exists until 0 generations ago we must explicitly provide a `start_time` for `A`. ```{code-cell} :tags: ["remove-cell"] graph = demes.load(examples / "example_09.yaml") w = 1.5 * size_max(graph) positions = dict(X=0, A=w) ax = demesdraw.tubes(graph, positions=positions, seed=1, inf_ratio=0.4) glue("example_09", ax.figure, display=False) ``` {{ example.replace("LABEL", "09") }} ## Multiple ancestors When a deme has multiple ancestors, these appear in the `ancestors` list as one might expect. But for multiple ancestors we need to also specify the proportion of ancestry inherited from each ancestor. This is done using the deme's `proportions` list property. The first proportion in the `proportions` list is for the first ancestor in the `ancestors` list, the second proportion is for the second ancestor, and so on. Just like the case of a single ancestor, an ancestor can terminate at the descendant's `start_time`, or can instead continue to exist. ```{code-cell} :tags: ["remove-cell"] graph = demes.load(examples / "example_10.yaml") w = 1.5 * size_max(graph) positions = dict(X=-w, Y=w, A=0) ax = demesdraw.tubes(graph, positions=positions, seed=1, inf_ratio=0.4) glue("example_10", ax.figure, display=False) ``` {{ example.replace("LABEL", "10") }} ```{note} With multiple ancestors, the `start_time` of the descendant deme does not default to the `end_time` of any of its ancestors. So the `start_time` must always be specified for a deme with multiple ancestors. ``` ## Continuous migration Let's again consider a model with demes `A` and `B`, which are related via a common ancestor `X` (a split event). ### Asymmetric migration To define continuous migration from `A` to `B`, we'll add an entry to the `migrations` list. Concretely, we are modelling migrants born in deme `A`, the `source` deme, who (potentially) have offspring in deme `B`, the `dest` deme. ```{code-cell} :tags: ["remove-cell"] graph = demes.load(examples / "example_11.yaml") w = 1.5 * size_max(graph) positions = dict(X=0, A=-w, B=w) ax = demesdraw.tubes(graph, positions=positions, seed=1) glue("example_11", ax.figure, display=False) ``` {{ example.replace("LABEL", "11") }} The migrations here occur at a `rate` of `1e-4` per generation, and occur continuously over the lifetime of both `A` and `B`. ```{warning} Migration rate units are always "per generation" regardless of the chosen value for `time_units` (described later). ``` In the example above, `A` and `B` have identical existence time intervals. If the `source` and `dest` demes do not have identical start or end times, then by default the migration will occur over the period of time when both demes exist simultaneously. ```{code-cell} :tags: ["remove-cell"] graph = demes.load(examples / "example_12.yaml") w = 1.5 * size_max(graph) positions = dict(X=0, A=w) ax = demesdraw.tubes(graph, positions=positions, seed=1) glue("example_12", ax.figure, display=False) ``` {{ example.replace("LABEL", "12") }} To obtain greater control over when migrations occur, we can use the `start_time` and `end_time` migration properties. Below we model three periods of continuous migration. ```{code-cell} :tags: ["remove-cell"] graph = demes.load(examples / "example_13.yaml") w = 1.5 * size_max(graph) positions = dict(X=0, A=-w, B=w) ax = demesdraw.tubes(graph, positions=positions, seed=1) glue("example_13", ax.figure, display=False) ``` {{ example.replace("LABEL", "13") }} ```{admonition} Question :class: question What do you think will happen if we list two or more migrations with time intervals that overlap? Does it matter if the `source` and `dest` demes are the same in each case? ``` ```{admonition} Answer :class: dropdown answer Overlapping migrations are allowed in general. But because it's not clear what the expected behaviour should be when overlapping migrations are defined with the same `source` and `dest`, it's an error to define such migrations. ``` ### Symmetric migration It's common to model migrants in both directions simultaneously, with the same `rate`. We could use multiple asymmetric migrations, but it's simpler to specify a list of deme names (the migration's `demes` property), instead of a `source` and `dest`. ```{code-cell} :tags: ["remove-cell"] graph = demes.load(examples / "example_14.yaml") w = 1.5 * size_max(graph) positions = dict(X=0, A=-w, B=w) ax = demesdraw.tubes(graph, positions=positions, seed=1) glue("example_14", ax.figure, display=False) ``` {{ example.replace("LABEL", "14") }} The `demes` property can list arbitrarily many demes. For example, we could define symmetric migration between all pairwise combinations of four demes. ```{code-cell} :tags: ["remove-cell"] graph = demes.load(examples / "example_15.yaml") w = 1.5 * size_max(graph) positions = dict(alpha=0, beta=w, gamma=2 * w, delta=3 * w) ax = demesdraw.tubes(graph, positions=positions, seed=1) glue("example_15", ax.figure, display=False) ``` {{ example.replace("LABEL", "15") }} When the following three conditions are satisfied: * a symmetric migration is defined between more than two demes, * and the demes' start and/or end times are not all identical, * and the migration's `start_time` and/or `end_time` are omitted, then the time intervals for the migration are resolved separately for each pair of participating demes. E.g. for the model below, migration between `A` and `B` occurs at all times because both `A` and `B` exist for all time. However, migrations between `A` and `C`, and between `B` and `C`, are limited to the period of time after 100 generations ago, because `C` does not exist before then. ```{code-cell} :tags: ["remove-cell"] graph = demes.load(examples / "example_16.yaml") w = 1.5 * size_max(graph) positions = dict(A=0, B=w, C=2 * w) ax = demesdraw.tubes(graph, positions=positions, inf_ratio=0.6, seed=1) glue("example_16", ax.figure, display=False) ``` {{ example.replace("LABEL", "16") }} ## A pulse of admixture To model migration that is limited to a very short period of time, we can define one or more `pulses`. A pulse has a `proportions` list property and a `sources` list property (analogous to the `proportions` and `ancestors` properties of a `deme`). Each pulse proportion defines the proportion of the `dest` deme that is made up of ancestry from the corresponding source deme at the instant after the pulse's `time`. ```{note} The exact duration of a pulse is not defined by the Demes specification. Software which implements a continuous-time model (such as a coalescent simulator) might treat a pulse as occurring instantaneously. In contrast, software which implements a discrete-time model is free to treat the pulse as occurring over a single time step (such as a single generation). ``` ```{code-cell} :tags: ["remove-cell"] graph = demes.load(examples / "example_17.yaml") w = 1.5 * size_max(graph) positions = dict(X=0, A=-w, B=w) ax = demesdraw.tubes(graph, positions=positions, seed=1) glue("example_17", ax.figure, display=False) ``` {{ example.replace("LABEL", "17") }} ```{admonition} Excercise :class: exercise The `dest` deme in an admixture pulse could instead be modelled using multiple ancestors. Try doing this for deme `B` in the model above. What do you think are the advantages of one approach over the other? ``` ```{admonition} Question :class: question How should one interpret multiple pulses that occur at the same `time`? Does it matter whether the `sources` and `dest` demes are the same in each pulse? ``` ````{admonition} Answer :class: dropdown answer When multiple pulses are specified with the same `time`, the migration pulses occur in the order in which they are written. Consider the following two pulses into deme `A` at time 100. ```yaml pulses: - sources: [B] dest: A time: 100 proportions: [0.1] - sources: [C] dest: A time: 100 proportions: [0.2] ``` The second pulse replaces 20% of `A`'s ancestry, including 20% of the ancestry that was inherited from `B` in the first pulse. So immediately after time 100, `A` has 20% ancestry from `C` but only 8% ancestry from `B`. As this may be confusing, we recommend avoiding the use of multiple pulses in this way, and instead implement the model using multiple `sources` with the desired final ancestry `proportions`. ```yaml pulses: - sources: [B, C] dest: A time: 100 proportions: [0.08, 0.2] ``` More complex models involving multiple simultaneous pulses are possible, but we caution that they can be difficult to reason about. ```` (sec_tutorial_defaults)= ## Setting defaults To avoid duplication in a Demes graph with many features, it's possible to set default values for some properties. Suppose we wish to define multiple demes, each with only one epoch, and each with a constant population size. ```{code-cell} :tags: ["remove-cell"] graph = demes.load(examples / "example_18.yaml") w = 1.5 * size_max(graph) positions = dict(alpha=0, beta=w, gamma=2 * w, delta=3 * w) ax = demesdraw.tubes(graph, positions=positions, seed=1) glue("example_18", ax.figure, display=False) ``` {{ example.replace("LABEL", "18") }} The epoch defaults can be overridden by providing an explicit value inside the desired epoch. ```{code-cell} :tags: ["remove-cell"] graph = demes.load(examples / "example_19.yaml") w = 1.5 * size_max(graph) positions = dict(alpha=0, beta=w, gamma=2 * w, delta=3 * w) ax = demesdraw.tubes(graph, positions=positions, seed=1, inf_ratio=0.5) glue("example_19", ax.figure, display=False) ``` {{ example.replace("LABEL", "19") }} It's also possible to provide defaults for properties of a deme, such as the `start_time` and `ancestors`. ```{code-cell} :tags: ["remove-cell"] graph = demes.load(examples / "example_20.yaml") w = 1.5 * size_max(graph) positions = dict(alpha=-2 * w, beta=-w, X=0, gamma=w, delta=2 * w) ax = demesdraw.tubes(graph, positions=positions, seed=1) glue("example_20", ax.figure, display=False) ``` {{ example.replace("LABEL", "20") }} ```{admonition} Question :class: question How would the model be interpreted if we failed to override the `start_time` and `ancestors` deme defaults for deme `X`? ``` ```{admonition} Answer :class: dropdown answer If we didn't override the default `start_time`, then `X` would have both a `start_time` and `end_time` of 1000. This would be invalid, because (a) there would be no deme with an infinite `start_time`, and (b) the time span over which `X` existed would be zero. If we didn't override the default `ancestors` list, then `X` would be in its own `ancestors` list. This would be invalid, because each ancestor in the `ancestors` list must already be defined (earlier in the `demes` list). This requirement has the pleasant side-effect that the directed graph of ancestor/descendant relations cannot have cycles. Alas, it is not possible to model time travel using Demes. ``` Defaults for `migration` and `pulse` objects may also be specified for elements of the `migrations` and `pulses` lists. ## Time units and generation time In the previous examples, we've exclusively set `time_units` to "generations". This is appropriate in many cases, but sometimes other units are preferred. For example, it's sometimes more natural to describe times using *years*, or even *thousands of years*. However, most simulation software operates using generations as the canonical unit of time. In Demes, the `time_units` property may be any string, but "generations" is special. If the `time_units` are **not** "generations", then an additional `generation_time` property must be specified. This latter property can then be used by the simulator to convert from the chosen `time_units` into units of generations. ```{warning} The units for the `rate` of `migrations` are always per generation, even when the `time_units` are not generations. Only the various `start_time`, `end_time`, and `time` properties should match the `time_units`. ``` ```{code-cell} :tags: ["remove-cell"] graph = demes.load(examples / "example_21.yaml") w = 1.1 * size_max(graph) positions = dict(ancestral=0, AMH=0, YRI=0, OOA=3*w/2, CEU=w, CHB=2*w) ax = demesdraw.tubes( graph, positions=positions, seed=1, log_time=True, num_lines_per_migration=5 ) glue("example_21", ax.figure, display=False) ``` {{ example.replace("LABEL", "21") }} ## Selfing and cloning Epochs also have `selfing_rate` and `cloning_rate` properties, which default to 0 if not specified. ```{todo} Give examples. ```