DAP to netCDF Translation Rules

Two translations are currently available.

DAP 2 Protocol to netCDF-3
DAP 2 Protocol to netCDF-4

netCDF-3 Translation Rules

The current set of translation rules to convert an OPeNDAP DAP protocol version 2 DDS to netCDF-3 is designed to mimic as closely as possible those currently used by the libnc-dap system. Please note that the translation is still subject to change to respond to unforeseen problems and user suggestions.

For illustrative purposes, the following example will be used.

Dataset {
  Int32 f1;
  Structure {
    Int32 f11;        
    Structure {
      Int32 f1[3];
      Int32 f2;
    } FS2[2]; 
  } S1; 
  Structure {
    Grid {
      Array:
        Float32 temp[lat=2][lon=2];
      Maps:
        Int32 lat[lat=2];
        Int32 lon[lon=2];
    } G1;
  } S2;
  Grid {
      Array:
        Float32 G2[lat=2][lon=2];
      Maps:
        Int32 lat[2];
        Int32 lon[2];
  } G2;
  Int32 lat[lat=2];
  Int32 lon[lon=2];
} D1;

Variable Definition

The set of variables is defined by the fields with primitive base types as they occur in Sequences, Grids, and Structures. The field names are first modified to be fully qualified. For the above DDS, the variables are as follows.

f1
S1.f11
S1.FS2.f1
S1.FS2.f2
S2.G1.temp
S2.G1.lat
S2.G1.lon
G2.G2
G2.lat
G2.lon
lat
lon
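
The collection step can be sketched as a recursive walk over the DDS tree. This is only an illustration: the `Node` class and `leaf_names` function are hypothetical names, not part of any netCDF or OPeNDAP API. Note that in the example DDS, G2 is a top-level Grid, so its fields are qualified by G2 alone.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical stand-in for a DDS node (not a real library type)."""
    name: str
    dims: list = field(default_factory=list)      # sizes declared on this node
    children: list = field(default_factory=list)  # empty for primitive fields

def leaf_names(node, prefix=""):
    """Return the fully qualified names of all primitive fields below node."""
    names = []
    for child in node.children:
        qualified = prefix + child.name
        if child.children:
            names.extend(leaf_names(child, qualified + "."))
        else:
            names.append(qualified)
    return names

# The example DDS, reduced to names and dimension sizes.
d1 = Node("D1", children=[
    Node("f1"),
    Node("S1", children=[
        Node("f11"),
        Node("FS2", [2], [Node("f1", [3]), Node("f2")]),
    ]),
    Node("S2", children=[
        Node("G1", children=[Node("temp", [2, 2]), Node("lat", [2]), Node("lon", [2])]),
    ]),
    Node("G2", children=[Node("G2", [2, 2]), Node("lat", [2]), Node("lon", [2])]),
    Node("lat", [2]),
    Node("lon", [2]),
])

variables = leaf_names(d1)
```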

Variable Dimension Translation

A variable's rank is determined from three sources.

The variable has the dimensions associated with the field it represents (e.g. S1.FS2.f1[3] in the above example).
The variable inherits the dimensions associated with any containing structure that has a rank greater than zero. These dimensions precede those of case 1. Thus, we have in our example, f1[2][3], where the first dimension comes from the containing Structure FS2[2].
The variable's set of dimensions is altered if any of its containers is a DAP DDS Sequence. This is discussed more fully below.
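
The first two sources can be illustrated with a small sketch. The function name `inherited_shape` and the `(name, dims)` path representation are assumptions made for this example, and Sequences are deliberately not handled here.

```python
def inherited_shape(path):
    """Combine container and field dimensions into one shape.

    path is a list of (name, dims) pairs ordered from the outermost
    container down to the field itself.
    """
    shape = []
    for _name, dims in path:
        # Container dimensions are appended first, so they precede
        # the dimensions declared on the field.
        shape.extend(dims)
    return shape

# S1 has rank 0, FS2 is declared as FS2[2], and the field is f1[3].
shape = inherited_shape([("S1", []), ("FS2", [2]), ("f1", [3])])
```

For S1.FS2.f1 this gives a shape of 2 by 3, with the leading 2 inherited from FS2.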

Dimension translation

For dimensions, the rules are as follows.

Fields in dimensioned structures inherit the dimension of the structure; thus the above list would have the following dimensioned variables.
- S1.FS2.f1 -> S1.FS2.f1[2][3]
- S1.FS2.f2 -> S1.FS2.f2[2]
- S2.G1.temp -> S2.G1.temp[lat=2][lon=2]
- S2.G1.lat -> S2.G1.lat[lat=2]
- S2.G1.lon -> S2.G1.lon[lon=2]
- G2.G2 -> G2.G2[lat=2][lon=2]
- G2.lat -> G2.lat[2]
- G2.lon -> G2.lon[2]
- lat -> lat[lat=2]
- lon -> lon[lon=2]
Collect all of the dimension specifications from the DDS, both named and anonymous (unnamed). For each unique anonymous dimension with value NN, create a netCDF dimension of the form "<array>_<i>=NN", where <array> is the fully qualified name of the variable and i is the i'th (inherited) dimension of the array where the anonymous dimension occurs. For our example, this would create the following dimensions.
- S1.FS2.f1_0 = 2 ;
- S1.FS2.f1_1 = 3 ;
- S1.FS2.f2_0 = 2 ;
- G2.lat_0 = 2 ;
- G2.lon_0 = 2 ;
If, however, the anonymous dimension is the single dimension of a MAP vector in a Grid, then the dimension is given the same name as the map vector. This leads to the following.
- G2.lat_0 -> G2.lat
- G2.lon_0 -> G2.lon
For each unique named dimension "<name>=NN", create a netCDF dimension of the form "<name>=NN", where name has the qualifications removed. If this leads to duplicates (i.e. same name and same value), then the duplicates are ignored. This produces the following.
- G2.lat -> lat
- G2.lon -> lon
Note that this produces duplicates of the existing lat and lon dimensions, which are ignored.
At this point the only dimensions left to process should be named dimensions with the same name as some dimension from the previous step, but with a different value. For those dimensions create a dimension of the form "<name>M=NN" where M is a counter starting at 1. The example has no instances of this.
Finally and if needed, define a single UNLIMITED dimension named "unlimited" with value zero.

This leads to the following set of dimensions.

dimensions:
  unlimited = UNLIMITED;
  lat = 2 ;
  lon = 2 ;
  S1.FS2.f1_0 = 2 ;
  S1.FS2.f1_1 = 3 ;
  S1.FS2.f2_0 = 2 ;
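
The two naming rules that carry most of the work here can be sketched as follows. Both function names are hypothetical, and the way the counter M is chosen for conflicting names is an assumption, since the text only says that it starts at 1.

```python
def anonymous_dim_names(qualified_name, shape):
    """Name each anonymous dimension "<array>_<i>" for position i."""
    return [(f"{qualified_name}_{i}", size) for i, size in enumerate(shape)]

def define_named_dims(dims):
    """Define named dimensions with their qualifications removed.

    dims is a list of (possibly qualified name, size) pairs.
    """
    defined = {}
    for name, size in dims:
        base = name.rsplit(".", 1)[-1]       # drop the qualification
        if defined.get(base) == size:
            continue                         # same name and value: ignored
        if base not in defined:
            defined[base] = size
            continue
        # Same name, different value: append a counter starting at 1
        # (assumed policy: take the first counter that is free or matches).
        m = 1
        while f"{base}{m}" in defined and defined[f"{base}{m}"] != size:
            m += 1
        defined[f"{base}{m}"] = size
    return defined

anon = anonymous_dim_names("S1.FS2.f1", [2, 3])
named = define_named_dims([("S2.G1.lat", 2), ("lat", 2), ("lon", 2), ("lat", 5)])
```

The last input pair, a lat dimension of size 5, does not occur in the example DDS; it is included only to exercise the conflict rule.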

Variable Name Translation

The steps for variable name translation are as follows.

Take the set of variables captured above. Thus for the above DDS, the following fields would be collected.
- f1
- S1.f11
- S1.FS2.f1
- S1.FS2.f2
- S2.G1.temp
- S2.G1.lat
- S2.G1.lon
- S2.G2.G2
- S2.G2.lat
- S2.G2.lon
- lat
- lon
All grid array variables are renamed to be the same as the containing grid and the grid prefix is removed. In the above DDS, this results in the following changes.
1. S2.G1.temp -> S2.G1
2. G2.G2 -> G2
Note that, for example, S2.G1.lon keeps its name. Also note that libnc-dap just drops the grid map variables, so this is one place where the translation differs from libnc-dap, but in a compatible way.

It is important to note that this process could produce duplicate variables (i.e. variables with the same name). In that case they are all assumed to have the same content and the duplicates are ignored. If it turns out that the duplicates have different content, then the translation will not detect this. YOU HAVE BEEN WARNED.
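
A minimal sketch of the renaming step, assuming the set of grid array variables is already known. The function name `rename_grid_arrays` is hypothetical.

```python
def rename_grid_arrays(variables, grid_arrays):
    """Rename each grid array variable to the name of its containing grid.

    variables   -- fully qualified variable names, in order
    grid_arrays -- the subset of names that are Grid array fields
    """
    renamed = []
    for name in variables:
        if name in grid_arrays:
            # Drop the array's own name and keep the path of the grid.
            name = name.rsplit(".", 1)[0]
        if name not in renamed:
            # Duplicates are assumed to have the same content and are ignored.
            renamed.append(name)
    return renamed

result = rename_grid_arrays(
    ["S2.G1.temp", "S2.G1.lat", "S2.G1.lon", "G2.G2", "G2.lat", "G2.lon"],
    {"S2.G1.temp", "G2.G2"},
)
```

As in the text, this sketch does not compare the content of duplicates; it only compares names.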

The final netCDF-3 schema (minus attributes) is then as follows.

netcdf t {
dimensions:
        unlimited = UNLIMITED ;
        lat = 2 ;
        lon = 2 ;
        S1.FS2.f1_0 = 2 ;
        S1.FS2.f1_1 = 3 ;
        S1.FS2.f2_0 = 2 ;
variables:
        int f1 ;
        int lat(lat) ;
        int lon(lon) ;
        int S1.f11 ;
        int S1.FS2.f1(S1.FS2.f1_0, S1.FS2.f1_1) ;
        int S1.FS2.f2(S1.FS2.f2_0) ;
        float S2.G1(lat, lon) ;
        int S2.G1.lat(lat) ;
        int S2.G1.lon(lon) ;
        float G2(lat, lon) ;
        int G2.lat(lat) ;
        int G2.lon(lon) ;
}

In actuality, the unlimited dimension is dropped because it is unused.

There are differences with the original libnc-dap here because libnc-dap technically was incorrect. The original would have said this, for example.

int S1.FS2.f1(lat, lat) ;

Note that this is incorrect because it dimensions S1.FS2.f1(2,2) rather than S1.FS2.f1(2,3).

Translating DAP DDS Sequences

Any variable (as determined above) that is contained directly or indirectly by a Sequence is subject to revision of its rank using the following rules.

Let the variable be contained in Sequence Q1, where Q1 is the innermost containing sequence. If Q1 is itself contained (directly or indirectly) in a sequence, or Q1 is contained (again directly or indirectly) in a structure that has rank greater than 0, then the variable will have an initial UNLIMITED dimension. In addition, all dimensions coming from "above" (in the containment sense) the innermost Sequence, Q1, including those of Q1 itself, will be removed and replaced by the single UNLIMITED dimension. The size associated with that UNLIMITED dimension is zero, which means that the variable's contents are inaccessible through the netCDF-3 API. Again, this differs from libnc-dap, which leaves out such variables. Again, however, this difference is compatible.
If the variable is contained in a single Sequence (i.e. not nested) and all containing structures have rank 0, then the variable will have an initial dimension whose size is the record count for that Sequence. The name of the new dimension will be the name of the Sequence.

Consider this example.

Dataset {
  Structure {
    Sequence {
      Int32 f1[3];
      Int32 f2;
    } SQ1;
  } S1[2]; 
  Sequence {
    Structure {
      Int32 x1[7];
    } S2[5];
  } Q2;
} D;

The corresponding netCDF-3 translation is roughly as follows (the value for dimension Q2 may differ).

dimensions:
    unlimited = UNLIMITED ; // (0 currently)
    S1.SQ1.f1_0 = 2 ;
    S1.SQ1.f1_1 = 3 ;
    S1.SQ1.f2_0 = 2 ;
    Q2.S2.x1_0 = 5 ;
    Q2.S2.x1_1 = 7 ;
    Q2 = 5 ;
variables:
    int S1.SQ1.f1(unlimited, S1.SQ1.f1_1) ;
    int S1.SQ1.f2(unlimited) ;
    int Q2.S2.x1(Q2, Q2.S2.x1_0, Q2.S2.x1_1) ;

Note that, for example, S1.SQ1.f1_0 is not actually used because it has been folded into the unlimited dimension.

Note that there is a performance cost because the translation code has to walk the data to determine how many records are associated with the sequence. Since libnc-dap did something similar, it can be assumed that the cost is not prohibitive.
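
The two Sequence rules can be sketched as a single function over the containment path of a variable. The name `sequence_dims` and the `(kind, name, dims)` triples are assumptions for this illustration; sizes stand in for the generated dimension names, and the record count itself is not computed.

```python
def sequence_dims(path):
    """Return the dimension list of a variable contained in a Sequence.

    path is a list of (kind, name, dims) triples ordered from the
    outermost container down to the field itself.
    """
    seq_positions = [i for i, (kind, _, _) in enumerate(path) if kind == "Sequence"]
    if not seq_positions:
        raise ValueError("variable is not contained in a Sequence")
    innermost = seq_positions[-1]
    above = path[:innermost]
    below = path[innermost + 1:]
    # Nested in another Sequence, or in a Structure with rank > 0?
    nested = any(kind == "Sequence" or dims for kind, _, dims in above)
    # Dimensions from above the innermost Sequence are dropped in both cases.
    first = "unlimited" if nested else path[innermost][1]
    tail = [size for _, _, dims in below for size in dims]
    return [first] + tail

f1_dims = sequence_dims([("Structure", "S1", [2]), ("Sequence", "SQ1", []), ("Int32", "f1", [3])])
x1_dims = sequence_dims([("Sequence", "Q2", []), ("Structure", "S2", [5]), ("Int32", "x1", [7])])
```

For the example above, S1.SQ1.f1 gets the unlimited dimension followed by its own dimension of size 3, while Q2.S2.x1 gets a leading dimension named Q2 followed by sizes 5 and 7.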

netCDF-4 Translation Rules

The DAP to netCDF-4 translation is enabled if the "--enable-netcdf-4" option is specified at configure time. This translation includes some elements of the libnc-dap translation, but attempts to provide a simpler (but not, unfortunately, simple) set of translation rules than is used for the netCDF-3 translation. Please note that the translation is still subject to change to respond to unforeseen problems or to suggested improvements.

This text will use this running example.

Dataset {
  Int32 f1[fdim=10];
  Structure {
    Int32 f11;        
    Structure {
      Int32 f1[3];
      Int32 f2;
    } FS2[2]; 
  } S1; 
  Grid {
    Array:
      Float32 temp[lat=2][lon=2];
    Maps:
      Int32 lat[2];
      Int32 lon[2];
  } G1;
  Sequence {
    Float64 depth;
  } Q1;
} D;

Variable Definition

The rules for choosing variables are as follows.

Start with the names of the top-level fields of the DDS. The term top-level means that the object is a direct subnode of the Dataset object. In our example, this produces the set [f1, S1, G1, Q1].
Replace all Grid objects with the fully qualified list of array and map fields of the grid. Our variable set then becomes [f1, S1, G1.temp, G1.lat, G1.lon, Q1]. Note that the libnc-dap practice of re-naming the array variable to be that of the Grid is not used.
Attempt to remove the prefix Grid name from the top-level Grid array and map variables. If that eventually conflicts with some other name, then leave the conflicting Grids alone. Our variable set then becomes [f1, S1, temp, lat, lon, Q1].
If the Grid array name is the same as the Grid name, then remove the prefix Grid name (not shown).
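
These rules can be sketched as below. The function name `netcdf4_variables` is hypothetical, and "leave the conflicting Grids alone" is interpreted here as keeping all fields of a conflicting Grid fully qualified, which is an assumption.

```python
def netcdf4_variables(top_level):
    """Compute the netCDF-4 variable set from the top-level DDS fields.

    top_level is a list of (name, grid_fields) pairs, where grid_fields
    is None for non-Grid objects and a list of array and map names for Grids.
    """
    plain = {name for name, fields in top_level if fields is None}
    result = []
    for name, fields in top_level:
        if fields is None:
            result.append(name)
            continue
        # Names that an unqualified field of this Grid could collide with.
        others = set(plain)
        for other_name, other_fields in top_level:
            if other_fields is not None and other_name != name:
                others.update(other_fields)
        if any(f in others for f in fields):
            result.extend(f"{name}.{f}" for f in fields)   # keep qualified
        else:
            result.extend(fields)                          # drop the Grid prefix
    return result

variables4 = netcdf4_variables(
    [("f1", None), ("S1", None), ("G1", ["temp", "lat", "lon"]), ("Q1", None)]
)
```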

Dimension Definition

The rules for choosing and defining dimensions are as follows.

Collect the set of dimensions (named and anonymous) directly associated with the variables as defined above. This means that dimensions within user-defined types are ignored. From our example, the dimension set is [fdim=10,lat=2,lon=2,2,2]. Note that the unqualified names are used.
If an anonymous dimension is associated with a Grid Map variable, then give the dimension the name of the map. Our dimension set now becomes [fdim=10,lat=2,lon=2,lat=2,lon=2].
All remaining anonymous dimensions are given the name "<var>_NN", where "<var>" is the unqualified name of the variable in which the anonymous dimension appears and NN is the relative position of that dimension in the dimensions associated with that array. No instances of this rule occur in the running example.
Remove duplicate dimensions (those with same name and value). Our dimension set now becomes [fdim=10,lat=2,lon=2].
The final case occurs when there are dimensions with the same name but with different values. For this case, the size of the dimension is appended to the dimension name.
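
A sketch of these steps follows. The function name `netcdf4_dimensions` and its input layout are hypothetical, and the text does not say whether every conflicting dimension or only the later ones receive the size suffix; this sketch assumes every conflicting one does.

```python
def netcdf4_dimensions(variables):
    """Name and de-duplicate the dimensions of the chosen variables.

    variables is a list of (var, is_map, dims) triples, where dims is a
    list of (name, size) pairs and name is None for anonymous dimensions.
    """
    named = []
    for var, is_map, dims in variables:
        for pos, (name, size) in enumerate(dims):
            if name is None:
                # Map vectors lend their name; others get "<var>_NN".
                name = var if is_map else f"{var}_{pos}"
            if (name, size) not in named:      # remove exact duplicates
                named.append((name, size))
    sizes_by_name = {}
    for name, size in named:
        sizes_by_name.setdefault(name, set()).add(size)
    # Same name with different sizes: append the size to the name.
    return [
        (f"{name}{size}" if len(sizes_by_name[name]) > 1 else name, size)
        for name, size in named
    ]

dims4 = netcdf4_dimensions([
    ("f1", False, [("fdim", 10)]),
    ("temp", False, [("lat", 2), ("lon", 2)]),
    ("lat", True, [(None, 2)]),
    ("lon", True, [(None, 2)]),
])
```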

Type Definition

The rules for choosing user-defined types are as follows.

For every Structure, Sequence, and non-top-level Grid, a netCDF-4 compound type is created whose fields are the fields of the Structure, Sequence, or Grid. The name of the type is the same as the Structure or Grid name suffixed with "_t". However, the compound types derived from Sequences are instead suffixed with "_record_t".
The types of the fields are the types of the corresponding field of the Structure, Sequence, or Grid. Note that this type might be itself a user-defined type.
From the example, we get the following compound types.
```
compound FS2_t {
    int f1[3];
    int f2;
};
compound S1_t {
    int f11;
    FS2_t FS2[2];  
};
compound Q1_record_t {
    double depth;
};
```
For all sequences of name X, also create this type.
```
    X_record_t (*) X_t
```
In our example, this produces the following type.
```
    Q1_record_t (*) Q1_t
```
If a Sequence Q has a single field F whose type is a primitive type T (e.g., int, float, string), then do not apply the previous rule, but instead replace the whole sequence with the following field.
```
    T (*) Q.F
```
Attempt to maximally shorten the type names as long as there is no conflict.
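
The type naming convention can be summarized in a few lines. The function name `type_names` is hypothetical, and neither the single-primitive-field exception nor the shortening of names is modeled here.

```python
def type_names(kind, name):
    """Return the user-defined type names created for a DDS object.

    kind is "Structure", "Grid", or "Sequence"; name is the object's name.
    """
    if kind == "Sequence":
        # A compound record type plus a variable-length type built on it.
        return [f"{name}_record_t", f"{name}_t"]
    return [f"{name}_t"]

q1_types = type_names("Sequence", "Q1")
fs2_types = type_names("Structure", "FS2")
```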

Choosing a Translation

The decision about whether to translate to netCDF-3 (libnc-dap) or netCDF-4 is determined by applying the following rules in order.

If the NC_CLASSIC_MODEL flag is set on nc_open(), then netcdf-3 (i.e. libnc-dap) translation is used.
If the NC_NETCDF4 flag is set on nc_open(), then netCDF-4 translation is used.
If the URL is prefixed with the string "[mode=netcdf3]" or "[mode=libnc-dap]", then the libnc-dap translation is used.
If the URL is prefixed with the string "[mode=netcdf4]", then the netCDF-4 translation described above is used.
If none of the above is specified, then the default is "[mode=libnc-dap]".
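
The ordering of these rules can be modeled as below. This is not the library's code: `choose_translation` is a hypothetical function, the two boolean arguments stand in for the NC_CLASSIC_MODEL and NC_NETCDF4 flags passed to nc_open(), and the URL is a placeholder.

```python
import re

def choose_translation(url, classic_model=False, netcdf4=False):
    """Apply the selection rules in order and return "netcdf3" or "netcdf4"."""
    if classic_model:                      # NC_CLASSIC_MODEL set
        return "netcdf3"
    if netcdf4:                            # NC_NETCDF4 set
        return "netcdf4"
    # Only the bracketed parameters at the front of the URL are considered.
    prefix = re.match(r"(\[[^\]]*\])*", url).group(0)
    modes = re.findall(r"\[mode=([^\]]*)\]", prefix)
    if any(mode in ("netcdf3", "libnc-dap") for mode in modes):
        return "netcdf3"
    if "netcdf4" in modes:
        return "netcdf4"
    return "netcdf3"                       # default: [mode=libnc-dap]

choice = choose_translation("[mode=netcdf4]http://example.org/dods/data")
```

Because the rules are applied in order, a flag given to nc_open() takes precedence over a mode given in the URL.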

Defined Client Parameters

Currently, a limited set of client parameters is recognized. Parameters not listed here are ignored, but no error is signalled.

Parameter Name	Legal Values	Semantics
[mode=...]	libnc-dap|netcdf3|netcdf4	Specify the translation to be applied to the DAP data source on the client side.
[show=...]	das|dds|url	This causes information to appear as specific global attributes. The tags may be combined using comma with no spaces (e.g. "show=dds,url"). The currently recognized tags are "dds" to display the underlying DDS, "das" similarly, and "url" to display the url used to retrieve the data.
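
As an illustration of how such parameters might be separated from the URL, here is a minimal sketch. The function `parse_client_params` is hypothetical and the URL is a placeholder; unrecognized parameters are skipped silently, as described above.

```python
import re

KNOWN_PARAMS = {"mode", "show"}

def parse_client_params(url):
    """Split leading "[name=value]" parameters from a URL.

    Returns (params, remaining_url). The "show" value is split on commas.
    """
    params = {}
    rest = url
    while True:
        match = re.match(r"\[([^=\]]+)=([^\]]*)\]", rest)
        if not match:
            break
        key, value = match.group(1), match.group(2)
        if key in KNOWN_PARAMS:
            params[key] = value.split(",") if key == "show" else value
        # Unknown parameters are ignored without signalling an error.
        rest = rest[match.end():]
    return params, rest

params, data_url = parse_client_params(
    "[mode=netcdf4][show=dds,url][foo=1]http://example.org/dods/data"
)
```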