This section describes the rules which apply for storing single data fields in data files.
Group and field names used within NeXus follow a naming convention described by the following rules:
[1] | The class name is the value assigned to the NX_class attribute of an HDF5 group in the NeXus data file. This class name is different from the name of the HDF5 group. This is important when not using the NAPI to either read or write the HDF5 data file. |
Regular expression pattern for NXDL group and field names
It is recommended that all group and field names contain only lowercase letters, digits, and the underscore character, and that they begin with a lowercase letter. This is the regular expression used to check this recommendation (note that it also accepts a leading underscore):

    [a-z_][a-z\d_]*
The length should be limited to no more than 63 characters (imposed by the HDF5 rules for names).
It is recognized that some facilities will construct group and field names with upper case letters. NeXus data files with upper case characters in the group or field names might not be accepted by all software that reads NeXus data files. Hence, group and field names that do not pass the regular expression above but pass this expression:

    [A-Za-z_][\w_]*

will be flagged as a warning during data file validation.
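The two-tier check described above can be sketched in a few lines of Python. This is a minimal illustration, not the validator's actual code: the function name `check_name` and the labels `"ok"`, `"warning"` and `"error"` are assumptions made for this example, as is the choice to treat an over-long name as an error. The patterns and the 63-character limit are taken from the text.

```python
import re

# Patterns copied from the text; re.ASCII keeps \d and \w limited to ASCII.
RECOMMENDED = re.compile(r"[a-z_][a-z\d_]*", re.ASCII)
TOLERATED = re.compile(r"[A-Za-z_][\w_]*", re.ASCII)
MAX_NAME_LENGTH = 63  # length limit stated in the text


def check_name(name):
    """Classify a group or field name as 'ok', 'warning', or 'error'.

    The three labels are hypothetical; they only mirror the wording above.
    """
    if len(name) > MAX_NAME_LENGTH:
        return "error"  # assumption: an over-long name is rejected outright
    if RECOMMENDED.fullmatch(name):
        return "ok"
    if TOLERATED.fullmatch(name):
        return "warning"
    return "error"
```

With these definitions, `check_name("number_of_lenses")` falls in the recommended tier, a name such as `Counts` only passes the relaxed pattern, and a name starting with a digit passes neither.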
Use of underscore in descriptive names
Sometimes it is necessary to combine words in order to build a descriptive name for a data field or a group. In such cases lowercase words are connected by underscores.

    number_of_lenses

For all data fields, only names from the NeXus base class dictionaries should be used. If a data field name or even a complete component is missing, please suggest the addition to the NIAC: The NeXus International Advisory Committee. The addition will usually be accepted provided it does not duplicate an existing field and is adequately documented.
Note
The NeXus base classes provide a comprehensive dictionary of terms that can be used for each class. The expected spelling and definition of each term is specified in the base classes. It is not required to provide all the terms specified in a base class. Terms with other names are permitted but might not be recognized by standard software. Rather than persist in using names not specified in the standard, please suggest additions to the NIAC: The NeXus International Advisory Committee.
NeXus stores multi-dimensional arrays of physical values in C language storage order, where the last dimension is the fastest varying. This is the rule. Good reasons are required to deviate from this rule.
It is nevertheless possible to store data in storage orders other than C language order, and to specify that the data needs to be converted before it is useful. Consider a situation in which data must be streamed to disk as fast as possible and conversion to C language storage order would cause unnecessary latency. This case presents a good reason to make an exception to the standard rule.
In order to indicate that the storage order is different from C storage order, two additional data set attributes, offset and stride, have to be stored; together they define the storage layout of the data. Offset and stride each contain as many numbers as the rank of the multidimensional data set, one per dimension. Offset describes the step to make when the dimension is multiplied by 1. Stride defines the step to make when incrementing the dimension. This is best explained by some examples.
Offset and Stride for 1D Data

    * raw data = 0 1 2 3 4 5 6 7 8 9
       size[1] = { 10 }  // assume uniform overall array dimensions

    * default stride:
       stride[1] = { 1 }
       offset[1] = { 0 }
       for i:
          result[i]:
             0 1 2 3 4 5 6 7 8 9

    * reverse stride:
       stride[1] = { -1 }
       offset[1] = { 9 }
       for i:
          result[i]:
             9 8 7 6 5 4 3 2 1 0

Offset and Stride for 2D Data

    * raw data = 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
       size[2] = { 4, 5 }  // assume uniform overall array dimensions

    * row major (C) stride:
       stride[2] = { 5, 1 }
       offset[2] = { 0, 0 }
       for i:
          for j:
             result[i][j]:
                0 1 2 3 4
                5 6 7 8 9
                10 11 12 13 14
                15 16 17 18 19

    * column major (Fortran) stride:
       stride[2] = { 1, 4 }
       offset[2] = { 0, 0 }
       for i:
          for j:
             result[i][j]:
                0 4 8 12 16
                1 5 9 13 17
                2 6 10 14 18
                3 7 11 15 19

    * "crazy reverse" row major (C) stride:
       stride[2] = { -5, -1 }
       offset[2] = { 4, 5 }
       for i:
          for j:
             result[i][j]:
                19 18 17 16 15
                14 13 12 11 10
                9 8 7 6 5
                4 3 2 1 0

Offset and Stride for 3D Data

    * raw data = 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
          20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
          40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
       size[3] = { 3, 4, 5 }  // assume uniform overall array dimensions

    * row major (C) stride:
       stride[3] = { 20, 5, 1 }
       offset[3] = { 0, 0, 0 }
       for i:
          for j:
             for k:
                result[i][j][k]:
                   0 1 2 3 4
                   5 6 7 8 9
                   10 11 12 13 14
                   15 16 17 18 19

                   20 21 22 23 24
                   25 26 27 28 29
                   30 31 32 33 34
                   35 36 37 38 39

                   40 41 42 43 44
                   45 46 47 48 49
                   50 51 52 53 54
                   55 56 57 58 59

    * column major (Fortran) stride:
       stride[3] = { 1, 3, 12 }
       offset[3] = { 0, 0, 0 }
       for i:
          for j:
             for k:
                result[i][j][k]:
                   0 12 24 36 48
                   3 15 27 39 51
                   6 18 30 42 54
                   9 21 33 45 57

                   1 13 25 37 49
                   4 16 28 40 52
                   7 19 31 43 55
                   10 22 34 46 58

                   2 14 26 38 50
                   5 17 29 41 53
                   8 20 32 44 56
                   11 23 35 47 59

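The examples above can be reproduced with a short Python sketch. It rests on one assumption that the text does not spell out: element (i, j, ...) of the result is read from flat position `start + i*stride[0] + j*stride[1] + ...` of the raw data, where `start` is a single flat starting position. The function name `read_strided` and the `start` parameter are hypothetical, and the sketch does not attempt to interpret multi-valued offset attributes.

```python
def read_strided(raw, size, stride, start=0):
    """Build nested lists result[i][j]... from flat raw data.

    Assumption: element (i, j, ...) lives at flat position
    start + i*stride[0] + j*stride[1] + ...
    """
    def build(dim, base):
        if dim == len(size):
            # Guard against silent wrap-around from negative Python indices.
            if not 0 <= base < len(raw):
                raise IndexError(f"flat position {base} is outside the raw data")
            return raw[base]
        return [build(dim + 1, base + i * stride[dim]) for i in range(size[dim])]

    return build(0, start)


raw = list(range(20))
row_major = read_strided(raw, size=[4, 5], stride=[5, 1])     # C order
column_major = read_strided(raw, size=[4, 5], stride=[1, 4])  # Fortran order
reversed_1d = read_strided(list(range(10)), size=[10], stride=[-1], start=9)
```

Here `row_major` and `column_major` correspond to the two 2D layouts with zero offset, and `reversed_1d` corresponds to the 1D reverse-stride case, where the starting position is the last element of the raw data.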
description | matching regular expression |
---|---|
integer | NX_INT(8|16|32|64) |
floating-point | NX_FLOAT(32|64) |
array | (\\[0-9\\])? |
valid item name | ^[A-Za-z_][A-Za-z0-9_]*$ |
valid class name | ^NX[A-Za-z0-9_]*$ |
NeXus supports numeric data as either integer or floating-point numbers. The type name is followed by a number that indicates the number of bits in the word. The table above shows the regular expressions that match the data type specifiers.
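As an illustration, the two numeric rows of the table can be applied as follows. This is a sketch only: the function name `parse_numeric_type` and its return value are assumptions made for this example, and the optional array suffix from the table is deliberately left out.

```python
import re

# Patterns taken from the integer and floating-point rows of the table.
INTEGER = re.compile(r"NX_INT(8|16|32|64)")
FLOATING = re.compile(r"NX_FLOAT(32|64)")


def parse_numeric_type(specifier):
    """Return (kind, bits) for a numeric type specifier, or None if no match."""
    for kind, pattern in (("integer", INTEGER), ("floating-point", FLOATING)):
        match = pattern.fullmatch(specifier)
        if match:
            return kind, int(match.group(1))
    return None
```

For example, `parse_numeric_type("NX_INT32")` yields the kind together with the word size in bits, while a specifier that matches neither pattern yields `None`.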
NeXus dates and times should be stored using the ISO 8601 [2] format, e.g. 1996-07-31T21:15:22+0600. The standard also allows for time intervals in fractional seconds with 1 or more digits of precision. This avoids confusion, e.g. between U.S. and European conventions, and is appropriate for machine sorting.
[2] | ISO 8601: http://www.w3.org/TR/NOTE-datetime |
strftime() format specifiers for ISO-8601 time
%Y-%m-%dT%H:%M:%S%z
Note
Note that the T appears literally in the string, to indicate the beginning of the time element, as specified in ISO 8601. It is common to use a space in place of the T, such as 1996-07-31 21:15:22+0600. While human-readable (and later allowed in a relaxed revision of the standard), compatibility with libraries supporting the ISO 8601 standard is not assured with this substitution. The strftime() format specifier for this is “%Y-%m-%d %H:%M:%S%z”.
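The format specifier above can be used directly with Python's standard `datetime` module, both to write and to read a time stamp. The sketch below uses the example value from the text; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%S%z"  # specifier quoted in the text

# The example time stamp from the text, with a +06:00 offset from UTC.
stamp = datetime(1996, 7, 31, 21, 15, 22, tzinfo=timezone(timedelta(hours=6)))

text = stamp.strftime(ISO_FORMAT)             # write
parsed = datetime.strptime(text, ISO_FORMAT)  # read back
```

Here `text` is the string `1996-07-31T21:15:22+0600`, and `parsed` compares equal to `stamp` because the UTC offset survives the round trip.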
Given the plethora of possible applications of NeXus, it is difficult to define units to use. Therefore, the general rule is that you are free to store data in any unit you find fit. However, any data field must have a units attribute which describes the units. Wherever possible, SI units are preferred. NeXus units are written as a string attribute (NX_CHAR) and describe the engineering units. The string should be appropriate for the value. Values for the NeXus units must be specified in a format compatible with Unidata UDunits [3]. Application definitions may specify units to be used for fields using an enumeration.
[3] | The UDunits specification also includes instructions for derived units. At present, the contents of NeXus units attributes are not validated in data files. |
NeXus allows multidimensional arrays of data to be stored. In most cases it is not sufficient to have just the indices into the array as labels for the dimensions of the data. Usually one needs to know which physical value corresponds to an index into a dimension of the multidimensional data set. For this purpose, a means is needed to locate the data arrays that describe what each dimension of a multidimensional data set actually corresponds to. There is a standard HDF facility to do this, called dimension scales. Unfortunately, at one time there was only one global namespace for dimension scales. Thus NeXus had to come up with its own scheme for locating axis data, which is described here. A side effect of the NeXus scheme is that it is possible to have multiple mappings of a given dimension to physical data. For example, a TOF data set can have the TOF dimension as raw TOF or as energy.
There are two methods of linking each data dimension to its respective dimension scale. The preferred method uses the axes attribute to specify the names of each dimension scale. The original method uses the axis attribute, an integer stored on each dimension scale whose value is the number of the dimension it describes. After describing each of these methods, the two methods will be compared. A prerequisite for both methods is that the data fields describing the axes are stored in the same NeXus group as the multidimensional data set whose axes need to be defined. If this leads to data duplication, use links.
The preferred method is to define an attribute of the data itself called axes. The axes attribute contains the names of each dimension scale as a colon (or comma) separated list in the order they appear in C. For example:
Preferred way of denoting axes

    data:NXdata
      time_of_flight = 1500.0 1502.0 1504.0 ...
      polar_angle = 15.0 15.6 16.2 ...
      some_other_angle = 0.0 0.0 2.0 ...
      data = 5 7 14 ...
        @axes = polar_angle:time_of_flight
        @signal = 1

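Reading the axes attribute amounts to splitting the string on colons or commas. The sketch below assumes the attribute value is already available as a Python string; the function name `parse_axes` is hypothetical.

```python
import re


def parse_axes(axes_attribute):
    """Split an axes attribute into dimension scale names, in C order."""
    names = re.split(r"[:,]", axes_attribute)
    # Strip surrounding whitespace and drop empty entries.
    return [name.strip() for name in names if name.strip()]


scales = parse_axes("polar_angle:time_of_flight")
```

With the example value, `scales[0]` names the scale for the first (slowest varying) C dimension and `scales[-1]` the scale for the last (fastest varying) one.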
The original method is to define an attribute of each dimension scale called axis. It is an integer whose value is the number of the dimension, in order of fastest varying dimension. That is, if the array being stored is data with elements data[j][i] in C and data(i,j) in Fortran, where i is the time-of-flight index and j is the polar angle index, the NXdata group would contain:
Original way of denoting axes

    data:NXdata
      time_of_flight = 1500.0 1502.0 1504.0 ...
        @axis = 1
        @primary = 1
      polar_angle = 15.0 15.6 16.2 ...
        @axis = 2
        @primary = 1
      some_other_angle = 0.0 0.0 2.0 ...
        @axis = 1
      data = 5 7 14 ...
        @signal = 1

The axis attribute must be defined for each dimension scale. The primary attribute is unique to this method of linking.
There are limited circumstances in which more than one dimension scale for the same data dimension can be included in the same NXdata group. The most common is when the dimension scales are the three components of an (hkl) scan. In order to handle this case, we have defined another attribute of type integer called primary, whose value determines the order in which the scales are expected to be chosen for plotting; the scale with primary=1 is the first choice.
If there is more than one scale with the same value of the axis attribute, one of them must have set primary=1. Defining the primary attribute for the other scales is optional.
Note
The primary attribute can only be used with the original method of defining dimension scales (the axis attribute) discussed above. In addition to the signal data, this group could contain a data set of the same rank and dimensions called errors containing the standard deviations of the data.
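The selection rule for the original method can be sketched as follows. The NXdata group is modelled here as a plain dictionary mapping field names to their attributes, which is a hypothetical in-memory stand-in rather than a real file API; the function name is likewise an assumption. Because axis=1 denotes the fastest varying dimension, it corresponds to the last index in C order.

```python
def scales_from_axis_attributes(fields, rank):
    """Return the default dimension scale name for each C dimension.

    `fields` maps field names to attribute dicts (hypothetical model).
    An entry is None when no scale, or no unambiguous scale, is found.
    """
    chosen = []
    for c_dimension in range(rank):
        axis_number = rank - c_dimension  # axis=1 is the last C index
        candidates = [name for name, attrs in fields.items()
                      if attrs.get("axis") == axis_number]
        primary = [name for name in candidates
                   if fields[name].get("primary") == 1]
        if primary:
            chosen.append(primary[0])
        elif len(candidates) == 1:
            chosen.append(candidates[0])
        else:
            chosen.append(None)  # missing or ambiguous scale
    return chosen


nxdata_fields = {
    "time_of_flight": {"axis": 1, "primary": 1},
    "polar_angle": {"axis": 2, "primary": 1},
    "some_other_angle": {"axis": 1},
    "data": {"signal": 1},
}
default_scales = scales_from_axis_attributes(nxdata_fields, rank=2)
```

For the example group, the result lists polar_angle first and time_of_flight second, the same order as the axes attribute in the preferred method; some_other_angle is passed over because it lacks primary=1.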
In general the method using the axes attribute on the multi dimensional data set should be preferred. This leaves the actual axis describing data sets unannotated and allows them to be used as an axis for other multi dimensional data. This is especially a concern as an axis describing a data set may be linked into another group where it may describe a completely different dimension of another data set.
The axis method should be used to specify an axis of a data set only when alternative axis definitions are needed. This is shown in the example above for the some_other_angle field, where axis=1 denotes another possible axis for plotting along the same dimension. The default axis for plotting carries the primary=1 attribute.
Both methods of linking data axes will be supported in NeXus utilities that identify dimension scales, such as NXUfindaxis().
Detectors come in many different types, and storing their data can be a challenge. As a general guideline, if the detector has a well-defined form, this should be reflected in the data file. A linear detector becomes a linear array; a rectangular detector becomes an array of size xsize times ysize. Some detectors are so irregular that this does not work. In that case the detector data is stored as a linear array, with the index running over the detector number up to ndet. Such detectors must be accompanied by further arrays of length ndet which give azimuthal_angle, polar_angle and distance for each detector.
If data from a time of flight (TOF) instrument must be described, then the TOF dimension becomes the last dimension, for example an area detector of xsize vs. ysize is stored with TOF as an array with dimensions xsize, ysize, ntof.
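Because NeXus uses C storage order, putting TOF last means that all time channels of one pixel sit next to each other in storage. A small sketch of the index arithmetic, with a hypothetical function name and the dimension names xsize, ysize and ntof from the text:

```python
def flat_index(x, y, t, ysize, ntof):
    """Flat C-order position of element [x][y][t] in an (xsize, ysize, ntof) array."""
    return (x * ysize + y) * ntof + t


# Neighbouring TOF channels of the same pixel are adjacent in storage.
step_in_tof = flat_index(1, 2, 4, ysize=4, ntof=5) - flat_index(1, 2, 3, ysize=4, ntof=5)
# Moving to the next pixel in y skips a whole TOF spectrum.
step_in_y = flat_index(1, 3, 3, ysize=4, ntof=5) - flat_index(1, 2, 3, ysize=4, ntof=5)
```

In this example `step_in_tof` is 1 and `step_in_y` equals ntof.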
Monitors, detectors that measure the properties of the experimental probe rather than the sample, have a special place in NeXus files. Monitors are crucial to normalize data. To emphasize their role, monitors are not stored in the NXinstrument hierarchy but on NXentry level in their own groups as there might be multiple monitors. Of special importance is the monitor in a group called control. This is the main monitor against which the data has to be normalized. This group also contains the counting control information, i.e. counting mode, times, etc.
Monitor data may be multidimensional. Good examples are scan monitors where a monitor value per scan point is expected or time-of-flight monitors.
Any program whose aim is to identify the default plottable data should use the following procedure:
1. Start at the top level of the NeXus data file.
2. Loop through the groups with class NXentry until the next step succeeds.
3. Open the NXentry group and loop through the subgroups with class NXdata until the next step succeeds.
4. Open the NXdata group and loop through the fields for the one field with attribute signal="1". Note: There should be only one field that matches. This is the default plottable data.
5. Having found the default plottable data and its dimension scales, make the plot.
Consult the NeXus API section, which describes the routines available to program these operations. In the course of time, generic NeXus browsers will provide this functionality automatically.
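The search procedure above can be sketched without any file library by modelling the file as nested dictionaries. This model (the keys "groups", "fields" and "NX_class", the function name, and the path-style return value) is an assumption made purely for illustration; a real program would use the NeXus API or an HDF5 library instead.

```python
def find_default_plottable(root):
    """Return the path of the first field with signal=1, or None.

    `root` is a hypothetical nested-dict model of a NeXus file:
    groups carry "NX_class", "groups" and "fields"; each field maps
    to a dict of its attributes.
    """
    for entry_name, entry in root.get("groups", {}).items():
        if entry.get("NX_class") != "NXentry":
            continue
        for data_name, nxdata in entry.get("groups", {}).items():
            if nxdata.get("NX_class") != "NXdata":
                continue
            for field_name, attrs in nxdata.get("fields", {}).items():
                # Accept the attribute stored as integer 1 or string "1".
                if str(attrs.get("signal")) == "1":
                    return f"/{entry_name}/{data_name}/{field_name}"
    return None


example_file = {
    "groups": {
        "entry": {
            "NX_class": "NXentry",
            "groups": {
                "control": {"NX_class": "NXmonitor", "fields": {"data": {}}},
                "data": {
                    "NX_class": "NXdata",
                    "fields": {
                        "time_of_flight": {},
                        "polar_angle": {},
                        "data": {"signal": 1,
                                 "axes": "polar_angle:time_of_flight"},
                    },
                },
            },
        }
    }
}
default_path = find_default_plottable(example_file)
```

For the example structure, `default_path` points at the data field inside the NXdata group; the monitor group is skipped because its class is not NXdata.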