Global Specifications
=====================

File structure
--------------

The ESCDF file format is based on `HDF5 `__. An HDF5 file contains two main types of objects: groups and datasets (see the HDF5 documentation for further details). Groups are arranged hierarchically, much like directories in a file system. The root group of an HDF5 file is denoted as "/", while "/foo" refers to a group named "foo" that is stored in the root group.

Data written according to the ESCDF specifications should be stored within a single group, hereafter referred to as the ESCDF root group. Quite often this will be the root group of the HDF5 file itself, but it can be any other group within the file. A given HDF5 file may therefore contain more than one ESCDF root group, thus storing information about several independent systems or calculations.

The contents of the ESCDF root group are further divided into global variables and groups. The global variables give a general description of the file, mainly the file format convention, while the groups contain the actual data. The allowed group names are the following:

- **system**
- **basis\_sets**
- **densities**
- **potentials**
- **states**
- **extensions**

As their names suggest, the groups are used to store different types of data. The global variables and the groups are described in detail in the following sections.

The simplest use case consists of storing the information generated by a single calculation for a given system, but one might also want to store information about several calculations and/or systems in the same file. Here is how the file structure could look for different use cases:

- One system, one calculation:

  | ``/system``
  | ``/densities``
  | ``/basis_sets``
  | ``/basis_sets/foo``
  | ``/basis_sets/bar``

  In this case the file contains the description of one system and the data of one density, but two basis sets named **foo** and **bar**.
- Many systems, one calculation:

  | ``/system/foo``
  | ``/system/bar``
  | ``/densities``

  In this case the file contains the description of two systems named **foo** and **bar**, but only one density. This could correspond to a case where the total density of two systems is obtained from a single calculation.

- Many systems, one calculation per system:

  | ``/id1/system``
  | ``/id1/basis_sets``
  | ``/id1/densities``
  | ``/id2/system``
  | ``/id2/basis_sets``
  | ``/id2/densities``

  In this case the file contains two ESCDF root groups named **id1** and **id2**. Each of these groups contains the description of one system, one basis set, and the data of one density.

Global variables
----------------

The ESCDF root group must have the following attributes:

- **file\_format**
- **file\_format\_version**
- **Conventions**

The ESCDF root group may also have the following optional attributes:

- **history**
- **title**

Global attributes provide general information on the file format being used, as well as on the contents and history of the file.

- **file\_format**: char(80) (always ``ESCDF``)

  The name of the data standard.

- **file\_format\_version**: float (e.g., ``1.1``, ``1.2``, ``2.0``, etc.)

  Version number of the data standard.

- **Conventions**: char(80) (e.g., ``http://esl.cecam.org/``)

  Where the data standard specifications can be found on the Internet.

- **history**: char(1024)

  Each code modifying or writing the file is encouraged to add a line about itself to the history attribute. char(1024) allows for 12 additions of at most 80 characters each.

- **title**: char(80)

  A short description of the contents of the file (i.e., the physical system).

Global conventions
------------------

Flag-like variables
~~~~~~~~~~~~~~~~~~~

HDF5 does not support a boolean datatype. Flag-like variables should therefore be stored as char(3), with allowed values ``yes`` and ``no``. When such attributes are written, they should be written in full and in lowercase.
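The flag convention can be illustrated with a minimal Python sketch. The helpers ``encode_flag`` and ``decode_flag`` are hypothetical names, not part of any ESCDF library; they only convert between Python booleans and the stored strings and leave the actual HDF5 attribute I/O to whatever library is in use. On reading, only the first character is inspected, so a NUL-padded fixed-length value such as ``b"no\x00"`` is still accepted.

.. code-block:: python

    # Hypothetical helpers illustrating the ESCDF flag convention; the names
    # are illustrative and not taken from any ESCDF library.


    def encode_flag(value: bool) -> str:
        """Return the full-length, lowercase string to write for a flag."""
        return "yes" if value else "no"


    def decode_flag(raw) -> bool:
        """Interpret a stored flag by checking only its first character."""
        if isinstance(raw, bytes):
            # Attribute values may come back as raw bytes (e.g. fixed-length
            # strings padded with NUL bytes); decode before inspecting them.
            raw = raw.decode("ascii")
        first = raw[:1]
        if first == "y":
            return True
        if first == "n":
            return False
        raise ValueError(f"not an ESCDF flag value: {raw!r}")


    print(encode_flag(True), encode_flag(False))  # prints: yes no
    print(decode_flag(b"yes"), decode_flag(b"no\x00"))  # prints: True False

Writing is kept strict (always the full lowercase word), while reading tolerates any value that starts with the expected letter.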
When flag-like attributes are read, only the first character needs to be checked (i.e., ``y`` or ``n``).

Dimensional variables (physical units)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Most variables are dimensionless. If a variable does have physical dimensions, the default is to use Hartree atomic units. However, different units can be specified by attaching two optional attributes to the variable (if it is a dataset):

- **scale\_to\_atomic\_units**: double

  The value in atomic units is obtained by multiplying the number stored in the variable by this scaling factor. For example, if an energy variable is recorded in eV, **scale\_to\_atomic\_units** should be set to ``0.036749326``.

- **units**: char(80)

  The name or definition of the units being used. This attribute is for informative purposes only; readers should rely solely on **scale\_to\_atomic\_units** to interpret the stored values.

Links of interest
-----------------

The following links have been useful sources of inspiration for the development of the ESCDF specifications, API, and library:

- `NoMaD metainfo `__
- `HDF Group Mailing List `__
- `NeXus, a common data format for neutron, x-ray and muon science `__
- `Data Explorer, a multi-platform graphical browser for data files - HDF support under way `__
- `Single-pass NetCDF, from Elizabeth Fischer `__