Global Specifications

File structure

The ESCDF file format is based on HDF5. An HDF5 file contains mainly two types of objects: groups and datasets (see the HDF5 documentation for further details). The groups are arranged in a way that is similar to a file system. The root group of an HDF5 file is denoted as “/”, while “/foo” refers to a group named “foo” that is stored in the file root group.

Data that follows the ESCDF specifications should be stored within a single group, hereafter referred to as the ESCDF root group. Quite often this will be the actual root group of the HDF5 file, but it can be any other group within the file. A given HDF5 file may therefore contain more than one ESCDF root group, and so store information about several independent systems or calculations.

The contents of the ESCDF root group are further divided into global variables and groups. The global variables are used for a general description of the file, mainly the file format convention, while the groups contain the actual data. The names of the allowed groups are the following:

  • system
  • basis_sets
  • densities
  • potentials
  • states
  • extensions

As suggested by their names, the groups are used to store different types of data. A detailed description of the global variables and the groups can be found in the following sections.

The simplest use case consists of storing the information generated by a single calculation for a given system, but one might also want to store information about several calculations and/or systems in the same file. Here is what the file structure could look like for different use cases:

  • One system, one calculation:
/system
/densities
/basis_sets
/basis_sets/foo
/basis_sets/bar

In this case the file contains the description of one system and the data of one density, but two basis sets named foo and bar.

  • Many systems, one calculation:
/system/foo
/system/bar
/densities

In this case the file contains the description of two systems named foo and bar, but only one density. This could correspond to a case where the total density of two systems is obtained from a single calculation.

  • Many systems, one calculation per system:
/id1/system
/id1/basis_sets
/id1/densities
/id2/system
/id2/basis_sets
/id2/densities

In this case the file contains two ESCDF root groups named id1 and id2. Each one of these groups contains the description of one system, one basis set and the data of one density.
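The layouts above can be told apart mechanically. The following is a minimal sketch, not part of the specification, that takes a list of HDF5 group paths (however they were collected) and reports which ESCDF root groups they belong to. The function names escdf_root_of and escdf_roots are hypothetical, and the sketch assumes that no user-chosen identifier such as id1 collides with one of the allowed group names.

```python
# Group names allowed directly under an ESCDF root group (from the list above).
ALLOWED_GROUPS = {
    "system", "basis_sets", "densities",
    "potentials", "states", "extensions",
}


def escdf_root_of(path):
    """Return the ESCDF root group a path belongs to, or None.

    The root is taken to be everything that precedes the first path
    component matching an allowed group name (an assumption of this sketch).
    """
    parts = [p for p in path.split("/") if p]
    for i, name in enumerate(parts):
        if name in ALLOWED_GROUPS:
            return "/" + "/".join(parts[:i])
    return None


def escdf_roots(paths):
    """Return the sorted list of distinct ESCDF root groups found in paths."""
    roots = {escdf_root_of(p) for p in paths}
    roots.discard(None)  # paths that contain no ESCDF group at all
    return sorted(roots)


if __name__ == "__main__":
    single = ["/system", "/densities", "/basis_sets/foo", "/basis_sets/bar"]
    multi = ["/id1/system", "/id1/densities", "/id2/system", "/id2/densities"]
    print(escdf_roots(single))
    print(escdf_roots(multi))
```

With the first use case, all paths map to the file root group "/"; with the third, the two groups id1 and id2 are reported as separate ESCDF root groups.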

Global variables

The ESCDF root group must have the following attributes:

  • file_format
  • file_format_version
  • Conventions

The ESCDF root group may also have the following optional attributes:

  • history
  • title

Global attributes provide general information on the file format being used, as well as the contents and history of the file.

  • file_format: char(80) (always ESCDF) The name of the data standard.
  • file_format_version: float (e.g., 1.1, 1.2, 2.0, etc.) Version number for the data standard.
  • Conventions: char(80) (e.g., http://esl.cecam.org/) Where the data standard specifications can be found on the Internet.
  • history: char(1024) Each code modifying/writing the file is encouraged to add a line about itself in the history attribute. char(1024) allows for 12 additions of at most 80 characters.
  • title: char(80) A short description of the contents of the file (i.e., the physical system).
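As an illustration of the rules above, the following sketch checks a set of global attributes that has already been read into a plain Python dictionary. The function name check_global_attributes is hypothetical, and the sketch assumes that the char(N) limits can be compared against the length of a Python string; a real reader would also have to handle the padding and encoding used by its HDF5 library.

```python
REQUIRED_ATTRIBUTES = ("file_format", "file_format_version", "Conventions")

# Maximum lengths taken from the char(N) declarations above.
MAX_LENGTH = {
    "file_format": 80,
    "Conventions": 80,
    "history": 1024,
    "title": 80,
}


def check_global_attributes(attrs):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    # The three mandatory attributes must all be present.
    for name in REQUIRED_ATTRIBUTES:
        if name not in attrs:
            problems.append(f"missing required attribute: {name}")

    # file_format is always the literal string ESCDF.
    if "file_format" in attrs and attrs["file_format"] != "ESCDF":
        problems.append("file_format must be 'ESCDF'")

    # The version number is declared as a float.
    version = attrs.get("file_format_version")
    if version is not None and not isinstance(version, float):
        problems.append("file_format_version must be a float")

    # String attributes must fit in their declared length.
    for name, limit in MAX_LENGTH.items():
        value = attrs.get(name)
        if isinstance(value, str) and len(value) > limit:
            problems.append(f"{name} exceeds {limit} characters")

    return problems


if __name__ == "__main__":
    example = {
        "file_format": "ESCDF",
        "file_format_version": 1.1,
        "Conventions": "http://esl.cecam.org/",
    }
    print(check_global_attributes(example))
```

Returning a list of problems rather than raising on the first one lets a caller report everything that is wrong with a file in a single pass.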

Global conventions

Flag-like variables

HDF5 does not provide a native boolean datatype. Flag-like variables should therefore be stored as char(3), with allowed values yes and no. When such attributes are written, they should be written in full and in lowercase. When they are read, only the first character needs to be checked (i.e., y or n).
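The flag convention can be sketched as a pair of helper functions. The names encode_flag and decode_flag are hypothetical, and the sketch assumes the flag has already been decoded to a Python string; padding the value to the full char(3) width is left to whatever HDF5 library writes the attribute.

```python
def encode_flag(value):
    """Return the string to store for a boolean: 'yes' or 'no', in lowercase."""
    return "yes" if value else "no"


def decode_flag(text):
    """Interpret a stored flag by looking only at its first character."""
    first = text[:1]
    if first == "y":
        return True
    if first == "n":
        return False
    raise ValueError(f"not a valid flag value: {text!r}")


if __name__ == "__main__":
    print(encode_flag(True))
    print(decode_flag("no "))  # trailing padding does not matter
```

Because only the first character is inspected on reading, a value padded with spaces or null characters decodes the same way as the unpadded one.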

Dimensional variables (physical units)

Most variables are dimensionless. If a variable does have physical dimensions, the default is to use Hartree atomic units. However, different units can be specified by attaching two optional attributes to the variable (if it is a dataset):

  • scale_to_atomic_units: double The appropriate value in atomic units is obtained by multiplying the number found in the variable by this scaling factor. For example, if an energy variable is recorded in eV, scale_to_atomic_units should be set to 0.036749326.
  • units: char(80) The name or definition of the units being used. This attribute is only used for informative purposes; only scale_to_atomic_units should be used to read the file.
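The unit convention amounts to a single multiplication on reading. The following sketch assumes that the values and the attributes of a dataset have already been read into a Python list and a dictionary; the function name to_atomic_units is hypothetical.

```python
def to_atomic_units(values, attrs):
    """Convert stored values to Hartree atomic units.

    If scale_to_atomic_units is absent, the values are taken to be in
    atomic units already (the default). The units attribute is purely
    informative and is deliberately ignored here.
    """
    scale = attrs.get("scale_to_atomic_units", 1.0)
    return [v * scale for v in values]


if __name__ == "__main__":
    # An energy recorded in eV, using the scaling factor quoted above.
    energy_attrs = {"scale_to_atomic_units": 0.036749326, "units": "eV"}
    print(to_atomic_units([1.0, 2.0], energy_attrs))

    # No attributes: the data is already in atomic units.
    print(to_atomic_units([0.5], {}))
```

Ignoring the units string on purpose follows the rule that only scale_to_atomic_units should be used to read the file, so a misspelled or unconventional unit name cannot change the numerical result.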