Tutorial
In this tutorial we’ll see how to add HDF5 serialization to classes. Let’s start with defining a simple class:
In [1]: class Snek:
   ...:     def __init__(self, length):
   ...:         self.length = length
   ...:     def __repr__(self):
   ...:         return '≻:' + '=' * self.length + '>···'
   ...:
In [2]: Snek(10)
Out[2]: ≻:==========>···
To make this Snek HDF5 serializable, we need to answer these questions three:
- How is the Snek serialized to HDF5?
- How is the HDF5 converted back into a Snek?
- What is the unique tag identifying the Snek class?
To define how the Snek is serialized to HDF5, we add a to_hdf5 method. This method is passed an hdf5_handle, which is an h5py.File or h5py.Group defining the (current) root of the HDF5 file where the object should be added.
For deserialization, the from_hdf5 classmethod should be implemented. Again, this method is passed an hdf5_handle. It should return the deserialized object.
Finally, the subscribe_hdf5() class decorator is used to define a unique type_tag which identifies this class.
Note
The type_tag needs to be unique across all projects using fsc.hdf5_io. For this reason, you should always prefix it with the name of your module.
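To see why the tag has to be unique, it can help to picture a lookup table from tags to classes. The following is only a conceptual sketch of that idea, not the fsc.hdf5_io implementation: the names subscribe, _REGISTRY, dump, restore, to_handle and from_handle are hypothetical, and a plain dict stands in for the HDF5 group.

```python
# Conceptual sketch only, NOT how fsc.hdf5_io is implemented.
# All names below are hypothetical; a dict stands in for an HDF5 group.

_REGISTRY = {}  # maps type_tag -> class


def subscribe(type_tag):
    """Class decorator that registers a class under a unique tag."""
    def decorator(cls):
        if type_tag in _REGISTRY:
            # Two classes sharing a tag would make loading ambiguous.
            raise ValueError(f"type_tag {type_tag!r} is already taken")
        _REGISTRY[type_tag] = cls
        cls.type_tag = type_tag
        return cls
    return decorator


def dump(obj):
    """Serialize obj into a dict and record which class it came from."""
    handle = {"type_tag": type(obj).type_tag}
    obj.to_handle(handle)
    return handle


def restore(handle):
    """Look up the class by its tag and let it rebuild the object."""
    cls = _REGISTRY[handle["type_tag"]]
    return cls.from_handle(handle)


@subscribe("my_snek_module.snek")
class Snek:
    def __init__(self, length):
        self.length = length

    def to_handle(self, handle):
        handle["length"] = self.length

    @classmethod
    def from_handle(cls, handle):
        return cls(handle["length"])


clone = restore(dump(Snek(5)))
```

In this sketch the stored tag is the only link between the saved data and the class that can read it, which is why a collision between two projects would be a problem. With fsc.hdf5_io itself, the same three pieces look as follows: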
In [3]: from fsc.hdf5_io import subscribe_hdf5, HDF5Enabled
In [4]: @subscribe_hdf5('my_snek_module.snek')
   ...: class HDF5Snek(Snek, HDF5Enabled):
   ...:     def to_hdf5(self, hdf5_handle):
   ...:         hdf5_handle['length'] = self.length
   ...:     @classmethod
   ...:     def from_hdf5(cls, hdf5_handle):
   ...:         return cls(hdf5_handle['length'][()])
   ...:
In [5]: HDF5Snek(12)
Out[5]: ≻:============>···
Notice also that we inherit from HDF5Enabled. This abstract base class checks for the existence of the HDF5 (de-)serialization functions, and adds the methods to_hdf5_file and from_hdf5_file to save to and load from a file directly.
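One common way for a base class to check that required methods exist is Python's standard abc module. The sketch below illustrates that general mechanism under the assumption that a dict stands in for the HDF5 handle. It is not the HDF5Enabled source, and Serializable, to_handle, from_handle and to_new_handle are hypothetical names.

```python
# Conceptual sketch, not the fsc.hdf5_io source. All class and method
# names here are hypothetical; a dict stands in for an HDF5 handle.
import abc


class Serializable(abc.ABC):
    @abc.abstractmethod
    def to_handle(self, handle):
        """Write this object into handle."""

    @classmethod
    @abc.abstractmethod
    def from_handle(cls, handle):
        """Rebuild an object from handle."""

    def to_new_handle(self):
        """Convenience method built on top of to_handle."""
        handle = {}
        self.to_handle(handle)
        return handle


class Incomplete(Serializable):
    # from_handle is missing, so this class stays abstract.
    def to_handle(self, handle):
        handle["x"] = 1


class Complete(Serializable):
    def __init__(self, x):
        self.x = x

    def to_handle(self, handle):
        handle["x"] = self.x

    @classmethod
    def from_handle(cls, handle):
        return cls(handle["x"])


# Instantiating the incomplete class fails because a method is missing.
try:
    Incomplete()
    error = None
except TypeError as exc:
    error = exc

saved = Complete(3).to_new_handle()
```

The convenience method only relies on to_handle, so every complete subclass gets it for free. This is the same division of labour the text describes for to_hdf5_file and from_hdf5_file.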
Now we can use the save() and load() functions to save and load Sneks in HDF5 format:
In [6]: from fsc.hdf5_io import save, load
In [7]: from tempfile import NamedTemporaryFile
In [8]: mysnek = HDF5Snek(12)
In [9]: with NamedTemporaryFile() as f:
   ...:     save(mysnek, f.name)
   ...:     snek_clone = load(f.name)
   ...:
In [10]: snek_clone
Out[10]: ≻:============>···
You can also save and load lists or dictionaries containing Sneks:
In [11]: with NamedTemporaryFile() as f:
   ....:     save([HDF5Snek(2), HDF5Snek(4)], f.name)
   ....:     snek_2, snek_4 = load(f.name)
   ....:
In [12]: print(snek_2, snek_4)
≻:==>··· ≻:====>···
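Saving containers works naturally once every stored node carries a tag, because loading can then recurse through the structure. The sketch below shows that idea with nested dicts standing in for HDF5 groups. The tags sketch.list and sketch.dict and the overall layout are made up for illustration and do not describe the files fsc.hdf5_io actually writes.

```python
# Conceptual sketch: tags and layout are invented for illustration and
# do not describe the on-disk format used by fsc.hdf5_io.

class Snek:
    type_tag = "my_snek_module.snek"

    def __init__(self, length):
        self.length = length


def encode(obj):
    """Turn lists, dicts and Sneks into tagged, nested dicts."""
    if isinstance(obj, list):
        return {"type_tag": "sketch.list",
                "items": [encode(item) for item in obj]}
    if isinstance(obj, dict):
        return {"type_tag": "sketch.dict",
                "items": {key: encode(value) for key, value in obj.items()}}
    if isinstance(obj, Snek):
        return {"type_tag": Snek.type_tag, "length": obj.length}
    raise TypeError(f"cannot encode {type(obj).__name__}")


def decode(node):
    """Rebuild the original structure by dispatching on each node's tag."""
    tag = node["type_tag"]
    if tag == "sketch.list":
        return [decode(item) for item in node["items"]]
    if tag == "sketch.dict":
        return {key: decode(value) for key, value in node["items"].items()}
    if tag == Snek.type_tag:
        return Snek(node["length"])
    raise ValueError(f"unknown type_tag {tag!r}")


nested = {"small": [Snek(2), Snek(4)], "big": Snek(12)}
restored = decode(encode(nested))
```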
A common use case is to serialize all the attributes of an object. The base class SimpleHDF5Mapping exists for this case. A subclass needs to define a list HDF5_ATTRIBUTES of the attributes that should be serialized. The attribute names must be the same as the arguments accepted by the constructor.
We can rewrite the Snek as:
In [13]: from fsc.hdf5_io import SimpleHDF5Mapping
In [14]: @subscribe_hdf5('my_snek_module.simplified_snek')
   ....: class SimplifiedHDF5Snek(Snek, SimpleHDF5Mapping):
   ....:     HDF5_ATTRIBUTES = ['length']
   ....:
In [15]: new_snek = SimplifiedHDF5Snek(9)
In [16]: with NamedTemporaryFile() as f:
   ....:     save(new_snek, f.name)
   ....:     new_snek_clone = load(f.name)
   ....:
In [17]: new_snek_clone
Out[17]: ≻:=========>···
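The requirement that the names in HDF5_ATTRIBUTES match the constructor arguments is easier to see in a small sketch of an attribute-driven mapping. This is an illustration under assumptions, not the SimpleHDF5Mapping source: AttributeMapping, ATTRIBUTES, to_handle and from_handle are hypothetical names, and a dict stands in for the HDF5 handle.

```python
# Conceptual sketch, not the SimpleHDF5Mapping source. All names are
# hypothetical; a dict stands in for an HDF5 handle.

class AttributeMapping:
    ATTRIBUTES = []  # names of the attributes to serialize

    def to_handle(self, handle):
        # Each listed name is read from the instance as an attribute...
        for name in self.ATTRIBUTES:
            handle[name] = getattr(self, name)

    @classmethod
    def from_handle(cls, handle):
        # ...and passed back to the constructor as a keyword argument,
        # so attribute names and constructor arguments have to agree.
        kwargs = {name: handle[name] for name in cls.ATTRIBUTES}
        return cls(**kwargs)


class Snek(AttributeMapping):
    ATTRIBUTES = ["length"]

    def __init__(self, length):
        self.length = length


handle = {}
Snek(9).to_handle(handle)
clone = Snek.from_handle(handle)
```

If the constructor argument were called size while the attribute stayed length, the cls(**kwargs) call in this sketch would fail with a TypeError.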
We can extend the Snek functionality by adding a list of friends:
In [18]: @subscribe_hdf5('my_snek_module.snek_with_friends')
   ....: class SnekWithFriends(SimplifiedHDF5Snek):
   ....:     HDF5_ATTRIBUTES = SimplifiedHDF5Snek.HDF5_ATTRIBUTES + ['friends']
   ....:     def __init__(self, length, friends):
   ....:         super().__init__(length)
   ....:         self.friends = friends
   ....:
In [19]: snek_with_friends = SnekWithFriends(3, friends=[mysnek, new_snek])
In [20]: snek_with_friends
Out[20]: ≻:===>···
In [21]: snek_with_friends.friends