geners is hosted by Hepforge, IPPP Durham

Geners — Generic Serialization for C++

The "geners" package is designed to address the problem of C++ object persistence in situations where the most typical data access pattern is "write once read many" (WORM). This access pattern is very common in scientific projects — a data recording device or a simulation program creates the original set of objects which is later reused (typically, for the purposes of data analysis and presentation of results) by other programs. "Geners" is, more or less, a set of tools and conventions which allows its users to develop C++ classes that can be converted to and from a storable stream of bytes in a well-organized and type-safe manner. Serialization of STL containers is supported, including the ones added in the C++11 standard. Independent versioning of each class definition is allowed. "Geners" code depends only on the standard C++ facilities and on two well-established portable data compression libraries, zlib and bzip2. These libraries are included by default in many Linux distributions.
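The idea of converting a class to and from a stream of bytes, with a version number attached to each class definition, can be sketched with standard C++ streams alone. The example below is an illustration under stated assumptions, not the "geners" API: the class `Point`, its methods `version`, `write` and `read`, and the helper `roundTrip` are all hypothetical names. The raw-byte layout it uses is specific to the platform that wrote it.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>

// Hypothetical class, for illustration only; this is not the geners API.
class Point
{
public:
    Point(double x, double y, double z) : x_(x), y_(y), z_(z) {}

    double x() const { return x_; }
    double y() const { return y_; }
    double z() const { return z_; }

    // Version of the current class layout (version 1 had no z coordinate)
    static std::uint32_t version() { return 2; }

    // Write the version tag, then the raw bytes of the data members
    bool write(std::ostream& os) const
    {
        const std::uint32_t v = version();
        os.write(reinterpret_cast<const char*>(&v), sizeof v);
        os.write(reinterpret_cast<const char*>(&x_), sizeof x_);
        os.write(reinterpret_cast<const char*>(&y_), sizeof y_);
        os.write(reinterpret_cast<const char*>(&z_), sizeof z_);
        return !os.fail();
    }

    // Rebuild an object from the stream, accepting the older layout as well
    static Point read(std::istream& is)
    {
        std::uint32_t v = 0;
        is.read(reinterpret_cast<char*>(&v), sizeof v);
        if (!is || v < 1 || v > version())
            throw std::runtime_error("Point::read: unsupported version");
        double x = 0.0, y = 0.0, z = 0.0;
        is.read(reinterpret_cast<char*>(&x), sizeof x);
        is.read(reinterpret_cast<char*>(&y), sizeof y);
        if (v >= 2)
            is.read(reinterpret_cast<char*>(&z), sizeof z);
        if (!is)
            throw std::runtime_error("Point::read: truncated input");
        return Point(x, y, z);
    }

private:
    double x_, y_, z_;
};

// Round trip through an in-memory binary stream
inline Point roundTrip(const Point& p)
{
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    const bool ok = p.write(ss);
    assert(ok);
    (void)ok;
    return Point::read(ss);
}
```

Because the version tag is stored with the data, a newer `read` can still restore objects written by an older class definition; here a version 1 record simply yields `z == 0`.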

The functionality of "geners" is somewhat similar to that provided by the boost serialization package. Compared to boost serialization, "geners" has a number of important differences (both advantages and drawbacks). Some of the advantages are listed below:
  • "Geners" archives provide random access to stored objects.
  • The archives can be searched using object labels assigned at the time the objects are stored. The object metadata can be examined without accessing the objects themselves.
  • File-based "geners" archives have built-in support for data compression (combining compression with serialization is somewhat cumbersome with boost). Random access and compression can be used together.
  • With proper wrappers for stored classes (e.g., generated by SWIG), "geners" archives can be easily accessed from scripting languages.
  • "Geners" can be used to create and serialize very large "archive-based" objects (e.g., collections of tuples) which do not fit in the computer memory.
  • Serializable classes do not have to be implemented using template-based code, so compilation times can be shorter.
  • "Geners" is a small package with very limited external dependencies.
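The first two advantages, random access and label-based lookup, both rest on keeping a catalog that maps each label to the position of the item in the stream. The toy class below is a minimal sketch of that idea built on `std::stringstream`. It is not how "geners" archives are implemented, and `ToyArchive`, `store`, `contains` and `fetch` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical toy archive, for illustration only; this is not the geners API.
class ToyArchive
{
public:
    ToyArchive()
        : data_(std::ios::in | std::ios::out | std::ios::binary), size_(0) {}

    // Append a labeled item and record the offset at which its bytes start
    void store(const std::string& label, const std::string& payload)
    {
        const std::uint64_t len = payload.size();
        data_.write(reinterpret_cast<const char*>(&len), sizeof len);
        data_.write(payload.data(), static_cast<std::streamsize>(len));
        assert(data_.good());
        catalog_[label] = size_;
        size_ += static_cast<std::streamoff>(sizeof len + len);
    }

    // Metadata query: answered from the catalog, no payload is read
    bool contains(const std::string& label) const
    {
        return catalog_.count(label) != 0;
    }

    // Random access: seek straight to the item, skipping everything else
    std::string fetch(const std::string& label)
    {
        const auto it = catalog_.find(label);
        if (it == catalog_.end())
            throw std::out_of_range("no such label: " + label);
        data_.clear();
        data_.seekg(it->second, std::ios::beg);
        std::uint64_t len = 0;
        data_.read(reinterpret_cast<char*>(&len), sizeof len);
        std::string payload(static_cast<std::size_t>(len), '\0');
        if (len > 0)
            data_.read(&payload[0], static_cast<std::streamsize>(len));
        if (!data_)
            throw std::runtime_error("corrupted toy archive");
        return payload;
    }

private:
    std::stringstream data_;                        // stored bytes
    std::streamoff size_;                           // bytes written so far
    std::map<std::string, std::streamoff> catalog_; // label -> start offset
};
```

Fetching an item costs one seek plus one read of that item, regardless of how many other items were stored before it.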
Compared to boost serialization, "geners" drawbacks are:
  • Only binary archives are currently implemented; there is no text or XML storage. As in boost, the binary format is not portable across different platforms (archives created on 32-bit machines cannot be read on 64-bit machines, little-endian architectures are not compatible with big-endian ones, etc.). It would not be difficult to maintain binary portability, but at this time the package author simply does not have access to a sufficiently wide selection of hardware.
  • Serialization and deserialization are handled by separate methods of a class (in boost, these tasks can often be accomplished by a single template). For simple classes, this usually results in higher line counts of serialization-specific code than what is necessary for boost.
  • Implementing non-intrusive serialization is less transparent.
  • There is no explicit support for solving the pointer aliasing problem — pointer management is left completely up to the user. In "geners", every pointer serialization results in deep save and restore. Multiple saves of the same pointer result in multiple copies of the pointee object in the archive. Serialization of pointers to pointers is not supported and cyclic pointer arrangements are not detected.
It should be pointed out that, in scientific applications, storing multiple copies of an object is often the right thing to do — if the goal is to capture object evolution in time. The "no stored pointers" policy simplifies construction of archive APIs for scripting languages which do not have built-in support for pointers, such as Python or Tcl. For scientific applications that have to process significant amounts of data, "geners" advantages over boost serialization (in particular, random access to stored objects) easily outweigh the drawbacks. "Geners" is more tightly coupled to the C++ stream facilities than boost serialization, and the random access functionality does explicitly rely on the C++ streams interface.
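The deep save and restore policy for pointers, and its use for capturing the evolution of an object in time, can be illustrated with a few lines of standard C++. This is a sketch of the policy only, not "geners" code: `savePointee`, `loadPointee`, `Snapshots` and `saveSamePointerTwice` are hypothetical names, and a plain `int` stands in for a real object.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <stdexcept>

// Hypothetical helpers, for illustration only; this is not the geners API.

// Deep save: the pointee is written by value, the address is never stored
inline void savePointee(std::ostream& os, const int* p)
{
    os.write(reinterpret_cast<const char*>(p), sizeof *p);
}

// Deep restore: every call allocates a fresh, independent object
inline std::unique_ptr<int> loadPointee(std::istream& is)
{
    int value = 0;
    is.read(reinterpret_cast<char*>(&value), sizeof value);
    if (!is)
        throw std::runtime_error("loadPointee: truncated input");
    return std::unique_ptr<int>(new int(value));
}

struct Snapshots
{
    std::size_t bytesWritten;
    std::unique_ptr<int> first;
    std::unique_ptr<int> second;
};

// Save the same pointer twice, changing the pointee in between
inline Snapshots saveSamePointerTwice()
{
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    int counter = 10;
    const int* alias = &counter;

    savePointee(ss, alias);   // snapshot of the value 10
    counter = 20;
    savePointee(ss, alias);   // same pointer, second full copy (value 20)

    Snapshots s;
    s.bytesWritten = ss.str().size();
    s.first = loadPointee(ss);
    s.second = loadPointee(ss);
    assert(s.first && s.second);
    return s;
}
```

The stream ends up holding two copies of the pointee, and the two restored objects are unrelated to each other: the first keeps the earlier value and the second the later one.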

If you habitually use ROOT for object I/O, you will be pleasantly surprised that "geners" serialization does not rely on inheritance (nothing like TObject is needed), so adding serialization capabilities to your high-performance classes does not have to cripple them by imposing method dispatch via a virtual function table. A default constructor is not required either.
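One common way to satisfy both properties is to restore objects through a static function that reads the data first and then calls a regular constructor. The sketch below assumes this approach and makes no claim about the actual "geners" method signatures; the class `Interval` and the helper `copyViaStream` are hypothetical. The `static_assert` lines check at compile time that the class has neither a virtual function table nor a default constructor.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <type_traits>

// Hypothetical class, for illustration only; this is not the geners API.
class Interval
{
public:
    // The only constructor enforces the class invariant lo <= hi
    Interval(double lo, double hi) : lo_(lo), hi_(hi)
    {
        if (lo > hi)
            throw std::invalid_argument("Interval: lo > hi");
    }

    double length() const { return hi_ - lo_; }

    // Non-virtual write: no base class is involved
    bool write(std::ostream& os) const
    {
        os.write(reinterpret_cast<const char*>(&lo_), sizeof lo_);
        os.write(reinterpret_cast<const char*>(&hi_), sizeof hi_);
        return !os.fail();
    }

    // Static read: gather the data first, then use the normal constructor
    static Interval read(std::istream& is)
    {
        double lo = 0.0, hi = 0.0;
        is.read(reinterpret_cast<char*>(&lo), sizeof lo);
        is.read(reinterpret_cast<char*>(&hi), sizeof hi);
        if (!is)
            throw std::runtime_error("Interval::read: truncated input");
        return Interval(lo, hi);
    }

private:
    double lo_, hi_;
};

static_assert(!std::is_polymorphic<Interval>::value,
              "no virtual function table is needed");
static_assert(!std::is_default_constructible<Interval>::value,
              "no default constructor is needed");

// Copy an object by writing it to a stream and reading it back
inline Interval copyViaStream(const Interval& in)
{
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    const bool ok = in.write(ss);
    assert(ok);
    (void)ok;
    return Interval::read(ss);
}
```

Since `read` goes through the ordinary constructor, a restored object is subject to the same invariant checks as one created directly.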

The easiest way to learn how to use "geners" is to take a look at the programs included in the examples directory of the "geners" distribution. Read the example code (it is reasonably well commented), run it, modify it and see what happens. To give you a glimpse of the "geners" API, a few code snippets are linked below:

hello_geners.cc — illustrates very basic usage of the archive I/O
SimpleSerializable1.hh, SimpleSerializable1.cc — example serializable class

The package code distribution comes with a number of command line tools which permit examination of "geners" archive catalogs and, with some limitations, of the archived objects.

Download Geners

Contact: Igor Volobouev