Geners — Generic Serialization for C++
The "geners" package is designed to address the problem of C++ object
persistence in situations where the most typical data access pattern is
"write once, read many" (WORM). This access pattern is very common in
scientific projects — a data recording device or a simulation program
creates the original set of objects which is later reused (typically,
for the purposes of data analysis and presentation of results) by other
programs.
"Geners" is, more or less, a set of tools and conventions which allows
its users to develop C++ classes that can be converted to and from a
storable stream of bytes in a well-organized and type-safe manner.
Serialization of STL containers is supported, including the ones added
in the C++11 standard. Independent versioning of each class
definition is allowed. "Geners" code depends only on the standard C++
facilities and on two well-established portable data compression
libraries, zlib and bzip2. These libraries are included by default in many Linux distributions.
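To make the idea of per-class versioning more concrete, here is a minimal sketch in plain standard C++. It does not use the "geners" API: the Particle class, its version() function, the write_v1_record helper, and the byte layout are all assumptions invented for illustration. Each record starts with the version number of the class definition that wrote it, so a newer reader can still decode older records.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>

// Hypothetical class. The record layout is invented for this sketch and is
// not the "geners" archive format.
struct Particle {
    double mass;
    std::int32_t charge;  // member added in version 2 of the class

    // Version of the current class definition.
    static std::uint32_t version() { return 2; }

    // Every record starts with the version of the class that wrote it.
    void write(std::ostream& os) const {
        const std::uint32_t v = version();
        os.write(reinterpret_cast<const char*>(&v), sizeof v);
        os.write(reinterpret_cast<const char*>(&mass), sizeof mass);
        os.write(reinterpret_cast<const char*>(&charge), sizeof charge);
    }

    // The reader dispatches on the stored version, so old records stay usable.
    static Particle read(std::istream& is) {
        std::uint32_t v = 0;
        is.read(reinterpret_cast<char*>(&v), sizeof v);
        if (!is || (v != 1 && v != 2))
            throw std::runtime_error("unreadable Particle record");
        Particle p{0.0, 0};  // version 1 had no charge: default it to 0
        is.read(reinterpret_cast<char*>(&p.mass), sizeof p.mass);
        if (v == 2)
            is.read(reinterpret_cast<char*>(&p.charge), sizeof p.charge);
        if (!is)
            throw std::runtime_error("truncated Particle record");
        return p;
    }
};

// Emulates a record written by the older, version 1 class definition.
inline void write_v1_record(std::ostream& os, double mass) {
    const std::uint32_t v = 1;
    os.write(reinterpret_cast<const char*>(&v), sizeof v);
    os.write(reinterpret_cast<const char*>(&mass), sizeof mass);
}

// Reads one old and one new record back from the same stream.
inline void versioning_demo() {
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    write_v1_record(ss, 0.511);
    Particle{0.938, 1}.write(ss);
    const Particle old_rec = Particle::read(ss);
    const Particle new_rec = Particle::read(ss);
    assert(old_rec.mass == 0.511 && old_rec.charge == 0);
    assert(new_rec.mass == 0.938 && new_rec.charge == 1);
}
```

Because the version tag travels with each record, changing one class does not force any other class to change its stored layout. The real package has its own conventions for this, which are best learned from the distributed examples.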
The functionality of "geners" is somewhat similar to that provided by the boost serialization
package. Compared to boost serialization, "geners" has a number of
important differences (both advantages and drawbacks). Some of the
advantages are listed below:
- "Geners" archives provide random access to stored objects.
- The archives can be searched using object labels assigned at the
time the objects are stored. The object metadata can be examined
without accessing the objects themselves.
- File-based "geners" archives have built-in support for data
compression (combining compression with serialization is somewhat
cumbersome with boost). Random access and compression can be used
together.
- With proper wrappers for stored classes (e.g., generated by SWIG), "geners" archives can be easily accessed from scripting languages.
- "Geners" can be used to create and serialize very large
"archive-based" objects (e.g., collections of tuples) which do not fit
in the computer memory.
- Serializable classes do not have to be implemented using template-based code, so compilation times can be shorter.
- "Geners" is a small package with very limited external dependencies.
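The first two advantages (random access and label-based lookup) can be pictured with a toy model written in plain standard C++. This is only a sketch under stated assumptions: ToyArchive and its store, contains, and fetch methods are hypothetical names, and the layout has nothing to do with the actual "geners" archive format. The point is merely that a catalog mapping labels to stream offsets lets a reader query metadata without touching object bytes, and then seek directly to one record.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Toy archive, NOT the "geners" format or API. All names are hypothetical.
class ToyArchive {
public:
    // Append a value under a label and remember where its bytes start.
    // A repeated label simply points at the newest record.
    void store(const std::string& label, double value) {
        catalog_[label] = data_.tellp();
        data_.write(reinterpret_cast<const char*>(&value), sizeof value);
    }

    // Metadata query: consults the catalog only, never the object bytes.
    bool contains(const std::string& label) const {
        return catalog_.count(label) != 0;
    }

    // Random access: seek straight to the labeled record and read it.
    double fetch(const std::string& label) {
        const auto it = catalog_.find(label);
        if (it == catalog_.end())
            throw std::out_of_range("no such label: " + label);
        data_.clear();
        data_.seekg(it->second);
        double value = 0.0;
        data_.read(reinterpret_cast<char*>(&value), sizeof value);
        if (!data_)
            throw std::runtime_error("read failed for label: " + label);
        return value;
    }

private:
    std::stringstream data_{std::ios::in | std::ios::out | std::ios::binary};
    std::map<std::string, std::streampos> catalog_;
};

// Stores three values, then fetches them out of order.
inline void toy_archive_demo() {
    ToyArchive ar;
    ar.store("temperature", 21.5);
    ar.store("pressure", 101.3);
    ar.store("humidity", 0.4);
    assert(ar.contains("pressure") && !ar.contains("missing"));
    assert(ar.fetch("humidity") == 0.4);
    assert(ar.fetch("temperature") == 21.5);
}
```

The sketch uses an in-memory string stream for brevity; the same seek-by-offset idea carries over to a file stream.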
Compared to boost serialization, "geners" drawbacks are:
- Only binary archives are currently implemented; there is no text
or XML storage. As in boost, the binary format is not portable
across different platforms (archives created on 32-bit machines cannot
be read on 64-bit machines, little-endian architectures are not
compatible with big-endian ones, etc.). It would not be difficult to
maintain binary portability, but at this time the package author simply
does not have access to a sufficiently wide selection of hardware.
- Serialization and deserialization are handled by separate methods
of a class (in boost, these tasks can often be accomplished by a single template). For simple classes, this
usually results in higher line counts of serialization-specific code
than what is necessary for boost.
- Implementing non-intrusive serialization is less transparent.
- There is no explicit support for solving the pointer aliasing problem:
pointer management is left completely up to the user. In "geners", every
pointer serialization results in a deep save and restore. Multiple saves
of the same pointer result in multiple copies of the pointee object in
the archive. Serialization of pointers to pointers is not supported, and
cyclic pointer arrangements are not detected.
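The split between writing and reading, and the deep-save treatment of pointers, can be sketched in plain standard C++ as follows. Nothing here is the "geners" API: the Point class, its write and read functions, and the save_pointee helper are hypothetical names chosen for illustration. Saving the same pointer twice writes the pointee twice, and reading back yields two independent objects.

```cpp
#include <cassert>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>

// Hypothetical class, not part of "geners". Writing and reading are two
// separate functions rather than one shared template.
struct Point {
    double x;
    double y;

    // Serialize: dump the members to a binary stream.
    bool write(std::ostream& os) const {
        os.write(reinterpret_cast<const char*>(&x), sizeof x);
        os.write(reinterpret_cast<const char*>(&y), sizeof y);
        return !os.fail();
    }

    // Deserialize: build a brand-new heap object from the stream.
    static std::unique_ptr<Point> read(std::istream& is) {
        double x = 0.0, y = 0.0;
        is.read(reinterpret_cast<char*>(&x), sizeof x);
        is.read(reinterpret_cast<char*>(&y), sizeof y);
        if (is.fail()) return nullptr;
        return std::unique_ptr<Point>(new Point{x, y});
    }
};

// Deep save: the pointee is written out every time the pointer is saved.
// No address is recorded, so aliasing is not tracked.
inline bool save_pointee(std::ostream& os, const Point* p) {
    return p != nullptr && p->write(os);
}

// Saves one pointer twice, mutating the pointee in between.
inline void deep_save_demo() {
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    Point original{1.0, 2.0};
    const Point* p = &original;
    assert(save_pointee(ss, p));
    original.x = 5.0;  // the object evolves between the two saves
    assert(save_pointee(ss, p));

    const std::unique_ptr<Point> first = Point::read(ss);
    const std::unique_ptr<Point> second = Point::read(ss);
    assert(first && second && first.get() != second.get());
    assert(first->x == 1.0 && second->x == 5.0);  // two snapshots
}
```

In this sketch the two stored copies act as snapshots of the object at the moments it was saved.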
It should be pointed out that, in scientific applications, storing
multiple copies of an object is often the right thing to do — if the
goal is to capture object evolution in time. The "no stored pointers"
policy simplifies construction of archive APIs for scripting languages
which do not have built-in support for pointers, such as Python or Tcl. For
scientific applications that have to process significant amounts of
data, "geners" advantages over boost serialization (in particular,
random access to stored objects) easily outweigh the drawbacks. "Geners"
is more tightly coupled to the C++ stream facilities than boost
serialization, and the random access functionality relies explicitly on
the C++ streams interface.
If you are a habitual ROOT user for
object I/O, you will be pleasantly surprised by the fact that "geners"
serialization does not rely on inheritance (nothing like TObject
is needed), so that addition of serialization capabilities to your high
performance classes does not have to cripple them by imposing method
dispatch via a virtual function table. Default constructor presence is
not relied upon either.
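The following sketch shows, in plain standard C++, what a class of this kind can look like: no base class, no virtual functions, and no default constructor. It is an illustration under stated assumptions rather than the "geners" API; the Energy class and its write and read functions are hypothetical. The static_assert lines state the properties explicitly, so the code fails to compile if they do not hold.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <type_traits>

// Hypothetical class, not taken from "geners" or ROOT.
class Energy {
public:
    explicit Energy(double gev) : gev_(gev) {}  // the only constructor
    double gev() const { return gev_; }

    // Plain member function: no virtual dispatch involved.
    bool write(std::ostream& os) const {
        os.write(reinterpret_cast<const char*>(&gev_), sizeof gev_);
        return !os.fail();
    }

    // Static factory: reads the data first and only then constructs the
    // object, so no default constructor is required.
    static Energy read(std::istream& is) {
        double gev = 0.0;
        is.read(reinterpret_cast<char*>(&gev), sizeof gev);
        if (!is) throw std::runtime_error("cannot read Energy");
        return Energy(gev);
    }

private:
    double gev_;
};

static_assert(!std::is_polymorphic<Energy>::value,
              "no virtual function table");
static_assert(!std::is_default_constructible<Energy>::value,
              "no default constructor");
static_assert(sizeof(Energy) == sizeof(double),
              "no hidden per-object overhead");

// Round trip through an in-memory binary stream.
inline void energy_demo() {
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    assert(Energy(13.6).write(ss));
    const Energy restored = Energy::read(ss);
    assert(restored.gev() == 13.6);
}
```

Reading through a static function is what removes the need for a default constructor: the object is created only after its data has been extracted from the stream.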
The easiest way to learn how to use "geners" is to take a look at the
programs included in the examples directory of the "geners"
distribution. Read the example code (it is reasonably well commented),
run it, modify it and see what happens. To give you a glimpse of the
"geners" API, a few code snippets are linked below:
hello_geners.cc — illustrates very basic usage of the archive I/O
SimpleSerializable1.hh, SimpleSerializable1.cc — example serializable class
The package code distribution comes with a number of command line tools which permit examination of "geners" archive catalogs and, with some limitations, of the archived objects.
Download Geners
Contact: Igor Volobouev