Brief thoughts on metadata
Suppose we have a data object and want to manage metadata about it. There are several ways to handle this.
1. Encapsulated Pointer
class MetaObject { meta_info *md_data; Data *data; };
The meta-object contains information about the data and a pointer to the object that it describes.
Advantages:
- Works everywhere.
- Intuitive.
- Meta-objects could be implemented as a template class in languages that support generics.
Disadvantages:
- Every access to the data goes through a pointer dereference, which may be slower than approaches that store the data inline.
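As a rough illustration of the template idea, the wrapper might look like the sketch below. The fields of meta_info, the stand-in Data type, and the accessor names are assumptions made for the example. Unlike the declaration above, the metadata is stored by value here to keep ownership simple.

```cpp
#include <string>
#include <utility>

// Hypothetical metadata record; the fields are illustrative only.
struct meta_info {
    std::string source;
    long created_at;
};

// Stand-in for the Data type described in the text.
struct Data {
    int value;
};

// Generic meta-object: metadata plus a non-owning pointer to the
// object it describes. Works for any T, not just Data.
template <typename T>
class MetaObject {
public:
    MetaObject(meta_info md, T* data)
        : md_data_(std::move(md)), data_(data) {}

    const meta_info& meta() const { return md_data_; }

    // Every access to the data costs one pointer dereference.
    T& data() const { return *data_; }

private:
    meta_info md_data_;
    T* data_;  // not owned; the caller must keep the object alive
};
```

The original Data object is untouched, so this works even when the class cannot be modified. The cost is that the caller has to keep the pointed-to object alive for as long as the meta-object is in use.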
2. Extending the class
class Data { ... /* and also */ meta_info *md_data; };
Advantages:
- Intuitive.
- Maintaining fewer classes keeps the code base simpler.
Disadvantages:
- Requires the ability to modify the original class.
- Using one object for two purposes is bad style.
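A minimal sketch of this approach, assuming a Data class with a single integer payload and a one-field meta_info (both invented for the example). The metadata pointer is held in a std::unique_ptr so that objects without metadata pay only for a null pointer.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical metadata record; the field is illustrative only.
struct meta_info {
    std::string source;
};

class Data {
public:
    explicit Data(int value) : value_(value) {}

    int value() const { return value_; }

    // Metadata support added directly to the original class.
    void set_meta(meta_info md) {
        md_data_ = std::make_unique<meta_info>(std::move(md));
    }

    // Returns nullptr when no metadata has been attached.
    const meta_info* meta() const { return md_data_.get(); }

private:
    int value_;
    std::unique_ptr<meta_info> md_data_;
};
```

Note that the class now carries two responsibilities, and that the unique_ptr member makes Data move-only unless a copy constructor is written by hand.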
3. Serialization
struct SerializedData { struct meta_info md_data; Data data; /* not a pointer */ };
The metadata and data are merged together in one structure in memory.
Advantages:
- Possible speed advantage from placing the data and metadata adjacent in memory, due to memory caching and fewer reads.
Disadvantages:
- Using one object for two purposes is bad style.
- The metadata must have a defined length for this to work as intended.
- If the data is already in memory, you have to use more memory to create the SerializedData set.
- Any code designed to run on a set of Data objects must be rewritten to run on a set of SerializedData objects, assuming that the language does not allow you to define an iterator.
Serialization is best used when the data is written or read directly to/from a separate medium such as a hard drive or a network, and when it is known that no intermediary step will expect to work on a set of unserialized data.
The popular convention seems to be to use some other method to manage metadata in memory and to include a serialization function for writing to disk or across a network.
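That convention could be sketched roughly as follows: the data and metadata live in memory however the program likes, and a pair of functions packs them into the adjacent layout only when bytes are needed. The field layouts and the serialize/deserialize names are assumptions for illustration. Copying raw struct bytes like this is only safe between programs that share the same compiler, padding, and byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical fixed-length metadata; the fields are illustrative.
struct meta_info {
    std::uint32_t version;
    std::uint32_t flags;
};

// Stand-in for the Data type described in the text.
struct Data {
    std::int32_t x;
    std::int32_t y;
};

// Metadata and data stored side by side, with no pointers.
struct SerializedData {
    meta_info md_data;
    Data data;
};

// Byte-wise copying is only valid for trivially copyable types.
static_assert(std::is_trivially_copyable<SerializedData>::value,
              "SerializedData must be trivially copyable");

// Pack separately managed metadata and data into one byte buffer.
std::vector<unsigned char> serialize(const meta_info& md, const Data& d) {
    SerializedData packed{md, d};
    std::vector<unsigned char> bytes(sizeof packed);
    std::memcpy(bytes.data(), &packed, sizeof packed);
    return bytes;
}

// Unpack a buffer; returns false if the length does not match.
bool deserialize(const std::vector<unsigned char>& bytes,
                 SerializedData& out) {
    if (bytes.size() != sizeof out) return false;
    std::memcpy(&out, bytes.data(), sizeof out);
    return true;
}
```

The length check in deserialize is where the fixed-length requirement on the metadata shows up: a variable-length meta_info would need an explicit length prefix instead.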
4. Shared key
#define DATA_LEN 1234
Data data[DATA_LEN];       /* an indexed set of data objects */
meta_info meta[DATA_LEN];  /* an indexed set of metadata objects */
An old technique from the '80s that has been abandoned for a reason. May be useful when it is known that the set of data will not change.
Advantages:
- Can be used if your library needs to iterate over an array of the basic Data object.
- Possible speed advantage in using integer keys on static buffers.
Disadvantages:
- Increased code complexity.
- Any change to the order of one list will not be reflected in the other list, increasing the likelihood of bugs.
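One way to limit the ordering problem is to route every reordering through a helper that touches both arrays. The sketch below assumes a hypothetical PairedStore wrapper and a small DATA_LEN; neither comes from the original technique, which uses two bare arrays.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Kept small for illustration.
constexpr std::size_t DATA_LEN = 4;

// Stand-ins for the types described in the text.
struct Data { int value; };
struct meta_info { int tag; };

// Hypothetical wrapper around the two parallel arrays.
struct PairedStore {
    std::array<Data, DATA_LEN> data{};
    std::array<meta_info, DATA_LEN> meta{};

    // Swap both arrays together so that index i keeps describing
    // the same object in each of them.
    void swap_entries(std::size_t i, std::size_t j) {
        std::swap(data[i], data[j]);
        std::swap(meta[i], meta[j]);
    }
};

// Library-style function that only understands a plain array of Data.
int sum_values(const Data* items, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += items[i].value;
    return total;
}
```

The plain Data array can still be handed to existing code through data.data(), which is the main reason to tolerate this layout at all. Anything that reorders data directly, bypassing swap_entries, silently breaks the pairing.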