Support for Versioning

The use-case for support for versioning that Typedpy addresses is deserializing different versions of the dictionary representing the structure, to match the latest Structure definition.

For example, suppose you saved a JSON object that adheres to some strict schema, and occasionally that schema had to evolve to include new fields, or update existing ones. When you read it again, you always want to map to the latest schema, which is what you are currently using. Importantly, this is not meant to be used to manage evolving schema changes in a performant production system. In such a case, use a proper schema management framework, such as Alembic.

The requirement: Typedpy expects the dictionary to have a “version” field, that has increasing consecutive integer values, starting from 1. Next, you need to provide a list of mappers. Each mapper defines how to convert from the corresponding version to the next. These mappers are, in general, similar to the regular “serialization mappers” of Typedpy, but they add 2 new features:

  1. Constant(val) - the value is a constant. This is useful when you introduced a new field in a more recent version and you want to define that when converting older versions, we need to populate it with a default value.

  2. Deleted - The newer version no longer has this field, and it should be dropped

Typedpy has two types of API’s to handle it.

Low-Level API

Example of usage:

versions_mapping = [
        # Mapping from version 1 to 2
    {
        "j": Constant(100),
        "old_bar._mapper": {
            "a": FunctionCall(func=lambda x: [i * 2 for i in x], args=["a"]),
        },
        "old_m": Constant({"abc": "xyz"})
    },

        # mapping from version 2 to 3
    {
        "old_bar._mapper": {
            "s": "sss",
            "sss": Deleted
        },
        "bar": "old_bar",
        "m": "old_m",
        "old_m": Deleted,
        "old_bar": Deleted,
    },

        # mapping from version 3 to 4
    {
        "i": FunctionCall(func=lambda x: x * 100, args=["i"])
    }
]

in_version_1 = {
    "version": 1,
    "old_bar": {
        "a": [5, 8, 2],
        "sss": "john",
    },
    "i": 2,
    "old_m": {"a": "aa", "b": "bb"}
}

assert convert_dict(in_version_1, versions_mapping) == {
    "version": 4,
    "bar": {
        "a": [10, 16, 4],
        "s": "john",
    },
    "i": 200,
    "j": 100,
    "m": {"abc": "xyz"},
}

Explanation: The function convert_dict will apply all the applicable mappings based on the version of the input. In this case, it needs to replay all of them, since the raw input is at version 1.

To further illustrate, let’s follow the steps.

After the first mapper (i.e. version 2), the input looks like:

{'version': 2, 'old_bar': {'a': [10, 16, 4], 'sss': 'john'}, 'i': 2, 'old_m': {'abc': 'xyz'}, 'j': 100}

After the the second mapper (i.e. version 3), the input looks like:

{'version': 3, 'i': 2, 'j': 100, 'bar': {'a': [10, 16, 4], 's': 'john'}, 'm': {'abc': 'xyz'}}

And finally, after the last mapper, it looks like:

{'version': 4, 'i': 200, 'j': 100, 'bar': {'a': [10, 16, 4], 's': 'john'}, 'm': {'abc': 'xyz'}}

High-Level API

Typedpy defines a special Structure that is effectively a marker for a versioned structure: Versioned

class Versioned(version, **kwargs)[source]

Marks a structure as can be deserialized from multiple versions. The version is expected to start with 1 and increase by 1 in every update. It is expected to have a class attribute of “_versions_mapping”, with an ordered list of the mappings. The first mapping maps version 1 to 2, the second 2 to 3, etc.

Arguments:

_versions_mapping: optional

An array of mappers that outlines how to convert older versions to the latest version.

The example below clarifies how to use Versioned together with deserializer.

class Foo(Versioned, ImmutableStructure):
    bar: Bar
    i: Integer
    j: Integer
    m: Map[String, String]

    _versions_mapping = [
        {   # version 1 -> 2
            "j": Constant(100),
            "old_bar._mapper": {
                "a": FunctionCall(func=lambda x: [i * 2 for i in x], args=["a"]),
            },
            "old_m": Constant({"abc": "xyz"})
        },

        {   # version 2 -> 3
            "old_bar._mapper": {
                "s": "sss",
                "sss": Deleted
            },
            "bar": "old_bar",
            "m": "old_m",
            "old_m": Deleted,
            "old_bar": Deleted,
        },

        {   # version 3 -> 4
            "i": FunctionCall(func=lambda x: x * 100, args=["i"])
        }

    ]


# Once defined, the deserializer applies the versions conversion as the first step.
# For example:
in_version_1 = {
    "version": 1,
    "old_bar": {
        "a": [5, 8, 2],
        "sss": "john",
    },
    "i": 2,
    "old_m": {"a": "aa", "b": "bb"}
}
assert Deserializer(Foo).deserialize(in_version_1) == Foo(
    bar=Bar(a=[10, 16, 4], s="john"),
    m={"abc": "xyz"},
    i=200,
    j=100,
)

Note that the version field is populated automatically when instantiating Foo, to be the latest. From TypedPy 2.19.4, “version” is a mandatory field in the payload to be deserialized.

What About Serialization?

Serialization is typically not an issue, since you probably want to serialize an object based on the latest (i.e current) version of the object definition.