Welcome to compress_pickle’s documentation!

Standard python pickle, thinly wrapped with standard compression libraries

The standard pickle package provides an excellent default tool for serializing arbitrary python objects and storing them to disk. Standard python also includes a broad set of data compression packages. compress_pickle provides an interface to the standard pickle.dump, pickle.load, pickle.dumps and pickle.loads functions, but wraps them so that the serialized data is passed through one of the standard compression packages. This way you can seamlessly serialize data to disk, or to any file-like object, in compressed form.
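To make the idea concrete, here is a minimal sketch of routing pickle through a compression stream using only the standard library. It is not compress_pickle's implementation; the helper names dump_gzip and load_gzip are hypothetical and exist only for this illustration.

```python
import gzip
import os
import pickle
import tempfile


def dump_gzip(obj, path):
    """Pickle obj and write the bytes through a gzip stream (hypothetical helper)."""
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f)


def load_gzip(path):
    """Read a gzip stream and unpickle its contents (hypothetical helper)."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


# Round trip through a temporary file so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.gz")
    original = {"key1": [None, 1, 2, "3"] * 100, "key2": "Test key"}
    dump_gzip(original, path)
    restored = load_gzip(path)
```

compress_pickle packages this pattern behind a single dump/load interface, so the caller does not have to pick and open the compression stream by hand.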

compress_pickle supports python >= 3.6, and runs on Ubuntu, macOS and Windows. If you must support python 3.5, please install compress_pickle==v1.1.1.

Supported compression protocols: gzip, bz2, lzma and zipfile.

Furthermore, compress_pickle supports the lz4 compression protocol, which is not part of the standard python compression packages. It is provided as an optional extra requirement that can be installed with:

pip install compress_pickle[lz4]

Installation

compress_pickle is available on PyPI:

pip install compress_pickle

compress_pickle has no external requirements: it depends only on standard python packages and is platform independent.

Usage

compress_pickle provides two main functions: dump and load. dump serializes a python object that can be pickled and writes it to a path or any file-like object. Both dump and load handle opening a file object from a specified str, bytes or os.PathLike path. The desired compression protocol can be inferred from the supplied path’s extension. For example, to store a regular dictionary to disk without compression, one can simply provide the .pkl extension or specify that the compression protocol must be None or "pickle":

>>> from compress_pickle import dump, load
>>> obj = dict(key1=[None, 1, 2, "3"] * 10000, key2="Test key")
>>> fname1 = "uncompressed_data.pkl"  # We can save to an uncompressed pickle file
>>> dump(obj, fname1)
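The extension-based inference described above can be pictured as a simple lookup from file extension to protocol name. The sketch below is an assumption-laden illustration, not the library's code: the function infer_compression and the table EXTENSION_TO_COMPRESSION are hypothetical, and the ".lzma" entry is assumed. The mapping that compress_pickle actually uses is the one reported by get_registered_extensions.

```python
import os

# Hypothetical extension table, for illustration only.
# Query get_registered_extensions() for the real mapping.
EXTENSION_TO_COMPRESSION = {
    ".pkl": None,      # plain pickle, no compression
    ".gz": "gzip",
    ".lzma": "lzma",   # assumed extension for the lzma protocol
}


def infer_compression(path):
    """Return the protocol name for a str, bytes or os.PathLike path."""
    _, ext = os.path.splitext(os.fsdecode(path))
    if ext not in EXTENSION_TO_COMPRESSION:
        raise ValueError(f"Cannot infer compression from extension {ext!r}")
    return EXTENSION_TO_COMPRESSION[ext]


print(infer_compression("uncompressed_data.pkl"))
print(infer_compression("gzip_compressed_data.gz"))
```

A path without a recognized extension cannot be resolved this way, which is why the compression protocol can also be passed explicitly.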

If, instead of writing to a path on disk, one wants to write to a file-like object, for example io.BytesIO, we do it like this:

>>> from compress_pickle import dump, load
>>> import io
>>> obj = dict(key1=[None, 1, 2, "3"] * 10000, key2="Test key")
>>> stream = io.BytesIO()  # Users must close the stream manually after they finish using it.
>>> dump(obj, stream)

or to an open file stream:

>>> from compress_pickle import dump, load
>>> obj = dict(key1=[None, 1, 2, "3"] * 10000, key2="Test key")
>>> with open("file", "wb") as f:
...     dump(obj, f, compression=None, set_default_extension=False)
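For the file-like case, the sketch below shows with the standard library alone why the caller stays responsible for the stream: the compression wrapper is closed after writing, but the underlying io.BytesIO remains open and has to be rewound and closed by hand. This illustrates the general pattern and is not a description of compress_pickle's internals.

```python
import gzip
import io
import pickle

obj = dict(key1=[None, 1, 2, "3"] * 100, key2="Test key")
stream = io.BytesIO()

# Closing the GzipFile flushes the compressed data but leaves `stream` open.
with gzip.GzipFile(fileobj=stream, mode="wb") as gz:
    pickle.dump(obj, gz)

# Rewind before reading the compressed bytes back from the same stream.
stream.seek(0)
with gzip.GzipFile(fileobj=stream, mode="rb") as gz:
    restored = pickle.load(gz)

# The caller closes the stream once it is no longer needed.
stream.close()
```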

The load function uncompresses and loads the serialized objects from a specified path or file-like object. The compression protocol can be inferred from the path’s extension, or it can be specified explicitly. Note that by default, load and dump set the supplied path’s extension to the compression protocol’s default extension before loading or saving. This behavior can be changed with the set_default_extension parameter.

>>> obj2 = load(fname1)
>>> obj2["key1"] == obj["key1"]
True
>>> id(obj2) != id(obj)
True
>>> obj2["key2"] == obj["key2"]
True

To compress the saved data, we can supply a different known extension, or specify the compression protocol we want:

>>> fname2 = "gzip_compressed_data.gz"  # The compression is inferred from the extension
>>> dump(obj, fname2)
>>> obj2 = load(fname2)
>>> obj2["key1"] == obj["key1"]
True
>>> id(obj2) != id(obj)
True
>>> obj2["key2"] == obj["key2"]
True
>>> # Now we specify the compression protocol and don't set the default extension
>>> fname3 = "lzma_compressed_data"  # No extension, so the compression must be specified
>>> dump(obj, fname3, compression="lzma", set_default_extension=False)
>>> obj2 = load(fname3, compression="lzma", set_default_extension=False)
>>> obj2["key1"] == obj["key1"]
True
>>> id(obj2) != id(obj)
True
>>> obj2["key2"] == obj["key2"]
True

We can check that the compressed files actually take up less disk space with standard os.path.getsize.

>>> import os
>>> os.path.getsize(fname1)
70134
>>> os.path.getsize(fname2)
285
>>> os.path.getsize(fname3)
232
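The same comparison can be made in memory, without touching the disk. The following sketch uses only the standard library (pickle, gzip and lzma) rather than compress_pickle itself, and compares byte counts for the same dictionary. Exact sizes depend on the python version and pickle protocol, so none are quoted here.

```python
import gzip
import lzma
import pickle

obj = dict(key1=[None, 1, 2, "3"] * 10000, key2="Test key")

raw = pickle.dumps(obj)            # uncompressed pickle bytes
gz_bytes = gzip.compress(raw)      # same bytes compressed with gzip
xz_bytes = lzma.compress(raw)      # same bytes compressed with lzma

print(len(raw), len(gz_bytes), len(xz_bytes))

# Decompressing and unpickling recovers an equal object.
restored = pickle.loads(lzma.decompress(xz_bytes))
```

The data here is highly repetitive, which is why both compressors shrink it so much; less regular data will compress less.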

compress_pickle also provides the dumps and loads functions. dumps serializes and compresses an object and returns the resulting bytes, while loads uncompresses and unserializes an object from a given bytes object. For example:

>>> from compress_pickle import dumps, loads
>>> obj = " ".join(["content"] * 10)
>>> b = dumps(obj, compression="gzip")
>>> b
b'\x1f\x8b\x08\x00H;\xc9]\x02\xffj`\x99\x1a\xcc\x00\x01=\xfe\xc9\xf9y%\xa9y%\nT\xa2\xa7\xe8\x01\x00\x00\x00\xff\xff\x03\x00\x9e\x98\xd6$^\x00\x00\x00'
>>> loads(b, compression="gzip")
'content content content content content content content content content content'

For more information please refer to the API.

Available compression and pickling protocols

compress_pickle supports many compression protocols and pickling backends. To see the compression protocols and pickling backends that are available to you, try to run:

>>> from compress_pickle import get_known_picklers, get_known_compressions
>>> get_known_picklers()
...
>>> get_known_compressions()
...

These are the names of the available compression and pickling protocols. Furthermore, compress_pickle also registers a mapping between the known compressions, and the associated filename extensions. To see the mapping from known extensions to compression protocols run this:

>>> from compress_pickle import get_registered_extensions
>>> get_registered_extensions()
...

As a final comment, compress_pickle can also set the filename extension to the registered default value. To see the mapping between compression protocols and the associated default filename extension, run this:

>>> from compress_pickle import get_default_compression_mapping
>>> get_default_compression_mapping()
...

Acknowledgements

Many of the ideas used in this package were suggested on stackoverflow. However, I did not find any PyPI package that centralized their implementations, so I wrote this small package. Any suggestions or input are very welcome. Also, please report any problems you may encounter. Pull requests are more than welcome.
