| Metadata-Version: 2.1 | |
| Name: cloudpickle | |
| Version: 3.0.0 | |
| Summary: Pickler class to extend the standard pickle.Pickler functionality | |
| Home-page: https://github.com/cloudpipe/cloudpickle | |
| License: BSD-3-Clause | |
| Author: The cloudpickle developer team | |
| Author-email: [email protected] | |
| Requires-Python: >=3.8 | |
| Description-Content-Type: text/markdown | |
| Classifier: Development Status :: 5 - Production/Stable | |
| Classifier: Intended Audience :: Developers | |
| Classifier: License :: OSI Approved :: BSD License | |
| Classifier: Operating System :: POSIX | |
| Classifier: Operating System :: Microsoft :: Windows | |
| Classifier: Operating System :: MacOS :: MacOS X | |
| Classifier: Programming Language :: Python :: 3.8 | |
| Classifier: Programming Language :: Python :: 3.9 | |
| Classifier: Programming Language :: Python :: 3.10 | |
| Classifier: Programming Language :: Python :: 3.11 | |
| Classifier: Programming Language :: Python :: 3.12 | |
| Classifier: Programming Language :: Python :: Implementation :: CPython | |
| Classifier: Programming Language :: Python :: Implementation :: PyPy | |
| Classifier: Topic :: Software Development :: Libraries :: Python Modules | |
| Classifier: Topic :: Scientific/Engineering | |
| Classifier: Topic :: System :: Distributed Computing | |
| # cloudpickle | |
| [Automated tests](https://github.com/cloudpipe/cloudpickle/actions) | |
| [Code coverage](https://codecov.io/github/cloudpipe/cloudpickle?branch=master) | |
| `cloudpickle` makes it possible to serialize Python constructs not supported | |
| by the default `pickle` module from the Python standard library. | |
| `cloudpickle` is especially useful for **cluster computing** where Python | |
| code is shipped over the network to execute on remote hosts, possibly close | |
| to the data. | |
| Among other things, `cloudpickle` supports pickling for **lambda functions** | |
| along with **functions and classes defined interactively** in the | |
| `__main__` module (for instance in a script, a shell or a Jupyter notebook). | |
| Cloudpickle can only be used to send objects between the **exact same version | |
| of Python**. | |
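One way to guard against accidental cross-version loads is to ship the interpreter version next to the payload and check it before unpickling. The sketch below is only an illustration: `pack` and `unpack` are hypothetical helpers, not part of `cloudpickle`, and the standard `pickle.dumps` stands in as the default serializer (`cloudpickle.dumps` could be passed instead, since its output is loaded with `pickle.loads`).

```python
import pickle
import sys


def pack(obj, dumps=pickle.dumps):
    """Hypothetical helper: bundle a payload with the Python version that wrote it."""
    return {"python": tuple(sys.version_info[:3]), "payload": dumps(obj)}


def unpack(envelope):
    """Hypothetical helper: refuse payloads written by another Python version."""
    current = tuple(sys.version_info[:3])
    if envelope["python"] != current:
        raise RuntimeError(
            f"payload written by Python {envelope['python']}, running {current}"
        )
    return pickle.loads(envelope["payload"])


envelope = pack({"answer": 42})
roundtrip = unpack(envelope)

# Simulate a payload produced by a different interpreter version.
stale = dict(envelope, python=(2, 7, 18))
try:
    unpack(stale)
except RuntimeError as exc:
    mismatch_error = exc
else:
    mismatch_error = None
```

The version check runs before `pickle.loads`, so a mismatched payload is never unpickled.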
| Using `cloudpickle` for **long-term object storage is not supported and | |
| strongly discouraged.** | |
| **Security notice**: one should **only load pickle data from trusted sources** as | |
| otherwise `pickle.load` can lead to arbitrary code execution resulting in a critical | |
| security vulnerability. | |
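To see why this matters, the following sketch uses only the standard library: an object's `__reduce__` method may name any callable to be invoked at load time. `Payload` is a hypothetical class and the expression passed to `eval` is deliberately harmless; an attacker could substitute any callable and arguments.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle triggers a call when loaded."""

    def __reduce__(self):
        # pickle stores "call eval with this argument" instead of the object.
        return (eval, ("6 * 7",))


data = pickle.dumps(Payload())

# Loading does not rebuild a Payload instance: it runs eval("6 * 7").
result = pickle.loads(data)
print(result)  # 42
```

Nothing in the byte string warns the loader about this, which is why payloads from untrusted sources should never be unpickled.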
| Installation | |
| ------------ | |
| The latest release of `cloudpickle` is available from | |
| [PyPI](https://pypi.python.org/pypi/cloudpickle): | |
|     pip install cloudpickle | |
| Examples | |
| -------- | |
| Pickling a lambda expression: | |
| ```python | |
| >>> import cloudpickle | |
| >>> squared = lambda x: x ** 2 | |
| >>> pickled_lambda = cloudpickle.dumps(squared) | |
| >>> import pickle | |
| >>> new_squared = pickle.loads(pickled_lambda) | |
| >>> new_squared(2) | |
| 4 | |
| ``` | |
| Pickling a function interactively defined in a Python shell session | |
| (in the `__main__` module): | |
| ```python | |
| >>> CONSTANT = 42 | |
| >>> def my_function(data: int) -> int: | |
| ...     return data + CONSTANT | |
| ... | |
| >>> pickled_function = cloudpickle.dumps(my_function) | |
| >>> depickled_function = pickle.loads(pickled_function) | |
| >>> depickled_function | |
| <function __main__.my_function(data:int) -> int> | |
| >>> depickled_function(43) | |
| 85 | |
| ``` | |
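Classes defined interactively can be handled the same way. The following is a minimal sketch, assuming it is run as a script or in a shell so that the hypothetical `Rectangle` class lives in `__main__`:

```python
import pickle

import cloudpickle


class Rectangle:
    """Hypothetical class defined in __main__ (script, shell or notebook)."""

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


# When Rectangle lives in __main__, its definition is pickled along with
# the instance state.
pickled_instance = cloudpickle.dumps(Rectangle(3, 4))

# As in the examples above, the standard pickle module can load the payload.
restored = pickle.loads(pickled_instance)
print(restored.area())  # 12
```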
| Overriding pickle's serialization mechanism for importable constructs: | |
| ---------------------------------------------------------------------- | |
| An important difference between `cloudpickle` and `pickle` is that | |
| `cloudpickle` can serialize a function or class **by value**, whereas `pickle` | |
| can only serialize it **by reference**. Serialization by reference treats | |
| functions and classes as attributes of modules, and pickles them through | |
| instructions that trigger the import of their module at load time. | |
| Serialization by reference is thus limited in that it assumes that the module | |
| containing the function or class is available/importable in the unpickling | |
| environment. This assumption breaks when pickling constructs defined in an | |
| interactive session, a case that `cloudpickle` detects automatically and | |
| handles by pickling such constructs **by value**. | |
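The by-reference behavior of the standard `pickle` module can be observed with the standard library alone. In this sketch, `json.dumps` is just a convenient importable function; the payload stores its module and name rather than its code, while a lambda, which has no importable name, is rejected:

```python
import json
import pickle

# An importable function is pickled as a reference: module name + attribute name.
payload = pickle.dumps(json.dumps)
print(b"json" in payload, b"dumps" in payload)  # True True

# Loading re-imports the module and looks the attribute up again.
restored = pickle.loads(payload)
print(restored is json.dumps)  # True

# A lambda cannot be looked up by name, so the standard pickle module refuses it.
try:
    pickle.dumps(lambda x: x ** 2)
except (pickle.PicklingError, AttributeError) as exc:
    lambda_error = exc
else:
    lambda_error = None
print(type(lambda_error).__name__)
```

The exact exception type raised for the lambda depends on the Python version, which is why the sketch catches both.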
| Another case where the importability assumption is expected to break is when | |
| developing a module in a distributed execution environment: the worker | |
| processes may not have access to the said module, for example if they live on a | |
| different machine than the process in which the module is being developed. By | |
| itself, `cloudpickle` cannot detect such "locally importable" modules and | |
| switch to serialization by value; instead, it relies on its default mode, which | |
| is serialization by reference. However, since `cloudpickle 2.0.0`, one can | |
| explicitly specify modules for which serialization by value should be used, | |
| using the | |
| `register_pickle_by_value(module)`/`unregister_pickle_by_value(module)` API: | |
| ```python | |
| >>> import cloudpickle | |
| >>> import my_module | |
| >>> cloudpickle.register_pickle_by_value(my_module) | |
| >>> cloudpickle.dumps(my_module.my_function) # my_function is pickled by value | |
| >>> cloudpickle.unregister_pickle_by_value(my_module) | |
| >>> cloudpickle.dumps(my_module.my_function) # my_function is pickled by reference | |
| ``` | |
| Using this API, there is no need to reinstall the new version of the module on | |
| all the worker nodes or to restart the workers: restarting the client Python | |
| process with the new source code is enough. | |
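The effect can be checked on a single machine by making a module unimportable after pickling, which mimics a worker that does not have it installed. This is a sketch under assumptions: `demo_module` and its `double` function are hypothetical and written to a temporary directory purely for the demonstration.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

import cloudpickle

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a throwaway importable module (hypothetical name and content).
    Path(tmp_dir, "demo_module.py").write_text("def double(x):\n    return 2 * x\n")
    sys.path.insert(0, tmp_dir)
    try:
        demo_module = importlib.import_module("demo_module")
        cloudpickle.register_pickle_by_value(demo_module)
        by_value = cloudpickle.dumps(demo_module.double)
        cloudpickle.unregister_pickle_by_value(demo_module)
    finally:
        # Make the module unimportable again, as it would be on a remote worker.
        sys.path.remove(tmp_dir)
        sys.modules.pop("demo_module", None)

# The payload carries the function's code, so it loads without demo_module.
restored = pickle.loads(by_value)
print(restored(21))  # 42
```

Without the `register_pickle_by_value` call, the payload would only reference `demo_module.double`, and loading it here would fail because the module can no longer be imported.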
| Note that this feature is still **experimental**, and may fail in the following | |
| situations: | |
| - If the body of a function/class pickled by value contains an `import` statement: | |
| ```python | |
| >>> def f(): | |
| ...     from another_module import g | |
| ...     # calling f in the unpickling environment may fail if another_module | |
| ...     # is unavailable | |
| ...     return g() + 1 | |
| ``` | |
| - If a function pickled by reference uses a function pickled by value during its execution. | |
| Running the tests | |
| ----------------- | |
| - With `tox`, to run the tests for all the supported versions of | |
|   Python and PyPy: | |
|       pip install tox | |
|       tox | |
|   or alternatively for a specific environment: | |
|       tox -e py312 | |
| - With `pytest`, to only run the tests for your current version of | |
|   Python: | |
|       pip install -r dev-requirements.txt | |
|       PYTHONPATH='.:tests' pytest | |
| History | |
| ------- | |
| `cloudpickle` was initially developed by [picloud.com](http://web.archive.org/web/20140721022102/http://blog.picloud.com/2013/11/17/picloud-has-joined-dropbox/) and shipped as part of | |
| the client SDK. | |
| A copy of `cloudpickle.py` was included as part of PySpark, the Python | |
| interface to [Apache Spark](https://spark.apache.org/). Davies Liu, Josh | |
| Rosen, Thom Neale and other Apache Spark developers improved it significantly, | |
| most notably to add support for PyPy and Python 3. | |
| The aim of the `cloudpickle` project is to make that work available to a wider | |
| audience outside of the Spark ecosystem and to make it easier to improve it | |
| further, notably with the help of a dedicated non-regression test suite. | |