
Why Python Pickle is Insecure

Nadia Alramli's Blog

Python pickle is a powerful serialization module. It is the most common method to serialize and deserialize Python object structures. The pickle module has an optimized cousin called cPickle that is written in C. In this post I'm going to refer to both modules by the name pickle unless I mention otherwise. The security issues I'm going to discuss apply to both of them.

What This is All About

Pickle was never claimed to be secure. In the pickle documentation there is a warning in red that says:

Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.

This clearly states that pickle is insecure. Many think this is because it can load classes other than the ones you expect and may trick you into running their methods. But the actual risk is far more serious: unpickling can be exploited to execute arbitrary commands on your machine!
Take this little example:

import pickle
pickle.loads("cos\nsystem\n(S'ls ~'\ntR.") # This will run: ls ~

Or if you are running Windows, try this instead:

import pickle
pickle.loads("cos\nsystem\n(S'dir'\ntR.") # This will run: dir

You can replace ls and dir with any other command.

I will use pickletools.dis to disassemble the pickle and show you how this works:

import pickletools
pickletools.dis("cos\nsystem\n(S'ls ~'\ntR.") # dis() prints the disassembly itself and returns None

Output:

    0: c    GLOBAL     'os system'
   11: (    MARK
   12: S        STRING     'ls ~'
   20: t        TUPLE      (MARK at 11)
   21: R    REDUCE
   22: .    STOP

A pickle is a program for a simple stack-based virtual machine: it records the instructions needed to reconstruct the object. In other words, the pickled instructions in our example are:

  1. Push self.find_class(module_name, class_name), i.e. push os.system
  2. Push a mark object onto the stack
  3. Push the string 'ls ~'
  4. Build a tuple from the stack items above the mark
  5. Apply the callable to the argument tuple, both on the stack, i.e. os.system(*('ls ~',))
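
A harmless variation makes these steps easy to try out. The sketch below is an illustration rather than part of the original example: it assumes Python 3 (so the pickle is a bytes literal and the built-ins module is named builtins) and swaps os.system for the built-in len. The payload is hand-written for this purpose.

```python
import pickle
import pickletools

# Same opcode sequence as the os.system example, but the callable is the
# harmless built-in len (the Python 3 module name "builtins" is assumed).
payload = b"cbuiltins\nlen\n(S'abc'\ntR."

# List the opcode names without executing the pickle.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(payload)]
print(opcode_names)

# Unpickling runs the program: REDUCE ends up calling len(*('abc',)).
result = pickle.loads(payload)
print(result)
```

The value returned by pickle.loads is whatever the callable returned, which is why swapping in os.system turns loading into command execution.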

The example is not exploiting a bug in pickle. REDUCE is a vital step in instantiating objects from their classes. Take this example, where I pickle an instance of the built-in object class and disassemble the result:

import pickletools
import pickle
pickletools.dis(pickle.dumps(object())) # dis() prints the disassembly itself and returns None

Output:

    0: c    GLOBAL     'copy_reg _reconstructor'
   25: p    PUT        0
   28: (    MARK
   29: c        GLOBAL     '__builtin__ object'
   49: p        PUT        1
   52: g        GET        1
   55: N        NONE
   56: t        TUPLE      (MARK at 28)
   57: p    PUT        2
   60: R    REDUCE
   61: p    PUT        3
   64: .    STOP

Note the REDUCE step. To create the instance, pickle looks up copy_reg._reconstructor and the __builtin__.object class, then applies _reconstructor to the argument tuple (object, object, None).
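
On the pickling side, the callable and arguments that REDUCE will use come from the object's __reduce__ method. The following is a minimal sketch of that relationship, assuming Python 3. The class name Demo is hypothetical, and int stands in for whatever callable the author of a pickle chooses.

```python
import pickle
import pickletools

class Demo(object):
    """Hypothetical class used only for illustration."""

    def __reduce__(self):
        # Tell pickle: "to rebuild this object, call int('42')".
        return (int, ("42",))

data = pickle.dumps(Demo())

# The pickle contains a REDUCE opcode that will call int with ('42',).
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(data)]
print("REDUCE" in opcode_names)

# Loading returns the result of the call, not a Demo instance.
restored = pickle.loads(data)
print(restored)
```

Nothing in the unpickler checks that the callable has anything to do with the class that was pickled, which is what the os.system example takes advantage of.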

As of 2.3, Python abandoned any pretense that it might be safe to load pickles received from untrusted parties, because no sufficient security analysis has been done to guarantee this and there isn't a use case that warrants the expense of such an analysis. As a result, all tests for __safe_for_unpickling__ or for copy_reg.safe_constructors were removed from the unpickling code. (Source: pickletools.py source code comments)

How to Make Unpickling Safer

To make unpickling safer, you need to control exactly which classes will get created. (There is no 100% safety guarantee; pickle was never intended to be secure.) In pickle this can be done by overriding the find_class method. For example:

import sys
import pickle
import StringIO
 
class SafeUnpickler(pickle.Unpickler):
    PICKLE_SAFE = {
        'copy_reg': set(['_reconstructor']),
        '__builtin__': set(['object'])
    }
    def find_class(self, module, name):
        # Reject any module that is not explicitly listed as safe.
        if module not in self.PICKLE_SAFE:
            raise pickle.UnpicklingError(
                'Attempting to unpickle unsafe module %s' % module
            )
        # Reject any name that is not listed as safe for this module.
        if name not in self.PICKLE_SAFE[module]:
            raise pickle.UnpicklingError(
                'Attempting to unpickle unsafe class %s' % name
            )
        # Import the module only after both checks have passed.
        __import__(module)
        mod = sys.modules[module]
        return getattr(mod, name)
 
    @classmethod
    def loads(cls, pickle_string):
        return cls(StringIO.StringIO(pickle_string)).load()
 
 
SafeUnpickler.loads("cos\nsystem\n(S'ls ~'\ntR.") # UnpicklingError: Attempting to unpickle unsafe module os

To extend the PICKLE_SAFE dictionary with your pickle safe classes and modules:

SafeUnpickler.PICKLE_SAFE.update({'__main__': set(['MyClass1', 'MyClass2']), 'MyModule': set(['MyClass3'])})

You need to be really careful with what you include in the PICKLE_SAFE dictionary. For example, the __builtin__ module contains the eval function, which can be as dangerous as os.system, so list only the specific names you actually need.
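
For readers on Python 3, the same allow-list approach can be sketched as follows. This is an adaptation, not code from the original post: it assumes the Python 3 module names builtins and copyreg, reads from io.BytesIO instead of StringIO, and uses the hypothetical class name SafeUnpickler3 to avoid clashing with the class above.

```python
import io
import pickle

class SafeUnpickler3(pickle.Unpickler):
    """Hypothetical Python 3 variant of the allow-list unpickler."""

    # Assumed Python 3 names for the modules allowed in the example above.
    PICKLE_SAFE = {
        "copyreg": {"_reconstructor"},
        "builtins": {"object"},
    }

    def find_class(self, module, name):
        allowed = self.PICKLE_SAFE.get(module)
        if allowed is None:
            raise pickle.UnpicklingError(
                "Attempting to unpickle unsafe module %s" % module
            )
        if name not in allowed:
            raise pickle.UnpicklingError(
                "Attempting to unpickle unsafe class %s" % name
            )
        # Both checks passed, so defer to the normal lookup.
        return super().find_class(module, name)

    @classmethod
    def loads(cls, pickle_bytes):
        return cls(io.BytesIO(pickle_bytes)).load()

# Plain data never reaches find_class, so it loads normally.
plain = SafeUnpickler3.loads(pickle.dumps({"numbers": [1, 2, 3]}))

# A payload that references os.system is rejected before anything runs.
try:
    SafeUnpickler3.loads(b"cos\nsystem\n(S'echo unsafe'\ntR.")
    blocked = False
    message = ""
except pickle.UnpicklingError as exc:
    blocked = True
    message = str(exc)
```

The same caveat applies as before: the allow-list is only as safe as the names placed in it.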

In cPickle this has to be implemented a bit differently. There is a special attribute called find_global that needs to be set to a function that accepts a module name and a class name and returns the corresponding class object. cPickle.Unpickler can't be subclassed directly, so we are going to wrap it in another class instead:

import sys
import cPickle
import StringIO
 
class SafeUnpickler(object):
    PICKLE_SAFE = {
        'copy_reg': set(['_reconstructor']),
        '__builtin__': set(['object'])
    }
 
    @classmethod
    def find_class(cls, module, name):
        # Reject any module that is not explicitly listed as safe.
        if module not in cls.PICKLE_SAFE:
            raise cPickle.UnpicklingError(
                'Attempting to unpickle unsafe module %s' % module
            )
        # Reject any name that is not listed as safe for this module.
        if name not in cls.PICKLE_SAFE[module]:
            raise cPickle.UnpicklingError(
                'Attempting to unpickle unsafe class %s' % name
            )
        # Import the module only after both checks have passed.
        __import__(module)
        mod = sys.modules[module]
        return getattr(mod, name)
 
    @classmethod
    def loads(cls, pickle_string):
        pickle_obj = cPickle.Unpickler(StringIO.StringIO(pickle_string))
        pickle_obj.find_global = cls.find_class
        return pickle_obj.load()
 
 
SafeUnpickler.loads("cos\nsystem\n(S'ls ~'\ntR.") # UnpicklingError: Attempting to unpickle unsafe module os

As you can see, this solution works, but it is hardly practical in many cases: you need to tell pickle in advance exactly which classes you want to allow. The moral of the story, according to the pickle documentation:

You should be really careful about the source of the strings your application unpickles.

Safer Alternatives

Fortunately, there are alternatives to pickle. They may not be as powerful when it comes to serializing Python objects and classes, but in most cases all we need to serialize is basic types and simple data structures.

JSON

JSON is a lightweight data interchange format. Its human-readable format gives it an advantage over pickle. The json.org website provides a comprehensive listing of existing JSON bindings, including Python. The json module has been part of the Python standard library since 2.6.
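
A quick round trip shows the difference in practice. The sketch below uses only the standard json module; the record dictionary is made up for illustration.

```python
import json

# Basic types and simple data structures survive a round trip unchanged.
record = {"name": "example", "tags": ["a", "b"], "count": 3}
text = json.dumps(record)
restored = json.loads(text)
print(restored == record)

# The pickle payload from earlier is simply invalid JSON: the parser
# raises an error instead of executing anything.
try:
    json.loads("cos\nsystem\n(S'ls ~'\ntR.")
    rejected = False
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    rejected = True
print(rejected)
```

The trade-off is that JSON only carries data: custom classes have to be converted to and from dictionaries, lists, strings and numbers by your own code.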

YAML

YAML is a human-readable data serialization format. YAML has additional features lacking in JSON, such as extensible data types, relational anchors, strings without quotation marks, and mapping types preserving key order. PyYAML is a Python binding for YAML. PyYAML allows sophisticated object instantiation, which opens the door to injection attacks. According to the PyYAML documentation, you need to use the yaml.safe_load function to load data from untrusted sources.

Others

Depending on your application, there are many other alternatives, such as XML, Protocol Buffers and Thrift. There is a useful comparison of data serialization formats on Wikipedia.

Author: Nadia Alramli's Blog
Original article: Why Python Pickle is Insecure. Thanks to the original author for sharing.
