Python: pickling a large NumPy array

lautakyan007, 2024-10-01 17:34:08

I have a large 3D NumPy array that I want to persist. My first approach was simply to use pickle, but that fails with a poorly explained error.

import pickle
import numpy as np

test_rand = np.random.random((100000, 200, 50))  # about 8 GB of float64 data
with open('models/test.pkl', 'wb') as save_file:
    pickle.dump(test_rand, save_file, -1)  # -1 selects the highest protocol
 
--------------------------------------------------------------------------- 
error                                     Traceback (most recent call last) 
<ipython-input-18-511e30b08440> in <module>() 
      1 with open('models/test.pkl', 'wb') as save_file: 
----> 2         pickle.dump(test_rand, save_file, -1) 
      3  
 
C:\Users\g1dak02\AppData\Local\Continuum\Anaconda\lib\pickle.pyc in dump(obj, file, protocol) 
   1368  
   1369 def dump(obj, file, protocol=None): 
-> 1370     Pickler(file, protocol).dump(obj) 
   1371  
   1372 def dumps(obj, protocol=None): 
 
C:\Users\g1dak02\AppData\Local\Continuum\Anaconda\lib\pickle.pyc in dump(self, obj) 
    222         if self.proto >= 2: 
    223             self.write(PROTO + chr(self.proto)) 
--> 224         self.save(obj) 
    225         self.write(STOP) 
    226  
 
C:\Users\g1dak02\AppData\Local\Continuum\Anaconda\lib\pickle.pyc in save(self, obj) 
    329  
    330         # Save the reduce() output and finally memoize the object 
--> 331         self.save_reduce(obj=obj, *rv) 
    332  
    333     def persistent_id(self, obj): 
 
C:\Users\g1dak02\AppData\Local\Continuum\Anaconda\lib\pickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj) 
    417  
    418         if state is not None: 
--> 419             save(state) 
    420             write(BUILD) 
    421  
 
C:\Users\g1dak02\AppData\Local\Continuum\Anaconda\lib\pickle.pyc in save(self, obj) 
    284         f = self.dispatch.get(t) 
    285         if f: 
--> 286             f(self, obj) # Call unbound method with explicit self 
    287             return 
    288  
 
C:\Users\g1dak02\AppData\Local\Continuum\Anaconda\lib\pickle.pyc in save_tuple(self, obj) 
    560         write(MARK) 
    561         for element in obj: 
--> 562             save(element) 
    563  
    564         if id(obj) in memo: 
 
C:\Users\g1dak02\AppData\Local\Continuum\Anaconda\lib\pickle.pyc in save(self, obj) 
    284         f = self.dispatch.get(t) 
    285         if f: 
--> 286             f(self, obj) # Call unbound method with explicit self 
    287             return 
    288  
 
C:\Users\g1dak02\AppData\Local\Continuum\Anaconda\lib\pickle.pyc in save_string(self, obj, pack) 
    484                 self.write(SHORT_BINSTRING + chr(n) + obj) 
    485             else: 
--> 486                 self.write(BINSTRING + pack("<i", n) + obj) 
    487         else: 
    488             self.write(STRING + repr(obj) + '\n') 
 
error: integer out of range for 'i' format code 

So my two questions are:
  • What exactly is going on with this error?
  • How should I save the array to disk?

  • I am using Python 2.7.8 and NumPy 1.9.0.

    Please refer to the following answer:

    Regarding #1, this is a bug... and an old one. There is an enlightening, if surprisingly old, discussion of it here: http://python.6.x6.nabble.com/test-gzip-test-tarfile-failure-om-AMD64-td1830323.html

    The reason for the error is explained here: http://www.littleredbat.net/mk/files/grimoire.html#contents_item_2.1

    The simplest and most basic type are integers, which are represented as a C long. Their size is therefore dependent on the platform you're using; on a 32-bit machine, they can range from -2147483647 to 2147483647. Python programs can determine the highest possible value for an integer by looking at sys.maxint; the lowest possible value will usually be -sys.maxint - 1.
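The last frame of the traceback shows pickle writing the string length with pack("<i", n), that is, as a 4-byte signed integer. A minimal sketch of that limit, using only the standard struct module, might look like the following. The helper name fits_in_int32 is made up for illustration, and the snippet is written for Python 3, where print is a function.

```python
import struct

# Largest value a 4-byte signed integer ("<i") can hold.
INT32_MAX = 2**31 - 1


def fits_in_int32(n):
    """Return True if n can be packed with the "<i" format code."""
    try:
        struct.pack("<i", n)
        return True
    except struct.error:
        return False


# Raw data size of the float64 array from the question (8 bytes per element).
nbytes = 100000 * 200 * 50 * 8

print(nbytes)                    # 8000000000
print(fits_in_int32(INT32_MAX))  # True
print(fits_in_int32(nbytes))     # False
```

The array's raw data is about 8e9 bytes, well beyond what the length field can encode, which is consistent with the error message above.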



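    This bug is not encountered very often, because most people faced with a very large NumPy array use np.save or np.savez, which write the array in NumPy's own compact binary format rather than as a generic pickle stream (see also the array's __reduce__ method, which produces the reduced form of the array that pickle works from).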
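As a rough sketch of the np.save route for question #2: the snippet below saves and reloads an array, then opens the same file memory-mapped. The small shape and the temporary path are assumptions chosen so it runs quickly; the array in the question would need about 8 GB of RAM and disk.

```python
import os
import tempfile

import numpy as np

# Small stand-in for the large array in the question (assumed shape).
arr = np.random.random((100, 20, 5))

# Hypothetical location; the question used 'models/test.pkl'.
path = os.path.join(tempfile.mkdtemp(), "test.npy")

# Write the array in NumPy's .npy format, then read it back in full.
np.save(path, arr)
loaded = np.load(path)
same = np.array_equal(arr, loaded)
print(same)

# For arrays that do not fit comfortably in memory, mmap_mode="r"
# maps the file and reads only the parts that are actually indexed.
mapped = np.load(path, mmap_mode="r")
first_rows = np.array(mapped[:10])
print(first_rows.shape)
```

np.savez works the same way for several arrays at once, storing them by name in a single .npz archive.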

    To show that the array is simply too large for pickle...
    >>> import numpy as np 
    >>> import pickle 
    >>> test_rand = np.random.random((100000,200,50)) 
    >>> x = pickle.dumps(test_rand[:20000], -1) 
    >>> x = pickle.dumps(test_rand[:30000], -1) 
    Traceback (most recent call last): 
      File "<stdin>", line 1, in <module> 
      File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.3.dev0-py2.7.egg/dill/dill.py", line 194, in dumps 
        dump(obj, file, protocol, byref, fmode)#, strictio) 
      File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.3.dev0-py2.7.egg/dill/dill.py", line 184, in dump 
        pik.dump(obj) 
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump 
        self.save(obj) 
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save 
        f(self, obj) # Call unbound method with explicit self 
      File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.3.dev0-py2.7.egg/dill/dill.py", line 181, in save_numpy_array 
        pik.save_reduce(_create_array, (f, args, state, npdict), obj=obj) 
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 401, in save_reduce 
        save(args) 
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save 
        f(self, obj) # Call unbound method with explicit self 
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 562, in save_tuple 
        save(element) 
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save 
        f(self, obj) # Call unbound method with explicit self 
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 562, in save_tuple 
        save(element) 
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save 
        f(self, obj) # Call unbound method with explicit self 
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 486, in save_string 
        self.write(BINSTRING + pack("<i", n) + obj) 
    struct.error: 'i' format requires -2147483648 <= number <= 2147483647 
    >>>  
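The arithmetic behind the two slices in this session can be sketched as follows. It assumes the failure is governed by the length of the raw data string alone (pickle's own framing adds only a few bytes), and the helper name raw_bytes is made up for illustration.

```python
INT32_MAX = 2**31 - 1         # largest length the "<i" format can encode
BYTES_PER_ROW = 200 * 50 * 8  # one slice along axis 0, float64 elements


def raw_bytes(rows):
    """Size in bytes of the raw data of test_rand[:rows]."""
    return rows * BYTES_PER_ROW


for rows in (20000, 30000):
    size = raw_bytes(rows)
    print(rows, size, size <= INT32_MAX)

# Largest number of rows whose raw data still fits the length field.
max_rows = INT32_MAX // BYTES_PER_ROW
print(max_rows)  # 26843
```

The 20000-row slice is 1.6e9 bytes and fits; the 30000-row slice is 2.4e9 bytes and does not.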
    

    However, this works for the whole array...
    >>> x = test_rand.__reduce__() 
    >>> type(x) 
    <type 'tuple'> 
    >>> x[0]      
    <built-in function _reconstruct> 
    >>> x[1] 
    (<type 'numpy.ndarray'>, (0,), 'b') 
    >>> x[2][0:3] 
    (1, (100000, 200, 50), dtype('float64')) 
    >>> len(x[2][4]) 
    8000000000 
    >>> x[2][4][:100] 
    'Y\xa4}\xdf\x84\xdf\xe1?\xfe\x1fd\xe3\xf2\xab\xe2?\x80\xe4\xfe\x17\xfb\xd6\xc2?\xd73\x92\xc9N]\xe8?\x90\xbc\xe3@\xdcO\xc9?\x18\x9dX\x12MG\xc4?(\x0f\x8f\xf9}\xf6\xb1?\xd0\x90O\xe2\x9b\xf1\xed?_\x99\x06\xacY\x9e\xe2?\xe7\xf8\x15\xa8\x13\x91\xe2?\x96}\xffH\xda\xc3\xd4?@\t\xae_"\xe0\xda?y<%\x8a' 
    

    If you want to burn out your fan, print x.

    You will also notice that the function in x[0] is saved along with the data. It is a self-contained function that rebuilds a NumPy array from the pickled data.
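To illustrate how the pieces returned by __reduce__ fit together, here is a small sketch that rebuilds an array by hand from the (function, arguments, state) tuple. It mirrors what unpickling does in outline only; the tiny array is an assumption for the example, and the snippet is written for Python 3.

```python
import numpy as np

# Tiny example array; the one in the question is far too large to copy casually.
arr = np.arange(6, dtype=np.float64).reshape(2, 3)

# __reduce__ returns a callable, its arguments, and the array state.
func, args, state = arr.__reduce__()

# Step 1: the callable (x[0] in the session above) builds an empty placeholder array.
rebuilt = func(*args)

# Step 2: the state (version, shape, dtype, Fortran flag, raw data) fills it in.
rebuilt.__setstate__(state)

print(np.array_equal(arr, rebuilt))
```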

