PyPDF2: writing a PDF file fails with PyPDF2.utils.PdfReadError: Illegal character in Name Object

哈哈 2021-03-31 12:46:56

 

Today, while learning how to use PyPDF2 to write the pages of one PDF file into another, I got the following error:

Traceback (most recent call last): 
  File "D:\python35\Lib\site-packages\PyPDF2\generic.py", line 484, in readFromStream 
    return NameObject(name.decode('utf-8')) 
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcb in position 8: invalid continuation byte 
During handling of the above exception, another exception occurred: 
Traceback (most recent call last): 
  File "D:\python35\Lib\apps\backstage\views\busi_contract_manage_view.py", line 703, in post 
    merge_pdf_result = merge_pdf(final_files, pdf_path) 
  File "D:\python35\Lib\apps\utils\doc_convert_util.py", line 86, in merge_pdf 
    pdf_writer.write(new_file) 
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 482, in write 
    self._sweepIndirectReferences(externalReferenceMap, self._root) 
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences 
    self._sweepIndirectReferences(externMap, realdata) 
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences 
    value = self._sweepIndirectReferences(externMap, value) 
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences 
    self._sweepIndirectReferences(externMap, realdata) 
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences 
    value = self._sweepIndirectReferences(externMap, value) 
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences 
    value = self._sweepIndirectReferences(externMap, data[i]) 
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences 
    self._sweepIndirectReferences(externMap, realdata) 
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences 
    value = self._sweepIndirectReferences(externMap, value) 
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences 
    value = self._sweepIndirectReferences(externMap, value) 
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences 
    value = self._sweepIndirectReferences(externMap, value) 
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences 
    newobj = data.pdf.getObject(data) 
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject 
    retval = readObject(self.stream, self) 
  File "D:\python35\Lib\site-packages\PyPDF2\generic.py", line 66, in readObject 
    return DictionaryObject.readFromStream(stream, pdf) 
  File "D:\python35\Lib\site-packages\PyPDF2\generic.py", line 579, in readFromStream 
    value = readObject(stream, pdf) 
  File "D:\python35\Lib\site-packages\PyPDF2\generic.py", line 60, in readObject 
    return NameObject.readFromStream(stream, pdf) 
  File "D:\python35\Lib\site-packages\PyPDF2\generic.py", line 492, in readFromStream 
    raise utils.PdfReadError("Illegal character in Name Object") 
PyPDF2.utils.PdfReadError: Illegal character in Name Object
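The first exception in the traceback can be reproduced with the standard library alone. The name bytes below are hypothetical: a subset-font prefix "/BCDEEE+" followed by 宋体 (SimSun) encoded as GBK, which is one plausible source of such bytes in a Chinese PDF. They are a guess chosen so that byte 0xcb lands at position 8, as in the error message, not the actual name from the failing file.

```python
# Hypothetical raw PDF name: "/BCDEEE+" prefix + "宋体" encoded as GBK (assumption, not from the real file)
raw_name = b"/BCDEEE+" + "宋体".encode("gbk")


def try_utf8(raw: bytes) -> str:
    """Decode as UTF-8 (what PyPDF2's NameObject.readFromStream tries first); return the error text on failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)


print(try_utf8(raw_name))
# 'utf-8' codec can't decode byte 0xcb in position 8: invalid continuation byte
print(raw_name.decode("gbk"))
# /BCDEEE+宋体
```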
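  • Looking at the traceback, the first exception is raised in PyPDF2\generic.py, line 484: NameObject.readFromStream decodes the raw bytes of a PDF name object as UTF-8, and byte 0xcb at position 8 is not valid UTF-8. Because the reader is in strict mode (the PdfFileReader default), the except branch then turns this into PdfReadError("Illegal character in Name Object") at line 492. The original code around line 484 of generic.py is: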
try: 
    return NameObject(name.decode('utf-8')) 
except (UnicodeEncodeError, UnicodeDecodeError) as e: 
    # Name objects should represent irregular characters 
    # with a '#' followed by the symbol's hex number 
    if not pdf.strict: 
        warnings.warn("Illegal character in Name Object", utils.PdfReadWarning) 
        return NameObject(name) 
    else: 
        raise utils.PdfReadError("Illegal character in Name Object") 
  • Change it so that a GBK decode is attempted before giving up:
try:
    return NameObject(name.decode('utf-8'))
except (UnicodeEncodeError, UnicodeDecodeError) as e:
    # Name objects should represent irregular characters
    # with a '#' followed by the symbol's hex number
    try:
        # Fallback: try GBK, an encoding commonly used in Chinese documents
        return NameObject(name.decode('gbk'))
    except (UnicodeEncodeError, UnicodeDecodeError) as e:
        if not pdf.strict:
            warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
            return NameObject(name)
        else:
            raise utils.PdfReadError("Illegal character in Name Object")
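  • Next, edit line 238 of PyPDF2\utils.py, inside the b_() helper that the writer uses to turn strings into bytes. A name decoded as GBK can contain characters that Latin-1 cannot encode, so writing it back would otherwise fail there. The original code at line 238 is: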
r = s.encode('latin-1') 
if len(s) < 2: 
    bc[s] = r 
return r 
  • Change it to fall back to UTF-8:
try: 
    r = s.encode('latin-1') 
except Exception as e: 
    r = s.encode('utf-8') 
if len(s) < 2: 
    bc[s] = r 
return r
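Taken together, the two patches add a decode fallback (UTF-8, then GBK) when names are read and an encode fallback (Latin-1, then UTF-8) when they are written. The standalone sketch below mirrors that logic outside PyPDF2, so it can be tried without editing site-packages. decode_name and encode_name are hypothetical helper names, and the ValueError stands in for PdfReadError.

```python
def decode_name(raw: bytes) -> str:
    """Mirror of the patched generic.py logic: try UTF-8, then GBK, else fail (strict mode)."""
    for encoding in ("utf-8", "gbk"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Stand-in for utils.PdfReadError in this sketch
    raise ValueError("Illegal character in Name Object")


def encode_name(name: str) -> bytes:
    """Mirror of the patched utils.b_ logic: try Latin-1, fall back to UTF-8."""
    try:
        return name.encode("latin-1")
    except UnicodeEncodeError:
        return name.encode("utf-8")


raw = b"/BCDEEE+" + "宋体".encode("gbk")  # hypothetical GBK-encoded font name
name = decode_name(raw)                   # succeeds via the GBK fallback
print(name)                               # /BCDEEE+宋体
print(encode_name(name) == raw)           # False: written back as UTF-8, not GBK
```

One consequence visible in the sketch: a name read as GBK is written back as UTF-8, so its bytes in the output file differ from those in the input. Also, because GBK accepts many pairs of high bytes, the GBK step can succeed on names that were never GBK in the first place.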

 

PyPDF2: writing the pages of a given PDF file into another PDF file

# encoding:utf-8
from PyPDF2 import PdfFileReader, PdfFileWriter

readFile = 'D:\\1.pdf'
outFile = 'D:\\2.pdf'
pdfFileWriter = PdfFileWriter()

# Get a PdfFileReader object
pdfFileReader = PdfFileReader(readFile)  # alternatively: pdfFileReader = PdfFileReader(open(readFile, 'rb'))
# Total number of pages in the document
numPages = pdfFileReader.getNumPages()

# Add every page of the source file to the writer
for index in range(0, numPages):
    pageObj = pdfFileReader.getPage(index)
    pdfFileWriter.addPage(pageObj)

# Append two blank pages (sized like the last page)
pdfFileWriter.addBlankPage()
pdfFileWriter.addBlankPage()

# Once all pages have been added, save everything to the file in one go
with open(outFile, 'wb') as f:
    pdfFileWriter.write(f)

 
