Problem
Overview
Uploading a file with python requests fails with the following error:
OverflowError: string longer than 2147483647 bytes
Details
Problematic code
import requests

# PUBLISH_URL is the upload endpoint, defined elsewhere.
data = {}
with open("bigfile", "rb") as f:
    r = requests.post(PUBLISH_URL, data=data, files={"xxx": f})
Traceback
Traceback (most recent call last):
  File "test.py", line 52, in <module>
    main()
  File "test.py", line 49, in main
    publish()
  File "test.py", line 41, in publish
    r = requests.post(PUBLISH_URL, data=cfg, files={file_key: ("./test.apk", f)})
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 116, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py", line 354, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python2.7/httplib.py", line 1057, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1097, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 1053, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 897, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 873, in send
    self.sock.sendall(data)
  File "/usr/lib/python2.7/ssl.py", line 743, in sendall
    v = self.send(data[count:])
  File "/usr/lib/python2.7/ssl.py", line 709, in send
    v = self._sslobj.write(data)
OverflowError: string longer than 2147483647 bytes
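The number in the error message, 2147483647, is 2**31 - 1, the largest signed 32-bit integer. Below is a minimal pre-flight check, assuming (as the traceback suggests) that the limit applies to a single SSL write of the whole request body. The names `MAX_SINGLE_WRITE` and `exceeds_single_write_limit` are hypothetical helpers, not part of requests.

```python
import os

# Largest signed 32-bit integer; the value quoted in the OverflowError message.
MAX_SINGLE_WRITE = 2**31 - 1


def exceeds_single_write_limit(path, limit=MAX_SINGLE_WRITE):
    """Return True if the file at `path` is larger than `limit` bytes.

    This only compares the file size. A multipart body is slightly larger
    than the file itself (boundaries and part headers), so treat the result
    as an approximation near the limit.
    """
    return os.path.getsize(path) > limit
```

Calling `exceeds_single_write_limit("bigfile")` before the upload gives an early hint that the plain `files=` approach is likely to hit this error.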
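Analysis
requests reads the entire file object into memory
I did not expect the requests implementation to be this blunt: it simply calls file.read(). That matches what happens in practice, where memory usage climbs rapidly while a large file is being sent. The relevant code: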
requests/models.py
    @staticmethod
    def _encode_files(files, data):
        """Build the body for a multipart/form-data request.
        Will successfully encode files when passed as a dict or a list of
        tuples. Order is retained if data is a list of tuples but arbitrary
        if parameters are supplied as a dict.
        The tuples may be 2-tuples (filename, fileobj), 3-tuples (filename, fileobj, contentype)
        or 4-tuples (filename, fileobj, contentype, custom_headers).
        """
        if (not files):
            raise ValueError("Files must be provided.")
        elif isinstance(data, basestring):
            raise ValueError("Data must not be a string.")
        new_fields = []
        fields = to_key_val_list(data or {})
        files = to_key_val_list(files or {})
        for field, val in fields:
            if isinstance(val, basestring) or not hasattr(val, '__iter__'):
                val = [val]
            for v in val:
                if v is not None:
                    # Don't call str() on bytestrings: in Py3 it all goes wrong.
                    if not isinstance(v, bytes):
                        v = str(v)
                    new_fields.append((field.decode('utf-8') if isinstance(field, bytes) else field,
                                       v.encode('utf-8') if isinstance(v, str) else v))
        for (k, v) in files:
            # support for explicit filename
            ft = None
            fh = None
            if isinstance(v, (tuple, list)):
                if len(v) == 2:
                    fn, fp = v
                elif len(v) == 3:
                    fn, fp, ft = v
                else:
                    fn, fp, ft, fh = v
            else:
                fn = guess_filename(v) or k
                fp = v
            if isinstance(fp, (str, bytes, bytearray)):
                fdata = fp
            elif hasattr(fp, 'read'):
                fdata = fp.read()  # the whole file is read into memory here
            elif fp is None:
                continue
            else:
                fdata = fp
            rf = RequestField(name=k, data=fdata, filename=fn, headers=fh)
            rf.make_multipart(content_type=ft)
            new_fields.append(rf)
        body, content_type = encode_multipart_formdata(new_fields)
        return body, content_type
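The behaviour described above can be observed without sending anything over the network. The following is a small sketch, assuming requests is installed: it prepares a multipart request from an in-memory file and inspects the result. `build_multipart_body` is a hypothetical helper name and `http://example.invalid/upload` is a placeholder URL.

```python
import io

import requests


def build_multipart_body(payload):
    """Prepare (but do not send) a multipart upload of `payload`.

    Returns the prepared request body and the file position after
    preparation, which shows how much of the file was consumed.
    """
    fileobj = io.BytesIO(payload)
    # Placeholder URL: prepare() only builds the request, it opens no connection.
    req = requests.Request(
        "POST",
        "http://example.invalid/upload",
        files={"xxx": ("bigfile", fileobj)},
    )
    prepared = req.prepare()
    return prepared.body, fileobj.tell()


body, position = build_multipart_body(b"x" * 1024)
print(type(body), len(body), position)
```

The body comes back as a single bytes object that contains the whole payload, and the file position sits at the end of the file, so the complete content was read during preparation rather than streamed during sending.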
The official documentation recommends requests-toolbelt
https://2.python-requests.org…
In the event you are posting a very large file as a multipart/form-data request, you may want to stream the request. By default, requests does not support this, but there is a separate package which does – requests-toolbelt. You should read the toolbelt’s documentation for more details about how to use it.
Using requests-toolbelt
import requests
from requests_toolbelt import MultipartEncoder

data = {}
with open("bigfile", "rb") as f:
    data["xxx"] = ("filename", f)
    # The encoder reads from f lazily, so the request must be sent while f is still open.
    m = MultipartEncoder(fields=data)
    r = requests.post(PUBLISH_URL, data=m, headers={'Content-Type': m.content_type})
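Summary
The way requests sends files is very blunt: it reads the entire file into memory and then hands the whole body to the SSL layer in a single write. As the traceback shows, that write fails once the body is longer than 2147483647 bytes (about 2 GB). For uploading large files, the official documentation recommends requests-toolbelt.
Uploading the file in chunks is of course another option (if the server supports it).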
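As a rough illustration of the chunked option, the sketch below only covers the client-side splitting: it plans (offset, length) pairs and reads one chunk at a time, so at most one chunk is held in memory. How each chunk is actually sent (endpoint, headers, final merge step) depends entirely on the server and is not shown. `plan_chunks` and `read_chunk` are hypothetical helper names.

```python
import io


def plan_chunks(total_size, chunk_size):
    """Yield (offset, length) pairs that cover total_size bytes in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while offset < total_size:
        length = min(chunk_size, total_size - offset)
        yield offset, length
        offset += length


def read_chunk(fileobj, offset, length):
    """Read one planned chunk from a seekable binary file object."""
    fileobj.seek(offset)
    return fileobj.read(length)


# Small in-memory stand-in for a large file opened in "rb" mode.
source = io.BytesIO(b"0123456789")
for offset, length in plan_chunks(10, 4):
    chunk = read_chunk(source, offset, length)
    # A real client would upload `chunk` here using the server's own protocol.
    print(offset, length, chunk)
```

Each chunk stays far below the 2147483647-byte limit as long as `chunk_size` does, which is the point of splitting the upload.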