How does Python handle HTTP POST request parameters?



import httplib, urllib
from urlparse import urlparse

def httppost(url, **kwgs):
    httpClient = None
    conn = urlparse(url)
    try:
        # Encode the keyword arguments as a form-encoded request body.
        params = urllib.urlencode(dict(kwgs))
        header = {"Content-type": "application/x-www-form-urlencoded",
                  "Accept": "text/plain"}
        # conn.netloc already carries any port, so don't pass conn.port separately.
        httpClient = httplib.HTTPConnection(conn.netloc, timeout=30)
        httpClient.request("POST", conn.path, params, header)
        response = httpClient.getresponse()
        print response.status
        print response.reason
        print response.read()
        print response.getheaders()
    except Exception, e:
        print e
    finally:
        if httpClient:
            httpClient.close()
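The helper above is Python 2 (httplib, urllib, and urlparse were reorganized in Python 3). A minimal Python 3 port of the same helper, assuming the target URL accepts form-encoded POSTs over plain HTTP, might look like:

```python
import http.client
import urllib.parse

def httppost(url, **kwargs):
    """POST the keyword arguments as a form-encoded body and print the reply."""
    conn_info = urllib.parse.urlparse(url)
    params = urllib.parse.urlencode(kwargs)
    headers = {"Content-type": "application/x-www-form-urlencoded",
               "Accept": "text/plain"}
    client = http.client.HTTPConnection(conn_info.netloc, timeout=30)
    try:
        client.request("POST", conn_info.path or "/", params, headers)
        response = client.getresponse()
        print(response.status, response.reason)
        print(response.read())
    finally:
        client.close()
```

For HTTPS URLs, swap in `http.client.HTTPSConnection`.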

On the surface, a 503 error during a Python crawl looks like a server-side problem, but the real cause is usually the script itself: it reads pages far faster than any human would, so the server recognizes automated access and returns 503 to conserve resources. The fix is simply to slow down. For example, add time.sleep(10) after reading one record or a few records, and the 503 errors largely disappear. In practice I call time.sleep(1) or time.sleep(3) after every read; the exact value depends on the site.
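That throttling pattern can be sketched as a small loop; `fetch_record` here is a hypothetical callable standing in for whatever function performs one request:

```python
import time

def fetch_all(urls, fetch_record, delay=1.0):
    """Fetch each URL in turn, pausing between requests so the server
    does not mistake the script for an abusive client and return 503."""
    results = []
    for url in urls:
        results.append(fetch_record(url))
        time.sleep(delay)  # tune per site: 1-10 seconds is typical
    return results
```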

The http.client package implements these operations; see the official Python documentation for details. Below is the POST example taken from it:

>>> import http.client, urllib.parse

>>> params = urllib.parse.urlencode({'@number': 12524, '@type': 'issue', '@action': 'show'})

>>> headers = {"Content-type": "application/x-www-form-urlencoded",

...            "Accept": "text/plain"}

>>> conn = http.client.HTTPConnection("bugs.python.org")

>>> conn.request("POST", "", params, headers)

>>> response = conn.getresponse()

>>> print(response.status, response.reason)

302 Found

>>> data = response.read()

>>> data

b'Redirecting to <a href="http://bugs.python.org/issue12524">http://bugs.python.org/issue12524</a>'

>>> conn.close()
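Note the 302 in the transcript: http.client does not follow redirects for you. If you only need the final page, the higher-level urllib.request follows them automatically, and passing data= switches the request method to POST. A short sketch against the same bugs.python.org form (the network call is left commented out):

```python
import urllib.parse
import urllib.request

# The body must be bytes, so encode the form-encoded string.
params = urllib.parse.urlencode(
    {"@number": 12524, "@type": "issue", "@action": "show"}
).encode("ascii")

# Supplying data= makes this a POST request.
req = urllib.request.Request("http://bugs.python.org", data=params)

# urlopen(req) would send it and follow the 302 automatically;
# uncomment to run against the live server:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(resp.status, resp.read()[:100])
```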