Enter your username and password, open the developer tools, check Preserve Log on the Network tab, click Log in, and inspect the session request to find its URL, Form Data, and Headers. At this point every parameter needed to simulate the login has been captured, except the cookies and the authenticity_token, which cannot be read off directly.
The browser sends this information to the server when we click Log in on the login page, so it must have been set up on the login page itself. Sure enough, searching the login page's source for authenticity_token turns up its value. The response headers also contain a Set-Cookie field, which is where the cookies get set. A code example is given below.
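Pulling a hidden token out of the page source is a one-line regex, as sketched below. The HTML fragment here is a made-up stand-in for the real login page; on Zhihu the hidden field is named _xsrf, which the full script below extracts the same way:

```python
import re

# Made-up fragment standing in for the login page source;
# the real page embeds the token in a hidden <input> like this.
html = '<form><input type="hidden" name="_xsrf" value="abc123def"/></form>'

match = re.search(r'<input type="hidden" name="_xsrf" value="(.*?)"/>', html)
token = match.group(1)
print(token)  # abc123def
```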
I got stuck on three things:
1. For convenience I downloaded the captcha image with urllib.urlretrieve, but the cookie jar was bound to urllib2, so the cookie was lost.
2. Logging in to Zhihu requires submitting to the login/email endpoint twice; the first submission fails even with a correct captcha. Very annoying.
3. Even after a successful login, fetching the home page still reported me as not logged in. It turns out the GET for the home page also needs the headers set, otherwise it keeps saying you are not logged in.
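The first pitfall comes from mixing the two modules: urllib.urlretrieve opens its own connection and never sees the cookie jar installed on urllib2, so the download is cookie-less. The fix is to route the image download through the same cookie-bound opener. A minimal sketch of that idea, written here in Python 3's urllib.request (where urllib and urllib2 were merged):

```python
import http.cookiejar
import urllib.request

# One cookie jar shared by every request made through this opener
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
urllib.request.install_opener(opener)

def fetch_image(url, path):
    # urlopen now goes through the installed cookie-aware opener,
    # so the session cookies are sent with the image request too
    data = urllib.request.urlopen(url).read()
    with open(path, 'wb') as f:
        f.write(data)

# The installed opener really does carry the cookie processor:
assert any(isinstance(h, urllib.request.HTTPCookieProcessor)
           for h in opener.handlers)
```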
The corrected code:
#!/usr/bin/python
import urllib
import urllib2
import cookielib
import re
import time
hosturl = 'http://www.zhihu.com'
posturl = 'http://www.zhihu.com/login/email'
captcha_pre = 'http://www.zhihu.com/captcha.gif?r='
#set cookie: bind a CookieJar to urllib2 so every request shares the session
cj = cookielib.CookieJar()
cookie_support = urllib2.HTTPCookieProcessor(cj)
opener = urllib2.build_opener(cookie_support, urllib2.HTTPHandler)
urllib2.install_opener(opener)
#get xsrf
h = urllib2.urlopen(hosturl)
html = h.read()
xsrf_str = r'<input type="hidden" name="_xsrf" value="(.*?)"/>'
xsrf = re.findall(xsrf_str, html)[0]
print xsrf
#get captcha
def get_captcha():
    # Append a millisecond timestamp so a fresh captcha image is served
    captchaurl = captcha_pre + str(int(time.time() * 1000))
    print captchaurl
    # Download through urllib2 so the request carries the session cookies
    data = urllib2.urlopen(captchaurl).read()
    f = open('captcha.gif', "wb")
    f.write(data)
    f.close()
    captcha = raw_input('captcha is: ')
    return captcha
#post data
def post_data(captcha, xsrf):
    headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101 Firefox/14.0.1',
               'Referer' : 'http://www.zhihu.com'}
    postData = {'_xsrf' : xsrf,
                'password' : 'yyy',
                'captcha' : captcha,
                'email' : 'xxx',
                'remember_me' : 'true',
                }
    #request
    postData = urllib.urlencode(postData)
    print postData
    request = urllib2.Request(posturl, postData, headers)
    response = urllib2.urlopen(request)
    text = response.read()
    return text
#post it
captcha=get_captcha()
print captcha
text=post_data(captcha,xsrf)
print text
#post again: the first submission fails even with a correct captcha, so submit twice
captcha=get_captcha()
text=post_data(captcha,xsrf)
print text
#fetch the home page; the headers must be set here too, or it still reports not logged in
headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101 Firefox/14.0.1',
           'Referer' : 'http://www.zhihu.com'}
request = urllib2.Request(url='http://www.zhihu.com', headers=headers)
response = urllib2.urlopen(request)
print response.read()