Fixing the "Cookie rejected" warning in a Sina Weibo crawler

I recently wrote a Sina Weibo crawler using httpclient-4.3.3. The program runs fine, but it keeps emitting "Cookie rejected" warnings. The log looks like this:

2014-06-05 10:27:17.417 [main] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.ef542aa2.538fd58b.ec8a8e2c", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:23 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
2014-06-05 10:27:17.422 [main] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.ef632aa2.538fd58b.c6dd669e", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
登录成功,昵称:佩佩菜_52350
2014-06-05 10:27:20.019 [main] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.75d37a79.538fd58d.077976a4", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:25 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
2014-06-05 10:27:20.019 [main] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.75e37a79.538fd58d.575a338c", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
登录成功,昵称:通吃一条街呵呵
2014-06-05 10:27:29.119 [main] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.9fcc12df.538fd597.fcf0e3af", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:35 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
2014-06-05 10:27:29.120 [main] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.9fd812df.538fd597.e804e263", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
登录成功,昵称:dxedf
log4j:WARN No appenders could be found for logger (com.mchange.v2.log.MLog).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See #noconfig for more info.
2014-06-05 10:27:30.247 [Thread-0] INFO com.eurlanda.spider.global.Global - 读取系统配置:D:\Workspaces\eurlanda\DAP_EurlandaSpider\WebRoot\WEB-INF\classes\config.properties
2014-06-05 10:27:30.247 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.weibo.dely=12
2014-06-05 10:27:30.247 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.task.saveDely=1
2014-06-05 10:27:30.247 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.task.dely=168
2014-06-05 10:27:30.247 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.core.socket.retryCount=3
2014-06-05 10:27:30.247 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.work_thread_num=10
2014-06-05 10:27:30.248 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.core.socket.readTimeout=5
2014-06-05 10:27:30.248 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.core.socket.serverPort=7077
2014-06-05 10:27:30.248 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.core.socket.connectTimeout=5
2014-06-05 10:27:30.248 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.work.schedule=* * 18-9 ? * 1-5|* * * ? * 1,7|* * * * * ?
2014-06-05 10:27:30.254 [Thread-0] INFO c.e.s.c.sina_weibo.SinaWeiBoCrawler - ----------- 抓取日期2010-02-23 00:00:00的数据-----------
2014-06-05 10:27:30.869 [18721437752] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.ae2f61ad.538fd599.2711e9ab", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:37 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:30.870 [18721437752] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.ae3b61ad.538fd599.cec3bfae", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:30.881 [zjweii@qq.com] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.18d93dd.538fd599.add86b40", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:37 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:30.882 [zjweii@qq.com] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.18ee3dd.538fd599.d7522db2", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:31.089 [18721437752] INFO c.e.s.c.sina_weibo.SinaWeiBoClient - 搜索无结果。
2014-06-05 10:27:31.280 [pbz201402@126.com] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.39486d50.538fd599.66e98262", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:37 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:31.280 [pbz201402@126.com] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.395a6d50.538fd599.84218ee8", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"

Today I finally could not stand it any longer. I searched online and found plenty of material, but most of it was outdated or written for older HttpClient versions. After trying and sorting through various approaches, I found a fix. The warning comes from the cookie policy (the cookie spec): the server sets cookies for the domain .sina.com.cn while the request went to a weibo.com host, so the default validation rejects them. Overriding the default policy's validation makes the warning go away.

// Classes used here come from the org.apache.http.* packages of HttpClient 4.3.x.
// A cookie spec whose validate() accepts every cookie.
CookieSpecProvider easySpecProvider = new CookieSpecProvider() {
    public CookieSpec create(HttpContext context) {
        return new BrowserCompatSpec() {
            @Override
            public void validate(Cookie cookie, CookieOrigin origin)
                    throws MalformedCookieException {
                // Oh, I am easy
            }
        };
    }
};

// Register the lenient spec under the name "mySpec" next to the standard ones.
Registry<CookieSpecProvider> reg = RegistryBuilder.<CookieSpecProvider>create()
        .register(CookieSpecs.BEST_MATCH, new BestMatchSpecFactory())
        .register(CookieSpecs.BROWSER_COMPATIBILITY, new BrowserCompatSpecFactory())
        .register("mySpec", easySpecProvider)
        .build();

// Make "mySpec" the cookie spec for requests sent by this client.
RequestConfig requestConfig = RequestConfig.custom()
        .setCookieSpec("mySpec")
        .build();

CloseableHttpClient httpclient = HttpClients.custom()
        .setDefaultCookieSpecRegistry(reg)
        .setDefaultRequestConfig(requestConfig)
        .build();
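
To see why the default validation complains in the first place, here is a minimal sketch of a domain-match check written with the JDK only. It is a simplified illustration, not HttpClient's actual implementation; the class name DomainMatchSketch and the method domainMatches are made up for this example. The idea is that a cookie is acceptable only if the host that sent it equals the cookie's domain or is a subdomain of it, which is not the case for .sina.com.cn and passport.weibo.com.

```java
import java.util.Locale;

public class DomainMatchSketch {

    // Simplified rule (an assumption for illustration): the origin host must
    // equal the cookie domain, or end with "." + cookie domain.
    public static boolean domainMatches(String cookieDomain, String originHost) {
        String domain = cookieDomain.toLowerCase(Locale.ROOT);
        String host = originHost.toLowerCase(Locale.ROOT);
        // A leading dot such as ".sina.com.cn" is dropped before comparing.
        if (domain.startsWith(".")) {
            domain = domain.substring(1);
        }
        return host.equals(domain) || host.endsWith("." + domain);
    }

    public static void main(String[] args) {
        // The combination from the log: cookie domain and origin host differ.
        System.out.println(domainMatches(".sina.com.cn", "passport.weibo.com"));
        // A cookie scoped to the origin's own parent domain.
        System.out.println(domainMatches(".weibo.com", "passport.weibo.com"));
    }
}
```

With an empty validate() as in the snippet above, a check of this kind is skipped, so the cookies are accepted and the warning is no longer logged. The trade-off is that this client then accepts cookies for any domain a server names.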
