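How do I scrape a second-level URL list?
Here's the situation: with the usual scraping method, the list page http://www.*.com/web/exhi/exhi_search.aspx?Industry=9&Country=0&Start=2012-05-01T2012-05-31&page=1 gives me the following links: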
http://www.*.com/pages/exhi/201112/50977/index.shtml
http://www.*.com/pages/exhi/201202/53407/index.shtml
http://www.*.com/pages/exhi/201106/42875/index.shtml
http://www.*.com/pages/exhi/201108/46286/index.shtml
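I'll take just one of these links as an example:
e.g. http://www.*.com/pages/exhi/201112/50977/index.shtml
Normally this would already be the content-page link, but this site is unusual: the link from the list only leads to a summary page, not the real content page. You have to click "查看详情" (View details) to reach the actual content page: http://www.*.com/pages/exhi/201112/50977/exhi_detail_gaikuang.shtml
Has anyone successfully scraped a site like this? The URLs do follow a pattern: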
http://www.*.com/pages/exhi/201112/50977/index.shtml
http://www.*.com/pages/exhi/201112/50977/exhi_detail_gaikuang.shtml
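The summary page and the real content page differ only in the file name, index.shtml versus exhi_detail_gaikuang.shtml, and every entry in the list works this way. The only parts that change are the numeric segments in the middle of the path.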
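Since the two URLs differ only in the file name, one possible approach is to skip the summary page entirely and rewrite each list link into its detail-page URL before fetching. Below is a minimal Python sketch of that idea, not a tested scraper for this site. The function name `to_detail_url` and the regular expression are my own assumptions, inferred only from the example URLs above, and the host is left masked as `*` exactly as in the post.

```python
import re

# Assumed pattern, inferred from the example URLs in this post:
#   .../pages/exhi/<digits>/<digits>/index.shtml
SUMMARY_RE = re.compile(r"(/pages/exhi/\d+/\d+/)index\.shtml$")


def to_detail_url(summary_url):
    """Rewrite a summary-page URL into the detail-page URL.

    Returns None when the URL does not match the assumed pattern,
    so unexpected links can be skipped or logged by the caller.
    """
    new_url, count = SUMMARY_RE.subn(
        r"\g<1>exhi_detail_gaikuang.shtml", summary_url
    )
    return new_url if count else None


# Links collected from the list page (host masked as in the post).
summary_urls = [
    "http://www.*.com/pages/exhi/201112/50977/index.shtml",
    "http://www.*.com/pages/exhi/201202/53407/index.shtml",
    "http://www.*.com/pages/exhi/201106/42875/index.shtml",
    "http://www.*.com/pages/exhi/201108/46286/index.shtml",
]

# Keep only the links that matched the pattern.
detail_urls = [u for u in map(to_detail_url, summary_urls) if u]

for url in detail_urls:
    print(url)
```

If the scraping tool you use supports a find-and-replace rule on collected list URLs, the same substitution (index.shtml to exhi_detail_gaikuang.shtml) should achieve the same result without any code.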