17173 news pagination scraping plugin
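Last edited by zhouchanglin on 2010-11-26 18:03
Usage: extract the archive, then apply the PHP plugin when collecting content; the relevant setting is in step 4. For the pagination setup, see the settings in the collection rule.
Tested and working.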
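The pagination setup is in the collection rule. The plugin appends pagination URLs to the page source in a form that LocoySpider (火车头) can recognize. After processing, the pagination code sits at the end of the source, so LocoySpider can pick up the remaining pages.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">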
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
<title>韩国科幻MMORPG《BERKANIX》首测(图)_网络游戏新闻_17173.com全球游戏门户第一站</title>
<meta name="keywords" content="
<link rel="stylesheet" href="http://ue1.17173.itc.cn/spp/u2/spp_style.css" type="text/css" />
<script src="http://ue1.17173.itc.cn/spp/spp_core.js" type="text/javascript"></script>
<script src="http://ue1.17173.itc.cn/spp/u2/spp_ui.js" type="text/javascript"></script>
<li
<!-- END 17173 Site Census -->
<script type="text/javascript" src="http://js.sohu.com/mail/pv/pv.js"></script>
<script type="text/javascript">
var copybq="17173.com门户站(www.17173.com)"
document.body.oncopy = function () {
setTimeout( function () {
var text = clipboardData.getData("text");
if (text) {
if(copybq){
text = text + "\r\n本文来自:" + copybq + "详细出处参考:"+location.href;
}else{
text = text + "\r\n本文来自: 17173.com网络游戏第一门户站(www.17173.com) 详细出处参考:"+location.href;
}
clipboardData.setData("text", text);
}
}, 100 )
}
</script></body>
</html>
Below is the code with the pagination information that the plugin appends after processing:
分页开始<A href=http://news.17173.com/content/2010-11-13/20101113083759885,2.shtml target=_blank></A><A href=http://news.17173.com/content/2010-11-13/20101113083759885,3.shtml target=_blank></A><A href=http://news.17173.com/content/2010-11-13/20101113083759885,4.shtml target=_blank></A><A href=http://news.17173.com/content/2010-11-13/20101113083759885,5.shtml target=_blank></A><A href=http://news.17173.com/content/2010-11-13/20101113083759885,6.shtml target=_blank></A><A href=http://news.17173.com/content/2010-11-13/20101113083759885,7.shtml target=_blank></A><A href=http://news.17173.com/content/2010-11-13/20101113083759885,8.shtml target=_blank></A><A href=http://news.17173.com/content/2010-11-13/20101113083759885,9.shtml target=_blank></A><A href=http://news.17173.com/content/2010-11-13/20101113083759885,10.shtml target=_blank></A><A href=http://news.17173.com/content/2010-11-13/20101113083759885,11.shtml target=_blank></A><A href=http://news.17173.com/content/2010-11-13/20101113083759885,12.shtml target=_blank></A><A href=http://news.17173.com/content/2010-11-13/20101113083759885,13.shtml target=_blank></A><A href=http://news.17173.com/content/2010-11-13/20101113083759885,14.shtml target=_blank></A><A href=http://news.17173.com/content/2010-11-13/20101113083759885,15.shtml target=_blank></A><A href=http://news.17173.com/content/2010-11-13/20101113083759885,16.shtml target=_blank></A>分页结束
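For anyone unsure what the plugin is doing to the source, here is a minimal sketch of the same idea. It is written in Python for illustration and is not the PHP plugin from the archive. The function names, the `url_stem` parameter, and passing the last page number in by hand are all assumptions; the post does not say how the real plugin works out the page count. Only the two markers and the link format are taken from the sample output above.

```python
import re
from typing import List

START_MARK = "分页开始"  # literal start marker seen in the sample output
END_MARK = "分页结束"    # literal end marker seen in the sample output


def build_pagination_block(url_stem: str, last_page: int) -> str:
    """Build the marker-delimited block of empty anchors for pages 2..last_page."""
    links = "".join(
        f"<A href={url_stem},{page}.shtml target=_blank></A>"
        for page in range(2, last_page + 1)
    )
    return f"{START_MARK}{links}{END_MARK}"


def append_pagination(source: str, url_stem: str, last_page: int) -> str:
    """Append the pagination block to the end of the page source."""
    return source + "\n" + build_pagination_block(url_stem, last_page)


def extract_page_urls(processed: str) -> List[str]:
    """Read the page URLs back out from between the two markers."""
    match = re.search(
        re.escape(START_MARK) + r"(.*?)" + re.escape(END_MARK),
        processed,
        re.S,
    )
    if match is None:
        return []
    return re.findall(r"<A href=(\S+) target=_blank></A>", match.group(1))


if __name__ == "__main__":
    # URL stem copied from the sample links above; 16 is the last page shown there.
    stem = "http://news.17173.com/content/2010-11-13/20101113083759885"
    processed = append_pagination("<html>...</html>", stem, 16)
    for url in extract_page_urls(processed):
        print(url)
```

The point of the two fixed markers is that the collection rule only has to look between 分页开始 and 分页结束 for page links, instead of parsing whatever pager markup the site itself uses.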
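Good stuff. Thanks, OP. I'll study it.
Scraping 17173 really is hard!
I couldn't figure out how to use it.
Where does this plugin go? Could you explain exactly how to use it?
I'll give it a try and see how well it works.
I don't know where 17173分页插件.rar is supposed to be used.
Thanks for sharing.
Where can I find a tutorial on paginated scraping? Please share one, thanks.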