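The time-interval setting doesn't seem to work; it is very easy to get blocked while collecting.
I am collecting from Sina's iAsk (爱问知识人) and was blocked after only 20 items, with the system set to single-threaded and a 2-second interval. http://iask.sina.com.cn/browse/get_class2.php?fatherid=222&status=K

[No.]: 166
[Returned message]: web online publishing error. Check the returned source code (shown only once; the same applies below):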
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" >
<html>
<head>
<title>̳ (վԴPHP̡PHP롢Ѷ̬...) - վ (ѽվԴѽ) - powered by Discuz!</title>
<meta http-equiv="Content-Type" c>
<meta name="keywords" c>
<meta name="description" c>
<meta name="generator" c>
<meta name="MSSmartTagsPreventParsing" c>
<meta http-equiv="MSThemeCompatible" c>
<style type="text/css"><!--
a { text-decoration: none; color: #003366 }
a:hover { text-decoration: underline }
body { scrollbar-base-color: #FAFAFA; scrollbar-arrow-color: #DDE3EC; font-size: 12px; background-color: #FFFFFF }
table { font: 12px Tahoma, Verdana; color: #000000 }
input,select,textarea { font: 11px Tahoma, Verdana; color: #000000; font-weight: normal; background-color: #FAFAFA }
form { margin: 0; padding: 0}
select { font: 11px Tahoma; color: #000000; font-weight: normal; background-color: #FAFAFA }
.nav { font: 12px Tahoma, Verdana; color: #000000; font-weight: bold }
.nav a { color: #000000 }
.header { font: 11px Tahoma, Verdana; color: #FFFFFF; font-weight: bold; background-color: #509090 }
.header a{ color: #FFFFFF }
.category{ font: 11px Tahoma; color: #000000; background-color: #FAFAFA }
.tableborder{ background: #D6E0EF; border: 1px solid #DDE3EC }
.singleborder{ font-size: 0px; line-height: 1px; padding: 0px; background-color: #FAFAFA }
.smalltxt{ font: 11px Tahoma }
.outertxt{ font: 12px Tahoma, Verdana; color: #000000 }
.outertxt a{ color: #000000 }
.bold { font-weight: bold }
.altbg1 { background: #FAFAFA }
.altbg2 { background: #FFFFFF }
.maintable{ width: 98%; background-color: #FFFFFF }
--></style><script language="JavaScript" src="include/common.js"></script>
</head>
<body leftmargin="0" rightmargin="0" topmargin="0" >
<table bgcolor="#FFFFFF" width="98%" cellpadding="0" cellspacing="0" border="0" align="center">
<tr>
<td width="100%">
<table border="0" cellspacing="0" cellpadding="0" width="98%" align="center" class="outertxt">
<tr>
<td rowspan="2" width="0"><img src="images/spacer.gif" width="0" height="0"></td>
<td rowspan="2" valign="top"><a href="index.php?PHPSESSID=20c2453a061f0ed6b7b25d89bb4aa85f"><img src="images/default/logo.jpg" alt="̳ (վԴPHP̡PHP롢Ѷ̬...)" border="0"></a></td><td height="80" align="right">
</td>
</tr>
<tr>
<td align="right" class="smalltxt"><span class="bold">»</span>
<span class="bold">342401799: </span> <a href="logging.php?action=logout&PHPSESSID=20c2453a061f0ed6b7b25d89bb4aa85f">˳</a>
| <a href="pm.php?PHPSESSID=20c2453a061f0ed6b7b25d89bb4aa85f" target="_blank">Ϣ</a>
|<a href="memcp.php?PHPSESSID=20c2453a061f0ed6b7b25d89bb4aa85f"></a>
| <a href="member.php?action=list&PHPSESSID=20c2453a061f0ed6b7b25d89bb4aa85f">Ա</a>
| <a href="search.php?PHPSESSID=20c2453a061f0ed6b7b25d89bb4aa85f"></a>
| <a href="faq.php?PHPSESSID=20c2453a061f0ed6b7b25d89bb4aa85f"></a>
</td><td rowspan="2" width="0"><img src="images/spacer.gif" width="0" height="0"></td>
</tr>
</table>
</td></tr></table>
<center>
<div class="maintable"><br><table cellspacing="0" cellpadding="0" border="0" width="98%" align="center" style="table-layout: fixed">
<tr><td class="nav" width="90%" align="left" nowrap> <a href="index.php?PHPSESSID=20c2453a061f0ed6b7b25d89bb4aa85f">̳ (վԴPHP̡PHP롢Ѷ̬...)</a> » ʾϢ</td>
<td align="right" width="10%"> <a href="#bottom"><img src="images/default/arrow_dw.gif" border="0" align="absmiddle"></a></td>
</tr></table><br>
<table cellspacing="1" cellpadding="4" width="98%" align="center" class="tableborder">
<tr class="header"><td>̳ (վԴPHP̡PHP롢Ѷ̬...) ʾϢ</td></tr>
<tr><td class="altbg2" align="center">
<table border="0" width="90%" cellspacing="0" cellpadding="0">
<tr><td align="center" class="smalltxt">
<br>֤<br><br>
</td></tr></table>
</td></tr></table>
<br><br></div><a name="bottom"></a>
<div class="maintable">
</div>
<div class="maintable">
<table cellspacing="2" cellpadding="0" align="center" style="font-size: 11px; font-family: Tahoma, Arial"><tr>
<a href="http://www.alipay.com" target="_blank"><img src="images/default/alipay.gif" border="0" align="absmiddle" alt="本论坛支付平台由支付宝提供
携手打造安全诚信的交易社区"></a> </td><td>
Powered by <a href="http://www.discuz.net" target="_blank"><b>Discuz!</b></a> <b style="color:#FF9900">4.1.0</b>
© 2001-2006 <a href="http://www.comsenz.com" target="_blank">Comsenz Inc.</a>
<br>Processed in 0.028544 second(s), 4 queries
</td></tr></table><br>
</div>
<div class="maintable">
<table cellspacing="0" cellpadding="1" width="100%" class="outertxt">
<tr><td>
<table cellspacing="0" cellpadding="4" width="100%" class="smalltxt">
<tr class="altbg1"><td>ʱΪ GMT+8, ʱ 2006-11-9 17:36</td>
<td align="right"><a href="member.php?action=clearcookies&PHPSESSID=20c2453a061f0ed6b7b25d89bb4aa85f" class="bold"> Cookies</a> - <a href="mailto: admin@qinvent.com" class="bold">ϵ</a> - <a href="http://www.qinvent.com/" target="_blank" class="bold">緢</a>
- <a href="archiver/?PHPSESSID=20c2453a061f0ed6b7b25d89bb4aa85f" target="_blank" class="bold">Archiver</a>
</td>
</tr>
<tr style="font-size: 0px; line-height: 0px; spacing: 0px; padding: 0px; background-color: #509090"><td colspan="3"> </td></tr></table>
</td></tr></table>
</div>
</center><br>
</body></html>
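[Collection URL]: http://edu.chinaz.com/Get/Club/Google/0510201552448058771.asp
[Collection time]: 2006-11-9 19:37:21
This always appears when I use 3.0 to publish online to a dz (Discuz!) forum.

With the collection depth set to 0, this dialog pops up. It really does need to be fixed.

I installed .NET 2.0 on my Win2000 machine and still cannot run 火车头 3.0. It works on XP, but I don't like using XP. Very frustrating.

Setting the interval multiplier has no effect. The URLs I collect do not increase in steps of 1: one site I collect from goes 25, 50, 75, ..., in steps of 25. But when I test the URLs in the program, they still increase one at a time.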
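To make the expected behaviour in the interval-multiplier report concrete, here is a minimal Python sketch of generating list-page URLs whose page parameter advances by 25 rather than by 1. The function name `paged_urls`, the `{page}` placeholder, and the `example.com` URL are all hypothetical and chosen only for illustration; this is not how 火车头 itself builds its URL list.

```python
def paged_urls(template, start, stop, step):
    """Fill `template` with start, start+step, ... up to and including stop."""
    # range() excludes its end value, so use stop + 1 to keep `stop` itself.
    return [template.format(page=n) for n in range(start, stop + 1, step)]


# Hypothetical list URL: the offset advances in steps of 25, as in the report.
urls = paged_urls("http://example.com/list.php?offset={page}", 25, 100, 25)
for url in urls:
    print(url)
```

With a step of 25 the list contains the offsets 25, 50, 75 and 100 only, which is the sequence the poster expected to see in the URL test.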
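Admin, could you compile the issues collected so far into a list in the first post? Then people could check whether their issue has already been reported before posting. Otherwise it gets cumbersome once the thread runs to many pages. Thanks.

Originally posted by hoverlee on 2006-11-7 10:29:
A small bug:
If the collected information is a URL and the URL contains &, the & is not displayed correctly after collection; it shows up as "%26".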
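For context, `%26` is the percent-encoded form of `&`, so the symptom looks like the collected URL was URL-encoded once and never decoded. As a possible workaround outside the tool, a post-processing step could restore the character. The sketch below is only an assumption about where the problem lies, and the helper name `restore_ampersands` is hypothetical.

```python
def restore_ampersands(collected: str) -> str:
    """Turn percent-encoded ampersands (%26) back into literal '&'."""
    # Only %26 is touched; any other escape sequences are left unchanged.
    return collected.replace("%26", "&")


# The iAsk URL from this thread, as it would appear with the reported symptom.
broken = "http://iask.sina.com.cn/browse/get_class2.php?fatherid=222%26status=K"
fixed = restore_ampersands(broken)
print(fixed)
```

Replacing only `%26` is deliberately narrow: a general decoder such as `urllib.parse.unquote` would also decode escapes that may be intentional in the URL.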
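I ran into this problem too.

Content collected into discuz5.0 has no line breaks; everything runs together in one block and looks ugly. It would be good if line breaks were preserved!

When collecting articles in plain English, all formatting tags are stripped automatically, e.g. < p>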