sz319 posted on 2009-8-3 14:48:22

A simple website that can block 火车头 scraping

http://www.pacszhanghong.cn/front2/Knowledge.aspx

This site is very simple, but 火车头 just cannot scrape it. Why is that?

懷念過去↑文 posted on 2009-8-3 15:32:05

Does it have anti-scraping protection?
No...

火车头 posted on 2009-8-3 15:34:42

http://www.pacszhanghong.cn/front2/Knowledge.aspx?pageindex=1
to
http://www.pacszhanghong.cn/front2/Knowledge.aspx?pageindex=10

Tested, no problems.
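For anyone reproducing the test, the page range above can be generated as follows. This is only a minimal sketch: the base URL and the 1 to 10 range come from the post, while the function name `page_urls` is made up for illustration and is not part of 火车头.

```python
# Base URL taken from the post above; pages 1..10 are the range that was tested.
BASE = "http://www.pacszhanghong.cn/front2/Knowledge.aspx"


def page_urls(first=1, last=10):
    """Return the list-page URLs for pageindex=first..last (inclusive)."""
    return [f"{BASE}?pageindex={i}" for i in range(first, last + 1)]


urls = page_urls()
print(urls[0])
print(urls[-1])
```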

懷念過去↑文 posted on 2009-8-3 15:35:46

Heh, 车头 is really quick..

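sz319 posted on 2009-8-3 16:04:03

That can't be right. I downloaded the latest version and tried many times, and it never works.
Fetching the source returns: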
/indentify.aspx

Response headers:
HTTP/1.1 302 Found
Content-Length:132
Cache-Control:private
Content-Type:text/html; charset=utf-8
Date:Mon, 03 Aug 2009 07:56:32 GMT
Location:/indentify.aspx
Set-Cookie:ASP.NET_SessionId=fl3rlbvbchzokz2jikbyty55; path=/; HttpOnly
Server:Microsoft-IIS/6.0
X-AspNet-Version:2.0.50727
X-Powered-By:ASP.NET

As shown in the screenshot:

I'll try an older version.
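The header dump in this post already says what is going on: the server answers with a redirect and hands out a session cookie. A rough sketch of reading those two values out of the dump is below. The helper name `parse_response_head` is hypothetical and the header text is copied from the post. It shows how to read the dump and is not how 火车头 works internally.

```python
# Response head copied from the post above.
RAW = """HTTP/1.1 302 Found
Content-Length:132
Cache-Control:private
Content-Type:text/html; charset=utf-8
Date:Mon, 03 Aug 2009 07:56:32 GMT
Location:/indentify.aspx
Set-Cookie:ASP.NET_SessionId=fl3rlbvbchzokz2jikbyty55; path=/; HttpOnly
Server:Microsoft-IIS/6.0
X-AspNet-Version:2.0.50727
X-Powered-By:ASP.NET
"""


def parse_response_head(raw):
    """Split a raw response head into (status_code, {lowercased-name: value})."""
    lines = [ln for ln in raw.strip().splitlines() if ln.strip()]
    _version, status, _reason = lines[0].split(" ", 2)
    headers = {}
    for ln in lines[1:]:
        # Split on the first colon only, so values such as dates stay intact.
        name, _, value = ln.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(status), headers


status, headers = parse_response_head(RAW)
# Only the name=value part of Set-Cookie is sent back on later requests.
session_cookie = headers["set-cookie"].split(";", 1)[0]
print(status, headers["location"], session_cookie)
```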

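懷念過去↑文 posted on 2009-8-3 18:22:24

Well, the fact that you can't scrape it doesn't prove that 火车头采集器 can't scrape this page. Keep studying.

sz319 posted on 2009-8-3 21:34:04

Can you tell me how you set it up? I've been at this for a long time and still can't get it to work.

sz319 posted on 2009-8-4 09:10:45

The rule test under the site can fetch the page,
but the task cannot.
Captured packets
Normal:
Received:
GET /front2/Knowledge.aspx?pageindex=1 HTTP/1.1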
Accept: */*
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)
Host: www.pacszhanghong.cn
Connection: Close

Sent:
HTTP/1.1 302 Found
Connection: close
Date: Tue, 04 Aug 2009 00:58:15 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 2.0.50727
Location: /indentify.aspx
Set-Cookie: ASP.NET_SessionId=zgpdvu45z1xfrr45d4axbz55; path=/; HttpOnly
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Length: 132

<html><head><title>Object moved</title></head><body>
<h2>Object moved to <a href="/indentify.aspx">here</a>.</h2>
</body></html>
Received:
GET /indentify.aspx HTTP/1.1
Accept: */*
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)
Host: www.pacszhanghong.cn
Cookie: ASP.NET_SessionId=zgpdvu45z1xfrr45d4axbz55
Connection: Close

Sent:
HTTP/1.1 302 Found
Connection: close
Date: Tue, 04 Aug 2009 00:58:15 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 2.0.50727
Location: /front2/Knowledge.aspx?pageindex=1
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Length: 690

<html><head><title>Object moved</title></head><body>
<h2>Object moved to <a href="/front2/Knowledge.aspx?pageindex=1">here</a>.</h2>
</body></html>


Abnormal:
GET /front2/Knowledge.aspx?pageindex=1 HTTP/1.1
Accept: */*
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)
Referer: http://www.pacszhanghong.cn/front2/Knowledge.aspx?pageindex=1
Host: www.pacszhanghong.cn
Connection: Close

Sent:
HTTP/1.1 302 Found
Connection: close
Date: Tue, 04 Aug 2009 00:56:19 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 2.0.50727
Location: /indentify.aspx
Set-Cookie: ASP.NET_SessionId=ve4s4l45m3e1u5555neschva; path=/; HttpOnly
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Length: 132

<html><head><title>Object moved</title></head><body>
<h2>Object moved to <a href="/indentify.aspx">here</a>.</h2>
</body></html>
Received:
GET /indentify.aspx HTTP/1.1
Accept: */*
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)
Referer: http://www.pacszhanghong.cn/front2/Knowledge.aspx?pageindex=1
Host: www.pacszhanghong.cn
Connection: Close

Sent:
HTTP/1.1 302 Found
Connection: close
Date: Tue, 04 Aug 2009 00:56:19 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 2.0.50727
Location: /front2/index.aspx
Set-Cookie: ASP.NET_SessionId=4wajusy1xp1zpxigmcfueyme; path=/; HttpOnly
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Length: 674

<html><head><title>Object moved</title></head><body>
<h2>Object moved to <a href="/front2/index.aspx">here</a>.</h2>
</body></html>

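The last packet is different, so the two places must be fetching the page in different ways.
I don't know how you set things up above, but it doesn't work here.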
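Comparing the two captures, the normal one sends `Cookie: ASP.NET_SessionId=...` back on the request to /indentify.aspx, while the abnormal one sends no Cookie header at all. The server then issues a fresh session and redirects to /front2/index.aspx. That may be the whole difference between the rule test and the task. The toy model below is only a guess at the server logic, reconstructed from the captures. Every name in it (`toy_server`, `follow`, the `sid` values) is hypothetical. Treating /front2/index.aspx as an open page is also an assumption, since the captures stop before that page is requested.

```python
import itertools

START = "/front2/Knowledge.aspx?pageindex=1"
CHECK = "/indentify.aspx"
HOME = "/front2/index.aspx"

_counter = itertools.count(1)
_sessions = {}  # session id -> {"want": original path, "verified": bool}


def _new_session(want):
    """Create a session that remembers which page the visitor asked for."""
    sid = f"sid{next(_counter)}"
    _sessions[sid] = {"want": want, "verified": False}
    return sid


def toy_server(path, sid):
    """Assumed server logic: return (status, location, set_cookie)."""
    session = _sessions.get(sid)
    if path == CHECK:
        if session is None:
            # No cookie came back, so the original target is unknown.
            return 302, HOME, _new_session(HOME)
        session["verified"] = True
        return 302, session["want"], None
    if path == HOME or (session and session["verified"]):
        return 200, None, None
    return 302, CHECK, _new_session(path)


def follow(start, keep_cookies, max_hops=5):
    """Follow redirects and return the list of paths that were requested."""
    path, sid, trace = start, None, []
    for _ in range(max_hops):
        trace.append(path)
        status, location, set_cookie = toy_server(path, sid)
        if keep_cookies and set_cookie:
            sid = set_cookie  # send the session cookie on the next request
        if status != 302:
            break
        path = location
    return trace


print(follow(START, keep_cookies=True))   # ends back on the Knowledge page
print(follow(START, keep_cookies=False))  # ends on the index page
```

With a real HTTP client, the equivalent is to follow redirects with a cookie jar attached. In Python's standard library that would be `urllib.request.build_opener(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))`. Whether the task mode in 火车头 has a matching setting is not something the captures can answer.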

sz319 posted on 2009-8-4 10:06:02

网站万能信息采集器 can fetch it normally; unfortunately it is paid software.