火车采集器 Official Software Discussion Forum

Views: 5027 | Replies: 4

Many characters are scraped incorrectly from English sites, leaving articles incomplete

Posted on 2007-3-17 09:07:03
For example, the original text looks like this:

Remember that old television series “The Twilight Zone”? It featured sci-fi fantasies that metaphorically demonstrated people’s hopes, fears and despairs. The actors would be cruising along enjoying a somewhat normal life and then suddenly they entered “a new dimension” where everything was confusing and intimidating.

After scraping with the "default" encoding, it looks like this:

Remember that old television series 揟he Twilight Zone? It featured sci-fi
fantasies that metaphorically demonstrated people抯 hopes, fears and despairs.
The actors would be cruising along enjoying a somewhat normal life and then
suddenly they entered 揳 new dimension?where everything was confusing and
intimidating.

After scraping with the UTF-8 encoding, it looks like this:

Remember that old television series The Twilight Zone? It featured sci-fi
fantasies that metaphorically demonstrated peoples hopes, fears and despairs.
The actors would be cruising along enjoying a somewhat normal life and then
suddenly they entered a new dimension where everything was confusing and
intimidating.

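Several kinds of punctuation in the text are not recognized correctly. Common symbols such as  '   &   " "  either cut the article off or turn into garbled characters whenever they appear in it.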

Could someone experienced tell me how to fix this?
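
Editorial note: the two samples above are consistent with the page bytes being Windows-1252, where the curly quotes “ ” ’ are the single bytes 0x93, 0x94 and 0x92, and with the tool's "default" encoding being GBK. Both points are assumptions, since the thread never says what "default" means. A minimal Python sketch under those assumptions:

```python
# Assumption: the page is served as Windows-1252 bytes and the scraper's
# "default" encoding is GBK (neither is confirmed in the thread).
original = "series “The Twilight Zone”? It featured people’s hopes"
raw = original.encode("cp1252")  # “ ” ’ become the single bytes 0x93 0x94 0x92

# Decoded as GBK, 0x93 or 0x92 pairs with the following letter to form one hanzi.
as_gbk = raw.decode("gbk", errors="replace")

# Decoded as UTF-8, the same bytes are invalid; ignoring errors drops them.
as_utf8 = raw.decode("utf-8", errors="ignore")

print(as_gbk)
print(as_utf8)
```

Under this reading, the quote marks are never lost in transit. They are only misread by the decoder, which is why neither of the two available encoding choices gives the right result.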
Posted on 2007-3-17 14:22:12
How about trying the replace feature?
This is a real problem. I hope 火车 can fix it.
Original poster | Posted on 2007-3-17 19:38:48
Originally posted by netdream on 2007-3-17 14:22
How about trying the replace feature?
This is a real problem. I hope 火车 can fix it.


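Replacement cannot fully solve this problem. For example,
the correct text is  series “The  but after scraping it becomes  series 揟he . The two characters  “T  have turned into the single garbled character  揟 , and there is no pattern to it!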


I hope this can be fixed, otherwise scraping English text is really frustrating.
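
Editorial note: the substitution may be more regular than it looks. If the garbling comes from Windows-1252 bytes being read as GBK (an assumption, not something the thread confirms), each quote byte merges with the character after it into one hanzi, and already-scraped text can often be repaired by reversing the two steps. A rough sketch, where `repair_gbk_mojibake` is a name made up for illustration:

```python
def repair_gbk_mojibake(garbled: str) -> str:
    """Reverse a Windows-1252-read-as-GBK mix-up (hypothetical helper).

    Assumes the text was produced by decoding Windows-1252 bytes as GBK.
    Text that does not fit that assumption is returned unchanged.
    """
    try:
        return garbled.encode("gbk").decode("cp1252")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return garbled


print(repair_gbk_mojibake("series 揟he"))
print(repair_gbk_mojibake("people抯 hopes"))
```

This cannot restore places where the decoder already emitted a plain ? (as in  dimension?where ), because the original byte is gone at that point.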
Posted on 2007-6-20 18:33:26
It would be best to add a custom encoding option... I think some other scraper has this feature. 小蜜蜂, maybe?
Posted on 2007-8-27 15:53:25
Version 3.1 still has this problem. It happens whenever the scraped page declares charset=iso-8859-1. It would help if a few more encoding choices were added.

[ Last edited by ocpsys on 2007-8-27 15:55 ]
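
Editorial note: for pages that declare charset=iso-8859-1, one possible workaround outside the tool is to decode the raw bytes yourself. Browsers following the WHATWG Encoding Standard treat that label as Windows-1252, which is the encoding that actually contains the curly quotes. A minimal sketch, with `decode_page` and `WINDOWS_1252_ALIASES` being hypothetical names for illustration:

```python
# Labels that browsers treat as Windows-1252 (a small subset, for illustration).
WINDOWS_1252_ALIASES = {"iso-8859-1", "latin1", "us-ascii", "windows-1252"}


def decode_page(raw: bytes, declared_charset: str) -> str:
    """Decode page bytes using the declared charset (hypothetical helper)."""
    label = declared_charset.strip().lower()
    codec = "cp1252" if label in WINDOWS_1252_ALIASES else label
    # errors="replace" keeps the rest of the article even if a byte is invalid.
    return raw.decode(codec, errors="replace")


print(decode_page(b"series \x93The Twilight Zone\x94", "iso-8859-1"))
```

Mapping the label to `cp1252` instead of Python's strict `iso-8859-1` codec matters here, because strict ISO-8859-1 decodes 0x93 and 0x94 as invisible control characters instead of quotation marks.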