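Collecting Sina Blog comments: hit a dead end, can't get any further
After a lot of back and forth I found out that the comments are delivered as JSON, and I also got to see the so-called source code, but the source is formatted like this:
{"code":"A00006",data:"\t\r\n\t\t\r\n\t\t\t\r\n\t\t\t\t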
<\/td>\r\n\t\t\t<\/tr>\r\n\t\t<\/table>\r\n\t\t
\r\n\t\t\t
\u9760\u8fd1\u6211\u6e29\u6696\u4f60<\/a><\/span>2010-12-09 08:24:03<\/em> [\u4e3e\u62a5]<\/a><\/span><\/p>\r\n\t\t\t
\u5bb6\u957f\u662f\u5b69\u5b50\u7684\u4e00\u9762\u955c\u5b50\uff0c\u8fd9\u8bdd\u6ca1\u9519\u3002\u5b66\u4e60\u4e86\uff0c\u670b\u53cb\uff01 [\"\u6b22\u559c\"] <\/div>\r\n\t\t\t\r\n\t\t\t\r\n\t\t\t\t\t\t
\r\n\t\t\t\t
\u535a\u4e3b\u56de\u590d\uff1a<\/span>2010-12-15 09:48:53<\/em><\/span><\/p>\r\n\t\t\t\t
\u5475\u5475\uff0c\u76f8\u4e92\u5b66\u4e60<\/p>\r\n\t\t\t<\/div>\r\n\t\t<\/div>\r\n\t<\/li>\t\t
# \r\n\t\t\r\n\t\t\t\r\n\t\t\t\t
<\/td>\r\n\t\t\t<\/tr>\r\n\t\t<\/table>\r\n\t\t
\r\n\t\t\t
\u5feb\u4e50\u5929\u4f7f<\/a><\/span>2010-12-09 08:25:32<\/em> [\u4e3e\u62a5]<\/a><\/span><\/p>\r\n\t\t\t
# \u6069\uff0c\u5bb6\u957f\u662f\u5b69\u5b50\u7684\u699c\u6837\uff0c\u6240\u4ee5\u8fd9\u6837\u4e2a\u699c\u6837\u8981\u505a\u597d\u3002<\/div>\r\n\t\t\t\r\n\t\t\t\r\n\t\t\t\t\t<\/div>\r\n\t<\/li>\t\r\n\t\t\r\n\t\t\t\r\n\t\t\t\t
<\/td>\r\n\t\t\t<\/tr>\r\n\t\t<\/table>\r\n\t\t
\r\n\t\t\t
\u5c0f\u5b87\u5988<\/a><\/span>2010-12-09 10:50:54<\/em> [\u4e3e\u62a5]<\/a><\/span><\/p>\r\n\t\t\t
# \u771f\u7684\u662f\u8a00\u4f20\u8eab\u6559\uff01<\/div>\r\n\t\t\t\r\n\t\t\t\r\n\t\t\t\t\t<\/div>\r\n\t<\/li>\t\r\n\t\t\r\n\t\t\t\r\n\t\t\t\t
<\/td>\r\n\t\t\t<\/tr>\r\n\t\t<\/table>\r\n\t\t
\r\n\t\t\t
\u65e0\u654c\u5988\u5988<\/a><\/span>2010-12-09 12:02:09<\/em> [\u4e3e\u62a5]<\/a><\/span><\/p>\r\n\t\t\t
# [\"\u9876\"] <\/div>\r\n\t\t\t\r\n\t\t\t\r\n\t\t\t\t\t<\/div>\r\n\t<\/li>\t\r\n\t\t\r\n\t\t\t\r\n\t\t\t\t
<\/td>\r\n\t\t\t<\/tr>\r\n\t\t<\/table>\r\n\t\t
\r\n\t\t\t
\u9752\u5c9b\u65b0\u7231\u5a74\u65e9\u6559\u4e2d\u5fc3<\/a><\/span>2010-12-09 12:15:45<\/em> [\u4e3e\u62a5]<\/a><\/span><\/p>\r\n\t\t\t
\u572828\u2014\u201430\u4e2a\u6708\u91cc\uff0c\u60a8\u5b69\u5b50\u7684\u7cbe\u7ec6\u52a8\u4f5c\u80fd\u529b\u5f97\u5230\u4e86\u5f88\u597d\u7684\u53d1\u5c55\u3002\u4ed6\u80fd\u591f\u6a21\u4eff\u60a8\u753b\u51fa\u7ebf\u6761\u548c\u5706\u5708\uff0c\u53cc\u624b\u4e5f\u80fd\u66f4\u597d\u7684\u5236\u4f5c\u548c\u642d\u5efa\u4e1c\u897f\u3002\u4ed6\u7684\u5f88\u591a\u5927\u52a8\u4f5c\u5df2\u7ecf\u6bd4\u8f83\u719f\u7ec3\uff0c\u73b0\u5728\u4ed6\u8fd8\u4f1a\u8fdb\u884c\u4e00\u4e9b\u5c0f\u7684\u8c03\u6574\uff0c\u4ece\u800c\u4f7f\u8fd9\u4e9b\u52a8\u4f5c\u66f4\u6709\u6548\u7387\u3002
\u3000\u3000\u5728\u63a2\u7d22\u6d3b\u52a8\u4e2d\uff0c\u4ed6\u9010\u6e10\u5f00\u59cb\u7528\u66f4\u6210\u719f\u7684\u65b9\u5f0f\u53bb\u601d\u8003\uff0c\u5f00\u59cb\u7406\u89e3\u5404\u79cd\u7269\u4f53\u548c\u7ecf\u5386\u90fd\u662f\u7531\u4e0d\u540c\u7684\u90e8\u5206\u7ec4\u6210\u7684\u3002\u60a8\u4f1a\u53d1\u73b0\uff0c\u60a8\u7684\u5b69\u5b50\u5bf9\u7269\u4f53\u7684\u53d8\u5316\u5f88\u611f\u5174\u8da3\uff0c\u4e5f\u5f00\u59cb\u7406\u89e3\u8fd9\u6837\u7684\u53d8\u5316\u3002
\u3000\u3000\u968f\u7740\u5b69\u5b50\u8bed\u8a00\u80fd\u529b\u7684\u63d0\u9ad8\uff0c\u4ed6\u9010\u6e10\u80fd\u7528\u8bed\u8a00\u8868\u8fbe\u6bd4\u8f83\u590d\u6742\u7684\u601d\u60f3\uff0c\u8fd8\u53ef\u4ee5\u7528\u8bed\u8a00\u6765\u8c08\u8bba\u5404\u79cd\u6982\u5ff5\u3002\u4ed6\u4eec\u559c\u6b22\u91cd\u590d\u513f\u6b4c\u548c\u6545\u4e8b\uff0c\u7279\u522b\u662f\u5bb6\u5ead\u6545\u4e8b\u3002
# \u60a8\u5b69\u5b50\u7684\u60c5\u611f\u5728\u8fd9\u6bb5\u65f6\u95f4\u5185\u8fd8\u4f1a\u7ee7\u7eed\u53d1\u5c55\u3002\u4ed6\u4eec\u80fd\u591f\u7236\u6bcd\u8868\u8fbe\u5173\u5fc3\u548c\u79ef\u6781\u7684\u60c5\u611f\uff0c\u4f46\u6709\u65f6\u4e5f\u4f1a\u5bf9\u523a\u8033\u7684\u566a\u97f3\u3001\u67d0\u4e9b\u52a8\u7269\u548c\u964c\u751f\u7684\u5730\u65b9\u611f\u5230\u6050\u60e7\u3002\u60a8\u7684\u5173\u7231\u548c\u5b89\u6170\u6b64\u65f6\u4f1a\u663e\u5f97\u66f4\u4e3a\u91cd\u8981\u3002<\/div>\r\n\t\t\t\r\n\t\t\t\r\n\t\t\t\t\t<\/div>\r\n\t<\/li>\t\r\n\t\t\r\n\t\t\t\r\n\t\t\t\t
<\/td>\r\n\t\t\t<\/tr>\r\n\t\t<\/table>\r\n\t\t
\r\n\t\t\t
\u60c5\u6df1\u96e8\u8499<\/a><\/span>2010-12-09 15:35:58<\/em> [\u4e3e\u62a5]<\/a><\/span><\/p>\r\n\t\t\t
# \u662f\u554a,\u8eab\u6559\u6bd4\u8bed\u8a00\u66f4\u91cd\u8981\u5462.<\/div>\r\n\t\t\t\r\n\t\t\t\r\n\t\t\t\t\t<\/div>\r\n\t<\/li>\t\r\n\t\t\r\n\t\t\t\r\n\t\t\t\t
<\/td>\r\n\t\t\t<\/tr>\r\n\t\t<\/table>\r\n\t\t
\r\n\t\t\t
\u5b81\u9732\u513f<\/a><\/span>2010-12-09 19:26:29<\/em> [\u4e3e\u62a5]<\/a><\/span><\/p>\r\n\t\t\t
# [\"\u704c\u6c34\"] \u8a00\u4f20\u8eab\u6559\u662f\u81f3\u7406\u3002<\/div>\r\n\t\t\t\r\n\t\t\t\r\n\t\t\t\t\t<\/div>\r\n\t<\/li>\t\r\n\t\t\r\n\t\t\t\r\n\t\t\t\t
<\/td>\r\n\t\t\t<\/tr>\r\n\t\t<\/table>\r\n\t\t
\r\n\t\t\t
\u5b88\u6797\u4eba<\/a><\/span>2010-12-10 17:41:16<\/em> [\u4e3e\u62a5]<\/a><\/span><\/p>\r\n\t\t\t
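\u7236\u6bcd\u662f\u5b69\u5b50\u7684\u6a21\u4eff\u5bf9\u8c61<\/div>\r\n\t\t\t\r\n\t\t\t\r\n\t\t\t\t\t<\/div>\r\n\t<\/li>\t
This is no longer a UTF vs. GB2312 problem. I am personally out of ideas at this point.
Write a plugin that processes the original ajax response, and apply that plugin when saving.
Here is the code for fetching Dangdang's content summary and reading guide: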
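<?php
// Product ID from the query string, URL-encoded before it goes into the request.
$product_id = isset($_GET['product_id']) ? urlencode($_GET['product_id']) : '';
// file() returns an array of lines; json_decode() needs one string, so use file_get_contents().
$jsonObject = file_get_contents('http://product.dangdang.com/callback.php?type=detail&product_id='.$product_id);
// json_decode() returns null if the request failed or the body is not valid UTF-8 JSON.
$decodedObject = ($jsonObject === false) ? null : json_decode($jsonObject);
//print_r($decodedObject);
?>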
<div class="abstract"><?php print_r( $decodedObject->abstract->all) ?> </div>
<div class="content"><?php print_r( $decodedObject->content->all) ?> </div>
<div class="authorintro"><?php print_r( $decodedObject->authorintro->all) ?> </div>
<div class="catalog"><?php print_r( $decodedObject->catalog->all) ?> </div>
<div class="mediafeedback"><?php print_r( $decodedObject->mediafeedback->all) ?> </div>
<div class="extract"><?php print_r( $decodedObject->extract->all) ?> </div>
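</body>
By the way, assuming free users have no plugin available, the content-pagination method can be used to achieve this instead.
In a case like this, does that mean free users simply cannot collect the comments?
JSON can only be decoded when it is UTF-8 encoded.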
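To illustrate the UTF-8 point against the Sina response quoted at the top of the thread, here is a minimal PHP sketch, not a confirmed solution. The function name `decode_comment_payload`, the `GBK` default, and the shortened `$sample` string are all made up for illustration. It assumes PHP 7.1+ with the mbstring extension, and that the only thing wrong with the response is the bare `data:` key seen in the dump (strict JSON requires `"data":`).

```php
<?php
// Hypothetical helper: turn a Sina-style comment response into plain text.
// $sourceEncoding is an assumption; pass whatever the server really sends.
function decode_comment_payload(string $raw, string $sourceEncoding = 'GBK'): ?string
{
    // json_decode() only accepts UTF-8, so convert anything else first.
    if (!mb_check_encoding($raw, 'UTF-8')) {
        $raw = mb_convert_encoding($raw, 'UTF-8', $sourceEncoding);
    }

    // Quote the bare `data:` key that follows the "code" field, if present.
    $raw = preg_replace('/^(\s*\{\s*"code"\s*:\s*"[^"]*"\s*,\s*)data\s*:/', '$1"data":', $raw, 1);

    $decoded = json_decode($raw, true);
    if (!is_array($decoded) || !isset($decoded['data'])) {
        return null; // still not valid JSON, or no "data" field
    }

    // json_decode() has already turned \uXXXX and \/ into real characters.
    // What remains is an HTML fragment: drop the tags and squeeze whitespace.
    $text = html_entity_decode(strip_tags($decoded['data']), ENT_QUOTES, 'UTF-8');
    return trim(preg_replace('/\s+/u', ' ', $text));
}

// Shortened sample in the same shape as the response quoted in the first post.
$sample = '{"code":"A00006",data:"\t<div>\u5475\u5475\uff0c\u76f8\u4e92\u5b66\u4e60<\/div>\r\n"}';
echo decode_comment_payload($sample), "\n";
```

The helper returns `null` instead of raising an error when the input still does not parse, so the caller can fall back to another approach. Whether a given collector tool can run something like this as a plugin is a separate question that this sketch does not answer.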