My test task is fairly simple, so I am starting with just three pages:
1. Start URLs: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=0&p=1&f=S&l=50&Query=CCL%2F435%2F%24+AND+APT%2F1&d=PTXT
http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=0&p=(*)&f=S&l=50&d=PTXT&S1=(435%2F$.CCLS.+AND+(A+or+B%3F).KD.)&Page=Next&OS=CCL/435/$+AND+APT/1&RS=(CCL/435/$+AND+APT/1)
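To make the paging pattern concrete, here is a minimal Python sketch of how I understand the (*) placeholder in the second URL: it is replaced by the page number, 1 through 3 for this test. The helper name `expand_pages` is hypothetical and only for illustration; that the collector substitutes the placeholder this way is my assumption.

```python
# Second start URL from above, with (*) marking the page number.
TEMPLATE = (
    "http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF"
    "&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=0&p=(*)&f=S&l=50&d=PTXT"
    "&S1=(435%2F$.CCLS.+AND+(A+or+B%3F).KD.)&Page=Next"
    "&OS=CCL/435/$+AND+APT/1&RS=(CCL/435/$+AND+APT/1)"
)


def expand_pages(template, first, last, placeholder="(*)"):
    """Hypothetical helper: return one URL per page number in [first, last]."""
    # Replace only the first occurrence so the rest of the query is untouched.
    return [template.replace(placeholder, str(n), 1) for n in range(first, last + 1)]


if __name__ == "__main__":
    for url in expand_pages(TEMPLATE, 1, 3):
        print(url)
```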
2. Here is the problem. When extracting the second-level (detail page) URLs, this is the relevant HTML of the listing page:
- <TR><TD valign=top>1</TD>
- <TD valign=top><A HREF=/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=1&p=1&f=G&l=50&d=PTXT&S1=(435%2F$.CCLS.+AND+(A+or+B%3F).KD.)&OS=CCL/435/$+AND+APT/1&RS=(CCL/435/$+AND+APT/1)>8,898,149</A></TD>
- <TD valign=baseline><IMG border=0 src="/netaicon/PTO/ftext.gif" alt="Full-Text"></TD>
- <TD valign=top><A HREF=/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=1&p=1&f=G&l=50&d=PTXT&S1=(435%2F$.CCLS.+AND+(A+or+B%3F).KD.)&OS=CCL/435/$+AND+APT/1&RS=(CCL/435/$+AND+APT/1)>Biological data structure having multi-lateral, multi-scalar, and
- multi-dimensional relationships between molecular features and other data
- </A></TD>
- <DOCS: 127369>
- <TR><TD valign=top>2</TD>
- <TD valign=top><A HREF=/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=2&p=1&f=G&l=50&d=PTXT&S1=(435%2F$.CCLS.+AND+(A+or+B%3F).KD.)&OS=CCL/435/$+AND+APT/1&RS=(CCL/435/$+AND+APT/1)>8,895,821</A></TD>
- <TD valign=baseline><IMG border=0 src="/netaicon/PTO/ftext.gif" alt="Full-Text"></TD>
- <TD valign=top><A HREF=/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=2&p=1&f=G&l=50&d=PTXT&S1=(435%2F$.CCLS.+AND+(A+or+B%3F).KD.)&OS=CCL/435/$+AND+APT/1&RS=(CCL/435/$+AND+APT/1)>Plants and seeds of hybrid corn variety CH786873
- </A></TD>
- <DOCS: 127369>
- <TR><TD valign=top>3</TD>
- <TD valign=top><A HREF=/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=3&p=1&f=G&l=50&d=PTXT&S1=(435%2F$.CCLS.+AND+(A+or+B%3F).KD.)&OS=CCL/435/$+AND+APT/1&RS=(CCL/435/$+AND+APT/1)>8,895,820</A></TD>
- <TD valign=baseline><IMG border=0 src="/netaicon/PTO/ftext.gif" alt="Full-Text"></TD>
- <TD valign=top><A HREF=/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=3&p=1&f=G&l=50&d=PTXT&S1=(435%2F$.CCLS.+AND+(A+or+B%3F).KD.)&OS=CCL/435/$+AND+APT/1&RS=(CCL/435/$+AND+APT/1)>Maize hybrid X08C981
- </A></TD>
- <DOCS: 127369>
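As the code shows, every record contains two identical links. I used the "manually fill in the link address rule" option (手动填写链接地址规则) with the script rule "<TD valign=top><A HREF=[参数]>[标签:号码]</A></TD>" and the actual link "http://patft.uspto.gov[参数1]". Here [参数] is the tool's parameter placeholder and [标签:号码] is a label I named "number". But in the code above there are clearly two such links per record. How can I make each link unique? Otherwise, once the content rules are set up, nothing gets collected and every item is reported as a duplicate.
How can I solve this?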
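To show what I am after outside the collector, here is a minimal Python sketch, not the tool's own mechanism. It applies a regular expression equivalent to my script rule to two of the rows above, then either drops repeated HREFs or only accepts anchors whose text looks like a patent number. The names `make_row`, `unique_links`, `LINK_RE` and `NUMBER_LINK_RE` are hypothetical, and the rows are rebuilt from the listing rather than fetched.

```python
import re

BASE = "http://patft.uspto.gov"

# Relative link as it appears in the listing; only the r= value changes per row.
HREF = (
    "/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm"
    "&r={r}&p=1&f=G&l=50&d=PTXT&S1=(435%2F$.CCLS.+AND+(A+or+B%3F).KD.)"
    "&OS=CCL/435/$+AND+APT/1&RS=(CCL/435/$+AND+APT/1)"
)


def make_row(r, number, title):
    """Rebuild one result row in the same shape as the listing above."""
    href = HREF.format(r=r)
    return (
        f"<TR><TD valign=top>{r}</TD>\n"
        f"<TD valign=top><A HREF={href}>{number}</A></TD>\n"
        '<TD valign=baseline><IMG border=0 src="/netaicon/PTO/ftext.gif" alt="Full-Text"></TD>\n'
        f"<TD valign=top><A HREF={href}>{title}\n</A></TD>\n"
    )


# Same shape as my script rule: it matches both the number link and the title link.
LINK_RE = re.compile(r"<TD valign=top><A HREF=([^>]+)>([^<]*)</A></TD>")

# Stricter variant: the anchor text must consist only of digits and commas.
NUMBER_LINK_RE = re.compile(r"<TD valign=top><A HREF=([^>]+)>([\d,]+)</A></TD>")


def unique_links(html):
    """Return absolute links in page order, keeping each HREF once."""
    hrefs = [m.group(1) for m in LINK_RE.finditer(html)]
    # dict.fromkeys preserves first-seen order while discarding repeats.
    return [BASE + h for h in dict.fromkeys(hrefs)]


if __name__ == "__main__":
    html = (
        make_row(2, "8,895,821", "Plants and seeds of hybrid corn variety CH786873")
        + make_row(3, "8,895,820", "Maize hybrid X08C981")
    )
    print(len(LINK_RE.findall(html)), "raw matches")
    for link in unique_links(html):
        print(link)
```

Both approaches give one link per record on this sample. Whether the collector can express the stricter match, for example by anchoring the rule on the digits-and-commas text of the first link, is exactly what I would like to know.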
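Below are some screenshots:
When running the first start URL (there are three test pages in total), this appears:
When running the second and third start URLs, the problem appears: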