How to scrape web page table data with R and save a day of work

Python010


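Method / Steps

The Python example relies on four imports:

    from urllib.request import urlopen      (opens the web page)
    from urllib.error import HTTPError      (handles connection errors)
    from bs4 import BeautifulSoup           (parses the HTML document)
    import re                               (matches target strings with regular expressions)

The example collects some image links from the Baidu News page:

    from urllib.request import urlopen
    from urllib.error import HTTPError
    from bs4 import BeautifulSoup
    import re

    url = "/"  # the URL is elided in the source; substitute the page to scrape
    try:
        html = urlopen(url)
    except HTTPError as e:
        # The request failed, so there is nothing to parse.
        print(e)
    else:
        try:
            # Name the parser explicitly to avoid BeautifulSoup's parser warning.
            bsObj = BeautifulSoup(html.read(), "html.parser")
            # Select every <img> tag whose src attribute matches the pattern.
            images = bsObj.findAll("img", {"src": re.compile(".*")})
            for image in images:
                print(image["src"])
        except AttributeError as e:
            print(e)

The source also includes a Java class that reads a page through HttpURLConnection and prints a substring of the response:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.MalformedURLException;
    import java.net.URL;

    public class Capture {
        public static void main(String[] args) throws MalformedURLException, IOException {
            String strUrl = "/";  // the URL is elided in the source
            URL url = new URL(strUrl);
            HttpURLConnection httpConnection = (HttpURLConnection) url.openConnection();
            InputStreamReader input = new InputStreamReader(httpConnection.getInputStream(), "utf-8");
            BufferedReader bufferedReader = new BufferedReader(input);
            String line = "";
            StringBuilder stringBuilder = new StringBuilder();
            // Read the response line by line into a single string.
            while ((line = bufferedReader.readLine()) != null) {
                stringBuilder.append(line);
            }
            String string = stringBuilder.toString();
            // The start and end markers are empty in the source.
            int begin = string.indexOf("");
            int end = string.indexOf("");
            System.out.println("IP address:" + string.substring(begin, end));
        }
    }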
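The title mentions table data, while the example above only collects image links. The same fetch-then-parse pattern can be applied to tables. What follows is a minimal sketch, not the author's method: it uses the standard-library html.parser module instead of BeautifulSoup so that it runs without extra packages, and it parses an inline sample string rather than a live page. The names TableParser, parse_table and SAMPLE_HTML, and the sample rows, are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, None outside <tr>
        self._cell = None  # text fragments of the cell being read, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr":
            if self._row is not None:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table(html_text):
    """Return all table rows found in html_text as lists of cell strings."""
    parser = TableParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>name</th><th>value</th></tr>
  <tr><td>alpha</td><td>1</td></tr>
  <tr><td>beta</td><td>2</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in parse_table(SAMPLE_HTML):
        print(row)
```

For a real page, the string passed to parse_table would be the decoded result of urlopen(url).read(), fetched as in the example above. Nested tables and merged cells are not handled by this sketch.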
