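I am crawling some websites.
The links I extract are incorrect, so the pages do not open.
So I would like to prepend the site address to the raw links,
or maybe there is a better way than the one I have in mind.
If there is a good way, please let me know.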
-Ex-
[[Wrong address]]
/qna/detail.nhn?d1id=7&dirId=70111&docId=280474152
[[Text I want to add]]
I want to prepend an address to the links in my code (see # Bulletin URL)
http://~naver.com
library(httr)
library(rvest)
library(stringr)

# Bulletin URL
list.url = 'http://kin.naver.com/qna/list.nhn?m=expertAnswer&dirId=70111'

# Vectors to store titles and bodies
titles = c()
contents = c()

# Crawl bulletin pages 1 to 10
for(i in 1:10){
  url = modify_url(list.url, query=list(page=i)) # Change the page in the bulletin URL
  h.list = read_html(url, encoding = 'utf-8') # Read the HTML of the post list from the URL

  # Post link extraction
  title.link1 = html_nodes(h.list, '.title') # Nodes with class "title"
  title.links = html_nodes(title.link1, 'a') # <a> tags inside title.link1
  article.links = html_attr(title.links, 'href') # Extract the href attribute

  for(link in article.links){
    h = read_html(link) # Get the post

    # title
    title = html_text(html_nodes(h, '.end_question._end_wrap_box h3'))
    title = str_trim(repair_encoding(title))
    titles = c(titles, title)

    # content
    content = html_nodes(h, '.end_question .end_content._endContents')

    ## Mobile question content
    no.content = html_text(html_nodes(content, '.end_ext2'))
    content = repair_encoding(html_text(content))

    ## Mobile question content
    ## ex) http://kin.naver.com/qna/detail.nhn?d1id=8&dirId=8&docId=235904020&qb=7Jes65Oc66aE&enc=utf8&section=kin&rank=19&search_sort=0&spq=1
    if (length(no.content) > 0) {
      content = str_replace(content, repair_encoding(no.content), '')
    }
    content <- str_trim(content)
    contents = c(contents, content)
    print(link)
  }
}

# save
result = data.frame(titles, contents)
Answer:
If you add article.links <- paste0("http://kin.naver.com", article.links)
right before the inner for loop (the one over article.links), this seems to work (it runs).
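
To illustrate the suggestion above, here is a minimal sketch in base R that does not touch the network. It uses the relative link quoted in the question as sample input. The names base.url, is.absolute, and full.links are made up for this example, and the check for already-absolute links is an extra guard that the answer itself does not include.

```r
# Site address to prepend (taken from the answer; assumed to be the right host)
base.url <- "http://kin.naver.com"

# Sample relative link copied from the question
article.links <- c("/qna/detail.nhn?d1id=7&dirId=70111&docId=280474152")

# Hypothetical guard: only prefix links that are not already absolute
is.absolute <- grepl("^https?://", article.links)
full.links <- ifelse(is.absolute, article.links, paste0(base.url, article.links))

print(full.links)
```

In the crawler, the same two lines would go right after article.links is extracted with html_attr. If a more general approach is preferred, xml2::url_absolute(article.links, url) should resolve relative links against the page URL, though that variant was not tested here.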