2021-03-30

R web scraping function getPageNumber error

I am building a web scraper and trying to understand why my getPageNumber function does not work. It worked last night, but tonight it no longer returns the right output.

    library(rvest)
    library(RCurl)
    library(XML)
    library(stringr)

    getPageNumber <- function(URL) {
      parsedDocument <- read_html(URL)
      results_per_page <- length(parsedDocument %>% html_nodes(".sr-list"))
      total_results <- parsedDocument %>%
        toString() %>%
        str_match(., 'num_results":"(.*?)"') %>% 
        .[,2] %>%
        as.integer()
      pageNumber <- tryCatch(ceiling(total_results / results_per_page), error = function(e) {1})
      return(pageNumber)
    }
    getPageNumber("https://academic.oup.com/dnaresearch/search-results?rg_IssuePublicationDate=01%2F01%2F2010%20TO%2012%2F31%2F2010&fl_SiteID=5275&page=")

The output I am getting is NA, when it should be a number.
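
One thing worth checking, as a sketch rather than a confirmed diagnosis: tryCatch only runs its handler when an error is raised, and arithmetic on NA returns NA without raising one. So if the 'num_results' pattern is not present in the page that came back (for example because the site served different markup), total_results is NA and the fallback value of 1 is never used. The helpers below, extractTotalResults and computePageNumber, are hypothetical names introduced for illustration, and the sample strings are made up. They run on plain text so the two steps can be tested without a network request.

```r
library(stringr)

# Hypothetical helper: pull the total result count out of raw page text.
# Returns NA when the 'num_results' pattern is absent from the text.
extractTotalResults <- function(pageText) {
  match <- str_match(pageText, 'num_results":"(.*?)"')
  as.integer(match[, 2])
}

# Hypothetical helper: compute the page count, stopping with a message
# instead of silently returning NA (or Inf when dividing by zero).
computePageNumber <- function(total_results, results_per_page) {
  if (is.na(total_results)) {
    stop("'num_results' was not found in the page source")
  }
  if (results_per_page == 0) {
    stop("no '.sr-list' nodes were found on the page")
  }
  ceiling(total_results / results_per_page)
}

# Made-up sample inputs, not taken from the real site.
sample_ok <- '{"num_results":"45","page":"1"}'
sample_missing <- '<html><body>Access denied</body></html>'

extractTotalResults(sample_ok)       # pattern present
extractTotalResults(sample_missing)  # pattern absent, yields NA
computePageNumber(extractTotalResults(sample_ok), 20)
```

Running extractTotalResults(toString(read_html(URL))) on the real URL would show which of the two inputs is missing, assuming the page still embeds the count in the same form.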



from Recent Questions - Stack Overflow https://ift.tt/3u2Csgo