I’m trying to scrape some tables (election data) using the XML package. Browsing SO,

Asked: May 26, 2026 (2026-05-26T10:41:24+00:00)

I’m trying to scrape some tables (election data) using the XML package. Browsing SO, I found out how to scrape a single URL using:

library(XML)
url <- "http://www.elecciones2011.gob.ar/paginas/paginas/dat99/DPR99999A.htm"
total <- readHTMLTable(url)
n.rows <- unlist(lapply(total, function(t) dim(t)[1]))
df <- as.data.frame(total[[which.max(n.rows)]])

With the above code I get a nice enough result. I’m also able (with the readLines function and some tweaking) to build a vector with all the URLs I want to scrape, like this:

base_url <- "http://www.elecciones2011.gob.ar/paginas/paginas/"
urls <- paste(
  base_url,
  c(
    "dat02/DPR02999A", 
    "dat03/DPR03999A", 
    "dat04/DPR04999A", 
    "dat05/DPR05999A", 
    "dat06/DPR06999A", 
    "dat07/DPR07999A", 
    "dat08/DPR08999A", 
    "dat09/DPR09999A", 
    "dat10/DPR10999A", 
    "dat11/DPR11999A", 
    "dat12/DPR12999A", 
    "dat13/DPR13999A", 
    "dat14/DPR14999A", 
    "dat15/DPR15999A", 
    "dat16/DPR16999A", 
    "dat17/DPR17999A", 
    "dat18/DPR18999A", 
    "dat19/DPR19999A", 
    "dat20/DPR20999A", 
    "dat21/DPR21999A", 
    "dat22/DPR22999A", 
    "dat23/DPR23999A", 
    "dat24/DPR24999A"
  ),
  ".htm",
  sep = "" 
)

What I’d like to do is create a function that runs readHTMLTable on every URL and stores the results in a vector or data frame (in one or many, whichever is easier). I’m quite new to R, and I’m particularly bad at functions. I tried something like…

tabla<- for (i in urls){
        readHTMLTable(urls)
        }

…but it’s not even close.


1 Answer

Editorial Team · Answer 1 · 2026-05-26T10:41:24+00:00

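The most basic approach uses a loop. This just wraps the code you supplied inside a for loop, indexing into urls so that each pass reads one URL and stores its largest table as one element of a list.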

tabla <- list()
for(i in seq_along(urls))
{
    total <- readHTMLTable(urls[i])
    n.rows <- unlist(lapply(total, function(t) dim(t)[1]))
    tabla[[i]] <- as.data.frame(total[[which.max(n.rows)]])
}
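If one page fails to load, the loop above stops at that URL. The sketch below shows one way to keep going, using tryCatch. It is only an illustration: read_all and fake_reader are hypothetical names, and fake_reader stands in for the readHTMLTable step so the example runs without network access.

```r
# Hypothetical stand-in for the readHTMLTable() step: it fails on one
# "URL" so the error handling can be exercised offline.
fake_reader <- function(url) {
  if (grepl("bad", url)) stop("could not load ", url)
  data.frame(url = url, votes = nchar(url))
}

# Call `reader` on each URL; leave NULL in place for any page that errors.
read_all <- function(urls, reader) {
  out <- vector("list", length(urls))  # preallocate one slot per URL
  names(out) <- urls
  for (i in seq_along(urls)) {
    res <- tryCatch(
      reader(urls[i]),
      error = function(e) {
        warning("skipping ", urls[i], ": ", conditionMessage(e))
        NULL
      }
    )
    # Assigning NULL with [[<- would delete the slot, so only store successes.
    if (!is.null(res)) out[[i]] <- res
  }
  out
}

demo <- suppressWarnings(read_all(c("a.htm", "bad.htm", "c.htm"), fake_reader))
```

With the real data, the reader argument would be a function containing the three lines from the loop body (readHTMLTable, the row count, and the which.max selection).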

A more elegant approach uses lapply. Now the code you supplied is put inside a function, which is called once for each URL; the result is a list with one data frame per URL.

tabla <- lapply(urls, function(url) {
    total <- readHTMLTable(url)
    n.rows <- unlist(lapply(total, function(t) dim(t)[1]))
    as.data.frame(total[[which.max(n.rows)]])
})
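Since the question also asks about ending up with a single data frame, here is a minimal sketch of stacking such a list. It assumes every table has the same columns, which may not hold for the real pages. The toy data frames and the district labels are made up for illustration and stand in for tabla.

```r
# Toy stand-ins for the scraped tables; the real ones would come from tabla.
tabla_demo <- list(
  data.frame(party = c("A", "B"), votes = c(10, 20)),
  data.frame(party = c("A", "B"), votes = c(30, 40))
)
# Hypothetical labels, e.g. the district codes embedded in the URLs.
names(tabla_demo) <- c("02", "03")

# Tag each table with its label so rows stay traceable after stacking.
tagged <- Map(
  function(df, id) cbind(district = id, df),
  tabla_demo,
  names(tabla_demo)
)

# Stack all tables row-wise; this requires identical column names.
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL
```

If the columns differ between pages, keeping the list as it is and working on each element separately is probably the safer choice.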
