I’ve been searching for an answer to my question for a while. I read this, this and this, and some other related questions, but I still have no answer.
My problem is quite simple (I hope), but the answer is not, at least for me. I want to import some economic data from this web page, which reports a monthly indicator of Nicaraguan economic activity. So far I’ve tried this:
library(XML)
u <- "http://www.bcn.gob.ni/estadisticas/trimestrales_y_mensuales/siec/datos/4.IMAE.htm"
u <- htmlParse(u,encoding="UTF-8")
imae <- readHTMLTable(doc=u, header=T)
imae
and this:
library(httr)
u2 <- "http://www.bcn.gob.ni/estadisticas/trimestrales_y_mensuales/siec/datos/4.IMAE.htm"
page <- GET(u2, user_agent("httr"))
x <- readHTMLTable(text_content(page), as.data.frame=TRUE)
with no success, as you can imagine. The first chunk of code gave me this output:
$`NULL`
BANCO CENTRAL DE NICARAGUA NA NA NA NA NA NA NA NA NA NA NA NA NA
1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 <U+633C><U+3E64>ndice Mensual de Actividad Económica(IMAE) <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 (Base: 1994=100) <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Año Ene Feb Mar Abr May Jun Jul Ago Sep Oct Nov Dic Promedio
6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 1994 101.6 107.6 100.1 95.7 94.7 92.8 92.1 96.8 98.5 97.4 101.7 121.1 100.0
8 Fuente: BCN. <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
I tried using skip.rows=1:5, but it doesn’t really change the main result, which still contains far too many NA values. Can anybody shed some light on this?
The expected result is a data.frame with the information shown on that web page.
As I mentioned in my comment, the problem most likely arises from the poorly coded HTML table.
You can try something like the following approach (tested on Ubuntu using RStudio). It requires that you have wget and HTML Tidy installed. If you don’t want to install these useful programs, jump to the updated part of this answer.
Download the page and “tidy” it up.
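A minimal sketch of this step, driven from R so that everything stays in one script. The helper name `fetch_and_tidy` and both default file names are made up for illustration, and the sketch assumes `wget` and `tidy` are on the PATH.

```r
# Hypothetical helper: download a page with wget, then clean it with HTML Tidy.
fetch_and_tidy <- function(url,
                           raw_file = "4.IMAE.htm",
                           tidy_file = "4.IMAE.tidy.html") {
  # Fail early if either command-line tool is missing from the PATH.
  tools <- Sys.which(c("wget", "tidy"))
  if (any(tools == "")) {
    stop("Not found on PATH: ", paste(names(tools)[tools == ""], collapse = ", "))
  }

  # -q: quiet, -O: write the download to raw_file.
  system2("wget", c("-q", "-O", shQuote(raw_file), shQuote(url)))

  # -asxhtml: emit well-formed XHTML, -o: write the result to tidy_file.
  # tidy exits with 1 when it only has warnings and 2 when it hits errors.
  status <- system2("tidy", c("-q", "-asxhtml", "-o",
                              shQuote(tidy_file), shQuote(raw_file)))
  if (status > 1) warning("tidy reported errors; inspect ", tidy_file)

  tidy_file
}

# Example call (needs network access and both tools installed):
# fetch_and_tidy("http://www.bcn.gob.ni/estadisticas/trimestrales_y_mensuales/siec/datos/4.IMAE.htm")
```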
Proceed with R as you normally would
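For example, reading the cleaned-up file could look roughly like this. `read_tidied_table` and the file name in the commented call are hypothetical; `skip.rows` and `which` are arguments of `readHTMLTable` from the XML package.

```r
library(XML)

# Hypothetical helper: read the first table from a local, already-tidied HTML file.
read_tidied_table <- function(path, skip = integer()) {
  doc <- htmlParse(path)
  # header = FALSE keeps every row as data, so nothing is silently promoted
  # to column names; `skip` lists row numbers to drop.
  readHTMLTable(doc, header = FALSE, skip.rows = skip,
                which = 1, stringsAsFactors = FALSE)
}

# Example calls (the file name is an assumption):
# imae <- read_tidied_table("4.IMAE.tidy.html")
# head(imae, 10)   # look at the leading rows to decide what to skip
# imae <- read_tidied_table("4.IMAE.tidy.html", skip = 1:5)
```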
If we view the output of the above readHTMLTable, we would see that we need to skip a few rows. Let’s run it again:
Update: A little function to help out
If you can live with having to do some text cleanup for the accented characters, the W3C offers an online implementation of HTML Tidy. This allows you to write a basic function like the following:
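One possible shape for such a function is sketched here. The names `tidy_service_url` and `tidy_html` are hypothetical, and the default endpoint and its `docAddr` parameter are assumptions about the W3C service, so check that the service is still reachable before relying on it.

```r
# Build the request URL for an online Tidy service.
# ASSUMPTION: the default endpoint and the "docAddr" query parameter.
tidy_service_url <- function(url, service = "http://services.w3.org/tidy/tidy") {
  # reserved = TRUE also percent-encodes ":" and "/" in the page address.
  paste0(service, "?docAddr=", utils::URLencode(url, reserved = TRUE))
}

# Hypothetical helper: fetch the cleaned-up copy of `url` and parse it.
tidy_html <- function(url, ...) {
  lines <- readLines(tidy_service_url(url, ...), warn = FALSE)
  XML::htmlParse(paste(lines, collapse = "\n"), asText = TRUE)
}
```

Keeping the URL construction in its own function makes it easy to point the helper at a different Tidy endpoint.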
Usage is simple:
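A usage sketch, assuming the helper is called `tidy_html()` (a hypothetical name) and returns a parsed document. So that the snippet runs on its own, the helper is replaced by a stand-in that parses a small inline sample shaped like the table in the question. The row filter and column names are illustrative as well.

```r
library(XML)

# Inline sample shaped like the table in the question (title, header,
# one data row, footnote). Only three columns are kept for brevity.
sample_html <- paste0(
  "<table>",
  "<tr><td>BANCO CENTRAL DE NICARAGUA</td></tr>",
  "<tr><td>A&ntilde;o</td><td>Ene</td><td>Feb</td></tr>",
  "<tr><td>1994</td><td>101.6</td><td>107.6</td></tr>",
  "<tr><td>Fuente: BCN.</td></tr>",
  "</table>"
)

# Stand-in for the hypothetical tidy_html() helper: it ignores `url` and
# parses the inline sample, so no network access is needed.
tidy_html <- function(url) htmlParse(sample_html, asText = TRUE)

doc <- tidy_html("http://www.bcn.gob.ni/estadisticas/trimestrales_y_mensuales/siec/datos/4.IMAE.htm")
raw <- readHTMLTable(doc, header = FALSE, which = 1, stringsAsFactors = FALSE)

# Keep only rows whose first cell is a four-digit year. This drops the title,
# header and footnote rows that otherwise appear as mostly-NA lines.
imae <- raw[grepl("^[0-9]{4}$", raw[[1]]), , drop = FALSE]
names(imae) <- c("year", "Ene", "Feb")

# Convert every column to numeric (via character, in case of factors).
imae[] <- lapply(imae, function(x) as.numeric(as.character(x)))
rownames(imae) <- NULL
imae
```

Filtering on the year column is an alternative to hard-coding `skip.rows`, and it keeps working if the number of title rows on the page changes.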