I’m reading in data from an HTTP access log. I’ve got a file with columns for the IP address, year, month, day, hour and requested URL. I read the file in like this:
ipdata = scan(file="sample_r.log", what=list(ip="", year=0, month=0, day=0, hour=0, verb="", url=""))
This seems to work. RStudio says that ipdata is a list[7] and “names(ipdata)” returns
[1] "ip" "year" "month" "day" "hour" "verb" "url"
So that seems cool. I wanted to do something fun, like graph some data for a specific hour. I tried doing a subset:
s <- subset(ipdata, ipdata$hour==3)
This data looks remarkably different from the original. s is a list[297275], and the following doesn’t work right:
> table(ipdata$verb)
GET POST
2870709 1596748
> table(s$verb)
character(0)
Am I going about this the correct way? What I typically do is wrap my data frame in a table() and then barplot or dotplot it. Is R a good way to do this? I want to say “Show me all of the top URLs in hour 3”, for example. Or “How many times did this IP address show up per hour?”
Update: It looks like by using read.table instead of scan I was able to get a data frame. Apparently scan returns a list of lists or something? Definitely confusing to a n00b like myself but I’m feeling good about it now.
If you ran
…. you would probably see that it was pretty much the same as the results of your read.table() operation.
read.table is a wrapper for scan and does a lot of formatting and consistency checking.
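To make the difference concrete, here is a minimal sketch of one way to go from the scan() result to something subset() can filter by row. As far as I can tell, scan() with a list for `what` gives back a plain list of equal-length vectors, and subset() on a plain list indexes the list's elements rather than rows, which would explain the odd list[297275]. The sample lines, and the names `log_lines`, `fields`, `ipdf`, `top_urls` and `ip_by_hour`, are made up for illustration; only the column layout comes from the question.

```r
# Hypothetical sample lines in the same column order as the question's file.
log_lines <- c(
  "10.0.0.1 2012 5 1 3 GET /index.html",
  "10.0.0.2 2012 5 1 3 POST /login",
  "10.0.0.1 2012 5 1 3 GET /index.html",
  "10.0.0.1 2012 5 1 4 GET /about.html"
)

# Same column template as in the question.
fields <- list(ip = "", year = 0, month = 0, day = 0, hour = 0,
               verb = "", url = "")

# scan() returns a plain list of seven equal-length vectors, not a data frame.
ipdata <- scan(text = log_lines, what = fields, quiet = TRUE)

# Convert the list to a data frame so that subset() filters rows.
ipdf <- as.data.frame(ipdata, stringsAsFactors = FALSE)

# Rows for hour 3 only.
s <- subset(ipdf, hour == 3)

# Requests per verb in hour 3.
verb_counts <- table(s$verb)

# Most requested URLs in hour 3, most frequent first.
top_urls <- sort(table(s$url), decreasing = TRUE)

# How many times each IP address shows up in each hour.
ip_by_hour <- table(ipdf$ip, ipdf$hour)
```

From there, something like barplot(top_urls) or barplot(ip_by_hour["10.0.0.1", ]) should give the kind of plot you describe, though with millions of rows you will probably want to keep only the first few entries of top_urls (for example with head()) before plotting.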