I have an Apache log that I want to filter by HTTP status code. I’ve done the math manually, since my Ruby program isn’t returning the right numbers. I also plan to reuse this code to output the frequency of visits by IP and the URLs visited, but not until I can get it working.
Here’s what I’ve got:
class Numeric
  def percent_of(n)
    self.to_f / n.to_f * 100.0
  end
end
stat_hash = Hash.new(0)
url_hash = Hash.new(0)
ip_hash = Hash.new(0)
#lineArray= Array.new()
file = File.open("./test_log", 'r')
total = 0
#load hash
file.each_line do |line|
  total += 1
  lnarr = line.chomp.split #Split is messed up needs to split to array first i think then hash from array similar to Lab 10
  #Array needs to split to {IP,Date/time, URL, Status, size}
  #http://httpd.apache.org/docs/1.3/logs.html
  stat_hash[lnarr[-2]] += 1
  url_hash[lnarr[-4]] += 1
  ip_hash[lnarr[0]] +=1
end
for i in 0..stat_hash.length-1 do
  percent = stat_hash.percent_of(total) #current equation will not work. Hash does populate with the http status do but math
  #does not output any average. returns undefined method but method is defined at top.
  status = stat_hash[i]
end
puts total
#puts (stat_hash[i]/total)
stat_hash.sort.each { |status| puts "#{status}:"+ percent}
The test log I am running against can be found here:
http://dl.dropbox.com/u/71927/test_log
I’ve done it manually and am expecting
200:90%
301:8%
401:1%
404:1%
but I am getting
200: 97%
301: 1%
304: 8%
403: 2%
which is simply the number of times each status occurs. If you add them up, it’s 108, and there are 108 lines in the log file with status codes in them.
EDIT: As a follow-up to this question, I got the percentages to round up the way I need by using the ceiling function of the Float data type (‘.ceil’), and got my program to parse specific data by adding command-line options with optparse.
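A minimal sketch of what that follow-up could look like. The flag names (--status, --ip, --url) and the helper names parse_options and ceil_percent are hypothetical, chosen only for illustration, and this is not the script's actual code:

```ruby
require 'optparse'

# Parse hypothetical command-line flags that choose which report to print.
# The flag names are assumptions, not taken from the original script.
def parse_options(argv)
  options = { report: :status }
  OptionParser.new do |opts|
    opts.banner = 'Usage: log_report.rb [options]'
    opts.on('-s', '--status', 'Report HTTP status codes') { options[:report] = :status }
    opts.on('-i', '--ip', 'Report visits by IP') { options[:report] = :ip }
    opts.on('-u', '--url', 'Report URLs visited') { options[:report] = :url }
  end.parse!(argv)
  options
end

# Percentage of count within total, rounded up to a whole number with Float#ceil.
def ceil_percent(count, total)
  (count.to_f / total * 100).ceil
end

options = parse_options(['--ip'])
puts options[:report]
puts "#{ceil_percent(97, 108)}%"
```

Here 97 out of 108 works out to roughly 89.8, which .ceil rounds up to 90.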
I’m not sure if you copied/pasted correctly, but I don’t see how you’re successfully accessing the percent variable outside of your for loop.
I’d drop the percent_of function and simplify the end using the (more Ruby-esque) map function.
I’d also recommend using something more descriptive than total (such as total_occurrences or similar). I couldn’t access your test_log file, but I ran your code with my change against a dummy log I had and it produced accurate results.
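A minimal sketch of the map-based approach, not necessarily the exact code from this answer. The sample log lines and the method names status_counts and status_percentages are made up for illustration:

```ruby
# Hypothetical sample lines in Apache common log format (not the real test_log).
sample_log = <<~LOG
  127.0.0.1 - - [10/Oct/2011:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
  127.0.0.1 - - [10/Oct/2011:13:55:40 -0700] "GET /about.html HTTP/1.0" 200 1024
  10.0.0.2 - - [10/Oct/2011:13:56:01 -0700] "GET /old HTTP/1.0" 301 512
  10.0.0.3 - - [10/Oct/2011:13:57:12 -0700] "GET /missing HTTP/1.0" 404 209
LOG

# Count how often each status code occurs. The status is the second-to-last
# whitespace-separated field, the same index the question's code uses.
def status_counts(lines)
  counts = Hash.new(0)
  lines.each { |line| counts[line.chomp.split[-2]] += 1 }
  counts
end

# Turn the counts into "status: percent%" strings using map.
def status_percentages(counts)
  total_occurrences = counts.values.sum
  counts.sort.map do |status, count|
    percent = count.to_f / total_occurrences * 100
    "#{status}: #{percent.round(1)}%"
  end
end

puts status_percentages(status_counts(sample_log.each_line))
```

Each percentage is computed per status from that status's own count, so nothing needs to survive outside a loop the way the percent variable did in the question.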