So, I have a regexp that I have tested on Rubular and from the CLI (using the pry gem). It parses a custom Apache log format. When I feed input to it in pry, it works as expected (e.g. $~ is populated). Rubular also reports correct matching and grouping for various lines of input. When I run it from the code below, though, nothing matches.
I have also tried messing with String#chomp! and the \n character, in case a trailing newline was throwing off the match, but none of the permutations have any effect.
I’m sure it’s something a more experienced Rubyist could shed some light on.
Rubular link: http://www.rubular.com/r/fycHVYZdZz
Here is the relevant code, regex, and input (thanks in advance):
log_regex = %r{
(?<ip>(([0-9]{1,3}\.){3}[0-9]{1,3}))
\s-\s
(?<src_ip>.*)
-\s
(?<date>\[.*\])
\s
(?<url>".+")
\s
(?<response>\d{3})
\s
(?<length>\d+)
\s
(?<referer>".+")
\s
(?<useragent>".*")
\s(?<host>.*)?
/ix
}
logfile = ARGV[0]
def process_log(log_regex, logfile)
  IO.foreach(logfile, 'r') do |line|
    line.chomp!
    log_regex.match(line) do |m|
      puts m['ip']
    end
  end
end
process_log(log_regex, logfile)
Sample input:
209.123.123.123 - - [05/Jul/2012:11:02:01 -0700] "GET /url/mma/rss2.0.xml HTTP/1.1" 301 0 "-" "FeedBurner/1.0 (http://www.FeedBurner.com)" xml.somewhere.com
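You probably want to take a close look at the definition of your regex. Your flags are inside the pattern itself instead of following the closing delimiter of the %r{...} literal, where they belong. Without the x flag, the whitespace and newlines you used to lay out the pattern are matched literally (and so are the characters /ix), so no log line can ever match. This:
\s(?<host>.*)?
/ix
}
should be:
\s(?<host>.*)?
}ix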
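To see the difference in isolation, here is a stripped-down pair of literals (a hypothetical two-letter pattern, not the log regex itself). Regexp#options reports which flags each literal actually carries, and only the second one ignores the layout whitespace:

```ruby
# The mistake in miniature: "/ix" sits inside the braces, so it becomes
# part of the pattern and the literal carries no flags at all.
broken = %r{
  a
  b
  /ix
}

# Flags placed after the closing brace, where %r{...} expects them.
fixed = %r{
  a
  b
}ix

p broken.options        # => 0 (no flags set)
p fixed.options         # => 3 (Regexp::IGNORECASE | Regexp::EXTENDED)
p broken.match?("AB")   # => false: newlines, spaces and "/ix" are literal
p fixed.match?("AB")    # => true: x drops the layout, i ignores case
```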
From IRB:
Both Pry and IRB return the same results for the above tests.
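Once the flags are moved, the file-reading loop needs one more change. The second positional argument to IO.foreach is the line separator, not an open mode, so IO.foreach(logfile, 'r') splits the file at every lowercase "r" and the sample entry never reaches the regex in one piece. Below is a sketch of the whole script with both fixes; the SAMPLE constant, the temp-file fallback when no path is given, and collecting the IPs into an array are my own assumptions, not part of the original code:

```ruby
require "tempfile"

# The question's pattern, unchanged except that the i and x flags now
# follow the closing brace. With x, the layout whitespace is ignored and
# only the explicit \s tokens have to match.
LOG_REGEX = %r{
  (?<ip>(([0-9]{1,3}\.){3}[0-9]{1,3}))
  \s-\s
  (?<src_ip>.*)
  -\s
  (?<date>\[.*\])
  \s
  (?<url>".+")
  \s
  (?<response>\d{3})
  \s
  (?<length>\d+)
  \s
  (?<referer>".+")
  \s
  (?<useragent>".*")
  \s(?<host>.*)?
}ix

# Sample input line from the question (hypothetical constant name).
SAMPLE = '209.123.123.123 - - [05/Jul/2012:11:02:01 -0700] "GET /url/mma/rss2.0.xml HTTP/1.1" 301 0 "-" "FeedBurner/1.0 (http://www.FeedBurner.com)" xml.somewhere.com'

# Collects the ip field of every matching line in the file at +path+.
# No second argument to foreach: a String there would be used as the
# line separator, so 'r' would split the file at every "r".
def process_log(regex, path)
  ips = []
  File.foreach(path) do |line|
    line.chomp!
    # Regexp#match yields the MatchData only when the line matches.
    regex.match(line) { |m| ips << m[:ip] }
  end
  ips
end

if ARGV[0]
  puts process_log(LOG_REGEX, ARGV[0])
else
  # No path given: write the sample line to a temporary file and parse it.
  Tempfile.create("access_log") do |f|
    f.puts SAMPLE
    f.flush
    puts process_log(LOG_REGEX, f.path)
  end
end
```

Pass a log path as the first argument to process a real file; with no argument the script parses the sample line and prints 209.123.123.123.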