Sorry for my bad English.
I have a log file from a Web server with 120,000 lines.
Example of input file:
10.160.0.10;16.11.2011 12:56;/;-;"Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0"
10.160.0.100;14.11.2011 7:22;/;-;"Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0"
10.160.0.100;14.11.2011 10:45;/;-;"Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0"
10.160.0.100;14.11.2011 10:53;/;-;"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)"
I need to compare the IP address in the first line with the IP address in the second line
and, at the same time,
compare the last field, which contains the web browser version (the User-Agent), with the same field in the second line.
Then the second line is compared with the third line, and so on.
If the IP address and the browser version are both the same as in the previous line,
the line gets the same label appended to its end, for example #1 (meaning it is the first user).
If the IP address or the browser version differs, the next label is appended, for example #2 (the second user).
This identifies users based on the IP address and the User-Agent field (that is, on different web browser versions).
Example of output file:
10.160.0.10;16.11.2011 12:56;/;-;"Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0";#1
10.160.0.100;14.11.2011 7:22;/;-;"Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0";#2
10.160.0.100;14.11.2011 10:45;/;-;"Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0";#2
10.160.0.100;14.11.2011 10:53;/;-;"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)";#3
Do you have any idea how to do this?
Which method to use?
Thank you for your help.
This is neither complete nor anywhere near optimal, but it is basically everything you need.
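One way the line-by-line comparison from the question could look in Python is sketched below. It is only an illustration under a few assumptions: every line has exactly the five semicolon-separated fields shown in the example, the User-Agent is always the last field, and the file is already ordered so that requests from the same user are adjacent. The names tag_users and tag_file are made up for this sketch. Because the User-Agent itself contains semicolons, each line is split at most four times, so the fifth field keeps the whole browser string.

```python
from typing import Iterable, Iterator


def tag_users(lines: Iterable[str]) -> Iterator[str]:
    """Append ';#N' to each line; N grows when the IP or User-Agent
    differs from the previous line (hypothetical helper)."""
    user_id = 0
    previous_key = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # The User-Agent contains ';', so split at most 4 times:
        # IP; timestamp; path; referrer; User-Agent
        fields = line.split(";", 4)
        if len(fields) != 5:
            raise ValueError(f"unexpected line format: {line!r}")
        key = (fields[0], fields[4])  # (IP address, User-Agent)
        if key != previous_key:
            user_id += 1
            previous_key = key
        yield f"{line};#{user_id}"


def tag_file(src_path: str, dst_path: str) -> None:
    """Read src_path line by line and write the tagged lines to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for tagged in tag_users(src):
            dst.write(tagged + "\n")
```

Calling tag_file with an input path and an output path would process the 120,000 lines one at a time without loading the whole file into memory. Note that this compares only neighbouring lines, as the question describes. If the same IP and User-Agent pair can reappear later in the file and should get its old number back, a dictionary keyed on that pair would be needed instead of previous_key.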