Editorial Team
Asked: May 31, 2026

I have a Python script which extracts unique IP addresses from a log file


I have a Python script which extracts unique IP addresses from a log file and displays how many times each of those IPs appears in the log. The code is as follows:

import sys

def extract_ip(line):
    # The IP address is the first whitespace-separated field.
    return line.split()[0]

def increase_count(ip_dict, ip_addr):
    if ip_addr in ip_dict:
        ip_dict[ip_addr] += 1
    else:
        ip_dict[ip_addr] = 1

def read_ips(infilename):
    res_dict = {}
    # open() + with works on Python 2 and 3 (file() is Python 2 only)
    # and closes the file when the block ends.
    with open(infilename) as log_file:
        for line in log_file:
            if line.isspace():
                continue
            ip_addr = extract_ip(line)
            increase_count(res_dict, ip_addr)
    return res_dict

def write_ips(outfilename, ip_dict):
    with open(outfilename, "w") as out_file:
        # items() exists on Python 2 and 3; iteritems() is Python 2 only.
        for ip_addr, count in ip_dict.items():
            out_file.write("%5d\t%s\n" % (count, ip_addr))

def parse_cmd_line_args():
    if len(sys.argv) != 3:
        print("Usage: %s [infilename] [outfilename]" % sys.argv[0])
        sys.exit(1)
    return sys.argv[1], sys.argv[2]

def main():
    infilename, outfilename = parse_cmd_line_args()
    ip_dict = read_ips(infilename)
    write_ips(outfilename, ip_dict)

if __name__ == "__main__":
    main()
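The counting done by read_ips() and increase_count() can also be expressed with collections.Counter from the standard library, which handles the "first time this IP is seen" case for you and can sort by count via most_common(). A minimal sketch, assuming (as the script above does) that the IP is the first whitespace-separated field; count_ips is a hypothetical helper name:

```python
from collections import Counter


def count_ips(lines):
    """Count how often each IP address (first field) appears in the lines.

    Blank lines are skipped, mirroring read_ips() above.
    """
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip empty or whitespace-only lines
        counts[line.split()[0]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '117.18.231.5 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -\n',
        "\n",
        '59.95.13.217 - - [06/Mar/2012:00:00:00 -0800] "GET /dbupdates2.xml HTTP/1.1" 404 0 - -\n',
        '117.18.231.5 - - [06/Mar/2012:00:00:01 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -\n',
    ]
    # most_common() yields (ip, count) pairs, highest count first.
    for ip, count in count_ips(sample).most_common():
        print("%5d\t%s" % (count, ip))
```

With a real log you can pass the open file object directly, e.g. `with open(infilename) as f: counts = count_ips(f)`.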

The log file has about 2 lakh (200,000) lines in the following format. These are its first 30 lines:

220.227.40.118 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
220.227.40.118 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -
59.95.13.217 - - [06/Mar/2012:00:00:00 -0800] "GET /dbupdates2.xml HTTP/1.1" 404 0 - -
111.92.9.222 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
120.56.236.46 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -
49.138.106.21 - - [06/Mar/2012:00:00:00 -0800] "GET /add.txt HTTP/1.1" 204 214 - -
117.195.185.130 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
122.160.166.220 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] "GET /welcome.html HTTP/1.1" 204 212 - -
117.18.231.5 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
117.18.231.5 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
122.169.136.211 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
203.217.145.10 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
117.18.231.5 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -
59.95.13.217 - - [06/Mar/2012:00:00:00 -0800] "GET /dbupdates2.xml HTTP/1.1" 404 0 - -
203.217.145.10 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
117.206.70.4 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] "GET /css/epic.css HTTP/1.1" 204 214 "http://www.epicbrowser.com/welcome.html" -
117.206.70.4 - - [06/Mar/2012:00:00:00 -0800] "GET /add.txt HTTP/1.1" 204 214 - -
117.206.70.4 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -
118.97.38.130 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] "GET /js/flash_detect_min.js HTTP/1.1" 304 0 "http://www.epicbrowser.com/welcome.html" -
117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] "GET /images/home-page-bottom.jpg HTTP/1.1" 304 0 "http://www.epicbrowser.com/welcome.html" -
117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] "GET /images/Facebook_Like.png HTTP/1.1" 204 214 "http://www.epicbrowser.com/welcome.html" -
117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] "GET /images/Twitter_Follow.png HTTP/1.1" 204 214 "http://www.epicbrowser.com/welcome.html" -
117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] "GET /images/home-page-top.jpg HTTP/1.1" 304 0 "http://www.epicbrowser.com/welcome.html" -
49.138.106.21 - - [06/Mar/2012:00:00:01 -0800] "GET /dbupdates2.xml HTTP/1.1" 404 0 - -
117.18.231.5 - - [06/Mar/2012:00:00:01 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
117.18.231.5 - - [06/Mar/2012:00:00:01 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -
120.61.182.186 - - [06/Mar/2012:00:00:01 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
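Splitting these lines on whitespace relies on fixed column positions (the timestamp itself contains a space, so the request path ends up as field 6). If you need more than the first column, a regular expression that names each field makes the layout explicit. A minimal sketch, assuming every line follows the layout of the sample above; the pattern and the parse_line name are my own, not part of the original script:

```python
import re

# Hypothetical pattern for the layout of the sample lines:
#   IP ident user [timestamp] "METHOD PATH PROTOCOL" status bytes referer agent
# Only the fields up to the byte count are captured; the rest is ignored.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_RE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    line = ('117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] '
            '"GET /css/epic.css HTTP/1.1" 204 214 '
            '"http://www.epicbrowser.com/welcome.html" -')
    fields = parse_line(line)
    print(fields["ip"], fields["path"], fields["status"])
    # prints: 117.214.20.28 /css/epic.css 204
```

Compared with `line.split()[6]`, this also checks that a line has the expected shape and returns None for lines that do not match, instead of silently picking the wrong column.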

The output file is in the following format:

    Number of Times      IPS
     158            111.92.9.222
     11             58.97.187.231
     30             212.57.209.41
     5              119.235.51.66
     3              122.168.134.106
     5              180.234.220.75
     13             115.252.223.243

Here the IP 111.92.9.222 (which appears, for example, in the line 111.92.9.222 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -) sent 158 requests to epic in total.

Now I want to add functionality to the code so that if I pass a particular URL, it returns which IP addresses accessed that URL and how many times each of them did (the IP addresses can come either from the log file or from the output file).

For example, if I pass the URL http://www.epicbrowser.com/hrefadd.xml as input, the output should be in the following format:

     10.10.128.134        4
     10.134.222.232       6
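Since every log line already carries the request path, one way to answer "which IPs accessed this URL, and how often" for any URL is to count (path, IP) pairs in a single pass and then look up the path you care about. A minimal sketch, assuming field 6 is the request path as in the sample lines; count_ips_per_path is a hypothetical name:

```python
from collections import Counter, defaultdict


def count_ips_per_path(lines):
    """Map each requested path to a Counter of the IPs that requested it.

    Assumes the layout of the sample log: after splitting on whitespace,
    field 0 is the IP address and field 6 is the request path.
    """
    per_path = defaultdict(Counter)
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        per_path[fields[6]][fields[0]] += 1
    return per_path


if __name__ == "__main__":
    sample = [
        '220.227.40.118 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -',
        '59.95.13.217 - - [06/Mar/2012:00:00:00 -0800] "GET /dbupdates2.xml HTTP/1.1" 404 0 - -',
        '117.18.231.5 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -',
        '117.18.231.5 - - [06/Mar/2012:00:00:01 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -',
    ]
    table = count_ips_per_path(sample)
    # .get() avoids adding an empty entry for a path that never occurred.
    for ip, count in table.get("/hrefadd.xml", Counter()).most_common():
        print("%s\t%5d" % (ip, count))
```

With a real file you would pass the open file object instead of the list, e.g. `with open(infilename) as f: table = count_ips_per_path(f)`. This keeps one counter per distinct path in memory, trading memory for the ability to query many URLs after a single pass over the log.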

1 Answer

    Editorial Team
    Answered: May 31, 2026 at 6:50 am

    I assume that you want the IP addresses for only one given URL. In that case you just have to add a filter to the program that skips the unwanted lines; the structure of the program can stay unchanged.

    Because the log lines do not contain the host name, you have to pass only the path part of the URL as the third parameter, for example "/hrefadd.xml".

    #!/usr/bin/env python
    #
    # Counts the IP addresses of a log file that requested a given URL path.
    #
    # Assumption: the IP address is logged in the first column.
    # Example line: 117.195.185.130 - - [06/Mar/2012:00:00:00 -0800] \
    #    "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
    #

    import sys

    def urlcheck(line, url):
        '''Checks if url is the requested path of the log line.
           The path is the 7th space-separated column.'''
        lsplit = line.split()
        if len(lsplit) < 7:
            return False
        return url == lsplit[6]

    def extract_ip(line):
        '''Extracts the IP address from the line.
           Currently it is assumed that the IP address is logged in
           the first column and the columns are space separated.'''
        return line.split()[0]

    def increase_count(ip_dict, ip_addr):
        '''Increases the count of the IP address.
           If an IP address is not in the given dictionary,
           it is initially created and the count is set to 1.'''
        if ip_addr in ip_dict:
            ip_dict[ip_addr] += 1
        else:
            ip_dict[ip_addr] = 1

    def read_ips(infilename, url):
        '''Reads the IP addresses that requested url from the file and
           counts them in a dictionary; returns the dictionary.'''
        res_dict = {}
        # open() + with works on Python 2 and 3 and closes the file.
        with open(infilename) as log_file:
            for line in log_file:
                if line.isspace():
                    continue
                if not urlcheck(line, url):
                    continue
                ip_addr = extract_ip(line)
                increase_count(res_dict, ip_addr)
        return res_dict

    def write_ips(outfilename, ip_dict):
        '''Writes out the IP addresses and their counts.'''
        with open(outfilename, "w") as out_file:
            # items() exists on Python 2 and 3; iteritems() is Python 2 only.
            for ip_addr, count in ip_dict.items():
                out_file.write("%s\t%5d\n" % (ip_addr, count))

    def parse_cmd_line_args():
        '''Returns the input file name, output file name and URL path.
           If there are more or fewer than three parameters,
           a usage message is printed and the program exits.'''
        if len(sys.argv) != 4:
            print("Usage: %s [infilename] [outfilename] [url]" % sys.argv[0])
            sys.exit(1)
        return sys.argv[1], sys.argv[2], sys.argv[3]

    def main():
        infilename, outfilename, url = parse_cmd_line_args()
        ip_dict = read_ips(infilename, url)
        write_ips(outfilename, ip_dict)

    if __name__ == "__main__":
        main()
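The script above expects only the path (e.g. "/hrefadd.xml") as the third parameter, while the question passes a full URL such as http://www.epicbrowser.com/hrefadd.xml. If you want to accept the full URL, one option is to strip it down with urllib.parse.urlsplit from the Python 3 standard library (on Python 2 the same function lives in the urlparse module). A minimal sketch; url_to_path is a hypothetical helper name, and keeping the query string assumes your log records it as part of the request path:

```python
from urllib.parse import urlsplit


def url_to_path(url):
    """Reduce a full URL to the path part that appears in the log lines.

    'http://www.epicbrowser.com/hrefadd.xml' -> '/hrefadd.xml'.
    A value that is already a bare path is returned unchanged.
    The query string, if any, is kept (assumption: the log records it).
    """
    parts = urlsplit(url)
    path = parts.path or "/"  # a bare host like 'http://host' means '/'
    if parts.query:
        path += "?" + parts.query
    return path


if __name__ == "__main__":
    for arg in ["http://www.epicbrowser.com/hrefadd.xml", "/hrefadd.xml"]:
        print(arg, "->", url_to_path(arg))
```

In the script you would then call `url = url_to_path(url)` in main() right after parse_cmd_line_args(), so that both forms of input work.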
    

    IMHO it would also be helpful if the original post were referenced.

    IMHO you should leave the comments in place.



Related Questions

I have a small python script which i use everyday......it basically reads a file
I have a list which I have obtained from a python script. the content
I have a very simple python script that should scan a text file, which
I have a python script which should parse a file and produce some output
I have a Python script which accepts a XML file as input and then
i have a python script which should invoke a .exe file to get some
I have a Python script which uses a glade file to define its UI,
I have a python script which process a file line by line, if the
I have Python script which retrieves data from an API in JSON format (at
I have a Python script which processes a .txt file which contains report usage


© 2021 The Archive Base. All Rights Reserved