Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 680055
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 14, 20262026-05-14T01:19:43+00:00 2026-05-14T01:19:43+00:00

Problem Hey folks. I’m looking for some advice on python performance. Some background on

  • 0

Problem Hey folks. I’m looking for some advice on python performance. Some background on my problem:

Given:

  1. A (x,y) mesh of nodes each with a value (0...255) starting at 0
  2. A list of N input coordinates each at a specified location within the range (0...x, 0...y)
  3. A value Z that defines the "neighborhood" in count of nodes

Increment the value of the node at the input coordinate and the node’s neighbors. Neighbors beyond the mesh edge are ignored. (No wrapping)

BASE CASE: A mesh of size 1024x1024 nodes, with 400 input coordinates and a range Z of 75 nodes.

Processing should be O(x*y*Z*N). I expect x, y and Z to remain roughly around the values in the base case, but the number of input coordinates N could increase up to 100,000. My goal is to minimize processing time.

Current results Between my start and the comments below, we’ve got several implementations.

Running speed on my 2.26 GHz Intel Core 2 Duo with Python 2.6.1:

  f1: 2.819s
  f2: 1.567s
  f3: 1.593s
   f: 1.579s
 f3b: 1.526s
  f4: 0.978s

f1 is the initial naive implementation: three nested for loops.
f2 is replaces the inner for loop with a list comprehension.
f3 is based on Andrei’s suggestion in the comments and replaces the outer for with map()
f is Chris’s suggestion in the answers below
f3b is kriss’s take on f3
f4 is Alex’s contribution.

Code is included below for your perusal.

Question How can I further reduce the processing time? I’d prefer sub-1.0s for the test parameters.

Please, keep the recommendations to native Python. I know I can move to a third-party package such as numpy, but I’m trying to avoid any third party packages. Also, I’ve generated random input coordinates, and simplified the definition of the node value updates to keep our discussion simple. The specifics have to change slightly and are outside the scope of my question.

thanks much!


**`f1` is the initial naive implementation: three nested `for` loops.**

def f1(x,y,n,z):
    rows = [[0]*x for i in xrange(y)]

    for i in range(n):
        inputX, inputY = (int(x*random.random()), int(y*random.random()))
        topleft = (inputX - z, inputY - z)
        for i in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
            for j in xrange(max(0, topleft[1]), min(topleft[1]+(z*2), y)):
                if rows[i][j] <= 255: rows[i][j] += 1

f2 is replaces the inner for loop with a list comprehension.

def f2(x,y,n,z):
    rows = [[0]*x for i in xrange(y)]

    for i in range(n):
        inputX, inputY = (int(x*random.random()), int(y*random.random()))
        topleft = (inputX - z, inputY - z)
        for i in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
            l = max(0, topleft[1])
            r = min(topleft[1]+(z*2), y)
            rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]

UPDATE: f3 is based on Andrei‘s suggestion in the comments and replaces the outer for with map(). My first hack at this requires several out-of-local-scope lookups, specifically recommended against by Guido: local variable lookups are much faster than global or built-in variable lookups I hardcoded all but the reference to the main data structure itself to minimize that overhead.

rows = [[0]*x for i in xrange(y)]

def f3(x,y,n,z):
    inputs = [(int(x*random.random()), int(y*random.random())) for i in range(n)]
    rows = map(g, inputs)

def g(input):
    inputX, inputY = input
    topleft = (inputX - 75, inputY - 75)
    for i in xrange(max(0, topleft[0]), min(topleft[0]+(75*2), 1024)):
        l = max(0, topleft[1])
        r = min(topleft[1]+(75*2), 1024)
        rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]

UPDATE3: ChristopeD also pointed out a couple improvements.

def f(x,y,n,z):
    rows = [[0] * y for i in xrange(x)]
    rn = random.random
    for i in xrange(n):
        topleft = (int(x*rn()) - z, int(y*rn()) - z)
        l = max(0, topleft[1])
        r = min(topleft[1]+(z*2), y)
        for u in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
            rows[u][l:r] = [j+(j<255) for j in rows[u][l:r]]

UPDATE4: kriss added a few improvements to f3, replacing min/max with the new ternary operator syntax.

def f3b(x,y,n,z):
    rn = random.random    
    rows = [g1(x, y, z) for x, y in [(int(x*rn()), int(y*rn())) for i in xrange(n)]]

def g1(x, y, z):
    l = y - z if y - z > 0 else 0
    r = y + z if y + z < 1024 else 1024
    for i in xrange(x - z if x - z > 0 else 0, x + z if x + z < 1024 else 1024 ):
        rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]

UPDATE5: Alex weighed in with his substantive revision, adding a separate map() operation to cap the values at 255 and removing all non-local-scope lookups. The perf differences are non-trivial.

def f4(x,y,n,z):
    rows = [[0]*y for i in range(x)]
    rr = random.randrange
    inc = (1).__add__
    sat = (0xff).__and__
    
    for i in range(n):
        inputX, inputY = rr(x), rr(y)
        b = max(0, inputX - z)
        t = min(inputX + z, x)
        l = max(0, inputY - z)
        r = min(inputY + z, y)
        for i in range(b, t):
            rows[i][l:r] = map(inc, rows[i][l:r])
    for i in range(x):
      rows[i] = map(sat, rows[i])

Also, since we all seem to be hacking around with variations, here’s my test harness to compare speeds: (improved by ChristopheD)

def timing(f,x,y,z,n):
    fn = "%s(%d,%d,%d,%d)" % (f.__name__, x, y, z, n)
    ctx = "from __main__ import %s" % f.__name__ 
    results = timeit.Timer(fn, ctx).timeit(10)
    return "%4.4s: %.3f" % (f.__name__, results / 10.0)

if __name__ == "__main__":
    print timing(f, 1024, 1024, 400, 75)
    #add more here.
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-14T01:19:44+00:00Added an answer on May 14, 2026 at 1:19 am

    On my (slow-ish;-) first-day Macbook Air, 1.6GHz Core 2 Duo, system Python 2.5 on MacOSX 10.5, after saving your code in op.py I see the following timings:

    $ python -mtimeit -s'import op' 'op.f1()'
    10 loops, best of 3: 5.58 sec per loop
    $ python -mtimeit -s'import op' 'op.f2()'
    10 loops, best of 3: 3.15 sec per loop
    

    So, my machine is slower than yours by a factor of a bit more than 1.9.

    The fastest code I have for this task is:

    def f3(x=x,y=y,n=n,z=z):
        rows = [[0]*y for i in range(x)]
        rr = random.randrange
        inc = (1).__add__
        sat = (0xff).__and__
    
        for i in range(n):
            inputX, inputY = rr(x), rr(y)
            b = max(0, inputX - z)
            t = min(inputX + z, x)
            l = max(0, inputY - z)
            r = min(inputY + z, y)
            for i in range(b, t):
                rows[i][l:r] = map(inc, rows[i][l:r])
        for i in range(x):
          rows[i] = map(sat, rows[i])
    

    which times as:

    $ python -mtimeit -s'import op' 'op.f3()'
    10 loops, best of 3: 3 sec per loop
    

    so, a very modest speedup, projecting to more than 1.5 seconds on your machine – well above the 1.0 you’re aiming for:-(.

    With a simple C-coded extensions, exte.c…:

    #include "Python.h"
    
    static PyObject*
    dopoint(PyObject* self, PyObject* args)
    {
        int x, y, z, px, py;
        int b, t, l, r;
        int i, j;
        PyObject* rows;
    
        if(!PyArg_ParseTuple(args, "iiiiiO",
                             &x, &y, &z, &px, &py, &rows
            ))
            return 0;
    
        b = px - z;
        if (b < 0) b = 0;
        t = px + z;
        if (t > x) t = x;
        l = py - z;
        if (l < 0) l = 0;
        r = py + z;
        if (r > y) r = y;
    
        for(i = b; i < t; ++i) {
            PyObject* row = PyList_GetItem(rows, i);
            for(j = l; j < r; ++j) {
                PyObject* pyitem = PyList_GetItem(row, j);
                long item = PyInt_AsLong(pyitem);
                if (item < 255) {
                    PyObject* newitem = PyInt_FromLong(item + 1);
                    PyList_SetItem(row, j, newitem);
                }
            }
        }
    
        Py_RETURN_NONE;
    }
    
    static PyMethodDef exteMethods[] = {
        {"dopoint", dopoint, METH_VARARGS, "process a point"},
        {0}
    };
    
    void
    initexte()
    {
        Py_InitModule("exte", exteMethods);
    }
    

    (note: I haven’t checked it carefully — I think it doesn’t leak memory due to the correct interplay of reference stealing and borrowing, but it should be code inspected very carefully before being put in production;-), we could do

    import exte
    def f4(x=x,y=y,n=n,z=z):
        rows = [[0]*y for i in range(x)]
        rr = random.randrange
    
        for i in range(n):
            inputX, inputY = rr(x), rr(y)
            exte.dopoint(x, y, z, inputX, inputY, rows)
    

    and the timing

    $ python -mtimeit -s'import op' 'op.f4()'
    10 loops, best of 3: 345 msec per loop
    

    shows an acceleration of 8-9 times, which should put you in the ballpark you desire. I’ve seen a comment saying you don’t want any third-party extension, but, well, this tiny extension you could make entirely your own;-). ((Not sure what licensing conditions apply to code on Stack Overflow, but I’ll be glad to re-release this under the Apache 2 license or the like, if you need that;-)).

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Hey, folks. I'm looking for some regular expressions to help grab street addresses and
Hey looking for some help on this problem. Currently I'm developing an iPhone application
Hey I have a problem comparing the value of a CGPoint (struct with two
Hey folks, ok so this is my problem, I need to display HTML code
hey I'm looking for are clean solution to this problem: i start the loop
Hey so i managed to locate the problem in some graphics math i am
Hey, I'm having some trouble with this problem, and I don't even know where
Hey Folks am seemingly having problems with .htaccess files. The problem: I have a
Hey I Have some Problem to Find Exact Imotion(Facebook) for That Smily Code.. I
Hey here is my problem : i made a flash and i can`t change

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.