The Archive Base Latest Questions

Editorial Team
Asked: June 3, 2026

I was trying to optimize a program I’m tinkering with when I noticed that value = i % 65536 appeared to run slower than value = i % (2**16).

To test this, I ran the following program:

import cProfile
import pstats

AMOUNT = 100000000

def test1():
    for i in xrange(AMOUNT):
        value = i % 65536
    return

def test2():
    for i in xrange(AMOUNT):
        value = i % (256**2)
    return

def test3():
    for i in xrange(AMOUNT):
        value = i % (16**4)
    return

def test4():
    for i in xrange(AMOUNT):
        value = i % (4**8)
    return

def test5():
    for i in xrange(AMOUNT):
        value = i % (2**16)
    return

def run_tests():
    test1()
    test2()
    test3()
    test4()
    test5()
    return

if __name__ == '__main__':
    cProfile.run('run_tests()', 'results')
    stats = pstats.Stats('results')
    stats.sort_stats('calls', 'nfl')
    stats.print_stats()

…which produced the following output:

Fri May 11 15:11:59 2012    results

         8 function calls in 40.473 seconds

   Ordered by: call count, name/file/line

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000   40.473   40.473 <string>:1(<module>)
        1    0.000    0.000   40.473   40.473 test.py:31(run_tests)
        1   10.466   10.466   10.466   10.466 test.py:6(test1)
        1    7.475    7.475    7.475    7.475 test.py:11(test2)
        1    7.485    7.485    7.485    7.485 test.py:16(test3)
        1    7.539    7.539    7.539    7.539 test.py:21(test4)
        1    7.508    7.508    7.508    7.508 test.py:26(test5)

Using 65536 was the slowest at 10.466 seconds, while 256**2 was the fastest at 7.475 seconds (with the other exponent forms falling in between). Granted, the difference is only noticeable over a very large number of iterations, but I’m still curious as to why it occurs.

Why is taking the mod of a number by 65536 slower than taking the mod using exponents? They should evaluate to the same number, and I would have thought that it would take longer for the Python interpreter to fully evaluate the exponents before taking the mod.

By extension, is it generally more efficient to use powers of two in Python expressions rather than typing the number out in full? And does this pattern hold for operations besides modulus, or for numbers besides 2?

(btw, I’m using Python 2.7.2 (32-bit), and I ran the above on a 64-bit Windows 7 laptop).

EDIT:
So I tried reversing the order of the functions I call, and now the opposite is true. It looks like whatever the first function is in run_tests will always run a bit slower when using cProfile, which is weird. So, lesson learned, I guess — profilers are weird 😀
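
A minimal sketch of the order-swapping check described in the edit, assuming Python 3 (so range instead of xrange) and plain wall-clock timing instead of cProfile. The names mod_literal, mod_power and time_in_order are illustrative, not from the original program, and AMOUNT is cut down so the sketch finishes quickly:

```python
import time

AMOUNT = 200000  # assumption: much smaller than the original 100000000


def mod_literal():
    for i in range(AMOUNT):
        value = i % 65536


def mod_power():
    for i in range(AMOUNT):
        value = i % (2**16)


def time_in_order(funcs):
    """Run each function once, in the given order; return {name: seconds}."""
    timings = {}
    for func in funcs:
        start = time.perf_counter()
        func()
        timings[func.__name__] = time.perf_counter() - start
    return timings


# Same two functions, measured in both call orders.
forward = time_in_order([mod_literal, mod_power])
backward = time_in_order([mod_power, mod_literal])
print("forward: ", forward)
print("backward:", backward)
```

If whichever function runs first comes out slower in both orders, that would point to call order rather than to how the constant is written.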

1 Answer

  1. Editorial Team
    Added an answer on June 3, 2026 at 7:08 pm

    There is no difference in the generated bytecode: the compiler evaluates the constant expression 2**16 at compile time and stores the result, 65536, as a constant. That means the gap in your test results is a measurement artifact rather than a real effect (try timing the functions in a different order!).

    >>> import dis
    >>> dis.dis(test1)
      2           0 SETUP_LOOP              30 (to 33)
                  3 LOAD_GLOBAL              0 (xrange)
                  6 LOAD_GLOBAL              1 (AMOUNT)
                  9 CALL_FUNCTION            1
                 12 GET_ITER            
            >>   13 FOR_ITER                16 (to 32)
                 16 STORE_FAST               0 (i)
    
      3          19 LOAD_FAST                0 (i)
                 22 LOAD_CONST               1 (65536)
                 25 BINARY_MODULO       
                 26 STORE_FAST               1 (value)
                 29 JUMP_ABSOLUTE           13
            >>   32 POP_BLOCK           
    
      4     >>   33 LOAD_CONST               0 (None)
                 36 RETURN_VALUE        
    >>> dis.dis(test5)
      2           0 SETUP_LOOP              30 (to 33)
                  3 LOAD_GLOBAL              0 (xrange)
                  6 LOAD_GLOBAL              1 (AMOUNT)
                  9 CALL_FUNCTION            1
                 12 GET_ITER            
            >>   13 FOR_ITER                16 (to 32)
                 16 STORE_FAST               0 (i)
    
      3          19 LOAD_FAST                0 (i)
                 22 LOAD_CONST               3 (65536)
                 25 BINARY_MODULO       
                 26 STORE_FAST               1 (value)
                 29 JUMP_ABSOLUTE           13
            >>   32 POP_BLOCK           
    
      4     >>   33 LOAD_CONST               0 (None)
                 36 RETURN_VALUE        
    

    (Well, actually there is one difference: the number is stored at a different index in the constant table, 1 versus 3 in the LOAD_CONST lines above. I can’t imagine this causing any measurable difference, though.)
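
The constant table mentioned above can also be inspected directly. This is a small sketch, not part of the original test program: the function names mod_literal and mod_power are made up for illustration, and the exact contents and ordering of the table vary between CPython versions.

```python
def mod_literal(i):
    return i % 65536


def mod_power(i):
    return i % (2**16)


# co_consts is the per-function constant table that LOAD_CONST indexes into.
literal_consts = mod_literal.__code__.co_consts
power_consts = mod_power.__code__.co_consts

print(literal_consts)
print(power_consts)

# Both tables hold the already-computed value; where it sits, and what else
# is left in the table, depends on the interpreter version.
print(65536 in literal_consts, 65536 in power_consts)
```

Because the power is computed once at compile time, the loop body only ever loads a ready-made 65536, whichever way the source spells it.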

    For completeness, here’s a proper test that uses the timeit module:

    import timeit
    
    setup = "i = 1337"
    
    best1 = best2 = float("inf")
    for _ in range(5000):
      best1 = min(best1, timeit.timeit("i % 65536", setup=setup, number=10000))
    for _ in range(5000):
      best2 = min(best2, timeit.timeit("i % (2**16)", setup=setup, number=10000))
    print best1
    print best2
    

    Note that I am measuring the minimum time needed, rather than the average. If a run takes longer, that just means the process was interrupted more often, because the code depends on nothing but the speed of your CPU.
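
To see why the minimum is the steadier number, the same measurement can be sketched with timeit.repeat, which returns one total per run so that best and mean can be compared side by side. The repeat count of 20 is an arbitrary choice for illustration, not a value from the test above.

```python
import timeit

setup = "i = 1337"

# timeit.repeat runs the whole measurement `repeat` times and returns a list
# of totals, each one the time in seconds for `number` executions.
runs_literal = timeit.repeat("i % 65536", setup=setup, repeat=20, number=10000)
runs_power = timeit.repeat("i % (2**16)", setup=setup, repeat=20, number=10000)

for label, runs in (("i % 65536", runs_literal), ("i % (2**16)", runs_power)):
    best = min(runs)                # least-disturbed run
    mean = sum(runs) / len(runs)    # pulled upward by interrupted runs
    print("%-12s best=%.6f s  mean=%.6f s" % (label, best, mean))
```

The mean absorbs whatever else the machine was doing during the slower runs, so the two best values are the ones worth comparing.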
