The Archive Base

Editorial Team
Asked: May 30, 2026 (2026-05-30T13:49:43+00:00)

I am comparing the performance of numpy and Matlab, and in several cases I have observed that numpy is significantly slower (indexing, simple operations on arrays such as absolute value, multiplication, sum, etc.). Consider the following rather striking example involving the function digitize (which I plan to use for synchronizing timestamps):

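import numpy as np
import time

# 1e6 sample values and 1e5 monotonically increasing bin edges
scale = np.arange(1, 1e+6 + 1)
y = np.arange(1, 1e+6 + 1, 10)

t1 = time.time()
ind = np.digitize(scale, y)  # index of the bin each value falls into
t2 = time.time()
print('Time passed is %2.2f seconds' % (t2 - t1))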

The result is:

Time passed is 55.91 seconds

Let’s now try the same example in Matlab, using the equivalent function histc:

scale=[1:1e+6];
y=[1:10:1e+6];
tic
[N,bin]=histc(scale,y);
t=toc;
display(['Time passed is ',num2str(t), ' seconds'])

The result is:

Time passed is 0.10237 seconds

That’s roughly 550 times faster!

As I’m learning to extend Python with C++, I implemented my own version of digitize (using the Boost libraries for the extension):

import analysis  # my C++ module implementing digitize
t1 = time.time()
ind2 = analysis.digitize(scale, y)
t2 = time.time()
print('Time passed is %2.2f seconds' % (t2 - t1))
np.all(ind == ind2)  # ok

The result is:

Time passed is 0.02 seconds

There is a bit of cheating here: my version of digitize assumes that all inputs are monotonic, which might explain why it is even faster than Matlab. However, sorting an array of size 1e+6 takes 0.16 seconds (with numpy.sort), so once sorting is included my function would take about 0.18 seconds and be slower than the Matlab function histc (by a factor of approx 1.8).

So the questions are:

  • Why is numpy.digitize so slow? Is this function not supposed to be written in compiled and optimized code?
  • Why is my own version of digitize much faster than numpy.digitize, but still slower than Matlab (I am quite confident I use the fastest algorithm possible, given that I assume inputs are already sorted)?

I am using Fedora 16 and I recently installed the ATLAS and LAPACK libraries (but there has been no change in performance). Should I perhaps rebuild numpy? I am not sure if my installation of numpy uses the appropriate libraries to gain maximum speed; perhaps Matlab is using better libraries.

Update

Based on the answers so far, I would like to stress that the Matlab function histc is not equivalent to numpy.histogram if someone (like me in this case) does not care about the histogram. I need the second output of histc, which is a mapping from input values to the index of the provided input bins. Such an output is provided by the numpy functions digitize and searchsorted. As one of the answers says, searchsorted is much faster than digitize. However, searchsorted is still slower than Matlab by a factor of 2:

t1 = time.time()
ind3 = np.searchsorted(y, scale, "right")
t2 = time.time()
print('Time passed is %2.2f seconds' % (t2 - t1))

np.all(ind == ind3)  # ok

The result is:

Time passed is 0.21 seconds
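
For illustration, the equivalence between the two calls can be checked on a small input. This is only a minimal sketch: the array values are made up, and note that the argument order differs (digitize takes the values first, searchsorted takes the sorted bins first).

```python
import numpy as np

# Small illustrative inputs (values are made up for this sketch).
bins = np.array([1.0, 11.0, 21.0, 31.0])        # sorted, increasing bin edges
values = np.array([1.0, 5.0, 11.0, 20.5, 35.0])

# digitize with the default right=False returns i such that
# bins[i-1] <= x < bins[i]; searchsorted with side="right" returns the
# insertion point after any equal entries, which is the same index.
ind_digitize = np.digitize(values, bins)
ind_searchsorted = np.searchsorted(bins, values, side="right")

print(ind_digitize)      # [1 1 2 2 4]
print(ind_searchsorted)  # [1 1 2 2 4]
```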

So the questions are now:

  1. What is the point of having numpy.digitize when the equivalent function numpy.searchsorted is roughly 270 times faster?

  2. Why is the Matlab function histc (which also provides the output of numpy.searchsorted) 2 times faster than numpy.searchsorted?

1 Answer

  1. Editorial Team
    Added an answer on May 30, 2026 at 1:49 pm

    First, let’s look at why numpy.digitize is slow. If your bins are found to be monotonic, then one of these functions is called for each input value, depending on whether the bins are nondecreasing or nonincreasing (the code for this is found in numpy/lib/src/_compiled_base.c in the numpy git repo):

    static npy_intp
    incr_slot_(double x, double *bins, npy_intp lbins)
    {
        npy_intp i;
    
        for ( i = 0; i < lbins; i ++ ) {
            if ( x < bins [i] ) {
                return i;
            }
        }
        return lbins;
    }
    
    static npy_intp
    decr_slot_(double x, double * bins, npy_intp lbins)
    {
        npy_intp i;
    
        for ( i = lbins - 1; i >= 0; i -- ) {
            if (x < bins [i]) {
                return i + 1;
            }
        }
        return 0;
    }
    

    As you can see, it does a linear search over the bins for every value. With n bins, that costs O(n) comparisons per value instead of the O(log n) a binary search needs, which is why it is so slow. I will open a ticket for this on the numpy tracker.
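
    The difference between the two search strategies can be sketched in pure Python. This is only an illustration, not numpy's code: slot_linear and slot_binary are hypothetical names, the first mirroring incr_slot_ above and the second using the standard bisect module. With 100,000 bin edges, the linear version may need up to 100,000 comparisons per value, the binary version about 17.

```python
import bisect

def slot_linear(x, bins):
    """Linear scan mirroring incr_slot_: first index i with x < bins[i]."""
    for i, edge in enumerate(bins):
        if x < edge:
            return i
    return len(bins)

def slot_binary(x, bins):
    """Same result via binary search (bins must be nondecreasing)."""
    return bisect.bisect_right(bins, x)

# Same bin edges as in the question: 1, 11, 21, ..., 999991.
bins = list(range(1, 1000001, 10))
samples = [0.5, 1, 10.5, 11, 500000, 999991, 2000000]

# Both strategies agree on every sample; only the cost differs.
for x in samples:
    assert slot_linear(x, bins) == slot_binary(x, bins)
```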

    Second, I think that Matlab is actually slower than your C++ code because Matlab assumes only that the bins are monotonically nondecreasing, whereas your code assumes that all inputs are monotonic.
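
    To see why sorted inputs help, here is a minimal sketch of one way a routine could exploit them. It is an assumption for illustration only, not the C++ module from the question and not what Matlab does; digitize_sorted is a hypothetical name. Because the values never decrease, the bin index only moves forward, so the total work is proportional to len(values) + len(bins).

```python
def digitize_sorted(values, bins):
    """Bin indices for nondecreasing `values` against nondecreasing `bins`.

    Hypothetical helper for illustration; it follows the convention of
    incr_slot_ (index of the first bin edge strictly greater than x).
    """
    out = []
    i = 0
    for x in values:
        # Because values never decrease, i never has to move backwards.
        while i < len(bins) and bins[i] <= x:
            i += 1
        out.append(i)
    return out

print(digitize_sorted([1, 5, 11, 20.5, 35], [1, 11, 21, 31]))  # [1, 1, 2, 2, 4]
```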


© 2021 The Archive Base. All Rights Reserved