Editorial Team
Asked: May 28, 2026 (2026-05-28T02:44:52+00:00)

I’d like to create 2D histograms in Python from large datasets (100000+ samples) stored in an HDF5 file. I came up with the following code:

import sys
import h5py
import numpy as np
import matplotlib as mpl
import matplotlib.pylab

f = h5py.File(sys.argv[1], 'r')

A = f['A']
T = f['T']

at_hist, xedges, yedges = np.histogram2d(T, A, bins=500)
extent = [yedges[0], yedges[-1], xedges[0], xedges[-1]]

fig = mpl.pylab.figure()
at_plot = fig.add_subplot(111)

at_plot.imshow(at_hist, extent=extent, origin='lower', aspect='auto')

mpl.pylab.show()

f.close()

It takes about 15s to execute (100000 data points). CERN’s ROOT, however (using its own tree data structure instead of HDF5), can do this in less than 1s. Do you have any idea how I could speed up the code? I could also change the structure of the HDF5 data if that would help.

1 Answer

  1. Editorial Team
    Added an answer on May 28, 2026 at 2:44 am

    I would try a few different things.

    1. Load your data from the HDF5 file into in-memory NumPy arrays instead of passing in h5py datasets, which are effectively memory-mapped arrays.
    2. If that doesn’t fix the problem, you can exploit a scipy.sparse.coo_matrix to make the 2D histogram. With older versions of numpy, digitize (which all of the various histogram* functions use internally) could use excessive memory under some circumstances. This is no longer the case with recent (>1.5??) versions of numpy, though.

    As an example of the first suggestion, you’d do something like:

    f = h5py.File(sys.argv[1], 'r')
    A = np.empty(f['A'].shape, f['A'].dtype)
    T = np.empty(f['T'].shape, f['T'].dtype)
    f['A'].read_direct(A)
    f['T'].read_direct(T)
    

    The difference here is that the arrays are read into memory in their entirety, instead of remaining h5py’s array-like dataset objects, which behave much like memory-mapped arrays stored on disk and go back to the file each time they are accessed.
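As a hedged sketch of the same idea, indexing an h5py dataset with `[...]` also returns an ordinary in-memory NumPy array. The temporary file, its random contents, and the variable names below are assumptions made only so the example runs on its own; the dataset names 'A' and 'T' follow the question.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small throwaway HDF5 file so the example is self-contained.
# The values are random placeholders, not the asker's data.
rng = np.random.default_rng(0)
path = os.path.join(tempfile.mkdtemp(), "example.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("A", data=rng.random(100_000))
    f.create_dataset("T", data=rng.random(100_000))

with h5py.File(path, "r") as f:
    # Indexing with [...] reads the whole dataset into a NumPy array.
    A = f["A"][...]
    T = f["T"][...]

# A and T are in-memory copies, so they remain usable after the file is closed.
hist, xedges, yedges = np.histogram2d(T, A, bins=500)
print(type(A).__name__, hist.shape, int(hist.sum()))
```

Whether this or read_direct is preferable probably depends on whether you already have a preallocated buffer; either way, np.histogram2d then works on plain arrays rather than on dataset objects.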

    As for the second suggestion, don’t try it unless the first suggestion didn’t help your problem.

    It probably won’t be significantly faster (and is likely slower for small arrays), and with recent versions of numpy, it’s only slightly more memory-efficient. I do have a piece of code where I deliberately do this, but I wouldn’t recommend it in general. It’s a very hackish solution. In very select circumstances (many points and many bins), it can perform better than histogram2d, though.

    All those caveats aside, though, here it is:

    import numpy as np
    import scipy.sparse
    import timeit

    def generate_data(num):
        x = np.random.random(num)
        y = np.random.random(num)
        return x, y

    def crazy_histogram2d(x, y, bins=10):
        # Accept either a single bin count or an (nx, ny) pair
        try:
            nx, ny = bins
        except TypeError:
            nx = ny = bins
        xmin, xmax = x.min(), x.max()
        ymin, ymax = y.min(), y.max()
        # Note: the spacing is range / (n - 1), so the binning is not
        # identical to np.histogram2d, which uses n equal-width bins
        dx = (xmax - xmin) / (nx - 1.0)
        dy = (ymax - ymin) / (ny - 1.0)

        weights = np.ones(x.size)

        # Basically, this is just doing what np.digitize does with one less copy
        xyi = np.vstack((x, y)).T
        xyi -= [xmin, ymin]
        xyi /= [dx, dy]
        # Floor in place, then cast to integers for use as row/column indices
        xyi = np.floor(xyi, xyi).T.astype(int)

        # Now, we'll exploit a sparse coo_matrix to build the 2D histogram...
        # (duplicate (row, col) entries are summed when converting to dense)
        grid = scipy.sparse.coo_matrix((weights, (xyi[0], xyi[1])),
                                       shape=(nx, ny)).toarray()

        return grid, np.linspace(xmin, xmax, nx), np.linspace(ymin, ymax, ny)

    if __name__ == '__main__':
        num = 10**6  # must be an int; np.random.random rejects a float size
        numruns = 10
        x, y = generate_data(num)
        t1 = timeit.timeit('crazy_histogram2d(x, y, bins=500)',
                setup='from __main__ import crazy_histogram2d, x, y',
                number=numruns)
        t2 = timeit.timeit('np.histogram2d(x, y, bins=500)',
                setup='from __main__ import np, x, y',
                number=numruns)
        print('Average of %i runs, using %.1e points' % (numruns, num))
        print('Crazy histogram', t1 / numruns, 'sec')
        print('numpy.histogram2d', t2 / numruns, 'sec')
    

    On my system, this yields:

    Average of 10 runs, using 1.0e+06 points
    Crazy histogram 0.104092288017 sec
    numpy.histogram2d 0.686891794205 sec
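The function in the answer spaces its bins by range / (n - 1), so its counts are not directly comparable to np.histogram2d. The following is a minimal sketch of a variant, not the author's code, that keeps the coo_matrix trick but looks up bin indices with np.searchsorted against n + 1 equally spaced edges, so the result can be checked against np.histogram2d. The name coo_histogram2d and the test data are assumptions for illustration, and it only handles a single integer bin count.

```python
import numpy as np
import scipy.sparse

def coo_histogram2d(x, y, bins=10):
    """2D histogram via a sparse COO matrix, binned like np.histogram2d."""
    nx = ny = int(bins)
    xedges = np.linspace(x.min(), x.max(), nx + 1)
    yedges = np.linspace(y.min(), y.max(), ny + 1)

    # side='right' gives half-open bins [lo, hi); the clip moves samples
    # equal to the maximum into the last bin, which is closed on the right.
    xi = np.clip(np.searchsorted(xedges, x, side="right") - 1, 0, nx - 1)
    yi = np.clip(np.searchsorted(yedges, y, side="right") - 1, 0, ny - 1)

    # Duplicate (row, col) pairs are summed when the COO matrix is made
    # dense, which turns the list of bin indices into per-bin counts.
    weights = np.ones(x.size)
    grid = scipy.sparse.coo_matrix((weights, (xi, yi)), shape=(nx, ny)).toarray()
    return grid, xedges, yedges

# Hypothetical test data: uniform random points, as in the answer's benchmark.
rng = np.random.default_rng(0)
x, y = rng.random(10_000), rng.random(10_000)

grid, xedges, yedges = coo_histogram2d(x, y, bins=50)
expected, ex, ey = np.histogram2d(x, y, bins=50)
print("matches np.histogram2d:", np.array_equal(grid, expected))
```

The searchsorted lookup costs more than the floor-and-divide step in the answer, so this variant is meant for checking correctness rather than for reproducing the timings above.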
    