Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 6925779
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 27, 20262026-05-27T10:50:00+00:00 2026-05-27T10:50:00+00:00

I originally asked a related question here , but didn’t really seem to get

  • 0

I originally asked a related question here, but didn’t really seem to get anywhere. Perhaps if I rephrase part of it more specifically it might help….

I have files stored using Matlab’s sparse format (HDF5, csc I
believe), and I’m trying to use Pytables to operate on them directly,
but haven’t succeeded yet. Using h5py I can do the following:

# Method 1: uses h5py (WORKS)
f1 = h5py.File(fname)
data = f1['M']['data']
ir = f1['M']['ir']
jc = f1['M']['jc']
M = scipy.sparse.csc_matrix((data, ir, jc))

but if I try to do the equivalent in Pytables:

# Method 2: uses pyTables (DOESN'T WORK)
f2 = tables.openFile(fname)
data = f2.root.M.data
ir = f2.root.M.ir
jc = f2.root.M.jc
M = scipy.sparse.csc_matrix( (data,ir,jc) )

this fails (after a long wait) with the error:

TypeError                                 Traceback (most recent call last)

/home/tdiethe/BMJ/<ipython console> in <module>()

/usr/lib/python2.6/dist-packages/scipy/sparse/compressed.pyc in __init__(self, arg1, shape, dtype, copy, dims, nzmax)
    56                     self.indices = np.array(indices, copy=copy)
    57                     self.indptr  = np.array(indptr, copy=copy)
---> 58                     self.data    = np.array(data, copy=copy, dtype=getdtype(dtype, data))
    59                 else:
    60                     raise ValueError, "unrecognized %s_matrix constructor usage" %\

/usr/lib/python2.6/dist-packages/scipy/sparse/sputils.pyc in getdtype(dtype, a, default)
    69                 canCast = False
    70             else:
---> 71                 raise TypeError, "could not interpret data type"
    72     else:
    73         newdtype = np.dtype(dtype)

TypeError: could not interpret data type

Looking at f2:

In [63]: f2.root.M.data
Out[63]: 
/M/data (CArray(4753606,), zlib(3)) ''
  atom := Float64Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (8181,)

In [64]: f2.root.M.ir
Out[64]: 
/M/ir (CArray(4753606,), zlib(3)) ''
  atom := UInt64Atom(shape=(), dflt=0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (8181,)

In [65]: f2.root.M.jc
Out[65]: 
/M/jc (CArray(133339,), zlib(3)) ''
  atom := UInt64Atom(shape=(), dflt=0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (7843,)

I have two questions:

  • how do I load this file in using pytables
  • do I need to perform the conversion to a scipy sparse matrix in order to be able to perform operations on it, or can I perform operations directly on the disk files (matrix multiplication etc …) – i.e. without loading the file into memory (if not what’s the point of using pytables?)?
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-27T10:50:01+00:00Added an answer on May 27, 2026 at 10:50 am

    I missed seeing this in your original post but I think your issue is in the design of PyTables, which provides an extra level of abstraction on top of the underlying data.

    Consider the following:

    >>> import tables
    >>> import numpy as np
    
    >>> h5_file = tables.openFile(fname)
    >>> data = f2.root.M.data
    

    At this point data is not a numpy array:

    >>> type(data)
    tables.array.Array
    
    >>> isinstance(data, np.ndarray)
    False
    

    The tables.array.Array does immediately load the underlying array, or immediately expose array-like functionality. This is what led to the error when you tried to use these types of objects to create a sparse array in scipy.

    Instead the data object produced by PyTables is intended to provide access to the data through additional commands (i.e. you did that by using fancy indexing [...]). In this approach you can access parts of the data or all of it by doing data[:] or data.read(). It is only at this point that the familiar numpy array is produced.

    For more information on the tables.array.Array class see http://pytables.github.com/usersguide/libref.html#the-array-class or the Getting actual data section at http://www.pytables.org/moin/HowToUse for examples of accessing underlying data.

    In comparison pyh5 produces much more array-like objects, although still not numpy arrays. Consider:

    >>> import pyh5
    >>> f1 = h5py.File(fname)
    >>> data = f1['M']['data']
    >>> type(data)
    h5py._hl.dataset.Dataset
    >>> isinstance(data, np.ndarray)
    >>> False
    

    However, you can immediately do numpy operations on data like you call to scipy, or more simple operations like np.cos(data) or data + np.arange(len(data)). It also appears that data object has some familiar numpy like attributes (i.e. shape) and that the underlying data (a numpy.ndarray) is stored at data.value. However, I’m not familiar with pyh5 as I haven’t used it myself, so I’m not sure what the limitations are in this regard.

    In general it appears that PyTables and pyh5 have different design goals and therefore should be used in different ways. pyh5 offers a more Numpy-like interface to HDF files, while PyTables offers a lot more complex database like operations. See the discussion of the differences at the pyh5, PyTables docs and the Enthought mailing list:

    • http://code.google.com/p/h5py/wiki/FAQ#What's_the_difference_between_h5py_and_PyTables_?
    • http://www.pytables.org/moin/FAQ#HowdoesPyTablescomparewiththeh5pyproject.3F
    • https://mail.enthought.com/pipermail/enthought-dev/2010-November/027506.html
    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

another memory management question: I have asked this before, but did not really get
I originally asked this question on RefactorMyCode , but got no responses there... Basically
I originally asked this question , but in finding an answer, discovered that my
This question was originally asked in askubuntu.com but got no attention so I think
I originally asked this question on cstheory.stackexchange.com but was suggested to move it to
I asked this question originally on SuperUser , but it looks like I'm much
I originally asked this question on Super User but was told that it might
I originally asked this question on the Highcharts forum a few days ago but
This is an offshoot question that's related to another I asked here . I'm
I originally asked a question along these lines using Regex but was recommended to

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.