This question is about filtering a NumPy ndarray according to some column values.

Question

Asked: June 10, 2026 (2026-06-10T10:34:34+00:00)

This question is about filtering a NumPy ndarray according to some column values.

I have a fairly large NumPy ndarray of shape (300000, 50) and I am filtering it according to values in some specific columns. The array has a dtype with named fields, so I can access each column by name.

The first column is named category_code and I need to filter the matrix to return only rows where category_code is in ("A", "B", "C").

The result would need to be another NumPy ndarray whose columns are still accessible by the dtype names.

Here is what I do now:

index = numpy.asarray([row['category_code'] in ('A', 'B', 'C') for row in data])
filtered_data = data[index]

A list comprehension like:

list = [row for row in data if row['category_code'] in ('A', 'B', 'C')]
filtered_data = numpy.asarray(list)

wouldn’t work because the dtypes I originally had are no longer accessible.

Is there a better / more Pythonic way of achieving the same result?

Something that could look like:

filtered_data = data.where({'category_code': ('A', 'B', 'C')})

Thanks!

1 Answer

Editorial Team · Answer 1 · 2026-06-10T10:34:36+00:00

You can use pandas, a library built on NumPy, whose DataFrame gives you labeled columns and more convenient row filtering than a plain ndarray:

>>> # import the library
>>> import pandas as PD

Create some sample data as a Python dictionary whose keys are the column names and whose values are the column contents as Python lists, one key/value pair per column:

>>> data = {'category_code': ['D', 'A', 'B', 'C', 'D', 'A', 'C', 'A'], 
            'value':[4, 2, 6, 3, 8, 4, 3, 9]}

>>> # convert to a Pandas 'DataFrame'
>>> D = PD.DataFrame(data)

Returning just the rows in which category_code is either B or C takes two conceptual steps, though they can easily be combined into a single line:

>>> # step 1: create the index (a boolean mask, one True/False per row)
>>> idx = (D.category_code == 'B') | (D.category_code == 'C')

>>> # then filter the data against that index:
>>> D.loc[idx]

        category_code  value
   2             B      6
   3             C      3
   6             C      3
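
When more than two categories are wanted, chaining == comparisons with | gets unwieldy. A minimal sketch of the same filter using the pandas Series.isin method, reusing the sample data above (the names df, mask and filtered are illustrative only):

```python
import pandas as pd

# Same sample data as above
data = {'category_code': ['D', 'A', 'B', 'C', 'D', 'A', 'C', 'A'],
        'value': [4, 2, 6, 3, 8, 4, 3, 9]}
df = pd.DataFrame(data)

# Boolean mask: True where category_code is one of the wanted values
mask = df['category_code'].isin(['B', 'C'])

# Keep only the rows where the mask is True; row labels are preserved
filtered = df[mask]
```

Extending the filter to A, B and C then only means adding 'A' to the list passed to isin.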

Note the difference between indexing in pandas and in NumPy, the library upon which pandas is built. With a two-dimensional NumPy array, you place the index directly inside the brackets, separating the dimensions with a “,” and using “:” to indicate that you want all of the values (columns) in the other dimension:

>>> arr[idx, :]   # arr stands for a hypothetical 2-D ndarray, not the DataFrame D

In pandas, you use the data frame’s loc indexer, and place only the index inside the brackets:

>>> D.loc[idx]
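
If staying in NumPy is preferable, the same kind of boolean mask can be built without a Python-level loop. The following is a rough sketch, assuming the data is a structured array with a category_code field; the small two-field array here is a made-up stand-in for the real (300000, 50) data. It uses numpy.isin (available since NumPy 1.13), and the filtered result keeps the original dtype, so the columns stay accessible by name:

```python
import numpy as np

# Hypothetical structured array standing in for the real data
data = np.array(
    [('D', 4), ('A', 2), ('B', 6), ('C', 3),
     ('D', 8), ('A', 4), ('C', 3), ('A', 9)],
    dtype=[('category_code', 'U1'), ('value', 'i8')],
)

# Vectorized membership test on the named column
mask = np.isin(data['category_code'], ['A', 'B', 'C'])

# Boolean indexing returns a structured array with the same dtype
filtered_data = data[mask]

# Fields are still accessible by name after filtering
values = filtered_data['value']
```

This replaces the per-row list comprehension from the question with a single vectorized call, which is usually noticeably faster on arrays of this size.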
