I’m using the excellent read_csv() function from pandas, which gives: In [31]: data =


Asked: June 18, 2026


I’m using the excellent read_csv() function from pandas, which gives:

In [31]: data = pandas.read_csv("lala.csv", delimiter=",")

In [32]: data
Out[32]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 12083 entries, 0 to 12082
Columns: 569 entries, REGIONC to SCALEKER
dtypes: float64(51), int64(518)

But when I apply a function from scikit-learn, I lose the information about the columns:

from sklearn import preprocessing
preprocessing.scale(data)

gives a plain NumPy array.

Is there a way to apply scikit-learn or NumPy functions to DataFrames without losing this information?
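A minimal sketch of where the labels go, assuming a reasonably recent pandas and using a small made-up frame in place of lala.csv. Element-wise NumPy ufuncs such as np.sqrt dispatch back to pandas and return a labelled DataFrame, while anything that first converts its input to a plain array (as scikit-learn's preprocessing functions generally do) hands back an ndarray that carries only values and shape.

```python
import numpy as np
import pandas as pd

# Small stand-in for the CSV data; the values are made up,
# the column names are borrowed from the question's output.
df = pd.DataFrame(
    [[1.0, 4.0], [9.0, 16.0]],
    index=["first", "second"],
    columns=["REGIONC", "SCALEKER"],
)

# Element-wise NumPy ufuncs return a DataFrame with the same labels.
roots = np.sqrt(df)
print(type(roots).__name__)   # DataFrame
print(list(roots.columns))    # ['REGIONC', 'SCALEKER']

# Converting to a plain array keeps only the values and the shape;
# index and column labels are gone.
arr = np.asarray(df)
print(type(arr).__name__)     # ndarray
print(arr.shape)              # (2, 2)
```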


1 Answer

Editorial Team · Answer 1 · 2026-06-18T16:18:25+00:00

A (slightly naive) way would be to store the structure of your data frame, i.e. its columns and index, separately, and then create a new data frame from your preprocessed results like so:

# assumes NumPy was imported earlier with: import numpy as np
In [15]: data = np.zeros((2,2))

In [16]: data
Out[16]: 
array([[ 0.,  0.],
       [ 0.,  0.]])

In [17]: from pandas import DataFrame

In [21]: df  = DataFrame(data, index = ['first', 'second'], columns=['c1','c2'])

In [22]: df
Out[22]: 
        c1  c2
first    0   0
second   0   0

In [26]: i = df.index

In [27]: c = df.columns

# generate new data (a NumPy array of the same shape) as a stand-in for the preprocessed result
In [29]: df  = DataFrame(np.random.rand(2,2), index=i, columns=c)

In [30]: df
Out[30]: 
              c1        c2
first   0.821354  0.936703
second  0.138376  0.482180

As you can see in Out[22], we start off with a data frame, and then in In [29] we put new data inside the frame while keeping the original index and column labels. This assumes your preprocessing does not reorder, add, or drop any rows or columns of the data.
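The same idea can be wrapped in a small helper. This is a sketch, not a pandas or scikit-learn API: apply_keeping_labels and standardize are hypothetical names. standardize is a NumPy stand-in for preprocessing.scale (column-wise zero mean, unit variance with ddof=0); unlike scale it does not special-case constant columns, which would come out as NaN here. The shape check enforces the assumption above that rows and columns are neither added nor dropped; it cannot detect a reordering.

```python
import numpy as np
import pandas as pd


def apply_keeping_labels(df, func):
    """Hypothetical helper: apply an array -> array function, then
    reattach the original index and column labels.

    Assumes `func` returns an array with the same shape and the same
    row/column order as its input (e.g. a scaling step).
    """
    result = np.asarray(func(df.to_numpy()))
    if result.shape != df.shape:
        raise ValueError(
            f"func changed the shape from {df.shape} to {result.shape}; "
            "the original index/columns cannot be reattached"
        )
    # Rebuild the frame from the stored structure: same index, same columns.
    return pd.DataFrame(result, index=df.index, columns=df.columns)


def standardize(x):
    # Hypothetical stand-in for preprocessing.scale:
    # zero mean and unit (population, ddof=0) std per column.
    return (x - x.mean(axis=0)) / x.std(axis=0)


df = pd.DataFrame(
    {"c1": [1.0, 2.0, 3.0], "c2": [10.0, 20.0, 60.0]},
    index=["first", "second", "third"],
)
scaled = apply_keeping_labels(df, standardize)
print(scaled)
```

With scikit-learn installed, the call from the question would become apply_keeping_labels(data, preprocessing.scale), since scale accepts and returns a 2-D array of the same shape.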
