The Archive Base

Editorial Team
Asked: June 12, 2026


I have performed some simple z-transforms on some variables in a pandas DataFrame. Of the 216 columns in the DataFrame, I transformed 197 and then concatenated those 197 onto the original 216, for a total of 413 columns.

Then I used the to_csv function to write the new DataFrame to a CSV file. The original data is about 300MB, while the new dataset is 1.2GB. It seems odd that adding fewer than double the number of columns leads to roughly a 4x increase in the size of the final file.

The code is:

import pandas as pd

full_data = pd.read_csv('data.csv')

# Select the columns to standardize (skip the first 16 and the last 2).
names = full_data.columns.tolist()
names = names[16:-2]
len(names)  # 197 as expected

# z-transform: subtract each column's mean and divide by its standard deviation.
transform = (full_data[names] - full_data[names].mean()) / full_data[names].std()  # 197 columns as expected

# Give the transformed columns a '_standardized' suffix.
column_names = transform.columns.tolist()

new_names = {}
for name in column_names:
    new_names[name] = name + '_standardized'

transform = transform.rename(columns=new_names)

# Append the transformed columns to the original data and write everything out.
to_concat = [full_data, transform]

final_data = pd.concat(to_concat, axis=1)

final_data.to_csv('transformed_data.csv', index=False)

Everything looks fine in the first row of the data. Also, the number of rows is the same across all three DataFrames.

Am I missing something? Is there a more efficient way to write DataFrames to CSV files?

1 Answer

  1. Editorial Team
    Added an answer on June 12, 2026 at 8:20 pm

    A CSV file stores the string representation of each value, so its size won’t necessarily scale in an obvious way with the number of columns unless every column takes up roughly the same number of characters. It’s quite plausible for your CSV to grow a lot if your original data had only a few decimal places. If you read in numbers like 0.1, 0.2, 3, or 1.7 and then z-scale them, you’re likely to get results with many decimal places. As a simple example, I did this:

    >>> import pandas
    >>> df = pandas.DataFrame([[2, 3, 5]], columns=["A", "B", "C"])
    >>> df
       A  B  C
    0  2  3  5
    >>> df.to_csv('someCSV.csv')
    >>> df**0.5
              A         B         C
    0  1.414214  1.732051  2.236068
    >>> (df**0.5).to_csv('someCSV2.csv')
    

    I didn’t add any rows or columns to the data at all, just took the square root, but the second CSV is about 4 times the size of the first. The console rounds the display to six decimal places, but to_csv writes each float at full precision by default (up to 17 significant digits), so every value takes many more bytes to write out in string form. You’re likely to get similarly long decimals when you divide by the standard deviation.
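
If you want to check this on your own data, and cut the size down, something along these lines may help. This is only a sketch: the DataFrame here is made-up stand-in data (five columns of one-decimal values), and the four decimal places passed to float_format are an arbitrary choice, not a recommendation. Whether rounding is acceptable depends on how much precision you need downstream.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: values with a single decimal place, e.g. 0.1, 3.4, 9.9.
rng = np.random.default_rng(0)
raw = pd.DataFrame(rng.integers(0, 100, size=(1000, 5)) / 10,
                   columns=list("ABCDE"))

# Same z-transform as in the question.
z = (raw - raw.mean()) / raw.std()

# With no path argument, to_csv returns the CSV text as a string,
# so len() gives its size in characters without touching the disk.
raw_size = len(raw.to_csv(index=False))
full_size = len(z.to_csv(index=False))

# float_format limits how many digits are written for each float.
rounded_size = len(z.to_csv(index=False, float_format="%.4f"))

print("raw:", raw_size)
print("z-scored, full precision:", full_size)
print("z-scored, 4 decimals:", rounded_size)
```

The same float_format argument can be added to the to_csv call that writes transformed_data.csv. The rounded file will still be larger than the original, since it has more columns, but each standardized value takes far fewer characters.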
