The Archive Base

Editorial Team
Asked: June 11, 2026

I run Matlab R2011b and R version 2.13.1 on Linux Mint v12 with 16 GB of RAM.

I have a csv file. The header and first 5 rows are:

#RIC,Date[G],Time[G],GMT Offset,Type,Price,Volume
DAEG.OQ,07-JUL-2011,15:10:03.424,-4,Trade,1.68,1008
DAEG.OQ,07-JUL-2011,15:10:03.424,-4,Trade,1.68,1008
DAEG.OQ,07-JUL-2011,15:10:03.424,-4,Trade,1.66,300
DAEG.OQ,07-JUL-2011,15:10:03.424,-4,Trade,1.65,1000
DAEG.OQ,07-JUL-2011,15:10:03.464,-4,Trade,1.65,3180

The file is large (approx 900MB). Given the mix of character and numeric data, one might read this file into Matlab as follows:

fid1 = fopen('/home/MyUserName/Temp/X.csv');
D = textscan(fid1, '%s%s%s%f%s%f%f', 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid1);

Although the file is 900MB, when running the above code, System Monitor indicates my RAM usage jumps from about 2GB to 10GB. Worse, if I attempt this same procedure with a slightly larger csv file (about 1.2 GB) my RAM maxes out at 16GB and Matlab never manages to finish reading in the data (it just stays stuck in “busy” mode).

If I wanted to read the same file into R, I might use:

D <- read.csv("/home/MyUserName/Temp/X.csv", stringsAsFactors=FALSE)

This takes a bit longer than Matlab, but system monitor indicates my RAM usage only jumps from 2GB to 3.3GB (much more reasonable given the original file size).

My question has two parts:

1) Why is textscan such a memory hog in this scenario?

2) Is there another approach I could use to get a 1.2GB csv file of this type into Matlab on my system without maxing out the RAM?

EDIT: Just to clarify, I’m curious whether a Matlab-only solution exists, i.e. I’m not interested in a solution that uses a different language to break the csv file into smaller chunks (as this is what I’m already doing). Sorry Trav1s, I should have made this clear from the start.

1 Answer

  1. Editorial Team
    Added an answer on June 11, 2026 at 9:10 pm

    The problem is probably that those “%s” fields are being read into Matlab cellstrs, which are a memory-inefficient data structure for low-cardinality strings. Cellstrs are lousy for big tabular data like this: each string ends up stored in its own primitive char array, each with some 400 bytes of overhead, plus fragmentation issues. Your 900MB file looks like about 18 million rows; at 4 strings per row, that’s about 10-20 GB of cellstrs to hold those strings. Ugh.
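
One rough way to see the per-string cost is to store the same short strings once as a cellstr and once as a 2-D char array, then compare what `whos` reports. This is only a sketch: the variable names are illustrative, the exact figure depends on the Matlab version and platform, and `whos` does not account for allocator fragmentation.

```matlab
% Build a cellstr holding 100,000 copies of a short ticker string.
n = 100000;
c = repmat({'DAEG.OQ'}, n, 1);

% The same strings stored as one 2-D char array (one row per string).
m = repmat('DAEG.OQ', n, 1);

infoCell = whos('c');
infoChar = whos('m');

bytesPerCellString = infoCell.bytes / n;   % character payload plus per-cell header
bytesPerCharRow    = infoChar.bytes / n;   % character payload only (2 bytes per char)

fprintf('cellstr: %.0f bytes per string\n', bytesPerCellString);
fprintf('char   : %.0f bytes per string\n', bytesPerCharRow);
```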

    What you want is to convert those strings into compact primitive datatypes as they come in, instead of slurping all 18 million rows into bulky cell strings at once: the dates and timestamps as datenums or whatever numeric representation you’re using, and the low-cardinality strings either as 2-D char arrays or as some equivalent of a categorical variable. (Given your data set size, you probably want those strings represented as simple numeric identifiers with a lookup table, not chars.)
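
As a minimal sketch of those compact forms, using values copied from the sample rows in the question: `unique` gives a lookup table plus integer codes, dates are converted once per distinct string, and times become seconds since midnight. All variable names are illustrative, and seconds-of-day is just one possible numeric representation.

```matlab
% Sample column values taken from the rows in the question.
ric   = {'DAEG.OQ'; 'DAEG.OQ'; 'DAEG.OQ'};
dates = {'07-JUL-2011'; '07-JUL-2011'; '07-JUL-2011'};
times = {'15:10:03.424'; '15:10:03.424'; '15:10:03.464'};

% Low-cardinality strings: keep each distinct value once, plus integer codes.
[ricNames, ~, ricCode] = unique(ric);
ricCode = uint32(ricCode(:));            % ricNames(ricCode) recovers the column

% Dates: convert only the distinct strings, then expand through the codes.
[dateStrs, ~, dateCode] = unique(dates);
monthNames = 'JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC';
dayNum = zeros(numel(dateStrs), 1);
for k = 1:numel(dateStrs)
    s  = dateStrs{k};                                    % 'dd-MMM-yyyy'
    mo = (strfind(monthNames, upper(s(4:6))) + 2) / 3;   % month number 1..12
    dayNum(k) = datenum(str2double(s(8:11)), mo, str2double(s(1:2)));
end
dateNum = dayNum(dateCode(:));

% Times: one sscanf call over the joined strings gives an N-by-3 [h m s] matrix.
hms = sscanf(sprintf('%s:', times{:}), '%f:%f:%f:', [3 Inf])';
secOfDay = hms * [3600; 60; 1];          % seconds since midnight
```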

    Once you’ve decided on your compact data structure, there are a couple of approaches to loading it in. You could just break the read into chunks in pure Matlab: use textscan() calls in a loop to read in 1000 lines at a time, parse and convert the cellstrs in that chunk into their compact forms, buffer all the results, and cat them together at the end of the read. That’ll keep the peak memory requirements lower.
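
The chunked read could look roughly like the sketch below. `read_trades_chunked`, `encode`, and the field names of the result are hypothetical, the format string is the one from the question, and the file is assumed to be well-formed. Strings are replaced by integer codes into lookup tables that are shared across chunks, so only one chunk of cellstrs is alive at a time.

```matlab
function T = read_trades_chunked(filename, chunkRows)
% Hypothetical helper: read the trade file chunkRows lines at a time and
% keep only compact numeric columns plus small string lookup tables.
fid = fopen(filename, 'r');
if fid < 0
    error('read_trades_chunked:open', 'Cannot open %s', filename);
end
closer = onCleanup(@() fclose(fid));     % close the file even on error
fgetl(fid);                              % discard the header line

names  = {cell(0,1), cell(0,1), cell(0,1)};  % lookup tables: RIC, date, type
srcCol = [1 2 5];                            % textscan columns they come from
codes  = cell(0, 3);  secs = {};  gmt = {};  price = {};  volume = {};
while ~feof(fid)
    C = textscan(fid, '%s%s%s%f%s%f%f', chunkRows, 'Delimiter', ',');
    n = numel(C{7});                     % number of complete rows in this chunk
    if n == 0, break; end
    row = size(codes, 1) + 1;
    for j = 1:3
        [names{j}, codes{row, j}] = encode(C{srcCol(j)}(1:n), names{j});
    end
    t   = C{3}(1:n);                     % 'HH:MM:SS.FFF' strings
    hms = sscanf(sprintf('%s:', t{:}), '%f:%f:%f:', [3 Inf])';
    secs{end+1, 1}   = hms * [3600; 60; 1];      % seconds since midnight
    gmt{end+1, 1}    = int8(C{4}(1:n));
    price{end+1, 1}  = C{6}(1:n);
    volume{end+1, 1} = C{7}(1:n);
end

% Concatenate the per-chunk pieces once, at the end of the read.
T.ricNames  = names{1};  T.ric  = vertcat(codes{:, 1});
T.dateStrs  = names{2};  T.date = vertcat(codes{:, 2});
T.typeNames = names{3};  T.type = vertcat(codes{:, 3});
T.secOfDay  = vertcat(secs{:});
T.gmtOffset = vertcat(gmt{:});
T.price     = vertcat(price{:});
T.volume    = vertcat(volume{:});

function [names, codes] = encode(strs, names)
% Map strings to uint32 codes, appending unseen strings to the lookup table.
[u, ~, idx] = unique(strs);
[known, loc] = ismember(u, names);
nOld  = numel(names);
names = [names; u(~known)];
loc(~known) = nOld + (1:nnz(~known));
codes = uint32(reshape(loc(idx), [], 1));
```

Calling `T = read_trades_chunked('/home/MyUserName/Temp/X.csv', 1000);` would then return a struct in which, for example, `T.ricNames(T.ric)` reconstructs the RIC column on demand. The chunk size trades peak memory against loop overhead; 1000 is the figure suggested above, and larger chunks may well be faster.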

    If you’re going to do a lot of work like this, and performance matters, you might want to drop down to Java and write your own parser that can convert the strings and dates as they come in, before handing them back to Matlab as more compact datatypes. It’s not hard, and the Java method can be called directly from Matlab, so this may only kind of count as using a separate language.
