Editorial Team
Asked: June 15, 2026

This is similar to a question I asked before, but is slightly different:

I have a very large structure array in MATLAB. To simplify the situation, suppose I have something like:

structure(1).name, structure(2).name, structure(3).name
structure(1).returns, structure(2).returns, structure(3).returns
(in my real program I have 647 structures)

Suppose further that structure(i).returns is a very large vector (approximately 2,000,000 entries), and that a condition comes along where I want to delete the jth entry from structure(i).returns for all i. How do you do this reasonably fast? I have tried some things, but they are all insanely slow (I will show them in a second), so I was wondering if the community knew of faster ways to do this.

I have parsed my data two different ways; the first way had everything saved as cell arrays, but because things hadn’t been working well for me I parsed the data again and placed everything as vectors.

What I’m actually doing is trying to delete NaN data, as well as all data in the same corresponding row of my data file, and then doing the very same thing after applying the Hampel filter. The relevant part of my code in this attempt is:

for i=numStock+1:-1:1
    for j=length(stock(i).return):-1:1
        if(isnan(stock(i).return(j)))
            for k=numStock+1:-1:1
                stock(k).return(j) = [];
            end
        end
    end
    stock(i).return = sort(stock(i).return);
    stock(i).returnLength = length(stock(i).return);
    stock(i).medianReturn = median(stock(i).return);
    stock(i).madReturn = mad(stock(i).return,1);
end;

for i=numStock:-1:1
    for j = length(stock(i+1).volume):-1:1
        if(isnan(stock(i+1).volume(j)))
            for k=numStock:-1:1
               stock(k+1).volume(j) = [];
            end
        end
    end
    stock(i+1).volume = sort(stock(i+1).volume);
    stock(i+1).volumeLength = length(stock(i+1).volume);
    stock(i+1).medianVolume = median(stock(i+1).volume);
    stock(i+1).madVolume = mad(stock(i+1).volume,1);
end;



for i=numStock+1:-1:1
    for j=stock(i).returnLength:-1:1
        if (abs(stock(i).return(j) - stock(i).medianReturn) > 3*stock(i).madReturn)
            for k=numStock+1:-1:1
                stock(k).return(j) = [];
            end
        end;
    end;
end;

for i=numStock:-1:1
    for j=stock(i+1).volumeLength:-1:1
        if (abs(stock(i+1).volume(j) - stock(i+1).medianVolume) > 3*stock(i+1).madVolume)
            for k=numStock:-1:1
                stock(k+1).volume(j) = [];
            end
        end;
    end;
end;

However, this returns an error:

“Matrix index is out of range for deletion.

Error in Failure (line 110)
stock(k).return(j) = [];”

So instead I tried by parsing everything in as vectors. Then I decided to try and delete the appropriate entries in the vectors prior to building the structure array. This isn’t returning an error, but it is very slow:

%% Delete bad data, Hampel Filter

% Delete bad entries
id=strcmp(returns,'');
returns(id)=[];
volume(id)=[];
date(id)=[];
ticker(id)=[];
name(id)=[];
permno(id)=[];
sp500(id) = [];

id=strcmp(returns,'C');
returns(id)=[];
volume(id)=[];
date(id)=[];
ticker(id)=[];
name(id)=[];
permno(id)=[];
sp500(id) = [];

% Convert returns from string to double
returns=cellfun(@str2double,returns);
sp500=cellfun(@str2double,sp500);

% Delete all data for which a return is not a number
nanid=isnan(returns);
returns(nanid)=[];
volume(nanid)=[];
date(nanid)=[];
ticker(nanid)=[];
name(nanid)=[];
permno(nanid)=[];

% Delete all data for which a volume is not a number
nanid=isnan(volume);
returns(nanid)=[];
volume(nanid)=[];
date(nanid)=[];
ticker(nanid)=[];
name(nanid)=[];
permno(nanid)=[];

% Apply the Hampel filter, and delete all data corresponding to
% observations deleted by the filter.

medianReturn = median(returns);
madReturn = mad(returns,1);

for i=length(returns):-1:1
    if (abs(returns(i) - medianReturn) > 3*madReturn)
        returns(i) = [];
        volume(i)=[];
        date(i)=[];
        ticker(i)=[];
        name(i)=[];
        permno(i)=[];
    end;
end

medianVolume = median(volume);
madVolume = mad(volume,1);

for i=length(volume):-1:1
    if (abs(volume(i) - medianVolume) > 3*madVolume)
        returns(i) = [];
        volume(i)=[];
        date(i)=[];
        ticker(i)=[];
        name(i)=[];
        permno(i)=[];
    end;
end

As I said, this is very slow, probably because I’m using a for loop on a very large data set; however, I’m not sure how else one would do this. Sorry for the gigantic post, but does anyone have a suggestion as to how I might go about doing what I’m asking in a reasonable way?

EDIT: I should add that getting the vector method to work is probably preferable, since my aim is to put all of the return vectors into a matrix and get all of the volume vectors into a matrix and perform PCA on them, and I’m not sure how I would do that using cell arrays (or even if princomp would work on cell arrays).

EDIT2: I have altered the code to match your suggestion (although I decided to give up speed and keep the for-loops so that I can stay with the structure array, since reparsing this data would be far worse time-wise). The new code snippet is:

stock_return = zeros(numStock+1,length(stock(1).return));

for i=1:numStock+1
    for j=1:length(stock(i).return)
        stock_return(i,j) = stock(i).return(j);
    end
end

stock_return = stock_return(~any(isnan(stock_return)), : );

This returns an Index exceeds matrix dimensions error, and I’m not sure why. Any suggestions?

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Added an answer on June 15, 2026 at 10:18 pm

    I could not find a convenient way to do this with structures, so I would restructure the code to use plain arrays instead.
    For example, instead of stock(i).return(j) I would use stock_return(i,j).
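
    A minimal sketch of that conversion, assuming every stock holds a numeric vector of the same length. The field name returns and the toy values are illustrative assumptions, not your real data:

    ```matlab
    % Hypothetical toy struct array standing in for the real data
    % (one row vector and one column vector, to show both are handled).
    stock = struct('returns', {[0.1 NaN 0.3], [0.2; 0.5; 0.4]});

    % Force each vector into a row, then stack the rows:
    % one row per stock, one column per observation.
    rows = cellfun(@(v) v(:).', {stock.returns}, 'UniformOutput', false);
    stock_return = vertcat(rows{:});
    ```

    This replaces the element-by-element double loop with a single concatenation. vertcat raises an error if the vectors differ in length, which is a useful check before any column-wise deletion.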

    I will show how to get rid of the for-loops using part of your code.

    Take this loop:

    for j=length(stock(i).return):-1:1
        if(isnan(stock(i).return(j)))
            for k=numStock+1:-1:1
                stock(k).return(j) = [];
            end
        end
    end
    

    Now, the deletion of columns with any NaN data goes like this:

    stock_return = stock_return(:, ~any(isnan(stock_return)) );
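
    As a small illustration on made-up numbers (a sketch, assuming rows are stocks and columns are observations):

    ```matlab
    % Toy matrix: 3 stocks (rows) by 4 observations (columns).
    stock_return = [1 NaN   3  4;
                    5   6   7  8;
                    9  10 NaN 12];

    % any(..., 1) works down each column, so has_nan has one entry
    % per column: true where at least one stock has a NaN.
    has_nan = any(isnan(stock_return), 1);

    % The mask belongs in the column position of the index.
    stock_return = stock_return(:, ~has_nan);
    ```

    Only the first and last observations survive here. Because the mask has one entry per column, putting it in the row position, as in stock_return(~any(isnan(stock_return)), :), can index past the number of rows, which is the likely cause of the "Index exceeds matrix dimensions" error in your EDIT2.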
    

    The Hampel filter step (the absolute deviation from the median) can be written in a similar way, shown here for the returns:

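    % stock_return_length is a scalar (the number of columns of stock_return)
    % stock_median_return is a column vector with one entry per stock (e.g. [1;2;3])
    % stock_mad_return is also a column vector with one entry per stock.
    
    % Replicate the per-stock values across the columns so they match stock_return.
    median_return = repmat(stock_median_return, 1, stock_return_length);
    mad_return = repmat(stock_mad_return, 1, stock_return_length);
    is_bad = abs(stock_return - median_return) > 3 .* mad_return;
    stock_return = stock_return(:, ~any(is_bad, 1));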
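
    A self-contained sketch of the same filter on made-up numbers. It computes the median absolute deviation by hand instead of calling mad(x,1), so it does not need the Statistics Toolbox. That substitution and the toy values are assumptions for illustration:

    ```matlab
    % Toy matrix: 2 stocks (rows) by 5 observations (columns).
    stock_return = [10 11  9 500 10;
                    20 21 19  20 22];

    % Per-stock median (column vector, one entry per row).
    stock_median_return = median(stock_return, 2);

    % Absolute deviation of every entry from its own stock's median.
    abs_dev = abs(bsxfun(@minus, stock_return, stock_median_return));

    % Median absolute deviation per stock.
    stock_mad_return = median(abs_dev, 2);

    % An entry is bad if it lies more than 3 MADs from its stock's median.
    is_bad = bsxfun(@gt, abs_dev, 3 .* stock_mad_return);

    % Drop every observation (column) that is bad for at least one stock.
    stock_return = stock_return(:, ~any(is_bad, 1));
    ```

    Only the fourth observation is removed here, because 500 is far outside the first stock's range. bsxfun expands the column vectors across the columns without building the repmat copies, which saves memory on matrices of your size.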
    

    Using a scalar for stock_return_length assumes that all return vectors have the same length, but your original code implicitly assumes that anyway.

    The key point in my answer is the use of any. Logical indexing alone is not enough, because your original code deletes an observation for every stock if it is bad for any one of them.

    Reference to any: http://www.mathworks.co.uk/help/matlab/ref/any.html.


    If you want to preserve the original structure and stick to stock(i).return, you can speed up your code using essentially the same scheme, but you can eliminate one fewer for-loop, so the program will be substantially slower than the array version.
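
    One possible reading of that scheme, as a sketch only: keep the loop over stocks but replace the loop over observations with a shared logical mask. The field name returns and the toy values are hypothetical, and all vectors are assumed to be rows of equal length:

    ```matlab
    % Hypothetical toy struct array standing in for the real data.
    stock = struct('returns', {[1 NaN 3 4], [5 6 NaN 8], [9 10 11 12]});

    % First pass: mark every observation that is NaN for any stock.
    bad = false(1, numel(stock(1).returns));
    for i = 1:numel(stock)
        bad = bad | isnan(stock(i).returns);
    end

    % Second pass: delete the marked observations from every stock at once.
    for i = 1:numel(stock)
        stock(i).returns(bad) = [];
    end
    ```

    Building the full mask before deleting anything keeps the indices aligned across stocks. Deleting one element at a time forces MATLAB to shrink a very large vector on every hit, which is where most of the time goes.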
