I’m trying to “group by column” data from a matrix.
The data are extracted from a database, and the matrix looks like this:
'2012-04-26' 'USD' 'BRL' [ 1.8894]
'2012-04-26' 'USD' 'IDR' [ 9185]
'2012-04-26' 'USD' 'INR' [ 52.5350]
'2012-04-26' 'USD' 'MXN' [ 13.2337]
'2012-04-26' 'USD' 'PEN' [ 2.6505]
'2012-04-26' 'USD' 'SGD' [ 1.2412]
'2012-04-26' 'USD' 'TRY' [ 1.7643]
'2012-04-27' 'USD' 'BRL' [ 1.8846]
'2012-04-27' 'USD' 'IDR' [ 9189]
'2012-04-27' 'USD' 'INR' [ 52.5600]
'2012-04-27' 'USD' 'MXN' [ 13.0147]
'2012-04-27' 'USD' 'PEN' [ 2.6395]
'2012-04-27' 'USD' 'SGD' [ 1.2385]
'2012-04-27' 'USD' 'TRY' [ 1.7600]
(this is a cell array)
What I want to do is group all the data by date (1st column) and then have one column for each value, like this:
'2012-04-26' [ 1.8894] [ 9185] [ 52.5350] [ 13.2337] [ 2.6505] [ 1.2412] [ 1.7643]
'2012-04-27' [ 1.8846] [ 9189] [ 52.5600] [ 13.0147] [ 2.6395] [ 1.2385] [ 1.7600]
where each column after the date represents a currency pair (USD/BRL, USD/IDR, USD/INR, …)
Note that, for each date, there is exactly the same number of rows (currency pairs) in the extracted data.
Is there an elegant (and fast) way to achieve this in MATLAB?
Thanks,
Given that you emphasize that speed is important in the question, I propose the following solution:
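Below is a minimal sketch of the five lines walked through next. It is reconstructed from that description, not copied from an original listing. It assumes the cell array is named `A` (a hypothetical name), with dates in column 1 and rates in column 4. `X`, `Index`, `NumObsPerDay` and `Soln` are the names used in the text. `UniqueDates` and `NumDays` are assumed names, and the day count is taken as `max(Index)`.

```matlab
% Question's data as a cell array (name A is an assumption): date, base, quote, rate
A = {'2012-04-26' 'USD' 'BRL' 1.8894
     '2012-04-26' 'USD' 'IDR' 9185
     '2012-04-26' 'USD' 'INR' 52.5350
     '2012-04-26' 'USD' 'MXN' 13.2337
     '2012-04-26' 'USD' 'PEN' 2.6505
     '2012-04-26' 'USD' 'SGD' 1.2412
     '2012-04-26' 'USD' 'TRY' 1.7643
     '2012-04-27' 'USD' 'BRL' 1.8846
     '2012-04-27' 'USD' 'IDR' 9189
     '2012-04-27' 'USD' 'INR' 52.5600
     '2012-04-27' 'USD' 'MXN' 13.0147
     '2012-04-27' 'USD' 'PEN' 2.6395
     '2012-04-27' 'USD' 'SGD' 1.2385
     '2012-04-27' 'USD' 'TRY' 1.7600};

% Line 1: numeric copy of the data, [serial date number, rate]
X = [datenum(A(:, 1), 'yyyy-mm-dd'), cell2mat(A(:, 4))];

% Line 2: sorted unique dates, plus the position of each row's date in that list
[UniqueDates, ~, Index] = unique(X(:, 1));

% Lines 3-4: number of days, and rows per day (assumes every day has the same count)
NumDays = max(Index);
NumObsPerDay = sum(Index == 1);

% Line 5: reshape fills column by column (one column per day), so transpose to get
% one row per day. Assumes rows are grouped by date in ascending order and the
% currencies appear in the same order on every day.
Soln = [UniqueDates, reshape(X(:, 2), NumObsPerDay, NumDays)'];
```

`Soln` is a 2-by-8 double matrix. Column 1 holds the serial dates and columns 2 to 8 hold the seven rates in the order BRL, IDR, INR, MXN, PEN, SGD, TRY.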
In the first line, I extract the relevant data into a numeric array. Numeric arrays are MUCH faster to operate on than cell arrays. Every element of a cell array is stored as a separate MATLAB array with its own overhead, so a numeric array takes up much less memory per element. To deal with the date strings, I convert them in this first step into MATLAB's serial date number format (datenum). If you plan on using MATLAB a lot, I suggest you get familiar with serial date numbers, as they are much more flexible than strings. For example, you can do ordinary arithmetic on them.
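As a small illustration of the arithmetic point, a serial date number is just a double that counts days. Differences and offsets are therefore plain subtraction and addition. The variable names here are illustrative only.

```matlab
% Convert two of the question's date strings to serial date numbers (days as doubles)
d = datenum({'2012-04-26'; '2012-04-27'}, 'yyyy-mm-dd');

gap = d(2) - d(1);                                % difference in days
nextWeek = d(1) + 7;                              % one week after 2012-04-26
nextWeekStr = datestr(nextWeek, 'yyyy-mm-dd');    % back to a string
```

Here `gap` is 1 and `nextWeekStr` is `'2012-05-03'`.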
In the second line, I get the sorted list of unique dates, and an index that maps each row to the position of its date in that list.
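To see what that index looks like, here is `unique` with three outputs on a toy column. The values 10 and 20 are stand-ins for two serial dates, grouped as in the question. The third output maps each row to its entry in the unique list, so its maximum is the number of distinct dates. Counting the 1s gives the rows belonging to the first date.

```matlab
% Stand-in for the date column X(:, 1): two dates, three rows each
v = [10; 10; 10; 20; 20; 20];

% u: sorted unique values; idx: for each row of v, its position in u
[u, ~, idx] = unique(v);

numDates = max(idx);          % number of distinct dates
rowsFirstDate = sum(idx == 1);  % rows that share the first date
rebuilt = u(idx);             % indexing u with idx reproduces v
```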
In the third and fourth lines I use the index to obtain the number of days for which you have data, and the number of observations for each day. CAUTION: The line
NumObsPerDay = sum(Index == 1);
implicitly assumes that you have the same number of observations (i.e. currencies) for each day. You state in the question that this is the case, so I’m taking you at your word 🙂
In the fifth line, I create a numerical matrix in the format you want. The first column is the unique date vector obtained in line 2. I obtain the remaining columns by reshaping the rates in X into one column of NumObsPerDay values per day, then transposing, so that each day becomes a row. CAUTION: This line implicitly assumes that the currencies appear in the same order on every day. It also relies on the rows being grouped by date in ascending order, as they are in your sample. Again, I’ve made these assumptions because they hold in your sample data and you stated you wanted a fast solution.
FINAL CAUTION: If any of the assumptions above is violated, this code will either fail or silently mix up your data. In other words, if you’re certain that all your data conform to the sample you provided, this solution should serve, and should also be fast. If you’re not certain, it is not a good solution for you.
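If you are not certain the data always conform, one option is a cheap guard before the reshape. This guard is a hypothetical addition, not part of the answer above. It checks three things: each date has the same number of rows, the rows are grouped by date in ascending order, and the currency codes (column 3) appear in the same order every day. Each check is one linear pass, so the fast path stays fast. The flag names are illustrative, and the four-row `A` is an excerpt of the question's data so the snippet runs on its own.

```matlab
% Excerpt of the question's data: date, base, quote, rate
A = {'2012-04-26' 'USD' 'BRL' 1.8894
     '2012-04-26' 'USD' 'IDR' 9185
     '2012-04-27' 'USD' 'BRL' 1.8846
     '2012-04-27' 'USD' 'IDR' 9189};

X = [datenum(A(:, 1), 'yyyy-mm-dd'), cell2mat(A(:, 4))];
[UniqueDates, ~, Index] = unique(X(:, 1));
NumDays = max(Index);
NumObsPerDay = sum(Index == 1);

% Check 1: same number of rows for every date
Counts = accumarray(Index(:), 1);
sameCount = all(Counts == NumObsPerDay);

% Check 2: rows grouped by date, in ascending order (index never decreases)
grouped = issorted(Index(:));

% Check 3: identical currency order on every day (only meaningful if 1 and 2 hold)
sameOrder = false;
if sameCount && grouped
    QuoteGrid = reshape(A(:, 3), NumObsPerDay, NumDays);   % column k = codes on day k
    sameOrder = all(all(strcmp(QuoteGrid, repmat(QuoteGrid(:, 1), 1, NumDays))));
end

if ~(sameCount && grouped && sameOrder)
    error('Data do not satisfy the assumptions of the reshape-based solution.');
end

% Safe to use the fast reshape
Soln = [UniqueDates, reshape(X(:, 2), NumObsPerDay, NumDays)'];
```

Swapping the two 2012-04-27 rows, for instance, leaves `sameCount` and `grouped` true but makes `sameOrder` false, so the error is raised instead of the rates being silently misaligned.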
P.S. If you want to see the dates in string format again, just use
datestr(Soln(:, 1), 'yyyy-mm-dd')
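To get back exactly the cell layout asked for in the question (a date string followed by the rates), one option is to wrap `datestr`'s char-matrix output with `cellstr` and the rates with `num2cell`. The two-currency `Soln` below is a small excerpt built from the question's data so the snippet runs on its own. `DateStr` and `Out` are illustrative names.

```matlab
% Excerpt of the numeric result: serial dates, then USD/BRL and USD/IDR rates
Soln = [datenum({'2012-04-26'; '2012-04-27'}, 'yyyy-mm-dd'), [1.8894 9185; 1.8846 9189]];

DateStr = cellstr(datestr(Soln(:, 1), 'yyyy-mm-dd'));  % char matrix -> cell column
Out = [DateStr, num2cell(Soln(:, 2:end))];             % one cell row per date
```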