I’m porting a Matlab script to Python. Below is an extract:
%// Create a list of unique trade dates
DateList = unique(AllData(:,1));
%// Loop through the dates
for DateIndex = 1:size(DateList,1)
CalibrationDate = DateList(DateIndex);
%// Extract the data for a single calibration date (but all expiries)
SubsetIndices = ismember(AllData(:,1) , DateList(DateIndex)) == 1;
SubsetAllExpiries = AllData(SubsetIndices, :);
AllData is an N-by-6 cell matrix, the first 2 columns are dates (strings) and the other 4 are numbers. In python I will be getting this data out of a csv so something like this:
import numpy as np
AllData = np.recfromcsv(open("MyCSV.csv", "rb"))
So now, if I’m not mistaken, AllData is a numpy array of ordinary tuples. Is this the best format to have this data in? The goal will be to extract a list of unique dates from column 1, and for each date extract the rows with that date in column 1 (column one is ordered). Then for each of those rows, do some maths on the numbers and the date in the remaining 5 columns.
So in Matlab I can get the list of dates with unique(AllData(:,1)), and then I can get the records (rows) corresponding to a given date (i.e. with that date in column one) like this:
SubsetIndices = ismember(AllData(:,1) , MyDate) == 1;
SubsetAllExpiries = AllData(SubsetIndices, :);
How can I best achieve the same results in Python?
To put things in context,
np.recfromcsv is just a modified version of np.genfromtxt which outputs record arrays instead of structured arrays.
A structured array lets you access the individual fields (here, your columns) by their names, like in my_array["field_one"], while a record array gives you the same plus the possibility to access the fields as attributes, like in my_array.field_one. I’m not fond of “access-as-attributes”, so I usually stick to structured arrays.
For your information, structured/record arrays are not arrays of tuples, but arrays of a numpy object called np.void: each element is a block of memory composed of as many sub-blocks as you have fields, the size of each sub-block depending on its datatype.
That said, yes, what you seem to have in mind is exactly the kind of usage for a structured array. The approach would then be:
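Take your dates array and filter it to find the unique elements.
For each unique date, build a boolean array that is True for the records with that date and False otherwise; call it matching;
Use matching to access the corresponding records (e.g., rows of your array) using fancy indexing, as my_array[matching].
Note that you can keep your dates as strings or transform them into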
datetime objects using a user-defined converter, as described in the documentation. For example, you could transform a YYYY-MM-DD string into a datetime object with a lambda s: datetime.datetime.strptime(s, "%Y-%m-%d"). That way, instead of having, say, a length-N array where each row (a record) consists of two dates as strings and 4 floats, you would have a length-N array where each row consists of two datetime objects and 4 floats.
Note the shape of your array (via
my_array.shape): it says (N,), meaning it’s a 1D array, even if it looks like a 2D table with multiple columns. You can access individual fields (each “column”) by name. For example, if we create an array consisting of one string field called first and one int field called second, you could access the first column with my_array["first"].
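As a minimal sketch of that field access, and of the unique-date filtering described earlier, assuming a small made-up array (the sample values are hypothetical; only the field names first and second come from the text above):

```python
import numpy as np

# Hypothetical sample data: a string field "first" (dates kept as strings)
# and an int field "second". Real data would come from the CSV instead.
my_array = np.array(
    [("2012-01-02", 1), ("2012-01-02", 2), ("2012-01-03", 3)],
    dtype=[("first", "U10"), ("second", int)],
)

print(my_array.shape)  # (3,) : one entry per record, so the array is 1D

# Access a whole "column" by its field name.
first_column = my_array["first"]

# Unique values of the field, sorted, much like Matlab's unique().
unique_dates = np.unique(first_column)

# For each date, build a boolean mask and index the array with it
# to pull out the matching records.
subsets = {}
for date in unique_dates:
    matching = my_array["first"] == date
    subsets[str(date)] = my_array[matching]
```

Comparing the field against a single date with == plays the role of Matlab's ismember here. The same pattern should carry over to an array loaded from the CSV, with the field names taken from its header row.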