I’m writing a function that takes an input vector of dates, which I specify as Julian dates, and a set of data values (as vectors). Within the function I use a predefined window size to fill any NaNs in the data with the mean of the window they fall in. For example:
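t = (1/24:1/24:40).';                       % 40 days of hourly Julian dates
data1 = 1 + (30 - 1) .* rand(numel(t), 1);  % uniform random values between 1 and 30
Randm = randi(numel(t), 120, 1);            % 120 random gap positions
data1(Randm) = NaN;
figure(1);
plot(t, data1, 'LineWidth', 3);
hold on;
% One day per row: reshape fills columns first, so make each column a day
% (24 hourly samples) and transpose to get a 40 x 24 matrix.
dailyData = reshape(data1, 24, []).';
nanMap = isnan(dailyData);
validValuesPerDay = sum(~nanMap, 2);
% Daily means ignoring NaNs: zero the gaps, then divide by the valid count.
nonNanData = dailyData;
nonNanData(nanMap) = 0;
sumPerDay = sum(nonNanData, 2);
dailyMeans = sumPerDay ./ validValuesPerDay;
dailyMeans = repmat(dailyMeans, [1 24]);
% Replace each NaN with the mean of its own day.
repairedData = dailyData;
repairedData(nanMap) = dailyMeans(nanMap);
data1 = reshape(repairedData.', [], 1);    % back to one column, in time order
plot(t, data1, '--r');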
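Which samples end up in the same row matters here. MATLAB's `reshape` fills its output column by column, so `reshape(x, n, [])` puts consecutive runs of `n` samples into columns, not rows. The toy example below uses 3-sample "days" (an assumption purely for illustration) to show that transposing gives one consecutive block per row, while asking directly for a given number of rows interleaves the blocks.

```matlab
% Six consecutive samples; pretend each "day" holds 3 of them.
x = (1:6).';

% Column-wise fill: each column is one consecutive block.
byColumn = reshape(x, 3, []);    % [1 4; 2 5; 3 6]

% Transposing gives one block per row, which is what row-wise
% operations such as sum(..., 2) expect.
byRow = reshape(x, 3, []).';     % [1 2 3; 4 5 6]

% Requesting 2 rows directly interleaves the blocks: row 1 holds
% samples 1, 3, 5 rather than a consecutive run.
interleaved = reshape(x, 2, []); % [1 3 5; 2 4 6]
```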
The problem I’m now faced with is how to cope with data at a resolution other than hourly, e.g. daily or weekly, as this will affect the code where I use reshape. Does anyone have suggestions on how to deal with this? I was thinking of specifying the window size (i.e. the block size passed to reshape) as a fraction of the length of the data.
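A fraction of the record length would change whenever a different number of days is loaded, so one alternative is to derive the block size from the sample spacing instead. This is a minimal sketch, assuming `t` is uniformly spaced and measured in days; `windowDays` is a hypothetical parameter name, not something from the question's code.

```matlab
% Hourly Julian dates as in the question (40 days at 1-hour spacing).
t = (1/24:1/24:40).';

% Desired averaging window in days (hypothetical parameter).
windowDays = 1;

% Sample spacing in days; the median guards against tiny rounding
% differences between successive time stamps.
dt = median(diff(t));

% Samples per window follow from the spacing, not from numel(t).
samplesPerWindow = round(windowDays / dt);         % 24 for hourly data

% The same rule gives a 7-sample window for daily data and weekly means.
tDaily = (1:70).';
samplesPerWeek = round(7 / median(diff(tDaily)));  % 7
```

If `windowDays / dt` is not close to a whole number, the window does not line up with whole samples and `round` hides that; comparing `abs(windowDays / dt - samplesPerWindow)` against a small tolerance is one way to catch it.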
Generally, you’d have to write separate routines for all the different functionalities you want. For instance, what do you want to do if you get daily data? Average over weeks? Fortnights? Months? Years?
Therefore, the most robust option would be (supposing you’ll always get linearly gridded times):
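A minimal sketch of such a switch, assuming uniformly spaced Julian dates, a record length that is a whole number of blocks, and that daily data should be filled with weekly means (the answer leaves that choice open). `resolution` and `blockSize` are hypothetical names; the data are a synthetic, deterministic signal with a few gaps.

```matlab
% Synthetic hourly record: 40 days of Julian dates and a test signal.
t = (1/24:1/24:40).';
data = mod((1:numel(t)).', 30) + 1;
data([5 30 31 500]) = NaN;

% Classify the resolution from the sample spacing (in days).
dt = median(diff(t));
if abs(dt - 1/24) < 1e-6
    resolution = 'hourly';
elseif abs(dt - 1) < 1e-6
    resolution = 'daily';
else
    error('Unsupported sampling interval: %g days', dt);
end

% Only the block size, and therefore the reshape, depends on the resolution.
switch resolution
    case 'hourly'
        blockSize = 24;   % fill gaps with daily means
    case 'daily'
        blockSize = 7;    % fill gaps with weekly means (assumed choice)
end

% One block per column; assumes numel(data) is a multiple of blockSize.
blocks = reshape(data, blockSize, []);

% Everything below is independent of the resolution.
nanMap = isnan(blocks);
validCount = sum(~nanMap, 1);
zeroed = blocks;
zeroed(nanMap) = 0;
blockMeans = sum(zeroed, 1) ./ validCount;     % NaN if a block is all NaN
fillValues = repmat(blockMeans, blockSize, 1);
blocks(nanMap) = fillValues(nanMap);
repaired = reshape(blocks, [], 1);
```

Working with one block per column avoids the transposes needed for a one-block-per-row layout; either orientation works as long as the reshape back to a vector matches it.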
Note that the reshape call is probably the only thing you need inside the switch; the rest of the code stays the same. You might want to rethink the variable names, though…
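Once the block size varies with the resolution, the record will often not split into whole blocks (40 daily samples do not divide into 7-day weeks, for instance), and `reshape` then throws an error. One option, assumed here rather than taken from the answer, is to pad with NaN, fill as before, and trim the padding afterwards. Because the padding is NaN, the last partial block's mean uses only real samples.

```matlab
% Synthetic daily series: 40 days, not a multiple of a 7-day block.
data = (1:40).';
data([3 10 39]) = NaN;
blockSize = 7;

n = numel(data);
nBlocks = ceil(n / blockSize);

% Pad with NaN so reshape works; padding is treated as missing data.
padded = [data; NaN(nBlocks * blockSize - n, 1)];
blocks = reshape(padded, blockSize, nBlocks);

% Same NaN-aware block mean and fill as in the question.
nanMap = isnan(blocks);
validCount = sum(~nanMap, 1);
zeroed = blocks;
zeroed(nanMap) = 0;
blockMeans = sum(zeroed, 1) ./ validCount;     % NaN if a block is all NaN
fillValues = repmat(blockMeans, blockSize, 1);
blocks(nanMap) = fillValues(nanMap);

% Drop the padding so the output matches the input length.
repaired = blocks(:);
repaired = repaired(1:n);
```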