I have 500,000 values for a variable derived from financial markets. Specifically, this variable represents distance from the mean (in standard deviations). This variable has an arbitrary distribution. I need a formula that will allow me to select a range around any value of this variable such that an equal (or close to equal) number of data points falls within that range.
This will allow me to then analyze all of the data points within a specific range and to treat them as “similar situations to the input.”
From what I understand, this means that I need to convert it from an arbitrary distribution to a uniform distribution. I have read (but barely understood) that what I am looking for is called the “probability integral transform.”
Can anyone assist me with some code (Matlab preferred, but it doesn’t really matter) to help me accomplish this?
Here’s something I put together quickly. It’s not polished and not perfect, but it does what you want to do.
and the function `getInterval` is
Explanation:
The CDF of the random distribution is shown below by the line in blue. You provide a point (here `5` in the input to `getInterval`) about which you want a range that gives you 10% of the area (input `0.1` to `getInterval`). The chosen point is marked by the red cross and the interval is marked by the lines in green. You can then select the points from the original list that lie within this interval.
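
For readers not working in MATLAB, the same selection can be sketched in Python. This is a minimal sketch under stated assumptions, not the `getInterval` code from this answer: it swaps the `ksdensity` estimate for the empirical CDF of the sorted data, and the names `empirical_cdf` and `points_in_cdf_window`, as well as the 20,000-point Gaussian sample, are made up for illustration.

```python
import bisect
import random

def empirical_cdf(sorted_vals, x):
    """Fraction of points <= x, i.e. the empirical CDF evaluated at x."""
    return bisect.bisect_right(sorted_vals, x) / len(sorted_vals)

def points_in_cdf_window(sorted_vals, x, fraction):
    """Return the points whose CDF value lies within fraction/2 of CDF(x)."""
    n = len(sorted_vals)
    centre = empirical_cdf(sorted_vals, x)
    # Window of total width `fraction` in CDF space, truncated to [0, 1].
    lo = max(0.0, centre - fraction / 2)
    hi = min(1.0, centre + fraction / 2)
    # Map the CDF bounds back to positions in the sorted list (inverse CDF).
    i_lo = int(round(lo * n))
    i_hi = int(round(hi * n))
    return sorted_vals[i_lo:i_hi]

random.seed(0)
# Assumed stand-in data: 20,000 Gaussian draws, sorted once up front.
data = sorted(random.gauss(0, 1) for _ in range(20000))
similar = points_in_cdf_window(data, 0.5, 0.1)
print(len(similar))  # 2000 points, i.e. 10% of the data
```

Because the window is defined in CDF space, it holds the same share of the data wherever the query point sits, except near the tails, where this sketch simply truncates the window at 0 or 1 and returns fewer points.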
You’ll find that on average, the number of points in this example is ~2000, which is 10% of `numel(randList)`.
NOTE:
- The code does not handle the case where `yCdfRange` falls outside `[0 1]`, in which case `interp1` will return a `NaN`. This is fairly straightforward to implement, and I’ll leave that to you.
- `ksdensity` is very CPU intensive. I wouldn’t recommend increasing `npoints` to more than `1e4`. I assume you’re only working with a fixed list (i.e., you have a list of `5e5` points that you’ve obtained somehow and now you’re just running tests/analyzing it). In that case, you can run `ksdensity` once and save the result.
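
One possible way to handle a CDF range that falls outside [0, 1] is to slide the window back inside the unit interval, so that it always contains the requested share of the data. The sketch below is in Python and rests on assumptions of my own: it uses the empirical CDF of the sorted list instead of `ksdensity`, the sliding policy is just one choice among several (you could also shrink the window), and `shifted_window` is a hypothetical name. Sorting once and reusing the sorted list plays the same role as running the density estimate once and saving the result.

```python
import bisect
import random

def shifted_window(sorted_vals, x, fraction):
    """Return a window holding `fraction` of the points, kept inside [0, 1] in CDF space.

    Assumes 0 < fraction <= 1.
    """
    n = len(sorted_vals)
    centre = bisect.bisect_right(sorted_vals, x) / n  # empirical CDF at x
    lo = centre - fraction / 2
    # Assumed policy: slide the window rather than shrink it at the tails.
    lo = min(max(lo, 0.0), 1.0 - fraction)
    hi = lo + fraction
    return sorted_vals[int(round(lo * n)):int(round(hi * n))]

random.seed(1)
# Assumed stand-in data, sorted once and reused for every query.
data = sorted(random.gauss(0, 1) for _ in range(20000))
low_tail = shifted_window(data, -10.0, 0.1)
high_tail = shifted_window(data, 10.0, 0.1)
print(len(low_tail), len(high_tail))
```

With this policy, a query point in the extreme tail gets the lowest or highest 10% of the data instead of a truncated window, which keeps the group sizes equal at the cost of the window no longer being centred on the query.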