I’m writing a program in Python that’s processing some data generated during experiments, and

Question

0

Asked: June 15, 20262026-06-15T05:54:20+00:00 2026-06-15T05:54:20+00:00

I’m writing a program in Python that’s processing some data generated during experiments, and

0

I’m writing a program in Python that’s processing some data generated during experiments, and it needs to estimate the slope of the data. I’ve written a piece of code that does this quite nicely, but it’s horribly slow (and I’m not very patient). Let me explain how this code works:

1) It grabs a small piece of data of size dx (starting with 3 datapoints)

2) It evaluates whether the difference (i.e. |y(x+dx)-y(x-dx)| ) is larger than a certain minimum value (40x std. dev. of noise)

3) If the difference is large enough, it will calculate the slope using OLS regression. If the difference is too small, it will increase dx and redo the loop with this new dx

4) This continues for all the datapoints

[See updated code further down]

For a datasize of about 100k measurements, this takes about 40 minutes, whereas the rest of the program (it does more processing than just this bit) takes about 10 seconds. I am certain there is a much more efficient way of doing these operations, could you guys please help me out?

Thanks

EDIT:

Ok, so I’ve got the problem solved by using only binary searches, limiting the number of allowed steps by 200. I thank everyone for their input and I selected the answer that helped me most.

FINAL UPDATED CODE:

def slope(self, data, time):
    (wave1, wave2) = wt.dwt(data, "db3")
    std = 2*np.std(wave2)
    e = std/0.05
    de = 5*std
    N = len(data)
    slopes = np.ones(shape=(N,))
    data2 = np.concatenate((-data[::-1]+2*data[0], data, -data[::-1]+2*data[N-1]))
    time2 = np.concatenate((-time[::-1]+2*time[0], time, -time[::-1]+2*time[N-1]))
    for n in xrange(N+1, 2*N):     
        left = N+1
        right = 2*N
        for i in xrange(200):
            mid = int(0.5*(left+right))
            diff = np.abs(data2[n-mid+N]-data2[n+mid-N])
            if diff >= e:
                if diff < e + de:  
                    break
                right = mid - 1
                continue
            left = mid + 1
        leftlim = n - mid + N
        rightlim = n + mid - N
        y = data2[leftlim:rightlim:int(0.05*(rightlim-leftlim)+1)]
        x = time2[leftlim:rightlim:int(0.05*(rightlim-leftlim)+1)]
        xavg = np.average(x)
        yavg = np.average(y)
        xlen = len(x)
        slopes[n-N] = (np.dot(x,y)-xavg*yavg*xlen)/(np.dot(x,x)-xavg*xavg*xlen)
    return np.array(slopes)

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-15T05:54:21+00:00

Your comments suggest that you need to find a better method to estimate i_k+1 given i_k. No knowledge of values in data would yield to the naive algorithm:

At each iteration for n, leave i at previous value, and see if the abs(data[start]-data[end]) value is less than e. If it is, leave i at its previous value, and find your new one by incrementing it by 1 as you do now. If it is greater, or equal, do a binary search on i to find the appropriate value. You can possibly do a binary search forwards, but finding a good candidate upper limit without knowledge of data can prove to be difficult. This algorithm won’t perform worse than your current estimation method.

If you know that data is kind of smooth (no sudden jumps, and hence a smooth plot for all i values) and monotonically increasing, you can replace the binary search with a search backwards by decrementing its value by 1 instead.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I’m writing a program in Python that’s processing some data generated during experiments, and

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply