I have a function calc_dG that, for any array corresponding to a short DNA

Question

0

Asked: May 26, 20262026-05-26T05:01:44+00:00 2026-05-26T05:01:44+00:00

I have a function calc_dG that, for any array corresponding to a short DNA

0

I have a function calc_dG that, for any array corresponding to a short DNA sequence (3 to 15 bases or so), gives me the binding energy of that sequence. Actually, it’s just an array lookup. nndG is an array of binding energies for adjacent pairs of bases, and thus the binding energies can be calculated with nndG[4*S[:-1]+S[1:]] when using an a,g,c,t -> 0,1,2,3 way of denoting sequences numerically: this means that arrays of many sequences can be calculated at once very quickly in numpy.

I need to find, for a length L, every sequence that both fits some template and results in a binding energy value in a certain range.

This is very easy to do with iterators: just iterate through every possible array input, calculate the binding energy, and then record the arrays that are in the range. This, however, is far too slow when implemented in Python (for length 15 with 4 possible values for each element there are 4**15 possible arrays, etc etc). I could use Weave or some other method of implementing it in C, but I’d prefer to find an array-based solution that is simple and fast.

For example, if every element has the same possible values (eg, [0,1,2,3]), then generating an array of every possible length L 1D array with those values can be done with lambda x: indices(repeat([4],L)).reshape((L,-1)).transpose(); then I can just do calc_dG( result ), and use result[results that are in the desired range] to get the arrays that I want as a final result. This is much faster than using Python iterators, and likely almost as fast, if not faster, than using C iterators. Unfortunately, it doesn’t work for arbitrary templates, and for longer sequences, will run out of memory, as it has to store every possible array in memory before calculating values.

Is there some way to do all of this without resorting to C?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-26T05:01:44+00:00

If I understand your problem correctly, you are maximizing a function f(i_1, i_2, ..., i_n) over integers in the set {0, 1, 2, 3}.

You can use a combination of iteration and vectorized indexing.

import numpy as np
import itertools

def cartesian_chunked(n, n_items=4, chunk_dim=3):
    if n > chunk_dim:
        p = n - chunk_dim
        q = chunk_dim
        outer = itertools.product(*([range(n_items)] * (n - chunk_dim)))
    else:
        p = 0
        q = n
        def outer_iter():
            yield ()
        outer = outer_iter()

    chunk = np.zeros([n_items**q, n], dtype=int)
    chunk[:,p:] = np.indices(np.repeat([n_items], q)).reshape(q, -1).T
    for seq in outer:
        chunk[:,:p] = seq
        yield chunk

def compute_energy(indices):
    base_energies = np.array([-1, 4, 8, 2.4])
    return (base_energies[indices]).sum(axis=1) 

max_energy = 0
max_config = None

# try out 4**10 ~ 1e6 combinations, in chunks of 4**8
for chunk in cartesian_chunked(n=10, n_items=4, chunk_dim=8):
    energies = compute_energy(chunk)
    j = np.argmax(energies)
    if energies[j] > max_energy:
        max_energy = energies[j]
        max_config = chunk[j].copy() # copy! the chunk is modified

print max_energy
print max_config

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have a function calc_dG that, for any array corresponding to a short DNA

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply