The Archive Base

Editorial Team
Asked: June 14, 2026 (2026-06-14T10:51:17+00:00)

I know this is a somewhat beaten topic, but I have reached the limit

I know this is a somewhat beaten topic, but I have reached the limit of help I can get from what’s already been answered.

This is for the Rosalind project problem LREP. I’m trying to find the longest k-peated substring in a string, and I’ve been provided the suffix tree, which is nice. I know that I need to annotate the suffix tree with the number of descendant leaves of each node, then find the nodes with >= k descendant leaves, and finally find the deepest of those nodes. Theory-wise I’m set.

I’ve gotten a lot of help from the following resources (oops, I can only post 2):

  • Find longest repetitive sequence in a string
  • Depth-first search (Python)

I can get the paths from the root to each leaf, but I can’t figure out how to pre-process the tree in such a way that I can get the number of descendant leaves of each node. I have a separate algorithm that works on small sequences, but it has exponential complexity, so for larger inputs it takes way too long. I know that with a DFS I should be able to perform the whole task in linear time. For this algorithm to work, I need to be able to get the longest k-peat of a string of ~40,000 characters in less than 5 minutes.

Here’s some sample data (first line: sequence, second line: k, suffix table format: parent child location length):

CATACATAC$
2
1 2 1 1
1 7 2 1
1 14 3 3
1 17 10 1
2 3 2 4
2 6 10 1
3 4 6 5
3 5 10 1
7 8 3 3
7 11 5 1
8 9 6 5
8 10 10 1
11 12 6 5
11 13 10 1
14 15 6 5
14 16 10 1

The output from this should be CATAC.

With the following code (modified from LiteratePrograms) I’ve been able to get the paths, but it still takes a long time on longer sequences to parse out a path for each node.

#authors listed at
#http://en.literateprograms.org/Depth-first_search_(Python)?action=history&offset=20081013235803
class Vertex:
    def __init__(self, data):
        self.data = data
        self.successors = []

def depthFirstSearch(start, isGoal, result):
    if start in result:
        return False

    result.append(start)

    if isGoal(start):
        return True
    for v in start.successors:
        if depthFirstSearch(v, isGoal, result):
            return True

    # No path was found
    result.pop()
    return False

def lrep(seq, reps, tree):
    # tree: rows of (parent, child, location, length) as in the sample data
    n = 2 * len(seq) - 1
    v = [Vertex(i) for i in range(n)]
    edges = [(int(x[0]), int(x[1])) for x in tree]
    for a, b in edges:
        v[a].successors.append(v[b])

    # One full search from the root (node 1) per node: this is the slow part.
    paths = {}
    for x in v:
        result = []
        paths[x.data] = []
        if depthFirstSearch(v[1], (lambda u: u.data == x.data), result):
            path = [u.data for u in result]
            paths[x.data] = path
    return paths

What I’d like to do is pre-process the tree to find the nodes which satisfy the descendants >= k requirement prior to finding the depth. I haven’t even gotten to how I’m going to calculate depth yet, though I imagine I’ll have a dictionary that keeps track of the depth of each node in the path and then sums them.

So, my first-most-important question is: “How do I preprocess the tree with descendant leaves?”

My second-less-important question is: “After that, how can I quickly compute depth?”

P.S. I should state that this isn’t homework or anything of that sort. I’m just a biochemist trying to expand my horizons with some computational challenges.

1 Answer

  1. Editorial Team
    Added an answer on June 14, 2026 at 10:51 am

    Nice question for an exercise in basic string operations. I didn't remember the suffix tree anymore 😉 But as you have stated: theory-wise, you are set.

    How do I preprocess the tree with descendant leaves?

    The Wikipedia stub on this topic is a bit confusing. You only need to know whether a node is the outermost (deepest) non-leaf node with n >= k children. If you look up the substring spelled out from the root node to this node in the whole string, the suffix tree tells you that there are n possible continuations. So there must be n places where this string occurs.
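    One caveat on the child-count test: for k > 2, a node can have fewer than k children and still have k or more leaves below it, so counting descendant leaves (as the question proposes) is the safer check. Below is a minimal sketch of that pre-processing, not the approach used in the rest of this answer. It assumes the edge rows are already parsed into (parent, child, start, length) tuples and that node 1 is the root; `count_leaves` and `SAMPLE_ROWS` are hypothetical names introduced only for illustration.

```python
def count_leaves(rows, root=1):
    """Return {node: number of leaves in its subtree} for a list of edge rows."""
    children = {}
    for parent, child, _start, _length in rows:
        children.setdefault(parent, []).append(child)

    leaves = {}
    # Explicit stack instead of recursion, so deep trees are not a problem.
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        kids = children.get(node, [])
        if not kids:
            leaves[node] = 1                      # a leaf counts itself
        elif children_done:
            leaves[node] = sum(leaves[c] for c in kids)
        else:
            stack.append((node, True))            # revisit after the children
            stack.extend((c, False) for c in kids)
    return leaves


# Edge rows from the sample data in the question (sequence CATACATAC$).
SAMPLE_ROWS = [
    (1, 2, 1, 1), (1, 7, 2, 1), (1, 14, 3, 3), (1, 17, 10, 1),
    (2, 3, 2, 4), (2, 6, 10, 1), (3, 4, 6, 5), (3, 5, 10, 1),
    (7, 8, 3, 3), (7, 11, 5, 1), (8, 9, 6, 5), (8, 10, 10, 1),
    (11, 12, 6, 5), (11, 13, 10, 1), (14, 15, 6, 5), (14, 16, 10, 1),
]

leaf_counts = count_leaves(SAMPLE_ROWS)
# Nodes whose path label occurs at least twice in the sequence.
print([n for n in sorted(leaf_counts) if leaf_counts[n] >= 2])
# → [1, 2, 3, 7, 8, 11, 14]
```

    Each node is pushed at most twice, so the work is linear in the number of edges. In the sample, node 2 (path label "C") gets a count of 3 and node 7 (path label "A") a count of 4, matching how often those letters occur in CATACATAC.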

    After that, how can I quickly compute depth?

    A key idea for this and many similar problems is to do a depth-first search: in every node, ask the children for their values and return the maximum of them to the parent. The root node will get the final result.

    How the values are calculated differs between problems. Here you have three possibilities for every node:

    1. The node has no children. It is a leaf node, so the result is invalid.
    2. Every child returns an invalid result. It is the last non-leaf node, so the result is zero (no more characters after this node). If this node has n children, the concatenated string of every edge from the root to this node appears n times in the whole string. If we need at least k occurrences and k > n, the result is also invalid.
    3. One or more children return something valid. The result is the maximum, over those children, of the returned value plus the length of the string attached to the edge leading to that child.

    Of course, you also have to return the corresponding node. Otherwise you will know how long the longest repeated substring is, but not where it is.

    Code

    You should try to code this yourself first. Constructing the tree is simple, but not trivial if you want to gather all the necessary information. Nevertheless, here is a simple example. Please note: all sanity checking is left out, and everything will fail horribly if the input is invalid in any way. E.g. do not use any root index other than one, do not refer to a node as a parent if it was not referenced as a child before, etc. Much room for improvement *hint;)*.

    import sys

    class Node(object):
        def __init__(self, idx):
            self.idx = idx     # not needed but nice for prints
            self.parent = None # edge to parent or None
            self.childs = []   # list of edges to the children

        def get_deepest(self, k = 2):
            # Returns (length below this node, deepest valid node) or None.
            # Recursive: very deep trees may need a higher recursion limit.
            max_value = -1
            max_node = None
            for edge in self.childs:
                r = edge.n2.get_deepest(k) # pass k down the recursion
                if r is None: continue # leaf or invalid subtree
                value, node = r
                value += len(edge.s)
                if value > max_value: # new best result
                    max_value = value
                    max_node = node
            if max_node is None:
                # we are either a leaf (no edge connected) or
                # the last non-leaf.
                # The number of childs has to be at least k to be valid.
                return (0, self) if len(self.childs) >= k else None
            else:
                return (max_value, max_node)

        def get_string_to_root(self):
            # concatenates the edge strings from the root down to this node
            if self.parent is None: return ""
            return self.parent.n1.get_string_to_root() + self.parent.s

    class Edge(object):
        # creating the edge also sets the corresponding
        # values in the nodes
        def __init__(self, n1, n2, s):
            #print("Edge %d -> %d [ %s]" % (n1.idx, n2.idx, s))
            self.n1, self.n2, self.s = n1, n2, s
            n1.childs.append(self)
            n2.parent = self

    nodes = {1 : Node(1)} # root-node
    string = sys.stdin.readline().strip()
    k = int(sys.stdin.readline())
    for line in sys.stdin:
        if not line.strip(): continue # skip blank lines
        parent_idx, child_idx, start, length = [int(x) for x in line.split()]
        s = string[start-1:start-1+length] # start is 1-based
        # every edge constructs a Node
        nodes[child_idx] = Node(child_idx)
        Edge(nodes[parent_idx], nodes[child_idx], s)

    result = nodes[1].get_deepest(k)
    if result is not None: # None means no node satisfied k
        (depth, node) = result
        print(node.get_string_to_root())
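    For inputs in the ~40,000-character range, a non-recursive variant may be more comfortable, since a deep tree can run into Python's default recursion limit. The following is a sketch of an alternative, not the code above: it walks the tree with an explicit stack, records each node's string depth on the way down, counts descendant leaves on the way back up, and keeps the deepest node with at least k leaves. The function name `longest_k_repeat` and the tuple layout of `rows` are assumptions made for illustration, and the rows are taken to be pre-parsed rather than read from stdin.

```python
def longest_k_repeat(seq, k, rows, root=1):
    """Longest substring of seq that occurs at least k times (k >= 2).

    rows: iterable of (parent, child, start, length) with 1-based start,
    describing the edges of the suffix tree of seq.
    """
    children = {}
    parent_edge = {}                  # child -> (parent, edge label)
    for parent, child, start, length in rows:
        children.setdefault(parent, []).append(child)
        parent_edge[child] = (parent, seq[start - 1:start - 1 + length])

    # Pass 1 (top-down): string depth of every node, in pre-order.
    depth = {root: 0}
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        for child in children.get(node, []):
            depth[child] = depth[node] + len(parent_edge[child][1])
            stack.append(child)

    # Pass 2 (bottom-up): reversed pre-order visits children before parents.
    leaves = {}
    best = root
    for node in reversed(order):
        kids = children.get(node, [])
        leaves[node] = sum(leaves[c] for c in kids) if kids else 1
        # Leaves have a count of 1, so they never qualify when k >= 2.
        if leaves[node] >= k and depth[node] > depth[best]:
            best = node

    # Spell out the path label by walking back up to the root.
    parts = []
    node = best
    while node != root:
        node, label = parent_edge[node]
        parts.append(label)
    return "".join(reversed(parts))


# Sample data from the question.
SAMPLE_ROWS = [
    (1, 2, 1, 1), (1, 7, 2, 1), (1, 14, 3, 3), (1, 17, 10, 1),
    (2, 3, 2, 4), (2, 6, 10, 1), (3, 4, 6, 5), (3, 5, 10, 1),
    (7, 8, 3, 3), (7, 11, 5, 1), (8, 9, 6, 5), (8, 10, 10, 1),
    (11, 12, 6, 5), (11, 13, 10, 1), (14, 15, 6, 5), (14, 16, 10, 1),
]

print(longest_k_repeat("CATACATAC$", 2, SAMPLE_ROWS))  # → CATAC
```

    Both passes touch every node once, so the running time is linear in the size of the tree. If no non-root node has k leaves below it, the function returns an empty string. When several nodes tie for the greatest depth, which one is returned depends on traversal order.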
    