Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 6208047
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 24, 20262026-05-24T05:44:26+00:00 2026-05-24T05:44:26+00:00

Is it possible to write a Hadoop-ready reduce function that can find the longest

  • 0

Is it possible to write a Hadoop-ready reduce function that can find the longest run of 1s (only the length of the run)?

I’m thinking of something that can run on Python’s functools.reduce. But I would eventually like to run on a Hadoop cluster (by “Hadoop-ready” I mean that the reduction steps can run in any arbitrary order).

The motivation is the search for tandem repeats in biological sequences as discussed here http://biostar.stackexchange.com/questions/10582/counting-repeat-sequence – finding the longest run of repeats. Sequentially this problem is trivial. But processing can this be done on big data? Trying to frame it as a map-reduce problem: The map function would map all words of interest (e.g. all occurrences of TGATCT) to 1s and everything else to 0s. The reducer function just needs to find the longest run of 1s.

I tried one approach that seemed doable but found a case where it fails.

The following is a skeleton code with tests.

#!/usr/bin/env python

def count_tandem_repeats_reducer(left, right):
  # ...

def reduce(func, array):
  # Just like functools.reduce but apply func at random positions
  # func takes 2 adjacent elements of the array and returns 1 element
  # the 2 elements are reduced into 1 until the array is of size 1


def count_tandem_repeats(seq):
  if not seq: return 0
  if len(seq) == 1: return seq[0]
  return reduce(count_tandem_repeats_reducer, m)

# Testing
assert count_tandem_repeats([]) == 0
assert count_tandem_repeats([0,0,0]) == 0
assert count_tandem_repeats([1,1]) == 2
assert count_tandem_repeats([1,0,0,0,1,1,1,1,0,0,1]) == 4
assert count_tandem_repeats([0,0,0,1,1,1,0,0]) == 3
assert count_tandem_repeats([0,1,0,1,1,0,1,1,1,0,1,1,1,1,0] == 4
assert count_tandem_repeats([0,1,0,1,1,0,1,1,1,0,1,1,1,1,0][::-1]) == 4
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-24T05:44:27+00:00Added an answer on May 24, 2026 at 5:44 am

    This doesnt seem like a perfect fit for a set of parallel reducers. One alternative is to implement it as a separate map-reduce task which will run after your original algo (the one that converts ur sequence to 1s and 0s).

    You then implement a Custom Input format and record reader that splits your input stream into some arbitrary number of segments and ALSO ensure the split happens only at a 1 -> 0 transition. Then in the mapper (If you were implementing the solution in Java, you would have a mapper class) you can maintain a count of the longest number of 1. Each mapper will output the longest run of 1s in its input split.. The reducer will then just return the max() of the output of all the mappers.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Is it possible to write code in a Flex application that will only be
Is it possible to write a C function that does the following? Allocate a
Is it possible to write a doctest unit test that will check that an
Is it possible to write a regular expression that matches a nested pattern that
Would it be possible to write a class that is virtually indistinguishable from an
It's possible to write Markdown content with invalid syntax. Invalid means that the BlueCloth
Is it possible to write a template that changes behavior depending on if a
Is it possible to write to the windows logs in python?
Is it possible to write a user interface in Java for an application written
Is it possible to write a PL/SQL query to identify a complete list of

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.