I’ve recently written some Scala code which processes a String, finding all its sub-strings

Asked: June 7, 2026


I’ve recently written some Scala code which processes a String, finding all its sub-strings and retaining a list of those which are found in a dictionary. The start and end of the sub-strings within the overall string also have to be retained for later use, so the easiest way to do this seemed to be just to use nested for loops, something like this:

for (i <- 0 until word.length)
  for (j <- i until word.length) {
    val sub = word.substring(i, j + 1)
    // lookup sub in dictionary here and add new match if found
  }

As an exercise, I decided to have a go at doing the same thing in Haskell. It seems straightforward enough without the need for the sub-string indices – I can use something like this approach to get the sub-strings, then call a recursive function to accumulate the matches. But if I want the indices too it seems trickier.

How would I write a function which returns a list containing each continuous sub-string along with its start and end index within the “parent” string?

For example, tokens "blah" would give [("b",0,0), ("bl",0,1), ("bla",0,2), ...]

Update

A great selection of answers and plenty of new things to explore. After messing about a bit, I’ve gone for the first answer, with Daniel’s suggestion to allow the use of [0..].

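import Data.List (inits, tails)

data Token = Token String Int Int deriving Show

-- All non-empty contiguous sub-lists, grouped by end position.
continuousSubSeqs :: [a] -> [[a]]
continuousSubSeqs = filter (not . null) . concatMap tails . inits

-- Pair each sub-string with its start and end index.
tokenize :: String -> [Token]
tokenize xs = map (\(s, l) -> Token s (head l) (last l)) $ zip subs ind
    where subs = continuousSubSeqs xs
          ind  = continuousSubSeqs [0..]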

This seemed relatively easy to understand, given my limited Haskell knowledge.
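
One detail worth spelling out: the version above composes inits and tails in the opposite order from the answer it was based on, and that order appears to be what makes the infinite index list [0..] workable. The sketch below puts the two orders side by side. The names subSeqsByEnd, subSeqsByStart and indexed are hypothetical and chosen only for this illustration.

```haskell
import Data.List (inits, tails)

-- Prefixes first, then the suffixes of each prefix. Every prefix is
-- finite, so this order stays productive even on an infinite list.
subSeqsByEnd :: [a] -> [[a]]
subSeqsByEnd = filter (not . null) . concatMap tails . inits

-- Suffixes first, then the prefixes of each suffix. Fine for finite
-- input, but on [0..] it never gets past the prefixes of the first suffix.
subSeqsByStart :: [a] -> [[a]]
subSeqsByStart = filter (not . null) . concatMap inits . tails

-- Pair each sub-list with its (start, end) indices. The zip stops as soon
-- as the finite list of sub-lists runs out, so the infinite index list is safe.
indexed :: [a] -> [([a], Int, Int)]
indexed xs =
  [ (s, head l, last l)
  | (s, l) <- zip (subSeqsByEnd xs) (subSeqsByEnd [0 ..])
  ]
```

With this order, indexed "abc" gives [("a",0,0),("ab",0,1),("b",1,1),("abc",0,2),("bc",1,2),("c",2,2)], so the results are grouped by end index rather than by start index.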


1 Answer

Editorial Team · Answer 1 · 2026-06-07T15:38:11+00:00


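import Data.List (inits, tails)

-- All non-empty contiguous sub-lists, grouped by start position.
continuousSubSeqs :: [a] -> [[a]]
continuousSubSeqs = filter (not . null) . concatMap inits . tails

-- Each sub-string with its start and end index in the parent string.
tokens :: [a] -> [([a], Int, Int)]
tokens xs = map (\(s, l) -> (s, head l, last l)) $ zip subs ind
    where subs = continuousSubSeqs xs
          ind  = continuousSubSeqs [0 .. length xs - 1]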

Works like this:

tokens "blah"
[("b",0,0),("bl",0,1),("bla",0,2),("blah",0,3),("l",1,1),("la",1,2),("lah",1,3),("a",2,2),("ah",2,3),("h",3,3)]
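
The question also mentions looking each sub-string up in a dictionary and keeping only the matches. A minimal sketch of that step follows, assuming the dictionary is held in a Data.Set of strings (from the containers package, which ships with GHC). The name matches is hypothetical, and the definitions of continuousSubSeqs and tokens are repeated only so the block stands alone.

```haskell
import Data.List (inits, tails)
import qualified Data.Set as Set

-- All non-empty contiguous sub-lists, grouped by start position.
continuousSubSeqs :: [a] -> [[a]]
continuousSubSeqs = filter (not . null) . concatMap inits . tails

-- Each sub-list with its start and end index in the parent list.
tokens :: [a] -> [([a], Int, Int)]
tokens xs =
  [ (s, head l, last l)
  | (s, l) <- zip (continuousSubSeqs xs) (continuousSubSeqs [0 .. length xs - 1])
  ]

-- Keep only the tokens whose text appears in the dictionary (assumed to be a Set).
matches :: Set.Set String -> String -> [(String, Int, Int)]
matches dict word = [ t | t@(s, _, _) <- tokens word, s `Set.member` dict ]
```

For example, matches (Set.fromList ["la", "ah", "blah"]) "blah" gives [("blah",0,3),("la",1,2),("ah",2,3)]. A Set keeps each lookup logarithmic in the dictionary size; a plain list with elem would also work for small dictionaries.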
