I’m trying to extract some DNA info from a file. Before the DNA data

Question

Asked: 2026-05-27T09:46:06+00:00

I’m trying to extract some DNA info from a file.
Before the DNA data consisting of bases GCAT there is the word ORIGIN, and after there is a //. How do I write a regular expression to get these bases between these markers?

I have tried the following but it doesn’t work.

[ORIGIN(GCATgcat)////]

Sample data:

ORIGIN      
  1 acagatgaag acagatgaag acagatgaag acagatgaag
  2 acagatgaag acagatgaag acagatgaag acagatgaag
//

1 Answer

Editorial Team · Answer 1 · 2026-05-27T09:46:07+00:00

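Try the pattern "\\b([GCATgcat]+)\\b" (written as a Java string literal, so each backslash is doubled). It matches any run of G, C, A, or T characters, upper or lowercase, that begins and ends at a word boundary, so it will not match those letters embedded in a longer word such as "catalog". Your attempt fails because square brackets define a character class: [ORIGIN(GCATgcat)////] matches a single character from that set rather than the text between the two markers. The pattern can still match short ordinary words made only of those letters, such as "a" or "cat", elsewhere in the file, so first narrow the input to the section between ORIGIN and //, then scan that section repeatedly to extract each sequence.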
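To make the word-boundary behaviour concrete, here is a small sketch. The class name BoundaryDemo, the method findRuns, and the test string are made up for illustration and are not part of your file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Same pattern as above: a run of base letters delimited by word boundaries.
    private static final Pattern BASE_RUN = Pattern.compile("\\b([GCATgcat]+)\\b");

    // Hypothetical helper: collects every whole-word run of base letters.
    static List<String> findRuns(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = BASE_RUN.matcher(text);
        while (m.find()) {
            runs.add(m.group(1));
        }
        return runs;
    }

    public static void main(String[] args) {
        // "catalog" and "tag1" are skipped because the run of base letters
        // does not end at a word boundary.
        System.out.println(findRuns("catalog cat GATTACA tag1")); // prints [cat, GATTACA]
    }
}
```

Note that "cat" is still matched as a standalone word, which is why narrowing the input to the ORIGIN section first matters.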

Here’s a working example for your sample file:

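import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Locate the substring between "ORIGIN" and "//" in the file.
// getSampleFileContents() stands in for however you read the file into a String.
String fileContents = getSampleFileContents();
int indexOfOrigin = fileContents.indexOf("ORIGIN");
int indexOfEnd = fileContents.indexOf("//", indexOfOrigin);
if (indexOfOrigin < 0 || indexOfEnd < 0) {
  throw new IllegalStateException("ORIGIN or // marker not found");
}
// Start just past the ORIGIN marker so the marker itself is never scanned.
String pertinentSection = fileContents.substring(
    indexOfOrigin + "ORIGIN".length(), indexOfEnd);

// Search for sequences within the pertinent substring.
// The line numbers (1, 2, ...) are digits, so the pattern skips them.
Pattern p = Pattern.compile("\\b([GCATgcat]+)\\b");
Matcher m = p.matcher(pertinentSection);
List<String> sequences = new ArrayList<>();
while (m.find()) {
  sequences.add(m.group(1));
}
System.out.println(sequences); // prints [acagatgaag, acagatgaag, ..., acagatgaag]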
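If you would rather do the marker search with a regular expression as well, as the question asks, one possible sketch is to capture everything between ORIGIN and // in a single pattern and then strip whatever is not a base letter. The class name OriginExtractor and the method extractBases are hypothetical, and the sketch assumes the word ORIGIN does not appear earlier in the file and that the sequence contains only the letters G, C, A, and T.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OriginExtractor {
    // DOTALL lets '.' match line breaks; the reluctant '.*?' stops at the first "//".
    private static final Pattern ORIGIN_BLOCK =
        Pattern.compile("ORIGIN(.*?)//", Pattern.DOTALL);

    // Returns the bases between ORIGIN and // as one string,
    // or an empty string if the markers are not found.
    static String extractBases(String text) {
        Matcher m = ORIGIN_BLOCK.matcher(text);
        if (!m.find()) {
            return "";
        }
        // Drop line numbers, spaces, and newlines; keep only base letters.
        return m.group(1).replaceAll("[^GCATgcat]", "");
    }

    public static void main(String[] args) {
        String sample = "ORIGIN      \n"
            + "  1 acagatgaag acagatgaag\n"
            + "  2 acagatgaag acagatgaag\n"
            + "//\n";
        System.out.println(extractBases(sample));
    }
}
```

This returns the bases as one continuous string instead of a list of ten-letter groups, which may be more convenient if you need the full sequence.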
