The Archive Base
Asked by Editorial Team on May 19, 2026


I have a complex regex, and I’d like to match it against the contents of an entire, huge file. The main concern is efficiency: the file is very big, and running out of memory is a distinct possibility.

Is there a way I can somehow “buffer” the contents while pumping them through a regex matcher?

1 Answer

  1. Editorial Team, answered May 19, 2026 at 1:22 am

    Yes, Pattern.matcher() will take any CharSequence, not just a String.

    If your input is already in a charset that uses exactly 2 bytes per char, in the ByteBuffer’s byte order (big-endian by default, i.e. UTF-16BE), with no ‘prologue’ (byte-order mark), you need only:

    ByteBuffer bb = ...; // acquire memory mapped byte buffer
    CharBuffer cb = bb.asCharBuffer();  // get a char[] 'view' of the bytes
    

    … and since CharBuffer implements CharSequence, you’re done.
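For example, a minimal end-to-end sketch, assuming the file is UTF-16BE with no byte-order mark and smaller than 2 GB (the limit for a single FileChannel.map call). The class and method names are made up for illustration; main() only writes a small temporary file to exercise it.

```java
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MappedUtf16Search {

    // Counts regex matches in a UTF-16BE file (no byte-order mark). The text is never
    // copied onto the heap: asCharBuffer() is only a char view of the mapped bytes.
    static int countMatches(Path file, Pattern regex) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer bb = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            CharBuffer cb = bb.asCharBuffer(); // big-endian unless bb.order(...) says otherwise
            Matcher m = regex.matcher(cb);     // CharBuffer is a CharSequence
            int count = 0;
            while (m.find()) count++;
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "foo bar foo baz foo".getBytes(StandardCharsets.UTF_16BE));
        System.out.println(countMatches(tmp, Pattern.compile("foo"))); // prints 3
    }
}
```

If the file is UTF-16LE instead, calling bb.order(ByteOrder.LITTLE_ENDIAN) before asCharBuffer() should give the right view.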

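    On the other hand, if you need to decode the bytes from some other charset (UTF-8, say), you’ll have your work cut out. asCharBuffer() just reinterprets each pair of bytes as a char and knows nothing about charsets, while CharsetDecoder.decode(ByteBuffer) internally allocates a new CharBuffer large enough to hold the whole decoded input, which for a huge file is exactly the memory problem you are trying to avoid.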
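For contrast, a sketch of the simple but memory-hungry route for a variable-width charset such as UTF-8: decode the whole mapped file with CharsetDecoder.decode(ByteBuffer) and match against the result. It works, but the returned CharBuffer holds the entire decoded text on the heap. DecodeWholeFile and containsMatch are illustrative names only.

```java
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.regex.Pattern;

public class DecodeWholeFile {

    // The straightforward route for any charset: decode everything, then match.
    // decode(ByteBuffer) returns a new heap CharBuffer holding all of the decoded
    // text, so heap use grows with the file size.
    static boolean containsMatch(Path file, Charset cs, Pattern regex) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer bb = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            CharBuffer all = cs.newDecoder().decode(bb); // throws on malformed input
            return regex.matcher(all).find();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "na\u00efve caf\u00e9".getBytes(StandardCharsets.UTF_8));
        System.out.println(containsMatch(tmp, StandardCharsets.UTF_8, Pattern.compile("caf\u00e9"))); // prints true
    }
}
```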

    Whether you’ll be able to get away with a smaller buffer depends a fair bit on your regex and on what you want to do with the match results. But the basic approach is to implement CharSequence yourself, wrapping the memory-mapped ByteBuffer, a smaller CharBuffer for ‘working space’, and a CharsetDecoder. Use CharsetDecoder.decode(ByteBuffer, CharBuffer, boolean) to decode the bytes ‘on demand’, and hope that the regex matcher mostly moves forward through the input and that the parts you’re interested in come in fairly small chunks.
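Decoding on demand hinges on the three-argument CharsetDecoder.decode(ByteBuffer, CharBuffer, boolean): a result of OVERFLOW means the output buffer is full, UNDERFLOW means the input is used up, and flush() finishes off. A minimal sketch of that loop, assuming the text can be processed one chunk at a time; counting chars stands in for the real work (it is also all a length() implementation needs), and ChunkedDecode and countChars are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class ChunkedDecode {

    // Decodes 'in' through a fixed-size CharBuffer, processing each chunk (here: just
    // counting chars) before reusing the buffer, so heap use is bounded by chunkChars.
    static long countChars(ByteBuffer in, Charset cs, int chunkChars) throws CharacterCodingException {
        if (chunkChars < 2) throw new IllegalArgumentException("need room for a surrogate pair");
        CharsetDecoder decoder = cs.newDecoder();
        CharBuffer out = CharBuffer.allocate(chunkChars);
        long total = 0;
        while (true) {
            // every byte is already in 'in' (e.g. a mapped file), so endOfInput is true
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) result.throwException();
            total += drain(out);
            if (result.isUnderflow()) break; // input used up; OVERFLOW means "out was full"
        }
        while (decoder.flush(out).isOverflow()) total += drain(out);
        return total + drain(out);
    }

    // Stand-in for real per-chunk work: switch to read mode, "consume", then reset.
    private static int drain(CharBuffer out) {
        out.flip();
        int n = out.remaining();
        out.clear();
        return n;
    }

    public static void main(String[] args) throws CharacterCodingException {
        String text = "h\u00e9llo w\u00f6rld \uD83D\uDE00"; // ends with a surrogate pair
        ByteBuffer bytes = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
        System.out.println(countChars(bytes, StandardCharsets.UTF_8, 4)); // prints 14
    }
}
```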

    As a rough start:

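    import java.io.*;
    import java.nio.*;
    import java.nio.channels.FileChannel;
    import java.nio.charset.*;

    class MyCharSequence implements CharSequence {

        private final MappedByteBuffer bytes;  // whole file, memory-mapped (must be < 2 GB)
        private final CharBuffer charBuffer;   // small decoded 'working space', kept in write mode
        private final CharsetDecoder decoder;
        private int windowStart = 0;           // char index held at charBuffer.get(0)
        private int cachedLength = -1;         // set by the first call to length()
        private boolean eof = false;           // have all the bytes been decoded?

        public MyCharSequence(File file, Charset cs, int bufferSize) throws IOException {
            if (bufferSize < 4) throw new IllegalArgumentException("bufferSize must be >= 4");
            try (FileInputStream input = new FileInputStream(file);
                 FileChannel channel = input.getChannel()) {
                // the mapping stays valid after the channel is closed
                this.bytes = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            }
            this.charBuffer = CharBuffer.allocate(bufferSize);
            this.decoder = cs.newDecoder().onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE);
        }

        public int length() {
            // ouch! have to decode the lot, even if you don't choose to keep it all handy
            if (cachedLength < 0) {
                while (!eof) fill();
                cachedLength = windowStart + charBuffer.position();
            }
            return cachedLength;
        }

        public char charAt(final int index) {
            if (index >= 0 && index < windowStart) restart(); // moved back past the window
            while (index >= windowStart + charBuffer.position() && !eof) fill();
            if (index < 0 || index >= windowStart + charBuffer.position())
                throw new IndexOutOfBoundsException("index " + index);
            return charBuffer.get(index - windowStart); // decoder did the bytes -> char mapping
        }

        public CharSequence subSequence(final int start, final int end) {
            // this'll be fun, too; simplest is to copy the range (fine for small match groups)
            if (start < 0 || start > end) throw new IndexOutOfBoundsException();
            StringBuilder sb = new StringBuilder(end - start);
            for (int i = start; i < end; i++) sb.append(charAt(i));
            return sb.toString();
        }

        // Decodes more bytes into charBuffer, first dropping the older half of a full window.
        private void fill() {
            if (charBuffer.remaining() < 2) { // a surrogate pair needs 2 free chars
                int drop = charBuffer.position() / 2;
                charBuffer.flip().position(drop);
                charBuffer.compact();
                windowStart += drop;
            }
            if (decoder.decode(bytes, charBuffer, true).isUnderflow() // true: all input is mapped
                    && decoder.flush(charBuffer).isUnderflow()) eof = true;
        }

        private void restart() { // start decoding again from byte 0
            bytes.rewind(); decoder.reset(); charBuffer.clear();
            windowStart = 0; eof = false;
        }
    }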
    

    It might be instructive to wrap a fully decoded CharBuffer in a much simpler CharSequence wrapper of your own and log how the matcher actually calls its methods for your input, running with a big heap on your development box. That will give you an idea of whether this approach will work for your particular scenario.
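A minimal sketch of such a wrapper, counting calls rather than logging each one so the output stays manageable on a big input. AccessLoggingCharSequence is a made-up name. The figure most worth watching is probably maxBackwardDistance: it gives a rough idea of how much already-decoded text a sliding window would have to keep behind the matcher to avoid re-decoding.

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic wrapper: delegates to any CharSequence and records how a
// regex matcher actually walks over it.
public class AccessLoggingCharSequence implements CharSequence {

    private final CharSequence delegate;
    long lengthCalls, charAtCalls, subSequenceCalls;
    long backwardSteps;      // charAt calls whose index was below the previous call's
    int maxBackwardDistance; // largest such backward jump, in chars
    private int lastIndex = -1;

    public AccessLoggingCharSequence(CharSequence delegate) {
        this.delegate = delegate;
    }

    @Override public int length() {
        lengthCalls++;
        return delegate.length();
    }

    @Override public char charAt(int index) {
        charAtCalls++;
        if (index < lastIndex) {
            backwardSteps++;
            maxBackwardDistance = Math.max(maxBackwardDistance, lastIndex - index);
        }
        lastIndex = index;
        return delegate.charAt(index);
    }

    @Override public CharSequence subSequence(int start, int end) {
        subSequenceCalls++;
        return delegate.subSequence(start, end);
    }

    @Override public String toString() {
        return delegate.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the fully decoded file you would use on a big-heap dev box.
        CharBuffer decoded = CharBuffer.wrap("id=42; id=7; name=x");
        AccessLoggingCharSequence logged = new AccessLoggingCharSequence(decoded);
        Matcher m = Pattern.compile("id=(\\d+)").matcher(logged);
        while (m.find()) System.out.println(m.group(1)); // prints 42, then 7
        System.out.printf("length=%d charAt=%d subSequence=%d backward=%d (max %d chars)%n",
                logged.lengthCalls, logged.charAtCalls, logged.subSequenceCalls,
                logged.backwardSteps, logged.maxBackwardDistance);
    }
}
```

Running your real regex over your real (decoded) input this way, and noting the backward-jump figures, should tell you whether a small forward-moving window is realistic.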
