The Archive Base


Editorial Team
Asked: May 30, 2026


I have created a suffix array using the Princeton implementation. However, my text document is very large, and the resulting suffix array is over 500 MB in size. Is there a way to compress the suffix array?

Thanks!

1 Answer

  1. Editorial Team
    Added an answer on May 30, 2026 at 6:34 am

    Contrary to what is said in the previous answer, suffix arrays can be compressed. In fact, compressed suffix trees are usually implemented by first emulating the tree with a suffix array and then compressing that array.

    I am not aware of any ready-to-use Java implementation of suffix array compression, and the various existing algorithms are too involved to describe here in detail. A paper by Navarro and Mäkinen (DOI 10.1145/1216370.1216372) provides detailed descriptions and comparisons.

    But broadly speaking, there are two general approaches:

    Approach A: Reducing the size of the array directly (see section 7.1 of the paper). This involves storing only some of the entries of the suffix array and reconstructing the missing ones when needed. The reconstruction is carried out using a function (called ψ in the paper), which is itself stored as a large array (though not as large as the original suffix array) together with an indexed bit vector.
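
The sampling idea in Approach A can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the scheme from the paper: the class name `PsiSampledSuffixArray`, the sampling rate `k`, and the naive suffix sort are all made up for the demo, the text is assumed to end with a unique smallest character such as `$`, and ψ and the samples are kept in plain `int` arrays rather than in compressed form. Here ψ maps the row of a suffix to the row of the suffix that starts one text position later, so a missing entry can be recovered by walking ψ until a stored entry is reached and subtracting the number of steps.

```java
import java.util.Arrays;

/** Demo only: psi and the samples are plain int arrays here, not compressed. */
final class PsiSampledSuffixArray {
    private final int[] psi;        // psi[i] = row of the suffix starting one text position later
    private final boolean[] marked; // marked[i] is true when SA[i] is stored explicitly
    private final int[] rank;       // rank[i] = number of marked rows before row i
    private final int[] samples;    // the stored SA values, in row order

    PsiSampledSuffixArray(String text, int k) {
        int n = text.length();
        int[] sa = buildSuffixArray(text);
        int[] inverse = new int[n];
        for (int i = 0; i < n; i++) inverse[sa[i]] = i;

        psi = new int[n];
        marked = new boolean[n];
        rank = new int[n + 1];
        for (int i = 0; i < n; i++) {
            psi[i] = inverse[(sa[i] + 1) % n];
            // Keep every k-th text position, plus the last one so a walk never wraps around.
            marked[i] = sa[i] % k == 0 || sa[i] == n - 1;
            rank[i + 1] = rank[i] + (marked[i] ? 1 : 0);
        }
        samples = new int[rank[n]];
        for (int i = 0; i < n; i++) {
            if (marked[i]) samples[rank[i]] = sa[i];
        }
    }

    /** Recovers SA[i] by following psi until a stored entry is reached. */
    int get(int i) {
        int steps = 0;
        while (!marked[i]) {
            i = psi[i];
            steps++;
        }
        return samples[rank[i]] - steps;
    }

    /** Naive suffix sort, fine for a demo but far too slow for large texts. */
    static int[] buildSuffixArray(String text) {
        int n = text.length();
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> text.substring(a).compareTo(text.substring(b)));
        int[] sa = new int[n];
        for (int i = 0; i < n; i++) sa[i] = order[i];
        return sa;
    }

    public static void main(String[] args) {
        PsiSampledSuffixArray csa = new PsiSampledSuffixArray("banana$", 3);
        for (int i = 0; i < 7; i++) System.out.print(csa.get(i) + " ");
        System.out.println();
    }
}
```

In this sketch nothing is actually saved, because ψ is stored as a full array. The space gain in real implementations comes from encoding ψ compactly, which is the part the paper covers in detail.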

    Approach B: The FM approach (see section 9 of the paper). Here, the suffix array is essentially replaced with a relatively short array C that gives the starting positions (in the suffix array) of the main lexicographic buckets (i.e. groups of suffixes starting with the same initial character), combined with another, relatively large data structure Occ that enables so-called backward search. Specifically, given a search pattern p = c_1 ... c_m, it makes it possible to iteratively narrow the bucket for character c_m to a smaller bucket for the string c_(m-1) c_m, then further to the bucket for c_(m-2) c_(m-1) c_m, and so forth, until the final range for the complete pattern p is found. The data structure Occ that enables this is large, but compressible using various techniques, most notably wavelet trees.
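
Backward search as described for Approach B can be sketched as follows. The class name `FmIndexSketch` and its layout are hypothetical, the text is assumed to end with a unique smallest character such as `$` and to use only characters below 256, and Occ is stored here as a plain uncompressed table over the Burrows-Wheeler transform of the text, which is exactly the part a wavelet tree or similar structure would replace in a real index.

```java
import java.util.Arrays;

/** Demo only: Occ is an uncompressed table, so this saves no space at all. */
final class FmIndexSketch {
    private static final int SIGMA = 256;       // assumed alphabet: chars 0..255
    private final int[] c = new int[SIGMA];     // c[ch] = number of text characters smaller than ch
    private final int[][] occ;                  // occ[ch][i] = occurrences of ch in bwt[0..i)
    private final int n;

    /** The text must end with a unique smallest sentinel such as '$'. */
    FmIndexSketch(String text) {
        n = text.length();
        Integer[] sa = new Integer[n];
        for (int i = 0; i < n; i++) sa[i] = i;
        // Naive suffix sort, fine for a demo but far too slow for large texts.
        Arrays.sort(sa, (a, b) -> text.substring(a).compareTo(text.substring(b)));

        occ = new int[SIGMA][n + 1];
        int[] counts = new int[SIGMA];
        for (int i = 0; i < n; i++) {
            // The BWT character is the one preceding the suffix in row i.
            char bwt = text.charAt((sa[i] + n - 1) % n);
            counts[bwt]++;
            for (int ch = 0; ch < SIGMA; ch++) occ[ch][i + 1] = occ[ch][i];
            occ[bwt][i + 1]++;
        }
        for (int ch = 1; ch < SIGMA; ch++) c[ch] = c[ch - 1] + counts[ch - 1];
    }

    /** Counts occurrences of the pattern by narrowing a suffix array range from right to left. */
    int count(String pattern) {
        int lo = 0;
        int hi = n; // half-open range [lo, hi) of suffix array rows
        for (int i = pattern.length() - 1; i >= 0 && lo < hi; i--) {
            char ch = pattern.charAt(i);
            if (ch >= SIGMA) return 0;
            lo = c[ch] + occ[ch][lo];
            hi = c[ch] + occ[ch][hi];
        }
        return hi - lo;
    }

    public static void main(String[] args) {
        FmIndexSketch fm = new FmIndexSketch("banana$");
        System.out.println(fm.count("ana"));
    }
}
```

Each loop iteration is one narrowing step: the range for the suffix of the pattern processed so far is mapped to the range for that suffix with one more character prepended. Note that the suffix array itself is only needed during construction; counting uses C and Occ alone.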

    Effects on search performance
    The paper cited above contains careful analyses and comparisons. Again broadly speaking, compressing the suffix array slows down the search for a pattern of length m (which can be O(m) in an uncompressed suffix array, if carefully implemented) by a factor that depends, usually logarithmically, on the length of the entire text. Furthermore, any approach that uses wavelet trees adds a dependence on the size of the alphabet.
