The Archive Base

Editorial Team
Asked: June 3, 2026

Correct me if I’m approaching this wrong, but I have a queue server and

Correct me if I’m approaching this wrong, but I have a queue server and a bunch of Java workers running in a cluster. My queue has work units that are very small, but there are many of them. So far my benchmarks and review of the workers have shown that I get about 200mb/second.

So I’m trying to figure out how to get more work units through my bandwidth. Currently my CPU usage is not very high (40-50%) because it can process the data faster than the network can send it. I want to get more work through the queue and am willing to pay for it with expensive compression/decompression (since half of each core is idle right now).

I have tried Java LZO and gzip, but was wondering if there is anything better (even if it’s more CPU expensive)?

Updated: data is a byte[]. Basically the queue only takes it in that format, so I am using ByteArrayOutputStream to write two ints and an int[] to a byte[]. The values in the int[] are all ints between 0 and 100 (or 1000, but the vast majority of the numbers are zeros). The lists are quite large, anywhere from 1000 to 10,000 items (again, majority zeros; never more than 100 non-zero numbers in the int[]).

1 Answer

  1. Editorial Team
    Added an answer on June 3, 2026 at 12:25 am

    It sounds like using a custom compression mechanism that exploits the structure of the data could be very efficient.

    Firstly, using a short[] (16-bit data type) instead of an int[] will halve (!) the amount of data sent. You can do this because the numbers fit easily between -2^15 (-32768) and 2^15-1 (32767). This is ridiculously easy to implement.
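As a rough illustration of the short[] idea, here is a minimal Java sketch. The class name `ShortPacking`, the method names, and the layout (two header ints followed by the values as 16-bit shorts) are assumptions made for this example, not something taken from the question's actual code.

```java
import java.nio.ByteBuffer;

class ShortPacking {
    // Writes the two header ints unchanged, then each array value as a 16-bit short.
    // Assumption: every value fits in a short (the question says 0 to 1000).
    static byte[] pack(int a, int b, int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(8 + 2 * values.length);
        buf.putInt(a).putInt(b);
        for (int v : values) {
            if (v < Short.MIN_VALUE || v > Short.MAX_VALUE) {
                throw new IllegalArgumentException("value does not fit in a short: " + v);
            }
            buf.putShort((short) v);
        }
        return buf.array();
    }

    // Reads the values back, skipping the two header ints.
    static int[] unpackValues(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        buf.getInt();
        buf.getInt();
        int[] values = new int[buf.remaining() / 2];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getShort();
        }
        return values;
    }
}
```

With this layout the payload is 8 bytes of header plus 2 bytes per value, instead of 4 bytes per value.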

    Secondly, you could use a scheme similar to run-length encoding: a positive number represents that number literally, while a negative number represents that many zeros (after taking absolute values). e.g.

    [10, 40, 0, 0, 0, 30, 0, 100, 0, 0, 0, 0] <=> [10, 40, -3, 30, -1, 100, -4]
    

    This is harder to implement than just substituting short for int, but will provide ~80% compression in the very worst case (1000 numbers, 100 non-zero, none of which are consecutive), since that case needs only about 200 shorts instead of 1000.
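The zero-run scheme described above could be sketched in Java roughly as follows. The class name `ZeroRunCodec` is hypothetical, and the sketch assumes all input values are non-negative and fit in a short, as the question states.

```java
import java.util.Arrays;

class ZeroRunCodec {
    // A positive short is a literal value; a negative short -k stands for k zeros.
    static short[] encode(int[] values) {
        // Every output element covers at least one input element,
        // so the encoded form is never longer than the input.
        short[] out = new short[values.length];
        int n = 0;
        int i = 0;
        while (i < values.length) {
            if (values[i] == 0) {
                int run = 0;
                // Cap each run so that -run still fits in a short.
                while (i < values.length && values[i] == 0 && run < Short.MAX_VALUE) {
                    run++;
                    i++;
                }
                out[n++] = (short) -run;
            } else {
                if (values[i] < 0 || values[i] > Short.MAX_VALUE) {
                    throw new IllegalArgumentException("unsupported value: " + values[i]);
                }
                out[n++] = (short) values[i];
                i++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    static int[] decode(short[] encoded) {
        int length = 0;
        for (short s : encoded) {
            length += (s < 0) ? -s : 1;
        }
        int[] out = new int[length]; // Java zero-fills new arrays
        int i = 0;
        for (short s : encoded) {
            if (s < 0) {
                i += -s;      // skip over a run of zeros
            } else {
                out[i++] = s; // literal value
            }
        }
        return out;
    }
}
```

Encoding the example array from above yields [10, 40, -3, 30, -1, 100, -4], and decoding that returns the original twelve values.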

    I just did some simulations to work out the compression ratios. I tested the method I described above, and the one suggested by Louis Wasserman and sbridges (storing (index, value) pairs for the non-zero entries). Both performed very well.
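For comparison, the (index, value) method might look something like the sketch below. The class name `IndexValueCodec` and the leading length element are assumptions added here so that the array can be rebuilt on the receiving side; the simulation further down counts only the two shorts per non-zero entry.

```java
class IndexValueCodec {
    // Assumed layout: [original length, index0, value0, index1, value1, ...]
    // Assumes the array length and all values fit in a short.
    static short[] encode(int[] values) {
        if (values.length > Short.MAX_VALUE) {
            throw new IllegalArgumentException("array too long for short indices");
        }
        int nonZero = 0;
        for (int v : values) {
            if (v != 0) {
                nonZero++;
            }
        }
        short[] out = new short[1 + 2 * nonZero];
        out[0] = (short) values.length;
        int n = 1;
        for (int i = 0; i < values.length; i++) {
            if (values[i] != 0) {
                out[n++] = (short) i;         // position of the non-zero entry
                out[n++] = (short) values[i]; // its value
            }
        }
        return out;
    }

    static int[] decode(short[] encoded) {
        int[] out = new int[encoded[0]]; // zero-filled by default
        for (int n = 1; n < encoded.length; n += 2) {
            out[encoded[n]] = encoded[n + 1];
        }
        return out;
    }
}
```

This version needs no run bookkeeping at all, which may make it the easier of the two to get right.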

    Assuming the length of the array and the number of non-zero numbers are both uniformly distributed between their bounds, both methods save about 5400 ints (or shorts) on average, with a compressed size of about 2.5% of the original! The run-length encoding method seems to save about 1 additional int (or an average compressed size that is 0.03% smaller), i.e. basically no difference, so you should use the one that is easiest to implement. The following are histograms of the compression ratios for 50000 random samples (they are very similar!).

    [Figure: histograms of the compression ratio (%) for the "Run length encoding" and "Store indices" methods]

    Summary: using shorts instead of ints and one of the compression methods, you will be able to compress the data to about 1% of its original size!

    For the simulation, I used the following R script:

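    SIZE <- 50000
    
    lengths <- sample(1000:10000, SIZE, replace=TRUE)
    nonzeros <- sample(1:100, SIZE, replace=TRUE)
    
    f.rle <- function(len, nonzero) {
      # positions of the non-zero values; the leading 0 lets diff() see the gap before the first one
      indexes <- sort(c(0, sample(1:len, nonzero, replace=FALSE)))
      steps <- diff(indexes)
      # one short per run of zeros, and one per non-zero value
      # (a run of zeros after the last non-zero value is not counted)
      sum(steps > 1) + nonzero
    }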
    
    f.index <- function(len, nonzero) {
      nonzero * 2
    }
    
    # using the [value, -1 * number of zeros,...] method
    rle.comprs <- mapply(f.rle, lengths, nonzeros)
    print(mean(lengths - rle.comprs)) # average number of shorts saved
    
    rle.ratios <- rle.comprs / lengths * 100
    print(mean(rle.ratios)) # average compression ratio
    
    # using the [(index, value),...] method
    index.comprs <- mapply(f.index, lengths, nonzeros)
    print(mean(lengths - index.comprs)) # average number of shorts saved
    
    index.ratios <- index.comprs / lengths * 100
    print(mean(index.ratios)) # average compression ratio
    
    
    par(mfrow=c(2,1))
    hist(rle.ratios, breaks=100, freq=F, xlab="Compression ratio (%)", main="Run length encoding")
    hist(index.ratios, breaks=100, freq=F, xlab="Compression ratio (%)", main="Store indices")
    