The Archive Base


Editorial Team
Asked: May 20, 2026


I want to store web pages in compressed text files (CSV). To achieve optimal compression, I would like to supply a set of 1000 sample web pages, and the library should then spend some time building the optimal “dictionary” for this content. One obvious dictionary entry would be <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">, which could be stored as %1 or something similar, because it is present on almost every web page. With a customized dictionary like this, I expect compression rates of around 99% in my case.

My question is: does a library for doing this exist for Windows, with MIT or similarly liberal licensing? If not, are there any general-purpose compression libraries you would recommend? I have experimented a bit with zlib, but it outputs binary data. If I convert this binary data into text, I am worried that the result might be longer than the original text.
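
To put numbers on the worry about converting binary to text: standard base64 turns every 3 input bytes into 4 ASCII characters, so the text form is about 4/3 of the compressed size. The minimal sketch below uses only Python's standard `zlib` and `base64` modules and a made-up, deliberately repetitive sample page (an assumption; real pages will compress differently). It shows that compress-then-encode stays shorter than the original whenever compression shrinks the data below roughly 3/4 of its size.

```python
import base64
import zlib


def base64_length(n_bytes: int) -> int:
    """Length of standard (padded) base64 output for n_bytes of input: 4 * ceil(n / 3)."""
    return 4 * ((n_bytes + 2) // 3)


# Made-up sample page standing in for a real web page.
page = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
        '"http://www.w3.org/TR/html4/strict.dtd">\n'
        + "<p>Hello, world!</p>\n" * 50).encode("utf-8")

compressed = zlib.compress(page, 9)
as_text = base64.b64encode(compressed)

# base64 grows the data by a factor of 4/3 (plus padding) ...
assert len(as_text) == base64_length(len(compressed))
# ... so the text form still wins whenever compression beats roughly 3:4.
print(len(page), len(compressed), len(as_text))
```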

EDIT: I need to be able to store the text in CSV files and still be able to import them into a database or even Excel.
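
zlib itself accepts a preset dictionary, which is a built-in form of the custom dictionary described above. A minimal sketch, assuming Python's standard `zlib` module (its `zdict` parameter) plus base64 to get a plain ASCII field, could look like this. `DICTIONARY`, `pack` and `unpack` are hypothetical names, and the dictionary is hand-written here rather than trained from 1000 pages. Per the Python `zlib` documentation, the most common substrings should come at the end of the dictionary, and with the default window only the last 32 KB of it can be referenced.

```python
import base64
import csv
import io
import zlib

# Hypothetical dictionary: in practice, assemble it from substrings that recur
# across the sample pages, with the most common ones placed at the end.
DICTIONARY = (
    b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    b'"http://www.w3.org/TR/html4/strict.dtd">\n'
    b'<html><head><meta http-equiv="Content-Type" '
    b'content="text/html; charset=utf-8"><title>'
)


def pack(html: str) -> str:
    """Deflate with the preset dictionary, then base64-encode to plain ASCII."""
    comp = zlib.compressobj(level=9, zdict=DICTIONARY)
    data = comp.compress(html.encode("utf-8")) + comp.flush()
    return base64.b64encode(data).decode("ascii")


def unpack(field: str) -> str:
    """Reverse pack(); the exact same DICTIONARY bytes must be supplied."""
    decomp = zlib.decompressobj(zdict=DICTIONARY)
    data = decomp.decompress(base64.b64decode(field)) + decomp.flush()
    return data.decode("utf-8")


# Round trip through an in-memory CSV file (made-up sample page and URL).
page = DICTIONARY.decode("ascii") + "Example</title></head><body><p>Hi</p></body></html>"
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["url", "body"])
writer.writerow(["http://example.com/", pack(page)])

buf.seek(0)
rows = list(csv.reader(buf))
assert unpack(rows[1][1]) == page
```

Because the base64 alphabet (A-Z, a-z, 0-9, `+`, `/`, `=`) contains no commas, quotes or line breaks, these fields never need CSV quoting. The trade-off is that the dictionary becomes part of the file format: anything that reads the CSV later needs the identical dictionary bytes, so it is worth versioning it.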


1 Answer

Editorial Team answered on May 20, 2026 at 2:19 pm
    1. “Text files (not binary)” is a little too general. If you mean that some
      byte values (00, 1A or whatever) can’t be used, then any binary method plus
      something like base64 encoding will work. (Although I’d suggest a more efficient
      method from the Coroutine demo source.)

      To be specific, you can use any general-purpose compressor to compress your
      base file, then base file + target file, and then diff the two outputs; you’d get
      a dictionary-compressed (binary) target, which can then be converted to “text”
      with base64, yEnc or similar.

      Alternatively, there are some coders with built-in support for that, for example:
      http://compression.ru/ds/ppmtrain.rar
      http://code.google.com/p/lzham/

    2. If you actually want common phrases replaced with references and
      everything else left untouched (which is somewhat implied, but not the same as “text output”),
      you can use text preprocessors like:
      http://xwrt.sourceforge.net/
      http://compression.ru/ds/liptify.rar
      (There were more, as far as I recall.)

    3. A hybrid method is also possible. You can use a general-purpose LZ compressor as in item 1, for example LZMA, and then replace its entropy coding with something text-based.
      For example, http://nishi.dreamhosters.com/u/lzmarec_v1_bin.rar
      contains a utility which removes LZMA’s entropy coding, and it’s pretty easy to convert
      its output to text.
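
For option 2, a toy version of phrase replacement that keeps the output as readable text might look like the following. This is not how XWRT or liptify work internally; it simply swaps whole phrases for `%0` to `%9` tokens (the `%1` idea from the question) and escapes every literal `%` as `%%`, so the mapping stays reversible. The phrase table and the `encode`/`decode` names are hypothetical, and a real tool would derive the phrases from the sample pages.

```python
import re

# Hypothetical phrase table; at most 10 entries so each one can be referenced
# by a single-digit token "%0" ... "%9".
PHRASES = [
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">',
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">',
    "</div>",
]
assert len(PHRASES) <= 10 and all(PHRASES)

INDEX = {p: i for i, p in enumerate(PHRASES)}
# Longest phrases first so a long match wins at a given position; a bare "%"
# comes last so literal percent signs are escaped in the same single pass.
ENCODE_RE = re.compile(
    "|".join(re.escape(p) for p in sorted(PHRASES, key=len, reverse=True)) + "|%")
# In encoded text every "%" is followed by either "%" or one ASCII digit.
DECODE_RE = re.compile(r"%(%|[0-9])")


def encode(text: str) -> str:
    """Replace known phrases with %N tokens and escape literal % as %%."""
    def repl(m: re.Match) -> str:
        s = m.group(0)
        return "%%" if s == "%" else "%" + str(INDEX[s])
    return ENCODE_RE.sub(repl, text)


def decode(text: str) -> str:
    """Expand %N tokens back into phrases and %% back into %."""
    def repl(m: re.Match) -> str:
        c = m.group(1)
        return "%" if c == "%" else PHRASES[int(c)]
    return DECODE_RE.sub(repl, text)


# Made-up sample page.
page = PHRASES[0] + "\n<div>100% done</div>\n"
packed = encode(page)
assert decode(packed) == page
print(packed)
```

The result is still ordinary text (it can contain commas, quotes and newlines from the HTML, which a CSV writer will quote as usual), so it can also be fed to a general-purpose compressor afterwards.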


Related Questions

I want to develop web store on GAE, that takes data directly from OpenERP
I want to store events in a web application I am fooling around with
I want to use session object in my web app.I want to store some
I'm working on small web form,in C#/ASP.NET. I want to store the errors messages
I want my web's register users to store their location (latitude & longitude) from
I want to create a web app that lets me create, store and display
I'm coding in ASP.NET and want to store audio files (.mp3, or smaller formats)
In web development, when we want to pass something between different pages, we might
For my web application, I need to store form inputs spanning across multiple pages,
I am using Twitter bootstrap for developing Web pages,Now I want to integrate this


© 2021 The Archive Base. All Rights Reserved