The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 477601
In Process

The Archive Base Latest Questions

Editorial Team
Asked: May 13, 2026

i have a task to compress a stock market data somehow…the data is in


I have a task to compress stock market data. The data is in a file where the stock value for each day is given on its own line, so it's a really big file.

Eg,
123.45
234.75
345.678
889.56
…..

Now the question is how to compress the data (that is, reduce the redundancy) using standard algorithms like Huffman, arithmetic, or LZ coding. Which coding is preferable for this sort of data?

I have noticed that if I take the first value and then consider the difference between each pair of consecutive values, there is a lot of repetition in the difference values. This makes me wonder whether first taking these differences, finding their frequencies (and hence probabilities), and then using Huffman coding would be a way to do it.

Am I right? Can anyone give me some suggestions?
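
To make the idea concrete, here is a minimal sketch in Python of what I mean: scale the prices to integers, take consecutive differences, count how often each difference occurs, and build a Huffman code over those counts with `heapq`. The function names (`deltas`, `huffman_code_lengths`), the scale factor of 100, and the price list are all made up for illustration. The scale factor has to cover every decimal place in the real data, otherwise digits are lost.

```python
import heapq
from collections import Counter
from decimal import Decimal

def deltas(prices, scale=100):
    """Scale decimal price strings to integers; return (first value, differences)."""
    ints = [int(Decimal(p) * scale) for p in prices]
    return ints[0], [b - a for a, b in zip(ints, ints[1:])]

def huffman_code_lengths(freq):
    """Return {symbol: code length in bits} for a Huffman code built from freq."""
    if len(freq) == 1:
        return {sym: 1 for sym in freq}
    # heap entries: (weight, unique tie-breaker, {symbol: depth so far})
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # merging two subtrees pushes every symbol in them one level deeper
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

# made-up prices, only to exercise the functions
prices = ["100.00", "100.25", "100.50", "100.50", "100.25",
          "100.50", "100.75", "100.75", "101.00"]
base, diffs = deltas(prices)
freq = Counter(diffs)
lengths = huffman_code_lengths(freq)
total_bits = sum(freq[s] * lengths[s] for s in freq)
print(base, diffs, lengths, total_bits)
```

The most frequent difference gets the shortest code, which is where the saving would come from. The code table itself (and the base value) also has to be stored, so the gain only shows up on long files.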

1 Answer

  1. Editorial Team
    Added an answer on May 13, 2026 at 12:36 am

    I think your problem is more complex than merely subtracting the stock prices. You also need to store the date (unless you have a consistent time span that can be inferred from the file name).
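
    One possible way to keep the dates cheaply, sketched below, is to store the first date once and then only the gap in days to the previous row. The helper names (`encode_dates`, `decode_dates`) and the sample dates are hypothetical. The gaps are mostly 1, so they should compress about as well as the price differences.

```python
from datetime import date, timedelta

def encode_dates(dates):
    """Return (first date, list of day gaps between consecutive dates)."""
    gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
    return dates[0], gaps

def decode_dates(first, gaps):
    """Rebuild the original list of dates from the first date and the gaps."""
    out = [first]
    for g in gaps:
        out.append(out[-1] + timedelta(days=g))
    return out

# hypothetical trading days with a weekend gap in between
days = [date(2009, 11, 20), date(2009, 11, 23),
        date(2009, 11, 24), date(2009, 11, 25)]
first, gaps = encode_dates(days)
print(first, gaps)
```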

    The amount of data is not very large, though. Even if you have data every second for every day for every year for the last 30 years for 300 stocks, you could still manage to store all that on a higher-end home computer (say, a Mac Pro), as that amounts to about 5 TB uncompressed.
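
    The estimate can be checked with a few lines of arithmetic. This is only a sketch: the function name `raw_size_bytes` and the bytes-per-sample values are assumptions, since the size of one stored sample depends on the format you pick.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def raw_size_bytes(years, stocks, bytes_per_sample, days_per_year=365):
    """Uncompressed size of one sample per second, per stock, over the whole period."""
    samples = years * days_per_year * SECONDS_PER_DAY * stocks
    return samples * bytes_per_sample

# 8 bytes per sample is an assumption (for example, one 64-bit float)
print(raw_size_bytes(30, 300, 8) / 1e12, "TB")
# about 18 bytes per sample (for example, a text line with a timestamp) is also an assumption
print(raw_size_bytes(30, 300, 18) / 1e12, "TB")
```

    Under these assumptions the total is between roughly 2 and 5 TB, so the figure above corresponds to about 18 bytes per sample.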

    I wrote a quick and dirty script which fetches the daily IBM stock price from Yahoo and stores it both “normally” (only the adjusted close) and using the “difference method” you mention, and then compressed both files using gzip. You do obtain savings: 16K vs 10K. The problem is that I did not store the date, so I don’t know which value corresponds to which date; you would have to include this, of course.

    Good luck.

    import urllib.parse
    import urllib.request
    
    # root URL (this legacy Yahoo endpoint may no longer respond; any CSV
    # source whose 7th column is the adjusted close would work the same way)
    url = 'http://ichart.finance.yahoo.com/table.csv?%s'
    
    # dictionary of options appended to URL (encoded)
    opt = urllib.parse.urlencode({
        's': 'IBM',        # Stock symbol or ticker; IBM
        'a': '00',         # Month January; index starts at zero
        'b': '2',          # Day 2
        'c': '1978',       # Year 1978
        'd': '10',         # Month November; index starts at zero
        'e': '30',         # Day 30
        'f': '2009',       # Year 2009
        'g': 'd',          # Get daily prices
        'ignore': '.csv',  # CSV format
    })
    
    # get the data (urlopen yields bytes, so decode every line)
    with urllib.request.urlopen(url % opt) as data:
        rows = [line.decode('ascii').strip() for line in data]
    
    # get only the "Adjusted Close" (last column of every row; the 7th),
    # skipping the header row (it only holds the string 'Adj Close')
    close = [row.split(',')[6] for row in rows[1:]]
    
    # write to file
    with open('raw.dat', 'w') as f1:
        for element in close:
            f1.write(element + '\n')
    
    # simple function to convert a price string to an integer number of cents
    # (round() avoids truncation errors such as 1.15 * 100 == 114.999...)
    def scale(x):
        return int(round(float(x) * 100))
    
    # apply the previously defined function to the list
    close = [scale(x) for x in close]
    
    # it is important to store the first element (it is the base value)
    base = close[0]
    
    # replace every value by its difference from the previous one
    close = [close[k + 1] - close[k] for k in range(len(close) - 1)]
    
    # put the base back at the front of the data
    close.insert(0, base)
    
    # define a simple function to convert the list to a single string;
    # the explicit sign doubles as the separator between values
    def l2str(values):
        out = ''
        for item in values:
            if item >= 0:
                out += '+' + str(item)
            else:
                out += str(item)
        return out
    
    # convert the list to a string
    close = l2str(close)
    
    with open('comp.dat', 'w') as f2:
        f2.write(close)
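
    To read comp.dat back, the signed values can be split on their signs and summed cumulatively. A minimal sketch, assuming the file holds the base value followed by signed differences in hundredths, as written by the script above (the function name `decode` and the sample string are made up):

```python
import re
from itertools import accumulate

def decode(text, scale=100):
    """Invert the '+12345+30-15...' format: a base value followed by signed differences."""
    values = [int(tok) for tok in re.findall(r'[+-]\d+', text)]
    # the running sum of base + differences restores the scaled prices
    return [v / scale for v in accumulate(values)]

sample = '+12345+30-15+0'  # made-up data in the comp.dat format
print(decode(sample))
```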
    

    Now compare the “raw data” (raw.dat) versus the “compressed format” you propose (comp.dat):

    :sandbox jarrieta$ ls -lh
    total 152
    -rw-r--r--  1 jarrieta  staff    23K Nov 30 09:28 comp.dat
    -rw-r--r--  1 jarrieta  staff    47K Nov 30 09:28 raw.dat
    -rw-r--r--  1 jarrieta  staff   1.7K Nov 30 09:13 stock.py
    :sandbox jarrieta$ gzip --best *.dat
    :sandbox jarrieta$ ls -lh
    total 64
    -rw-r--r--  1 jarrieta  staff    10K Nov 30 09:28 comp.dat.gz
    -rw-r--r--  1 jarrieta  staff    16K Nov 30 09:28 raw.dat.gz
    -rw-r--r--  1 jarrieta  staff   1.7K Nov 30 09:13 stock.py
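
    The same comparison can be repeated without the download. The sketch below builds a synthetic random walk (not real market data), writes it in both formats in memory, and compresses each with `zlib` at level 9. zlib uses the same DEFLATE algorithm as gzip but a smaller header, so the sizes differ from the gzip output by a few bytes. The step range of 50 cents either way and the series length are arbitrary assumptions.

```python
import random
import zlib

random.seed(0)  # fixed seed so the synthetic series is reproducible

# synthetic random walk in cents; NOT real market data
cents = [10000]
for _ in range(5000):
    cents.append(max(1, cents[-1] + random.randint(-50, 50)))

# "raw" format: one price per line, as in raw.dat
raw = ''.join('%.2f\n' % (c / 100) for c in cents)

# "difference" format: base value, then signed differences, as in comp.dat
diffs = [cents[0]] + [b - a for a, b in zip(cents, cents[1:])]
comp = ''.join('%+d' % d for d in diffs)

for name, text in (('raw', raw), ('comp', comp)):
    packed = zlib.compress(text.encode('ascii'), 9)
    print(name, len(text), len(packed))
```

    How much the difference format helps depends on how small and repetitive the day-to-day changes really are, so the ratio on real prices will not match this synthetic run.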
    