i have a task to compress a stock market data somehow…the data is in

Question

0

Asked: May 13, 20262026-05-13T00:36:26+00:00 2026-05-13T00:36:26+00:00

i have a task to compress a stock market data somehow…the data is in

0

i have a task to compress a stock market data somehow…the data is in a file where the stock value for each day is given in one line and so on…so it’s a really big file.

Eg,
123.45
234.75
345.678
889.56
…..

now the question is how to compress the data (aka reduce the redundancy) using standard algorithms like Huffman or Arithmetic coding or LZ coding…which coding is most preferable for this sort of data??…

I have noticed that if i take the first data and then consider the difference between each consecutive data, there is lot of repetition in the difference values…this makes me wonder if first taking these differences, finding their frequency and hence probalility and then using huffman coding would be a way??…

Am i right?…can anyone give me some suggestions.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-13T00:36:27+00:00

I think your problem is more complex than merely subtracting the stock prices. You also need to store the date (unless you have a consistent time span that can be inferred from the file name).

The amount of data is not very large, though. Even if you have data every second for every day for every year for the last 30 years for 300 stockd, you could still manage to store all that in a higher end home computer (say, a MAC Pro), as that amounts to 5Tb UNCOMPRESSED.

I wrote a quick and dirty script which will chase the IBM stock in Yahoo for every day, and store it “normally” (only the adjusted close) and using the “difference method” you mention, then compressing them using gzip. You do obtain savings: 16K vs 10K. The problem is that I did not store the date, and I don’t know what value correspond to what date, you would have to include this, of course.

Good luck.

import urllib as ul
import binascii as ba

# root URL
url = 'http://ichart.finance.yahoo.com/table.csv?%s'

# dictionary of options appended to URL (encoded)
opt = ul.urlencode({
    's':'IBM',       # Stock symbol or ticker; IBM
    'a':'00',        # Month January; index starts at zero
    'b':'2',         # Day 2
    'c':'1978',      # Year 2009
    'd':'10',        # Month November; index starts at zero
    'e':'30',        # Day 30
    'f':'2009',      # Year 2009
    'g':'d',         # Get daily prices
    'ignore':'.csv', # CSV format
    })

# get the data
data = ul.urlopen(url % opt)

# get only the "Adjusted Close" (last column of every row; the 7th)

close = []

for entry in data:
    close.append(entry.strip().split(',')[6])

# get rid of the first element (it is only the string 'Adj Close') 
close.pop(0)

# write to file
f1 = open('raw.dat','w')
for element in close:
    f1.write(element+'\n')
f1.close()

# simple function to convert string to scaled number
def scale(x):
    return int(float(x)*100)

# apply the previously defined function to the list
close = map(scale,close)

# it is important to store the first element (it is the base scale)
base = close[0]

# normalize all data (difference from nom)
close = [ close[k+1] - close[k] for k in range(len(close)-1)]

# introduce the base to the data
close.insert(0,base)



# define a simple function to convert the list to a single string
def l2str(list):
    out = ''
    for item in list:
        if item>=0:
            out += '+'+str(item)
        else:
            out += str(item)
    return out

# convert the list to a string
close = l2str(close)

f2 = open('comp.dat','w')
f2.write(close)
f2.close()

Now compare the “raw data” (raw.dat) versus the “compressed format” you propose (comp.dat)

:sandbox jarrieta$ ls -lh
total 152
-rw-r--r--  1 jarrieta  staff    23K Nov 30 09:28 comp.dat
-rw-r--r--  1 jarrieta  staff    47K Nov 30 09:28 raw.dat
-rw-r--r--  1 jarrieta  staff   1.7K Nov 30 09:13 stock.py
:sandbox jarrieta$ gzip --best *.dat
:sandbox jarrieta$ ls -lh
total 64
-rw-r--r--  1 jarrieta  staff    10K Nov 30 09:28 comp.dat.gz
-rw-r--r--  1 jarrieta  staff    16K Nov 30 09:28 raw.dat.gz
-rw-r--r--  1 jarrieta  staff   1.7K Nov 30 09:13 stock.py

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

i have a task to compress a stock market data somehow…the data is in

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply