I have a large database containing words and their inflected forms, e.g.: BASIC_FORM

Question

Asked: May 30, 2026

I have a large database containing words and their inflected forms, e.g.:

BASIC_FORM ##### INFLECTED_FORM

talk ----- talk
talk ----- talking
talk ----- talked
talk ----- talks
paragraph ----- paragraph
paragraph ----- paragraphs
...

This database requires a lot of disk space, of course, as soon as it has 1 million entries or more.

What is the best method to “compress” that set of data, i.e. reduce the required amount of disk space while no information is lost?

My first idea was to create an extra column which holds the number of characters that can be copied from the beginning of the basic form. Then you just have to save the part of the inflected form that differs, e.g.:

BASIC_FORM ##### NUM_EQUAL ##### INFLECTED_FORM

talk ----- 4 ----- 
talk ----- 4 ----- ing
talk ----- 4 ----- ed
talk ----- 4 ----- s
try ----- 3 ----- 
try ----- 2 ----- ied
paragraph ----- 9 ----- 
paragraph ----- 9 ----- s
...

This should save some disk space: “NUM_EQUAL” can be stored as a TINYINT in MySQL (for example), which requires only 1 byte, whereas dropping the shared prefix from “INFLECTED_FORM” usually saves more than 1 character (i.e. more than 1 byte).
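To make the idea concrete, here is a minimal Python sketch of the scheme described above. The function names `encode` and `decode` and the `MAX_PREFIX` cap are my own illustrative choices; the cap assumes the column would be an unsigned TINYINT (0 to 255).

```python
MAX_PREFIX = 255  # assumption: NUM_EQUAL is stored as an unsigned 1-byte integer


def encode(basic: str, inflected: str) -> tuple[int, str]:
    """Return (num_equal, suffix) for one BASIC_FORM / INFLECTED_FORM pair."""
    num_equal = 0
    for a, b in zip(basic, inflected):
        # Stop at the first differing character or when the 1-byte limit is hit.
        if a != b or num_equal == MAX_PREFIX:
            break
        num_equal += 1
    return num_equal, inflected[num_equal:]


def decode(basic: str, num_equal: int, suffix: str) -> str:
    """Rebuild the inflected form from the basic form and the stored pair."""
    return basic[:num_equal] + suffix


if __name__ == "__main__":
    for basic, inflected in [("talk", "talking"), ("try", "tried"), ("paragraph", "paragraph")]:
        num_equal, suffix = encode(basic, inflected)
        print(basic, num_equal, repr(suffix), decode(basic, num_equal, suffix))
```

Since `decode(basic, *encode(basic, inflected))` always returns the original inflected form, no information is lost, which is the requirement stated above.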

Do you have other suggestions to save disk space?


1 Answer

Editorial Team · Answer 1 · 2026-05-30T04:51:55+00:00

You should normalize the model: create a separate table for the basic forms. How much space that saves depends on the data: the longer the words and the more inflections each one has, the more space you'll save. At the other extreme, if every word had only a single inflected form (I know that's not the case, but let's take it to that extreme), having two tables would actually increase the storage needed.

Once you have applied that refactoring (which will also save you some headaches, as normalization always does!), you can still apply YOUR system to reduce the space needed to store the inflections.
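As a rough sketch of how the two ideas could be combined, the following Python example uses the standard-library `sqlite3` module with an in-memory database. SQLite stands in for MySQL here only so the example is self-contained, and the table names (`basic_form`, `inflection`), column names, and helper functions are hypothetical, not something prescribed by the question.

```python
import sqlite3


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build_db(pairs):
    """Store (basic, inflected) pairs in a normalized, prefix-compressed layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE basic_form (
            id   INTEGER PRIMARY KEY,
            form TEXT NOT NULL UNIQUE          -- each basic form stored once
        );
        CREATE TABLE inflection (
            basic_id  INTEGER NOT NULL REFERENCES basic_form(id),
            num_equal INTEGER NOT NULL,        -- characters copied from the basic form
            suffix    TEXT NOT NULL,           -- the part that differs
            PRIMARY KEY (basic_id, num_equal, suffix)
        );
    """)
    for basic, inflected in pairs:
        conn.execute("INSERT OR IGNORE INTO basic_form (form) VALUES (?)", (basic,))
        (basic_id,) = conn.execute(
            "SELECT id FROM basic_form WHERE form = ?", (basic,)
        ).fetchone()
        n = shared_prefix_len(basic, inflected)
        conn.execute(
            "INSERT OR IGNORE INTO inflection VALUES (?, ?, ?)",
            (basic_id, n, inflected[n:]),
        )
    return conn


def inflected_forms(conn, basic):
    """Rebuild all inflected forms of one basic form, sorted alphabetically."""
    rows = conn.execute(
        "SELECT substr(b.form, 1, i.num_equal) || i.suffix "
        "FROM inflection AS i JOIN basic_form AS b ON b.id = i.basic_id "
        "WHERE b.form = ? ORDER BY 1",
        (basic,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_db([("talk", "talk"), ("talk", "talking"), ("talk", "talked"),
                     ("talk", "talks"), ("try", "try"), ("try", "tried")])
    print(inflected_forms(conn, "talk"))
```

Each inflection row now carries a small integer key instead of a full copy of the basic form, so the saving grows with word length and with the number of inflections per word, as noted above.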
