I need to make a hash table that can eventually be used to write a full assembler.
Basically I will have something like:
foo 100,
and I will need to hash foo and then store the 100 (the address of the command). I was thinking I should just use a 2D array. The second dimension of the array would hold only the address (just an int) and would be accessed only when recording or returning that address. No searching would be done in the second dimension.
If I implement the hash table this way, would it be inefficient? If it is very inefficient, what would be a better way to implement the table?
Edit: I haven’t written any code yet. In fact I don’t even know what language I’m going to use yet. I want to write it in C so it will be more of a challenge, but I might write it in Java if I feel pressured for time.
If every other int in the array goes unused, then in addition to wasting memory you’ll use the cache poorly, because each cache line will be only partly filled with useful data.
But normally I wouldn’t worry about such things when writing an assembler, since assembling is not as performance-demanding as, say, graphics or heavy computation. At the very least, I wouldn’t rush into optimizing too early.
It is, however, important to keep in mind that once you start assembling large pieces of automatically generated code (~100,000 lines of assembly produced, say, by a compiler from C/C++), performance will matter more and more, because longer wait times degrade the user experience. At that point there will be many candidates for optimization: I/O, parsing, symbol lookup, and choosing the shortest encoding for each jump instruction when the instruction set offers both short and long forms. Expressions and macros will contribute too. You may even consider minimizing white space and comments in the input assembly code in the first place.
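If you do go with C, here is a minimal sketch of one common alternative to the 2D array: a chained hash table where each entry keeps the label and its address together in one record, so there is no unused second column. Everything in it is my own assumption rather than something from your design: the names (symtab, symtab_put, symtab_get, symtab_free), the fixed bucket count of 1024, and the choice of the 32-bit FNV-1a hash. Treat it as a starting point, not a tuned implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed bucket count; a power of two so the hash can be masked. */
#define SYMTAB_BUCKETS 1024

/* One symbol: the label text and its address are stored side by side. */
struct symbol {
    char *name;            /* heap copy of the label, e.g. "foo" */
    int address;           /* e.g. 100 */
    struct symbol *next;   /* next symbol that hashed to the same bucket */
};

/* Zero-initialize before use, e.g. struct symtab t = {0}; */
struct symtab {
    struct symbol *buckets[SYMTAB_BUCKETS];
};

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t symtab_hash(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s != '\0') {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Insert or update a symbol. Returns 0 on success, -1 if out of memory. */
int symtab_put(struct symtab *t, const char *name, int address)
{
    assert(t != NULL && name != NULL);
    size_t i = symtab_hash(name) & (SYMTAB_BUCKETS - 1);

    /* Update in place if the label is already present. */
    for (struct symbol *s = t->buckets[i]; s != NULL; s = s->next) {
        if (strcmp(s->name, name) == 0) {
            s->address = address;
            return 0;
        }
    }

    /* Otherwise allocate a new entry and push it on the bucket's chain. */
    struct symbol *s = malloc(sizeof *s);
    if (s == NULL)
        return -1;
    size_t len = strlen(name) + 1;
    s->name = malloc(len);
    if (s->name == NULL) {
        free(s);
        return -1;
    }
    memcpy(s->name, name, len);
    s->address = address;
    s->next = t->buckets[i];
    t->buckets[i] = s;
    return 0;
}

/* Look up a symbol. Returns 1 and stores the address if found, else 0. */
int symtab_get(const struct symtab *t, const char *name, int *address_out)
{
    assert(t != NULL && name != NULL && address_out != NULL);
    size_t i = symtab_hash(name) & (SYMTAB_BUCKETS - 1);
    for (const struct symbol *s = t->buckets[i]; s != NULL; s = s->next) {
        if (strcmp(s->name, name) == 0) {
            *address_out = s->address;
            return 1;
        }
    }
    return 0;
}

/* Release every entry and leave the table empty. */
void symtab_free(struct symtab *t)
{
    for (size_t i = 0; i < SYMTAB_BUCKETS; i++) {
        struct symbol *s = t->buckets[i];
        while (s != NULL) {
            struct symbol *next = s->next;
            free(s->name);
            free(s);
            s = next;
        }
        t->buckets[i] = NULL;
    }
}
```

For your example line, you would call symtab_put(&table, "foo", 100) when the label is defined and symtab_get(&table, "foo", &addr) when it is referenced. Chaining costs one allocation per symbol, which is probably fine for a first assembler; if lookups ever show up in a profile, a flat open-addressing table would be the next thing I'd try.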