So, I’m trying to write an adder tree in Verilog. The generic part of it is that it has a configurable number of elements to add and a configurable word size. However, I’m encountering problem after problem, and I’m starting to question whether this is the right way to solve my problem. (I will be using it in a larger project.) It is definitely possible to just hard-code the adder tree, although that will take a lot of text.
So, I thought I’d check with you Stack Overflowers on what you think about it. Is this “the way to do it”? I’m open to suggestions on different approaches too.
I can also mention that I’m quite new to verilog.
In case anyone is interested, here’s my current non-working code: (I’m not expecting you to solve the problems; I’m just showing it for convenience.)
    module adderTree(
        input clk,
        input [`WORDSIZE * `BANKSIZE - 1 : 0] terms_flat,
        output [`WORDSIZE - 1 : 0] sum
    );
        genvar i, j;
        reg [`WORDSIZE - 1 : 0] pipeline [2 * `BANKSIZE - 1 : 0]; // Pipeline array
        reg clkPl = 0; // Pipeline clock
        assign sum = pipeline[0];
        // Pack flat terms
        generate
            for (i = `BANKSIZE; i < 2 * `BANKSIZE; i = i + 1) begin
                always @ (posedge clk) begin
                    pipeline[i] <= terms_flat[i * `WORDSIZE +: `WORDSIZE];
                    clkPl = 1;
                end
            end
        endgenerate
        // Add terms logarithmically
        generate
            for (i = 0; i < $clog2(`BANKSIZE); i = i + 1) begin
                for (j = 0; j < 2 ** i; j = j + 1) begin
                    always @ (posedge clkPl) begin
                        pipeline[i * (2 ** i) + j] <= pipeline[i * 2 * (2 ** i) + 2 * j] + pipeline[i * 2 * (2 ** i) + 2 * j + 1];
                    end
                end
            end
        endgenerate
    endmodule
Here are a few comments you might find useful:
CLOCKING
It is generally good to have as few clocks as possible in your design (preferably just one).
In this particular case it appears you are trying to generate a new clock, clkPl, but this does not work because it never returns to 0. (The “reg clkPl = 0;” initialises it to 0 at time 0, then “clkPl = 1;” sets it permanently to 1, so it produces only a single rising edge.)
You can fix this by simply replacing “always @ (posedge clkPl)” with “always @ (posedge clk)”.
ASSIGNMENTS
It is good form to only use blocking assignments in combinatorial blocks, and non-blocking in clocked blocks. You are mixing both blocking and non-blocking assignments in your “Pack flat terms” section.
As you don’t need clkPl, you can simply delete the line with the blocking assignment (“clkPl = 1;”).
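To illustrate the convention, here is a minimal sketch. The module assignStyle and its ports are hypothetical names made up for this example, not part of your design.

```verilog
// Hypothetical module, for illustration only.
module assignStyle (
    input            clk,
    input      [7:0] a,
    input      [7:0] b,
    output reg [7:0] sum_comb,
    output reg [7:0] sum_reg
);
    // Combinatorial block: blocking assignments (=).
    always @(*) begin
        sum_comb = a + b;
    end

    // Clocked block: non-blocking assignments (<=).
    always @(posedge clk) begin
        sum_reg <= sum_comb;
    end
endmodule
```

Keeping the two styles in separate blocks avoids simulation races between clocked blocks that read and write the same signal.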
TREE STRUCTURE
Your double for loop over i and j in the “Add terms logarithmically” section looks like it will access incorrect elements.
e.g. for BANKSIZE = 2**8, i will count up to 7, at which point “pipeline[i * (2 ** i) + j]” = “pipeline[7 * 2**7 + j]” = “pipeline[896 + j]”, which is out of bounds for the array. (The array has 2 * BANKSIZE = 512 elements in it.)
I think you actually want this structure:
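One way to index such a tree is a heap-style layout, sketched below under a few assumptions: WORDSIZE and BANKSIZE are module parameters rather than macros, BANKSIZE is a power of two, and the root lives at pipeline[1] (so sum is read from index 1 and index 0 is unused). Treat it as a sketch rather than a drop-in replacement.

```verilog
// Sketch only: parameters replace the `WORDSIZE / `BANKSIZE macros (an assumption),
// and BANKSIZE is assumed to be a power of two.
module adderTree #(
    parameter WORDSIZE = 8,
    parameter BANKSIZE = 8
) (
    input                                clk,
    input  [WORDSIZE * BANKSIZE - 1 : 0] terms_flat,
    output [WORDSIZE - 1 : 0]            sum
);
    genvar i, j;

    // Heap-style layout: node 1 is the root, node k has children 2k and 2k+1,
    // and the leaves occupy BANKSIZE .. 2*BANKSIZE-1. Index 0 is unused.
    reg [WORDSIZE - 1 : 0] pipeline [2 * BANKSIZE - 1 : 0];

    assign sum = pipeline[1];

    // Register the flat input terms into the leaf nodes.
    generate
        for (i = 0; i < BANKSIZE; i = i + 1) begin : pack
            always @(posedge clk) begin
                pipeline[BANKSIZE + i] <= terms_flat[i * WORDSIZE +: WORDSIZE];
            end
        end
    endgenerate

    // Level i holds 2**i nodes starting at index 2**i; each node registers the
    // sum of its two children, which sit on level i+1.
    generate
        for (i = 0; i < $clog2(BANKSIZE); i = i + 1) begin : level
            for (j = 0; j < 2 ** i; j = j + 1) begin : node
                always @(posedge clk) begin
                    pipeline[2 ** i + j] <= pipeline[2 ** (i + 1) + 2 * j]
                                          + pipeline[2 ** (i + 1) + 2 * j + 1];
                end
            end
        end
    endgenerate
endmodule
```

With BANKSIZE = 8, the leaves are pipeline[8] to pipeline[15], level 2 is pipeline[4] to pipeline[7], level 1 is pipeline[2] and pipeline[3], and the root is pipeline[1], so every index stays inside the array. Note that the sum is still truncated to WORDSIZE bits, as in your original code.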
LOWER LATENCY
Note that most verilog tools are very good at synthesising adds of multiple elements so you may want to consider combining more terms at each level of the hierarchy.
(Adding more terms costs less than someone might expect because the tools can use optimisations such as carry save adders to reduce the gate delay.)
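As a rough sketch of what combining more terms per level could look like, here is a single pipeline stage that sums four terms at once. The module add4 and its port names are hypothetical, and the result is truncated to WORDSIZE bits like the rest of the design.

```verilog
// Hypothetical module: one pipeline stage that sums four terms at once.
module add4 #(
    parameter WORDSIZE = 8
) (
    input                         clk,
    input      [WORDSIZE - 1 : 0] a, b, c, d,
    output reg [WORDSIZE - 1 : 0] sum
);
    // The synthesis tool decides how to implement the multi-operand add.
    always @(posedge clk) begin
        sum <= a + b + c + d;
    end
endmodule
```

With four terms per level, 16 inputs need two pipeline levels instead of four, at the cost of more logic between registers in each level.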