I want to know how a processor performs multiplication in a multi-cycle datapath, right from the beginning, i.e. from instruction fetch -> instruction decode -> register file read, etc.
In other words, given Booth's algorithm for multiplication implemented as a separate circuit, how would you extend the multi-cycle datapath to support a multiply instruction using minimal resources?
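A rough sketch of the multiply part follows. It assumes radix-2 Booth recoding, n-bit two's-complement operands, and one Booth step per clock. Each loop iteration models what an extra multi-cycle MUL state would do in one cycle: look at the bit pair Q[0], Q_-1, optionally add or subtract M into A (which could reuse the existing ALU adder), then arithmetic-shift A:Q:Q_-1 right by one. In such a design the control unit would go through the usual fetch, decode and register-read states, stay in the MUL state for n cycles driven by a counter, and then write A:Q back. The register names A, Q, Q_-1 and M are the usual textbook ones, and booth_multiply is a hypothetical helper, not code taken from any of the cores linked in the answer. A is given one guard bit so that subtracting the most negative multiplicand cannot overflow.

```python
def booth_multiply(multiplicand, multiplier, n=8):
    """Radix-2 Booth multiplication of two n-bit two's-complement integers.

    Each loop iteration models one clock cycle spent in a hypothetical MUL
    state of a multi-cycle datapath.
    """
    aw = n + 1                        # A and M width: one guard bit avoids overflow on A - M
    amask = (1 << aw) - 1
    qmask = (1 << n) - 1
    m = multiplicand & amask          # M register, sign-extended to n + 1 bits
    a = 0                             # A register (upper half of the partial product)
    q = multiplier & qmask            # Q register (multiplier, lower half)
    q_1 = 0                           # extra bit Q_-1 to the right of Q
    for _ in range(n):                # n cycles, counted by the control unit
        pair = (q & 1, q_1)
        if pair == (1, 0):            # start of a run of 1s: A <- A - M
            a = (a - m) & amask
        elif pair == (0, 1):          # end of a run of 1s: A <- A + M
            a = (a + m) & amask
        # arithmetic shift right of the combined A:Q:Q_-1 register
        q_1 = q & 1
        q = (q >> 1) | ((a & 1) << (n - 1))
        sign = a >> (aw - 1)
        a = (a >> 1) | (sign << (aw - 1))
    width = aw + n                    # A:Q holds the (2n + 1)-bit signed product
    product = (a << n) | q
    if product >> (width - 1):        # reinterpret the bit pattern as a signed value
        product -= 1 << width
    return product


if __name__ == "__main__":
    print(booth_multiply(3, -4, n=4))   # -12
    print(booth_multiply(7, 7, n=4))    # 49
```

In hardware terms, the only additions beyond a basic multi-cycle datapath in this sketch are the shiftable A/Q registers, the Q_-1 flip-flop and a cycle counter; the add/subtract step can share the ALU.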
Can you explain the same for division?
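For division, one common sequential choice is restoring division. It is swapped in here only as an illustration, since the question does not say which division algorithm is intended. Each cycle shifts A:Q left by one bit, tries A - M with the ALU subtractor, and then either keeps the result and sets the new quotient bit to 1, or restores A and leaves the bit at 0. The sketch assumes unsigned n-bit operands and a non-zero divisor. restoring_divide is a hypothetical name, and signed division would need extra sign handling around it.

```python
def restoring_divide(dividend, divisor, n=8):
    """Unsigned restoring division of n-bit integers; returns (quotient, remainder).

    Each loop iteration models one clock cycle of a hypothetical DIV state.
    """
    mask = (1 << n) - 1
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if not (0 <= dividend <= mask and 0 < divisor <= mask):
        raise ValueError("operands must be unsigned n-bit values")
    a = 0                             # A register: partial remainder (n + 1 bits in hardware)
    q = dividend                      # Q register: dividend bits shift out, quotient bits shift in
    m = divisor                       # M register: divisor
    for _ in range(n):                # n cycles, counted by the control unit
        # shift A:Q left by one bit; the MSB of Q moves into the LSB of A
        a = (a << 1) | (q >> (n - 1))
        q = (q << 1) & mask
        a -= m                        # trial subtraction through the ALU
        if a < 0:                     # A went negative: restore it, quotient bit stays 0
            a += m
        else:                         # subtraction succeeded: set quotient bit to 1
            q |= 1
    return q, a


if __name__ == "__main__":
    print(restoring_divide(13, 3, n=4))   # (4, 1)
```

The control structure matches the multiply case: after decode and register read, the FSM loops in one state for n cycles, then writes back the quotient (Q) and, if the ISA needs it, the remainder (A).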
Another good link is the open-source OpenFire microprocessor core, a variant/analog of MicroBlaze (which is based on DLX), here:
http://opencores.org/websvn,listing?repname=openfire_core&path=%2Fopenfire_core%2Ftrunk%2Fopenfire_top_syn%2Fhdl%2Fverilog%2F#path_openfire_core_trunk_openfire_top_syn_hdl_verilog_
Part of the datapath for the ALU and multiplier unit is in the openfire_primitives.v file.
A manual of the DLX datapath with a good explanation of pipeline stalls and bubbles is here:
http://www.cs.iastate.edu/~prabhu/Tutorial/PIPELINE/hazards.html
And here is information on multicycle operations in DLX:
http://www.cs.iastate.edu/~prabhu/Tutorial/PIPELINE/multicycle.html
So an operation that has to stay in a pipeline stage for more than one cycle (it needs more ticks) inserts stalls (bubbles) into the pipeline. You can think of this as stopping all pipeline stages except EX, which performs the long operation over several ticks.
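To make the stall behaviour concrete, here is a toy cycle-by-cycle model of a classic five-stage pipeline (IF, ID, EX, MEM, WB) in which an instruction may need several cycles in EX. While EX is busy, IF and ID hold their instructions and a bubble is sent into MEM, so the stages after EX simply drain. The instruction strings, the latency table and simulate are all hypothetical, and the model deliberately ignores data hazards and forwarding.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def simulate(program, latency):
    """Return a per-cycle list of stage contents for a toy 5-stage pipeline.

    program: list of instruction strings; the mnemonic is the first word.
    latency: dict mnemonic -> cycles needed in EX (missing entries take 1).
    """
    pipe = [None] * len(STAGES)       # pipe[i] is the instruction in STAGES[i]
    ex_left = 0                       # cycles the instruction in EX still needs
    next_pc = 0
    retired = 0
    trace = []
    while retired < len(program):
        if pipe[2] is not None and ex_left > 1:
            # long operation still in EX: hold IF, ID and EX, send a bubble into MEM
            ex_left -= 1
            pipe = [pipe[0], pipe[1], pipe[2], None, pipe[3]]
        else:
            # normal advance: every instruction moves one stage, IF fetches the next one
            fetched = program[next_pc] if next_pc < len(program) else None
            if fetched is not None:
                next_pc += 1
            pipe = [fetched, pipe[0], pipe[1], pipe[2], pipe[3]]
            ex_left = latency.get(pipe[2].split()[0], 1) if pipe[2] is not None else 0
        if pipe[4] is not None:       # the instruction in WB completes this cycle
            retired += 1
        trace.append(list(pipe))
    return trace


if __name__ == "__main__":
    prog = ["add r1, r2, r3", "mul r4, r5, r6", "sub r7, r8, r9"]
    trace = simulate(prog, {"mul": 4})
    for cycle, row in enumerate(trace, start=1):
        cells = [f"{st}:{(ins or '-').split()[0]}" for st, ins in zip(STAGES, row)]
        print(cycle, " ".join(cells))
    print("total cycles:", len(trace))   # total cycles: 10
```

With a 4-cycle MUL these three instructions take 10 cycles, versus 7 if every instruction finished EX in one cycle; the difference is the three stall cycles during which the MUL occupies EX.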
Another open-source Verilog Mul/Div unit is here:
http://opencores.org/websvn,filedetails?repname=openrisc&path=%2Fopenrisc%2Ftrunk%2For1200%2Frtl%2Fverilog%2For1200_mult_mac.v