2-cyc cache

The table below again shows the access latency for each scenario.

Case                                2-cyc cache latency (cycles)
read hit                            2
write hit                           3
read miss => load                   6
write miss => load                  7
read miss => write back => load     14
write miss => write back => load    15
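As a quick cross-check of the table, the sketch below encodes it as a lookup. It assumes, as is usual for a write-back cache though the table does not say so explicitly, that the write-back step happens only when the evicted line is dirty. The names `LATENCY_CYCLES` and `access_latency` are hypothetical. Subtracting rows shows that, for both reads and writes, loading a line costs 4 cycles more than a hit, and a write-back adds another 8.

```python
# Latencies from the table above, keyed by (operation, scenario).
LATENCY_CYCLES = {
    ("read", "hit"): 2,
    ("write", "hit"): 3,
    ("read", "miss_clean"): 6,    # miss => load
    ("write", "miss_clean"): 7,   # miss => load
    ("read", "miss_dirty"): 14,   # miss => write back => load
    ("write", "miss_dirty"): 15,  # miss => write back => load
}


def access_latency(op: str, hit: bool, dirty_victim: bool = False) -> int:
    """Return the cycle count for one access to the 2-cycle cache.

    Assumption: a write-back occurs only when the evicted line is dirty.
    """
    if op not in ("read", "write"):
        raise ValueError(f"unknown operation: {op!r}")
    if hit:
        scenario = "hit"
    elif dirty_victim:
        scenario = "miss_dirty"
    else:
        scenario = "miss_clean"
    return LATENCY_CYCLES[(op, scenario)]


# Derive the fixed penalties implied by the table.
for op in ("read", "write"):
    hit = access_latency(op, True)
    load_penalty = access_latency(op, False) - hit
    wb_penalty = access_latency(op, False, True) - access_latency(op, False)
    print(f"{op}: hit={hit}, load +{load_penalty}, write-back +{wb_penalty}")
```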

Table of Contents

  1. Benchmarks (sim-rtl)
    1. Direct-mapped cache
    2. 2-way set-associative cache
  2. Post-synthesis
    1. Direct-mapped cache
    2. 2-way set-associative cache
  3. Post-PAR
    1. Direct-mapped cache
      1. Benchmarks (sim-gl-par)
    2. 2-way set-associative cache

Benchmarks (sim-rtl)

Direct-mapped cache

The total number of cycles to run all benchmarks is 57,929,714.

[Figure: sim-rtl benchmark results (bmark_sim-rtl_2-cyc_direct-mapped)]

2-way set-associative cache

The total number of cycles to run all benchmarks is 57,920,130.

[Figure: sim-rtl benchmark results (bmark_sim-rtl_2-cyc_two-way)]
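The two sim-rtl totals are very close. A minimal check of the gap, using the totals reported above (the variable names are illustrative only):

```python
# Total sim-rtl cycle counts for all benchmarks, from the results above.
DIRECT_MAPPED_CYCLES = 57_929_714
TWO_WAY_CYCLES = 57_920_130

saved = DIRECT_MAPPED_CYCLES - TWO_WAY_CYCLES
fraction = saved / DIRECT_MAPPED_CYCLES
print(f"2-way saves {saved:,} cycles ({fraction:.4%})")
```

The 2-way cache saves 9,584 cycles, under 0.02% of the total, so any runtime comparison between the two designs will be dominated by their clock periods.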

Post-synthesis

Post-synthesis results are NOT up to date with the latest design, but the changes are small and won't affect the overall picture. Synthesis documents are stored here.

Direct-mapped cache

With timing met, we achieve a 7 ns clock period post-synthesis. The timing report is here.

2-way set-associative cache

With timing met, we achieve an 8 ns clock period post-synthesis. The timing report is here.

Post-PAR

Post-PAR results are NOT up to date with the latest design, but the changes are small and won't affect the overall picture. With the latest design, we should be able to reduce the clock period slightly, to 9 ns or possibly below (to be verified). PAR documents are stored here.

Direct-mapped cache

We achieve a 9.05 ns clock period after PAR. The timing report is here. The total time required to run all benchmarks is:

T_{total} = \sum N_{cycles} \times T_{clock} = 57{,}929{,}714 \times 9.05\,\mathrm{ns} \approx 0.524\,\mathrm{s}
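The runtime formula is simply the total cycle count multiplied by the clock period. A small helper (the name `total_runtime_s` is hypothetical) reproduces the figure above:

```python
def total_runtime_s(total_cycles: int, clock_period_ns: float) -> float:
    """T_total = (sum of cycles over all benchmarks) * T_clock, in seconds."""
    return total_cycles * clock_period_ns * 1e-9


# Direct-mapped cache: sim-rtl total cycles at the 9.05 ns post-PAR clock.
t = total_runtime_s(57_929_714, 9.05)
print(f"{t:.3f} s")  # 0.524 s
```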

Below is the screenshot of the floorplan.

[Figure: floorplan screenshot (dmap_1-cyc_cache_floorplan)]

Benchmarks (sim-gl-par)

After running sim-gl-par for more than a day, all benchmarks passed post-PAR, with a slightly higher total cycle count than in sim-rtl. The result is shown below:

[Figure: sim-gl-par benchmark summary (bmark_summary)]

A more accurate post-PAR runtime for the benchmarks is therefore:

T_{par\text{-}total} = \sum N_{par\text{-}cycles} \times T_{clock} = 58{,}197{,}619 \times 9.05\,\mathrm{ns} \approx 0.527\,\mathrm{s}
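The post-PAR cycle count can be compared directly with the direct-mapped sim-rtl total of 57,929,714 cycles. This sketch computes the extra cycles seen in sim-gl-par and the resulting runtime at the same 9.05 ns clock period (variable names are illustrative):

```python
RTL_CYCLES = 57_929_714   # direct-mapped, sim-rtl
PAR_CYCLES = 58_197_619   # direct-mapped, sim-gl-par after PAR
T_CLOCK_NS = 9.05         # post-PAR clock period

extra = PAR_CYCLES - RTL_CYCLES
par_runtime_s = PAR_CYCLES * T_CLOCK_NS * 1e-9
print(f"extra cycles: {extra:,} ({extra / RTL_CYCLES:.2%})")
print(f"post-PAR runtime: {par_runtime_s:.3f} s")
```

The overhead is 267,905 cycles, roughly 0.46% of the sim-rtl total, which moves the estimated runtime from about 0.524 s to about 0.527 s.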

2-way set-associative cache

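We achieve a 12.5 ns clock period after PAR. The timing report is here. The 2-way cache saves fewer than 0.02% of the cycles in sim-rtl (57,920,130 vs. 57,929,714), while its clock period is about 38% longer than the direct-mapped cache's 9.05 ns, so it would take longer to run the benchmarks and we skip the calculation. Below is the screenshot of the floorplan.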

[Figure: floorplan screenshot (dmap_cache_floorplan)]
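The decision to skip the 2-way runtime can be backed with a quick estimate. Since no sim-gl-par cycle count is reported for the 2-way cache, this sketch assumes its sim-rtl total and compares it with the direct-mapped sim-rtl total, each at its own post-PAR clock period. It also computes the break-even clock period at which the 2-way cache would match the direct-mapped runtime:

```python
# Assumption: both designs use their sim-rtl cycle totals, since no
# post-PAR cycle count is available for the 2-way cache.
DM_CYCLES, DM_PERIOD_NS = 57_929_714, 9.05   # direct-mapped
TW_CYCLES, TW_PERIOD_NS = 57_920_130, 12.5   # 2-way set-associative

dm_runtime_s = DM_CYCLES * DM_PERIOD_NS * 1e-9
tw_runtime_s = TW_CYCLES * TW_PERIOD_NS * 1e-9

# Clock period the 2-way cache would need to run the benchmarks
# in the same time as the direct-mapped cache.
break_even_ns = DM_CYCLES * DM_PERIOD_NS / TW_CYCLES

print(f"direct-mapped: {dm_runtime_s:.3f} s")
print(f"2-way:         {tw_runtime_s:.3f} s")
print(f"2-way break-even clock period: {break_even_ns:.3f} ns")
```

Under these assumptions the 2-way cache takes roughly 0.72 s versus roughly 0.52 s, and it would need a clock period just above 9.05 ns, essentially the same as the direct-mapped design, merely to break even.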