1-cyc cache

The table below again lists the latency, in cycles, of each access scenario for the 1-cyc cache.

Case                                  1-cyc cache (# cycles)
------------------------------------  ----------------------
read hit                              1
write hit                             2
read miss => load                     6
write miss => load                    7
read miss => write back => load       14
write miss => write back => load      15
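
As a minimal sketch of how the table can be used, the snippet below stores the six latencies in a lookup and sums them over a list of accesses. The cycle counts come from the table; the key names (`"miss_dirty"` for a miss that needs a write back), the `total_cycles` helper, and the example trace are hypothetical and only for illustration.

```python
# Cycle counts copied from the latency table above (1-cyc cache).
# Key names are illustrative: "miss_dirty" means a miss that first
# writes back the evicted line and then loads the new one.
LATENCY_CYCLES = {
    ("read", "hit"): 1,
    ("write", "hit"): 2,
    ("read", "miss"): 6,
    ("write", "miss"): 7,
    ("read", "miss_dirty"): 14,
    ("write", "miss_dirty"): 15,
}


def total_cycles(trace):
    """Sum the latency of every (operation, outcome) access in a trace."""
    return sum(LATENCY_CYCLES[access] for access in trace)


# Hypothetical access trace, not taken from any benchmark.
example_trace = [
    ("read", "miss"),         # 6 cycles
    ("read", "hit"),          # 1 cycle
    ("write", "hit"),         # 2 cycles
    ("write", "miss_dirty"),  # 15 cycles
]

print(total_cycles(example_trace))  # 24
```

The table also shows two regularities that the lookup makes easy to check: a write costs one cycle more than the corresponding read, and a write back adds 8 cycles to a miss.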

Table of Contents

  1. Benchmarks (sim-rtl)
    1. Direct-mapped cache
  2. Post-synthesis
    1. Direct-mapped cache
  3. Post-PAR
    1. Direct-mapped cache

Benchmarks (sim-rtl)

Direct-mapped cache

Running all benchmarks takes 40,058,621 cycles in total, a 31% reduction compared with the 2-cyc cache.

[Figure: bmark_sim-rtl_1-cyc_direct-mapped]

Post-synthesis

The synthesis result is generated together with PAR.

Direct-mapped cache

We achieve an 11.5 ns clock period post-synthesis with 629 ps of slack. The timing report is here. The remaining slack suggests that synthesis alone, without PAR, could meet a shorter clock period.
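
As a rough, first-order illustration of what the slack implies, the sketch below subtracts the reported slack from the constrained period. This is only an estimate under the assumption that the critical path delay stays the same; re-running synthesis with a tighter constraint may give a different result. The variable names are hypothetical.

```python
clock_period_ns = 11.5  # constrained clock period from the text
slack_ns = 0.629        # reported slack (629 ps)

# First-order estimate: the critical path needs (period - slack).
min_period_ns = clock_period_ns - slack_ns
max_freq_mhz = 1e3 / min_period_ns  # 1 / ns = GHz, so 1e3 / ns = MHz

print(f"estimated minimum period: {min_period_ns:.3f} ns")  # 10.871 ns
print(f"estimated maximum frequency: {max_freq_mhz:.1f} MHz")
```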

Post-PAR

Direct-mapped cache

After PAR, we also achieve an 11.5 ns clock period. The timing report is here. The total time required to run all benchmarks is:

T_{\text{total}} = \sum N_{\text{cycles}} \times T_{\text{clock}} = 40{,}058{,}621 \times 11.5\,\text{ns} \approx 0.461\,\text{s}
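
The arithmetic above can be checked with a few lines of Python. Both inputs are the figures quoted in this section; the variable names are illustrative.

```python
n_cycles = 40_058_621  # total cycles for all benchmarks (sim-rtl)
t_clock_s = 11.5e-9    # post-PAR clock period, 11.5 ns, in seconds

# Total run time = total cycle count * clock period.
t_total_s = n_cycles * t_clock_s

print(f"{t_total_s:.3f} s")  # 0.461 s
```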

Below is a screenshot of the floorplan.

[Figure: dmap_2-cyc_cache_floorplan]