1-cyc cache

The table below shows again the latency for various scenarios.

Case	1-cyc cache (# cycles)
read hit	1
write hit	2
read miss => load	6
write miss => load	7
read miss => write back => load	14
write miss => write back => load	15

Benchmarks (sim-rtl)
1. Direct-mapped cache
Post-synthesis
1. Direct-mapped cache
Post-PAR
1. Direct-mapped cache

Benchmarks (sim-rtl)

Direct-mapped cache

The total number of cycles to run all benchmarks is 40,058,621. Compared to the number of cycles with the 2-cyc cache, there is a 31% reduction.

bmark_sim-rtl_1-cyc_direct-mapped

Post-synthesis

The synthesis result is generated together with PAR.

Direct-mapped cache

We are able to achieve 11.5ns clock period post-synthesis with 629ps slack. The timing report is here. It suggests that synthesis without PAR can achieve lower clock period.

Post-PAR

Direct-mapped cache

We are able to get 11.5ns clock period after PAR. The timing report is here. The total time required for running all benchmarks is:

T_{total} = \sum N_{cycles} \times T_{clock} = 40,058,621 \times 11.5ns= 0.461s

Below is the screenshot of the floorplan.

dmap_2-cyc_cache_floorplan

1-cyc cache

Table of Contents

Benchmarks (sim-rtl)

Direct-mapped cache

Post-synthesis

Direct-mapped cache

Post-PAR

Direct-mapped cache