====================================== Reuse Timing-closed Logic As A Shell ====================================== Background ============ Often in FPGA development, a desirable timing-closed implementation is only achieved after several iterations or many parallel implementation runs of a design. Elusive timing closure can be caused by one or a few stubborn modules in a design that have tight constraints or a large number of moderately difficult paths that have a lower probability of timing closure on any given run. One advantageous strategy to improve timing closure success can be to preserve and enable reuse of a known good implementation of the stubborn logic. By preserving the implementation, place and route tools can (hopefully) avoid rediscovering difficult timing closure and simply focus on the other logic. Some traditional approaches in Vivado to employ this preservation strategy might be to use or `Incremental Implementation Flows `_ `Dynamic Function eXchange `_ (DFX, previously known as partial reconfiguration or PR). Incremental Implementation Flows can work if the design has mostly converged and the amount of future changes to the design is small. However, if significant development still remains, this strategy is unlikely to save compile time. Using DFX, one can lock down a portion of the design to form a reusable shell along with one or more reconfigurable partitions that contains logic under development. However, using DFX for this reuse methodology comes with some additional restrictions such as requiring area constraints and partition pin placements between the static and dynamic partitions of the design. It is more difficult to achieve an overlap of the preserved logic and the new logic and the nature of DFX requires additional DRCs that would not normally be run without using DFX. This tutorial offers an alternative to the DFX flow with fewer restrictions and the ability to reused timing-closed logic without the need of area constraints by using the capabilities inherent in RapidWright. Approach =========== To enable reuse of a timing-closed design as a shell in RapidWright, the original design will need some minor modifications. .. image:: images/Shell-Picture1.png :width: 550px :align: center 1. The design should be logically partitioned into two parts: static and dynamic (as shown in the diagram above). The static part of the design is everything that should be preserved and be part of the "shell". For example, many designs include components for handling network, DDR memory or a PCIe interface. These kinds of modules typically will have more demanding timing constraints and benefit from reusing their timing closure. The dynamic component is the portion of the design that the designer wants to change over time. The main requirement is that the dynamic component must be composed of one or more logical modules. If there is logic that needs to be modified at the top level of the design, it should be migrated into an existing module or a new module should be created and the logic added to it. 2. The interface of the dynamic modules must be consistent with all future logic modules that will populate it. In theory, this is straight-forward. However, during synthesis, design optimization, placement and routing, optimizations can modify the original interface of a module so that it no longer is consistent with the original definition and subsequent runs can cause divergence. To avoid this, dynamic modules should have the `DONT_TOUCH `_ synthesis attribute applied to the module instance. The alternative ``KEEP_HIERARCHY`` is not sufficient as ``DONT_TOUCH`` will stay persistent on the netlist through routing whereas ``KEEP_HIERARCHY`` will only persist through synthesis. Note that applying ``DONT_TOUCH`` to a module instance means that Vivado cannot add or remove pins of the instance, but can connect or disconnect pins and optimize logic inside the hierarchical module. Once the design is properly partitioned and synthesis attributes applied to dynamic modules, the design should be implemented using the typical implementation flow in Vivado. Once a fully placed and routed implementation that meets all requirements has been achieved, this design can be preserved as a design checkpoint (DCP) and used to seed the shell creation process. This candidate shell design can then be loaded into RapidWright and all dynamic modules turned into black boxes. Getting Started ================ 1. Prerequisites ------------------- To run this tutorial, you will need: 1. RapidWright 2023.1.3 or later 2. Vivado 2023.1 or later 2. Creating a Candidate Implementation ---------------------------------------- For the ease of demonstration purposes in this tutorial, we have chosen a simple RISCV design targeting a KCU105 board (Kintex UltraScale ``xcku040-ffva1156-2-e``). The design was created using the `Linux on LiteX-VexRiscv `_ project, but we will recreate the design using a minimal set of steps and dependencies. .. note:: This design compilation step can take up to 30 minutes to complete and it is highly recommended to skip past it to save time. To do so, you can :download:`download the output files ` instead by running: .. code-block:: bash wget http://www.rapidwright.io/docs/_downloads/kcu105_step2.zip unzip kcu105_step2.zip cd kcu105 vivado & and then skip to step 3. To get started, follow the commands below to :download:`download the source files `: .. code-block:: bash wget http://www.rapidwright.io/docs/_downloads/kcu105_example.zip unzip kcu105_example.zip cd kcu105 vivado -source kcu105.tcl & The included script will create a Vivado project, load the generated Verilog and synthesize, optimize and place and route the design. The Verilog module for one of the RISCV CPUs has already been annotated for you with ``DONT_TOUCH`` and will serve as our dynamic module for this tutorial. The script will take several minutes to complete but will generate a placed and routed DCP and EDIF file ready for RapidWright. Notice we are running Vivado in the background as we will come back to the terminal shortly. A sample result is shown in the image below with the leaf cells of CPU core (``cores_1_cpu_logic_cpu``) highlighted in yellow. .. image:: images/Shell-Picture2.png :width: 550px :align: center Out of convenience for this tutorial, we will generate the logic that will populate the dynamic module directly from this project. We simply need to change the top of the design to the ``VexRiscv_1`` core and then resynthesize using the ``-mode out_of_context`` option: .. code-block:: set_property top VexRiscv_1 [current_fileset] reset_run synth_1 synth_design -mode out_of_context write_checkpoint riscv_1_synth.dcp At this point we should have two DCPs, one placed and routed candidate DCP to be made into a shell and one synthesized RISCV core that will populate the dynamic region in our shell. 3. Creating a Shell ---------------------- To create a shell implementation, we need to take our top-level RISCV design that has the static portion meeting all necessary constraints and remove all logic from the dynamic components. To remove the logic in the dynamic module, we need to use RapidWright in order to carefully separate the static logic from the dynamic logic as no area constraints (i.e. pblocks) were used to separate the two. Vivado can create a black box but can only do so correctly when the module made into a black box was sufficiently constrained such that all of its logic does not share any sites with any static logic. RapidWright has a built-in command that can accept a DCP and one or more cell instance names and produce a shell-based design with the cell instances turned into black boxes. For our example, we can run RapidWright from the command line (outside of Vivado): .. code-block:: bash rapidwright MakeBlackBox kcu105_route.dcp kcu105_route_shell.dcp VexRiscvLitexSmpCluster_Cc4_Iw64Is8192Iy2_Dw64Ds8192Dy2_ITs4DTs4_Ldw512_Cdma_Ood/cores_1_cpu_logic_cpu This will create a new "shell" DCP (``kcu105_route_shell.dcp``) where the dynamic module has been turned into a black box. This DCP can then be used again and again as a base starting point as it contains an implemented solution for all of the static logic and we will use Vivado (and RapidWright in the future) to place and route additional dynamic modules on top of it. 4. Populating a Black Box --------------------------- Returning to our running Vivado instance, we can close our previous project and load the shell DCP using ``open_checkpoint`` at the Tcl command prompt: .. code-block:: close_project open_checkpoint kcu105_route_shell.dcp .. note:: Due to the large number of constraints generated in RapidWright, opening the checkpoint might take a few minutes. If RapidWright was able to correctly create the black box, you should see exactly one critical warning, which may show up in a dialog from Vivado as shown below: .. image:: images/Shell-Picture3.png :width: 550px :align: center The implemented design will look similar to the original design, except that the cells previously highlighted in yellow above will be missing: .. image:: images/Shell-Picture4.png :width: 550px :align: center You may also notice that several BEL sites have been marked with a ``PROHIBIT`` property that prevents any cells from being placed in those locations. Through experimentation, it has been found that cells placed in the same half SLICE as those in the existing static logic portion of the design can lead to congestion. Therefore, RapidWright adds the ``PROHIBIT`` property to the remaining BEL sites in any occupied half SLICEs to avoid this issue. These prohibited locations can be seen in the image below (the red circles with a slash): .. image:: images/Shell-Picture4a.png :width: 550px :align: center We can also verify that the design is consistent by checking the routing status: .. code-block:: report_route_status Which should return a result something like this: .. code-block:: Design Route Status : # nets : ------------------------------------------- : ----------- : # of logical nets.......................... : 65546 : # of nets not needing routing.......... : 23431 : # of internally routed nets........ : 20613 : # of nets with no loads............ : 2818 : # of routable nets..................... : 42115 : # of unrouted nets................. : 38 : # of fully routed nets............. : 42077 : # of nets with routing errors.......... : 0 : ------------------------------------------- : ----------- : The key element to look for is that there are no nets with routing errors. Since we see that value is ``0`` we can proceed. At this point, we want to lock down the implementation so that further place and route runs do not upset the timing closure of the design. We can do this by running the Vivado Tcl command: .. code-block:: lock_design -level routing This tags the netlist, placement and routing such that ``place_design`` and ``route_design`` do not modify the netlist of the existing implementation--thus preserving the original timing closure. To populate the black box with the synthesized, out-of-context version of the RISCV core, we can load it directly in Vivado with ``read_checkpoint -cell`` (this is different from ``open_checkpoint``). .. code-block:: read_checkpoint -cell VexRiscvLitexSmpCluster_Cc4_Iw64Is8192Iy2_Dw64Ds8192Dy2_ITs4DTs4_Ldw512_Cdma_Ood/cores_1_cpu_logic_cpu riscv_1_synth.dcp Once the dynamic module has been loaded with the synthesized RISCV core, we can implement the design and check the results .. code-block:: # We need to waive a DRC due to the nature of the design set_msg_config -id {Common 17-55} -new_severity {Warning} set_property SEVERITY {Warning} [get_drc_checks REQP-1753] place_design route_design report_route_status report_timing Results should looks similar to: .. code-block:: Design Route Status : # nets : ------------------------------------------- : ----------- : # of logical nets.......................... : 75917 : # of nets not needing routing.......... : 28293 : # of internally routed nets........ : 24553 : # of nets with no loads............ : 3740 : # of routable nets..................... : 47624 : # of nets with fixed routing....... : 41853 : # of fully routed nets............. : 47624 : # of nets with routing errors.......... : 0 : ------------------------------------------- : ----------- : and should meet timing: .. code-block:: Timing Report Slack (MET) : 0.253ns (required time - arrival time) Source: main_crg_idelayctrl_ic_reset_reg/C (rising edge-triggered cell FDRE clocked by main_crg_clkout1 {rise@0.000ns fall@2.500ns period=5.000ns}) Destination: IDELAYCTRL_REPLICATED_0_2/RST (recovery check against rising-edge clock main_crg_clkout1 {rise@0.000ns fall@2.500ns period=5.000ns}) Path Group: **async_default** Path Type: Recovery (Max at Slow Process Corner) Requirement: 5.000ns (main_crg_clkout1 rise@5.000ns - main_crg_clkout1 rise@0.000ns) Data Path Delay: 3.838ns (logic 0.117ns (3.048%) route 3.721ns (96.952%)) Logic Levels: 0 Clock Path Skew: -0.211ns (DCD - SCD + CPR) Destination Clock Delay (DCD): 5.765ns = ( 10.765 - 5.000 ) Source Clock Delay (SCD): 6.087ns Clock Pessimism Removal (CPR): 0.112ns Clock Uncertainty: 0.065ns ((TSJ^2 + DJ^2)^1/2) / 2 + PE Total System Jitter (TSJ): 0.071ns Discrete Jitter (DJ): 0.108ns Phase Error (PE): 0.000ns Clock Net Delay (Source): 2.666ns (routing 1.174ns, distribution 1.492ns) Clock Net Delay (Destination): 2.359ns (routing 1.078ns, distribution 1.281ns) Location Delay type Incr(ns) Path(ns) Netlist Resource(s) ------------------------------------------------------------------- ------------------- (clock main_crg_clkout1 rise edge) 0.000 0.000 r G10 0.000 0.000 r clk125_p (IN) net (fo=0) 0.001 0.001 IBUFDS/I HPIOBDIFFINBUF_X1Y59 DIFFINBUF (Prop_DIFFINBUF_HPIOBDIFFINBUF_DIFF_IN_P_O) 0.521 0.522 r IBUFDS/DIFFINBUF_INST/O net (fo=1, routed) 0.090 0.612 IBUFDS/OUT G10 IBUFCTRL (Prop_IBUFCTRL_HPIOB_I_O) 0.000 0.612 r IBUFDS/IBUFCTRL_INST/O net (fo=1, routed) 0.750 1.362 IBUFDS_n_0_BUFG_inst_n_0 BUFGCE_X1Y52 BUFGCE (Prop_BUFCE_BUFGCE_I_O) 0.083 1.445 r IBUFDS_n_0_BUFG_inst/O net (fo=9, routed) 1.687 3.132 main_crg_clkin MMCME3_ADV_X1Y2 MMCME3_ADV (Prop_MMCME3_ADV_CLKIN1_CLKOUT1) -0.231 2.901 r MMCME2_ADV/CLKOUT1 net (fo=1, routed) 0.437 3.338 main_crg_clkout1 BUFGCE_X1Y69 BUFGCE (Prop_BUFCE_BUFGCE_I_O) 0.083 3.421 r BUFG/O X0Y1 (CLOCK_ROOT) net (fo=31, routed) 2.666 6.087 idelay_clk SLICE_X0Y139 FDRE r main_crg_idelayctrl_ic_reset_reg/C ------------------------------------------------------------------- ------------------- SLICE_X0Y139 FDRE (Prop_HFF2_SLICEL_C_Q) 0.117 6.204 f main_crg_idelayctrl_ic_reset_reg/Q net (fo=25, routed) 3.721 9.925 main_crg_idelayctrl_ic_reset BITSLICE_CONTROL_X0Y3 IDELAYCTRL f IDELAYCTRL_REPLICATED_0_2/RST ------------------------------------------------------------------- ------------------- (clock main_crg_clkout1 rise edge) 5.000 5.000 r G10 0.000 5.000 r clk125_p (IN) net (fo=0) 0.001 5.001 IBUFDS/I HPIOBDIFFINBUF_X1Y59 DIFFINBUF (Prop_DIFFINBUF_HPIOBDIFFINBUF_DIFF_IN_P_O) 0.324 5.325 r IBUFDS/DIFFINBUF_INST/O net (fo=1, routed) 0.051 5.376 IBUFDS/OUT G10 IBUFCTRL (Prop_IBUFCTRL_HPIOB_I_O) 0.000 5.376 r IBUFDS/IBUFCTRL_INST/O net (fo=1, routed) 0.649 6.025 IBUFDS_n_0_BUFG_inst_n_0 BUFGCE_X1Y52 BUFGCE (Prop_BUFCE_BUFGCE_I_O) 0.075 6.100 r IBUFDS_n_0_BUFG_inst/O net (fo=9, routed) 1.524 7.624 main_crg_clkin MMCME3_ADV_X1Y2 MMCME3_ADV (Prop_MMCME3_ADV_CLKIN1_CLKOUT1) 0.335 7.959 r MMCME2_ADV/CLKOUT1 net (fo=1, routed) 0.372 8.331 main_crg_clkout1 BUFGCE_X1Y69 BUFGCE (Prop_BUFCE_BUFGCE_I_O) 0.075 8.406 r BUFG/O X0Y1 (CLOCK_ROOT) net (fo=31, routed) 2.359 10.765 idelay_clk BITSLICE_CONTROL_X0Y3 IDELAYCTRL r IDELAYCTRL_REPLICATED_0_2/REFCLK clock pessimism 0.112 10.876 clock uncertainty -0.065 10.812 BITSLICE_CONTROL_X0Y3 IDELAYCTRL (Recov_CONTROL_BITSLICE_CONTROL_REFCLK_RST) -0.633 10.179 IDELAYCTRL_REPLICATED_0_2 ------------------------------------------------------------------- required time 10.179 arrival time -9.925 ------------------------------------------------------------------- slack 0.253 The final implementation with the newly populated dynamic module highlighted in green is shown below. .. image:: images/Shell-Picture5.png :width: 550px :align: center Complexity can vary widely amongst different designs, so not all designs may benefit from this approach. However, please `reach out `_ to the RapidWright team if you encounter challenges when applying this approach for your own projects.