Create Placed and Routed DCP to Cross SLR

What You’ll Need to Get Started:
  • RapidWright 2018.2 or later
  • Vivado 2018.2 or later

One of the example programs that is provided with RapidWright solves a challenging problem on UltraScale+ devices (this approach is not valid for Series 7 or UltraScale parts). Crossing super logic region (SLR) boundaries at high speed can prove quite difficult in conventional Vivado flows. The hardware provides dedicated TX/RX flip flops in Laguna sites to enable the creation of paths with very short delay but experience two significant problems:

  1. The dedicated super long lines (SLLs) that connect TX and RX Laguna flip flop pairs are often sensitive to hold time violations due to the higher multi-die variability.
  2. Paths crossing the SLR boundary are taxed with an additional delay penality called “Inter-SLR Compensation” (ISC). This penalty increases the calculated delay and reduces it potential for high speed.
_images/ISC.png

Example Vivado tooltip window describing the Inter-SLR Compensation delay penalty

In RapidWright, we have created a parametrized, stand-alone application that can automatically generate a placed and routed DCP from scratch that implements a circuit that eliminates and minimizes the two challenges mentioned above. First, it creates a netlist with pairs of flops that are connected and placed and routed across SLR crossings using the dedicated Laguna TX/RX flip flop sites. Next, it custom routes the clock (the circuit has its own BUFGCE) such that it can individually tune the leaf clock buffers (LCBs) for each direction on each side of the SLR. By using the LCBs, the hold time in the first challenge mentioned above is eliminated. To minimize the ISC penalty, a clock root is generated for each clock region (CR) that contains an SLR crossing.

Steps to Run

  1. Ensure you have RapidWright correctly setup and/or installed. See the Getting Started page for details.
  2. Run java com.xilinx.rapidwright.examples.SLRCrosserGenerator -h. This will print all the available options to parameterize the SLR crossing output, example output below:
==============================================================================
==                        SLR Crossing DCP Generator                        ==
==============================================================================
This RapidWright program creates a placed and routed DCP that can be
imported into UltraScale+ designs to aid in high speed SLR crossings.  See
RapidWright documentation for more information.

Option                                   Description
------                                   -----------
-?, -h                                   Print Help
-a [String: Clk input net name]          (default: clk_in)
-b [String: Clock BUFGCE site name]      (default: BUFGCE_X0Y218)
-c [String: Clk net name]                (default: clk)
-d [String: Design Name]                 (default: slr_crosser)
-i [String: Input bus name prefix]       (default: input)
-l [String: Comma separated list of      (default: LAGUNA_X2Y120)
  Laguna sites for each SLR crossing]
-n [String: North bus name suffix]       (default: _north)
-o [String: Output DCP File Name]        (default: slr_crosser.dcp)
-p [String: UltraScale+ Part Name]       (default: xcvu9p-flgc2104-2-i)
-q [String: Output bus name prefix]      (default: output)
-r [String: INT clk Laguna RX flops]     (default: GCLK_B_0_1)
-s [String: South bus name suffix]       (default: _south)
-t [String: INT clk Laguna TX flops]     (default: GCLK_B_0_0)
-u [String: Clk output net name]         (default: clk_out)
-v [Boolean: Print verbose output]       (default: true)
-w [Integer: SLR crossing bus width]     (default: 512)
-x [Double: Clk period constraint (ns)]  (default: 1.538)
-y [String: BUFGCE cell instance name]   (default: BUFGCE_inst)
-z [Boolean: Use common centroid]        (default: false)
  1. A default scenario of a single bi-directional crossing of 512 bits is generated at the LAGUNA_X2Y120 site on a VU9P part if no options are provided. The DCP is generated in the current working directory with the name slr_crosser.dcp unless the -o option is specified.
$ java com.xilinx.rapidwright.examples.SLRCrosserGenerator
==============================================================================
==                           SLRCrosserGenerator                            ==
==============================================================================
                    Init:     4.787s
          Create Netlist:     0.123s
     Place SLR Crossings:     0.121s
      Custom Clock Route:     3.756s
           Route VCC/GND:     0.079s
              Write EDIF:     0.148s
     Writing XDEF Header:     0.090s
  Writing XDEF Placement:     0.213s
    Writing XDEF Routing:     0.404s
 Writing XDEF Finalizing:     0.079s
             Writing XDC:     0.039s
------------------------------------------------------------------------------
         [No GC] *Total*:     9.839s
Wrote final DCP: /home/user/slr_crosser.dcp
  1. Open the DCP using Vivado to view the design. It should look similar to the annotated screenshot below:
_images/SLR_crossing.png

Vivado Screenshot with bubble annotations of a single, bi-direction 512-bit SLR crossing circuit.

  1. You can also unzip the DCP (treating it like an ordinary ZIP file) and inside you’ll find Verilog and VHDL stubs that can be imported into RTL designs for black box inclusion. Example output below:
$ unzip slr_crosser.dcp
Archive:  slr_crosser.dcp
  inflating: slr_crosser.edf
  inflating: slr_crosser.xdef
  inflating: slr_crosser_late.xdc
  inflating: slr_crosser_stub.v
  inflating: slr_crosser_stub.vhdl
  inflating: dcp.xml
$ cat slr_crosser_stub.v
// This file was generated by RapidWright 2018.2.0.

// This empty module with port declaration file causes synthesis tools to infer a black box for IP.
// Please paste the declaration into a Verilog source file or add the file as an additional source.
module slr_crosser(clk_in, clk_out, input0_north, input0_south, output0_north, output0_south);
  input clk_in;
  output clk_out;
  input [511:0]input0_north;
  input [511:0]input0_south;
  output [511:0]output0_north;
  output [511:0]output0_south;
endmodule
$

Optionally, you can open the DCP in Vivado and write out the netlist as EDIF, Verilog or VHDL to be packaged as an IP. The DCP can then be dropped into the IP cache later.

  1. As one additional example, the generator is capable of using every SLL in the device. To generate such a DCP for a VU9P device, run:
$ java com.xilinx.rapidwright.examples.SLRCrosserGenerator -w 720 -l LAGUNA_X0Y120,LAGUNA_X2Y120,LAGUNA_X4Y120,LAGUNA_X6Y120,LAGUNA_X8Y120,LAGUNA_X10Y120,LAGUNA_X12Y120,LAGUNA_X14Y120,LAGUNA_X16Y120,LAGUNA_X18Y120,LAGUNA_X20Y120,LAGUNA_X22Y120,LAGUNA_X0Y360,LAGUNA_X2Y360,LAGUNA_X4Y360,LAGUNA_X6Y360,LAGUNA_X8Y360,LAGUNA_X10Y360,LAGUNA_X12Y360,LAGUNA_X14Y360,LAGUNA_X16Y360,LAGUNA_X18Y360,LAGUNA_X20Y360,LAGUNA_X22Y360

The resultant DCP should look similar to the following in Vivado:

_images/Full_SLR_crossing.png

Vivado Screenshot of all SLLs being used at potentially a 760MHz for a speed grade 2 device.