=============================================================== Polynomial Generator: Placed and Routed Circuits in Seconds =============================================================== Background ============ Often, FPGA compilation runtime can be long due to the complexity and nature of problems that are solved in mapping a user's design onto the fabric of an FPGA. However, if we scope a user's design to a specific domain, the compilation process can become signficantly simplified. This tutorial aims to provide a limited scope proof-of-concept of this idea. Consider the mathematical formula called a `polynomial `_. A polynomial equation consists of an expression of variables and coefficients that involves only the operations of addition, subtraction, multiplication and positive integer-powers of variables. Since a polynomical relies on a finite set of mathematical operations, we can devise circuit "generators" for these operators that can be created on-the-fly. These circuit generators (adders, subtractors, multipliers and raise-to-integer-power) have very predictable implementation patterns on FPGA fabric and can thus be placed and internally routed very quickly. In this application, the polynomials supported have the following attributes: * Coefficients are integers and the first coefficient is positive * No division operations are present * Four mathematical operators: addition, subtraction, multiplication and positive integer powers RapidWright has three generators to support the ``PolynomialGenerator`` that implement the four supported mathematical operators. A combination adder/subtractor generator for addition and subtraction and a multiplier generator for multiplication and raise to the integer power (chaining multiple multipliers together). Getting Started ================ 1. Prerequisites ------------------- To run this tutorial, you will need: 1. RapidWright 2023.1.4 or later 2. Vivado 2023.1 or later 2. Creating a Simple Polynomial Circuit in Seconds --------------------------------------------------- The interface to run the ``PolynomialGenerator`` is quite simple: .. code-block:: bash rapidwright PolynomialGenerator Which should produce the usage message: .. code-block:: bash USAGE: [--hand-placer] The polynomial syntax requires explicit operators and expanded set of terms (no parethesis or factors), for example: .. math:: (x-1)(x+2) = x^2+x-2 should be rewritten as ``x^2+x-2``. Coefficient also will require the explicit multiplication operator ``*``, for example: .. math:: 3x^2+x-2 should be rewritten as ``3*x^2+x-2``. The mathematical generators will create placed and routed circuits up to 18 bits of width. Although the FPGA fabric can support much larger dimensions, for simplicity we limit this proof-of-concept to 18 bits. Further work could push this limit far beyond 18 bits. We can generate this polynomial with 16 bit operators with the following command: .. code-block:: bash rapidwright PolynomialGenerator 3*x^2+x-2 16 With the following output (`RWRoute `_ output removed for brevity): .. code-block:: bash ============================================================================== == Polynomial Generator == ============================================================================== Load Device: 1.692s Init Operators: 0.878s Build Operator Tree: 0.183s ... ... Final Route: 0.474s Write DCP: 0.455s ------------------------------------------------------------------------------ [No GC] *Total*: 3.681s Wrote DCP: polynomial.dcp The resulting DCP, ``polynomial.dcp`` should be generated in just a few seconds and can be examined by Vivado by running: .. code-block:: bash vivado polynomial.dcp & (Let's run it in the background so we can return to the terminal later with Vivado still running). Once loaded, we can zoom to the placed and routed circuit in clock region ``X3Y3``, we can also highlight the individual operators by color by running the following Tcl command in the Vivado Tcl prompt: .. code-block:: foreach c [get_cells] { incr i; highlight_objects -leaf_cells $c -color_index $i } The resulting circuit should looks similar to this: .. image:: images/polynomial1.png :width: 550px :align: center We can also run ``report_route_status``: .. code-block:: report_route_status .. code-block:: Design Route Status : # nets : ------------------------------------------- : ----------- : # of logical nets.......................... : 1775 : # of nets not needing routing.......... : 1659 : # of internally routed nets........ : 1334 : # of nets with no loads............ : 292 : # of implicitly routed ports....... : 33 : # of routable nets..................... : 116 : # of fully routed nets............. : 116 : # of nets with routing errors.......... : 0 : ------------------------------------------- : ----------- : which shows the design being fully routed without any errors or violations. We can also check timing with ``report_timing``: .. code-block:: report_timing .. code-block:: Slack (MET) : 0.310ns (required time - arrival time) Source: mult2_51/mult/DSP_OUTPUT_INST/CLK (rising edge-triggered cell DSP_OUTPUT clocked by clk {rise@0.000ns fall@0.646ns period=1.291ns}) Destination: mult_34/mult/DSP_A_B_DATA_INST/A[9] (rising edge-triggered cell DSP_A_B_DATA clocked by clk {rise@0.000ns fall@0.646ns period=1.291ns}) Path Group: clk Path Type: Setup (Max at Slow Process Corner) Requirement: 1.291ns (clk rise@1.291ns - clk rise@0.000ns) Data Path Delay: 0.577ns (logic 0.207ns (35.875%) route 0.370ns (64.125%)) Logic Levels: 0 Clock Path Skew: -0.095ns (DCD - SCD + CPR) Destination Clock Delay (DCD): 1.763ns = ( 3.054 - 1.291 ) Source Clock Delay (SCD): 2.047ns Clock Pessimism Removal (CPR): 0.189ns Clock Uncertainty: 0.035ns ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE Total System Jitter (TSJ): 0.071ns Total Input Jitter (TIJ): 0.000ns Discrete Jitter (DJ): 0.000ns Phase Error (PE): 0.000ns Location Delay type Incr(ns) Path(ns) Netlist Resource(s) ------------------------------------------------------------------- ------------------- (clock clk rise edge) 0.000 0.000 r 0.000 0.000 r clk (IN) net (fo=107, unset) 2.047 2.047 mult2_51/mult/CLK DSP48E2_X12Y73 DSP_OUTPUT r mult2_51/mult/DSP_OUTPUT_INST/CLK ------------------------------------------------------------------- ------------------- DSP48E2_X12Y73 DSP_OUTPUT (Prop_DSP_OUTPUT_DSP48E2_CLK_P[41]) 0.207 2.254 r mult2_51/mult/DSP_OUTPUT_INST/P[41] net (fo=1, routed) 0.370 2.624 mult_34/mult/A[9] DSP48E2_X12Y72 DSP_A_B_DATA r mult_34/mult/DSP_A_B_DATA_INST/A[9] ------------------------------------------------------------------- ------------------- (clock clk rise edge) 1.291 1.291 r 0.000 1.291 r clk (IN) net (fo=107, unset) 1.763 3.054 mult_34/mult/CLK DSP48E2_X12Y72 DSP_A_B_DATA r mult_34/mult/DSP_A_B_DATA_INST/CLK clock pessimism 0.189 3.243 clock uncertainty -0.035 3.207 DSP48E2_X12Y72 DSP_A_B_DATA (Setup_DSP_A_B_DATA_DSP48E2_CLK_A[9]) -0.273 2.934 mult_34/mult/DSP_A_B_DATA_INST ------------------------------------------------------------------- required time 2.934 arrival time -2.624 ------------------------------------------------------------------- slack 0.310 By default, the clock constraint in the polynomial design is set to 775MHz, or the highest specification of the DSP in speed grade 2 UltraScale+ devices. As can be seen above, this circuit has been placed and routed successfully and has margin to spare to run at this frequency. Of course, as polynomials grow larger, this frequency may be impacted, but it strives to run at the spec of the device. We can repeat this process for a more complex polynomial in the next step--keep your Vivado instance open so we can reload the next iteration more quickly. 3. More Complex Polynomial and Inspection with the RapidWright Hand Placer ---------------------------------------------------------------------------- For the next step, let's consider a more complex polynomial: .. math:: 8y^4+43yx^3+7x^2-14 .. code-block:: bash rapidwright PolynomialGenerator 8*y^4+43*y*x^3+7*x^2-14 18 --hand-placer This will generate a multi-variable polynomial with inputs ``x`` and ``y`` and before the design is routed, will invoke the RapidWright hand placer that will allow the placement of the polynomial to be inspected by the user. After running the command, a window should pop up that looks similar to this: .. image:: images/polynomial2.png :width: 550px :align: center This is a simplified device model view in RapidWright of the targeted device with the operator moduled overlayed in green and orange. The user can use a mouse scroll wheel up to zoom in (or ``CTRL`` + ``-``) and down to zoom out (or ``CTRL`` + ``=``). Alternatively, there are toolbar buttons to control zoom, or zoom to a selected module (which can be selected on the right window pane list). .. image:: images/polynomial3.png :width: 550px :align: center After zooming in, try selecting one of the module instances by moving the mouse over one of the shapes, click and hold the left mouse button and move the block around the fabric as shown in the animation below: .. image:: images/polynomial4.* :width: 550px :align: center Notice that the color of the block changes color based on what area of the fabric is located. Green means a valid placement location, red is invalid and orange is valid although its footprint overlaps with another module. Also notice that when a module instance is being drag selected, it has translucent lines to other module instances. The thickness of these lines is determined by the number of net connections between those two module instances. In this fashion, the modules can be placed or re-placed by hand. Try moving the ``add18_0`` block away from the rest of the module instances onto another valid location (where the block turns green) as shown in the image below: .. image:: images/polynomial5.png :width: 550px :align: center If you make a mistake, the hand placer also has an Undo/Redo stack so ``CTRL`` + ``Z`` will undo the last movement and ``CTRL`` + ``SHIFT`` + ``Z`` will redo a movement. When completed, the window can be closed and the new placement is automatically applied. Once the window is closed, the ``PolynomialGenerator`` will automatically resume and generate the ``polynomial.dcp``. We can then use ``refresh_design`` in Vivado's Tcl prompt to re-load the DCP: .. code-block:: refresh_design set i 0; foreach c [get_cells] { incr i; highlight_objects -leaf_cells $c -color_index $i } After we zoom in, we should see a very similar floorplan layout to the one chosen interactively in the hand placer: .. image:: images/polynomial6.png :width: 550px :align: center Notice in the snapshot above, the ``add18_0`` module instance is in the top right of the screen, in step with where we placed it in the hand placer. Again, we can repeat the Vivado Tcl commands ``report_route_status`` and ``report_timing`` to validate the result. Although we do not replicate the output here, the design should be valid and meet timing as in step 2. At this point you are invited to try different polynomials of your own and try making your own placements in the hand placer to explore the several possibilities available to you in this proof-of-concept.