Check clock gating

A_clock_gating_check

Figure 1  A clock gating check

A clock gating check occurs when a gating signal can control the path of a clock signal at a logic cell. An example is shown in Figure 1. The pin of the logic cell connected to the clock is called the clock pin, and the pin connected to the gating signal is called the gating pin. The logic cell where clock gating occurs is also referred to as the gating cell.

One condition for a clock gating check is that the clock passing through the cell must be used as a clock downstream. Downstream clock usage can be as a FF clock, as a fanout to an output port, or as a generated clock that refers to the output of the gating cell as its master. If the clock is not used as a clock after the gating cell, no clock gating check is inferred.

Another condition for a clock gating check applies to the gating signal. The signal at the gating pin should not be a clock, or if it is a clock, it should not be used as a clock downstream.

In the general case, the clock signal and the gating signal do not need to be connected to a single logic cell such as an and or an or cell; they may be inputs to an arbitrary logic block. In such cases, for a clock gating check to be inferred, the clock pin and the gating pin of the check must fan out to a common output pin.

There are two types of clock gating checks inferred:

  • Active-high clock gating check: Occurs when the gating cell has an and or a nand function.
  • Active-low clock gating check: Occurs when the gating cell has an or or a nor function.

Active-high and active-low refer to the logic state of the gating signal that activates the clock signal at the output of the gating cell. If the gating cell is a complex function where the gating relationship is not obvious, such as a multiplexer or an xor cell, STA will typically issue a warning that no clock gating check is being inferred. This can be changed by specifying the clock gating relationship for the gating cell explicitly with the set_clock_gating_check command. In such cases, if the set_clock_gating_check specification disagrees with the functionality of the gating cell, STA will normally issue a warning.
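As a minimal sketch of such an explicit specification, the fragment below requests an active-high check with explicit margins. The cell name UMUX_CK and the margin values are hypothetical, chosen only for illustration; the time unit is assumed to be ns.

```tcl
# Hypothetical gating cell UMUX_CK; the 0.2/0.1 margins are illustrative only.
# -high requests an active-high check; -setup and -hold set the required margins.
set_clock_gating_check -high -setup 0.2 -hold 0.1 [get_cells UMUX_CK]
```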

As specified earlier, a clock can be a gating signal only if it is not used as a clock downstream. Consider the example in Figure 2. CLKB is not used as a clock downstream because of the generated clock definition for CLKA: the path of CLKB is blocked by the generated clock definition. Hence a clock gating check for clock CLKA is inferred at the and cell.

Gating_check_inferred

Figure 2  Gating check inferred – clock at gating pin not used as a clock downstream

Active-High Clock Gating

We now examine the timing relationship of an active-high clock gating check. This occurs at an and or a nand cell; an example using an and cell is shown in Figure 3. Pin B of the gating cell is the clock signal, and pin A of the gating cell is the gating signal.

Let us assume that both clocks CLKA and CLKB have the same waveform.

create_clock -name CLKA -period 10 \
  -waveform {0 5} [get_ports CLKA]
create_clock -name CLKB -period 10 \
  -waveform {0 5} [get_ports CLKB]

Active_high_clock_gating_using_an_AND_cell

Figure 3  Active high clock gating using an AND cell

Because it is an and cell, a high on the gating signal UAND0/A opens the gating cell and allows the clock to propagate through. The clock gating check validates that a transition on the gating pin does not create an active edge for the fanout clock. For positive edge-triggered logic, this implies that the rising edge of the gating signal must occur during the inactive period of the clock (when it is low). Similarly, for negative edge-triggered logic, the falling edge of the gating signal should occur only when the clock is low. Note that if the clock drives both positive and negative edge-triggered FFs, any transition of the gating signal (rising or falling) must occur only when the clock is low. Figure 4 shows an example of a gating signal that transitions while the clock is high and must be delayed to pass the clock gating check.

Gating_signal_needs_to_be_delayed

Figure 4  Gating signal needs to be delayed

The active-high clock gating setup check requires that the gating signal changes before the clock goes high. Here is the setup path report.

Clock_gating_setup_check_path_report_0

Clock_gating_setup_check_path_report_1

Notice that the Endpoint line indicates that this is a clock gating check. In addition, the path is in the clock_gating_default path group, as shown in the Path Group line. The check validates that the gating signal changes before the next rising edge of clock CLKB at 10ns.

The active-high clock gating hold check requires that the gating signal changes only after the falling edge of the clock. Here is the hold path report.

Clock_gating_hold_check_path_report

The hold gating check fails because the gating signal changes too soon, before the falling edge of CLKB at 5ns. If a 5ns delay were added between the UDFF0/Q and UAND0/A1 pins, both the setup and hold gating checks would pass, validating that the gating signal changes only in the specified window.
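The window argument can be sketched as plain Tcl arithmetic. This is an idealized model, not what an STA tool computes: it ignores clock latency and the setup and hold margins of the gating cell, and the proc name gating_window_ok and the 0.5ns base delay are assumptions made for illustration.

```tcl
# Idealized active-high gating window: the gating signal must arrive after the
# falling clock edge (hold) and before the next rising clock edge (setup).
# Returns a two-element list: {setup_ok hold_ok}.
proc gating_window_ok {launch delay clk_fall clk_rise} {
    set arrival  [expr {$launch + $delay}]
    set hold_ok  [expr {$arrival > $clk_fall}]
    set setup_ok [expr {$arrival < $clk_rise}]
    return [list $setup_ok $hold_ok]
}

# Launch on the rising edge at 0ns; CLKB falls at 5ns and rises again at 10ns.
# 0.5ns is a hypothetical original delay; 5.5ns adds the 5ns delay discussed above.
foreach delay {0.5 5.5} {
    lassign [gating_window_ok 0.0 $delay 5.0 10.0] setup_ok hold_ok
    puts "delay=$delay setup_ok=$setup_ok hold_ok=$hold_ok"
}
```

With the short delay the hold condition is violated; with the extra 5ns both conditions hold, which matches the reasoning in the text.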

One can see that the hold time requirement is quite large. This is because the sense of the gating signal and that of the FF being gated are the same. This can be resolved by using a different type of launch FF, a negative edge-triggered FF, to generate the gating signal.

Gating_signal_clocked_on_falling_edge

Figure 5  Gating signal clocked on falling edge

In Figure 5, FF UFF0 is controlled by the negative edge of clock CLKA. Safe clock gating implies that the output of FF UFF0 must change during the inactive part of the gating clock, which is between 5ns and 10ns.

The signal waveforms corresponding to the schematic in Figure 5 are depicted in Figure 6. Here is the clock gating setup report.

Gating_signal_generated_from_negative_edge_FF_meets_gating_checks

Figure 6  Gating signal generated from negative edge FF meets gating checks

Clock_gating_setup_check_path_report_CKN_0

Clock_gating_setup_check_path_report_CKN_1

Clock_gating_setup_check_path_report_CKN_2

Here is the clock gating hold report. Notice that the hold time check is much easier to meet with the new design.

Clock_gating_hold_check_path_report_CKN

Since the clock edge (negative edge) that launches the gating signal is the opposite of the clock being gated (active-high), the setup and hold requirements are easy to meet. This is the most common structure used for gated clocks.

Active-Low Clock Gating

Figure 7 shows an example of an active-low clock gating check.

Active_low_clock_gating_check

Figure 7  Active-low clock gating check

create_clock -name MCLK -period 8 \
  -waveform {0 4} [get_ports MCLK]
create_clock -name SCLK -period 8 \
  -waveform {0 4} [get_ports SCLK]

The active-low clock gating check validates that the rising edge of the gating signal arrives during the active portion of the clock (when it is high) for positive edge-triggered logic. As described previously, the key is that the gating signal should not cause an active edge on the gated output clock. When the gating signal is high, the clock cannot go through. Thus the gating signal should switch only when the clock is high, as illustrated in Figure 8.

Gating_signal_changes_when_clock_is_high

Figure 8  Gating signal changes when clock is high

Here is the active-low clock gating setup timing report. This check ensures that the gating signal arrives before the clock becomes inactive, in this case at 4ns.

Clock_gating_setup_check_path_report_active_low_0

Clock_gating_setup_check_path_report_active_low_1

Here is the clock gating hold timing report. This check ensures that the gating signal changes only after the rising edge of the clock signal, which in this case is at 0ns.

Clock_gating_hold_check_path_report_active_low

Clock Gating with a Multiplexer

Figure 9 shows an example of clock gating using a multiplexer cell. A clock gating check at the multiplexer inputs ensures that the multiplexer select signal arrives at the right time to cleanly switch between MCLK and TCLK. For this example, we are interested in switching to and from MCLK, and we assume that TCLK is low when the select signal switches. This implies that the select signal of the multiplexer should switch only when MCLK is low. This is similar to the active-high clock gating check.

Clock_gating_using_a_mux

Figure 9  Clock gating using a multiplexer

Gating_signal_arrives_when_clock_is_low

Figure 10  Gating signal arrives when clock is low

Figure 10 shows the timing relationships. The select signal for the multiplexer must arrive while MCLK is low. Again, TCLK is assumed to be low when the select signal changes.

Since the gating cell is a multiplexer, the clock gating check is not inferred automatically, as evidenced by these messages reported during STA.

Warning: No clock-gating check is inferred for clock MCLK at pins UMUX0/S and UMUX0/I0 of cell UMUX0.
Warning: No clock-gating check is inferred for clock TCLK at pins UMUX0/S and UMUX0/I1 of cell UMUX0.

But a clock gating check can be explicitly forced by providing a set_clock_gating_check specification.

# The -high option indicates an active-high check.
set_clock_gating_check -high [get_cells UMUX0]
# Turn off the clock gating check on the TCLK input pin.
set_disable_clock_gating_check UMUX0/I1

The disable command turns off the clock gating check on the specified pin, since we are not concerned with that pin. The clock gating check on the multiplexer has been specified as an active-high clock gating check.

set_clock_gating_check_constraint_example

Here is the setup timing path report.

Clock_gating_setup_check_path_report_mux_0

Clock_gating_setup_check_path_report_mux_1

The clock gating hold timing report is next.

Clock_gating_hold_check_path_report_mux_0

Clock_gating_hold_check_path_report_mux_1

Crosstalk delay in timing verification

crosstalk_for_data_path_and_clock_path

Figure 1  Crosstalk in data and clock paths

1.  Setup analysis

  • The launch clock path sees positive crosstalk delay, so data is launched late.
  • The data path sees positive crosstalk delay, so it takes longer for data to reach the destination (D pin of the capture FF).
  • The capture clock path sees negative crosstalk delay, so data is captured by the capture FF early.

Since the launch and capture clock edges for a setup check are different (normally one clock cycle apart), the common clock path can have different crosstalk contributions for the launch and capture clock edges.

2.  Hold analysis

There is one important difference between hold and setup analysis, related to crosstalk on the common portion of the clock path (shared by launch and capture). For hold analysis, the launch and capture clock edges are normally the same edge. A single clock edge traveling through the common clock portion cannot have different crosstalk contributions for the launch clock path and the capture clock path. Therefore, worst-case hold analysis removes the crosstalk contribution from the common clock path.

Setup analysis, in contrast, concerns two different clock edges, which may be affected differently in time. Thus, common path crosstalk contributions are considered for both the launch and capture paths during setup analysis.

The worst-case conditions for hold analysis are:

  • The launch clock path (not including the common path) sees negative crosstalk delay, so data is launched early.
  • The data path sees negative crosstalk delay, so data reaches the destination early (D pin of the capture FF).
  • The capture clock path (not including the common path) sees positive crosstalk delay, so data is captured by the capture FF late.
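The hold conditions above can be sketched as a small worst-case slack calculation in plain Tcl. Every number and variable name here is hypothetical and in ns; the point is only that the common clock segment appears on both sides with no crosstalk term and therefore cancels.

```tcl
# Hypothetical delays in ns; none of these values come from a real design.
set common_clk     1.00   ;# shared clock segment, crosstalk excluded for hold
set launch_branch  0.40   ;# launch-only clock segment
set capture_branch 0.50   ;# capture-only clock segment
set data_path      0.30   ;# launch FF clock pin to capture FF D pin
set t_hold         0.05   ;# hold requirement of the capture FF
set dx_launch      0.03   ;# crosstalk speed-up on the launch branch
set dx_data        0.04   ;# crosstalk speed-up on the data path
set dx_capture     0.02   ;# crosstalk slow-down on the capture branch

# Earliest data arrival versus latest capture clock arrival plus hold time.
set arrival    [expr {$common_clk + $launch_branch - $dx_launch + $data_path - $dx_data}]
set required   [expr {$common_clk + $capture_branch + $dx_capture + $t_hold}]
set hold_slack [expr {$arrival - $required}]
puts [format "hold slack = %.2f ns" $hold_slack]
```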

 

 

Configure STA environment

  1. What is the STA environment?
  2. Specifying Clocks. Clock uncertainty and Clock latency
  3. Generated clocks
  4. Input paths constraint
  5. Output paths constraint
  6. Timing path groups
  7. External attributes modeling
  8. Check design rules
  9. Refine timing analysis
  10. Point-to-point specification

This section describes how to set up the environment for static timing analysis. Specifying correct constraints is important for analyzing STA results: the design environment should be specified accurately so that STA can identify all the timing issues in the design. Preparing for STA involves setting up clocks, specifying IO timing characteristics, and specifying false paths and multicycle paths.

1.  What is the STA environment?

A_synchronous_design

Figure 1  A synchronous design

Most digital designs are synchronous where the data computed from previous clock cycle is latched in the flip-flops at the active clock edge. Consider a typical synchronous design shown in Figure 1. It is assumed that Design Under Analysis (DUA) interacts with other synchronous designs. This means that DUA receives data from a clocked flip-flop and outputs data to another clocked flip-flop external to DUA.

To perform STA on this design, one needs to specify the clocks to the flip-flops, and timing constraints for all paths leading into the design and for all paths exiting the design.

The example in Figure 1 assumes that there is only one clock and that C1, C2, C3, C4, and C5 represent combinational blocks. The combinational blocks C1 and C5 are outside of the design being analyzed.

In a typical design, there can be multiple clocks with many paths from one clock domain to another. The following sections describe how the environment is specified in such scenarios.

2.  Specifying Clocks

To define a clock, we need to provide the following information:

i. Clock source: it can be a port of design, or be a pin of a cell inside design (typically that is a part of a clock generation logic).

ii. Period: time period of clock.

iii. Duty cycle: high duration (positive phase) and low duration (negative phase).

iv. Edge times: times for rising edge and falling edge.

A_clock_definition

Figure 2  A clock definition

Figure 2  shows basic definitions. By defining clocks, all the internal timing paths (all flip-flop to flip-flop paths) are constrained; this implies that all internal paths can be analyzed with just the clock specifications. The clock specification specifies that a flip-flop to flip-flop path must take one cycle. We shall later describe how this requirement (of one cycle timing) can be relaxed.

Here is a basic clock specification.

create_clock \
  -name SYSCLK \
  -period 20 \
  -waveform { 0 5 } \
  [get_ports SCLK]

The name of the clock is SYSCLK and is defined at the port SCLK. The period of SYSCLK is specified as 20 units – the default time unit is nanoseconds if none has been specified. (In general, time unit is specified as part of technology library.) The first argument in waveform specifies time at which rising edge occurs and the second argument specifies time at which falling edge occurs.

Any number of edges can be specified in the waveform option; however, all edges must lie within a span of one period. The edge times alternate, starting with the first rising edge after time zero, then a falling edge, then a rising edge, and so on. This implies that all time values in the edge list must be monotonically increasing.

-waveform {time_rise time_fall time_rise time_fall ... }

In addition, there must be an even number of edges specified. The waveform option specifies waveform within one clock period, which then repeats itself.

If no waveform option is specified, default is:

-waveform { 0 period/2 }

Here is an example of a clock specification with no waveform specification.

create_clock -period 5 [ get_ports SCAN_CLK ]

In this specification, since no -name option is specified, the name of clock is the same as the name of the port, which is SCAN_CLK.

Clock_specification_example

Figure 3  Clock specification example

Here is another example of a clock specification in which the edges of the waveform are in the middle of a period.

create_clock -name BDYCLK -period 15 \
  -waveform { 5 12 } [get_ports GBLCLK]

Clock_specification_with_arbitray_edges

Figure 4  Clock specification with arbitrary edges

The name of the clock is BDYCLK and it is defined at the port GBLCLK. In practice, it is a good idea to keep the clock name the same as the port name.

Here are some more clock specifications.

# See Figure 5a:
create_clock -period 10 -waveform { 5 10 } [get_ports FCLK]
# Creates a clock with the rising edge at 5ns and the falling edge at 10ns.

# See Figure 5b:
create_clock -period 125 \
  -waveform { 100 150 } [get_ports ARMCLK]
# Since the first edge has to be rising edge, 
# the edge at 100ns is specified first and then the falling
# edge at 150ns is specified. The falling edge at 25ns is 
# automatically inferred.

Example_clock_waveforms

Figure 5  Example clock waveform

# See Figure 6a:
create_clock -period 1.0 -waveform { 0.5 1.375 } MAIN_CLK
# The first rising edge and the next falling edge are
# specified. The falling edge at 0.375ns is inferred
# automatically.

# See Figure 6b:
create_clock -period 1.2 -waveform { 0.3 0.4 0.8 1.0 } JTAG_CLK
# Indicates a rising edge at 300ps, a falling edge at 400ps
# a rising edge at 800ps and a falling edge at 1ns, this
# pattern is repeated every 1.2ns.

Example_with_general_clock_waveforms

Figure 6 Example with general clock waveform

2.1  Clock uncertainty

The timing uncertainty of a clock period can be specified using the set_clock_uncertainty specification. The uncertainty can be used to model various factors that can reduce the effective clock period. These factors can be the clock jitter and any other pessimism that one may want to include for timing analysis.

set_clock_uncertainty -setup 0.2 [get_clocks CLK_CONFIG]
set_clock_uncertainty -hold 0.05 [get_clocks CLK_CONFIG]

Note that clock uncertainty for setup effectively reduces available clock period by specified amount as illustrated in Figure 7. For hold checks, clock uncertainty for hold is used as an additional timing margin that needs to be satisfied.
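As a small worked example of the statement above, assuming a hypothetical clock period of 10ns for CLK_CONFIG (the period is not given in this section), the effect of the two uncertainty values can be computed as follows.

```tcl
# The 10ns period is an assumption; the uncertainty values are those specified above.
set period            10.0
set setup_uncertainty 0.2
set hold_uncertainty  0.05

# Setup uncertainty shrinks the time available between launch and capture.
set effective_period [expr {$period - $setup_uncertainty}]
puts "time available for setup checks: $effective_period ns"
# Hold uncertainty is simply added to the hold margin that must be met.
puts "extra margin for hold checks: $hold_uncertainty ns"
```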

Specifying_clock_uncertainty

Figure 7  Specifying clock uncertainty

The following commands specify uncertainty to be used on paths crossing specified clock boundaries, called inter-clock uncertainty.

set_clock_uncertainty -from VIRTUAL_SYS_CLK -to SYS_CLK \
  -hold 0.05
set_clock_uncertainty -from VIRTUAL_SYS_CLK -to SYS_CLK \
  -setup 0.3
set_clock_uncertainty -from SYS_CLK -to CFG_CLK -hold 0.05
set_clock_uncertainty -from SYS_CLK -to CFG_CLK -setup 0.1

 Figure 8 shows a path between two different clock domains, SYS_CLK and CFG_CLK. Based on the inter-clock uncertainty specifications above, 100ps is used as an uncertainty for setup checks and 50ps is used as an uncertainty for hold checks.

Inter-clock_path

Figure 8  Inter-clock paths

2.2  Clock latency

Latency of a clock can be specified using the set_clock_latency command.

# Rise clock latency on MAIN_CLK is 1.8ns:
set_clock_latency 1.8 -rise [get_clocks MAIN_CLK]
# Fall clock latency on all clocks is 2.1ns:
set_clock_latency 2.1 -fall [all_clocks]
# The -rise and -fall options refer to the edge at the clock pin of a
# flip-flop.

There are two types of clock latency: network latency and source latency. Network latency is the delay from clock definition point (create_clock) to clock pin of a flip-flop. Source latency, also called insertion delay, is the delay from clock source to clock definition point. Source latency could represent either on-chip or off-chip latency. Figure 9 shows both the scenarios. The total clock latency at the clock pin of a flip-flop is the sum of source and network latency.

Here are some example commands that specify source and network latency.

# Specify a network latency (no -source option) of 0.8ns 
# for rise, fall, max and min:
set_clock_latency 0.8 [get_clocks CLK_CONFIG] 
# Specify a source latency:
set_clock_latency 1.9 -source [get_clocks SYS_CLK]
# Specify a min source latency:
set_clock_latency 0.851 -source -min [get_clocks CFG_CLK]
# Specify a max source latency:
set_clock_latency 1.322 -source -max [get_clocks CFG_CLK]

Two_type_clock_latency

Figure 9 Clock latency

3.  Generated clocks

A generated clock is a clock derived from a master clock. A master clock is a clock defined using the create_clock specification.

When a new clock is generated in a design that is based on a master clock, the new clock can be defined as a generated clock. For example, if there is a divide-by-3 circuitry for a clock, one would define a generated clock definition at the output of this circuitry. This definition is needed as STA does not know that the clock period has changed at the output of the divide-by logic, and more importantly what the new clock period is. Figure 10 shows an example of a generated clock which is a divide-by-2 of the master clock, CLKP.

create_clock -name CLKP -period 10 [get_pins UPLL0/CLKOUT]
# Create a master clock with name CLKP of period 10ns
# with 50% duty cycle at the CLKOUT pin of the PLL.
create_generated_clock -name CLKPDIV2 -source UPLL0/CLKOUT -divide_by 2 [get_pins UFF0/Q]
# Creates a generated clock with name CLKPDIV2 at the Q
# pin of flip-flop UFF0. The master clock is at the CLKOUT 
# pin of PLL. Period of generated clock is double that of 
# clock CLKP, that is, 20ns.

Generated_clock_at_output_of_divider

Figure 10  Generated clock at output of divider

Can a new clock (a master clock) be defined at the output of the flip-flop instead of a generated clock? The answer is yes; however, there are some disadvantages. Defining a master clock instead of a generated clock creates a new clock domain. This is not a problem in general, except that there are more clock domains to deal with when setting up the constraints for STA. Defining the new clock as a generated clock does not create a new clock domain, and the generated clock is considered to be in phase with its master clock. The generated clock does not require additional constraints to be developed. Thus, one should define an internally generated clock as a generated clock rather than declaring it as another master clock.

Another important difference between a master clock and a generated clock is the notion of clock origin. In a master clock, the origin of the clock is at the point of definition of the master clock. In a generated clock, the clock origin is that of the master clock and not that of the generated clock. This implies that in a clock path report, the start point of a clock path is always the master clock definition point. This is a big advantage of a generated clock over defining a new master clock as the source latency is not automatically included for the case of a new master clock.

Figure 11 shows an example where the clock SYS_CLK is gated by the output of a flip-flop. Since the output of the flip-flop may not be a constant, one way to handle this situation is to define a generated clock at the output of the and cell which is identical to the input clock.

 Clock_gated_by_a_flip-flop

Clock_gated_by_a_flip-flop_update

Figure 11  Clock gated by a flip-flop*

* The left FF should probably be negative edge-triggered (CKN); otherwise the clock gating hold requirement would not be met. See Check clock gating for details.

create_clock -period 0.1 [get_ports SYS_CLK]
# Create a master clock of period 100ps with 50% duty
# cycle.
create_generated_clock -name CORE_CLK -divide_by 1 \
  -source SYS_CLK [get_pins UAND1/Z]
# Create a generated clock called CORE_CLK at the output of
# the AND cell and the clock waveform is the same as that
# of the master clock.

Master_clock_and_multiply-by-2_generated_clock

Figure 12  Master clock and multiply-by-2 generated clock

create_clock -period 10 -waveform { 0 5 } [get_ports PCLK]
# Create a master clock with name PCLK of period 10ns
# with rise edge at 0ns and fall edge at 5ns.
create_generated_clock -name PCLKx2 \
  -source [get_ports PCLK] \
  -multiply_by 2 [get_pins UCLKMULTREG/Q]
# Creates a generated clock called PCLKx2 from the master 
# clock PCLK and the frequency is double that of the master
# clock. The generated clock is defined at the output of 
# the flip-flop UCLKMULTREG.

Note that -multiply_by and -divide_by options refer to frequency of clock, even though a clock period is specified in a master clock definition.
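Because the factors apply to frequency, the generated clock period scales the opposite way. The plain Tcl sketch below illustrates this for the two examples in this section; the helper name generated_period is hypothetical and is not an SDC command.

```tcl
# Period of a generated clock given the master period and the frequency factors.
# -multiply_by M raises the frequency (period / M);
# -divide_by D lowers the frequency (period * D).
proc generated_period {master_period multiply_by divide_by} {
    return [expr {double($master_period) * $divide_by / $multiply_by}]
}

puts "CLKPDIV2 period: [generated_period 10 1 2] ns"   ;# CLKP with -divide_by 2
puts "PCLKx2 period:   [generated_period 10 2 1] ns"   ;# PCLK with -multiply_by 2
```

For the 10ns master clocks used above, this gives 20ns for CLKPDIV2 and 5ns for PCLKx2.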

Clock_generation

Figure 13  Clock generation

Figure 13 shows an example of generated clocks. A divide-by-2 clock and two out-of-phase clocks are generated. The waveforms for the clocks are also shown in the figure.

create_clock -period 2 [get_ports DCLK]
# Name of clock is DCLK, has period of 2ns with a rise edge
# at 0ns and a fall edge at 1ns.
create_generated_clock -name DCLKDIV2 -edges {2 4 6} \
  -source DCLK [get_pins UBUF2/Z]
create_generated_clock -name PH0CLK -edges {3 4 7} \
  -source DCLK [get_pins UAND0/Z]
create_generated_clock -name PH1CLK -edges {1 2 5} \
  -source DCLK [get_pins UAND1/Z]
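The -edges option lists three edge numbers of the master clock (first rise, fall, next rise of the generated clock), counting the master's first rising edge as edge 1. As a sketch of how those numbers map to times, the plain Tcl below converts them for the DCLK waveform above; the helper name edge_time is hypothetical and is not an SDC command.

```tcl
# Time of master clock edge number n, where odd n are rising edges and
# even n are falling edges (edge 1 is the first rising edge).
proc edge_time {n period rise fall} {
    set cycle  [expr {($n - 1) / 2}]            ;# integer division: which cycle
    set offset [expr {$n % 2 ? $rise : $fall}]  ;# rise or fall time in that cycle
    return [expr {$cycle * $period + $offset}]
}

# DCLK: period 2ns, rise at 0ns, fall at 1ns.
foreach {name edges} {DCLKDIV2 {2 4 6} PH0CLK {3 4 7} PH1CLK {1 2 5}} {
    set times {}
    foreach n $edges { lappend times [edge_time $n 2.0 0.0 1.0] }
    puts "$name: rise/fall/rise at $times ns"
}
```

For DCLKDIV2 this gives a rise at 1ns, a fall at 3ns, and the next rise at 5ns, that is, a 4ns period.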

 Clock Latency for Generated Clocks

Latency_on_generated_clock

Figure 14  Latency on generated clock

A generated clock can have another generated clock as its source; that is, one can have generated clocks of generated clocks, and so on. However, a generated clock can have only one master clock.
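A minimal sketch of a generated clock whose source is another generated clock, building on the CLKPDIV2 definition earlier in this section. The second divider flip-flop UFF1 and the clock name CLKPDIV4 are hypothetical.

```tcl
# Hypothetical second divider stage UFF1, clocked by CLKPDIV2 (defined at UFF0/Q).
# The source is a generated clock, but the master clock is still CLKP.
create_generated_clock -name CLKPDIV4 \
  -source [get_pins UFF0/Q] -divide_by 2 [get_pins UFF1/Q]
```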

Typical Clock Generation Scenario

Clock_distribution_in_a_tyical_ASIC

Figure 15  Clock distribution in a typical ASIC

Figure 15 shows a scenario of how a clock distribution may appear in a typical ASIC. The oscillator is external to the chip and produces a low frequency (10-50 MHz typical) clock which is used as a reference clock by on-chip PLL to generate a high-frequency low-jitter clock (200-800 MHz typical). This PLL clock is then fed to a clock divider logic that generates required clocks for ASIC.

On some branches of the clock distribution, there may be clock gates that turn off the clock to an inactive portion of the design to save power when necessary. The PLL can also have a multiplexer at its output so that the PLL can be bypassed if necessary. A master clock is defined for the reference clock at the input pin of the chip where it enters the design, and a second master clock is defined at the output of the PLL. The PLL output clock has no phase relationship with the reference clock. Therefore, the output clock should not be a generated clock of the reference clock. Most likely, all clocks generated by the clock divider logic are specified as generated clocks of the master clock at the PLL output.
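The scenario above might be constrained roughly as follows. This is a sketch only: every port, pin, and clock name is hypothetical, and the periods are picked from the typical frequency ranges quoted above (25 MHz reference, 400 MHz PLL output).

```tcl
# All names below are hypothetical; periods are in ns.
# Master clock 1: reference clock where it enters the chip (25 MHz).
create_clock -name REFCLK -period 40 [get_ports REFCLK]
# Master clock 2: PLL output (400 MHz); not a generated clock of REFCLK.
create_clock -name PLLCLK -period 2.5 [get_pins UPLL0/CLKOUT]
# Divider output: generated clock of the master clock at the PLL output.
create_generated_clock -name CORECLK -source [get_pins UPLL0/CLKOUT] \
  -divide_by 2 [get_pins UCLKDIV/UFF0/Q]
```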

4.  Input paths constraint

STA cannot check any timing on a path that is not constrained. Thus, all paths should be constrained to enable their analysis.

Input_port_timing_path

Figure 16  Input port timing path

Figure 16 shows an input path of Design Under Analysis (DUA). Flip-flop UFF0 is external to DUA and provides data to flip-flop UFF1 which is internal to DUA. Data is connected through input port INP1.

set Tclk2q 0.9
set Tc1    0.6
set_input_delay -clock CLKA -max [expr {$Tclk2q + $Tc1}] \
  [get_ports INP1]

The constraint specifies that the external delay on input INP1 is 1.5ns with respect to clock CLKA. (The input delay accounts for the part of the data path delay that lies outside the DUA.) Assuming the clock period for CLKA is 2ns, the logic from the INP1 pin has only 500ps (= 2ns - 1.5ns) available for propagating inside the DUA. That is, Tc2 + Tsetup <= 500ps must hold for flip-flop UFF1 to reliably capture the data launched by flip-flop UFF0.

5.  Output paths constraint

Example A

Output_port_timing_path_a

Figure 17  Output timing path

set Tc2    3.9
set Tsetup 1.1
set_output_delay -clock CLKQ -max [expr {$Tc2 + $Tsetup}] \
  [get_ports OUTB]

Example B

Output_port_timing_path_b_max_min_delays

Figure 18  Output timing path Max Min delays

Tc2max + Tsetup = 7ns + 0.4ns = 7.4ns

Tc2min - Thold = 0 - 0.2ns = -0.2ns

create_clock -period 20 -waveform {0 15} [get_ports CLKQ]
set_output_delay -clock CLKQ -min -0.2 [get_ports OUTC]
set_output_delay -clock CLKQ -max 7.4 [get_ports OUTC]

Example C

Input_output_timing_path

Figure 19  Input and output timing path

create_clock -period 100 -waveform {5 55} [get_ports MCLK]
set_input_delay 25 -max -clock MCLK [get_ports DATAIN]
set_input_delay 5 -min -clock MCLK [get_ports DATAIN]
set_output_delay 20 -max -clock MCLK [get_ports DATAOUT]
set_output_delay -5 -min -clock MCLK [get_ports DATAOUT]

6.  Timing path groups

 Timing_paths

Figure 20  Timing paths

Path_groups

Figure 21  Path groups

Timing paths in a design can be considered as a collection of paths. Each path has a startpoint and an endpoint.

In STA, paths are timed based on valid startpoints and valid endpoints. Valid startpoints are input ports and clock pins of synchronous devices, such as flip-flops and memories. Valid endpoints are output ports and data input pins of synchronous devices. Thus, a valid timing path can be:

i.  an input port -> an output port,

A -> Z

ii.  an input port -> a data input pin of a flip-flop (FF) or a memory,

A -> UFFA/D

iii.  a clock pin of FF -> a data input of FF,

UFFA/CK -> UFFB/D

iv.  a clock pin of FF -> an output port,

UFFB/CK -> Z

Timing paths are sorted into path groups by the clock associated with the endpoint of the path. Thus, each clock has a set of paths associated with it. There is also a default path group that includes all non-clocked (asynchronous) paths.

  • CLKA group: A -> UFFA/D.
  • CLKB group: UFFA/CK -> UFFB/D.
  • DEFAULT group: A -> Z, UFFB/CK -> Z.

7.  External attributes modeling 

While create_clock, set_input_delay and set_output_delay are enough to constrain all paths in a design for timing analysis, they are not enough to obtain accurate timing for the IO pins of the block. The following attributes are also required to accurately model the environment of a design. For inputs, one needs to specify the slew at the input. This information can be provided using:

  • set_driving_cell
  • set_input_transition

For outputs, one needs to specify the capacitive load seen by the output. This is specified using:

  • set_load

set_input_transition_specification_representation

Figure 22  set_input_transition specification representation

set_input_transition 0.85 [get_ports INPC]
# Specifies an input transition of 850ps on port INPC.
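The other way to describe the input slew, set_driving_cell, names the cell assumed to drive the port so that the tool can derive the transition itself. A minimal sketch, in which the library cell BUFX4, its output pin Z, and the port INPB are hypothetical names:

```tcl
# Hypothetical names: assume port INPB is driven by pin Z of library cell BUFX4.
set_driving_cell -lib_cell BUFX4 -pin Z [get_ports INPB]
```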

set_load_specification_representation

Figure 23  Capacitive load on output port

set_load 5 [get_ports OUTX]
# Place a 5pF load on output port OUTX

The set_load specification can also be used to specify a load on an internal net in the design.

set_load 0.25 [get_nets UCNT5/NET6]
# Set net capacitance to be 0.25pF.

8.  Check design rules

Two frequently used design rules for STA are max transition and max capacitance. These rules check that all ports and pins in the design meet the specified limits for transition time and capacitance.

  • set_max_transition
  • set_max_capacitance
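A minimal sketch of how these two rules might be applied to the whole design. The limit values are illustrative only, and the units (assumed here to be ns and pF) depend on the technology library.

```tcl
# Illustrative limits; real values come from the library or project guidelines.
set_max_transition  0.6 [current_design]
set_max_capacitance 0.5 [current_design]
```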

 9.  Refine timing analysis

 Four common commands that are used to constrain analysis are:

i.  set_case_analysis: Specify constant value on a pin of a cell, or on an input port.

ii.  set_disable_timing: Break a timing arc of a cell.

iii.  set_false_path: Specify paths that are not real which implies that these paths are not checked in STA.

iv.  set_multicycle_path: Specify paths that can take longer than one clock cycle.

9.1  Specify inactive signals

In a design, certain signals have a constant value in a specific mode of the chip. For example, if a chip has DFT logic in it, then the Scan pin of the chip should be at 0 in normal functional mode.

set_case_analysis_0_scan_for_functional_mode
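A minimal sketch of this kind of constraint, assuming the scan control enters the chip on a port named SCAN_MODE (a hypothetical name; the actual name in the design may differ):

```tcl
# Hypothetical port name; ties the scan control to 0 for functional-mode STA.
set_case_analysis 0 [get_ports SCAN_MODE]
```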

9.2  Break timing arcs in cells

Apply set_disable_timing to break timing arcs. For example, in DDR PHY STA at the dataslice level, the timing arcs through a delay element are not real timing paths.

set_disable_timing_dll_delay_element_in_dataslice_simple
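A minimal sketch of breaking a single arc, assuming a delay element instance named UDLL0/UDELAY0 with input pin A and output pin Z. The instance and pin names are hypothetical.

```tcl
# Hypothetical instance and pin names; disables only the A -> Z arc of the cell.
set_disable_timing -from A -to Z [get_cells UDLL0/UDELAY0]
```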

Note: use set_disable_timing with caution, as it removes all timing paths through the specified pins. Where possible, it is preferable to use the set_false_path and set_case_analysis commands.

In fact, set_false_path can replace set_disable_timing in some situations. For example, if set_false_path is applied when the delay element is hardened, there is no need to apply set_disable_timing at the dataslice level.

set_false_path_in_dll_delay_element_simple
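A minimal sketch of the false-path alternative, assuming the paths to exclude all pass through the output pin Z of a delay element instance UDLL0/UDELAY0. The instance and pin names are hypothetical.

```tcl
# Hypothetical pin name; excludes every path passing through this pin from analysis.
set_false_path -through [get_pins UDLL0/UDELAY0/Z]
```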

9.3  Multicycle paths

In some cases, the data path between two flip-flops takes more than one clock cycle to propagate through the logic. In such cases, the combinational data path is declared as a multicycle path. Even though the capture FF captures data on every clock edge, we direct STA that the relevant capture edge occurs after the specified number of clock cycles.

A_three-cycle_multicycle_path

Figure 24  A three-cycle multicycle path

Figure 24 shows an example. Since the data path takes 3 clock cycles, a setup multicycle check of 3 cycles should be specified. The multicycle setup constraints are given below.

create_clock -name CLKM -period 10 [get_ports CLKM]
set_multicycle_path 3 -setup \
  -from [get_pins UFF0/Q] \
  -to [get_pins UFF1/D]

In addition to the setup multicycle, a hold multicycle should be specified so that the hold check behaves as it does in the single-cycle setup case (the one shown in Figure 24). This ensures that data is free to change anywhere within the 3 clock cycles. Without a hold multicycle specification, the default hold check is done on the active edge just before the setup capture edge, which is not the intent. We need to move the hold check 2 cycles earlier than the default hold check edge, hence a hold multicycle of 2 is specified. The intended behavior is shown in Figure 25.

set_multicycle_path 2 -hold \
 -from [get_pins UFF0/Q] \
 -to [get_pins UFF1/D]

Hold_check_moved_back_to_launch_edge

Figure 25  Hold check moved back to launch edge

The number of cycles given in a multicycle hold specifies how many clock cycles to move back from the default hold check edge, which is one active edge prior to the setup capture edge.

In most designs, if the max path (or setup) requires N clock cycles, it is not feasible to make the min path delay greater than (N-1) clock cycles.

Thus, in most designs, a multicycle setup specified as N cycles should be accompanied by a multicycle hold constraint specified as N-1 cycles.
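
The edge arithmetic behind the N and N-1 rule can be sketched in plain Tcl. It uses the 10 ns CLKM period and the 3-cycle example from above; the variable names are only for illustration:

```tcl
# Plain Tcl arithmetic (runs in tclsh), launch edge assumed at time 0.
set period  10.0                      ;# CLKM period in ns
set n_setup 3                         ;# set_multicycle_path 3 -setup
set n_hold  [expr {$n_setup - 1}]     ;# set_multicycle_path 2 -hold

# Setup is checked N cycles after the launch edge.
set setup_edge [expr {$n_setup * $period}]
# The default hold edge is one active edge before the setup capture edge.
set default_hold_edge [expr {$setup_edge - $period}]
# The hold multicycle moves the hold check back by n_hold cycles.
set hold_edge [expr {$default_hold_edge - $n_hold * $period}]

puts [format "setup check edge:   %.1f ns" $setup_edge]
puts [format "default hold edge:  %.1f ns" $default_hold_edge]
puts [format "hold edge with MCP: %.1f ns" $hold_edge]
```

With these numbers the setup check lands at 30 ns, the default hold check at 20 ns, and the hold check with the multicycle of 2 returns to the launch edge at 0 ns.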

10.  Point-to-point specification

set_min_delay

set_max_delay

###########################################
### clk --> read_mem_dqs
###########################################
set_max_delay [expr ($PHY_THREEQUARTER - $skew_clk_to_read_mem_dqs_max)] -from [get_clock clk_phase_0] -to [get_clock read_mem_dqs*_phase_0]
set_min_delay [expr ($PHY_THREEQUARTER - $PHY_CLK_PERIOD + $skew_clk_to_read_mem_dqs_min)] -from [get_clock clk_phase_0] -to [get_clock read_mem_dqs*_phase_0]

Does the delay in set_max_delay/set_min_delay refer to the skew between source clock latency and target clock latency, or to the data path delay?

Path delay in cross clock domain

Sometimes, in cross-clock-domain timing analysis, the Path Delay in the timing report is misleading because of a large latency/skew on the source (launch) clock paths or on the target (capture) clock paths, which leads the tool to report and fix timing violations wrongly. The designer should analyze the latency/skew of the launch clock paths and of the capture clock paths and find out which path(s) are too long; for example, it might be a path in the capture clock path group. It is also worth checking with team members whether the STA constraints are correct.

same source create_generated_clock -add

#Constraint

set CLK_PHASE_0_SRC "dummy_clk4x"
set EDGES {1 3 5}
set CLK_PHY_PORT "clk4x"
set PERIOD [expr 0.833 * $TOOL_TIME_SCALE * $LIB_TIME_SCALE]
set PHY_CLK_PERIOD $PERIOD
set PHY_HALF [expr 0.5 * $PHY_CLK_PERIOD]
set PHY_QUARTER [expr 0.25 * $PHY_CLK_PERIOD]
set CTLR_CLK_PERIOD [expr 2 * $PHY_CLK_PERIOD]
set PHY_DDL_CLK_PERIOD [expr 0.2 * $TOOL_TIME_SCALE * $LIB_TIME_SCALE]
create_clock [get_ports $CLK_PHY_PORT ] -name dummy_clk4x -period $PHY_HALF -waveform "0 $PHY_QUARTER"
set clk_dqs_pin0 [cdn_get_pin inst_data_path_tb/inst_write_path_tb/inst_clk_wrdqs_base_delay_macro/inst_wrdqs_base_delay_line/inst_exit_inv/hic_dnt_inv/${NEG_OUTPUT} inst_data_path_tb/inst_write_path_tb/inst_clk_wrdqs_base_delay_macro/inst_wrdqs_base_delay_line/base_delay_out]
create_generated_clock -name clk_dqs_phase_3 -source $CLK_PHY_PORT \
 -edges $EDGES -edge_shift "$THREE_EIGHTH $THREE_EIGHTH $THREE_EIGHTH" -add \
 -master_clock [get_clocks $CLK_PHASE_0_SRC ] $clk_dqs_pin0
create_generated_clock -name clk_dqs_phase_7 -source $CLK_PHY_PORT \
 -edges $EDGES -edge_shift "$SEVEN_EIGHTH $SEVEN_EIGHTH $SEVEN_EIGHTH" -add \
 -master_clock [get_clocks $CLK_PHASE_0_SRC ] $clk_dqs_pin0

Same source, 2 generated clocks, frequency 1.2 GHz

#Command reference

Models multiple generated clocks on the same source when multiple clocks must fan into the source pin. Ideally, one generated clock must be specified for each clock that fans into the master pin. Specify this option with the -name and -master_clock options.

By default, the software creates one generated clock at the pin by using the fastest clock present on the source pin as the master clock. However, use the -add option to specify a different clock name for each generated clock when used with the -master_clock option. Subsequently, you can use this clock name for setting other constraints, such as the set_false_path command and the set_input_delay command.

Clock Tree Latency Skew Uncertainty

A clock is to an SoC/chip what blood is to a dog's body. If you want the pet to be smart and strong, its circulatory system must be healthy. Just as blood flows to each and every part of the body and regulates metabolism, the clock reaches each and every sequential device and controls the digital events inside the chip.

There are two phases in the design of a clock signal.

1st phase: the clock is in “ideal mode” (e.g. during RTL design, synthesis and placement). An “ideal” clock has no physical distribution tree; it just shows up magically on time at all the clock pins.

2nd phase: clock tree synthesis (CTS) inserts an actual tree of buffers into the design that carries the clock signal from the clock source pin to the thousands or millions of flip-flops that need it. CTS is done after placement and before routing. After CTS is finished, the clock is said to be in “propagated mode”.

What is clock latency?

Clock latency is an ideal mode term. It refers to the delay that is specified to exist between the source of the clock signal and the flip-flop clock pin. This is a delay specified by the user – not a real, measured thing. When the clock is actually created, then that same delay is now referred to as the “insertion delay”. Insertion delay (ID) is a real, measurable delay path through a tree of buffers. Sometimes the clock latency is interpreted as a desired target value for the insertion delay.

Clock latency is the time taken by the clock to reach the sink pin from the clock source. It is divided into two parts: Clock Source Latency and Clock Network Latency. Clock Source Latency is the delay from the clock waveform origin point to the clock definition point. Clock Network Latency is the delay from the clock definition point to the sink pin. Clock Latency is also called Clock Insertion Delay. Please see the 2 pictures below to get a better understanding of what Clock Latency is.

Master_Clock

Generated_Clock
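
The two parts of clock latency map onto two forms of the SDC set_clock_latency command. This is a minimal sketch; the clock name CLKM is borrowed from the multicycle example and the latency values are hypothetical:

```tcl
# Hypothetical latency values in ns, for illustration only.
create_clock -name CLKM -period 10 [get_ports CLKM]

# Source latency: from the waveform origin to the clock definition point.
set_clock_latency -source 1.2 [get_clocks CLKM]

# Network latency: from the clock definition point to the sink pins.
# This is the ideal-mode stand-in for the insertion delay of the future tree.
set_clock_latency 0.8 [get_clocks CLKM]

# After CTS, the real clock tree delay replaces the network latency:
# set_propagated_clock [get_clocks CLKM]
```

In ideal mode the clock edge is then assumed to reach every sink 1.2 + 0.8 = 2.0 ns after the waveform origin.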

# Waveform and edge-shift values are computed with expr; a quoted string such as "0 0.5*1.874" is not evaluated by Tcl.
create_clock [get_ports clk_phy] -name clk_phase_0 -period 1.874 -waveform [list 0 [expr {0.5 * 1.874}]]
set_clock_uncertainty -setup $SKEW_MAX [get_clocks clk_phase_0]
set_clock_uncertainty -hold $SKEW_MIN [get_clocks clk_phase_0]
create_generated_clock -name slice_clk_ctlr_phase_0 -divide_by 2 -add \
 -source [get_ports clk_phy] \
 -master [get_clock clk_phase_0] \
 [get_pin inst_clk_div/inst_clk_div_dff/hic_dnt_div_flop/Q]
set_clock_uncertainty -setup $SKEW_MAX [get_clock slice_clk_ctlr_phase_0]
set_clock_uncertainty -hold $SKEW_MIN [get_clock slice_clk_ctlr_phase_0]
# Braces keep Tcl from treating [0] as a command substitution.
create_clock {dqs_ipad[0]} -name read_mem_dqs_phase_0 -period 1.784 -waveform [list 0 [expr {0.5 * 1.784}]]
set_clock_uncertainty -setup $SKEW_READ_DQS_MAX [get_clock read_mem_dqs_phase_0]
set_clock_uncertainty -hold $SKEW_READ_DQS_MIN [get_clock read_mem_dqs_phase_0]
# Quarter-cycle shift applied to all three edges.
set DQS_SHIFT [expr {0.25 * 1.874}]
create_generated_clock -name delayed_dqs_phase_1 -source {dqs_ipad[0]} \
 -edges {1 2 3} -edge_shift [list $DQS_SHIFT $DQS_SHIFT $DQS_SHIFT] \
 -add -master_clock [get_clocks read_mem_dqs_phase_0] dll/dll_delay_line_rd_dqs/delay_0/r1/hic_dnt_dll_nand2/A1

delayed_dqs_phase_1 is dqs_ipad[0] shifted by 0.25 clock cycle.

Path 1: MET Setup Check with Pin dfi_read_datablk/read_datablk_fifo/io_datain_l_reg_0_/CP 

Endpoint: dfi_read_datablk/read_datablk_fifo/io_datain_l_reg_0_/D (v) checked with leading edge of 'clk_phase_0'

Beginpoint: dfi_read_datablk/read_datablk_fifo/dll_entry_flop_l_40/hic_dnt_dll_entry_flop/Q (v) triggered by leading edge of 'delayed_dqs_phase_1'

Path Groups: {reg2reg}
Other End Arrival Time 0.462
- Setup 0.099
+ Path Delay 0.901
+ Path Ideal Arrival 0.468
+ CPPR Adjustment 0.000
- Uncertainty 0.050
= Required Time 1.683
- Arrival Time 1.674
= Slack Time 0.009
 Clock Rise Edge 0.000
 = Beginpoint Arrival Time 0.000

 dqs_ipad[0] dqs_ipad[0] ^ 0.000 0.150 0.004 2 
...
 dfi_dqs_in/hic_dll_dqs_mod_dqs0/hic_dnt_dll_dqs_not/ZN
 I v -> ZN ^ 
 CKND8BWP12T35P140 0.037 0.218 0.000 0.016 0.007 1.039 
 dll/dll_delay_line_rd_dqs/delay_0/r1/hic_dnt_dll_nand2/A1
 CKND2D1BWP12T35P140
 0.001 0.687 0.000 0.016 0.007 2 1.000 delayed_dqs_phase_1 Adj. = 0.468
...
 dfi_read_datablk/read_datablk_fifo/dll_entry_flop_l_40/hic_dnt_dll_entry_flop/CP
 EDFQD2BWP12T35P140
 0.005 0.957 0.002 0.019 0.027 11 1.000 
 dfi_read_datablk/read_datablk_fifo/dll_entry_flop_l_40/hic_dnt_dll_entry_flop/Q
 CP ^ -> Q v EDFQD2BWP12T35P140
 0.112 1.069 0.000 0.012 0.002 1.077 
...
What is clock skew?

Clock skew between two sink pins is the difference in clock latency between them. If the capture clock latency is more than the launch clock latency, the skew is positive. This helps setup checks. If the capture clock latency is less than the launch clock latency, the skew is negative. This helps hold checks. The ideal clock skew in a design is zero, which is not achievable. The clock tree is built to reduce clock skew.
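
The sign convention can be sketched in plain Tcl. The latency values and the proc name clock_skew are hypothetical, chosen only to show one positive and one negative case:

```tcl
# Skew = capture clock latency - launch clock latency (both in ns).
proc clock_skew {launch_latency capture_latency} {
    return [expr {$capture_latency - $launch_latency}]
}

# Hypothetical latency pairs: {launch capture launch capture}
foreach {launch capture} {1.20 1.35 1.35 1.20} {
    set skew [clock_skew $launch $capture]
    if {$skew > 0} {
        set kind "positive skew, helps setup"
    } elseif {$skew < 0} {
        set kind "negative skew, helps hold"
    } else {
        set kind "zero skew"
    }
    puts [format "launch=%.2f capture=%.2f skew=%+.2f ns (%s)" $launch $capture $skew $kind]
}
```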

What is clock uncertainty?

Clock uncertainty is the deviation of the actual arrival time of the clock edge with respect to the ideal arrival time. In ideal mode the clock signal can arrive at all clock pins simultaneously. But in fact, that perfection is not achievable. So, to anticipate the fact that the clock will arrive at different times at different clock pins, the “ideal mode” clock assumes a clock uncertainty. For example, a 1 ns clock with a 100 ps clock uncertainty means that the next clock tick will arrive in 1 ns plus or minus 50 ps.
A deeper question is why the clock does not always arrive exactly one clock period later. There are several possible reasons; the 3 major ones are listed here:
(a) The insertion delay to the launching flip-flop’s clock pin is different from the insertion delay to the capturing flip-flop’s clock pin (one path through the clock tree can be longer than another). This is called clock skew.
(b) The clock period is not constant. Some clock cycles are longer or shorter than others in a random fashion. This is called clock jitter, which can come from the PLL or crystal oscillator, cables, transmitters, receivers, internal circuitry of the PLL, thermal noise of the oscillator, etc. Pre-CTS, since the clock tree is not built yet, uncertainty = skew + jitter. Post-CTS, uncertainty = jitter.
(c) Even if the launching clock path and the capturing clock path are absolutely identical, their path delays can still differ because of on-chip variation (OCV). This is where the chip’s delay properties vary across the die due to process variations, temperature variations or other reasons. This essentially increases the clock skew.
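
The pre-CTS and post-CTS rule in (b) can be sketched as SDC. The jitter and skew budgets and the stage variable are hypothetical; the clock name clk_phase_0 is reused from the constraints earlier in this article:

```tcl
# Hypothetical budgets in ns, for illustration only.
set jitter        0.030
set skew_estimate 0.070
set stage         preCTS   ;# set to postCTS once the clock tree is built

if {$stage eq "preCTS"} {
    # Ideal clock: uncertainty has to cover both expected skew and jitter.
    set_clock_uncertainty -setup [expr {$skew_estimate + $jitter}] [get_clocks clk_phase_0]
} else {
    # Propagated clock: real skew comes from the tree, only jitter remains.
    set_clock_uncertainty -setup $jitter [get_clocks clk_phase_0]
}
```

Any extra margin for OCV, as in (c), would come on top of these values or be handled through derating.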

CTS Spec UnsyncPin RootPin based on Constraint and Netlist

Check the clock timing path in the design netlist with Turbodebug:

Clock_GlobalUnsyncPin_RootPin_CTS_spec_BASED_ON_turbodebug_Netlist_0

Fig. 1 Design/inst_adrctl_slice_bist_ddl/inst_ddl_fdbk_clk_mux

#Constraint

create_clock -name clk_ddl_test_fdbk [get_pin inst_adrctl_slice_bist_ddl/inst_ddl_fdbk_clk_mux/inst_mux_nand2/hic_dnt_nand2/$NEG_OUTPUT ] -period $PHY_DDL_SCALED_CLK_PERIOD -waveform "0 $PHY_DDL_SCALED_HALF"

#CTS Spec file

#Excluded Output pin due to create_clock inst_adrctl_slice_bist_ddl/inst_ddl_fdbk_clk_mux/inst_mux_nand2/hic_dnt_nand2/ZN
GlobalUnsyncPin
+inst_adrctl_slice_bist_ddl/inst_ddl_fdbk_clk_mux/inst_mux_nand0/hic_dnt_nand2/A1
+inst_adrctl_slice_bist_ddl/inst_ddl_fdbk_clk_mux/inst_mux_nand1/hic_dnt_nand2/A1
#----------------------------------------------------
# Clock Name : clk_ddl_test_fdbk
#----------------------------------------------------
AutoCTSRootPin inst_adrctl_slice_bist_ddl/inst_ddl_fdbk_clk_mux/inst_mux_nand2/hic_dnt_nand2/ZN

Clock divider and CTS

Check the clk div timing path in the design netlist with Turbodebug:

Clock_divider_clk_div_0

Fig. 1 Design/inst_clk_div

Clock_divider_clk_div_0_inst_clk_div_mux

Clock_divider_clk_div_0_inst_clk_div_mux_0_inst_mux_nand2

Fig. 2, 3 Design/inst_clk_div/inst_clk_div_mux/inst_mux_nand2/hic_dnt_nand2/ZN

Clock_divider_clk_div_0_inst_clk_div_dff

Fig. 4 Design/inst_clk_div/inst_clk_div_dff/hic_dnt_out_reg/Q (the create_generated_clock constraint below defines the clock on this pin, but the CTS spec file uses inst_clk_div/inst_clk_div_mux/inst_mux_nand2/hic_dnt_nand2/ZN as RootPin, because the Q pin (out_p) in Fig. 4 actually connects to in0 in Fig. 2)

#Constraint

create_clock [get_ports clk4x] -name dummy_clk4x -period [expr {0.5 * 0.833}] -waveform [list 0 [expr {0.25 * 0.833}]]
create_generated_clock -name slice_clk_ctlr_phase_0 -edges {1 5 9} -add \
 -source [get_ports clk4x] \
 -master [get_clock dummy_clk4x] \
 [get_pin inst_clk_div/inst_clk_div_dff/hic_dnt_out_reg/Q]

#CTS Spec file

ClkGroup
+ inst_clk_div/inst_clk_div_mux/inst_mux_nand2/hic_dnt_nand2/ZN
GlobalLeafPin
+ inst_clk_div/inst_clk_div_mux/inst_mux_nand0/hic_dnt_nand2/A1
+ inst_clk_div/inst_clk_div_mux/inst_mux_nand1/hic_dnt_nand2/A1
GlobalUnsyncPin
+ inst_clk_div/inst_clk_div_dffn/hic_dnt_out_reg/CPN
+ inst_clk_div/inst_clk_div0_dff/hic_dnt_out_reg/CP
+ inst_clk_div/inst_clk_div_dff/hic_dnt_out_reg/CP
AutoCTSRootPin clk4x
AutoCTSRootPin inst_clk_div/inst_clk_div_mux/inst_mux_nand2/hic_dnt_nand2/ZN

#Since the Q pin (out_p) in Fig. 4 actually connects to in0 in Fig. 2, using inst_clk_div/inst_clk_div_mux/inst_mux_nand2/hic_dnt_nand2/ZN as RootPin makes the clock structure clearer to CTS.

design flow simple

A simplified general design flow after floorplanning:

1st, timing-driven placement according to the constraints, with clock skew/latency treated as ‘ideal’ (zero), followed by optDesign -preCTS.

2nd, CTS, followed by optDesign -postCTS. After CTS the clock tree has a real insertion (propagation) delay.

3rd, routing, followed by optDesign -postRoute and optDesign -hold -postRoute. Usually, setup violations are fixed first and hold violations second, until all slacks are positive.

Clock Tree Synthesis

In clock tree synthesis, do ONE thing only: insert clock inverters (CLK INV, NOT CKBUFF!), which keep the rising and falling transitions and the duty cycle balanced, to minimize clock tree latency and skew. Balance the sink/leaf pins that should be balanced, and do not balance the pins that should not be balanced.

CTS Macro Model

A CTS macro model tells the tool the clock path latency of the segment from the assertion pin to the sink/leaf pins, so that the tool balances sink/leaf pins with this segment taken into account. For example, when a clock goes from chip/top level to block level in the hierarchy, this segment is the clock path from the junction point of PHY_TOP and data_slice to the reg/ck pins inside data_slice.

In the picture below, the clock root pin is A, and the clock path latency from point B to point C is the CTS macro model delay.

ctsmdl

The value of Macro Model in CTS spec file below is 550ps.

In PHY_TOP CTS spec file:
MacroModel pin databahn_dll_phy/dll_phy_slice_core/data_slice_0/clk_phy 550ps 550ps 550ps 550ps 0fF

This tells the tool that the latency from the data_slice port (block level) to reg/ck inside data_slice is about 550ps. The report below is part of the PHY_TOP STA. Full STA timing report: STA_report_PHY_TOP

(1.520 – 0.962) ÷ 1.000 = 0.558 (ns) = 558 (ps)
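
The same arithmetic can be written in plain Tcl. The two arrival times come from the PHY_TOP report in this section: 0.962 at FE_ECOC1_lp_clk_phy/ZN, the last cell before data_slice_0, and 1.520 at clk_phy_dfi_data_byte_en__L16_I20/ZN, the last clock inverter before the flip-flop. Reading the 1.000 divisor as the delay derate from the report's Derate column is an assumption:

```tcl
# Arrival times in ns, copied from the PHY_TOP STA report.
set arrival_at_slice_boundary 0.962   ;# FE_ECOC1_lp_clk_phy/ZN
set arrival_at_last_clock_inv 1.520   ;# clk_phy_dfi_data_byte_en__L16_I20/ZN
set derate 1.000                      ;# assumed: the Derate column value

# Latency of the clock segment inside data_slice_0.
set latency_ns [expr {($arrival_at_last_clock_inv - $arrival_at_slice_boundary) / $derate}]
puts [format "segment latency: %.3f ns = %.0f ps" $latency_ns [expr {$latency_ns * 1000.0}]]
```

The result, about 558 ps, is close to the 550ps entered in the MacroModel line.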

DDR PHY_TOP STA report:
Path 2: MET Setup Check with Pin databahn_dll_phy/dll_phy_slice_core/data_slice_0/io_datacell_3/wr_l_reg/CKN 
Endpoint: databahn_dll_phy/dll_phy_slice_core/data_slice_0/io_datacell_3/wr_l_reg/D (v) checked with trailing edge of 'clk_dqs_0_phase_0'
Beginpoint: databahn_dll_phy/dll_phy_slice_core/data_slice_0/io_datacell_3/write_data_l_reg_reg/Q (v) triggered by leading edge of 'clk_phase_0'
Path Groups: {reg2reg}
Other End Arrival Time 2.068
- Setup 0.075
+ Phase Shift 0.000
+ CPPR Adjustment 0.097
- Uncertainty 0.105
= Required Time 1.985
- Arrival Time 1.980
= Slack Time 0.005
 Clock Fall Edge 1.250
 = Beginpoint Arrival Time 1.250
 -------------------------------------------------------------------------------------------------------------------------------------------------------------- 
 Pin Arc Cell Delay Arrival Incr Slew Load Fanout User Generated Clock 
 Time Delay Derate Adjustment 
 -------------------------------------------------------------------------------------------------------------------------------------------------------------- 
 clk_ctlr_sync clk_ctlr_sync v 1.250 0.200 0.003 1 
 clk_ctlr_sync_I_xIOx/I CLKBUFV8_12TR35 0.000 1.250 0.000 0.200 0.003 1 1.000 
 clk_ctlr_sync_I_xIOx/Z I v -> Z v CLKBUFV8_12TR35 0.125 1.375 0.000 0.047 0.019 1.000 
 clk_ctlr_sync_N_xIOx__L1_I0/I CLKINV12_12TR35 0.004 1.379 0.000 0.047 0.019 1 1.000 
 clk_ctlr_sync_N_xIOx__L1_I0/ZN I v -> ZN ^ CLKINV12_12TR35 0.037 1.416 0.000 0.024 0.019 1.000 
 clk_ctlr_sync_N_xIOx__L2_I0/I CLKINV12_12TR35 0.004 1.420 0.000 0.025 0.019 1 1.000 
 clk_ctlr_sync_N_xIOx__L2_I0/ZN I ^ -> ZN v CLKINV12_12TR35 0.027 1.447 0.000 0.029 0.017 1.000 
 clk_ctlr_sync_N_xIOx__L3_I0/I CLKINV12_12TR35 0.003 1.450 0.000 0.030 0.017 1 1.000 
 clk_ctlr_sync_N_xIOx__L3_I0/ZN I v -> ZN ^ CLKINV12_12TR35 0.021 1.471 0.000 0.010 0.005 1.000 
...
 databahn_dll_phy/clk_ctlr_sync clk_ctlr_sync v databahn_dll_phy 1.525 1.000 clk_ctlr_phase_0 Adj. = 0.000
...
 databahn_dll_phy/dll_phy_pll_clk_source/deskew_pll/FREF
 PLLSM28HKLVDESKEW 0.004 1.641 0.000 0.016 0.026 2 1.000 
 databahn_dll_phy/dll_phy_pll_clk_source/deskew_pll/FOUTP
 FREF v -> FOUTP ^ PLLSM28HKLVDESKEW 0.000 0.391 0.000 0.017 0.026 1.000 clk_phase_0 Adj. = -1.250
 databahn_dll_phy/dll_phy_pll_clk_source/inst_hic_pll_clkgate/hic_dnt_pll_clkgate/CK
 CLKLANQV12_12TR35 0.001 0.392 0.000 0.034 0.026 2 1.000 
 databahn_dll_phy/dll_phy_pll_clk_source/inst_hic_pll_clkgate/hic_dnt_pll_clkgate/Q
 CK ^ -> Q ^ CLKLANQV12_12TR35 0.052 0.445 0.000 0.012 0.004 1.000 
...
 databahn_dll_phy/dll_phy_pll_clk_source/inst_hic_phybyp_clkgate/hic_dnt_pll_clkgate/CK
 CLKLANQV12_12TR35 0.000 0.474 0.000 0.009 0.004 1 1.000 
 databahn_dll_phy/dll_phy_pll_clk_source/inst_hic_phybyp_clkgate/hic_dnt_pll_clkgate/Q
 CK ^ -> Q ^ CLKLANQV12_12TR35 0.042 0.516 0.000 0.031 0.026 1.000 
 databahn_dll_phy/dll_phy_lp_control/inst_hic_lp_clkgate_phy/hic_dnt_io_clkgate/CK
 CLKLANQV12_12TR35 0.001 0.517 0.000 0.031 0.026 2 1.000 
 databahn_dll_phy/dll_phy_lp_control/inst_hic_lp_clkgate_phy/hic_dnt_io_clkgate/Q
 CK ^ -> Q ^ CLKLANQV12_12TR35 0.048 0.565 0.000 0.025 0.019 1.000 
...
 databahn_dll_phy/dll_phy_slice_core/FE_ECOC1_lp_clk_phy/I
 CLKINV8_12TR35 0.000 0.948 0.000 0.012 0.003 1 1.000 
 databahn_dll_phy/dll_phy_slice_core/FE_ECOC1_lp_clk_phy/ZN
 I v -> ZN ^ CLKINV8_12TR35 0.014 0.962 0.000 0.010 0.006 1.000 
 databahn_dll_phy/dll_phy_slice_core/data_slice_0/clk_phy__L1_I0/I
 CLKINV16_12TR35 0.000 0.963 0.000 0.010 0.006 1 1.000 
 databahn_dll_phy/dll_phy_slice_core/data_slice_0/clk_phy__L1_I0/ZN
 I ^ -> ZN v CLKINV16_12TR35 0.011 0.973 0.000 0.008 0.004 1.000 
...

 databahn_dll_phy/dll_phy_slice_core/data_slice_0/data_slice_data_byte_disable/inst_hic_lp_clkgate_dfi_data_byte_disable_phy/hic_dnt_io_clkgate/CK
 CLKLANQV2_12TR35 0.001 1.014 0.000 0.017 0.009 2 1.000 
 databahn_dll_phy/dll_phy_slice_core/data_slice_0/data_slice_data_byte_disable/inst_hic_lp_clkgate_dfi_data_byte_disable_phy/hic_dnt_io_clkgate/Q
 CK ^ -> Q ^ CLKLANQV2_12TR35 0.072 1.087 0.002 0.063 0.009 1.000 
...
 databahn_dll_phy/dll_phy_slice_core/data_slice_0/clk_phy_dfi_data_byte_en__L16_I20/I
 CLKINV16_12TR35 0.001 1.481 0.000 0.039 0.027 4 1.000 
 databahn_dll_phy/dll_phy_slice_core/data_slice_0/clk_phy_dfi_data_byte_en__L16_I20/ZN
 I v -> ZN ^ CLKINV16_12TR35 0.039 1.520 0.000 0.022 0.027 1.000 
 databahn_dll_phy/dll_phy_slice_core/data_slice_0/io_datacell_3/write_data_l_reg_reg/CK
 SDRNQV2_12TR35 0.002 1.521 0.000 0.022 0.027 18 1.000 
 databahn_dll_phy/dll_phy_slice_core/data_slice_0/io_datacell_3/write_data_l_reg_reg/Q
 CK ^ -> Q v SDRNQV2_12TR35 0.137 1.658 0.000 0.014 0.001 1.000 

...
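
As a cross-check, the Required Time and Slack Time in the Path 2 summary can be re-derived from the other summary lines. This is plain Tcl arithmetic on the reported values; the variable names are only for illustration:

```tcl
# Values in ns, copied from the Path 2 summary of the PHY_TOP report.
set other_end_arrival 2.068
set setup             0.075
set phase_shift       0.000
set cppr_adjustment   0.097
set uncertainty       0.105
set arrival_time      1.980

# Required time: capture clock arrival, minus setup and uncertainty,
# plus phase shift and the CPPR credit.
set required [expr {$other_end_arrival - $setup + $phase_shift + $cppr_adjustment - $uncertainty}]
set slack    [expr {$required - $arrival_time}]

puts [format "required = %.3f ns, slack = %.3f ns" $required $slack]
```

The numbers reproduce the reported 1.985 ns required time and 0.005 ns slack.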
CCOpt

CCOpt extends CCOpt-CTS to replace traditional global skew balancing with a combination of CTS, timing driven useful skew, and datapath optimization.

In traditional CTS flows an ideal clock model is used before CTS to simplify clock timing analysis. With the ideal clock model, launch and capture clock paths are assumed to have the same delay. After CTS, the ideal clock model is replaced by a propagated clock model that takes account of actual delays along clock launch and capture paths.

In traditional CTS global skew balancing attempts to make the propagated clock timing match the ideal mode clock timing by balancing the insertion delay (clock latency) between all sinks. However, a number of factors combine such that skew balancing does not lead to timing closure. These include:

  • OCV – On-chip variation means that skew, measured using a single metric such as the ‘late’ configuration of a delay corner, no longer directly corresponds to timing impact because launch and capture paths have differing timing derates. In addition, Common Path Pessimism Removal (CPPR) and per-library cell timing derates mean that it is not possible to accurately estimate clock or datapath timing without synthesizing a clock tree. Advanced OCV (AOCV) further complicates this by adding path and bounding box dependent factors.
  • Clock gating – Clock gating uses datapath signals to inhibit or permit clock edges to propagate from a clock source to clock sinks. The clock arrival time at a clock gating cell is unknown prior to CTS and this arrival time determines the required time for the datapath control signal to reach the clock gating cell enable input. Therefore the setup slack at a clock gating enable input is hard to predict preCTS. In addition, clock gating cells have an earlier clock arrival time than regular sinks and are therefore often timing critical. Typically, the fan-in registers controlling clock gating may need to have an earlier clock arrival time than regular sinks in order to avoid a clock gating slack violation – which means the fan-in registers need to be skewed early.
  • Unequal datapath delays – Front end logic synthesis will attempt to ensure logic between registers is roughly delay balanced to optimize the target clock frequency. However, with wire delay dominating many datapath stages it is likely that after placement and preCTS optimization there will exist some combinational paths with unavoidably longer delays than others. Useful skew clock scheduling permits slack to be moved between register stages to increase clock frequency. In contrast, global skew balancing is independent of timing slack. In addition, CCOpt useful skew scheduling can avoid unnecessarily balancing of sinks where there is excess slack in order to reduce clock area and clock power.
    CCOpt treats clock launch, clock capture, and datapath delays all as flexible parameters that can be manipulated to optimize timing, as illustrated below.

CCOPT_manipulating_Clock_delays_and_Logic_delays

Manipulating Clock Delays and Logic Delays

At each clock sink (flip-flop) in the design, CCOpt can adjust both datapath and clock delays in order to improve negative setup timing slack – specifically the high effort path group(s) WNS. This is performed using the propagated clock timing model at all times.
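
The idea of moving slack between register stages can be sketched with plain Tcl arithmetic. This is not CCOpt's algorithm, only the bookkeeping behind useful skew, using a hypothetical three-register pipeline FF1 -> FF2 -> FF3 with hypothetical slack values:

```tcl
# Hypothetical setup slacks in ns with a balanced (zero-skew) clock.
set slack_ff1_to_ff2 -0.10   ;# violating stage
set slack_ff2_to_ff3  0.30   ;# stage with excess slack

# Delay the clock at FF2 by this amount (useful skew).
set ff2_clock_delay 0.15

# FF2 captures later, so the stage into FF2 gains setup slack.
# FF2 also launches later, so the stage out of FF2 loses the same amount.
set new_slack_ff1_to_ff2 [expr {$slack_ff1_to_ff2 + $ff2_clock_delay}]
set new_slack_ff2_to_ff3 [expr {$slack_ff2_to_ff3 - $ff2_clock_delay}]

puts [format "FF1->FF2: %+.2f -> %+.2f ns" $slack_ff1_to_ff2 $new_slack_ff1_to_ff2]
puts [format "FF2->FF3: %+.2f -> %+.2f ns" $slack_ff2_to_ff3 $new_slack_ff2_to_ff3]
```

Both stages end with positive setup slack, although the total slack is unchanged. Hold slacks move in the opposite direction and would need to be checked as well.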