Configure STA environment

  1. What’s STA environment?
  2. Specifying Clocks. Clock uncertainty and Clock latency
  3. Generated clocks
  4. Input paths constraint
  5. Output paths constraint 
  6. Timing path groups
  7. External attributes modeling 
  8. Check design rules
  9. Refine timing analysis
  10. Point-to-point specification

This section sets up the environment for static timing analysis (STA). Correct constraints are essential for interpreting STA results: the design environment must be specified accurately so that STA can identify all the timing issues in the design. Preparing for STA involves setting up clocks, specifying IO timing characteristics, and specifying false paths and multicycle paths.

1.  What’s STA environment?

Figure 1  A synchronous design

Most digital designs are synchronous: the data computed in the previous clock cycle is latched into the flip-flops at the active clock edge. Consider the typical synchronous design shown in Figure 1. The Design Under Analysis (DUA) is assumed to interact with other synchronous designs, which means that the DUA receives data from a clocked flip-flop and outputs data to another clocked flip-flop external to the DUA.

To perform STA on this design, one needs to specify the clocks to the flip-flops, and timing constraints for all paths leading into the design and for all paths exiting the design.

The example in Figure 1 assumes that there is only one clock, and C1, C2, C3, C4, and C5 represent combinational blocks. The combinational blocks C1 and C5 are outside the design being analyzed.

In a typical design, there can be multiple clocks with many paths from one clock domain to another. The following sections describe how the environment is specified in such scenarios.

2.  Specifying Clocks

To define a clock, we need to provide the following information:

i. Clock source: it can be a port of the design, or a pin of a cell inside the design (typically part of the clock generation logic).

ii. Period: time period of clock.

iii. Duty cycle: high duration (positive phase) and low duration (negative phase).

iv. Edge times: times for rising edge and falling edge.

Figure 2  A clock definition

Figure 2  shows basic definitions. By defining clocks, all the internal timing paths (all flip-flop to flip-flop paths) are constrained; this implies that all internal paths can be analyzed with just the clock specifications. The clock specification specifies that a flip-flop to flip-flop path must take one cycle. We shall later describe how this requirement (of one cycle timing) can be relaxed.

Here is a basic clock specification.

create_clock \
  -name SYSCLK \
  -period 20 \
  -waveform { 0 5 } \
  [get_ports SCLK]

The name of the clock is SYSCLK and is defined at the port SCLK. The period of SYSCLK is specified as 20 units – the default time unit is nanoseconds if none has been specified. (In general, time unit is specified as part of technology library.) The first argument in waveform specifies time at which rising edge occurs and the second argument specifies time at which falling edge occurs.

Any number of edges can be specified in the waveform option, as long as the edge times increase monotonically and the whole list spans no more than one period. The edge times alternate: the first is a rising edge (typically the first rising edge after time zero), then a falling edge, then a rising edge, and so on.

-waveform {time_rise time_fall time_rise time_fall ... }

In addition, there must be an even number of edges specified. The waveform option specifies waveform within one clock period, which then repeats itself.

If no waveform option is specified, default is:

-waveform { 0 period/2 }

Here is an example of a clock specification with no waveform specification.

create_clock -period 5 [ get_ports SCAN_CLK ]

In this specification, since no -name option is specified, the name of clock is the same as the name of the port, which is SCAN_CLK.

Figure 3  Clock specification example

Here is another example of a clock specification in which the edges of the waveform are in the middle of a period.

create_clock -name BDYCLK -period 15 \
  -waveform { 5 12 } [get_ports GBLCLK]

Figure 4  Clock specification with arbitrary edges

The name of the clock is BDYCLK and it is defined at the port GBLCLK. In practice, it is a good idea to keep the clock name the same as the port name.

Here are some more clock specifications.

# See Figure 5a:
create_clock -period 10 -waveform { 5 10 } [get_ports FCLK]
# Creates a clock with the rising edge at 5ns and the falling edge at 10ns.

# See Figure 5b:
create_clock -period 125 \
  -waveform { 100 150 } [get_ports ARMCLK]
# Since the first edge has to be rising edge, 
# the edge at 100ns is specified first and then the falling
# edge at 150ns is specified. The falling edge at 25ns is 
# automatically inferred.

Figure 5  Example clock waveform

# See Figure 6a:
create_clock -period 1.0 -waveform { 0.5 1.375 } MAIN_CLK
# The first rising edge and the next falling edge are
# specified. The falling edge at 0.375ns is inferred
# automatically.

# See Figure 6b:
create_clock -period 1.2 -waveform { 0.3 0.4 0.8 1.0 } JTAG_CLK
# Indicates a rising edge at 300ps, a falling edge at 400ps
# a rising edge at 800ps and a falling edge at 1ns, this
# pattern is repeated every 1.2ns.

Figure 6 Example with general clock waveform
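The inferred edges noted in the comments above come from folding each listed edge time back into one clock period. A minimal sketch in plain Tcl (it runs in tclsh, no STA tool needed); the proc name fold_waveform is illustrative and is not an SDC command:

```tcl
# Fold each waveform edge into the window [0, period) and label it.
# The first listed edge is a rising edge; edges then alternate.
proc fold_waveform {period edges} {
    set folded {}
    set dir rise
    foreach t $edges {
        # fmod keeps the result in [0, period) for non-negative edge times
        lappend folded [list [expr {fmod($t, $period)}] $dir]
        set dir [expr {$dir eq "rise" ? "fall" : "rise"}]
    }
    # Sort by time so the edges appear in the order seen within one period
    return [lsort -real -index 0 $folded]
}

puts [fold_waveform 125 {100 150}]    ;# ARMCLK:   {25.0 fall} {100.0 rise}
puts [fold_waveform 1.0 {0.5 1.375}]  ;# MAIN_CLK: {0.375 fall} {0.5 rise}
```

The falling edges at 25ns and 0.375ns are the ones described as "automatically inferred" in the examples.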

2.1  Clock uncertainty

The timing uncertainty of a clock period can be specified using the set_clock_uncertainty specification. The uncertainty can be used to model various factors that can reduce the effective clock period. These factors can be the clock jitter and any other pessimism that one may want to include for timing analysis.

set_clock_uncertainty -setup 0.2 [get_clocks CLK_CONFIG]
set_clock_uncertainty -hold 0.05 [get_clocks CLK_CONFIG]

Note that clock uncertainty for setup effectively reduces available clock period by specified amount as illustrated in Figure 7. For hold checks, clock uncertainty for hold is used as an additional timing margin that needs to be satisfied.

Figure 7  Specifying clock uncertainty

The following commands specify uncertainty to be used on paths crossing specified clock boundaries, called inter-clock uncertainty.

set_clock_uncertainty -from VIRTUAL_SYS_CLK -to SYS_CLK \
  -hold 0.05
set_clock_uncertainty -from VIRTUAL_SYS_CLK -to SYS_CLK \
  -setup 0.3
set_clock_uncertainty -from SYS_CLK -to CFG_CLK -hold 0.05
set_clock_uncertainty -from SYS_CLK -to CFG_CLK -setup 0.1

 Figure 8 shows a path between two different clock domains, SYS_CLK and CFG_CLK. Based on the inter-clock uncertainty specifications above, 100ps is used as an uncertainty for setup checks and 50ps is used as an uncertainty for hold checks.

Figure 8  Inter-clock paths

2.2  Clock latency

Latency of a clock can be specified using the set_clock_latency command.

# Rise clock latency on MAIN_CLK is 1.8ns:
set_clock_latency 1.8 -rise [get_clocks MAIN_CLK]
# Fall clock latency on all clocks is 2.1ns:
set_clock_latency 2.1 -fall [all_clocks]
# The -rise and -fall options refer to the edge at the clock pin of a
# flip-flop.

There are two types of clock latency: network latency and source latency. Network latency is the delay from clock definition point (create_clock) to clock pin of a flip-flop. Source latency, also called insertion delay, is the delay from clock source to clock definition point. Source latency could represent either on-chip or off-chip latency. Figure 9 shows both the scenarios. The total clock latency at the clock pin of a flip-flop is the sum of source and network latency.

Here are some example commands that specify source and network latency.

# Specify a network latency (no -source option) of 0.8ns 
# for rise, fall, max and min:
set_clock_latency 0.8 [get_clocks CLK_CONFIG] 
# Specify a source latency:
set_clock_latency 1.9 -source [get_clocks SYS_CLK]
# Specify a min source latency:
set_clock_latency 0.851 -source -min [get_clocks CFG_CLK]
# Specify a max source latency:
set_clock_latency 1.322 -source -max [get_clocks CFG_CLK]

Figure 9 Clock latency
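As a small worked example of the sum described above, the sketch below applies both latency types to a single clock. The pairing of 1.9ns source latency and 0.8ns network latency on one clock is an assumption for illustration; the examples above apply them to different clocks.

```tcl
# Hypothetical: both latencies specified on the same clock.
set source_latency  1.9   ;# clock source -> clock definition point
set network_latency 0.8   ;# clock definition point -> flip-flop clock pin
set total_latency [expr {$source_latency + $network_latency}]
puts [format "total latency at the flip-flop clock pin: %.1f ns" $total_latency]

# Equivalent SDC, to be run inside an STA tool:
#   set_clock_latency $source_latency -source [get_clocks SYS_CLK]
#   set_clock_latency $network_latency         [get_clocks SYS_CLK]
```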

3.  Generated clocks

A generated clock is a clock derived from a master clock. A master clock is a clock defined using the create_clock specification.

When a new clock is generated in a design that is based on a master clock, the new clock can be defined as a generated clock. For example, if there is a divide-by-3 circuitry for a clock, one would define a generated clock definition at the output of this circuitry. This definition is needed as STA does not know that the clock period has changed at the output of the divide-by logic, and more importantly what the new clock period is. Figure 10 shows an example of a generated clock which is a divide-by-2 of the master clock, CLKP.

create_clock -name CLKP -period 10 [get_pins UPLL0/CLKOUT]
# Create a master clock with name CLKP of period 10ns
# with 50% duty cycle at the CLKOUT pin of the PLL.
create_generated_clock -name CLKPDIV2 -source UPLL0/CLKOUT -divide_by 2 [get_pins UFF0/Q]
# Creates a generated clock with name CLKPDIV2 at the Q
# pin of flip-flop UFF0. The master clock is at the CLKOUT 
# pin of PLL. Period of generated clock is double that of 
# clock CLKP, that is, 20ns.

Figure 10  Generated clock at output of divider

Can a new clock (a master clock) be defined at the output of the flip-flop instead of a generated clock? The answer is yes, but there are some disadvantages. Defining a master clock instead of a generated clock creates a new clock domain. This is not a problem in general, except that there are more clock domains to deal with when setting up the constraints for STA. Defining the new clock as a generated clock does not create a new clock domain, and the generated clock is considered to be in phase with its master clock. The generated clock does not require additional constraints to be developed. Thus, one should define a new internally generated clock as a generated clock rather than as another master clock.

Another important difference between a master clock and a generated clock is the notion of clock origin. For a master clock, the origin of the clock is at the point of definition of the master clock. For a generated clock, the clock origin is that of the master clock and not that of the generated clock. This implies that in a clock path report, the start point of a clock path is always the master clock definition point. This is a big advantage of a generated clock over a new master clock, because the source latency is not automatically included for a new master clock.

Figure 11 shows an example where the clock SYS_CLK is gated by the output of a flip-flop. Since the output of the flip-flop may not be a constant, one way to handle this situation is to define a generated clock at the output of the AND cell which is identical to the input clock.

Figure 11  Clock gated by a flip-flop*

* The left flip-flop should probably be clocked on CKN; otherwise the clock gating hold requirement would not be met. Details are explained in "Check clock gating".

create_clock -period 0.1 [get_ports SYS_CLK]
# Create a master clock of period 100ps with 50% duty 
# cycle.
create_generated_clock -name CORE_CLK -divide_by 1 \
  -source SYS_CLK [get_pins UAND1/Z]
# Create a generated clock called CORE_CLK at the output of
# the AND cell and the clock waveform is the same as that
# of the master clock.

Figure 12  Master clock and multiply-by-2 generated clock

create_clock -period 10 -waveform { 0 5 } [get_ports PCLK]
# Create a master clock with name PCLK of period 10ns
# with rise edge at 0ns and fall edge at 5ns.
create_generated_clock -name PCLKx2 \
  -source [get_ports PCLK] \
  -multiply_by 2 [get_pins UCLKMULTREG/Q]
# Creates a generated clock called PCLKx2 from the master 
# clock PCLK and the frequency is double that of the master
# clock. The generated clock is defined at the output of 
# the flip-flop UCLKMULTREG.

Note that -multiply_by and -divide_by options refer to frequency of clock, even though a clock period is specified in a master clock definition.

Figure 13  Clock generation

Figure 13 shows an example of generated clocks. A divide-by-2 clock and two out-of-phase clocks are generated. The waveforms of the clocks are also shown in the figure.

create_clock -period 2 [get_ports DCLK]
# Name of clock is DCLK, has period of 2ns with a rise edge
# at 0ns and a fall edge at 1ns.
create_generated_clock -name DCLKDIV2 -edges {2 4 6} \
  -source DCLK [get_pins UBUF2/Z]
create_generated_clock -name PH0CLK -edges {3 4 7} \
  -source DCLK [get_pins UAND0/Z]
create_generated_clock -name PH1CLK -edges {1 2 5} \
  -source DCLK [get_pins UAND1/Z]
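The -edges lists can be turned into edge times by numbering the source clock edges: edge 1 is the first rising edge, edge 2 the first falling edge, and so on. A minimal plain-Tcl sketch (runnable in tclsh); the helper edge_time is illustrative and not part of SDC:

```tcl
# Map a source-clock edge number to a time.
# Odd edge numbers are rising edges, even edge numbers are falling edges.
proc edge_time {period rise fall n} {
    set cycle [expr {($n - 1) / 2}]            ;# whole periods elapsed (integer division)
    set base  [expr {$n % 2 ? $rise : $fall}]  ;# edge offset inside the period
    return [expr {$cycle * $period + $base}]
}

# DCLK: period 2ns, rise at 0ns, fall at 1ns.
foreach {name edges} {DCLKDIV2 {2 4 6} PH0CLK {3 4 7} PH1CLK {1 2 5}} {
    lassign $edges e1 e2 e3
    set rise  [edge_time 2 0 1 $e1]
    set fall  [edge_time 2 0 1 $e2]
    set rise2 [edge_time 2 0 1 $e3]
    puts "$name: rise $rise, fall $fall, next rise $rise2, period [expr {$rise2 - $rise}]"
}
# DCLKDIV2: rise 1, fall 3, next rise 5, period 4
# PH0CLK: rise 2, fall 3, next rise 6, period 4
# PH1CLK: rise 0, fall 1, next rise 4, period 4
```

All three generated clocks have a 4ns period, twice that of DCLK; they differ in phase and duty cycle.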

 Clock Latency for Generated Clocks

Figure 14  Latency on generated clock

A generated clock can have another generated clock as its source; that is, one can have generated clocks of generated clocks, and so on. However, a generated clock can have only one master clock.

Typical Clock Generation Scenario

Figure 15  Clock distribution in a typical ASIC

Figure 15 shows a scenario of how a clock distribution may appear in a typical ASIC. The oscillator is external to the chip and produces a low frequency (10-50 MHz typical) clock which is used as a reference clock by on-chip PLL to generate a high-frequency low-jitter clock (200-800 MHz typical). This PLL clock is then fed to a clock divider logic that generates required clocks for ASIC.

On some of the branches of the clock distribution, there may be clock gates that are used to turn off the clock to an inactive portion of design to save power when necessary. PLL can also have a multiplexer at its output so that the PLL can be bypassed if necessary. A master clock is defined for the reference clock at the input pin of chip where it enters the design, and a second master clock is defined at the output of PLL. PLL output clock has no phase relationship with reference clock. Therefore, output clock should not be a generated clock of reference clock. Most likely, all clocks generated by the clock divider logic are specified as generated clocks of the master clock at PLL output.

4.  Input paths constraint

STA cannot check any timing on a path that is not constrained. Thus, all paths should be constrained to enable their analysis.

Figure 16  Input port timing path

Figure 16 shows an input path of Design Under Analysis (DUA). Flip-flop UFF0 is external to DUA and provides data to flip-flop UFF1 which is internal to DUA. Data is connected through input port INP1.

set Tclk2q 0.9
set Tc1    0.6
set_input_delay -clock CLKA -max [expr {$Tclk2q + $Tc1}] \
  [get_ports INP1]

The constraint specifies that the external delay on input INP1 is 1.5ns with respect to clock CLKA; the input delay accounts for the part of the data path that lies outside the DUA. Assuming the clock period for CLKA is 2ns, the logic driven by INP1 has only 500ps (= 2ns – 1.5ns) available for propagating inside the DUA: Tc2 + Tsetup <= 500ps must hold for flip-flop UFF1 to reliably capture data launched by flip-flop UFF0.
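The budget arithmetic can be checked with a few lines of plain Tcl (runnable in tclsh). The 2ns period is the value assumed in the text; clock latency and uncertainty are ignored in this sketch.

```tcl
set Tclk2q 0.9
set Tc1    0.6
set period 2.0   ;# CLKA period assumed in the text

set input_delay [expr {$Tclk2q + $Tc1}]      ;# external part of the path
set budget      [expr {$period - $input_delay}]  ;# left for Tc2 + Tsetup

puts [format "external delay %.1f ns, internal budget (Tc2 + Tsetup) %.1f ns" \
    $input_delay $budget]
# external delay 1.5 ns, internal budget (Tc2 + Tsetup) 0.5 ns
```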

5.  Output paths constraint

Example A

Figure 17  Output timing path

set Tc2    3.9
set Tsetup 1.1
set_output_delay -clock CLKQ -max [expr {$Tc2 + $Tsetup}] \
  [get_ports OUTB]

Example B

Figure 18  Output timing path Max Min delays

Tc2max + Tsetup = 7ns + 0.4ns = 7.4ns

Tc2min – Thold = 0 – 0.2ns = -0.2ns

create_clock -period 20 -waveform {0 15} [get_ports CLKQ]
set_output_delay -clock CLKQ -min -0.2 [get_ports OUTC]
set_output_delay -clock CLKQ -max 7.4 [get_ports OUTC]

Example C

Figure 19  Input and output timing path

create_clock -period 100 -waveform {5 55} [get_ports MCLK]
set_input_delay 25 -max -clock MCLK [get_ports DATAIN]
set_input_delay 5 -min -clock MCLK [get_ports DATAIN]
set_output_delay 20 -max -clock MCLK [get_ports DATAOUT]
set_output_delay -5 -min -clock MCLK [get_ports DATAOUT]
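For Example C, the max constraints translate into internal delay budgets as sketched below in plain Tcl (runnable in tclsh). The sketch assumes a single-cycle path on MCLK with ideal clocks (no latency or uncertainty), which is a simplification.

```tcl
set period  100.0   ;# MCLK period
set in_max   25.0   ;# set_input_delay  -max on DATAIN
set out_max  20.0   ;# set_output_delay -max on DATAOUT

# Time left inside the design for each boundary path
puts [format "DATAIN -> first register: at most %.0f ns, including setup time" \
    [expr {$period - $in_max}]]
puts [format "last register -> DATAOUT: at most %.0f ns, including clock-to-Q" \
    [expr {$period - $out_max}]]
# DATAIN -> first register: at most 75 ns, including setup time
# last register -> DATAOUT: at most 80 ns, including clock-to-Q
```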

6.  Timing path groups

Figure 20  Timing paths

Figure 21  Path groups

Timing paths in a design can be considered as a collection of paths. Each path has a startpoint and an endpoint.

In STA, paths are timed based on valid startpoints and valid endpoints. Valid startpoints are input ports and clock pins of synchronous devices, such as flip-flops and memories. Valid endpoints are output ports and data input pins of synchronous devices. Thus, a valid timing path can be:

i.  an input port -> an output port,

A -> Z

ii.  an input port -> a data input pin of a flip-flop (FF) or a memory,

A -> UFFA/D

iii.  a clock pin of FF -> a data input of FF,

UFFA/CK -> UFFB/D

iv.  a clock pin of FF -> an output port,

UFFB/CK -> Z

Timing paths are sorted into path groups by the clock associated with the endpoint of the path. Thus, each clock has a set of paths associated with it. There is also a default path group that includes all non-clocked (asynchronous) paths.

  • CLKA group: A -> UFFA/D.
  • CLKB group: UFFA/CK -> UFFB/D.
  • DEFAULT group: A -> Z, UFFB/CK -> Z.

7.  External attributes modeling 

While create_clock, set_input_delay and set_output_delay are enough to constrain all paths in a design for timing analysis, they are not enough to obtain accurate timing for the IO pins of a block. The following attributes are also required to model the environment of a design accurately. For inputs, one needs to specify the slew at the input. This information can be provided using:

  • set_driving_cell
  • set_input_transition

For outputs, one needs to specify the capacitive load seen by the output. This is specified using:

  • set_load

Figure 22  set_input_transition specification representation

set_input_transition 0.85 [get_ports INPC]
# Specifies an input transition of 850ps on port INPC.
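The other option listed above, set_driving_cell, lets the tool derive the input slew from a library cell instead of a fixed number. A hedged SDC sketch; the port INPB, the cell BUFX4 and its output pin Z are hypothetical names, and the fragment only runs inside an STA or synthesis tool with a loaded library:

```tcl
# Model port INPB as being driven by output pin Z of library cell BUFX4.
# BUFX4, Z and INPB are placeholders for names from the actual library/design.
set_driving_cell -lib_cell BUFX4 -pin Z [get_ports INPB]
```

With set_driving_cell the slew at the port depends on the load it sees, whereas set_input_transition fixes the slew regardless of load.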

Figure 23  Capacitive load on output port

set_load 5 [get_ports OUTX]
# Place a 5pF load on output port OUTX

The set_load specification can also be used for specifying a load on an internal net in the design.

set_load 0.25 [get_nets UCNT5/NET6]
# Set net capacitance to be 0.25pF.

8.  Check design rules

Two frequently used design rules for STA are max transition and max capacitance. These rules check that all ports and pins in the design meet the specified limits for transition time and capacitance.

  • set_max_transition
  • set_max_capacitance
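As a sketch, the two rules might be applied as follows in SDC. The limit values are placeholders rather than recommendations, and their units follow the technology library (commonly ns and pF):

```tcl
# Hypothetical limits applied to every pin and port of the current design.
set_max_transition  0.6 [current_design]
set_max_capacitance 0.5 [current_design]
```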

9.  Refine timing analysis

Four common commands used to constrain the analysis are:

i.  set_case_analysis: Specify constant value on a pin of a cell, or on an input port.

ii.  set_disable_timing: Break a timing arc of a cell.

iii.  set_false_path: Specify paths that are not real which implies that these paths are not checked in STA.

iv.  set_multicycle_path: Specify paths that can take longer than one clock cycle.

9.1  Specify inactive signals

In a design, certain signals have a constant value in a specific mode of the chip. For example, if a chip has DFT logic in it, then the scan pin of the chip should be at 0 in normal functional mode.

set_case_analysis_0_scan_for_functional_mode
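In SDC form, tying the scan pin to 0 for functional-mode analysis might look like the fragment below. SCAN_MODE is a hypothetical port name; substitute the actual scan control port of the design.

```tcl
# Functional mode: scan control is held at logic 0, so STA ignores
# paths that are only active in scan mode.
set_case_analysis 0 [get_ports SCAN_MODE]
```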

9.2  Break timing arcs in cells

Apply set_disable_timing to break timing arcs. For example, in dataslice-level STA of a DDR PHY, the timing arcs through a delay element are not real timing paths.

set_disable_timing_dll_delay_element_in_dataslice_simple

Note: use caution when applying set_disable_timing, as it removes all timing paths through the specified pins. Where possible, it is preferable to use the set_false_path and set_case_analysis commands.

In fact, set_false_path can replace set_disable_timing in some situations. For example, if set_false_path is applied while hardening the delay element, there is no need to apply set_disable_timing at the dataslice level afterwards.

set_false_path_in_dll_delay_element_simple
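The two alternatives can be sketched in SDC as follows. The instance UDLY0 and its pins A and Z are hypothetical names for a delay element; the actual hierarchy and pin names come from the design.

```tcl
# Option 1: break the A -> Z timing arc of the delay element.
# This removes every timing path that passes through that arc.
set_disable_timing -from A -to Z [get_cells UDLY0]

# Option 2: keep the arc, but exclude paths through the output pin from checks.
set_false_path -through [get_pins UDLY0/Z]
```

Only one of the two would normally be applied to a given element.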

9.3  Multicycle paths

In some cases, the data path between two flip-flops takes more than one clock cycle to propagate through the logic. Such a combinational data path is declared as a multicycle path. Even though the capture flip-flop captures data on every clock edge, we direct STA that the relevant capture edge occurs after the specified number of clock cycles.

Figure 24  A three-cycle multicycle path

Figure 24 shows an example. Since the data path takes 3 clock cycles, a setup multicycle check of 3 cycles should be specified. The multicycle setup constraint is given below.

create_clock -name CLKM -period 10 [get_ports CLKM]
set_multicycle_path 3 -setup \
  -from [get_pins UFF0/Q] \
  -to [get_pins UFF1/D]

The hold check should remain where it was in the single-cycle setup case, which is the one shown in Figure 24. This ensures that data is free to change anytime during the 3 cycles. In the absence of a hold multicycle specification, the default hold check is done on the active edge prior to the setup capture edge, which is not the intent. We need to move the hold check 2 cycles prior to the default hold check edge, and hence a hold multicycle of 2 is specified. The intended behavior is shown in Figure 25.

set_multicycle_path 2 -hold \
  -from [get_pins UFF0/Q] \
  -to [get_pins UFF1/D]

Figure 25  Hold check moved back to launch edge

The number of cycles in a multicycle hold specifies how many clock cycles to move back from the default hold check edge, which is one active edge prior to the setup capture edge.

In most designs, if the max path (or setup) requires N clock cycles, it is not feasible to make the min path longer than (N-1) clock cycles.

Thus, in most designs, a multicycle setup specified as N cycles should be accompanied by a multicycle hold constraint specified as N-1 cycles.
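The edge positions for the three-cycle example can be worked out in plain Tcl (runnable in tclsh). The sketch assumes the launch edge is at time 0 and uses the 10ns CLKM period from the constraints above.

```tcl
set period      10   ;# CLKM period
set setup_mcp    3   ;# set_multicycle_path 3 -setup
set hold_mcp     2   ;# set_multicycle_path 2 -hold
set launch_edge  0

# Setup is checked at the capture edge N cycles after launch.
set setup_edge        [expr {$launch_edge + $setup_mcp * $period}]
# By default, hold is checked one active edge before the setup capture edge.
set default_hold_edge [expr {$setup_edge - $period}]
# The hold multicycle moves that check back by the given number of cycles.
set hold_edge         [expr {$default_hold_edge - $hold_mcp * $period}]

puts "setup check at ${setup_edge}ns, default hold at ${default_hold_edge}ns, hold with multicycle at ${hold_edge}ns"
# setup check at 30ns, default hold at 20ns, hold with multicycle at 0ns
```

With a setup multicycle of N = 3 and a hold multicycle of N-1 = 2, the hold check lands back on the launch edge.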

10.  Point-to-point specification

set_min_delay

set_max_delay

###########################################
### clk --> read_mem_dqs
###########################################
set_max_delay [expr ($PHY_THREEQUARTER - $skew_clk_to_read_mem_dqs_max)] -from [get_clock clk_phase_0] -to [get_clock read_mem_dqs*_phase_0]
set_min_delay [expr ($PHY_THREEQUARTER - $PHY_CLK_PERIOD + $skew_clk_to_read_mem_dqs_min)] -from [get_clock clk_phase_0] -to [get_clock read_mem_dqs*_phase_0]

Open question: does the delay value in set_max_delay/set_min_delay refer to the skew between the source clock latency and the target clock latency, or to the data path delay?

Modeling Power Terminology

The power a circuit dissipates falls into two broad categories:

  • Static power
  • Dynamic power

Static Power

Static power is the power dissipated by a gate when it is not switching – that is, when it is inactive or static.

Static power is dissipated in several ways. The largest percentage of static power results from source-to-drain subthreshold leakage. This leakage is caused by reduced threshold voltages that prevent the gate from turning off completely. Static power also results when current leaks between the diffusion layers and substrate. For this reason, static power is often called leakage power.

Dynamic Power

Dynamic power is the power dissipated when a circuit is active. A circuit is active anytime the voltage on a net changes due to some stimulus applied to the circuit. Because voltage on a net can change without necessarily resulting in a logic transition, dynamic power can result even when a net does not change its logic state.

The dynamic power of a circuit is composed of

  • Internal power
  • Switching power

Internal Power

During switching, a circuit dissipates internal power by the charging or discharging of any existing capacitance internal to the cell. The definition of internal power includes power dissipated by a momentary short circuit between the P and N transistors of a gate, called short-circuit power.

Figure 1 Components of Power Dissipation

Figure 1 illustrates components of power dissipation and shows the cause of short-circuit power. In this figure, there is a slow rising signal at the gate input IN. As the signal makes a transition from low to high, the N-type transistor turns on and the P-type transistor turns off. However, during signal transition, both the P- and N-type transistors can be on simultaneously for a short time. During this time, current flows from VDD to GND, resulting in short-circuit power.

Short-circuit power varies according to the circuit. For circuits with fast transition times, the amount of short-circuit power can be small. For circuits with slow transition times, short-circuit power can account for up to 30 percent of the total power dissipated. Short-circuit power is also affected by the dimensions of the transistors and the load capacitance at the output of the gate.

In most simple library cells, internal power is due primarily to short-circuit power. For this reason, the terms internal power and short-circuit power are often considered synonymous.

Note:

A transition implies either a rising or a falling signal; therefore, if the power characterization involves running a full-cycle simulation, which includes both rising and falling signals, then you must average the energy dissipation measurement by dividing by 2.

Switching Power

The switching power, or capacitance power, of a driving cell is the power dissipated by the charging and discharging of the load capacitance at the output of the cell. The total load capacitance at the output of a driving cell is the sum of the net and gate capacitance on the driver.

Because such charging and discharging is the result of logic transitions at the output of the cell, switching power increases as logic transitions increase. The switching power of a cell is a function of both the total load capacitance at the cell output and the rate of logic transitions.

Figure 1 shows how the capacitance (Cload) is charged and discharged as the N or P transistor turns on. Switching power accounts for 70 to 90 percent of the power dissipation of an active CMOS circuit.
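The dependence on load capacitance and transition rate can be illustrated with the common first-order estimate of one half C times V squared per transition. That formula is not stated in the text above, and the load, supply voltage and toggle rate below are assumed values chosen only to show the arithmetic (plain Tcl, runnable in tclsh):

```tcl
set c_load      0.05e-12  ;# assumed total load: 50 fF (net + gate capacitance)
set vdd         1.0       ;# assumed supply voltage in volts
set toggle_rate 200e6     ;# assumed logic transitions per second

# Energy per transition is 0.5 * C * V^2; multiply by the transition rate.
set p_switch [expr {0.5 * $c_load * $vdd * $vdd * $toggle_rate}]
puts [format "estimated switching power: %.1f uW" [expr {$p_switch * 1e6}]]
# estimated switching power: 5.0 uW
```

Doubling either the load capacitance or the toggle rate doubles the estimate.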

Path delay in cross clock domain

Sometimes, in cross-clock-domain timing analysis, a large latency or skew on the source (launch) clock path or the target (capture) clock path produces a misleading path delay in the timing report, which leads the tool to report, and then try to fix, timing violations that are not real. The designer should analyze the latency and skew of the launch clock paths and of the capture clock paths and find out which path or paths are too long; for example, it might be a path in the capture clock path group. It is also worth checking with the team members responsible for the STA constraints whether the constraints are correct.

same source create_generated_clock -add

#Constraint

set CLK_PHASE_0_SRC "dummy_clk4x"
set EDGES {1 3 5}
set CLK_PHY_PORT "clk4x"
set PERIOD [expr 0.833 * $TOOL_TIME_SCALE * $LIB_TIME_SCALE]
set PHY_CLK_PERIOD $PERIOD
set PHY_HALF [expr 0.5 * $PHY_CLK_PERIOD]
set PHY_QUARTER [expr 0.25 * $PHY_CLK_PERIOD]
set CTLR_CLK_PERIOD [expr 2 * $PHY_CLK_PERIOD]
set PHY_DDL_CLK_PERIOD [expr 0.2 * $TOOL_TIME_SCALE * $LIB_TIME_SCALE]
create_clock [get_ports $CLK_PHY_PORT ] -name dummy_clk4x -period $PHY_HALF -waveform "0 $PHY_QUARTER"
set clk_dqs_pin0 [cdn_get_pin inst_data_path_tb/inst_write_path_tb/inst_clk_wrdqs_base_delay_macro/inst_wrdqs_base_delay_line/inst_exit_inv/hic_dnt_inv/${NEG_OUTPUT} inst_data_path_tb/inst_write_path_tb/inst_clk_wrdqs_base_delay_macro/inst_wrdqs_base_delay_line/base_delay_out]
create_generated_clock -name clk_dqs_phase_3 -source $CLK_PHY_PORT \
 -edges $EDGES -edge_shift "$THREE_EIGHTH $THREE_EIGHTH $THREE_EIGHTH" -add \
 -master_clock [get_clocks $CLK_PHASE_0_SRC ] $clk_dqs_pin0
create_generated_clock -name clk_dqs_phase_7 -source $CLK_PHY_PORT \
 -edges $EDGES -edge_shift "$SEVEN_EIGHTH $SEVEN_EIGHTH $SEVEN_EIGHTH" -add \
 -master_clock [get_clocks $CLK_PHASE_0_SRC ] $clk_dqs_pin0

Same source, two clocks, clock frequency 1.2GHz

#Command reference

Models multiple generated clocks on the same source when multiple clocks must fan into the source pin. Ideally, one generated clock must be specified for each clock that fans into the master pin. Specify this option with the -name and -master_clock options.

By default, the software creates one generated clock at the pin by using the fastest clock present on the source pin as the master clock. However, use the -add option to specify a different clock name for each generated clock when used with the -master_clock option. Subsequently, you can use this clock name for setting other constraints, such as the set_false_path command and the set_input_delay command.

CTS Spec UnsyncPin RootPin based on Constraint and Netlist

Check the clock timing path in the design netlist with Turbodebug:

Fig. 1 Design/inst_adrctl_slice_bist_ddl/inst_ddl_fdbk_clk_mux

#Constraint

create_clock -name clk_ddl_test_fdbk [get_pin inst_adrctl_slice_bist_ddl/inst_ddl_fdbk_clk_mux/inst_mux_nand2/hic_dnt_nand2/$NEG_OUTPUT ] -period $PHY_DDL_SCALED_CLK_PERIOD -waveform "0 $PHY_DDL_SCALED_HALF"

#CTS Spec file

#Excluded Output pin due to create_clock inst_adrctl_slice_bist_ddl/inst_ddl_fdbk_clk_mux/inst_mux_nand2/hic_dnt_nand2/ZN
GlobalUnsyncPin
+inst_adrctl_slice_bist_ddl/inst_ddl_fdbk_clk_mux/inst_mux_nand0/hic_dnt_nand2/A1
+inst_adrctl_slice_bist_ddl/inst_ddl_fdbk_clk_mux/inst_mux_nand1/hic_dnt_nand2/A1
#----------------------------------------------------
# Clock Name : clk_ddl_test_fdbk
#----------------------------------------------------
AutoCTSRootPin inst_adrctl_slice_bist_ddl/inst_ddl_fdbk_clk_mux/inst_mux_nand2/hic_dnt_nand2/ZN

Clock divider and CTS

Check the clk div timing path in the design netlist with Turbodebug:

Fig. 1 Design/inst_clk_div

Fig. 2, 3 Design/inst_clk_div/inst_clk_div_mux/inst_mux_nand2/hic_dnt_nand2/ZN

Fig. 4 Design/inst_clk_div/inst_clk_div_dff/hic_dnt_out_reg/Q (the create_generated_clock constraint below defines the clock at this pin, but the CTS spec file uses inst_clk_div/inst_clk_div_mux/inst_mux_nand2/hic_dnt_nand2/ZN as the root pin, because the Q pin (out_p) in Fig. 4 actually connects to in0 in Fig. 2)

#Constraint

create_clock [get_ports clk4x ] -name dummy_clk4x -period [expr {0.5 * 0.833}] -waveform [list 0 [expr {0.25 * 0.833}]]
create_generated_clock -name slice_clk_ctlr_phase_0 -edges {1 5 9} -add \
 -source [get_ports clk4x] \
 -master [get_clock dummy_clk4x] \
 [get_pin inst_clk_div/inst_clk_div_dff/hic_dnt_out_reg/Q]

#CTS Spec file

ClkGroup
+ inst_clk_div/inst_clk_div_mux/inst_mux_nand2/hic_dnt_nand2/ZN
GlobalLeafPin
+ inst_clk_div/inst_clk_div_mux/inst_mux_nand0/hic_dnt_nand2/A1
+ inst_clk_div/inst_clk_div_mux/inst_mux_nand1/hic_dnt_nand2/A1
GlobalUnsyncPin
+ inst_clk_div/inst_clk_div_dffn/hic_dnt_out_reg/CPN
+ inst_clk_div/inst_clk_div0_dff/hic_dnt_out_reg/CP
+ inst_clk_div/inst_clk_div_dff/hic_dnt_out_reg/CP
AutoCTSRootPin clk4x
AutoCTSRootPin inst_clk_div/inst_clk_div_mux/inst_mux_nand2/hic_dnt_nand2/ZN

# Because the Q pin (out_p) in Fig. 4 actually connects to in0 in Fig. 2, using inst_clk_div/inst_clk_div_mux/inst_mux_nand2/hic_dnt_nand2/ZN as the root pin makes the clock structure clearer to CTS.

design flow simple

A simplified general design flow after floorplanning:

1st, timing-driven placement according to the constraints, with clock skew/latency treated as ideal (zero), followed by optDesign -preCTS.

2nd, CTS, followed by optDesign -postCTS. After CTS, the clock tree has real insertion (propagation) delay.

3rd, routing, followed by optDesign -postRoute and optDesign -hold -postRoute. Usually, setup violations are fixed first and then hold violations, in order to obtain positive slack.

Clock Tree Synthesis

Clock tree synthesis should do ONE thing only: insert clock inverters (CLK INV, NOT CKBUFF!), which help keep rising and falling transitions and the duty cycle in check, to minimize clock tree latency and skew. Balance the sink/leaf pins that should be balanced, and do not balance the pins that should not be.

CTS Macro Model

The macro model tells the tool the latency of the clock path segment from an assertion pin to the sink/leaf pins, so that the tool takes this segment into account when balancing sink/leaf pins. For example, when one clock runs from chip/top level down to block level in the hierarchy, the segment is the clock path from the junction point of PHY_TOP and data_slice to the reg/ck pins in data_slice.

In the picture below, the clock root pin is A, and the latency of the clock path segment from point B to point C is the CTS Macro Model delay.

ctsmdl

The value of Macro Model in CTS spec file below is 550ps.

In PHY_TOP CTS spec file:
MacroModel pin databahn_dll_phy/dll_phy_slice_core/data_slice_0/clk_phy 550ps 550ps 550ps 550ps 0fF

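This tells the tool that the latency from the data_slice port (block level) to the reg/ck pins inside data_slice is about 550 ps. The report below is part of the PHY_TOP STA. Full STA timing report: STA_report_PHY_TOP

In that report, the clock arrives at 0.962 ns at FE_ECOC1_lp_clk_phy/ZN, just before it enters data_slice_0, and at 1.520 ns at the output of the last clock inverter before the register CK pin. Dividing by the 1.000 derate shown in the report, the measured segment latency is close to the modeled 550 ps:

(1.520 – 0.962) ÷ 1.000 = 0.558 (ns) = 558 (ps)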
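The same check can be written as a few lines of Python. This is only a sketch of the arithmetic, not part of any tool flow: the function name is hypothetical, the two arrival times are read from the PHY_TOP report rows, and treating 1.000 as the delay derate is an assumption based on the report's column layout.

```python
def segment_latency_ps(arrival_start_ns, arrival_end_ns, derate=1.0):
    """Latency of a clock-path segment in ps, with the derate factor divided out."""
    return round((arrival_end_ns - arrival_start_ns) / derate * 1000.0, 1)

# Arrival times taken from the PHY_TOP report:
# 0.962 ns entering data_slice_0, 1.520 ns at the last inverter before CK.
measured_ps = segment_latency_ps(0.962, 1.520, derate=1.0)
modeled_ps = 550.0  # value given in the MacroModel line of the CTS spec file

print(f"measured {measured_ps} ps, modeled {modeled_ps} ps, "
      f"difference {measured_ps - modeled_ps} ps")
```

A difference of a few picoseconds between the modeled and measured segment suggests the macro model is a reasonable approximation for balancing at the PHY_TOP level.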

DDR PHY_TOP STA report:
Path 2: MET Setup Check with Pin databahn_dll_phy/dll_phy_slice_core/data_slice_0/io_datacell_3/wr_l_reg/CKN 
Endpoint: databahn_dll_phy/dll_phy_slice_core/data_slice_0/io_datacell_3/wr_l_reg/D (v) checked with trailing edge of 'clk_dqs_0_phase_0'
Beginpoint: databahn_dll_phy/dll_phy_slice_core/data_slice_0/io_datacell_3/write_data_l_reg_reg/Q (v) triggered by leading edge of 'clk_phase_0'
Path Groups: {reg2reg}
Other End Arrival Time 2.068
- Setup 0.075
+ Phase Shift 0.000
+ CPPR Adjustment 0.097
- Uncertainty 0.105
= Required Time 1.985
- Arrival Time 1.980
= Slack Time 0.005
 Clock Fall Edge 1.250
 = Beginpoint Arrival Time 1.250
 -------------------------------------------------------------------------------------------------------------------------------------------------------------- 
 Pin Arc Cell Delay Arrival Incr Slew Load Fanout User Generated Clock 
 Time Delay Derate Adjustment 
 -------------------------------------------------------------------------------------------------------------------------------------------------------------- 
 clk_ctlr_sync clk_ctlr_sync v 1.250 0.200 0.003 1 
 clk_ctlr_sync_I_xIOx/I CLKBUFV8_12TR35 0.000 1.250 0.000 0.200 0.003 1 1.000 
 clk_ctlr_sync_I_xIOx/Z I v -> Z v CLKBUFV8_12TR35 0.125 1.375 0.000 0.047 0.019 1.000 
 clk_ctlr_sync_N_xIOx__L1_I0/I CLKINV12_12TR35 0.004 1.379 0.000 0.047 0.019 1 1.000 
 clk_ctlr_sync_N_xIOx__L1_I0/ZN I v -> ZN ^ CLKINV12_12TR35 0.037 1.416 0.000 0.024 0.019 1.000 
 clk_ctlr_sync_N_xIOx__L2_I0/I CLKINV12_12TR35 0.004 1.420 0.000 0.025 0.019 1 1.000 
 clk_ctlr_sync_N_xIOx__L2_I0/ZN I ^ -> ZN v CLKINV12_12TR35 0.027 1.447 0.000 0.029 0.017 1.000 
 clk_ctlr_sync_N_xIOx__L3_I0/I CLKINV12_12TR35 0.003 1.450 0.000 0.030 0.017 1 1.000 
 clk_ctlr_sync_N_xIOx__L3_I0/ZN I v -> ZN ^ CLKINV12_12TR35 0.021 1.471 0.000 0.010 0.005 1.000 
...
 databahn_dll_phy/clk_ctlr_sync clk_ctlr_sync v databahn_dll_phy 1.525 1.000 clk_ctlr_phase_0 Adj. = 0.000
...
 databahn_dll_phy/dll_phy_pll_clk_source/deskew_pll/FREF
 PLLSM28HKLVDESKEW 0.004 1.641 0.000 0.016 0.026 2 1.000 
 databahn_dll_phy/dll_phy_pll_clk_source/deskew_pll/FOUTP
 FREF v -> FOUTP ^ PLLSM28HKLVDESKEW 0.000 0.391 0.000 0.017 0.026 1.000 clk_phase_0 Adj. = -1.250
 databahn_dll_phy/dll_phy_pll_clk_source/inst_hic_pll_clkgate/hic_dnt_pll_clkgate/CK
 CLKLANQV12_12TR35 0.001 0.392 0.000 0.034 0.026 2 1.000 
 databahn_dll_phy/dll_phy_pll_clk_source/inst_hic_pll_clkgate/hic_dnt_pll_clkgate/Q
 CK ^ -> Q ^ CLKLANQV12_12TR35 0.052 0.445 0.000 0.012 0.004 1.000 
...
 databahn_dll_phy/dll_phy_pll_clk_source/inst_hic_phybyp_clkgate/hic_dnt_pll_clkgate/CK
 CLKLANQV12_12TR35 0.000 0.474 0.000 0.009 0.004 1 1.000 
 databahn_dll_phy/dll_phy_pll_clk_source/inst_hic_phybyp_clkgate/hic_dnt_pll_clkgate/Q
 CK ^ -> Q ^ CLKLANQV12_12TR35 0.042 0.516 0.000 0.031 0.026 1.000 
 databahn_dll_phy/dll_phy_lp_control/inst_hic_lp_clkgate_phy/hic_dnt_io_clkgate/CK
 CLKLANQV12_12TR35 0.001 0.517 0.000 0.031 0.026 2 1.000 
 databahn_dll_phy/dll_phy_lp_control/inst_hic_lp_clkgate_phy/hic_dnt_io_clkgate/Q
 CK ^ -> Q ^ CLKLANQV12_12TR35 0.048 0.565 0.000 0.025 0.019 1.000 
...
 databahn_dll_phy/dll_phy_slice_core/FE_ECOC1_lp_clk_phy/I
 CLKINV8_12TR35 0.000 0.948 0.000 0.012 0.003 1 1.000 
 databahn_dll_phy/dll_phy_slice_core/FE_ECOC1_lp_clk_phy/ZN
 I v -> ZN ^ CLKINV8_12TR35 0.014 0.962 0.000 0.010 0.006 1.000 
 databahn_dll_phy/dll_phy_slice_core/data_slice_0/clk_phy__L1_I0/I
 CLKINV16_12TR35 0.000 0.963 0.000 0.010 0.006 1 1.000 
 databahn_dll_phy/dll_phy_slice_core/data_slice_0/clk_phy__L1_I0/ZN
 I ^ -> ZN v CLKINV16_12TR35 0.011 0.973 0.000 0.008 0.004 1.000 
...

 databahn_dll_phy/dll_phy_slice_core/data_slice_0/data_slice_data_byte_disable/inst_hic_lp_clkgate_dfi_data_byte_disable_phy/hic_dnt_io_clkgate/CK
 CLKLANQV2_12TR35 0.001 1.014 0.000 0.017 0.009 2 1.000 
 databahn_dll_phy/dll_phy_slice_core/data_slice_0/data_slice_data_byte_disable/inst_hic_lp_clkgate_dfi_data_byte_disable_phy/hic_dnt_io_clkgate/Q
 CK ^ -> Q ^ CLKLANQV2_12TR35 0.072 1.087 0.002 0.063 0.009 1.000 
...
 databahn_dll_phy/dll_phy_slice_core/data_slice_0/clk_phy_dfi_data_byte_en__L16_I20/I
 CLKINV16_12TR35 0.001 1.481 0.000 0.039 0.027 4 1.000 
 databahn_dll_phy/dll_phy_slice_core/data_slice_0/clk_phy_dfi_data_byte_en__L16_I20/ZN
 I v -> ZN ^ CLKINV16_12TR35 0.039 1.520 0.000 0.022 0.027 1.000 
 databahn_dll_phy/dll_phy_slice_core/data_slice_0/io_datacell_3/write_data_l_reg_reg/CK
 SDRNQV2_12TR35 0.002 1.521 0.000 0.022 0.027 18 1.000 
 databahn_dll_phy/dll_phy_slice_core/data_slice_0/io_datacell_3/write_data_l_reg_reg/Q
 CK ^ -> Q v SDRNQV2_12TR35 0.137 1.658 0.000 0.014 0.001 1.000 

...
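The summary at the top of the report above is simple signed arithmetic, and it can help to recompute it when reading a report. The sketch below assumes the sign convention printed in the report header (minus for Setup and Uncertainty, plus for Phase Shift and CPPR Adjustment); the function name is hypothetical.

```python
def setup_slack(other_end_arrival, setup, phase_shift, cppr, uncertainty, arrival):
    """Return (required_time, slack) in ns, following the report's sign convention."""
    required = other_end_arrival - setup + phase_shift + cppr - uncertainty
    return round(required, 3), round(required - arrival, 3)

# Numbers copied from "Path 2" of the DDR PHY_TOP STA report.
required, slack = setup_slack(
    other_end_arrival=2.068,
    setup=0.075,
    phase_shift=0.000,
    cppr=0.097,
    uncertainty=0.105,
    arrival=1.980,
)
print(required, slack)  # 1.985 0.005
```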
CCOpt

CCOpt extends CCOpt-CTS to replace traditional global skew balancing with a combination of CTS, timing driven useful skew, and datapath optimization.

In traditional CTS flows an ideal clock model is used before CTS to simplify clock timing analysis. With the ideal clock model, launch and capture clock paths are assumed to have the same delay. After CTS, the ideal clock model is replaced by a propagated clock model that takes account of actual delays along clock launch and capture paths.

In traditional CTS, global skew balancing attempts to make the propagated clock timing match the ideal-mode clock timing by balancing the insertion delay (clock latency) between all sinks. However, a number of factors combine such that skew balancing does not lead to timing closure. These include:

  • OCV – On-chip variation means that skew, measured using a single metric such as the ‘late’ configuration of a delay corner, no longer directly corresponds to timing impact because launch and capture paths have differing timing derates. In addition, Common Path Pessimism Removal (CPPR) and per-library cell timing derates mean that it is not possible to accurately estimate clock or datapath timing without synthesizing a clock tree. Advanced OCV (AOCV) further complicates this by adding path and bounding box dependent factors.
  • Clock gating – Clock gating uses datapath signals to inhibit or permit clock edges to propagate from a clock source to clock sinks. The clock arrival time at a clock gating cell is unknown prior to CTS and this arrival time determines the required time for the datapath control signal to reach the clock gating cell enable input. Therefore the setup slack at a clock gating enable input is hard to predict preCTS. In addition, clock gating cells have an earlier clock arrival time than regular sinks and are therefore often timing critical. Typically, the fan-in registers controlling clock gating may need to have an earlier clock arrival time than regular sinks in order to avoid a clock gating slack violation – which means the fan-in registers need to be skewed early.
  • Unequal datapath delays – Front end logic synthesis will attempt to ensure logic between registers is roughly delay balanced to optimize the target clock frequency. However, with wire delay dominating many datapath stages it is likely that after placement and preCTS optimization there will exist some combinational paths with unavoidably longer delays than others. Useful skew clock scheduling permits slack to be moved between register stages to increase clock frequency. In contrast, global skew balancing is independent of timing slack. In addition, CCOpt useful skew scheduling can avoid unnecessary balancing of sinks where there is excess slack, in order to reduce clock area and clock power.

CCOpt treats clock launch, clock capture, and datapath delays all as flexible parameters that can be manipulated to optimize timing, as illustrated below.

CCOPT_manipulating_Clock_delays_and_Logic_delays

Manipulating Clock Delays and Logic Delays

At each clock sink (flip-flop) in the design, CCOpt can adjust both datapath and clock delays in order to improve negative setup timing slack – specifically the high effort path group(s) WNS. This is performed using the propagated clock timing model at all times.
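To make the idea of moving slack between register stages concrete, here is a deliberately simplified Python sketch. It is not CCOpt's algorithm: it assumes a hypothetical three-flop pipeline FF1 -> logic A -> FF2 -> logic B -> FF3, ignores setup time, hold checks and OCV, and uses made-up delays. Delaying the clock of FF2 by `skew_ff2` gives stage A more time and stage B less.

```python
def stage_slacks(period, delay_a, delay_b, skew_ff2=0.0):
    """Setup slack of stage A (FF1->FF2) and stage B (FF2->FF3) in ns.

    skew_ff2 > 0 means the clock reaches FF2 later than FF1 and FF3.
    """
    slack_a = period + skew_ff2 - delay_a  # FF2 captures later: more time for A
    slack_b = period - skew_ff2 - delay_b  # FF2 launches later: less time for B
    return round(slack_a, 3), round(slack_b, 3)

def balancing_skew(delay_a, delay_b):
    """Skew at FF2 that equalizes the two stage slacks in this toy model."""
    return (delay_a - delay_b) / 2.0

# Hypothetical numbers: stage A is too slow for a 1.0 ns clock, stage B has margin.
period, delay_a, delay_b = 1.0, 1.2, 0.6

balanced = stage_slacks(period, delay_a, delay_b)          # zero-skew balancing
skew = balancing_skew(delay_a, delay_b)
useful = stage_slacks(period, delay_a, delay_b, skew)      # useful skew applied

print("zero skew:  ", balanced)   # (-0.2, 0.4): stage A violates
print("useful skew:", useful)     # (0.1, 0.1): both stages meet timing
```

With zero skew the design fails even though the total slack across the two stages is positive; shifting the FF2 clock by 0.3 ns redistributes that slack so both stages pass.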

set_disable_timing

Sometimes it is necessary to add a constraint such as set_disable_timing so that the tool ignores timing paths that should not be checked; the tool can then fix the real critical paths correctly.

set_disable_timing -from in1 -to pass [get_cells dll/dll_delay_line_master/delay_0]
set_disable_timing -from in1 -to pass [get_cells dll/dll_delay_line_clk_wr/delay_0]
#ezp set_disable_timing -from in1 -to pass [get_cells dll_delay_line/delay_0]
#ezp set_disable_timing -from in1 -to pass [get_cells dll/dll_delay_line_wr_dqs/delay_0]
set_disable_timing -from in1 -to pass [get_cells dll/dll_delay_line_rd_dqs/delay_0]
set_disable_timing -from ret -to out1 [get_cells dll/dll_delay_line_master/delay_0]
set_disable_timing -from ret -to out1 [get_cells dll/dll_delay_line_clk_wr/delay_0]
#ezp set_disable_timing -from ret -to out1 [get_cells dll_delay_line/delay_0]
#ezp set_disable_timing -from ret -to out1 [get_cells dll/dll_delay_line_wr_dqs/delay_0]
set_disable_timing -from ret -to out1 [get_cells dll/dll_delay_line_rd_dqs/delay_0]

For example, delay_element is only used to delay the DDR DQS signal, which is not a timing path. Data transfers successfully only when the data strobe signal falls within the data eye with sufficient setup and hold margin. The Data-Valid-Window shrinks with each upgrade from DDR1 to DDR4 (setup/hold margin shrinks as the period shrinks).

data_eye_delay_DQS

data_eye

Two key points:
1. Good data skew, to maximize the open eye.
2. Accurate shifting of DQS to the center of the data bus, plus jitter control, to maximize the setup/hold margin.

One delay_line is composed of 128 delay_elements,

delay_line

so the arcs to disable are, schematically:

set_disable_timing -from delay_line_begin_element_input_1 -to delay_line_begin_element_pass

set_disable_timing -from delay_line_begin_element_return -to delay_line_begin_element_out_1

Update clock latency

Before CTS (during placement), clock latency, skew, and transition are treated as ideal, i.e. zero. During CTS the tool adds clock buffers/inverters to minimize clock latency, skew, and transition as far as possible, so after CTS the clock tree has insertion delay.

After CTS and after routing, clock paths ‘stretch’ relative to the change in datapath delay. This matters most for paths that cross clock domains or involve asynchronous clocks, where the latency of the source clock in the launch path and the latency of the target clock in the capture path change by different amounts. The constraints therefore have to be recalculated. For example, set_max_delay and set_min_delay can be reused to update the Path Delay (STA) between two different clock domains, so that the tool sees the correct timing paths post-CTS and post-route and can correctly repair design rule violations and timing violations, such as reg-to-reg setup/hold violations. In fact, the constraints used to update clock latency come from the IO (input/output, etc.) segments of the front-end design constraints. As a result, besides set_max_delay and set_min_delay, they also include set_input_delay, set_output_delay, set_load, set_false_path, set_input_transition, and so on.

constraint update_latency
set_max_delay [expr ($PHY_HALF - $skew_delayed_dqs_to_clk_max)] -from [get_clock delayed_dqs*_phase_1] -to [get_clock clk_phase_0]
set_min_delay [expr ($PHY_HALF - $PHY_CLK_PERIOD + $skew_delayed_dqs_to_clk_min)] -from [get_clock delayed_dqs*_phase_1] -to [get_clock clk_phase_0]
STA report
Path 1: MET Setup Check with Pin dfi_read_datablk/read_datablk_fifo/io_datain_l_reg_0_/CP 

Endpoint: dfi_read_datablk/read_datablk_fifo/io_datain_l_reg_0_/D (v) checked with leading edge of 'clk_phase_0'

Beginpoint: dfi_read_datablk/read_datablk_fifo/dll_entry_flop_l_40/hic_dnt_dll_entry_flop/Q (v) triggered by leading edge of 'delayed_dqs_phase_1'

Path Groups: {reg2reg}
Other End Arrival Time 0.462
- Setup 0.098
+ Path Delay 0.901
+ Path Ideal Arrival 0.468
+ CPPR Adjustment 0.000
- Uncertainty 0.050
= Required Time 1.684
- Arrival Time 1.664
= Slack Time 0.020
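The set_max_delay and set_min_delay expressions in the update_latency constraint are plain arithmetic on the PHY clock period. The Python sketch below evaluates them under stated assumptions: PHY_CLK_PERIOD is taken as the 1.874 ns clk_phase_0 period, PHY_HALF is assumed to be half of it, and both skew values are hypothetical (0.036 ns was picked only because it reproduces the 0.901 Path Delay line in the report; the real script values are not given in the text).

```python
PHY_CLK_PERIOD = 1.874             # ns, clk_phase_0 period from create_clock
PHY_HALF = PHY_CLK_PERIOD / 2.0    # assumption: half of the PHY clock period

# Hypothetical skew budgets between delayed_dqs*_phase_1 and clk_phase_0.
skew_delayed_dqs_to_clk_max = 0.036
skew_delayed_dqs_to_clk_min = 0.036

# Same expressions as in the Tcl constraint, evaluated in Python.
max_delay = round(PHY_HALF - skew_delayed_dqs_to_clk_max, 3)
min_delay = round(PHY_HALF - PHY_CLK_PERIOD + skew_delayed_dqs_to_clk_min, 3)

print(f"set_max_delay value: {max_delay} ns")   # 0.901
print(f"set_min_delay value: {min_delay} ns")   # -0.901
```

The max value is what the tool reports as "Path Delay" in the setup check; the min value is negative because the hold check refers to the capture edge one period earlier.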

A clock is to a SoC/chip what blood is to a dog's body. If you want the pet to be smart and strong, its circulatory system has to be healthy. Just as blood flows to each and every part of the body and regulates metabolism, the clock reaches each and every sequential device and controls the digital events inside the chip.

There are two phases in the design of a clock signal.

In the 1st phase the clock is in “ideal mode” (e.g. during RTL design, synthesis, and placement). An “ideal” clock has no physical distribution tree; it just shows up magically on time at all the clock pins.

The 2nd phase comes when clock tree synthesis (CTS) inserts an actual tree of buffers into the design that carries the clock signal from the clock source pin to the thousands or millions of flip-flops that need it. CTS is done after placement and before routing. After CTS is finished, the clock is said to be in “propagated mode”.

What is clock latency?

Clock latency is an ideal mode term. It refers to the delay that is specified to exist between the source of the clock signal and the flip-flop clock pin. This is a delay specified by the user – not a real, measured thing. When the clock is actually created, then that same delay is now referred to as the “insertion delay”. Insertion delay (ID) is a real, measurable delay path through a tree of buffers. Sometimes the clock latency is interpreted as a desired target value for the insertion delay.

Clock latency is the time taken by the clock to reach the sink pin from the clock source. It is divided into two parts: Clock Source Latency and Clock Network Latency. Clock Source Latency is the delay from the clock waveform origin point to the clock definition point. Clock Network Latency is the delay from the clock definition point to the sink pin. Clock Latency is also called Clock Insertion Delay. Please see the 2 pictures below to get a better understanding of what Clock Latency is.

Master_Clock

Generated_Clock

# PHY master clock and its generated divide-by-2 clock
create_clock [get_ports clk_phy] -name clk_phase_0 -period 1.874 -waveform [list 0 [expr {0.5*1.874}]]
set_clock_uncertainty -setup $SKEW_MAX [get_clocks clk_phase_0]
set_clock_uncertainty -hold $SKEW_MIN [get_clocks clk_phase_0]
create_generated_clock -name slice_clk_ctlr_phase_0 -divide_by 2 -add \
 -source [get_ports clk_phy] \
 -master_clock [get_clocks clk_phase_0] \
 [get_pins inst_clk_div/inst_clk_div_dff/hic_dnt_div_flop/Q]
set_clock_uncertainty -setup $SKEW_MAX [get_clocks slice_clk_ctlr_phase_0]
set_clock_uncertainty -hold $SKEW_MIN [get_clocks slice_clk_ctlr_phase_0]
# Read DQS clock; braces keep Tcl from treating [0] as a command substitution
create_clock [get_ports {dqs_ipad[0]}] -name read_mem_dqs_phase_0 -period 1.784 -waveform [list 0 [expr {0.5*1.784}]]
set_clock_uncertainty -setup $SKEW_READ_DQS_MAX [get_clocks read_mem_dqs_phase_0]
set_clock_uncertainty -hold $SKEW_READ_DQS_MIN [get_clocks read_mem_dqs_phase_0]
# Delayed DQS: every master edge shifted by a quarter of the 1.874 ns period
set dqs_shift [expr {0.25*1.874}]
create_generated_clock -name delayed_dqs_phase_1 -source [get_ports {dqs_ipad[0]}] \
 -edges {1 2 3} -edge_shift [list $dqs_shift $dqs_shift $dqs_shift] \
 -add -master_clock [get_clocks read_mem_dqs_phase_0] \
 [get_pins dll/dll_delay_line_rd_dqs/delay_0/r1/hic_dnt_dll_nand2/A1]

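delayed_dqs_phase_1 is dqs_ipad[0] shifted by 0.25 of the 1.874 ns clock cycle (about 0.468 ns), which shows up as "Path Ideal Arrival 0.468" and "Adj. = 0.468" in the report below.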
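The -edges and -edge_shift options used in the create_generated_clock commands can be worked out by hand. The Python sketch below is a hypothetical helper, not a tool API: it assumes master edges are numbered from 1 (odd = rising, even = falling), a 50% duty cycle master starting at time 0, and it uses the periods exactly as written in the constraints.

```python
def generated_edges(master_period, edges, edge_shifts=None, duty=0.5):
    """Times (ns) of the generated clock edges selected by 1-based master edge numbers."""
    edge_shifts = edge_shifts or [0.0] * len(edges)
    times = []
    for n, shift in zip(edges, edge_shifts):
        cycle, is_fall = divmod(n - 1, 2)      # which master cycle, rise or fall
        t = cycle * master_period + (duty * master_period if is_fall else 0.0)
        times.append(round(t + shift, 4))
    return times

# delayed_dqs_phase_1: edges {1 2 3} of read_mem_dqs_phase_0, each shifted by 0.25*1.874
shift = 0.25 * 1.874
dqs_edges = generated_edges(1.784, [1, 2, 3], [shift, shift, shift])
print(dqs_edges)   # [0.4685, 1.3605, 2.2525]

# slice_clk_ctlr_phase_0: edges {1 5 9} of dummy_clk4x (period 0.5*0.833), no shift
div_edges = generated_edges(0.5 * 0.833, [1, 5, 9])
print(div_edges)   # [0.0, 0.833, 1.666] -> period is 4 master periods
```

The first result shows the quarter-cycle shift that the report prints as 0.468; the second shows that selecting edges 1, 5 and 9 is equivalent to dividing clk4x by 4.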
Path 1: MET Setup Check with Pin dfi_read_datablk/read_datablk_fifo/io_datain_l_reg_0_/CP 

Endpoint: dfi_read_datablk/read_datablk_fifo/io_datain_l_reg_0_/D (v) checked with leading edge of 'clk_phase_0'

Beginpoint: dfi_read_datablk/read_datablk_fifo/dll_entry_flop_l_40/hic_dnt_dll_entry_flop/Q (v) triggered by leading edge of 'delayed_dqs_phase_1'

Path Groups: {reg2reg}
Other End Arrival Time 0.462
- Setup 0.099
+ Path Delay 0.901
+ Path Ideal Arrival 0.468
+ CPPR Adjustment 0.000
- Uncertainty 0.050
= Required Time 1.683
- Arrival Time 1.674
= Slack Time 0.009
 Clock Rise Edge 0.000
 = Beginpoint Arrival Time 0.000

 dqs_ipad[0] dqs_ipad[0] ^ 0.000 0.150 0.004 2 
...
 dfi_dqs_in/hic_dll_dqs_mod_dqs0/hic_dnt_dll_dqs_not/ZN
 I v -> ZN ^ 
 CKND8BWP12T35P140 0.037 0.218 0.000 0.016 0.007 1.039 
 dll/dll_delay_line_rd_dqs/delay_0/r1/hic_dnt_dll_nand2/A1
 CKND2D1BWP12T35P140
 0.001 0.687 0.000 0.016 0.007 2 1.000 delayed_dqs_phase_1 Adj. = 0.468
...
 dfi_read_datablk/read_datablk_fifo/dll_entry_flop_l_40/hic_dnt_dll_entry_flop/CP
 EDFQD2BWP12T35P140
 0.005 0.957 0.002 0.019 0.027 11 1.000 
 dfi_read_datablk/read_datablk_fifo/dll_entry_flop_l_40/hic_dnt_dll_entry_flop/Q
 CP ^ -> Q v EDFQD2BWP12T35P140
 0.112 1.069 0.000 0.012 0.002 1.077 
...
What is clock skew?

Clock skew between two sink pins is the difference in clock latency between them. If the capture clock latency is greater than the launch clock latency, the skew is positive, which helps setup checks. If the capture clock latency is less than the launch clock latency, the skew is negative, which helps hold checks. The ideal clock skew in a design is zero, which is not achievable. The clock tree is built to reduce clock skew.
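The effect of the skew sign on setup and hold checks can be illustrated with a small Python sketch. The model is simplified (single clock, no uncertainty, no OCV) and all numbers and function names are hypothetical.

```python
def clock_skew(launch_latency, capture_latency):
    """Skew in ns: positive when the capture clock arrives later than the launch clock."""
    return round(capture_latency - launch_latency, 3)

def setup_slack(period, data_delay, setup_time, skew):
    # A later capture edge (positive skew) gives the data more time.
    return round(period + skew - data_delay - setup_time, 3)

def hold_slack(data_delay, hold_time, skew):
    # A later capture edge (positive skew) makes the hold check harder.
    return round(data_delay - hold_time - skew, 3)

skew = clock_skew(launch_latency=0.50, capture_latency=0.55)   # +0.05 ns

setup_zero = setup_slack(1.0, 0.90, 0.08, 0.0)    # 0.02
setup_pos = setup_slack(1.0, 0.90, 0.08, skew)    # 0.07: positive skew helps setup
hold_zero = hold_slack(0.10, 0.03, 0.0)           # 0.07
hold_pos = hold_slack(0.10, 0.03, skew)           # 0.02: positive skew hurts hold

print(skew, setup_zero, setup_pos, hold_zero, hold_pos)
```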

What is clock uncertainty?

Clock uncertainty is the deviation of the actual arrival time of the clock edge with respect to the ideal arrival time. In ideal mode the clock signal can arrive at all clock pins simultaneously. But in fact, that perfection is not achievable. So, to anticipate the fact that the clock will arrive at different times at different clock pins, the “ideal mode” clock assumes a clock uncertainty. For example, a 1 ns clock with a 100 ps clock uncertainty means that the next clock tick will arrive in 1 ns plus or minus 50 ps.
A deeper question is why the clock does not always arrive exactly one clock period later. There are several possible reasons; the 3 major ones are:
(a) The insertion delay to the launching flip-flop’s clock pin is different from the insertion delay to the capturing flip-flop’s clock pin (one path through the clock tree can be longer than another). This is called clock skew.
(b) The clock period is not constant. Some clock cycles are longer or shorter than others in a random fashion. This is called clock jitter, which can come from the PLL or crystal oscillator, cables, transmitters, receivers, the internal circuitry of the PLL, thermal noise of the oscillator, etc. Pre-CTS, since the clock tree is not built yet, uncertainty = skew + jitter. Post-CTS, uncertainty = jitter.
(c) Even if the launching clock path and the capturing clock path are absolutely identical, their path delays can still differ because of on-chip variation (OCV): the chip’s delay properties vary across the die due to process variations, temperature variations, or other reasons. This essentially increases the clock skew.
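The pre-CTS and post-CTS rule in (b) can be sketched in Python. This is an illustration only: the jitter and skew budgets are hypothetical, the function names are made up, and the required-time formula follows the "- Setup ... - Uncertainty" convention seen in the timing reports earlier in this article, with CPPR and phase shift left out.

```python
def clock_uncertainty(jitter, skew_estimate, clock_tree_built):
    """Pre-CTS: skew + jitter. Post-CTS: jitter only (skew is real and propagated)."""
    return round(jitter if clock_tree_built else jitter + skew_estimate, 3)

def setup_required_time(capture_edge, setup_time, uncertainty):
    """Required time in ns for a setup check, uncertainty subtracted in full."""
    return round(capture_edge - setup_time - uncertainty, 3)

jitter, skew_estimate = 0.05, 0.05      # ns, hypothetical budgets

pre_cts = clock_uncertainty(jitter, skew_estimate, clock_tree_built=False)
post_cts = clock_uncertainty(jitter, skew_estimate, clock_tree_built=True)

# Hypothetical 1.0 ns capture edge and 0.075 ns setup time.
print(pre_cts, setup_required_time(1.0, 0.075, pre_cts))    # 0.1 0.825
print(post_cts, setup_required_time(1.0, 0.075, post_cts))  # 0.05 0.875
```

Dropping the skew estimate after CTS avoids counting skew twice, since the propagated clock arrival times already contain the real skew.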