chgrp

>newgrp project_name_old
>cp project_name_old/debug.tcl .
>exit
>newgrp current_project_name
>chgrp current_project_name debug.tcl
# Change the old script's group and permissions so that users of the current project can access it
>chmod 775 debug.tcl

Copy the old project's script debug.tcl into the current project, then let other users in the current project access debug.tcl.

Trace clock tree

updateStatus -force designIsPlaced
selectInst *
dbSet selected.pStatus placed
setCTSMode -engine ck

cleanupSpecifyClockTree
specifyClockTree -file /proj/$project_name/work/user/$design_name/$script_path/${design_name}.seperated.trace.ctstch
ckSynthesis -check -forceReconvergent -trace /proj/$project_name/work/user/$design_name/$script_path/${design_name}.seperated.trace

Check clock gating

A_clock_gating_check

Figure 1  A clock gating check

A clock gating check occurs when a gating signal can control the path of a clock signal at a logic cell. An example is shown in Figure 1. The pin of the logic cell connected to the clock is called the clock pin, and the pin to which the gating signal is connected is the gating pin. The logic cell where clock gating occurs is also referred to as the gating cell.

One condition for a clock gating check is that clock that goes through cell must be used as a clock downstream. Downstream clock usage can be either as a FF clock or it can fanout to an output port or as generated clock that refers to output of gating cell as its master. If clock is not used as a clock after gating cell, then no clock gating check is inferred.

Another condition for clock gating check applies to gating signal. The signal at gating pin of check should not be a clock or if it was a clock, it should not be used as a clock downstream.

In a general scenario, clock signal and gating signal do not need to be connected to a single logic cell such as an and or an or cell, but may be inputs to an arbitrary logic block. In such cases, for a clock gating check to be inferred, the clock pin and the gating pin of the check must fan out to a common output pin.

There are two types of clock gating checks inferred:

  • Active-high clock gating check: Occurs when gating cell has an and or a nand function.
  • Active-low clock gating check: Occurs when gating cell has an or or a nor function.

Active-high and active-low refer to the logic state of the gating signal which activates the clock signal at the output of the gating cell. If the gating cell is a complex function where the gating relationship is not obvious, such as a multiplexer or an xor cell, STA output will typically provide a warning that no clock gating check is being inferred. This can be changed by explicitly specifying a clock gating relationship for the gating cell with the set_clock_gating_check command. In such cases, if the set_clock_gating_check specification disagrees with the functionality of the gating cell, STA will normally provide a warning.
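The inference rule described above can be summarized in a few lines of Python. This is only an illustrative sketch: the function name and the lowercase cell-function labels are assumptions made for this example, not the interface of any STA tool.

```python
def infer_gating_check(cell_function):
    """Return the clock gating check type inferred for a gating cell.

    cell_function is a hypothetical lowercase label such as "and" or "mux".
    None means no check is inferred automatically; an explicit
    set_clock_gating_check specification would be needed.
    """
    if cell_function in ("and", "nand"):
        return "active-high"
    if cell_function in ("or", "nor"):
        return "active-low"
    return None


for func in ("and", "nor", "mux", "xor"):
    print(func, "->", infer_gating_check(func))
```

Complex cells such as a multiplexer or an xor fall through to None, which corresponds to the "no clock gating check is being inferred" warning.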

As specified earlier, a clock can be a gating signal only if it is not used as a clock downstream. Consider example in Figure 2. CLKB is not used as a clock downstream due to definition of generated clock of CLKA – path of CLKB is blocked by generated clock definition. Hence a clock gating check for clock CLKA is inferred for and cell.

Gating_check_inferred

Figure 2  Gating check inferred – clock at gating pin not used as a clock downstream

Active-High Clock Gating

We examine timing relationship of an active-high clock gating check now. This occurs at an and or a nand cell; an example using and is shown in Figure 3. Pin B of gating cell is clock signal, and pin A of gating cell is gating signal.

Let us assume that both clocks CLKA and CLKB have the same waveform.

create_clock -name CLKA -period 10 \
-waveform {0 5} [get_ports CLKA]
create_clock -name CLKB -period 10 \
-waveform {0 5} [get_ports CLKB]

Active_high_clock_gating_using_an_AND_cell

Figure 3  Active high clock gating using an AND cell

Because it is an and cell, a high on gating signal UAND0/A opens up gating cell and allows clock to propagate through. Clock gating check is intended to validate that gating pin transition does not create an active edge for fanout clock. For positive edge-triggered logic, this implies that rising edge of gating signal occurs during inactive period of clock (when it is low). Similarly, for negative edge-triggered logic, falling edge of gating signal should occur only when clock is low. Note that if the clock drives both positive and negative edge-triggered FFs, any transition of gating signal (rising or falling edge) must occur only when clock is low. Figure 4 shows an example of a gating signal transition during active edge which needs to be delayed to pass clock gating check.

Gating_signal_needs_to_be_delayed

Figure 4  Gating signal needs to be delayed

The active-high clock gating setup check requires that gating signal changes before clock goes high. Here is setup path report.

Clock_gating_setup_check_path_report_0

Clock_gating_setup_check_path_report_1

Notice that Endpoint indicates that it is a clock gating check. In addition, path is in clock_gating_default group of paths as specified in Path Group. Check validates that gating signal changes before next rising edge of clock CLKB at 10ns.

Active-high clock gating hold check requires that gating signal changes only after falling edge of clock. Here is hold path report.

Clock_gating_hold_check_path_report

Hold gating check fails because gating signal is changing too fast, before falling edge of CLKB at 5ns. If a 5ns delay was added between UDFF0/Q and UAND0/A1 pins, both setup and hold gating checks would pass validating that gating signal changes only in the specified window.

One can see that hold time requirement is quite large. This is caused by fact that sense of gating signal and FF being gated are the same. This can be resolved by using a different type of launch FF, a negative edge-triggered FF to generate gating signal.

Gating_signal_clocked_on_falling_edge

Figure 5  Gating signal clocked on falling edge

In Figure 5, FF UFF0 is controlled by the negative edge of clock CLKA. Safe clock gating implies that the output of FF UFF0 must change during the inactive part of the gating clock, which is between 5ns and 10ns.

The signal waveforms corresponding to the schematic in Figure 5 are depicted in Figure 6. Here is the clock gating setup report.

Gating_signal_generated_from_negative_edge_FF_meets_gating_checks

Figure 6  Gating signal generated from negative edge FF meets gating checks

Clock_gating_setup_check_path_report_CKN_0

Clock_gating_setup_check_path_report_CKN_1

Clock_gating_setup_check_path_report_CKN_2

Here is clock gating hold report. Notice that hold time check is much easier to meet with new design.

Clock_gating_hold_check_path_report_CKN

Since clock edge (negative edge) that launches gating signal is opposite of clock being gated (active-high), setup and hold requirements are easy to meet. This is the most common structure used for gated clocks.
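The difference between the two launch schemes can be checked with simple window arithmetic. The sketch below assumes ideal clocks, zero setup and hold requirements at the gating cell, and a hypothetical 0.2ns launch delay; none of these numbers are taken from the path reports above.

```python
def active_high_gating_check(arrival, clock_fall, next_clock_rise,
                             setup=0.0, hold=0.0):
    """Return (setup_slack, hold_slack) for an active-high gating check.

    The gating signal must change while the clock is low, that is after
    clock_fall + hold and before next_clock_rise - setup.
    """
    setup_slack = (next_clock_rise - setup) - arrival
    hold_slack = arrival - (clock_fall + hold)
    return setup_slack, hold_slack


# CLKB: period 10, waveform {0 5} -> falls at 5 ns, rises again at 10 ns.
CLK_TO_Q = 0.2  # hypothetical launch delay in ns

cases = {
    "positive edge launch (0 ns)": 0.0 + CLK_TO_Q,
    "negative edge launch (5 ns)": 5.0 + CLK_TO_Q,
}
for name, arrival in cases.items():
    setup_slack, hold_slack = active_high_gating_check(arrival, 5.0, 10.0)
    verdict = "passes" if min(setup_slack, hold_slack) >= 0 else "fails"
    print(f"{name}: setup {setup_slack:+.2f}, hold {hold_slack:+.2f} -> {verdict}")
```

Under these assumptions the positive edge launch violates the hold check, while the negative edge launch lands inside the 5ns to 10ns window and meets both checks.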

Active-Low Clock Gating

Figure 7 shows an example of an active-low clock gating check.

Active_low_clock_gating_check

Figure 7  Active-low clock gating check

create_clock -name MCLK -period 8 \
-waveform {0 4} [get_ports MCLK]
create_clock -name SCLK -period 8 \
-waveform {0 4} [get_ports SCLK]

Active-low clock gating check validates that rising edge of gating signal arrives at active portion of clock (when it is high) for positive edge-triggered logic. As described previously, the key is that gating signal should not cause an active edge for output gated clock. When gating signal is high, clock cannot go through. Thus gating signal should switch only when clock is high as illustrated in Figure 8.

Here is active-low clock gating setup timing report. This check ensures that gating signal arrives before clock edge becomes inactive, in this case, at 4ns.

Gating_signal_changes_when_clock_is_high

Figure 8  Gating signal changes when clock is high

Clock_gating_setup_check_path_report_active_low_0

Clock_gating_setup_check_path_report_active_low_1

Here is clock gating hold timing report. This check ensures that gating signal changes only after rising edge of clock signal, which in this case is at 0ns.

Clock_gating_hold_check_path_report_active_low

Clock Gating with a Multiplexer

Figure 9 shows an example of clock gating using a multiplexer cell. A clock gating check at the multiplexer inputs ensures that the multiplexer select signal arrives at the right time to cleanly switch between MCLK and TCLK. For example, suppose we are interested in switching to and from MCLK, and assume that TCLK is low when the select signal switches. This implies that the select signal of the multiplexer should switch only when MCLK is low. This is similar to the active-high clock gating check.

Clock_gating_using_a_mux

Figure 9  Clock gating using a multiplexer

Gating_signal_arrives_when_clock_is_low

Figure 10  Gating signal arrives when clock is low

Figure 10 shows the timing relationships. The select signal for the multiplexer must arrive while MCLK is low. Also, assume TCLK will be low when select changes.

Since gating cell is a multiplexer, clock gating check is not inferred automatically, as evidenced in this message reported during STA.

Warning: No clock-gating check is inferred for clock MCLK at pins UMUX0/S and UMUX0/I0 of cell UMUX0.
Warning: No clock-gating check is inferred for clock TCLK at pins UMUX0/S and UMUX0/I1 of cell UMUX0.

But a clock gating check can be explicitly forced by providing a set_clock_gating_check specification.

set_clock_gating_check -high \
[get_cells UMUX0]
# The -high option indicates an active-high check. 
set_disable_clock_gating_check UMUX0/I1

The disable check turns off clock gating check on specific pin, as we are not concerned with this pin. Clock gating check on multiplexer has been specified to be an active-high clock gating check.

set_clock_gating_check_constraint_example

Here is setup timing path report.

Clock_gating_setup_check_path_report_mux_0

Clock_gating_setup_check_path_report_mux_1

The clock gating hold timing report is next.

Clock_gating_hold_check_path_report_mux_0

Clock_gating_hold_check_path_report_mux_1

Crosstalk delay

Basics

Capacitance extraction for a typical net in a nanometer design consists of contributions from many neighboring conductors. Some of these are grounded capacitances while many others are from traces which are part of other signal nets. The grounded as well as inter-signal capacitances are illustrated in Figure 1.

Example_of_coupled_interconnect

Figure 1  Example of coupled interconnect

All of these capacitances are considered as part of total net capacitance during basic delay calculation (without considering any crosstalk). When neighboring nets are steady (or not switching), inter-signal capacitance can be treated as grounded. When a neighboring net is switching, charging current through coupling capacitance impacts timing of the net. The equivalent capacitance seen from a net can be larger or smaller based upon direction of aggressor net switching. This is explained in a simple example below.

Crosstalk_impact_example

Figure 2  Crosstalk impact example

Figure 2 shows net N1 which has a coupling capacitance Cc to a neighboring net (labeled Aggressor) and a capacitance Cg to ground. This example assumes that net N1 has a rising transition at output and considers different scenarios depending on whether or not aggressor net is switching at the same time.

The capacitive charge required from driving cell in various scenarios can be different as described next.

i.  Aggressor net steady. Driving cell for net N1 provides charge for Cg and Cc to be charged to Vdd. Total charge provided by driving cell of this net is thus (Cg + Cc) * Vdd. Base delay calculation obtains delay for this scenario where no crosstalk is considered from aggressor nets. Table 1 shows charge on Cg and Cc before and after switching of N1 for this scenario.

Base_delay_calculation_no_crosstalk

Table 1  Base delay calculation – no crosstalk

ii.  Aggressor switching in same direction. Driving cell is aided by aggressor switching in the same direction. If aggressor transitions at the same time with the same slew (identical transition time), total charge provided by driving cell is only (Cg * Vdd). If slew of aggressor net is faster than that of N1, actual charge required can be even smaller than (Cg * Vdd) since aggressor net can provide charging current for Cg also. Thus, required charge from driving cell with aggressor switching in same direction is smaller than corresponding charge for steady aggressor described in Table 1. Therefore, aggressor switching in the same direction results in a smaller delay for switching net N1; the reduction in delay is labeled as negative crosstalk delay. See Table 2. This scenario is normally considered for min path analysis.

Aggressor_switching_in_same_direction_negative_crosstalk

Table 2  Aggressor switching in same direction – negative crosstalk

iii.  Aggressor switching in opposite direction. The voltage across the coupling capacitance changes from -Vdd to Vdd. Thus, the charge on the coupling capacitance changes by (2 * Cc * Vdd) between the start and the end of the transitions. This additional charge is provided by both the driving cell of net N1 and the aggressor net. This scenario results in a larger delay for switching net N1; the increase in delay is labeled as positive crosstalk delay. See Table 3. This scenario is normally considered for max path analysis.

Aggressor_switching_in_opposite_direction_positive_crosstalk

Table 3  Aggressor switching in opposite direction – positive crosstalk

The example above illustrates charging of Cc in various cases and how it can impact the delay of the switching net (labelled as N1). The example considers only a rising transition at net N1; a similar analysis holds for a falling transition.
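The three scenarios can be compared numerically. The following sketch computes the change in stored charge on Cg and Cc when N1 rises from 0 to Vdd; the function name, the scenario labels, and the capacitance values are hypothetical and chosen only for illustration. It does not model how the charge for Cc is shared between the victim driver and the aggressor.

```python
def charge_change(cg, cc, vdd, aggressor):
    """Change in stored charge on Cg and Cc when N1 rises from 0 to Vdd.

    aggressor: "steady", "same" (rises with N1) or "opposite" (falls).
    Returns (delta_q_cg, delta_q_cc).
    """
    aggressor_swing = {"steady": 0.0, "same": vdd, "opposite": -vdd}[aggressor]
    delta_q_cg = cg * vdd
    # The voltage across Cc is V(N1) - V(aggressor); its change is the
    # victim swing minus the aggressor swing.
    delta_q_cc = cc * (vdd - aggressor_swing)
    return delta_q_cg, delta_q_cc


# Hypothetical values: Cg = 2 fF, Cc = 1 fF, Vdd = 1 V (charge in fC).
for scenario in ("steady", "same", "opposite"):
    print(scenario, charge_change(2.0, 1.0, 1.0, scenario))
```

The Cc term is Cc * Vdd for a steady aggressor, zero when the aggressor switches in the same direction, and 2 * Cc * Vdd when it switches in the opposite direction, matching cases i to iii.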

Positive and Negative Crosstalk

Base delay calculation (without any crosstalk) assumes that driving cell provides all necessary charge for rail-to-rail transition of total capacitance of a net, Ctotal (= Cground + Cc). As described in previous subsection, charge required for coupling capacitance Cc is larger when coupled (aggressor) net and victim net are switching in opposite directions. The aggressor switching in opposite direction increases the amount of charge required from driving cell of victim net and increases delays for driving cell and interconnect for victim net.

Similarly, when coupled (aggressor) net and victim net are switching in the same direction, charge on Cc remains the same before and after transitions of the victim and aggressor. This reduces charge required from driving cell of the victim net. The delays for driving cell and interconnect for victim net are reduced.

As described above, concurrent switching of victim and aggressor affects the timing of victim transition. Depending on switching direction of the aggressor, the crosstalk delay effect can be positive (slow down victim transition) or negative (speed up victim transition).

An example of positive crosstalk delay effect is shown in Figure 3. The aggressor net is rising at the same time as the victim net has a falling transition. The aggressor net switching in the opposite direction increases the delay for the victim net. Positive crosstalk impacts the driving cell as well as the interconnect; the delay for both of these is increased.

Positive_crosstalk_delay

Figure 3  Positive crosstalk delay

The case of negative crosstalk delay is illustrated in Figure 4. The aggressor net is rising at the same time as the victim net. The aggressor net switching in the same direction as the victim reduces the delay of the victim net. Negative crosstalk affects the timing of the driving cell as well as the interconnect; the delay for both of these is reduced.

Negative_crosstalk_delay

Figure 4  Negative crosstalk delay

Note that the worst positive and worst negative crosstalk delays are computed for rise and fall delays separately. The worst sets of aggressors for rise max, rise min, fall max, and fall min delay with crosstalk are different in general.

Sequential cells timing models

Figure 1  Sequential cell timing arcs

Consider timing arcs of a sequential cell shown in Figure 1. For synchronous inputs, such as D pin (or SI, SE), there are following timing arcs:

i.  Setup check arc (rising and falling)

ii.  Hold check arc (rising and falling)

For asynchronous inputs, such as CDN pin, there are following timing arcs:

i.  Recovery check arc

ii.  Removal check arc

For synchronous output of a FF, such as Q or QN pins, there is following timing arc:

i.  CK-to-output propagation delay arc (rising and falling)

All of synchronous timing arcs are with respect to active edge of clock, edge of clock that causes sequential cell to capture data. In addition, clock pin and asynchronous pins such as clear, can have pulse width timing checks.

Sequential_cell_timing_arcs_normal

Timing_arcs_on_an_active_rising_clock_edge_normal

Figure 2  Timing arcs on an active rising clock edge

Figure 2 shows timing checks using various signal waveforms.

1.  Synchronous checks: Setup and Hold

Setup and hold synchronous timing checks are needed for proper propagation of data through sequential cells. These checks verify that data input is unambiguous at active edge of clock and proper data is latched in at active edge. These timing checks validate if data input is stable around active clock edge. The minimum time before active clock when data input must remain stable is called setup time. This is measured as time interval from the latest data signal crossing its threshold (normally 50% of Vdd) to active clock edge crossing its threshold (normally 50% of Vdd).

Similarly, hold time is the minimum time data input must remain stable just after active edge of clock. This is measured as time interval from active clock edge crossing its threshold to earliest data signal crossing its threshold. As mentioned previously, active edge of clock for a sequential cell is the rising or falling edge that causes sequential cell to capture data.

Example of Setup and Hold checks

Setup and hold constraints for a synchronous pin of a sequential cell are normally described in terms of 2D tables as illustrated below.

example_setup_hold_in_lib_0

example_setup_hold_in_lib_1

example_setup_hold_in_lib_2

Example above shows setup and hold constraints on input D pin with respect to rising edge of clock CK of a sequential cell. 2D models are in terms of transition times at constraint pin (D) and related pin (CK). 

Negative values in setup and hold checks

Notice that some of hold values in example above are negative. This is acceptable and normally happens when path from pin of FF to internal latch point for data is longer than corresponding path for clock. Thus, a negative hold check implies that data pin of FF can change ahead of clock pin and still meet hold time check.

Setup values of a FF can also be negative. This means that, at the pins of the FF, data can change after the clock pin and still meet the setup time check.

Can both setup and hold be negative? No. For setup and hold checks to be consistent, sum of setup and hold values should be positive. Thus, if setup (or hold) check contains negative values – corresponding hold (or setup) should be sufficiently positive so that setup plus hold value is a positive quantity.
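The consistency rule above is easy to state as a check. This is a minimal sketch with hypothetical values in ns; a real library would apply the rule to corresponding entries of the setup and hold tables.

```python
def setup_hold_consistent(setup, hold):
    """True when the setup plus hold window has positive width."""
    return setup + hold > 0


# Hypothetical constraint values in ns.
print(setup_hold_consistent(0.30, -0.10))   # negative hold, larger setup
print(setup_hold_consistent(-0.05, 0.20))   # negative setup, larger hold
print(setup_hold_consistent(-0.05, -0.02))  # both negative
```

Only the last pair is inconsistent, since both values are negative and their sum cannot be positive.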

Negative_value_for_hold_timing_check_normal

Figure 3  Negative value for hold timing check

For a FF, it is helpful to have a negative hold time on scan data input pins. This gives flexibility in terms of clock skew and can eliminate need for almost all buffer insertion for fixing hold violations in scan mode (scan mode is the one in which FFs are tied serially forming a scan chain – output of FF is typically connected to scan data input pin of the next FF in series; these connections are for testability).

Similar to setup or hold check on synchronous data inputs, there are constraint checks governing asynchronous pins.

2.  Asynchronous checks

Recovery and Removal checks

Asynchronous pins such as asynchronous clear or asynchronous set override any synchronous behavior of cell. When an asynchronous pin is active, output is governed by asynchronous pin and not by clock latching in data inputs. However, when asynchronous pin becomes inactive, active edge of clock starts latching in data input. Asynchronous recovery and removal constraint checks verify that asynchronous pin has returned unambiguously to an inactive state at next active clock edge.

Recovery time is the minimum time that an asynchronous input is stable after being de-asserted before next active clock edge.

Similarly, removal time is the minimum time after an active clock edge that asynchronous pin must remain active before it can be de-asserted.

Figure 4  Recovery (Removal) vs Setup (Hold)

Pulse width checks

In addition to synchronous and asynchronous timing checks, there is a check which ensures that pulse width at an input pin of a cell meets minimum requirement. For example, if width of pulse at clock pin is smaller than specified minimum, clock may not latch data properly. The pulse width checks can be specified for relevant synchronous and asynchronous pins also. The minimum pulse width checks can be specified for high pulse and for low pulse also.

Example of Recovery, Removal and Pulse width checks

Example_recovery_removal_pulse_widith_in_lib

Example_recovery_removal_pulse_widith_in_lib_1

An example of recovery time and removal time (both with respect to clock pin CK) and a pulse width check for an asynchronous clear pin CDN of a FF is given above.

3.  Propagation delay

Propagation delay of a sequential cell is from active edge of clock to a rising or falling edge on output. This is a non-unate timing arc as the active edge of clock can cause either a rising or a falling edge on output Q. Here is an example of a propagation delay arc for a negative edge-triggered FF, from clock pin CKN to output Q.

Example_propagation_delay_in_lib_0

Example_propagation_delay_in_lib_1

Crosstalk delay on timing verification

crosstalk_for_data_path_and_clock_path

Figure 1  Crosstalk in data and clock paths

1.  Setup analysis

  • Launch clock path sees positive crosstalk delay so that data is launched late.
  • Data path sees positive crosstalk delay so that it takes longer for data to reach destination (D pin in capture FF).
  • Capture clock path sees negative crosstalk delay so that data is captured by capture FF early.

Since launch and capture clock edges for a setup check are different (normally one clock cycle apart), common clock path can have different crosstalk contributions for launch and capture clock edges.

2.  Hold analysis

There is one important difference between hold and setup analyses related to crosstalk on common portion of clock path (launch and capture). Launch and capture clock edge are normally the same edge for hold analysis. Clock edge through common clock portion cannot have different crosstalk contributions for launch clock path and capture clock path. Therefore, the worst-case hold analysis removes crosstalk contribution from common clock path.
Setup analysis concerns two different edges of clock which may be impacted differently in time. Thus, common path crosstalk contributions are considered for both launch and capture paths during setup analysis.
  • Launch clock (not including common path) sees negative crosstalk delay so that data is launched early.
  • Data path sees negative crosstalk delay so that it reaches destination early (D pin in capture FF).
  • Capture clock (not including common path) sees positive crosstalk delay so that data is captured by capture FF late.
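The sign conventions for setup and hold analysis can be put side by side in a small model. The sketch below is an assumption-laden illustration: the Segment class, the split into common, launch, data, and capture segments, and all delay values are hypothetical, and the arrival times are measured relative to each clock edge at the source.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base: float    # delay without crosstalk
    xt_pos: float  # worst slow-down from crosstalk (>= 0)
    xt_neg: float  # worst speed-up from crosstalk (<= 0)


def setup_arrivals(common, launch, data, capture):
    """Latest data arrival and earliest capture clock arrival."""
    data_arrival = sum(s.base + s.xt_pos for s in (common, launch, data))
    clock_arrival = sum(s.base + s.xt_neg for s in (common, capture))
    return data_arrival, clock_arrival


def hold_arrivals(common, launch, data, capture):
    """Earliest data arrival and latest capture clock arrival.

    The same clock edge travels through the common segment for launch
    and capture, so its crosstalk contribution is left out.
    """
    data_arrival = common.base + sum(s.base + s.xt_neg for s in (launch, data))
    clock_arrival = common.base + capture.base + capture.xt_pos
    return data_arrival, clock_arrival


# Hypothetical delays in ns.
common = Segment(1.0, 0.25, -0.125)
launch = Segment(0.5, 0.125, -0.125)
data = Segment(4.0, 0.5, -0.25)
capture = Segment(0.5, 0.125, -0.125)

print("setup:", setup_arrivals(common, launch, data, capture))
print("hold: ", hold_arrivals(common, launch, data, capture))
```

For setup the common segment contributes a different crosstalk delta to each path, whereas for hold it contributes only its base delay to both.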

 

 

Timing verification

Two primary checks are setup and hold checks. Once a clock is defined at clock pin of a flip-flop (FF), setup and hold checks are automatically inferred for the FF.

Timing checks are generally performed at multiple conditions including worst-case slow condition and best-case fast condition. Typically, worst-case slow condition is critical for setup check and best-case fast condition is critical for hold check – though hold check may be performed at worst-case slow condition also.

1.  Setup timing check

A setup timing check verifies timing relationship between clock and data pin of a FF so that setup requirement is met. In other words, setup check ensures that data is available at input of FF before it is clocked into the FF. The data should be stable for a certain amount of time, namely setup time of FF, before active edge of clock arrives at the FF.

Setup requirement ensures that data is captured reliably into FF.

setup_requirement_of_a_flip-flop

Figure 1  Setup requirement of a FF

In general, there is a launch FF, which launches data, and a capture FF, which captures data and whose setup time must be satisfied. The setup check validates the long (or max) path from launch FF to capture FF. The clocks to the two FFs can be the same or can be different.

data_and_clock_signals_for_setup_timing_check

Figure 2  Data and clock signals for setup check

Tlaunch + Tck2q + Tdp < Tcapture + Tcycle - Tsetup

Setup check imposes a max constraint for paths to the data pin on the capture FF; the slowest path to the D pin of the capture FF needs to be determined. This implies that setup check is verified using the slowest paths. Thus, setup check is typically performed at the slow timing corner.
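The setup inequality translates directly into a slack computation. This is a minimal sketch; the function name and the delay values (in ns) are hypothetical.

```python
def setup_slack(t_launch, t_ck2q, t_dp, t_capture, t_cycle, t_setup):
    """Setup slack for Tlaunch + Tck2q + Tdp < Tcapture + Tcycle - Tsetup."""
    arrival = t_launch + t_ck2q + t_dp
    required = t_capture + t_cycle - t_setup
    return required - arrival


# Hypothetical values in ns.
slack = setup_slack(t_launch=0.5, t_ck2q=0.25, t_dp=6.0,
                    t_capture=0.5, t_cycle=10.0, t_setup=0.25)
print("setup slack:", slack)  # 3.5
```

A positive slack means the inequality holds; a negative slack is a setup violation on the slowest path.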

2.  Hold timing check

Hold timing check ensures that a FF output value that is changing does not pass through to a capture FF and overwrite its output before FF has a chance to capture its original value. 
Hold specification of a FF requires that data being latched should be held stable for a specified amount of time after active edge of clock.

 

hold_requirement_of_a_flip-flop

Figure 3  Hold requirement of a FF

data_and_clock_signals_for_hold_timing_check

Figure 4  Data and clock signals for hold check

Tlaunch + Tck2q + Tdp > Tcapture + Thold

Hold check imposes a min constraint for paths to the data pin on the capture FF; the fastest path to the D pin of the capture FF needs to be determined. This implies that hold check is verified using the fastest paths. Thus, hold check is typically performed at the fast timing corner.
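The hold inequality can be evaluated the same way as a slack. This is a minimal sketch; the function name and the delay values (in ns) are hypothetical.

```python
def hold_slack(t_launch, t_ck2q, t_dp, t_capture, t_hold):
    """Hold slack for Tlaunch + Tck2q + Tdp > Tcapture + Thold."""
    arrival = t_launch + t_ck2q + t_dp
    required = t_capture + t_hold
    return arrival - required


# Hypothetical values in ns.
slack = hold_slack(t_launch=0.5, t_ck2q=0.25, t_dp=0.125,
                   t_capture=0.5, t_hold=0.25)
print("hold slack:", slack)  # 0.125
```

Note that the clock period does not appear: launch and capture use the same clock edge, so only the fastest path delay matters.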

Configure STA environment

  1. What’s STA environment?
  2. Specifying Clocks. Clock uncertainty and Clock latency
  3. Generated clocks
  4. Input paths constraint
  5. Output paths constraint 
  6. Timing path groups
  7. External attributes modeling 
  8. Check design rules
  9. Refine timing analysis
  10. Point-to-point specification

This section describes how to set up the environment for static timing analysis. Specification of correct constraints is important in analyzing STA results. The design environment should be specified accurately so that STA can identify all the timing issues in the design. Preparing for STA involves setting up clocks, specifying IO timing characteristics, and specifying false paths and multicycle paths.

1.  What’s STA environment?

A_synchronous_design

Figure 1  A synchronous design

Most digital designs are synchronous where the data computed from previous clock cycle is latched in the flip-flops at the active clock edge. Consider a typical synchronous design shown in Figure 1. It is assumed that Design Under Analysis (DUA) interacts with other synchronous designs. This means that DUA receives data from a clocked flip-flop and outputs data to another clocked flip-flop external to DUA.

To perform STA on this design, one needs to specify clocks to the flip-flops, and timing constraints for all paths leading into the design and for all paths exiting the design.

The example in Figure 1 assumes that there is only one clock and that C1, C2, C3, C4, and C5 represent combinational blocks. The combinational blocks C1 and C5 are outside of the design being analyzed.

In a typical design, there can be multiple clocks with many paths from one clock domain to another. The following sections describe how the environment is specified in such scenarios.

2.  Specifying Clocks

To define a clock, we need to provide the following information:

i. Clock source: it can be a port of design, or be a pin of a cell inside design (typically that is a part of a clock generation logic).

ii. Period: time period of clock.

iii. Duty cycle: high duration (positive phase) and low duration (negative phase).

iv. Edge times: times for rising edge and falling edge.

A_clock_definition

Figure 2  A clock definition

Figure 2  shows basic definitions. By defining clocks, all the internal timing paths (all flip-flop to flip-flop paths) are constrained; this implies that all internal paths can be analyzed with just the clock specifications. The clock specification specifies that a flip-flop to flip-flop path must take one cycle. We shall later describe how this requirement (of one cycle timing) can be relaxed.

Here is a basic clock specification.

create_clock \
-name SYSCLK \
-period 20 \
-waveform { 0 5 } \
[get_ports SCLK]

The name of the clock is SYSCLK and is defined at the port SCLK. The period of SYSCLK is specified as 20 units – the default time unit is nanoseconds if none has been specified. (In general, time unit is specified as part of technology library.) The first argument in waveform specifies time at which rising edge occurs and the second argument specifies time at which falling edge occurs.

There can be any number of edges specified in a waveform option; however, all edges must be within one period. The edge times alternate starting from the first rising edge after time zero, then a falling edge, then a rising edge, and so on. This implies that all time values in the edge list must be monotonically increasing.

-waveform {time_rise time_fall time_rise time_fall ... }

In addition, there must be an even number of edges specified. The waveform option specifies waveform within one clock period, which then repeats itself.

If no waveform option is specified, default is:

-waveform { 0 , period/2 }

Here is an example of a clock specification with no waveform specification.

create_clock -period 5 [ get_ports SCAN_CLK ]

In this specification, since no -name option is specified, the name of clock is the same as the name of the port, which is SCAN_CLK.

Clock_specification_example

Figure 3  Clock specification example

Here is another example of a clock specification in which the edges of the waveform are in the middle of a period.

create_clock -name BDYCLK -period 15 \
-waveform { 5 12 } [get_ports GBLCLK]

Clock_specification_with_arbitray_edges

Figure 4  Clock specification with arbitrary edges

The name of the clock is BDYCLK and it is defined at the port GBLCLK. In practice, it is a good idea to keep the clock name the same as the port name.

Here are some more clock specifications.

# See Figure 5a:
create_clock -period 10 -waveform { 5 10 } [get_ports FCLK]
# Creates a clock with the rising edge at 5ns and the falling edge at 10ns.

# See Figure 5b:
create_clock -period 125 \
-waveform { 100 150 } [get_ports ARMCLK]
# Since the first edge has to be a rising edge,
# the edge at 100ns is specified first and then the falling
# edge at 150ns is specified. The falling edge at 25ns is 
# automatically inferred.

Example_clock_waveforms

Figure 5  Example clock waveform

# See Figure 6a:
create_clock -period 1.0 -waveform { 0.5 1.375 } MAIN_CLK
# The first rising edge and the next falling edge are
# specified. Falling edge at 0.375ns is inferred 
# automatically.

# See Figure 6b:
create_clock -period 1.2 -waveform { 0.3 0.4 0.8 1.0 } JTAG_CLK
# Indicates a rising edge at 300ps, a falling edge at 400ps
# a rising edge at 800ps and a falling edge at 1ns, this
# pattern is repeated every 1.2ns.

Example_with_general_clock_waveforms

Figure 6 Example with general clock waveform
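The inferred edges in the examples above follow from wrapping each edge time into a single period. The sketch below shows that arithmetic; the function name and the validation rules are assumptions based on the description in this section, not the behavior of a particular tool.

```python
def edges_in_first_period(period, waveform):
    """Map -waveform edge times into [0, period) and label them.

    Edges alternate rise, fall, rise, ... starting with a rising edge.
    """
    if len(waveform) % 2 != 0:
        raise ValueError("waveform needs an even number of edges")
    if any(b <= a for a, b in zip(waveform, waveform[1:])):
        raise ValueError("edge times must be increasing")
    edges = []
    for i, t in enumerate(waveform):
        kind = "rise" if i % 2 == 0 else "fall"
        edges.append((t % period, kind))
    return sorted(edges)


print(edges_in_first_period(125, [100, 150]))   # ARMCLK: fall at 25, rise at 100
print(edges_in_first_period(1.0, [0.5, 1.375])) # MAIN_CLK: fall at 0.375, rise at 0.5
print(edges_in_first_period(1.2, [0.3, 0.4, 0.8, 1.0]))  # JTAG_CLK: unchanged
```

For ARMCLK the falling edge specified at 150ns wraps to 25ns, and for MAIN_CLK the falling edge at 1.375ns wraps to 0.375ns, as stated in the comments of the clock specifications.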

2.1  Clock uncertainty

The timing uncertainty of a clock period can be specified using the set_clock_uncertainty specification. The uncertainty can be used to model various factors that can reduce the effective clock period. These factors can be the clock jitter and any other pessimism that one may want to include for timing analysis.

set_clock_uncertainty -setup 0.2 [get_clocks CLK_CONFIG]
set_clock_uncertainty -hold 0.05 [get_clocks CLK_CONFIG]

Note that clock uncertainty for setup effectively reduces available clock period by specified amount as illustrated in Figure 7. For hold checks, clock uncertainty for hold is used as an additional timing margin that needs to be satisfied.

Specifying_clock_uncertainty

Figure 7  Specifying clock uncertainty
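As a rough worked example of the setup case, the Tcl sketch below computes the period left for logic once the 0.2ns setup uncertainty above is applied. The 10ns period for CLK_CONFIG is an assumption chosen for illustration; it is not given in this section.

```tcl
# Assumed period for CLK_CONFIG (hypothetical value, in ns).
set period    10.0
# Setup and hold uncertainty values from the commands above (ns).
set setup_unc 0.2
set hold_unc  0.05

# Setup: the capture edge is effectively pulled in by setup_unc,
# so the data path has (period - setup_unc) available.
set effective_period [expr {$period - $setup_unc}]
puts "effective period for setup checks: $effective_period ns"

# Hold: the uncertainty is an extra margin on top of the hold requirement.
puts "extra hold margin required: $hold_unc ns"
```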

The following commands specify uncertainty to be used on paths crossing specified clock boundaries, called inter-clock uncertainty.

set_clock_uncertainty -from VIRTUAL_SYS_CLK -to SYS_CLK \
   -hold 0.05
set_clock_uncertainty -from VIRTUAL_SYS_CLK -to SYS_CLK \
   -setup 0.3
set_clock_uncertainty -from SYS_CLK -to CFG_CLK -hold 0.05
set_clock_uncertainty -from SYS_CLK -to CFG_CLK -setup 0.1

Figure 8 shows a path between two different clock domains, SYS_CLK and CFG_CLK. Based on the inter-clock uncertainty specifications above, 100ps is used as the uncertainty for setup checks and 50ps as the uncertainty for hold checks on this path.

Inter-clock_path

Figure 8  Inter-clock paths

2.2  Clock latency

Latency of a clock can be specified using the set_clock_latency command.

# Rise clock latency on MAIN_CLK is 1.8ns:
set_clock_latency 1.8 -rise [get_clocks MAIN_CLK]
# Fall clock latency on all clocks is 2.1ns:
set_clock_latency 2.1 -fall [all_clocks]
# The -rise and -fall options refer to the edge at the clock pin of a
# flip-flop.

There are two types of clock latency: network latency and source latency. Network latency is the delay from clock definition point (create_clock) to clock pin of a flip-flop. Source latency, also called insertion delay, is the delay from clock source to clock definition point. Source latency could represent either on-chip or off-chip latency. Figure 9 shows both the scenarios. The total clock latency at the clock pin of a flip-flop is the sum of source and network latency.

Here are some example commands that specify source and network latency.

# Specify a network latency (no -source option) of 0.8ns 
# for rise, fall, max and min:
set_clock_latency 0.8 [get_clocks CLK_CONFIG] 
# Specify a source latency:
set_clock_latency 1.9 -source [get_clocks SYS_CLK]
# Specify a min source latency:
set_clock_latency 0.851 -source -min [get_clocks CFG_CLK]
# Specify a max source latency:
set_clock_latency 1.322 -source -max [get_clocks CFG_CLK]

Two_type_clock_latency

Figure 9 Clock latency
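To illustrate how the two components add up, the sketch below puts both a network latency and a source latency on the same clock. The values are reused from the commands above but applied to a single clock purely for illustration.

```tcl
# Network latency: clock definition point to flip-flop clock pins.
set_clock_latency 0.8 [get_clocks CLK_CONFIG]
# Source latency: clock source to clock definition point.
set_clock_latency 1.9 -source [get_clocks CLK_CONFIG]
# Total latency seen at a flip-flop clock pin: 1.9 + 0.8 = 2.7ns.
```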

3.  Generated clocks

A generated clock is a clock derived from a master clock. A master clock is a clock defined using the create_clock specification.

When a new clock is generated in a design that is based on a master clock, the new clock can be defined as a generated clock. For example, if there is a divide-by-3 circuitry for a clock, one would define a generated clock definition at the output of this circuitry. This definition is needed as STA does not know that the clock period has changed at the output of the divide-by logic, and more importantly what the new clock period is. Figure 10 shows an example of a generated clock which is a divide-by-2 of the master clock, CLKP.

create_clock -name CLKP -period 10 [get_pins UPLL0/CLKOUT]
# Create a master clock with name CLKP of period 10ns
# with 50% duty cycle at the CLKOUT pin of the PLL.
create_generated_clock -name CLKPDIV2 -source UPLL0/CLKOUT -divide_by 2 [get_pins UFF0/Q]
# Creates a generated clock with name CLKPDIV2 at the Q
# pin of flip-flop UFF0. The master clock is at the CLKOUT 
# pin of PLL. Period of generated clock is double that of 
# clock CLKP, that is, 20ns.

Generated_clock_at_output_of_divider

Figure 10  Generated clock at output of divider

Can a new clock (a master clock) be defined at the output of the flip-flop instead of a generated clock? The answer is yes; however, there are some disadvantages. Defining a master clock instead of a generated clock creates a new clock domain. This is not a problem in general, except that there are more clock domains to deal with when setting up the constraints for STA. Defining the new clock as a generated clock does not create a new clock domain, and the generated clock is considered to be in phase with its master clock. The generated clock does not require additional constraints to be developed. Thus, one should attempt to define a new internally generated clock as a generated clock instead of setting it as another master clock.

Another important difference between a master clock and a generated clock is the notion of clock origin. In a master clock, the origin of the clock is at the point of definition of the master clock. In a generated clock, the clock origin is that of the master clock and not that of the generated clock. This implies that in a clock path report, the start point of a clock path is always the master clock definition point. This is a big advantage of a generated clock over defining a new master clock as the source latency is not automatically included for the case of a new master clock.

Figure 11 shows an example where the clock SYS_CLK is gated by the output of a flip-flop. Since the output of the flip-flop may not be a constant, one way to handle this situation is to define a generated clock at the output of the AND cell which is identical to the input clock.

 Clock_gated_by_a_flip-flop

Clock_gated_by_a_flip-flop_update

Figure 11  Clock gated by a flip-flop*

* The left flip-flop may need to be clocked on CKN (the inverted clock); otherwise the clock gating hold requirement would not be met. See Check clock gating for details.

create_clock -period 0.1 [get_ports SYS_CLK]
# Create a master clock of period 100ps with 50% duty 
# cycle.
create_generated_clock -name CORE_CLK -divide_by 1 \
   -source SYS_CLK [get_pins UAND1/Z]
# Create a generated clock called CORE_CLK at the output of
# the AND cell and the clock waveform is the same as that
# of the master clock.

Master_clock_and_multiply-by-2_generated_clock

Figure 12  Master clock and multiply-by-2 generated clock

create_clock -period 10 -waveform { 0 5 } [get_ports PCLK]
# Create a master clock with name PCLK of period 10ns
# with rise edge at 0ns and fall edge at 5ns.
create_generated_clock -name PCLKx2 \
   -source [get_ports PCLK] \
   -multiply_by 2 [get_pins UCLKMULTREG/Q]
# Creates a generated clock called PCLKx2 from the master 
# clock PCLK and the frequency is double that of the master
# clock. The generated clock is defined at the output of 
# the flip-flop UCLKMULTREG.

Note that -multiply_by and -divide_by options refer to frequency of clock, even though a clock period is specified in a master clock definition.

Clock_generation

Figure 13  Clock generation

Figure 13 shows an example of generated clocks. A divide-by-2 clock and two out-of-phase clocks are generated. The waveforms of these clocks are also shown in the figure.

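create_clock -period 2 [get_ports DCLK]
# Name of clock is DCLK, has period of 2ns with a rise edge
# at 0ns and a fall edge at 1ns.
# -edges lists master clock edges, numbered from 1 (edge 1 is
# the first rise at 0ns, edge 2 the fall at 1ns, and so on), that
# form the rise, fall and next rise of the generated clock.
create_generated_clock -name DCLKDIV2 -edges {2 4 6} \
   -source DCLK [get_pins UBUF2/Z]
# DCLKDIV2: rise at 1ns, fall at 3ns, next rise at 5ns.
create_generated_clock -name PH0CLK -edges {3 4 7} \
   -source DCLK [get_pins UAND0/Z]
# PH0CLK: rise at 2ns, fall at 3ns, next rise at 6ns.
create_generated_clock -name PH1CLK -edges {1 2 5} \
   -source DCLK [get_pins UAND1/Z]
# PH1CLK: rise at 0ns, fall at 1ns, next rise at 4ns.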

 Clock Latency for Generated Clocks

Latency_on_generated_clock

Figure 14  Latency on generated clock

A generated clock can have another generated clock as its source; that is, one can have generated clocks of generated clocks, and so on. However, a generated clock can have only one master clock.
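As a sketch of a generated clock derived from another generated clock, the commands below further divide CLKPDIV2 from the earlier divide-by-2 example. The flip-flop UFF1 and the clock name CLKPDIV4 are hypothetical.

```tcl
# CLKPDIV2 is generated from master clock CLKP (period 10ns).
create_generated_clock -name CLKPDIV2 -source [get_pins UPLL0/CLKOUT] \
   -divide_by 2 [get_pins UFF0/Q]
# CLKPDIV4 uses the CLKPDIV2 definition point as its source.
# It still traces back to the single master clock CLKP; its period is 40ns.
create_generated_clock -name CLKPDIV4 -source [get_pins UFF0/Q] \
   -divide_by 2 [get_pins UFF1/Q]
```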

Typical Clock Generation Scenario

Clock_distribution_in_a_tyical_ASIC

Figure 15  Clock distribution in a typical ASIC

Figure 15 shows a scenario of how a clock distribution may appear in a typical ASIC. The oscillator is external to the chip and produces a low frequency (10-50 MHz typical) clock which is used as a reference clock by on-chip PLL to generate a high-frequency low-jitter clock (200-800 MHz typical). This PLL clock is then fed to a clock divider logic that generates required clocks for ASIC.

On some of the branches of the clock distribution, there may be clock gates that are used to turn off the clock to an inactive portion of design to save power when necessary. PLL can also have a multiplexer at its output so that the PLL can be bypassed if necessary. A master clock is defined for the reference clock at the input pin of chip where it enters the design, and a second master clock is defined at the output of PLL. PLL output clock has no phase relationship with reference clock. Therefore, output clock should not be a generated clock of reference clock. Most likely, all clocks generated by the clock divider logic are specified as generated clocks of the master clock at PLL output.
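The scenario above might be constrained along the following lines. All object names (REFCLK, UPLL0/CLKOUT, UDIV0/Q, UDIV1/Q), periods and divide ratios are hypothetical; Figure 15 does not give them.

```tcl
# Master clock for the external reference at the chip input port
# (20 MHz assumed, period 50ns).
create_clock -name REFCLK -period 50 [get_ports REFCLK]
# Second master clock at the PLL output (500 MHz assumed, period 2ns).
# It is not defined as a generated clock of REFCLK, since the two
# have no phase relationship.
create_clock -name PLLCLK -period 2 [get_pins UPLL0/CLKOUT]
# Divider outputs are generated clocks of the PLL output clock.
create_generated_clock -name PLLCLKDIV2 -source [get_pins UPLL0/CLKOUT] \
   -divide_by 2 [get_pins UDIV0/Q]
create_generated_clock -name PLLCLKDIV4 -source [get_pins UPLL0/CLKOUT] \
   -divide_by 4 [get_pins UDIV1/Q]
```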

4.  Input paths constraint

STA cannot check any timing on a path that is not constrained. Thus, all paths should be constrained to enable their analysis.

Input_port_timing_path

Figure 16  Input port timing path

Figure 16 shows an input path of Design Under Analysis (DUA). Flip-flop UFF0 is external to DUA and provides data to flip-flop UFF1 which is internal to DUA. Data is connected through input port INP1.

set Tclk2q 0.9 
set Tc1    0.6
set_input_delay -clock CLKA -max [expr {$Tclk2q + $Tc1}] \
   [get_ports INP1]

The constraint specifies that the external delay on input INP1 is 1.5ns with respect to clock CLKA. (The input delay is the part of the data path delay that lies outside the DUA.) Assuming the clock period for CLKA is 2ns, the logic driven by INP1 has only 500ps (= 2ns - 1.5ns) available for propagating inside the DUA. That is, Tc2 + Tsetup <= 500ps must hold for flip-flop UFF1 to reliably capture data launched by flip-flop UFF0.
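The -max value above constrains the setup check only. A matching -min value, used for the hold check, could be added as sketched below; the minimum delay values are assumptions, not taken from Figure 16.

```tcl
# Hypothetical minimum clock-to-Q delay of UFF0 and minimum
# delay of the external logic, in ns.
set Tclk2q_min 0.5
set Tc1_min    0.3
set_input_delay -clock CLKA -min [expr {$Tclk2q_min + $Tc1_min}] \
   [get_ports INP1]
```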

5.  Output paths constraint

Example A

Output_port_timing_path_a

Figure 17  Output timing path

set Tc2  3.9
set Tsetup 1.1
set_output_delay -clock CLKQ -max [expr {$Tc2 + $Tsetup}] \
   [get_ports OUTB]

Example B

Output_port_timing_path_b_max_min_delays

Figure 18  Output timing path Max Min delays

Tc2max + Tsetup = 7ns + 0.4ns = 7.4ns

Tc2min – Thold = 0 – 0.2ns = -0.2ns

create_clock -period 20 -waveform {0 15} [get_ports CLKQ]
set_output_delay -clock CLKQ -min -0.2 [get_ports OUTC]
set_output_delay -clock CLKQ -max 7.4 [get_ports OUTC]

Example C

Input_output_timing_path

Figure 19  Input and output timing path

create_clock -period 100 -waveform {5 55} [get_ports MCLK]
set_input_delay 25 -max -clock MCLK [get_ports DATAIN]
set_input_delay 5 -min -clock MCLK [get_ports DATAIN]
set_output_delay 20 -max -clock MCLK [get_ports DATAOUT]
set_output_delay -5 -min -clock MCLK [get_ports DATAOUT]

6.  Timing path groups

 Timing_paths

Figure 20  Timing paths

Path_groups

Figure 21  Path groups

Timing paths in a design can be considered as a collection of paths. Each path has a startpoint and an endpoint.

In STA, paths are timed based on valid startpoints and valid endpoints. Valid startpoints are input ports and clock pins of synchronous devices, such as flip-flops and memories. Valid endpoints are output ports and data input pins of synchronous devices. Thus, a valid timing path can be:

i.  an input port —> an output port,

A —> Z

ii.  an input port —> a data input pin of a flip-flop (FF) or a memory,

A —> UFFA/D

iii.  a clock pin of FF —> a data input of FF,

UFFA/CLK —> UFFB/D

iv.  a clock pin of FF —> an output port,

UFFB/CLK —> Z

Timing paths are sorted into path groups by the clock associated with endpoint of the path. Thus, each clock has a set of paths associated with it. There is also a default path group that includes all non-clocked (asynchronous) paths.

  • CLKA group: A —> UFFA/D.
  • CLKB group: UFFA/CK —> UFFB/D.
  • DEFAULT group: A —> Z, UFFB/CK —> Z.
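Besides the automatic per-clock groups, many STA tools let paths be placed into user-defined groups with a group_path command. The sketch below assumes such a command with -name, -from and -to options is available in the tool being used; the group names are hypothetical.

```tcl
# Collect all paths starting at input port A in their own group.
group_path -name INPUTS -from [get_ports A]
# Collect all paths ending at output port Z in their own group.
group_path -name OUTPUTS -to [get_ports Z]
```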

7.  External attributes modeling 

While create_clock, set_input_delay and set_output_delay are enough to constrain all paths in a design for timing analysis, they are not enough to obtain accurate timing for the IO pins of a block. The following attributes are also required to accurately model the environment of a design. For inputs, one needs to specify the slew at the input. This information can be provided using:

  • set_driving_cell
  • set_input_transition

For outputs, one needs to specify the capacitive load seen by the output. This is specified using:

  • set_load

set_input_transition_specification_representation

Figure 22  set_input_transition specification representation

set_input_transition 0.85 [get_ports INPC]
# Specifies an input transition of 850ps on port INPC.
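As an alternative to a fixed transition value, set_driving_cell lets the tool derive the input slew from a library cell assumed to drive the port. In the sketch below the cell name BUFX4, its output pin Z and the port INPB are hypothetical.

```tcl
# Model port INPB as being driven by output pin Z of library cell BUFX4.
set_driving_cell -lib_cell BUFX4 -pin Z [get_ports INPB]
```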

set_load_specification_representation

Figure 23  Capacitive load on output port

set_load 5 [get_ports OUTX]
# Place a 5pF load on output port OUTX

The set_load specification can also be used to specify a load on an internal net in the design.

set_load 0.25 [get_nets UCNT5/NET6]
# Set net capacitance to be 0.25pF.

8.  Check design rules

Two frequently used design rules for STA are max transition and max capacitance. These rules check that all ports and pins in the design meet the specified limits for transition time and capacitance.

  • set_max_transition
  • set_max_capacitance
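A minimal sketch of how these two rules might be applied design-wide is given below. The limit values are assumptions, and their units depend on the library in use.

```tcl
# Limit transition time to 0.6 (library time units) on the current design.
set_max_transition 0.6 [current_design]
# Limit capacitance to 0.5 (library capacitance units) on the current design.
set_max_capacitance 0.5 [current_design]
```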

9.  Refine timing analysis

Four common commands that are used to constrain analysis are:

i.  set_case_analysis: Specify constant value on a pin of a cell, or on an input port.

ii.  set_disable_timing: Break a timing arc of a cell.

iii.  set_false_path: Specify paths that are not real which implies that these paths are not checked in STA.

iv.  set_multicycle_path: Specify paths that can take longer than one clock cycle.

9.1  Specify inactive signals

In a design, certain signals have a constant value in a specific mode of chip. For example, if a chip has DFT logic in it, then Scan pin of chip should be at 0 in normal functional mode.

set_case_analysis_0_scan_for_functional_mode
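A sketch of the corresponding constraint is shown below. The port name SCAN_MODE is hypothetical; the actual scan control name depends on the design.

```tcl
# Hold the scan control at 0 so that only functional-mode paths are analyzed.
set_case_analysis 0 [get_ports SCAN_MODE]
```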

9.2  Break timing arcs in cells

Apply set_disable_timing to break timing arcs. For example, the timing arcs through a delay element are not real timing paths in DDR PHY dataslice-level STA.

set_disable_timing_dll_delay_element_in_dataslice_simple
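In command form, breaking a single arc of a delay cell might look like the sketch below. The instance name UDLY0 and the pin names A and Z are hypothetical.

```tcl
# Disable only the A -> Z arc of the delay cell instance UDLY0.
set_disable_timing -from A -to Z [get_cells UDLY0]
```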

Note: use set_disable_timing with caution, as it removes all timing paths through the specified pins. Where possible, it is preferable to use set_false_path and set_case_analysis instead.

In fact, set_false_path can replace set_disable_timing in some situations. For example, if set_false_path is applied while hardening the delay element, there is no need to apply set_disable_timing at the data slice level.

set_false_path_in_dll_delay_element_simple
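A sketch of the set_false_path alternative follows. The pin name UDLY0/Z is hypothetical and stands for the delay element output.

```tcl
# Mark every path passing through the delay element output as false.
set_false_path -through [get_pins UDLY0/Z]
```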

9.3  Multicycle paths

In some cases, the data path between two flip-flops might take more than one clock cycle to propagate through the logic. In such cases, this combinational data path is declared as a multicycle path. Even though data is captured by the capture flip-flop on every clock edge, we direct STA that the relevant capture edge occurs after the specified number of clock cycles.

A_three-cycle_multicycle_path

Figure 24  A three-cycle multicycle path

Figure 24 shows an example. Since the data path takes 3 clock cycles, a setup multicycle check of 3 cycles should be specified. The multicycle setup constraints are given below.

create_clock -name CLKM -period 10 [get_ports CLKM] 
set_multicycle_path 3 -setup \
   -from [get_pins UFF0/Q] \
   -to [get_pins UFF1/D]

The hold check should stay where it was in the single-cycle setup case, which is the one shown in Figure 24. This ensures that data is free to change at any time during the 3 cycles. In the absence of a hold multicycle specification, the default hold check is done on the active edge prior to the setup capture edge, which is not the intent. The hold check needs to move back 2 cycles from the default hold check edge, hence a hold multicycle of 2 is specified. The intended behavior is shown in Figure 25.

set_multicycle_path 2 -hold \
   -from [get_pins UFF0/Q] \
   -to [get_pins UFF1/D]

Hold_check_moved_back_to_launch_edge

Figure 25  Hold check moved back to launch edge

The number of cycles denoted on a multicycle hold specifies how many clock cycles to move back from its default hold check edge which is one active edge prior to setup capture edge.

In most designs, if the max path (or setup) requires N clock cycles, it is not feasible to meet a min path constraint of more than (N-1) clock cycles.

Thus, in most designs, a multicycle setup specified as N cycles should be accompanied by a multicycle hold constraint specified as N-1 cycles.
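The N and N-1 pairing can be written generically as in the sketch below. The value N = 4 is a hypothetical example; the pin names are reused from the three-cycle example.

```tcl
# Hypothetical example: a path that needs N = 4 cycles.
set N 4
set_multicycle_path $N -setup \
   -from [get_pins UFF0/Q] \
   -to [get_pins UFF1/D]
# A hold multiplier of N-1 moves the hold check back to the launch edge.
set_multicycle_path [expr {$N - 1}] -hold \
   -from [get_pins UFF0/Q] \
   -to [get_pins UFF1/D]
```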

10.  Point-to-point specification

set_min_delay

set_max_delay

###########################################
### clk --> read_mem_dqs
###########################################
set_max_delay [expr {$PHY_THREEQUARTER - $skew_clk_to_read_mem_dqs_max}] -from [get_clocks clk_phase_0] -to [get_clocks read_mem_dqs*_phase_0]
set_min_delay [expr {$PHY_THREEQUARTER - $PHY_CLK_PERIOD + $skew_clk_to_read_mem_dqs_min}] -from [get_clocks clk_phase_0] -to [get_clocks read_mem_dqs*_phase_0]

Open question: does the delay in set_max_delay and set_min_delay refer to the skew between source clock latency and target clock latency, or to the data path delay?

Modeling Power Terminology

The power a circuit dissipates falls into two broad categories:

  • Static power
  • Dynamic power

Static Power

Static power is the power dissipated by a gate when it is not switching – that is, when it is inactive or static.

Static power is dissipated in several ways. The largest percentage of static power results from source-to-drain subthreshold leakage. This leakage is caused by reduced threshold voltages that prevent the gate from turning off completely. Static power also results when current leaks between the diffusion layers and substrate. For this reason, static power is often called leakage power.

Dynamic Power

Dynamic power is the power dissipated when a circuit is active. A circuit is active anytime the voltage on a net changes due to some stimulus applied to the circuit. Because voltage on a net can change without necessarily resulting in a logic transition, dynamic power can result even when a net does not change its logic state.

The dynamic power of a circuit is composed of

  • Internal power
  • Switching power

Internal Power

During switching, a circuit dissipates internal power by the charging or discharging of any existing capacitance internal to the cell. The definition of internal power includes power dissipated by a momentary short circuit between the P and N transistors of a gate, called short-circuit power.

Components_of_Power_Dissipation

Figure 1 Components of Power Dissipation

Figure 1 illustrates components of power dissipation and shows the cause of short-circuit power. In this figure, there is a slow rising signal at the gate input IN. As the signal makes a transition from low to high, the N-type transistor turns on and the P-type transistor turns off. However, during signal transition, both the P- and N-type transistors can be on simultaneously for a short time. During this time, current flows from VDD to GND, resulting in short-circuit power.

Short-circuit power varies according to the circuit. For circuits with fast transition times, the amount of short-circuit power can be small. For circuits with slow transition times, short-circuit power can account for up to 30 percent of the total power dissipated. Short-circuit power is also affected by the dimensions of the transistors and the load capacitance at the output of the gate.

In most simple library cells, internal power is due primarily to short-circuit power. For this reason, the terms internal power and short-circuit power are often considered synonymous.

Note:

A transition implies either a rising or a falling signal; therefore, if the power characterization involves running a full-cycle simulation, which includes both rising and falling signals, then you must average the energy dissipation measurement by dividing by 2.

Switching Power

The switching power, or capacitance power, of a driving cell is the power dissipated by the charging and discharging of the load capacitance at the output of the cell. The total load capacitance at the output of a driving cell is the sum of the net and gate capacitance on the driver.

Because such charging and discharging is the result of the logic transitions at the output of the cell, switching power increases as logic transitions increase. The switching power of a cell is a function of both the total load capacitance at the cell output and the rate of logic transitions.
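A commonly used first-order estimate ties these two factors together: each logic transition at the output dissipates roughly 0.5 x Cload x VDD^2, and the switching power is that energy multiplied by the transition rate. The Tcl sketch below evaluates this estimate; the formula is a textbook approximation rather than something stated in this section, and all numeric values are made up.

```tcl
set c_load      20e-15   ;# total load capacitance in farads (assumed)
set vdd         1.0      ;# supply voltage in volts (assumed)
set toggle_rate 100e6    ;# output transitions per second (assumed)

# Energy per transition times transitions per second.
set p_switch [expr {0.5 * $c_load * $vdd * $vdd * $toggle_rate}]
puts "estimated switching power: $p_switch W"
```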

Figure 1 shows how the capacitance (Cload) is charged and discharged as the N or P transistor turns on. Switching power accounts for 70 to 90 percent of the power dissipation of an active CMOS circuit.