



# Modelling TDC Circuit Performance for SPAD Sensor Arrays

Dr. Daniel Van Blerkom, Forza Silicon, June 2020

#### **Basic TDC operation**




#### Histogram of photon arrivals





- Designed for a scanning LiDAR system.
- Detects SPAD events over multiple capture cycles and builds up a histogram in memory.
- Two memory banks allow for simultaneous histogram capture and readout for analysis by DSP block.
- TDC timing resolution is determined by the TDC clock rate.
- Typical LiDAR would have an array of these blocks, one for each vertical pixel.
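The histogram build-up described above can be sketched as a small behavioral model. This is an illustrative simulation only, not the presented hardware: the function name `build_histogram`, the detection probabilities, and the random background model are all assumptions chosen for the example.

```python
import random

def build_histogram(tdc_clock_hz, n_bins, n_cycles, target_tof_s,
                    p_signal=0.3, p_noise_per_bin=0.001, seed=0):
    """Accumulate SPAD events into one bin per TDC clock period.

    p_signal and p_noise_per_bin are hypothetical per-cycle detection
    probabilities, not values from the presentation.
    """
    rng = random.Random(seed)
    # The bin index is the number of whole TDC clock periods before the event.
    target_bin = int(target_tof_s * tdc_clock_hz)
    hist = [0] * n_bins
    for _ in range(n_cycles):                 # one laser shot per capture cycle
        if target_bin < n_bins and rng.random() < p_signal:
            hist[target_bin] += 1             # returned laser photon
        for b in range(n_bins):               # uniform background events
            if rng.random() < p_noise_per_bin:
                hist[b] += 1
    return hist

# 500 MHz TDC clock, 1024 bins, 200 capture cycles, target at ~401 ns.
hist = build_histogram(500e6, 1024, 200, 401e-9)
peak_bin = hist.index(max(hist))
tof_estimate_s = (peak_bin + 0.5) / 500e6     # centre of the peak bin
```

A single capture cycle rarely separates the return from background; the peak only emerges after many cycles, which is why the histogram is accumulated in memory before the DSP block analyses it.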








#### **Study Goals**

- Investigate implementation of a TDC / histogram acquisition architecture in a 40 nm SPAD process using a digital flow
- Advantages
  - Timing closure inside digital tool
  - Area efficiency
  - Ease of revision
- Disadvantages
  - Speed / resolution limited compared to analog techniques
  - Timing variations & supply dependence





#### Study Goals (continued)

- Compare implementations of a 500 MHz and a 1 GHz design
  - Speed, power and area of design
  - Clocking & routing bottlenecks
  - Clock stability
- How does the design scale with faster TDC clock speeds?

| Specification            | 500 MHz design | 1 GHz design |
|--------------------------|----------------|--------------|
| Number of SPAD inputs    | 100            | 100          |
| SPAD aggregation         | SST            | SST          |
| Histogram length         | 2 usec         | 2 usec       |
| TDC clock                | 500 MHz        | 1 GHz        |
| Histogram bins           | 1024           | 2048         |
| Histogram bits           | 12 bits        | 12 bits      |
| Testability, SPI control | Included       | Included     |

SST = Synchronous Summation Technique
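The bin counts in the table follow from the histogram length and the TDC clock. The sketch below works through that arithmetic; rounding up to a power of two is an assumption that happens to match the 1024 and 2048 figures, and the function name `tdc_parameters` is hypothetical.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum

def tdc_parameters(tdc_clock_hz, histogram_length_s):
    """Derive bin count and range figures from the clock and histogram length."""
    bin_width_s = 1.0 / tdc_clock_hz
    bins_needed = round(histogram_length_s * tdc_clock_hz)
    # Assumption: memory depth is rounded up to the next power of two.
    bins_pow2 = 1 << (bins_needed - 1).bit_length()
    return {
        "bin_width_s": bin_width_s,
        "bins_needed": bins_needed,
        "bins_pow2": bins_pow2,
        # Round trip: the light travels to the target and back.
        "range_per_bin_m": C_M_PER_S * bin_width_s / 2.0,
        "max_range_m": C_M_PER_S * histogram_length_s / 2.0,
    }

design_500mhz = tdc_parameters(500e6, 2e-6)
design_1ghz = tdc_parameters(1e9, 2e-6)
```

Under these assumptions a 2 usec histogram needs 1000 bins at 500 MHz and 2000 bins at 1 GHz, and one bin corresponds to roughly 0.30 m and 0.15 m of range respectively.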



#### **Block Diagram**



#### Implemented in digital place & route tool



#### SPAD event summation (SST)

Synchronous Summation Technique (SST)



Patanwala et al., IISW 2019

The first operation on the incoming SPAD pulses is to latch them on the TDC clock.

After latching, we sum the number of pulses that occurred in that clock cycle.

The stability of the clock on the input latches is critical to the TDC timing accuracy.
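The latch-then-sum step can be illustrated with a minimal behavioral sketch. It assumes ideal latches that capture at most one event per SPAD per clock cycle; the function name `sst_counts` and the pulse-time input format are hypothetical and not taken from the cited paper.

```python
def sst_counts(pulse_times_s, tdc_clock_hz, n_cycles):
    """Return the number of SPADs that fired in each TDC clock cycle.

    pulse_times_s: one list of pulse arrival times (seconds) per SPAD.
    """
    counts = [0] * n_cycles
    for spad_pulses in pulse_times_s:
        latched_cycles = set()
        for t in spad_pulses:
            cycle = int(t * tdc_clock_hz)   # clock cycle that latches the pulse
            if 0 <= cycle < n_cycles:
                # Assumption: a latch holds a single bit, so repeated pulses
                # from the same SPAD in one cycle count once.
                latched_cycles.add(cycle)
        for cycle in latched_cycles:
            counts[cycle] += 1              # summation across SPAD inputs
    return counts

# Three SPADs at 500 MHz (2 ns cycles): two fire in cycle 0, one in cycle 2.
example = sst_counts([[1.0e-9, 1.5e-9], [1.2e-9], [5.0e-9]], 500e6, 4)
```

Because the cycle index depends directly on where the clock edge falls, any shift of the latch clock moves pulses near a cycle boundary into the neighbouring count.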





#### **TDC Clock Tree**

The timing on the clock tree to the first set of latches is critical – any shifts of this clock will cause SPAD pulses to change histogram bins, leading to TOF errors.

Supply variations modulate the delay of the clock tree - modelling the supply distribution is important.










### Pipelined SRAM operation to meet throughput requirement



- Histogram acquisition requires a read, summation, and write to each memory location.
- The SRAM IP is typically not fast enough to meet the TDC clock timing requirements.
- For 500 MHz operation, we need to pipeline two SRAMs to meet the read/write timing.
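One way to picture the pipelining is to interleave consecutive histogram bins across the SRAMs, so each SRAM has several TDC cycles to complete its read, summation, and write. The slides do not state how the SRAMs are arranged, so the interleaving scheme, the saturation at the 12-bit limit, and all names below are assumptions for illustration.

```python
MAX_COUNT = (1 << 12) - 1  # 12-bit histogram entries; saturation is assumed

def interleaved_update(srams, per_cycle_counts):
    """Add one capture cycle of SST counts into bin-interleaved SRAMs."""
    n_srams = len(srams)
    for cycle, count in enumerate(per_cycle_counts):
        sram = cycle % n_srams            # consecutive bins alternate SRAMs
        addr = cycle // n_srams
        old = srams[sram][addr]           # read
        srams[sram][addr] = min(old + count, MAX_COUNT)  # sum and write back

def read_back(srams, n_bins):
    """Reassemble the histogram in bin order."""
    n_srams = len(srams)
    return [srams[b % n_srams][b // n_srams] for b in range(n_bins)]

def rmw_window_s(tdc_clock_hz, n_srams):
    """Time each SRAM has per read-modify-write under this interleaving."""
    return n_srams / tdc_clock_hz

srams = [[0] * 3 for _ in range(2)]       # two SRAMs, 6 bins total
interleaved_update(srams, [1, 2, 3, 4, 5])
interleaved_update(srams, [1, 0, 0, 0, 1])
histogram = read_back(srams, 5)
```

Under this assumed scheme, two SRAMs at 500 MHz give each memory a 4 ns window per update instead of 2 ns.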



#### Complete Layout – 500 MHz TDC



- Complete design includes SPI register controls and testability (Scan and MBIST).
- Approximately 50% utilization of standard cell area.





#### Supply voltage drop

DC current estimate:

- 5 mA digital current
- 2 mA per SRAM cell
- 13 mA total current

Approximately 5 mV DC droop on supply and ground

This estimate does not include the resistance of the supply routes to the block.
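The current budget and droop can be cross-checked with simple arithmetic. The count of four SRAM instances is an inference from the 13 mA total, not a figure given in the source, and treating the grid as one lumped resistance per rail is a simplifying assumption.

```python
def total_dc_current_ma(digital_ma, sram_ma, n_srams):
    """Sum the digital and SRAM DC currents (mA)."""
    return digital_ma + sram_ma * n_srams

def effective_resistance_ohm(droop_mv, current_ma):
    """Lumped resistance implied by a DC droop (Ohm's law; mV / mA = ohm)."""
    return droop_mv / current_ma

# Assumption: 4 SRAM instances reproduce the quoted 13 mA total.
total_ma = total_dc_current_ma(digital_ma=5, sram_ma=2, n_srams=4)
r_eff = effective_resistance_ohm(droop_mv=5, current_ma=total_ma)
```

With these inputs the lumped resistance comes out a little under 0.4 ohm per rail, before adding the supply routes to the block.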







### Supply Resistance Map

- Multiple metal straps keep the resistance of the power distribution grid < 2 ohms down to Metal 2.
- More power busses and more internal decoupling lowers area utilization efficiency.



#### VDD (VSS similar)





#### Peak current draw and supply variation during operation

The transistor-level schematic was simulated with extracted capacitances to capture peak current and clock delays.

When the SRAM turns on, peak current draw increases ~50% and the supply drops 10 mV.





#### TDC clock tree delay



There is a 20 psec change in the delay through the clock tree when the SRAM is enabled.

The change in delay increases slightly to 25 psec for 2.5 ohms supply impedance.





## 1 GHz design



- The 1 GHz design includes a clock divider, which allows us to reduce the clock rate on circuits after the SRAM multiplexor.





### Pipelined SRAM for 1 GHz throughput



- For 1 GHz operation, we now need to pipeline four SRAMs to meet the read/write timing requirements.

#### Layout – 1 GHz TDC



Additional SRAMs and muxing logic increase layout area by 45%





#### VDD & VSS supply voltage drop – 1 GHz

DC current estimate:

- 7 mA digital current
- 2 mA per SRAM cell
- 23 mA total current

Approximately 10 mV DC droop on supply and ground.





#### Peak current draw & supply variation – 1 GHz design

Peak current increases by 50%, and supply droop increases by ~2x, versus the 500 MHz design.





#### TDC clock tree delay – 1 GHz design

- The clock tree delay changes by 35 psec when the SRAM is enabled.
- With 2.5 ohms of supply resistance, the delay change is 83 psec.
- The 1 GHz clock tree has more branches and devices, and is more sensitive to supply shifts.









#### Comparison of designs





| Specification            | 500 MHz design       | 1 GHz design         |
|--------------------------|----------------------|----------------------|
| Number of SPAD inputs    | 100                  | 100                  |
| SPAD aggregation         | SST                  | SST                  |
| Histogram length         | 2 usec               | 2 usec               |
| TDC clock                | 500 MHz              | 1 GHz                |
| Histogram bins           | 1024                 | 2048                 |
| Histogram bits           | 12 bits              | 12 bits              |
| Testability, SPI control | Included             | Included             |
| Area                     | 0.15 mm<sup>2</sup>  | 0.22 mm<sup>2</sup>  |
| Power                    | 14 mW                | 25 mW                |
| Peak Current             | 38 mA                | 58 mA                |
| Clock Shift              | 20 psec              | 35 psec              |
| Clock Shift (2.5 ohm)    | 25 psec              | 83 psec              |
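To put the clock shifts in the table in context, the sketch below expresses each shift as a fraction of the bin width and as an equivalent range error. The function name is hypothetical, and reading the fraction as the share of uniformly arriving pulses that change bins is a simplifying assumption.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum

def clock_shift_impact(tdc_clock_hz, shift_s):
    """Relate a clock-tree delay shift to bin width and range error."""
    bin_width_s = 1.0 / tdc_clock_hz
    return {
        # Assumption: pulses arrive uniformly within a bin, so this fraction
        # approximates the share of pulses pushed into the neighbouring bin.
        "fraction_of_bin": shift_s / bin_width_s,
        # A timing shift maps to half the distance light travels in that time.
        "range_error_m": C_M_PER_S * shift_s / 2.0,
    }

impacts = {
    "500 MHz": clock_shift_impact(500e6, 20e-12),
    "500 MHz (2.5 ohm)": clock_shift_impact(500e6, 25e-12),
    "1 GHz": clock_shift_impact(1e9, 35e-12),
    "1 GHz (2.5 ohm)": clock_shift_impact(1e9, 83e-12),
}
```

On these numbers the shift grows from about 1% of a bin (roughly 3 mm) in the 500 MHz design to about 8% of a bin (roughly 12 mm) in the 1 GHz design with 2.5 ohms of supply resistance, so the faster clock gives up part of its resolution gain to supply-induced timing shifts.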



#### **Power Routing Issues**



- SPADs & AFEs require biases, supplies and grounds which are very sensitive.
- Top metal routing resources and input pins must be dedicated to these signals, reducing the amount of power routing available to the digital circuitry.



### Clock jitter mitigation techniques

- Avoid big changes in supply current:
  - Algorithm should be designed to avoid bursts of activity
  - Clock gating insertion must be scrutinized carefully
- Improve supply connections & decoupling
  - Trade-off with layout efficiency
- Separate supplies and grounds
  - This can have diminishing returns, due to metal and pin limitations
- Use divided-down clocks to reduce power requirement
  - But beware slow clocks beating with the high speed TDC clock

A higher-speed TDC may not always lead to more accurate TOF results if timing jitter degrades.

