A 1Mjot 1040fps 0.22e-rms Stacked BSI Quanta Image Sensor with Cluster-Parallel Readout

Saleh Masoodian\(^1\), Jiaju Ma\(^1\), Dakota Starkey\(^1\), Yuichiro Yamashita\(^2\), and Eric R. Fossum\(^1\)

\(^1\)Thayer School of Engineering, Dartmouth College, Hanover, New Hampshire, USA
\(^2\)Taiwan Semiconductor Manufacturing Company (TSMC), Hsinchu, Taiwan
E-mail: saleh.masoodian.th@dartmouth.edu

Abstract— A 1Mjot single-bit quanta image sensor (QIS) implemented in a stacked backside-illuminated (BSI) process is presented. This is the first work to report a megapixel photon-counting CMOS-type image sensor to the best of our knowledge. A QIS with 1.1\(\mu\)m pitch tapered-pump-gate jots is implemented with cluster-parallel readout, where each cluster of jots is associated with its own dedicated readout electronics stacked under the cluster. Power dissipation is reduced with this cluster readout because of the reduced column bus parasitic capacitance, which is important for the development of 1Gjot arrays. The QIS functions at 1040fps with binary readout and dissipates only 17.6mW, including I/O pads. The readout signal chain uses a fully differential charge-transfer amplifier (CTA) gain stage before a 1b-ADC to achieve an energy/bit FOM of 16.1pJ/b and 6.9pJ/b for the whole sensor and gain stage+ADC, respectively. Analog outputs with on-chip gain are implemented for pixel characterization purposes.

I. INTRODUCTION

The quanta image sensor (QIS) was introduced in 2005 as a paradigm shift in image capture to take advantage of shrinking pixel sizes enabled by technological advancements [1]. The QIS contains a large number of sub-diffraction-limit, high-conversion-gain, low-full-well-capacity pixels, called “jots.” The key aspects of the QIS involve counting individual photoelectrons using the jots at high readout rates, representing this binary output as a bit cube (x,y,t), and finally, processing the bit cubes to form high dynamic range images [2]. The QIS concept is illustrated in Figure 1.

A QIS may contain over a billion jots, each producing just a small amount of signal per electron conversion, with a field readout rate 10-100 times faster than conventional CMOS image sensors.

The two biggest challenges in designing a gigajot QIS with a high field readout rate are realizing the jot devices and realizing high-speed, low-power readout circuits. Our research group has published several prior works addressing both challenges, most of which are covered in [2]. This paper presents the first practical stacked backside-illuminated (BSI) 1Mjot single-bit QIS, which achieves a frame rate of at least 1040fps.

II. SENSOR

Sensor Architecture

The simplified structure of the imager is shown in Figure 2. A stacked QIS uses two vertically stacked, electrically connected substrates, with the photodetectors on one and the circuits on the other. The jots are implemented on the detector substrate, and the readout and addressing circuits are located on the ASIC substrate. Each cluster of jots is associated with its own dedicated readout electronics stacked under the cluster. Twenty different 1Mjot arrays are implemented in this chip, each with a different variation of jot and readout design. Figure 3 shows a photograph of the QIS test chip. There are two classes of readout designs: one supporting analog readout for detailed characterization, and one supporting binary data readout at much higher field rates.

The jots are two-way shared (2Hx1V), and every 4096 jots (one cluster) share a 1-bit ADC (Figure 4). The imager has 256 1-bit ADCs in total, one for each of its 256 clusters. All clusters are read out in parallel, and 32 high-speed digital pads send the data off-chip.
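The cluster and pad counts above follow from simple arithmetic, sketched here in Python. The 1040fps field rate and the 1024x1024 array size are taken from elsewhere in this paper; the number of ADCs per pad is just the ratio 256/32 and is an assumption about how the outputs are grouped, not a stated design detail.

```python
# Array and cluster figures quoted in the text and in Table I.
ARRAY_H, ARRAY_V = 1024, 1024   # jots
JOTS_PER_CLUSTER = 4096         # jots sharing one 1-bit ADC
NUM_PADS = 32                   # high-speed digital output pads
FIELD_RATE_FPS = 1040           # binary readout field rate

total_jots = ARRAY_H * ARRAY_V
num_clusters = total_jots // JOTS_PER_CLUSTER

# Assumption: ADC outputs are spread evenly over the pads.
adcs_per_pad = num_clusters // NUM_PADS

# One bit per jot per field.
total_rate_bps = total_jots * FIELD_RATE_FPS
rate_per_pad_bps = total_rate_bps / NUM_PADS

print(f"clusters: {num_clusters}, ADCs per pad: {adcs_per_pad}")
print(f"total: {total_rate_bps / 1e6:.0f} Mb/s, per pad: {rate_per_pad_bps / 1e6:.1f} Mb/s")
```

The per-pad figure of roughly 34Mb/s agrees with the output data rate listed in Table I.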

Jot Device

In this paper, the results of the arrays (analog and digital) with the tapered-pump-gate (TPG) jots [3] are presented. The TPG jots are fabricated in a 45nm BSI CMOS image sensor (CIS) process. The schematic of the two-way shared jots is shown in Figure 4, and their simplified layout is depicted in Figure 5a. As shown in Figure 5a, the pump-gate photodiode doping profile is adapted from the previously demonstrated design and optimized for an improved effective fill factor and better response at shorter wavelengths.

Analog-to-Digital Converter

The 1-bit ADC comprises two cascaded charge-transfer amplifiers (CTAs) followed by a dynamic latch (d-latch). Cascading two CTAs amplifies the input voltage to a level larger than the input-referred offset of the d-latch [4]. Compared with [4], the stacked structure in this design makes it possible to reduce the aspect ratio of the CTA layout; therefore, two cascaded CTAs are used instead of four. The threshold of the ADC can be set by adjusting Vpro in the CTA.

The 1-bit ADC is designed and fabricated with 0.25µm gate-length CMOS transistors to provide a wide input common-mode range (ICMR) to handle the wide range of jot designs. However, to reduce the power consumption of the ADCs in future designs, smaller feature nodes such as 65nm can be used by limiting the ICMR according to a specific jot design. At 1040fps, the ADC sampling rate is about 4MSa/s and each ADC consumes 29µW.
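As a rough check of the sampling rate and ADC power quoted above, the following sketch assumes each ADC converts every jot in its cluster exactly once per field and ignores any blanking or overhead time. The variable names are illustrative only.

```python
JOTS_PER_CLUSTER = 4096    # jots served by one ADC
FIELD_RATE_FPS = 1040      # fields per second
NUM_ADCS = 256             # one ADC per cluster
ADC_POWER_W = 29e-6        # power per ADC quoted in the text

# Assumption: one conversion per jot per field, no overhead.
adc_rate_sps = JOTS_PER_CLUSTER * FIELD_RATE_FPS
total_adc_power_w = NUM_ADCS * ADC_POWER_W
energy_per_conversion_j = ADC_POWER_W / adc_rate_sps

print(f"ADC rate: {adc_rate_sps / 1e6:.2f} MSa/s")
print(f"total ADC power: {total_adc_power_w * 1e3:.2f} mW")
print(f"energy per conversion: {energy_per_conversion_j * 1e12:.2f} pJ")
```

Under these assumptions the rate comes out slightly above 4MSa/s, and the summed ADC power lands close to the 7.5mW listed for the 256 ADCs in Table I.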

Sensor Function

The sensor is read out using a rolling shutter with full-frame integration. After a row is selected, the jots are reset, and the voltage values of the reset levels are stored on the CR capacitor in the CDS unit. By activating the TG, the collected charge is transferred from the storage-well in the jot to the floating diffusion, and the signal values are sampled onto the CS capacitor in the CDS unit. The differential signals stored in the CDS units are applied to the input of the ADC for quantization. After quantization is completed, the outputs of the 256 ADCs are sent out off-chip via 32 high-speed pads.
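The reset, transfer, CDS, and quantization steps can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: read noise is lumped into a single Gaussian term, the reset offset is arbitrary, and the threshold is placed at 0.5e-, which is an assumed setting rather than a value given in the paper. The function name `read_jot` is hypothetical.

```python
import random

CG_V_PER_E = 345e-6     # conversion gain reported in this paper
READ_NOISE_E = 0.22     # r.m.s. read noise in electrons, from this paper
THRESHOLD_E = 0.5       # assumed threshold, midway between 0 and 1 electron

def read_jot(electrons, rng):
    """Return the 1-bit output for a jot holding `electrons` photoelectrons."""
    # Reset level stored on CR (arbitrary offset, cancelled by CDS).
    reset_v = rng.gauss(0.0, 1e-3)
    # Lumped read noise, expressed in volts at the column.
    noise_v = rng.gauss(0.0, READ_NOISE_E * CG_V_PER_E)
    # After charge transfer the floating diffusion drops by CG per electron;
    # this level is sampled on CS.
    signal_v = reset_v - electrons * CG_V_PER_E + noise_v
    # Differential CDS output applied to the 1-bit ADC.
    cds_v = reset_v - signal_v
    return 1 if cds_v > THRESHOLD_E * CG_V_PER_E else 0

rng = random.Random(1)
bits = [read_jot(n, rng) for n in (0, 1, 2, 0, 3)]
print(bits)
```

In this model a jot with one or more electrons almost always reads 1, while an empty jot occasionally reads 1 because of read noise.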

III. EXPERIMENTAL RESULTS

Figure 5 shows the characterization results of the TPG jots. These results were obtained using the slower analog outputs with an on-chip gain of 10V/V and a 14-bit on-board ADC. Each column output line is connected to a correlated double sampling (CDS) unit. The outputs of every four CDS units are selected by a multiplexer and sent to a unity-gain buffer. The buffered signal is then amplified by a switched-capacitor programmable gain amplifier (PGA). Another unity-gain amplifier drives the output pad after the PGA. For the best noise performance, correlated multiple sampling (CMS) was used to suppress the noise in the full readout chain, with 20 cycles of signal collected in series. Additional cycles did not reduce the read noise further, probably because the extended readout process adds low-frequency noise.
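The benefit of CMS can be sketched with a simple Monte Carlo model. It assumes purely white (uncorrelated) Gaussian noise, in which case averaging M samples reduces the noise by about the square root of M. It deliberately omits the low-frequency noise mentioned above, so it cannot reproduce the observed saturation beyond 20 cycles. The function names are hypothetical.

```python
import random
import statistics

def cms_read(signal_v, noise_rms_v, cycles, rng):
    """Average `cycles` reset samples and `cycles` signal samples, then difference."""
    reset = [rng.gauss(0.0, noise_rms_v) for _ in range(cycles)]
    signal = [signal_v + rng.gauss(0.0, noise_rms_v) for _ in range(cycles)]
    return sum(signal) / cycles - sum(reset) / cycles

def measured_noise(cycles, trials=4000, seed=0):
    """Standard deviation of repeated CMS reads of a zero signal."""
    rng = random.Random(seed)
    reads = [cms_read(0.0, 1.0, cycles, rng) for _ in range(trials)]
    return statistics.pstdev(reads)

single = measured_noise(1)
cms20 = measured_noise(20)
print(f"noise reduction with 20 cycles: {single / cms20:.2f}x")
```

With white noise only, the reduction is close to the square root of 20, or roughly 4.5 times.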

The CG and read noise were characterized with the photon-counting histogram (PCH) method. In this measurement, the PCH of each jot was created from 20k continuous reads. The read noise was extracted from the valley-peak modulation (VPM), and the conversion gain was extracted from the peak-to-peak distance. Variability in the fabrication process inevitably leads to performance variation from jot to jot. For example, small mask misalignments may lead to variation in CG, and randomness in the number of traps in each jot’s source follower may lead to different voltage noise magnitudes. Since the analog readout speed is currently limited by the testing board, about 8192 jots of each type were tested. Figure 5b shows the PCH of the best jot in the array, with a read noise of 0.18e- r.m.s. Histograms of the conversion gain and read noise for the jots are shown in Figures 5c and 5d, respectively. The average conversion gain is 345µV/e- and the conversion gain variation is 2.9%. The average read noise is 0.236e- r.m.s. and the peak of the read noise histogram is at 0.22e- r.m.s.; the best and worst cases are 0.18e- and 0.39e- r.m.s., respectively. The variation of read noise is 16%. Note that a relatively long tail was observed in the read noise distribution, and the jots with higher noise were found to have stronger high-frequency noise. Further investigation is needed to identify the noise source, but interface-trap-related RTS noise is suspected.
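To illustrate why the valley-to-peak contrast of a PCH depends on read noise, the sketch below models the PCH as a Poisson-weighted sum of Gaussians, one per electron number, with the signal expressed in electrons. The VPM definition used here (one minus the valley at 0.5e- divided by the mean of the peaks at 0e- and 1e-) is a simplified assumption for illustration and may differ from the exact definition used in the measurement. The mean exposure of 1e- is also assumed.

```python
import math

def pch_density(u, sigma_e, mean_e, k_max=30):
    """Model PCH density at signal u (electrons): Poisson-weighted Gaussians."""
    total = 0.0
    for k in range(k_max + 1):
        poisson = math.exp(-mean_e) * mean_e ** k / math.factorial(k)
        gauss = math.exp(-0.5 * ((u - k) / sigma_e) ** 2) / (sigma_e * math.sqrt(2.0 * math.pi))
        total += poisson * gauss
    return total

def vpm(sigma_e, mean_e=1.0):
    """Assumed simplified valley-peak modulation between the 0e- and 1e- peaks."""
    peak = 0.5 * (pch_density(0.0, sigma_e, mean_e) + pch_density(1.0, sigma_e, mean_e))
    valley = pch_density(0.5, sigma_e, mean_e)
    return 1.0 - valley / peak

for sigma in (0.18, 0.22, 0.39):
    print(f"read noise {sigma:.2f} e- r.m.s. -> VPM {vpm(sigma):.2f}")
```

In this model the modulation falls steadily as the read noise rises, which is the property that lets read noise be inferred from a measured histogram. The spacing between adjacent peaks, measured in volts, gives the conversion gain.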

Specifications of the imager are shown in Table I. The sensor functions at 1040fps with an output data rate of about 1Gb/s and a total power consumption of 17.6mW. The energy-per-bit FOM is defined as chip power/(# of pixels×fps×N), where N represents the ADC resolution in bits, which for algorithmic converters is the number of comparator strobes per conversion. The 1Mjot QIS has a FOM of 16.1pJ/b (including output pads); considering only the gain+ADC power, the FOM becomes 6.9pJ/b. Compared to [4], even though the FOM of the ADC is higher (as expected, because a larger feature node is used in this design), the total FOM is lower because the stacked architecture and cluster readout help to reduce the bias current needed for this high-speed readout. Considering only the array power, the FOM of this stacked sensor is improved by more than 3 times in comparison with [4].
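The FOM definition above can be evaluated directly. The sketch below uses the total power and the 256-ADC power from Table I; treating the 7.5mW ADC entry as the gain+ADC power is an assumption that happens to reproduce the quoted 6.9pJ/b. The function name is hypothetical.

```python
def energy_per_bit_pj(power_w, num_pixels, fps, adc_bits):
    """Energy-per-bit FOM in pJ/b: power / (pixels x fps x N)."""
    return power_w / (num_pixels * fps * adc_bits) * 1e12

NUM_JOTS = 1024 * 1024
FIELD_RATE_FPS = 1040
ADC_BITS = 1

fom_total = energy_per_bit_pj(17.6e-3, NUM_JOTS, FIELD_RATE_FPS, ADC_BITS)
# Assumption: the 7.5 mW "256 ADCs" entry in Table I is the gain+ADC power.
fom_adc = energy_per_bit_pj(7.5e-3, NUM_JOTS, FIELD_RATE_FPS, ADC_BITS)

print(f"whole sensor: {fom_total:.1f} pJ/b")
print(f"gain+ADC only: {fom_adc:.1f} pJ/b")
```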

Examples of experimental images formed from collected QIS data are shown in Figure 6.

V. ACKNOWLEDGMENTS

The authors appreciate the sponsorship and collaboration of Rambus Inc. in the initial stages of this work, as well as the support and collaboration of the Taiwan Semiconductor Manufacturing Company (TSMC). The characterization work was sponsored by the DARPA DETECT program through Army Research Office (ARO) Cooperative Agreement Number W911NF-16-2-0162. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Office or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein. The authors also appreciate discussion with Prof. Stanley Chan at Purdue regarding image formation.
VI. REFERENCES


Figure 1. QIS concept. Every solid square in each temporal field represents a photoelectron created by absorbing a photon. A cubicle of jots, spanning both space and time, forms each output image pixel. In this illustration, a 4x4x4 jot cubicle is used. In a QIS system, the cubicle size is a parameter in the image formation processing that occurs post capture.

Figure 2. Simplified architecture of 1Mjot stacked QIS.

Figure 3. Photograph of the 20x1Mjot QIS test chip.

Table I. Specifications of the 1Mjot single-bit QIS at room temperature.

<table>
<thead>
<tr>
<th>Process</th>
<th>45nm (jot layer), 65nm (ASIC layer)</th>
</tr>
</thead>
<tbody>
<tr>
<td>VDD</td>
<td>1.8V &amp; 2.5V (Analog, digital and array), 3V &amp; 2.2V (I/O pads)</td>
</tr>
<tr>
<td>Jot type</td>
<td>BSI Tapered Pump Gate/2-Way Shared RO</td>
</tr>
<tr>
<td>Jot pitch</td>
<td>1.1µm</td>
</tr>
<tr>
<td>BSI Fill Factor</td>
<td>~100%</td>
</tr>
<tr>
<td>Quantum Efficiency</td>
<td>To be measured, visible band</td>
</tr>
<tr>
<td>CG on column</td>
<td>345µV/e-</td>
</tr>
<tr>
<td>Input Referred Noise</td>
<td>0.22e- r.m.s.</td>
</tr>
<tr>
<td>Corresponding BER</td>
<td>~1%</td>
</tr>
<tr>
<td>Avg. Dark current</td>
<td>0.16e-/s</td>
</tr>
<tr>
<td>Equiv. Dark Count Rate</td>
<td>0.16Hz/jot</td>
</tr>
<tr>
<td>Equiv. PD Dead Time</td>
<td>&lt;0.1%</td>
</tr>
<tr>
<td>Array</td>
<td>1024 (H) x 1024 (V)</td>
</tr>
<tr>
<td>Field rate</td>
<td>1040fps</td>
</tr>
<tr>
<td>ADC sampling rate</td>
<td>4MSa/s</td>
</tr>
<tr>
<td>ADC resolution</td>
<td>1 bit</td>
</tr>
<tr>
<td>Output data rate</td>
<td>32 (output pins) x 34Mb/s = 1090Mb/s</td>
</tr>
<tr>
<td>Package</td>
<td>PGA with 224 pins</td>
</tr>
<tr>
<td>Power: array</td>
<td>2.3mW</td>
</tr>
<tr>
<td>Power: 256 ADCs</td>
<td>7.5mW</td>
</tr>
<tr>
<td>Power: addressing</td>
<td>4.1mW</td>
</tr>
<tr>
<td>Power: I/O pads</td>
<td>3.7mW</td>
</tr>
<tr>
<td>Power: total</td>
<td>17.6mW</td>
</tr>
<tr>
<td>FOM ADC</td>
<td>6.9pJ/b</td>
</tr>
</tbody>
</table>
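The bit error rate (BER) entry in Table I can be related to the read noise with a simple estimate. The sketch assumes Gaussian read noise and a decision threshold placed midway between the 0e- and 1e- levels, so an error occurs when noise exceeds 0.5e-. Both are assumptions for illustration; the function name is hypothetical.

```python
import math

def single_bit_ber(read_noise_e, threshold_e=0.5):
    """Probability that Gaussian noise crosses the assumed 0.5 e- threshold."""
    return 0.5 * math.erfc(threshold_e / (read_noise_e * math.sqrt(2.0)))

for sigma in (0.18, 0.22, 0.39):
    print(f"read noise {sigma:.2f} e- r.m.s. -> BER {single_bit_ber(sigma) * 100:.2f}%")
```

For a read noise of 0.22e- r.m.s. this estimate gives a BER slightly above 1%, in line with the table entry.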

Table II. Comparison of performance parameters for QIS.


[Architecture figure labels: Detector Substrate; ASIC Substrate; 16x16=256 clusters, with 4096 jots in each cluster; 16x16=256 readout clusters, each with 8 CDS units and a 1b-ADC; High-Speed Digital PADs; dimension label 1126.4µm]

Figure 5. (a) Simplified layout and simulated doping profile of the pump-gate jot; (b) PCH of the best jot, with 0.18e- r.m.s. read noise; (c) Conversion gain variation of the TPG jots; (d) Read noise variation of the TPG jots.

Figure 6. (Upper left) Image of printed scene taken with CMOS image sensor under normal lighting, reduced to 128x128 resolution; (Upper right) One field of binary single-photon data (1024x1024) grabbed from 1Mjot QIS at 1040fps continuous operation from same scene. Some fixed pattern noise (FPN) is observed. (Lower left) Image pixels formed from 8x8x8 cubicle summation from 8 fields of 1Mb data. The resulting image resolution is 128x128. (Lower right) Same QIS data as lower left but processed using Purdue de-noising algorithm [5] for 128x128 resolution.
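The cubicle summation named in the caption can be sketched in plain Python. This is a minimal illustration, not the processing code used for the figure: the function name `sum_cubicles` is hypothetical, and the demo uses small random 16x16 fields instead of real 1024x1024 sensor data.

```python
import random

def sum_cubicles(fields, cube_xy):
    """Sum binary fields over cube_xy x cube_xy x len(fields) jot cubicles."""
    height = len(fields[0])
    width = len(fields[0][0])
    out_h, out_w = height // cube_xy, width // cube_xy
    image = [[0] * out_w for _ in range(out_h)]
    for field in fields:
        for y in range(out_h * cube_xy):
            row = field[y]
            out_row = image[y // cube_xy]
            for x in range(out_w * cube_xy):
                out_row[x // cube_xy] += row[x]
    return image

# Demo: 8 random binary fields of 16x16 jots, summed in 8x8x8 cubicles.
rng = random.Random(0)
fields = [[[rng.randint(0, 1) for _ in range(16)] for _ in range(16)]
          for _ in range(8)]
image = sum_cubicles(fields, cube_xy=8)
print(len(image), "x", len(image[0]), "output pixels")
```

With 8 fields of 1024x1024 bits and an 8x8 spatial window, the same function would return a 128x128 image in which each pixel is a count between 0 and 512.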