## Trace-based timing fault localization with supply voltage sensor

Miho Ueno, Masanori Hashimoto, Takao Onoye
Osaka University
{ueno.miho, hasimoto, onoye}@ist.osaka-u.ac.jp

- 1. Background and objectives
- 2. Proposed timing fault localization system
- 3. Supply voltage sensor
- 4. Evaluation of fault localization performance
- 5. Conclusion

## Background

#### **Electrical timing fault**

- is a bug that arises even though the circuit is logically correct; it is caused by dynamic events such as power supply noise.
- arises only in certain situations, and its reproduction is difficult.
- ⇒ Electrical timing faults are hard to debug.

## Trace-based fault localization system

A trace buffer [1] records circuit signals and status at every cycle.

- × The trace buffer involves additional area overhead.

#### A trigger signal → crucially important

- is generated after a fault occurs.
- prevents the fault information from being overwritten.

(Figure: information recorded in the trace buffer)

[1] M. Abramovici, P. Bradley, K. Dwarakanath, P. Levin, G. Memmi, and D. Miller, "A reconfigurable design-for-debug infrastructure for SoCs," in *Proc. DAC*, pp. 7–12, 2006.

## Trigger quality

- 1. Latency
  - Time interval between the timing fault and the trigger.
  - Shorter latency is desirable.
- 2. # of trace analyses
  - Number of traces that must be checked for the timing fault until a trace captures the target timing fault.
  - A smaller number is desirable.

## Objective

- Trigger quality is very important
  - for area overhead.
  - for fault localization efficiency.
- Conventionally, logical events are used as triggers (e.g. deadlock and segmentation fault [2]).
  - × Electrical timing faults can be recorded only when the fault influence appears as a logical event.
  - × It takes a long time for the influence to appear.

#### **Objective**

Directly observe power supply noise to improve trigger quality.

## Proposed timing fault localization system

The proposed system consists of a trace buffer, a trigger generator, and a supply voltage sensor.

### Trace buffer

- The trace buffer stores information that is useful for localizing electrical timing faults.
- Both the width and the depth of the trace buffer must be minimized to reduce area overhead,
  - ⇔ while fault localization efficiency, i.e., the # of trace analyses, is maintained.

## Supply voltage sensor

The sensor provides supply voltage information to the trigger generator and the trace buffer.

- The sensor should measure cycle-by-cycle supply voltage fluctuation, since whether a timing fault occurs depends on the supply voltage within the corresponding cycle.
- Real-time sensing is required so that sensing results can be exploited immediately for trigger generation.

## Sensor structure and operation

The supply voltage sensor consists of a delay chain and a TDC (Time-to-Digital Converter).

(Figure: sensor structure and operation; the delay-chain signal is latched by the TDC)

- The propagation distance of $E\downarrow 1$ through the delay chain ($N_{passed}$) represents Vdd.
- One-shot sensing is achieved every cycle.

## Measured voltage resolution

The supply voltage sensor with a 256-stage TDC

- is implemented on a 65-nm process test chip.
- occupies 0.138% of the test chip area.
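The mapping from $N_{passed}$ to Vdd can be sketched as follows. This is a minimal illustration, not the sensor's actual readout logic: it assumes, hypothetically, that the TDC latches a thermometer code (stage i reads 1 once the edge has passed it) and that a simple linear calibration around one reference point is adequate. The function names `n_passed` and `to_vdd` and all calibration constants are made up; a real delay-voltage characteristic is nonlinear and would normally need a measured calibration table.

```python
from typing import Sequence


def n_passed(latched: Sequence[int]) -> int:
    """Return N_passed: the number of delay-chain stages the edge has passed.

    Hypothetical encoding: the latched TDC output is a thermometer code in
    which stage i reads 1 once the edge has propagated past it (1...10...0).
    Counting only the leading ones ignores any bubble bits after the first 0.
    """
    count = 0
    for bit in latched:
        if bit != 1:
            break
        count += 1
    return count


def to_vdd(n: int, n_ref: int, v_ref: float, volts_per_stage: float) -> float:
    """Estimate Vdd from N_passed with a hypothetical linear calibration.

    (n_ref, v_ref) is a calibration point. A lower Vdd slows the delay chain,
    so fewer stages are passed and the estimate falls below v_ref.
    """
    return v_ref + (n - n_ref) * volts_per_stage


# Usage with made-up calibration constants (not measured values).
code = [1] * 120 + [0] * 136  # 256-stage TDC; the edge stopped after stage 120
n = n_passed(code)            # 120
vdd = to_vdd(n, n_ref=128, v_ref=1.0, volts_per_stage=0.004)
print(n, round(vdd, 3))
```

Because one value is produced per clock cycle, the estimate can be compared against a trigger threshold in the same cycle, which is what real-time, cycle-by-cycle sensing requires.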
(Figure: test chip, $4.2\,\text{mm} \times 2.1\,\text{mm}$)

- Voltage resolution of 3.9 mV [3].

[3] Miho Ueno, Masanori Hashimoto, and Takao Onoye, "Real-Time Supply Voltage Sensor for Detecting/Debugging Electrical Timing Failures," in *Proc. IEEE IPDPSW*, pp. 301–305, 2013.

## Evaluation environment

- To reproduce noise-induced timing faults, a gate-level logic simulation framework that considers dynamic supply noise was developed.
- We used
  - the TOSHIBA MeP processor as the CUT (circuit under test).
  - three MiBench programs (SHA1, CRC32, dijkstra).

## An example of simulation result

An electrical timing fault occurred when the supply voltage dropped to 0.823 V.

## Setup for trigger quality evaluation

#### **Trigger setting**

- Threshold value of supply voltage
  - (Any, $\leq 0.96$, $\leq 0.92$, $\leq 0.88$, $\leq 0.84$, $\leq 0.80$ V)

**AND**

- Instruction executing in the CUT
  - Instructions expected to activate timing-critical paths were selected.
  - For example, four instructions (ret, lw, sw, jmp) were selected for the dijkstra program.

#### Comparison to triggers based on logical events

Memory access errors, exception handling, etc.

## Reminder: Metrics of trigger quality

- 1. Latency
  - Time interval between the timing fault and the trigger.
  - Shorter latency is desirable.
- 2. # of trace analyses
  - Number of traces that must be checked for the timing fault until a trace captures the target timing fault.
  - A smaller number is desirable.
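The trigger setting and the two metrics can be made concrete with a small sketch. It assumes a per-cycle record of sensed voltage and executing instruction; the `Cycle` type and the functions `trigger_fires`, `latency`, and `traces_until_capture` are hypothetical names. Two further assumptions: a trigger firing in the fault cycle itself counts as latency 0, and the # of trace analyses is modeled simply as "each trigger yields one trace of the last `depth` cycles, checked in order", which is not necessarily how the evaluation counted it.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Cycle:
    vdd: float        # sensed supply voltage in this cycle [V]
    instruction: str  # mnemonic executing in the CUT in this cycle


def trigger_fires(c: Cycle, threshold: Optional[float], instr: str) -> bool:
    """Voltage condition AND instruction condition; threshold=None means 'Any'."""
    return (threshold is None or c.vdd <= threshold) and c.instruction == instr


def latency(trace: Sequence[Cycle], fault_cycle: int,
            threshold: Optional[float], instr: str) -> Optional[int]:
    """Cycles from the fault to the first trigger at or after it (None if none)."""
    for t in range(fault_cycle, len(trace)):
        if trigger_fires(trace[t], threshold, instr):
            return t - fault_cycle
    return None


def traces_until_capture(trace: Sequence[Cycle], fault_cycle: int, depth: int,
                         threshold: Optional[float], instr: str) -> Optional[int]:
    """Simplified model: each trigger yields one trace holding the last `depth`
    cycles up to and including the trigger cycle. Count the traces checked
    until one contains the fault cycle (None if no trace ever captures it)."""
    checked = 0
    for t, c in enumerate(trace):
        if trigger_fires(c, threshold, instr):
            checked += 1
            if t - depth < fault_cycle <= t:
                return checked
    return None


# Toy trace (made-up values); the fault is assumed to occur in cycle 1.
trace = [Cycle(1.00, "sw"), Cycle(0.82, "lw"), Cycle(0.95, "sw"), Cycle(0.83, "sw")]
print(latency(trace, 1, None, "sw"), latency(trace, 1, 0.84, "sw"))  # 1 2
print(traces_until_capture(trace, 1, 4, None, "sw"),
      traces_until_capture(trace, 1, 4, 0.84, "sw"))                   # 2 1
```

In this toy trace, adding the voltage condition skips the harmless early "sw" triggers, so fewer traces are checked; the window test also shows why a shallow buffer needs a trigger with short latency.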
### Trigger quality improvement (dijkstra)

| Voltage condition (V) | ret: # of trace analyses | ret: Latency | lw: # of trace analyses | lw: Latency | sw: # of trace analyses | sw: Latency | jmp: # of trace analyses | jmp: Latency |
|---|---|---|---|---|---|---|---|---|
| Any | 461 | 26 | 10,439 | 52 | 4,502 | 2 | 318 | 140 |
| ≤0.96 | 354 | 26 | 8,886 | 52 | 3,754 | 2 | 262 | 140 |
| ≤0.92 | 237 | 26 | 6,075 | 52 | 2,583 | 2 | 179 | 140 |
| ≤0.88 | 121 | 26 | 2,386 | 52 | 1,196 | 2 | 95 | 140 |
| ≤0.84 | 36 | 26 | 232 | 52 | 183 | 2 | 15 | 140 |
| ≤0.80 | 0 | N/A | 0 | N/A | 0 | 1,977 | 0 | N/A |

From "Any" to "≤0.84", the # of trace analyses drops to about 1/13 (ret), 1/45 (lw), 1/25 (sw), and 1/21 (jmp).

The # of trace analyses can be significantly reduced thanks to the voltage sensor.

## Comparison to logical event trigger

Logical events were observed in all three programs.

- Trigger activation with logical events was not useful.
- Fault localization must be carried out without any clues.

For example, supposing the buffer depth is 16,

- without any clues, the # of trace analyses is 3,671.
- with a trigger setting of "sw" and "≤0.84", it becomes 183.

⇒ The # of trace analyses is reduced to about 1/20.

## Trigger sweep policy

When trace analyses are repeated with different trigger settings, there are two policies for the sweep priority.

- A) Fix the instruction and change the voltage threshold first.
- B) Fix the voltage threshold and change the instruction first.

Trigger condition priority (dijkstra program)

- Voltage threshold
  - $0.80\,\text{V} \rightarrow 0.84\,\text{V} \rightarrow 0.88\,\text{V} \rightarrow 0.92\,\text{V} \rightarrow 0.96\,\text{V} \rightarrow \text{Any}$
- Instruction
  - ret → lw → sw → jmp
  - This order is determined by the frequency of timing violations in the program.

We compare the # of trace analyses under policies A) and B).
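The two sweep policies differ only in loop nesting. The sketch below enumerates trigger settings in the dijkstra priority order listed above and totals the analyses under an assumed accounting: settings are tried in order, and the analyses of each one are added until a setting captures the target fault. The names `policy_a`, `policy_b`, and `total_analyses` are hypothetical, and the per-setting costs are inputs, not values from the evaluation.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (voltage threshold in V, or None for "Any"; instruction mnemonic)
Setting = Tuple[Optional[float], str]

# Priority orders for the dijkstra program, as listed in the slide.
THRESHOLDS: List[Optional[float]] = [0.80, 0.84, 0.88, 0.92, 0.96, None]
INSTRUCTIONS: List[str] = ["ret", "lw", "sw", "jmp"]


def policy_a(thresholds: List[Optional[float]], instructions: List[str]) -> List[Setting]:
    """A) Fix the instruction and change the voltage threshold first."""
    return [(v, i) for i in instructions for v in thresholds]


def policy_b(thresholds: List[Optional[float]], instructions: List[str]) -> List[Setting]:
    """B) Fix the voltage threshold and change the instruction first."""
    return [(v, i) for v in thresholds for i in instructions]


def total_analyses(order: List[Setting], cost: Dict[Setting, int],
                   captures: Callable[[Setting], bool]) -> Optional[int]:
    """Hypothetical accounting: sum the analyses of every setting tried, stopping
    at the first setting whose trace captures the target fault."""
    total = 0
    for s in order:
        total += cost[s]
        if captures(s):
            return total
    return None


a = policy_a(THRESHOLDS, INSTRUCTIONS)
b = policy_b(THRESHOLDS, INSTRUCTIONS)
print(a[:3])  # under A, the voltage threshold changes first
print(b[:3])  # under B, the instruction changes first

# Toy costs: every setting costs one analysis; the fault is first captured by (0.84 V, sw).
cost = {s: 1 for s in a}
target = (0.84, "sw")
print(total_analyses(a, cost, lambda s: s == target),
      total_analyses(b, cost, lambda s: s == target))  # 14 7
```

Which policy wins depends entirely on the per-setting costs and on which setting first captures the fault; the counts reported in the evaluation come from the simulation framework, not from this toy model.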
## Evaluation result of trigger sweep policy

- I. Trace buffer depth is 100.
  - Policy A): # of trace analyses is 38.
  - Policy B): # of trace analyses is 41. ← comparable
- II. Trace buffer depth is 2.
  - Policy A): # of trace analyses is 56,879.
  - Policy B): # of trace analyses is 2,090. ← superior

With a shallow trace buffer (case II), the **instruction condition** should be changed first, before raising the voltage threshold.

### Conclusion

We proposed a timing fault localization system with a supply voltage sensor.

- The supply voltage sensor
  - provides cycle-accurate information on supply voltage variation.
  - achieved a voltage resolution of about 4 mV (3.9 mV measured) on a 65-nm test chip.
- The proposed system with a voltage sensor
  - was helpful for reducing the trace buffer depth.
  - reduced the # of trace analyses and improved the efficiency of timing fault localization.
- Future work includes evaluating more complex trigger conditions with a larger number of test programs.