What if your chips lived 20% longer without compromising performance, while also consuming less power? How would that affect your product’s reliability and cost? What would be the effect on your profitability?
With the demand for longer-lasting chips growing across industries, designers and reliability engineers face increasing pressure to ensure their products perform correctly for the expected lifetime. Application stress is one of the key contributors to chip degradation over the years, as performance under demanding workloads leads to increased power consumption, higher temperatures, greater reliability risks, and eventually, reduced product lifespan.
This article explores an innovative approach to workload-aware and reliability-aware adaptive voltage scaling (AVS) that is much safer than traditional methods. Because it is safer, it allows smaller guard bands (i.e., reclaiming unused guard bands), significantly extending SoC lifetime.
Why Prolonging Chip Lifetime Matters
Datacenters, automotive systems, and consumer devices require installed chips to function reliably over extended periods. Hyperscale datacenter operators, for example, have publicly announced strategic business objectives that aim to reduce capital expenditure (CAPEX) and promote sustainability by extending the useful life of servers. According to the table below [1], this lifetime extension increased Amazon’s quarterly net income by $0.9B and the annual net income of Alphabet and Microsoft by $3B.
The table also shows that these companies made a similar move to extend server lifetime a few years earlier, indicating a clear trend. Moreover, the impact on net income grew from $2B-$2.7B in 2021 to $3B-$3.7B in 2023. This increase can be explained by the recent introduction of high-stress workloads, such as generative AI, which require far more computing resources. With hardware expenses rising, it’s clear why hyperscalers strive more than ever to maximize chip lifetime.
Automotive electronics, meanwhile, must operate for up to 15 years under high temperature and mechanical stress while meeting safety requirements. This market has also seen computational requirements grow with the introduction of centralized ECU architectures and demanding use cases such as ADAS. Because these heavier loads accelerate silicon degradation, ensuring sufficient lifetime is becoming even more challenging.
Why Your Chips Don’t Live Longer
Many chipmakers rely on traditional methods to select VDDmin (the minimum operating voltage for reliable performance), adding predetermined guard bands that must last throughout the committed operational lifespan. These guard bands must account for expected performance degradation due to aging and for inaccuracies in the test measurement that sets VDDmin, among other factors.
These methods are often too conservative: lacking accurate real-time data from within the SoC, they apply more guard band than needed. As a result, chips run with worst-case guard bands even when those margins are unnecessary, which limits the available power savings. Consequently, devices configured with higher-than-needed voltages may also wear out earlier.
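To make the guard-band arithmetic concrete, the following minimal sketch compares a static guard-band stack with the voltage a chip would actually need at a given point in its life. The categories (aging, test inaccuracy, other) follow the paragraphs above; the millivolt values, the `safety_mv` residual margin, and all function names are hypothetical illustrations, not data from any real chip.

```python
from dataclasses import dataclass


@dataclass
class GuardBands:
    """Hypothetical guard-band budget in millivolts."""
    aging_mv: float   # reserved for end-of-life aging degradation
    test_mv: float    # covers inaccuracy in the VDDmin measured at test
    other_mv: float   # lumped remainder (e.g., IR drop, noise, temperature)


def static_vdd_mv(vddmin_mv: float, gb: GuardBands) -> float:
    """Traditional setting: the full worst-case stack applies from day one."""
    return vddmin_mv + gb.aging_mv + gb.test_mv + gb.other_mv


def required_vdd_mv(vddmin_mv: float, aged_so_far_mv: float,
                    safety_mv: float) -> float:
    """Voltage needed if the degradation so far were known in real time."""
    return vddmin_mv + aged_so_far_mv + safety_mv


def unused_margin_mv(vddmin_mv: float, gb: GuardBands,
                     aged_so_far_mv: float, safety_mv: float) -> float:
    """Guard band that the static setting spends without needing it."""
    return static_vdd_mv(vddmin_mv, gb) - required_vdd_mv(
        vddmin_mv, aged_so_far_mv, safety_mv)


if __name__ == "__main__":
    gb = GuardBands(aging_mv=30.0, test_mv=15.0, other_mv=10.0)
    # (year, hypothetical aging shift accumulated so far in mV)
    for year, aged in [(0, 0.0), (3, 12.0), (10, 30.0)]:
        print(year, static_vdd_mv(600.0, gb),
              unused_margin_mv(600.0, gb, aged, safety_mv=20.0))
```

With these made-up numbers the static setting stays at 655 mV throughout, while the unused margin shrinks from 35 mV at year 0 to 5 mV at year 10; that gap is the kind of margin a real-time, workload-aware scheme could in principle reclaim.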
How Safer Voltage Scaling Enables Longer Lifetime
The equation is simple. If you had a real-time solution that is both workload-aware and reliability-aware, you wouldn’t have to apply conservative guard bands constantly. In other words, you could safely use lower voltages whenever applicable, which would delay the wear-out phase while also reducing power consumption considerably.
A common degradation model, for example for Negative Bias Temperature Instability (NBTI), describes degradation as a power law in time [2].
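To estimate the lifetime extension at lower VDD, compare f(t), the degradation at the original operating point, with g(t), the degradation under safer voltage and temperature down-scaling, at a given failure threshold. As depicted below, the down-scaling introduces an acceleration factor that makes g(t) degrade more slowly, so it reaches the same threshold later: f(t1) = g(t2), with t2 > t1.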
Equating the two curves at the threshold and solving for the ratio t2/t1 reveals the acceleration factor.
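A sketch of this derivation, assuming both curves follow the power law with the same fractional time exponent n (0 < n < 1) and a prefactor of the form A = C V^γ exp(−E_a / k_B T), which is one common form for NBTI. The symbols C, γ, E_a, D_th and the subscripts f (original operating point) and g (down-scaled operating point) are labels introduced here for illustration, not taken from the source.

```latex
\begin{aligned}
f(t) &= A_f\, t^{n}, \qquad g(t) = A_g\, t^{n}, \qquad
A = C\, V^{\gamma}\, e^{-E_a/(k_B T)} \\[4pt]
f(t_1) &= g(t_2) = D_{\mathrm{th}}
\;\Rightarrow\; A_f\, t_1^{n} = A_g\, t_2^{n} \\[4pt]
\mathrm{AF} &= \frac{t_2}{t_1} = \left(\frac{A_f}{A_g}\right)^{1/n}
= \left(\frac{V_f}{V_g}\right)^{\gamma/n}
\exp\!\left[\frac{E_a}{n\,k_B}\left(\frac{1}{T_g} - \frac{1}{T_f}\right)\right]
\end{aligned}
```

Because n is a fraction, the exponent 1/n amplifies even a small reduction in the prefactor A into a larger gain in time-to-threshold.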
The degradation model therefore depends on voltage and temperature, with the acceleration factor (AF) scaling the degradation rate. Lowering the voltage also lowers the temperature, and both slow the degradation process. Because the power law’s time exponent is fractional, this slower degradation translates into a disproportionately longer component lifetime.
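The same relationship can be evaluated numerically. This minimal sketch assumes degradation of the form C·V^γ·exp(−Ea/kT)·t^n with the same n at both operating points; the function name and every parameter value in the usage example (n, gamma, ea_ev, voltages, temperatures) are hypothetical placeholders, not calibrated NBTI data or proteanTecs figures.

```python
import math

# Boltzmann constant in eV/K (CODATA 2018)
K_B_EV = 8.617333262e-5


def acceleration_factor(v_base, temp_base_c, v_scaled, temp_scaled_c,
                        n, gamma, ea_ev):
    """Return AF = t2/t1 for an assumed power-law degradation model.

    Assumes D(t) = C * V**gamma * exp(-Ea / (k_B * T)) * t**n with the same
    exponent n at both operating points (hypothetical model form).
    Temperatures are given in degrees Celsius.
    """
    t_base_k = temp_base_c + 273.15
    t_scaled_k = temp_scaled_c + 273.15
    # Ratio of degradation prefactors A_f / A_g (baseline vs. scaled)
    prefactor_ratio = (v_base / v_scaled) ** gamma * math.exp(
        (ea_ev / K_B_EV) * (1.0 / t_scaled_k - 1.0 / t_base_k)
    )
    # Equal degradation at the threshold: A_f * t1**n == A_g * t2**n
    return prefactor_ratio ** (1.0 / n)


if __name__ == "__main__":
    # Placeholder parameters, chosen only to show how the function is used
    af = acceleration_factor(v_base=0.65, temp_base_c=85.0,
                             v_scaled=0.62, temp_scaled_c=80.0,
                             n=0.25, gamma=3.0, ea_ev=0.1)
    print(f"AF = {af:.2f}, lifetime extension = {(af - 1.0) * 100:.1f}%")
```

The result is only as good as the model parameters; in practice n, γ, and Ea would have to be fitted to the technology's own reliability data.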
proteanTecs AVS Pro™: Longer SoC Lifetime with Safer Voltage Scaling
Unlike common canary circuits and fixed voltage guard bands, proteanTecs AVS Pro leverages in-chip timing margin monitoring. This solution combines Margin Agents that monitor millions of true paths in real time with dedicated algorithms for better-informed decisions. AVS Pro allows precise guard-band reclamation based on real workloads, real aging, and actual IR drops, saving more power in real-life scenarios while ensuring reliability.
In-situ monitoring of the true paths per chip is paramount, as the critical paths can change over time according to different aging patterns of individual devices.
By reducing the nominal voltage throughout a chip’s life, AVS Pro lowers power consumption and temperature, reducing the stress on the SoC and prolonging its lifespan. As the chip ages, guard-band reclamation is adjusted accordingly to ensure reliability while maximizing efficiency.
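As a rough illustration of how margin-driven voltage scaling behaves over a chip’s life, this sketch implements a generic closed-loop controller: it lowers VDD while the worst measured timing margin exceeds a target plus a hysteresis band, and raises it when the margin drops below the target. It is not the AVS Pro algorithm, whose internals are not described here; the linear margin model, the aging shift, and every parameter value are hypothetical.

```python
def avs_step(vdd_mv, min_margin_ps, target_ps=20.0, hysteresis_ps=10.0,
             step_mv=5.0, vmin_mv=550.0, vmax_mv=700.0):
    """One iteration of a generic margin-based AVS loop (illustrative only)."""
    if min_margin_ps < target_ps:
        vdd_mv += step_mv   # margin too thin: raise voltage for safety
    elif min_margin_ps > target_ps + hysteresis_ps:
        vdd_mv -= step_mv   # excess margin: reclaim guard band
    # Never leave the allowed operating window
    return max(vmin_mv, min(vmax_mv, vdd_mv))


def toy_margin_ps(vdd_mv, aging_shift_mv, ps_per_mv=2.0, v_crit_mv=600.0):
    """Hypothetical linear model: slack grows with headroom above a critical
    voltage, and aging raises that critical voltage."""
    return ps_per_mv * (vdd_mv - (v_crit_mv + aging_shift_mv))


def settle(vdd_mv, aging_shift_mv, iterations=100):
    """Run the loop for a fixed number of iterations at one aging level."""
    for _ in range(iterations):
        vdd_mv = avs_step(vdd_mv, toy_margin_ps(vdd_mv, aging_shift_mv))
    return vdd_mv


if __name__ == "__main__":
    vdd = 700.0  # start at a conservative nominal setting
    for aging in (0.0, 10.0, 20.0):  # growing aging shift over the years
        vdd = settle(vdd, aging)
        print(f"aging shift {aging:4.1f} mV -> VDD {vdd:.1f} mV")
```

With these placeholder values the loop settles at 615 mV when the chip is new and creeps up to 620 mV and then 630 mV as the aging shift grows, staying well below the 700 mV starting point. The hysteresis band keeps the controller from toggling between two voltage steps.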
AVS Pro enables engineers to reclaim excess guard bands safely. The resulting decrease in voltage reduces power consumption, temperature, and stress, which delays wear-out.
In the simulations above, the degradation that occurs within the chip’s first year without AVS Pro is delayed by more than two years with AVS Pro (blue dotted line). Consequently, the total chip lifetime is extended by a similar factor.
Real-World Results: Stretching System Lifespans
proteanTecs AVS Pro has demonstrated the potential to extend chip lifespans by up to 18%. In datacenters, this translates to hundreds of millions of dollars saved annually by reducing the need for premature replacements. In automotive, it helps meet the 15-year lifespan goal for electronics, ensuring that critical safety features remain intact.
The example above is from a mass-produced 5nm communications chip. AVS Pro enabled a 12.5% power reduction, resulting in an 18% increase in predicted lifespan. By fine-tuning the voltage throughout the chip’s life, AVS Pro prevents unnecessary degradation and delays the onset of wear-out and critical failures, while reducing power consumption without impacting performance.
NOTE: The nominal voltage in this case is 0.65 V. For chips operating at a higher nominal voltage, such as 0.75 V, the predicted lifespan increase would be larger.
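For a sense of scale, the 12.5% power figure can be converted into an approximate voltage reduction, starting from the 0.65 V nominal voltage in the note above. This back-of-the-envelope sketch assumes power is purely dynamic (P ∝ C·V²·f at fixed frequency) and ignores leakage and the accompanying temperature drop, so its result is an estimate under that assumption, not a figure reported for this chip; the function name is hypothetical.

```python
import math


def vdd_for_dynamic_power_cut(v_nom, power_reduction):
    """Voltage giving a fractional dynamic-power reduction at fixed frequency.

    Assumes P_dyn is proportional to V**2 (leakage and temperature ignored),
    so P_new / P_nom = (V_new / V_nom)**2.
    """
    if not 0.0 <= power_reduction < 1.0:
        raise ValueError("power_reduction must be in [0, 1)")
    return v_nom * math.sqrt(1.0 - power_reduction)


v_new = vdd_for_dynamic_power_cut(0.65, 0.125)
print(f"{v_new:.3f} V ({(1 - v_new / 0.65) * 100:.1f}% lower)")
```

Under this simplified model it prints 0.608 V, about 6.5% below nominal. Including leakage and the temperature feedback would change the estimate.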
Looking Ahead: A Greener, More Reliable Future
As the pressure to extend chip lifetimes continues to mount, proteanTecs AVS Pro is emerging as a must-have solution for industries that demand long-lasting, reliable electronics. By balancing performance, power reduction, and longevity, chipmakers can now offer products that not only perform reliably but also help reduce CAPEX and promote sustainability.
Learn more about how AVS Pro can extend the life of your chips: download our white paper or contact us.
[1] J. Atelsek, “AWS, Azure and Google Cloud intensify capex to prevent customer churn,” S&P Global, March 2024.
[2] J.W. McPherson, Reliability Physics and Engineering, 3rd Ed., Springer, 2019.