Website: http://codingrelic.geekhold.com/
Original page: http://codingrelic.geekhold.com/2009/09/soft-errors-are-hard-problems.html
I work for a company that designs and sells equipment containing various ASICs supplied by third parties. We test our products and the ASICs they contain by renting time at a facility that has a source of high-energy particles (e.g. neutrons), placing our products in the path of the particle beam, and then operating the product with the beam aimed at the various ASICs. The results are interesting.
Perhaps this is an extreme way to test the code that handles parity errors in the various memories on the ASICs, but until you do it, how do you know that the ASIC actually detects parity errors or, in the case of ECC, corrects them? Some ASIC vendors are better than others.
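To make the detect-versus-correct distinction concrete, here is a minimal sketch of a single-error-correcting, double-error-detecting (SECDED) code on a 4-bit value, built from Hamming(7,4) plus one overall parity bit. It is only an illustration: the function names `encode` and `decode` are made up for this example, and real ASIC memories use wider words and hardware logic, not anything resembling this Python.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit SECDED codeword (list of 0/1).

    Index 0 holds the overall parity bit; indices 1..7 are the usual
    Hamming(7,4) positions (parity at 1, 2, 4; data at 3, 5, 6, 7).
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    for bit in code[1:]:
        code[0] ^= bit  # overall parity over positions 1..7
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid word
    overall = 0
    for bit in code:
        overall ^= bit

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume one. The syndrome names the bad
        # position (0 means the overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: detected only.
        return None, "uncorrectable"

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


word = encode(0b1011)
word[5] ^= 1                 # simulate a single-bit soft error
print(decode(word))          # one flip is repaired
word[6] ^= 1                 # a second flip in the same word
print(decode(word))          # two flips are only detected
```

A plain parity bit would catch the first flip but could not repair it, and would miss the second one entirely, which is why it matters to know which scheme each memory on the chip really implements.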
cycle to recover after a soft error.
http://www.ewh.ieee.org/r6/scv/rl/articles/ser-...
See slide 34.
Search for [stuck latch cosmic ray] on Google.
Actually, cosmic rays don't come from the Sun but from outer space. It seems counterintuitive, but a solar eruption results in fewer cosmic rays hitting the ground.
Some of the industry's main concerns right now are:
- multi-bit upsets in memories (mostly SRAM): one particle creates several bit flips. As you know, most advanced ECC schemes detect two errors and correct one per word. The memory architecture has to be designed carefully (using interleaving or scrambling) so that a multi-bit upset does not put several flips in the same word. But as you say, that might not be feasible for small memory instances spread throughout the ASIC.
- more dramatic than the previous effect: the increased sensitivity of flip-flops and registers to soft errors at smaller technology nodes (we're talking 65nm here). Mitigation techniques are not as obvious as for memories and can cost a lot of the designer's area and power budget.
- assessing derating: not all errors will actually affect the functionality of the circuit. Either the upset occurred in a part of the circuit not involved in the ongoing function, or the error was masked along its propagation path by a timing or logic constraint. Understanding derating for large and complex chips is not a simple problem!
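The interleaving point in the first item can be sketched in a few lines. This is a toy model, not any vendor's layout: it assumes one physical row holding `num_words` logical words of `bits_per_word` bits each, and the helper name `flips_per_word` is invented for the example. It counts how many flipped bits land in each logical word when a single particle upsets a run of adjacent physical columns.

```python
def flips_per_word(start_col, width, num_words, interleaved, bits_per_word=8):
    """Count flipped bits per logical word for one multi-bit upset.

    The upset hits `width` adjacent physical columns starting at
    `start_col`. Without interleaving, each word occupies a contiguous
    run of columns; with interleaving, neighbouring columns belong to
    different words (column c holds a bit of word c % num_words).
    """
    counts = [0] * num_words
    for col in range(start_col, start_col + width):
        if interleaved:
            word = col % num_words
        else:
            word = col // bits_per_word
        counts[word] += 1
    return counts


# Three adjacent cells upset in a row of 4 words x 8 bits.
print(flips_per_word(10, 3, 4, interleaved=False))  # [0, 3, 0, 0]
print(flips_per_word(10, 3, 4, interleaved=True))   # [1, 0, 1, 1]
```

In the contiguous layout one word takes all three flips, which is beyond what a correct-one, detect-two code can handle. With 4-way interleaving each affected word sees a single flip, so every word stays correctable as long as the upset is no wider than the interleaving factor.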
A good overview of the source of the problem can be found in this short YouTube video: http://www.youtube.com/watch?v=pXc8Xh_0WJo