Machine-check exception

A machine-check exception (MCE) is a type of computer hardware error that occurs when a computer's central processing unit detects a hardware problem.

Modern versions of Microsoft Windows handle machine-check exceptions through the Windows Hardware Error Architecture (WHEA). When WHEA detects a machine-check exception, it displays the error in a Blue Screen of Death with bug check code 0x124 and the following parameters (which vary, although the first parameter is always 0x0 for a machine-check exception):[1]

 *** STOP: 0x00000124 (0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000) 

Older versions of Windows handle similar exceptions through the Machine Check Architecture. In this case, the Blue Screen of Death shows an error with bug check code 0x9C, similar to the following:[2]

 STOP: 0x0000009C (0x00000030, 0x00000002, 0x00000001, 0x80003CBA) 
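As an illustration, a STOP line of either form above can be split into its bug check code and four parameters, and the rule stated above applied: bug check 0x124 with a first parameter of 0x0 indicates a machine-check exception, and 0x9C is itself the machine-check bug check. The following is a minimal Python sketch under those assumptions; the function name `classify_stop` is hypothetical, and the lookup table covers only the two bug checks named in this section.

```python
import re

# Only the two bug checks discussed in this section; names as in Microsoft's documentation.
BUG_CHECK_NAMES = {
    0x124: "WHEA_UNCORRECTABLE_ERROR",
    0x9C: "MACHINE_CHECK_EXCEPTION",
}

# Matches "STOP: 0x<code> (0x<p1>, 0x<p2>, 0x<p3>, 0x<p4>)" anywhere in the line,
# so a leading "***" or surrounding whitespace is ignored.
STOP_RE = re.compile(
    r"STOP:\s*0x([0-9A-Fa-f]+)\s*\(\s*"
    r"0x([0-9A-Fa-f]+)\s*,\s*0x([0-9A-Fa-f]+)\s*,\s*"
    r"0x([0-9A-Fa-f]+)\s*,\s*0x([0-9A-Fa-f]+)\s*\)"
)


def classify_stop(line):
    """Hypothetical helper: return (name, code, params, is_mce) for a STOP line."""
    match = STOP_RE.search(line)
    if match is None:
        raise ValueError("not a STOP line: " + line)
    # Convert the five hex strings; the first is the code, the rest the parameters.
    code, *params = (int(group, 16) for group in match.groups())
    name = BUG_CHECK_NAMES.get(code, "unknown")
    # 0x9C always reports a machine check; 0x124 does so when parameter 1 is 0x0.
    is_mce = code == 0x9C or (code == 0x124 and params[0] == 0x0)
    return name, code, params, is_mce
```

For the 0x9C example above, this yields the name MACHINE_CHECK_EXCEPTION and the parameters 0x30, 0x2, 0x1 and 0x80003CBA. Interpreting those parameters still requires Microsoft's documentation for the specific bug check.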

On Linux, a process such as klogd[3] writes a message to the kernel log and/or the console. When the error is non-recoverable and the machine crashes as a result, the message usually appears only on the console:

 CPU 0: Machine Check Exception: 0000000000000004
 Bank 2: f200200000000863
 Kernel panic: CPU context corrupt
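The hexadecimal values in such a message are raw register contents. Assuming the format shown above, in which older x86 kernels printed the global IA32_MCG_STATUS register after "Machine Check Exception:" and each bank's IA32_MCi_STATUS register after "Bank N:", a minimal Python sketch can extract them and decode the three MCG_STATUS flags defined by Intel's Machine-Check Architecture: RIPV (bit 0, restart IP valid), EIPV (bit 1, error IP valid) and MCIP (bit 2, machine check in progress). The regular expressions and the names `parse_mce_log` and `mcg_flags` are hypothetical.

```python
import re

# Hypothetical patterns for the console format shown above.
MCE_RE = re.compile(r"CPU (\d+): Machine Check Exception: ([0-9A-Fa-f]+)")
BANK_RE = re.compile(r"Bank (\d+): ([0-9A-Fa-f]+)")

# IA32_MCG_STATUS flag bits from Intel's Machine-Check Architecture.
MCG_STATUS_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}


def parse_mce_log(text):
    """Return (cpu, mcg_status, {bank: status}) from the log lines."""
    cpu = mcg_status = None
    banks = {}
    for line in text.splitlines():
        if m := MCE_RE.search(line):
            cpu, mcg_status = int(m.group(1)), int(m.group(2), 16)
        elif m := BANK_RE.search(line):
            banks[int(m.group(1))] = int(m.group(2), 16)
    return cpu, mcg_status, banks


def mcg_flags(mcg_status):
    """List the names of the MCG_STATUS flags that are set, lowest bit first."""
    return [name for bit, name in MCG_STATUS_FLAGS.items() if mcg_status >> bit & 1]
```

For the message above, MCG_STATUS is 0x4, so only MCIP is set. RIPV is clear, meaning the interrupted program cannot be reliably restarted, which is consistent with the kernel panic that follows.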

The error usually occurs because of failure or overheating of hardware components, in cases where the fault cannot be identified more specifically by a different error message. Interpreting the error message can be difficult, although Intel Pentium processors do generate more specific codes, which can be decoded by contacting the manufacturer.

Most MCEs require a system restart before normal operation can resume, and they generally indicate a persistent hardware problem of a general nature rather than a one-off fault.

Problem types

Most of these errors relate specifically to the Pentium processor family, but similar errors can occur on other processors and cause similar problems.

The main hardware problems that cause MCEs are described in the following section.

Possible causes

Machine checks are a hardware problem, not a software problem. They are most often the result of the CPU overheating, which causes it to make errors or to reach a thermal limit at which it must shut itself down to avoid permanent damage. They can also be caused by bus errors introduced by other failing components, including memory and I/O devices. Possible causes include:

  • Poor CPU cooling due to a CPU heatsink and fan that are clogged with dust or have come loose.
  • Overclocking, which increases power dissipation, creating more heat.
  • Poor case cooling due to inadequate or clogged case fans or filters.
  • Failing memory or I/O cards.
  • Inadequate or failing power supply.

Decoding MCEs

As noted previously, decoding MCEs can be difficult. The manufacturer, especially the processor manufacturer, can normally provide information about specific codes. See Chapter 15 (Machine-Check Architecture) of the Intel 64 and IA-32 Architectures Software Developer's Manual,[4] or the Microsoft KB article on Windows exceptions.[5]
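As a starting point for such decoding, the architectural fields of a bank's IA32_MCi_STATUS register, as laid out in Chapter 15 of the Intel manual, can be separated mechanically. The following minimal Python sketch covers only the flag bits 63 to 57 and the two error-code fields (bits 15:0 and 31:16); it does not interpret the error codes themselves, which still requires the manual or vendor documentation. The function name `decode_mci_status` is hypothetical.

```python
# Architectural IA32_MCi_STATUS flag bits (Intel SDM Vol. 3, Chapter 15).
MCI_STATUS_FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # another error occurred before this bank was cleared
    61: "UC",     # error was not corrected
    60: "EN",     # reporting of this error was enabled
    59: "MISCV",  # IA32_MCi_MISC holds additional information
    58: "ADDRV",  # IA32_MCi_ADDR holds the error address
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status):
    """Split a 64-bit bank status value into set flags and error-code fields."""
    flags = [name for bit, name in MCI_STATUS_FLAGS.items() if status >> bit & 1]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,               # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }
```

Applied to the Bank 2 value f200200000000863 from the Linux example, it reports VAL, OVER, UC, EN and PCC as set, with an MCA error code of 0x0863. The PCC (processor context corrupt) flag is consistent with the "CPU context corrupt" panic in that log.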

Programs to decode MCEs

  • mcat: a Windows command-line program from AMD that decodes MCEs from AMD K8, Family 0x10 and Family 0x11 processors.
  • mcelog:[6] a Linux daemon by Andi Kleen that handles MCEs on modern x86 processors; it can also decode machine checks.
  • parsemce:[7] a Linux program by Dave Jones that decodes MCEs from AMD K7 processors.
  • mced:[8] a Linux program by Tim Hockin that gathers MCEs from the kernel and alerts interested applications. It does not try to interpret the MCE data; it only alerts other programs.

References

  1. "Bug Check 0x124: WHEA_UNCORRECTABLE_ERROR". MSDN. 2016-09-29. Retrieved 2017-07-13.
  2. "Bug Check 0x9C: MACHINE_CHECK_EXCEPTION". Microsoft Support. 2018-03-31. Retrieved 2018-03-31.
  3. Steve Lord, Greg Wettstein. "klogd(8) - Linux man page". Retrieved 2017-07-13. klogd is a system daemon which intercepts and logs Linux kernel messages.
  4. "Intel® 64 and IA-32 Architectures Software Developer's Manual: Vol. 3A / System Programming Guide, Part 1". Intel. May 2011. Retrieved 2017-07-13.
  5. "Stop error message in Windows XP that you may receive: "0x0000009C (0x00000004, 0x00000000, 0xb2000000, 0x00020151)"". MSDN. 2015-12-07. Retrieved 2017-07-13.
  6. "mcelog: Advanced hardware error handling for x86 Linux". 2015-04-20. Retrieved 2017-07-13.
  7. "parsemce: Linux Machine check exception handler parser". 2003-07-22. Retrieved 2017-07-13.
  8. "mcedaemon". GitHub.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.