Embedded / IoT Device Error Codes: Firmware, Connectivity, and Hardware Fault Patterns

Use this guide to recognize firmware, memory, and connectivity fault patterns in embedded and IoT devices, and to choose safe triage steps before resorting to hardware guesswork.

More systems guides Systems & Devices hub Embedded Systems hub

TL;DR

✓ Fault names often describe CPU exception categories (HardFault, BusFault, etc.).
✓ Memory bugs can look random; logging and crash context are key.
✓ Connectivity issues often stem from power, RF, or configuration problems rather than a “server down” outage.
✓ Avoid unsafe hardware changes; isolate with logs and controlled tests.

Symptoms / When you see this

✓ Device resets, enters safe mode, or reports exception names.
✓ “Out of memory” or allocation failures occur over time.
✓ Connectivity drops under load or after updates.
✓ Firmware update failures or boot loops occur.

Root causes (grouped)

✓ Memory access bugs (invalid pointers, stack corruption).
✓ Allocator issues (fragmentation, leaks, double frees).
✓ Peripheral/bus access issues and clock/power configuration.
✓ Connectivity issues (RF environment, power, configuration).
✓ Update/boot chain issues (rollback, partial update).

Step-by-step fixes (safe, prioritized)

✓ Capture crash context (PC/LR, fault registers) when available.
✓ Check power stability and supply tolerances under load.
✓ Review recent firmware changes around buffers, interrupts, and allocation.
✓ Reduce allocation churn or use static buffers where appropriate.
✓ Use staged rollouts and rollback plans for firmware updates.
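On Cortex-M parts, "fault registers" usually means the Configurable Fault Status Register (CFSR, 0xE000ED28) plus HFSR, MMFAR, and BFAR. The sketch below decodes a raw CFSR value into readable flag names, so a crash log can record why the fault happened as well as where. It assumes ARMv7-M bit positions (Cortex-M3/M4/M7). ARMv6-M cores such as the Cortex-M0 have no CFSR, and ARMv8-M adds bits, so check your core's reference manual. The function and table names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* ARMv7-M CFSR (0xE000ED28): MemManage status in bits 0-7,
 * BusFault status in bits 8-15, UsageFault status in bits 16-31. */
struct cfsr_bit {
    uint32_t mask;
    const char *name;
};

static const struct cfsr_bit CFSR_BITS[] = {
    {1u << 0,  "IACCVIOL: MemManage, instruction access violation"},
    {1u << 1,  "DACCVIOL: MemManage, data access violation"},
    {1u << 3,  "MUNSTKERR: MemManage while unstacking on exception return"},
    {1u << 4,  "MSTKERR: MemManage while stacking on exception entry"},
    {1u << 7,  "MMARVALID: MMFAR (0xE000ED34) holds the fault address"},
    {1u << 8,  "IBUSERR: BusFault on instruction fetch"},
    {1u << 9,  "PRECISERR: precise BusFault on data access"},
    {1u << 10, "IMPRECISERR: imprecise BusFault (stacked PC may not be the culprit)"},
    {1u << 11, "UNSTKERR: BusFault while unstacking"},
    {1u << 12, "STKERR: BusFault while stacking"},
    {1u << 15, "BFARVALID: BFAR (0xE000ED38) holds the fault address"},
    {1u << 16, "UNDEFINSTR: UsageFault, undefined instruction"},
    {1u << 17, "INVSTATE: UsageFault, invalid EPSR state (e.g. Thumb bit clear)"},
    {1u << 18, "INVPC: UsageFault, invalid EXC_RETURN or PC load"},
    {1u << 19, "NOCP: UsageFault, coprocessor (e.g. FPU) not enabled"},
    {1u << 24, "UNALIGNED: UsageFault, unaligned access trap"},
    {1u << 25, "DIVBYZERO: UsageFault, divide-by-zero trap"},
};

/* Stores a pointer to the description of each set, recognised CFSR bit
 * (lowest bit first) in names[], up to max_names entries.
 * Returns how many recognised bits are set, even if names[] was too small. */
size_t decode_cfsr(uint32_t cfsr, const char **names, size_t max_names) {
    size_t found = 0;
    for (size_t i = 0; i < sizeof CFSR_BITS / sizeof CFSR_BITS[0]; i++) {
        if ((cfsr & CFSR_BITS[i].mask) != 0) {
            if (found < max_names) {
                names[found] = CFSR_BITS[i].name;
            }
            found++;
        }
    }
    return found;
}
```

On target, the fault handler would read the register with `*(volatile uint32_t *)0xE000ED28u`. CFSR bits are sticky and write-one-to-clear, so log them before clearing.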

What NOT to do

✓ Do not guess hardware repairs from exception names alone.
✓ Do not disable safety monitors without a documented reason.
✓ Do not deploy untested firmware widely after a fault spike.
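Instead of disabling a watchdog that keeps firing, one option is to make its kicks meaningful: feed it only when every monitored task has recently proved it is alive. The sketch below shows that check-in pattern with a bitmask. The task IDs, mask layout, and function names are hypothetical, and the hardware kick itself is vendor-specific, so it is left out.

```c
#include <stdint.h>

/* Hypothetical monitored tasks; each owns one bit in the check-in mask. */
enum { TASK_SENSOR, TASK_RADIO, TASK_STORAGE, TASK_COUNT };

#define ALL_TASKS_MASK ((1u << TASK_COUNT) - 1u)

/* Bits set by tasks since the last successful supervision cycle.
 * On a real target this needs an atomic OR or a short critical section. */
static uint32_t checkin_mask;

/* Each monitored task calls this once per loop iteration when it is healthy. */
void wdg_checkin(unsigned task_id) {
    checkin_mask |= 1u << task_id;
}

/* Called once per watchdog period by a supervisor.
 * Returns 0 if every task checked in (safe to kick the hardware watchdog),
 * otherwise the mask of tasks that did not. */
uint32_t wdg_supervise(void) {
    uint32_t missing = ALL_TASKS_MASK & ~checkin_mask;
    if (missing == 0) {
        checkin_mask = 0;  /* start a fresh cycle: every task must check in again */
    }
    return missing;
}
```

The supervisor would kick the hardware watchdog only when `wdg_supervise()` returns 0 and log the returned mask otherwise, so a later WATCHDOG RESET can be traced to the task that stalled.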

If it persists (escalation checklist)

✓ Collect crash dumps and logs with timestamps.
✓ Reproduce the fault in a controlled environment with debug symbols available.
✓ Engage vendor/firmware team for root cause isolation.

Code directory within this guide

✓ Embedded fault codes often depend on the firmware implementation. Use these pages as a general map, then rely on your MCU or vendor documentation for exact meanings.

Code	Meaning	Next step
ASSERTION FAILED	Runtime assertion: a runtime safety check failed and the firmware halted or reset the system.	Follow the checklist on the code page
BROWNOUT RESET	Undervoltage event: the system reset because the supply voltage dropped below the brownout threshold.	Follow the checklist on the code page
BUSFAULT	Bus fault exception: the CPU trapped on a bus error while fetching an instruction or reading or writing memory or a peripheral.	Follow the checklist on the code page
DEBUGMON	Debug monitor exception: a breakpoint, watchpoint, or other debug event was routed to the DebugMonitor handler, which normally happens only when monitor-mode debugging is enabled.	Follow the checklist on the code page
DOUBLE FREE	Memory freed twice: the firmware freed the same heap pointer more than once, corrupting allocator state.	Follow the checklist on the code page
FLASH WRITE FAILED	Storage operation failed: a firmware or configuration write to flash storage did not complete successfully.	Follow the checklist on the code page
GURU MEDITATION ERROR	Runtime panic: typically the ESP-IDF panic handler on Espressif chips, reporting a fatal exception along with the exception type and the core it occurred on.	Follow the checklist on the code page
HARDFAULT	CPU fault exception: the Cortex-M catch-all fault handler ran, either directly (for example, on a vector table read error) or because a MemManage, BusFault, or UsageFault was escalated (its handler disabled, or unable to preempt the current priority). Check HFSR and CFSR.	Follow the checklist on the code page
HEAP CORRUPTION	Allocator state corrupted: the allocator detected damaged heap metadata, often caused by out-of-bounds writes or invalid frees.	Follow the checklist on the code page
I2C BUS ERROR	Bus communication failure: an I2C transaction failed because of a signaling or device state problem, such as missing pull-ups, an unexpected NACK, lost arbitration, or a device holding SDA low.	Follow the checklist on the code page
ILLEGAL INSTRUCTION	Invalid CPU instruction: the CPU attempted to execute an invalid instruction and trapped into a fault state.	Follow the checklist on the code page
MALLOC FAILED	Heap allocation failed: a dynamic memory allocation failed because the heap is exhausted or too fragmented to satisfy the request.	Follow the checklist on the code page
MEMMANAGE	Memory management fault: a Cortex-M memory protection fault raised by an MPU region violation or by an access the default memory map forbids, such as executing from an execute-never region.	Follow the checklist on the code page
NMI	Non-maskable interrupt: an interrupt that cannot be masked fired, often signaling a critical hardware or safety event.	Follow the checklist on the code page
OUT OF MEMORY	Allocation failure: a required memory allocation failed because the heap is exhausted or fragmented.	Follow the checklist on the code page
PENDSV	PendSV exception: PendSV is not a fault; RTOS kernels commonly use it for context switching. If a crash report names it as the active exception, the fault most likely occurred during a context switch, often because a task stack was corrupted.	Follow the checklist on the code page
STACK OVERFLOW	Stack exhaustion: a task or thread exceeded its stack allocation. An RTOS check may report it directly, or the overflow corrupts adjacent memory and causes a later fault.	Follow the checklist on the code page
SVCALL	Supervisor call: the SVC handler ran. In RTOS systems this is often normal operation (for example, starting the first task), so treat it as a fault only if it is unexpected.	Follow the checklist on the code page
USAGEFAULT	Usage fault exception: the CPU detected an undefined instruction, an invalid execution state (such as a cleared Thumb bit), an invalid exception return, an access to a disabled coprocessor such as the FPU, or, when those traps are enabled, an unaligned access or divide by zero.	Follow the checklist on the code page
WATCHDOG RESET	Watchdog triggered: the system rebooted because the watchdog timer was not serviced in time.	Follow the checklist on the code page
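For the STACK OVERFLOW entry, a common way to see how close a task came to its limit is stack painting: fill the stack with a known pattern before the task runs, then count how much of the pattern survives. The sketch assumes a full-descending stack (as on Cortex-M), so unused words remain at the low end of the array. The pattern value and function names are illustrative; where an RTOS provides a built-in query, such as FreeRTOS `uxTaskGetStackHighWaterMark()`, prefer that.

```c
#include <stddef.h>
#include <stdint.h>

/* Arbitrary, recognizable fill pattern (illustrative). */
#define STACK_PAINT_WORD 0xA5A5A5A5u

/* Fill the whole stack area with the pattern before the task first runs. */
void stack_paint(uint32_t *stack, size_t words) {
    for (size_t i = 0; i < words; i++) {
        stack[i] = STACK_PAINT_WORD;
    }
}

/* Assumes a full-descending stack: the task starts at stack[words - 1] and
 * grows toward stack[0], so never-touched words stay at the low end.
 * Returns how many words were never overwritten (the remaining headroom). */
size_t stack_unused_words(const uint32_t *stack, size_t words) {
    size_t unused = 0;
    while (unused < words && stack[unused] == STACK_PAINT_WORD) {
        unused++;
    }
    return unused;
}
```

The estimate is slightly optimistic if a task happens to write the pattern value itself, and it only reveals overflow after the fact; a guard region or MPU protection can catch the overflow as it happens.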

Tip: If your exact code isn’t listed, use the closest hub link above and browse related prefixes or message patterns.

FAQ

Is a HardFault always a hardware failure?

No. It often indicates software faults like invalid memory access or stack corruption.
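Finding the faulting instruction usually settles the question. On exception entry, Cortex-M hardware pushes eight words (R0-R3, R12, LR, PC, xPSR) onto whichever stack was active, and bit 2 of the EXC_RETURN value in LR says which one: 0 for MSP, 1 for PSP. The sketch below picks the right stack and copies the frame. It takes EXC_RETURN and both stack pointers as arguments; on a real target, a short assembly entry stub (not shown) would capture them before any C code touches the stack. The type and function names are illustrative.

```c
#include <stdint.h>

/* Basic exception frame pushed by Cortex-M hardware on exception entry.
 * With an FPU, an extended frame adds registers after these eight words. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exc_frame_t;

/* EXC_RETURN bit 2: 0 = frame is on MSP, 1 = frame is on PSP
 * (the process stack, used by RTOS tasks). */
static int frame_on_psp(uint32_t exc_return) {
    return (exc_return & 0x4u) != 0;
}

/* Copy the stacked frame from whichever stack was active at the fault. */
exc_frame_t read_exc_frame(uint32_t exc_return,
                           const uint32_t *msp, const uint32_t *psp) {
    const uint32_t *sp = frame_on_psp(exc_return) ? psp : msp;
    exc_frame_t f = { sp[0], sp[1], sp[2], sp[3], sp[4], sp[5], sp[6], sp[7] };
    return f;
}
```

Feeding the stacked PC to `addr2line` (or your toolchain's equivalent) against the ELF with debug symbols usually points at a line of firmware rather than a hardware defect.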

Why do memory bugs look random?

Corruption can occur earlier and only crash later when the corrupted state is used.
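One way to shrink the gap between corruption and crash is to surround a buffer that is suspected of overflowing with guard words and check them often. The first check that fails brackets the corruption between it and the previous passing check. The struct layout, pattern value, and check-site names below are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUARD_WORD 0xC0FFEE11u  /* arbitrary pattern (illustrative) */

/* Hypothetical receive buffer with a guard word on each side. */
typedef struct {
    uint32_t head_guard;
    uint8_t data[32];
    uint32_t tail_guard;
} guarded_buf_t;

/* Name of the first check site that saw a damaged guard, or NULL. */
static const char *first_corruption_site = NULL;

void guard_init(guarded_buf_t *b) {
    b->head_guard = GUARD_WORD;
    b->tail_guard = GUARD_WORD;
}

/* Returns true if both guards are intact; records the first failing site. */
bool guard_check(const guarded_buf_t *b, const char *site) {
    bool ok = (b->head_guard == GUARD_WORD) && (b->tail_guard == GUARD_WORD);
    if (!ok && first_corruption_site == NULL) {
        first_corruption_site = site;
    }
    return ok;
}
```

Checks can go after each write path and in the idle loop. Writing past `data` is undefined behavior in C, so the guard detects its effect on a typical layout rather than guaranteeing detection.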

What’s the best first data to capture?

Fault registers, stack trace, and the last log lines before reset.
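A common way to keep that data across the reset is a small crash record in RAM that the startup code does not zero: the fault handler fills it, and the application reports and clears it after the next boot. The sketch below assumes that arrangement. The field set, magic value, and function names are illustrative, and placing the record in a no-init section (for example, via a section attribute such as `.noinit`) depends on your toolchain and linker script. The checksum only guards against trusting uninitialized or partly overwritten RAM.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CRASH_MAGIC 0x43524153u  /* arbitrary marker (illustrative) */

/* On target, one instance would live in a RAM section the startup code does not zero. */
typedef struct {
    uint32_t magic;
    uint32_t pc, lr, cfsr, hfsr;
    uint32_t uptime_ms;
    uint32_t checksum;  /* covers every field above */
} crash_record_t;

/* Simple word-wise hash using the FNV-1a constants; detects garbage, not tampering. */
static uint32_t crash_checksum(const crash_record_t *r) {
    const uint32_t words[] = { r->magic, r->pc, r->lr, r->cfsr, r->hfsr, r->uptime_ms };
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < sizeof words / sizeof words[0]; i++) {
        h ^= words[i];
        h *= 16777619u;
    }
    return h;
}

/* Called from the fault handler just before resetting. */
void crash_record_save(crash_record_t *r, uint32_t pc, uint32_t lr,
                       uint32_t cfsr, uint32_t hfsr, uint32_t uptime_ms) {
    r->magic = CRASH_MAGIC;
    r->pc = pc;
    r->lr = lr;
    r->cfsr = cfsr;
    r->hfsr = hfsr;
    r->uptime_ms = uptime_ms;
    r->checksum = crash_checksum(r);
}

/* Called once after boot. Returns true and copies the record if a valid
 * crash was stored; always invalidates it so it is reported only once. */
bool crash_record_take(crash_record_t *r, crash_record_t *out) {
    bool valid = (r->magic == CRASH_MAGIC) && (r->checksum == crash_checksum(r));
    if (valid) {
        *out = *r;
    }
    r->magic = 0;
    return valid;
}
```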

Should I avoid dynamic allocation?

Not always, but uncontrolled allocation/free can cause fragmentation and failures in constrained systems.
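When allocation is needed but fragmentation is a concern, a fixed-size block pool is one common middle ground. Every block has the same size, so free memory cannot fragment, and a per-block in-use flag lets the pool reject double frees and foreign pointers instead of silently corrupting itself. The block size, block count, and names below are illustrative, and the sketch is not thread- or interrupt-safe as written.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCK_SIZE 64u  /* illustrative sizes */
#define POOL_BLOCKS 8u

/* The union forces each block to be suitably aligned for any object type. */
typedef union {
    max_align_t align;
    uint8_t bytes[POOL_BLOCK_SIZE];
} pool_block_t;

static pool_block_t pool[POOL_BLOCKS];
static bool pool_used[POOL_BLOCKS];

typedef enum { POOL_OK, POOL_BAD_POINTER, POOL_NOT_ALLOCATED } pool_status_t;

/* Returns a free block, or NULL when the pool is exhausted. */
void *pool_alloc(void) {
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        if (!pool_used[i]) {
            pool_used[i] = true;
            return pool[i].bytes;
        }
    }
    return NULL;
}

/* Releases a block; rejects pointers outside the pool, pointers that are
 * not a block start, and blocks that are not currently allocated (double free). */
pool_status_t pool_free(void *p) {
    /* Compare as integers: relational comparison of unrelated pointers is undefined in C. */
    uintptr_t base = (uintptr_t)&pool[0];
    uintptr_t addr = (uintptr_t)p;
    if (addr < base || addr >= base + sizeof pool) {
        return POOL_BAD_POINTER;
    }
    uintptr_t off = addr - base;
    if (off % sizeof(pool_block_t) != 0) {
        return POOL_BAD_POINTER;
    }
    size_t i = (size_t)(off / sizeof(pool_block_t));
    if (!pool_used[i]) {
        return POOL_NOT_ALLOCATED;
    }
    pool_used[i] = false;
    return POOL_OK;
}
```

Sizing the pool from worst-case demand measured in testing, and treating a `NULL` return as a reportable event rather than something to retry silently, keeps exhaustion visible.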

Can power issues cause crashes?

Yes. Brownouts and supply instability can mimic software faults.
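Recording the reset cause at every boot makes power problems visible: brownout resets that cluster during radio transmissions or motor starts point at the supply rather than at firmware. Each MCU family exposes reset flags differently, so the sketch below uses hypothetical, vendor-neutral flag bits that you would map from your part's reset-status register. The priority order is also an assumption; on some parts a cold power-up sets more than one flag, so confirm the semantics in the reference manual.

```c
#include <stdint.h>

/* Hypothetical, vendor-neutral reset flags; map your MCU's bits onto these. */
#define RST_FLAG_POWER_ON (1u << 0)
#define RST_FLAG_BROWNOUT (1u << 1)
#define RST_FLAG_WATCHDOG (1u << 2)
#define RST_FLAG_SOFTWARE (1u << 3)
#define RST_FLAG_PIN      (1u << 4)

typedef enum {
    RESET_UNKNOWN, RESET_BROWNOUT, RESET_POWER_ON,
    RESET_WATCHDOG, RESET_SOFTWARE, RESET_PIN
} reset_cause_t;

/* Several flags can be set at once, so check them in an (assumed) priority order. */
reset_cause_t classify_reset(uint32_t flags) {
    if (flags & RST_FLAG_BROWNOUT) return RESET_BROWNOUT;
    if (flags & RST_FLAG_POWER_ON) return RESET_POWER_ON;
    if (flags & RST_FLAG_WATCHDOG) return RESET_WATCHDOG;
    if (flags & RST_FLAG_SOFTWARE) return RESET_SOFTWARE;
    if (flags & RST_FLAG_PIN)      return RESET_PIN;
    return RESET_UNKNOWN;
}

/* Short label for boot logs and telemetry. */
const char *reset_cause_name(reset_cause_t cause) {
    switch (cause) {
    case RESET_BROWNOUT: return "brownout";
    case RESET_POWER_ON: return "power-on";
    case RESET_WATCHDOG: return "watchdog";
    case RESET_SOFTWARE: return "software";
    case RESET_PIN:      return "reset-pin";
    default:             return "unknown";
    }
}
```

On many MCUs the reset flags stay set until software clears them, so read, log, and clear them early in boot; otherwise the next reset appears to have several causes.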

Why do updates brick devices?

Partial updates, interrupted power, or bootloader rollback issues can leave the device in an inconsistent state.
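Many bootloaders avoid bricking by using two image slots and a trial-boot counter. A new image must be explicitly confirmed by the application after it boots successfully, and if it never is, the bootloader falls back to the previous image. The sketch below shows only the slot-selection decision under those assumptions; image verification, persisting slot metadata to flash, and all names are hypothetical placeholders.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_TRIAL_BOOTS 3u  /* illustrative limit */

typedef struct {
    bool valid;           /* image passed its integrity check (done elsewhere) */
    bool confirmed;       /* application marked the image good after booting */
    uint8_t trial_boots;  /* boot attempts made while unconfirmed */
    uint32_t version;
} slot_t;

/* Returns the slot index (0 or 1) to boot, or -1 if neither is bootable.
 * Prefers the newer valid image; an unconfirmed image gets MAX_TRIAL_BOOTS
 * attempts before the bootloader falls back to the other slot. */
int select_boot_slot(slot_t slots[2]) {
    int newer = (slots[1].version > slots[0].version) ? 1 : 0;
    int order[2] = { newer, 1 - newer };
    for (int k = 0; k < 2; k++) {
        slot_t *s = &slots[order[k]];
        if (!s->valid) {
            continue;  /* e.g. partial or interrupted update */
        }
        if (s->confirmed) {
            return order[k];
        }
        if (s->trial_boots < MAX_TRIAL_BOOTS) {
            s->trial_boots++;  /* must be persisted before jumping to the image */
            return order[k];
        }
    }
    return -1;
}
```

The application would set `confirmed` only after its own health checks pass, for example after it reaches its server, so an image that boots but cannot function still rolls back.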

How do I reduce field risk?

Use staged rollouts, telemetry, and rollback capability.
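A staged rollout needs each device to land in a stable cohort, so that raising the percentage only adds devices. One simple approach is to hash the device ID into a bucket from 0 to 99 and admit buckets below the current percentage. The sketch below uses 32-bit FNV-1a and mixes in the release ID so different releases pick different early cohorts. The ID strings and function names are hypothetical, and whether the check runs on the update server or on the device is a design choice.

```c
#include <stdbool.h>
#include <stdint.h>

/* Continue a 32-bit FNV-1a hash over a NUL-terminated string. */
static uint32_t fnv1a_update(uint32_t h, const char *s) {
    for (; *s != '\0'; s++) {
        h ^= (uint8_t)*s;
        h *= 16777619u;  /* FNV-1a 32-bit prime */
    }
    return h;
}

/* Stable bucket 0-99 for this device and release. */
unsigned rollout_bucket(const char *release_id, const char *device_id) {
    uint32_t h = 2166136261u;  /* FNV-1a 32-bit offset basis */
    h = fnv1a_update(h, release_id);
    h = fnv1a_update(h, ":");
    h = fnv1a_update(h, device_id);
    return (unsigned)(h % 100u);
}

/* True if the device should receive the release at the given percentage. */
bool in_rollout(const char *release_id, const char *device_id, unsigned percent) {
    return rollout_bucket(release_id, device_id) < percent;
}
```

Each percentage step can be gated on telemetry such as reset counts and fault codes, halting the rollout if they regress.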

When should I involve the hardware team?

When faults correlate with temperature/power/EMI or reproduce only in specific hardware conditions.

References / Notes

✓ MCU vendor documentation
✓ RTOS configuration guides
✓ Crash dump and logging best practices