How a BMS Knows What It Cannot See

The Core Challenge — You Cannot Measure SOC
Method 1 — Coulomb Counting
Method 2 — OCV Lookup
Method 3 — The Kalman Filter
Why LFP Makes All of This Harder
Cell Balancing — Passive vs Active
Protection Architecture — Two Layers, One Purpose
State of Health — The Slow Drift
How the BMS Communicates
Key Takeaways
References

The Core Challenge — You Cannot Measure SOC

State of Charge is defined as:

SOC = Q_remaining / Q_max × 100%

Where Q_remaining is the charge currently stored (in ampere-hours) and Q_max is the maximum charge the pack can hold at its current age and temperature.

The problem: neither Q_remaining nor Q_max has a sensor. You cannot probe a lithium cell and get a direct readout of how full it is. The BMS has to infer SOC from things it can measure — voltage, current, and temperature — using mathematical models.

This is the fundamental challenge of BMS engineering. Every other problem flows downstream from this one.

The BMS uses three methods in combination. None of them is sufficient alone.

Method 1 — Coulomb Counting

The simplest method: measure current flowing in and out of the battery and integrate it over time.

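SOC(t) = SOC(t0) + (1 / Q_max) × ∫ I(t) dt

Here I(t) is taken as positive while charging and negative while discharging, and SOC is expressed as a fraction of Q_max.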

If you start at 80% SOC and 10 amperes flows out for 1 hour from a 100 Ah cell, you've used 10 Ah — so you're now at 70% SOC. Simple arithmetic.
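A minimal discrete-time sketch of that integral, assuming a fixed sampling interval, the positive-while-charging sign convention, and no coulombic-efficiency correction. The function name and parameters are illustrative, not taken from any particular BMS:

```python
def coulomb_count(soc0, currents_a, dt_s, q_max_ah):
    """Integrate sampled current into SOC (SOC as a fraction, 0..1).

    Assumed sign convention: current is positive while charging and
    negative while discharging. Coulombic efficiency is ignored.
    """
    q_max_as = q_max_ah * 3600.0          # capacity in ampere-seconds
    soc = soc0
    for i_a in currents_a:
        soc += i_a * dt_s / q_max_as      # rectangle-rule integration step
    return soc


# 10 A discharge for one hour, sampled once per second, on a 100 Ah cell
soc_end = coulomb_count(0.80, [-10.0] * 3600, dt_s=1.0, q_max_ah=100.0)
print(f"SOC after one hour: {soc_end:.0%}")
```

A production counter would also clamp SOC to its valid range and apply an efficiency factor on charge; both are left out here to keep the arithmetic visible.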

Why it works: It's continuous. It updates every millisecond. It doesn't care what the load profile looks like — city driving, highway, aggressive regen. It tracks all of it.

Why it drifts: The integral starts from an initial SOC estimate. If that estimate is wrong by 3%, every subsequent reading is wrong by 3%, permanently, until another method supplies a correction. Worse, current sensors have small DC offsets that the integral accumulates: a 50 mA bias on a 100 Ah cell adds 1.2 Ah (1.2% SOC) of error per day. If the BMS keeps counting through a week of parking, that grows to 8.4 Ah, or 8.4% SOC, with no real current flowing at all.
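The drift arithmetic is simple enough to check directly. A sketch assuming a constant offset and a counter that keeps integrating the whole time (the helper name is hypothetical):

```python
def offset_error(offset_a, hours, q_max_ah):
    """Charge error (Ah) and SOC error (percentage points) from a constant current offset."""
    error_ah = offset_a * hours                 # integral of a constant bias
    error_soc_pct = 100.0 * error_ah / q_max_ah
    return error_ah, error_soc_pct


day_ah, day_pct = offset_error(0.050, 24, 100.0)        # 50 mA for one day
week_ah, week_pct = offset_error(0.050, 24 * 7, 100.0)  # 50 mA for one week
print(f"per day:  {day_ah:.1f} Ah ({day_pct:.1f}% SOC)")
print(f"per week: {week_ah:.1f} Ah ({week_pct:.1f}% SOC)")
```

The error grows linearly with both time and offset, so halving the offset halves the drift.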

Coulomb counting is not a standalone estimator. It is a predictor that needs periodic correction.

Method 2 — OCV Lookup

Open Circuit Voltage (OCV) is the voltage a cell settles to after resting with zero current for a sufficient period. At rest, the electrochemical reactions inside the cell reach equilibrium, and the voltage that results is a reliable, repeatable function of SOC.

The BMS stores an OCV-SOC lookup table — measured during cell characterization at the factory — and uses it to correct Coulomb counting whenever the vehicle is parked and the pack has rested.
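Using the table means inverting it: given a rested voltage, interpolate between the two nearest characterization points. A minimal sketch with linear interpolation; the table values below are made up for illustration (loosely NMC-shaped) and are not measured data:

```python
from bisect import bisect_left

# Hypothetical OCV-SOC table (illustrative values, not measured data).
# Both lists must be strictly increasing.
SOC_POINTS = [0.0, 0.1, 0.2, 0.4, 0.6, 0.8, 0.9, 1.0]
OCV_POINTS = [3.00, 3.45, 3.55, 3.68, 3.80, 3.98, 4.08, 4.20]


def soc_from_ocv(v: float) -> float:
    """Invert the OCV-SOC table by linear interpolation, clamping at the ends."""
    if v <= OCV_POINTS[0]:
        return SOC_POINTS[0]
    if v >= OCV_POINTS[-1]:
        return SOC_POINTS[-1]
    j = bisect_left(OCV_POINTS, v)          # first index with OCV_POINTS[j] >= v
    v0, v1 = OCV_POINTS[j - 1], OCV_POINTS[j]
    s0, s1 = SOC_POINTS[j - 1], SOC_POINTS[j]
    return s0 + (s1 - s0) * (v - v0) / (v1 - v0)


print(f"{soc_from_ocv(3.74):.2f}")   # a rested reading between two table points
```

The same code works on an LFP table, but neighboring OCV points in the plateau sit only millivolts apart, so small voltage errors move the result a lot.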

Why it works: OCV is a thermodynamic property, so a reading taken at rest does not accumulate error the way an integral does. One clean OCV reading after a sufficient rest (15 minutes or more, depending on chemistry and temperature) resets whatever error Coulomb counting has accumulated.

The catch: The cell must be at rest. Under load, the measured voltage is not OCV — it's OCV minus voltage drops from internal resistance and polarization effects. Using terminal voltage during driving as if it were OCV will give large SOC errors.
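One simple way to enforce the rest requirement is a timer that only allows the OCV correction once current has stayed near zero long enough. A minimal sketch; the class name and both thresholds are hypothetical, with the 15-minute figure borrowed from the rest time mentioned earlier:

```python
from dataclasses import dataclass


@dataclass
class RestTracker:
    """Decides when terminal voltage can be treated as OCV.

    Hypothetical thresholds: |I| below rest_current_a counts as rest, and
    the pack must stay at rest for min_rest_s before the OCV table is used.
    """
    rest_current_a: float = 0.5
    min_rest_s: float = 15 * 60.0
    rest_time_s: float = 0.0

    def update(self, current_a: float, dt_s: float) -> bool:
        if abs(current_a) < self.rest_current_a:
            self.rest_time_s += dt_s
        else:
            self.rest_time_s = 0.0   # any real load restarts the relaxation clock
        return self.rest_time_s >= self.min_rest_s


tracker = RestTracker()
usable_while_driving = tracker.update(-40.0, 1.0)   # under load: not OCV
for _ in range(15 * 60):                             # parked, small standby draw
    usable_after_rest = tracker.update(0.02, 1.0)
print(usable_while_driving, usable_after_rest)
```

Only when `update` returns True would the BMS look up SOC from the measured voltage and use it to correct the Coulomb-counted value.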

For most chemistries (NMC, NCA), the OCV-SOC curve has a useful slope that makes OCV a reliable SOC indicator across most of the range. For LFP — which dominates commercial EVs in India — the curve is nearly flat between 20–80% SOC. More on this shortly.

Image source: Batterydesign.net

Method 3 — The Kalman Filter

The Kalman filter is the algorithm that fuses Coulomb counting and voltage measurement into a single SOC estimate in real time, including while the vehicle is driving, weighting each source by how uncertain it currently is.

To understand it, you need a simple mental model.

The BMS runs a mathematical model of the cell — equations that predict what the cell voltage should be at any given SOC, current, and temperature. At every timestep, it does two things:

Predict: Use the model to predict the present state (SOC) from the last state and the measured current. This step is Coulomb counting, so the uncertainty of the estimate grows at every step.

Update: Compare the predicted cell voltage against the measured cell voltage. The difference, called the innovation, tells the filter how wrong its prediction is. It corrects the SOC estimate in proportion to the innovation, scaled by the Kalman gain.

The key insight: how much to trust the correction depends on the slope of the OCV-SOC curve at the current SOC. A steep slope means a small voltage error maps to a small SOC correction — the measurement is informative. A flat slope means a small voltage error could mean a large SOC range — the measurement is not informative, and the filter correctly trusts its model more.

This is expressed through the Kalman gain:

K = P × H^T × (H × P × H^T + R)^−1

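Where H is the slope of the OCV curve (dV/dSOC), P is the variance of the SOC estimate, and R is the variance of the voltage measurement noise. Because the OCV curve is nonlinear, SOC filters are usually extended Kalman filters that re-evaluate H at the present SOC estimate. In the scalar case, the share of the voltage innovation the filter accepts is K × H = H²P / (H²P + R). When the OCV curve is steep (H²P much larger than R), that share approaches 1: trust the measurement. When the curve is flat (H near zero, as on the LFP plateau), K and the share both approach 0: trust the model.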
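Plugging numbers into the scalar form makes the effect concrete. A sketch with hypothetical inputs: slopes taken as the average across 20–80% SOC from the chemistry comparison table later in this article (about 0.42 V per unit SOC for NMC 811, 0.05 for LFP), an SOC uncertainty of 1% (P = 1e-4), and 2 mV voltage noise (R = 4e-6 V²):

```python
def kalman_weight(h: float, p: float, r: float):
    """Return the scalar Kalman gain K and K*H, the share of the innovation accepted."""
    k = p * h / (h * p * h + r)
    return k, k * h


P = 1e-4   # SOC variance: 1% standard deviation (hypothetical)
R = 4e-6   # voltage noise variance: 2 mV standard deviation (hypothetical)
for label, h in [("steep (NMC-like)", 0.42), ("flat (LFP plateau)", 0.05)]:
    k, share = kalman_weight(h, P, R)
    print(f"{label}: H = {h} V per unit SOC, K*H = {share:.2f}")
```

With these inputs the steep curve accepts about 82% of each voltage innovation and the flat one about 6%. A larger P raises both shares, so the filter leans on voltage more when it is less sure of its own estimate.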

The result is an SOC estimate that is more accurate than Coulomb counting alone, more continuous than OCV lookup alone, and self-correcting over time.
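The predict/update cycle fits in a few lines for a single cell. This is a minimal scalar sketch, not a production filter: it assumes a hypothetical linear OCV curve and a resistor-only cell model (no polarization), and the noise values are illustrative. Because the slope H is re-evaluated at the current estimate each step, the same structure carries over to a nonlinear OCV table as an extended Kalman filter:

```python
from dataclasses import dataclass

# Hypothetical cell model, for illustration only
OCV_A, OCV_B = 3.2, 0.9   # OCV = OCV_A + OCV_B * SOC (volts)
R0 = 0.001                # ohmic resistance (ohms)


def ocv(soc: float) -> float:
    return OCV_A + OCV_B * soc


def docv_dsoc(soc: float) -> float:
    return OCV_B          # H: slope of the OCV curve at this SOC


@dataclass
class SocKalman:
    soc: float            # SOC estimate (fraction)
    p: float              # variance of the SOC estimate
    q: float = 1e-7       # process noise added per step (sensor and model error)
    r: float = 1e-5       # voltage measurement noise variance (V^2)

    def predict(self, current_a: float, dt_s: float, q_max_ah: float) -> None:
        # Coulomb counting; current is positive while charging
        self.soc += current_a * dt_s / (q_max_ah * 3600.0)
        self.p += self.q  # uncertainty grows every step

    def update(self, v_measured: float, current_a: float) -> float:
        v_predicted = ocv(self.soc) + current_a * R0  # terminal voltage model
        h = docv_dsoc(self.soc)
        k = self.p * h / (h * self.p * h + self.r)    # scalar Kalman gain
        innovation = v_measured - v_predicted
        self.soc += k * innovation
        self.p *= 1.0 - k * h
        return innovation


# Filter starts 10 points high; the simulated "true" cell follows the same model
true_soc = 0.50
kf = SocKalman(soc=0.60, p=0.01)
for _ in range(600):                       # 10 minutes at 1 s steps
    current = -20.0                        # 20 A discharge
    true_soc += current * 1.0 / (100.0 * 3600.0)
    v_meas = ocv(true_soc) + current * R0  # noise-free measurement
    kf.predict(current, 1.0, 100.0)
    kf.update(v_meas, current)
print(f"true SOC {true_soc:.4f}, estimate {kf.soc:.4f}")
```

Lowering `OCV_B` toward zero, as on an LFP plateau, shrinks the share of each innovation the filter accepts, so the estimate leans on the Coulomb-counting prediction.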

Image source: Wireless Pi (The Easiest Tutorial on Kalman Filter)

Why LFP Makes All of This Harder

Lithium Iron Phosphate (LFP) is the dominant chemistry in Indian commercial EVs — buses, trucks, LCVs — because of its safety, cycle life, and cost. It is also the hardest chemistry to estimate SOC for.

The reason is the flat OCV plateau:

Chemistry	OCV change across 20–80% SOC	SOC error from 2 mV voltage noise
NMC 811	~250 mV	~0.8% SOC
NMC 622	~220 mV	~0.9% SOC
LFP	~30 mV	~6–8% SOC

In the plateau region, which is most of normal operating SOC for LFP, a 2 mV voltage measurement error corresponds to a 6–8% SOC uncertainty. The Kalman filter's voltage-based correction becomes nearly useless. The filter falls back to Coulomb counting.
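A first-order check on the table divides the voltage error by the average OCV slope across 20–80% SOC. Treat the result as a lower bound: the flattest stretch inside each range has a lower local slope than the average, so worst-case errors are larger, which is consistent with the table's figures sitting above this estimate. A sketch (the helper name is hypothetical):

```python
def soc_error_pct(voltage_error_mv: float, ocv_span_mv: float,
                  soc_span_pct: float = 60.0) -> float:
    """SOC error (percentage points) implied by a voltage error.

    Uses the average slope over the SOC span (mV per % SOC). Local slopes in
    the flattest part of the curve are lower, so real errors can be larger.
    """
    slope_mv_per_pct = ocv_span_mv / soc_span_pct
    return voltage_error_mv / slope_mv_per_pct


for chemistry, span_mv in [("NMC 811", 250), ("NMC 622", 220), ("LFP", 30)]:
    print(f"{chemistry}: {soc_error_pct(2.0, span_mv):.1f}% SOC per 2 mV")
```

Even on this optimistic average-slope basis, LFP comes out roughly eight times worse than NMC 811 (4.0% versus 0.5%).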

This means that for LFP packs, current sensor quality is the dominant factor in SOC accuracy. A low-offset, temperature-compensated shunt, paired with a good zero-current detection routine, does more for accuracy than a more sophisticated estimator.

Cell Balancing — Passive vs Active

A series pack carries the same current through every cell. But cells are never identical — manufacturing tolerances create small capacity and resistance differences at birth, and different temperatures and usage patterns cause them to age at different rates. SOC diverges.

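Why it matters: The cell with the lowest SOC hits the discharge voltage limit first, terminating the discharge for the entire pack. All the energy in the other cells above that cell's SOC level is stranded, inaccessible until the weak cell is recharged. Charging has the mirror-image problem: the highest-SOC cell reaches the charge voltage limit first and ends the charge while the others are still short of full. Over time, imbalance directly reduces usable pack capacity.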
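The stranding effect can be computed directly. In a series string the discharge ends when the cell with the least remaining charge (SOC × capacity) runs out, which with unequal capacities is not always the lowest-SOC cell. A sketch with a hypothetical four-cell string and a hypothetical helper:

```python
def usable_discharge_ah(cell_socs, cell_capacities_ah):
    """Ah a series string can deliver, and Ah left stranded in each cell.

    Every cell carries the same current, so the string stops when the cell
    with the least remaining charge (SOC_i * Q_i) reaches its discharge
    limit, taken here as 0% SOC.
    """
    remaining = [soc * q for soc, q in zip(cell_socs, cell_capacities_ah)]
    usable = min(remaining)
    stranded = [r - usable for r in remaining]
    return usable, stranded


# Hypothetical string: three 90 Ah cells and one weaker 88 Ah cell
usable, stranded = usable_discharge_ah([0.50, 0.52, 0.53, 0.50], [90, 90, 90, 88])
print(f"usable: {usable:.1f} Ah, stranded per cell: "
      + ", ".join(f"{s:.1f}" for s in stranded))
```

Here the 88 Ah cell limits the string even though another cell shares its 50% SOC, and 7.5 Ah sits unused in the other three.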

Passive balancing

The most common approach: switch a resistor across cells that are above the target SOC, bleeding their excess charge as heat until they match the lowest cell.

t_balance = (ΔSOC × Q_cell) / I_balance

At 100 mA balancing current on a 90 Ah cell with 3% SOC imbalance: t = (0.03 × 90) / 0.1 = 27 hours. Passive balancing is slow. It corrects gradual drift but cannot fix a large initial imbalance quickly. And it wastes energy as heat.
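The same formula as a helper, reproducing the 27-hour figure. The 500 mA case is a hypothetical comparison, not a recommendation:

```python
def balance_time_h(delta_soc: float, q_cell_ah: float, i_balance_a: float) -> float:
    """t_balance = (ΔSOC × Q_cell) / I_balance, in hours of active balancing."""
    if i_balance_a <= 0:
        raise ValueError("balancing current must be positive")
    return delta_soc * q_cell_ah / i_balance_a


print(f"100 mA: {balance_time_h(0.03, 90, 0.100):.0f} h")
print(f"500 mA: {balance_time_h(0.03, 90, 0.500):.1f} h")
```

These are hours with the balancer actually switched on. If balancing only runs for part of each day near the top of charge, calendar time stretches accordingly: at a hypothetical one hour per day, the 27 hours take about four weeks.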

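It typically runs near the top of charge, when cells are close to full and the voltage spread between them is most visible. For LFP this matters even more, because the flat plateau hides imbalance across most of the SOC range.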

Active balancing

Rather than dissipating excess energy, active balancing transfers it from high cells to low cells using inductors, capacitors, or transformers. Efficiency of 80–92% is typical — significantly better than the 0% efficiency of passive balancing.
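To put the efficiency difference in energy terms, take the 2.7 Ah excess from the passive example (3% of a 90 Ah cell), an assumed 3.3 V cell voltage (typical of the LFP plateau), and an assumed 85% active-balancer efficiency from the middle of the quoted range. The helper is hypothetical:

```python
def balancing_energy_wh(excess_ah: float, v_cell: float, efficiency: float):
    """Energy taken from high cells, delivered to low cells, and lost as heat."""
    removed_wh = excess_ah * v_cell
    delivered_wh = removed_wh * efficiency   # passive balancing: efficiency = 0
    return removed_wh, delivered_wh, removed_wh - delivered_wh


excess_ah = 0.03 * 90
for label, eff in [("passive", 0.0), ("active (85%)", 0.85)]:
    removed, delivered, heat = balancing_energy_wh(excess_ah, 3.3, eff)
    print(f"{label}: removed {removed:.2f} Wh, "
          f"delivered {delivered:.2f} Wh, heat {heat:.2f} Wh")
```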

Active balancing is faster and wastes less energy, but it costs significantly more and adds circuit complexity. For well-matched prismatic cells in commercial packs with good thermal management, passive balancing is usually sufficient. Where cells see large temperature gradients — and therefore differential aging — active balancing starts to make economic sense.

Image source: Monolithic Power Systems (Battery Balancing: A Crucial Function of Battery Management Systems)

Protection Architecture — Two Layers, One Purpose

BMS protection operates at two independent levels. This separation is not an accident — it exists because software can fail.

Layer 1 — Hardware (AFE)

The Analog Front End IC measures cell voltages and temperatures directly and can trigger a contactor open signal independently of the microcontroller. If the microcontroller hangs, crashes, or is running the wrong firmware version, the hardware protection still operates.

Typical thresholds for LFP:

Fault	Threshold	Action
Cell overvoltage	>3.65V	Immediate contactor open
Cell undervoltage	<2.50V	Immediate contactor open
Overtemperature (charge)	>45°C	Stop charging
Overtemperature (discharge)	>60°C	Immediate contactor open
Short circuit	>3–5C instantaneous	Immediate contactor open

Layer 2 — Firmware (graduated response)

The firmware implements a four-level alert ladder rather than binary trip or no-trip:

Warning: Threshold is being approached. Log and broadcast on CAN; no change to available power.
Derate: Sustained approach. Reduce available power by 20–40%. Continue operation.
Controlled shutdown: Active fault. Ramp power down via CAN, then open contactors after current reaches zero.
Immediate trip: Hardware limit exceeded. Open contactors now.

The graduated ladder prevents the sudden power cut that is unacceptable in a vehicle carrying passengers. The driver gets a warning, then reduced performance, before a shutdown — not a cliff.
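The ladder maps naturally onto an ordered set of thresholds checked from most to least severe. A minimal sketch for one signal (maximum cell temperature during discharge); only the 60 °C trip point comes from the threshold table above, the other thresholds and the 30% derate are hypothetical, and the time filtering a real "sustained approach" check needs is left out:

```python
from enum import IntEnum


class Response(IntEnum):
    NORMAL = 0
    WARNING = 1              # log and broadcast on CAN
    DERATE = 2               # reduce available power
    CONTROLLED_SHUTDOWN = 3  # ramp power to zero, then open contactors
    IMMEDIATE_TRIP = 4       # hardware limit; the AFE also acts on its own


# (threshold in °C, response), most severe first. Hypothetical except 60 °C.
DISCHARGE_TEMP_LADDER = [
    (60.0, Response.IMMEDIATE_TRIP),
    (58.0, Response.CONTROLLED_SHUTDOWN),
    (55.0, Response.DERATE),
    (50.0, Response.WARNING),
]
DERATE_FRACTION = 0.30  # inside the 20-40% range


def classify(max_cell_temp_c: float) -> Response:
    for threshold_c, response in DISCHARGE_TEMP_LADDER:
        if max_cell_temp_c > threshold_c:
            return response
    return Response.NORMAL


def allowed_power_kw(nominal_kw: float, response: Response) -> float:
    if response in (Response.NORMAL, Response.WARNING):
        return nominal_kw
    if response is Response.DERATE:
        return nominal_kw * (1.0 - DERATE_FRACTION)
    return 0.0  # shutdown or trip: no power request is granted


for temp_c in (45.0, 52.0, 56.0, 59.0, 61.0):
    level = classify(temp_c)
    print(f"{temp_c:.0f} °C -> {level.name}, {allowed_power_kw(100.0, level):.0f} kW")
```

In a real system the controlled-shutdown branch would ramp the limit down over time rather than jump to zero; the sketch collapses that ramp to keep it short.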

State of Health — The Slow Drift

SOC tells you how full the battery is right now. SOH — State of Health — tells you how much of the original capacity is still available at all.

SOH = Q_actual / Q_nominal × 100%

A cell at 85% SOH holds 85% of its original capacity. The end-of-life threshold for automotive applications is typically 80% SOH, not because the cell is dead, but because the range loss and power fade at that point become unacceptable for the application.

The BMS tracks SOH primarily through two indicators:

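Capacity fade: Over many charge-discharge cycles, the BMS compares the ampere-hours that flowed during a charge or discharge against the SOC change over the same interval, with that SOC change measured independently of the Coulomb counter (for example, from OCV readings at rest before and after). As capacity fades, the same SOC swing requires fewer Ah, and the BMS updates Q_max accordingly.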

Resistance growth: The BMS periodically calculates DC internal resistance from voltage steps during current transitions. As cells age, resistance grows — this reduces power capability even when capacity is still healthy.
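Both indicators reduce to simple ratios once the inputs are trustworthy. A minimal sketch with hypothetical helper names and numbers; the key assumption is that the SOC swing comes from rested OCV readings at the start and end of the window, not from the Coulomb counter that Q_max itself feeds:

```python
MIN_SOC_SWING = 0.3   # hypothetical: small swings make the ratio noisy


def capacity_estimate_ah(ah_throughput: float, soc_start: float, soc_end: float) -> float:
    """Q_max estimate from charge throughput over an independently measured SOC swing.

    soc_start and soc_end should come from rested OCV readings, not from the
    Coulomb counter, which already depends on the old Q_max.
    """
    delta_soc = abs(soc_end - soc_start)
    if delta_soc < MIN_SOC_SWING:
        raise ValueError("SOC swing too small for a reliable capacity estimate")
    return abs(ah_throughput) / delta_soc


def dc_resistance_ohm(v_before: float, v_after: float,
                      i_before: float, i_after: float) -> float:
    """DC internal resistance from the voltage step at a current transition.

    With current positive while charging, a discharge step lowers the voltage
    and the result comes out positive.
    """
    delta_i = i_after - i_before
    if delta_i == 0:
        raise ValueError("no current step to measure against")
    return (v_after - v_before) / delta_i


q_actual = capacity_estimate_ah(54.0, soc_start=0.20, soc_end=0.80)
soh_pct = 100.0 * q_actual / 100.0                  # hypothetical 100 Ah nominal cell
r_dc = dc_resistance_ohm(3.300, 3.260, 0.0, -50.0)  # 50 A discharge step
print(f"Q_actual {q_actual:.1f} Ah, SOH {soh_pct:.1f}%, R_dc {r_dc * 1000:.2f} mOhm")
```

Because any single estimate is noisy, a BMS would typically average many of them before changing its stored Q_max or resistance values.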

SOH tracking requires accurate data across hundreds of cycles. It is a slow-moving estimate compared to SOC, updated over days and weeks rather than milliseconds.

How the BMS Communicates

The BMS talks to the vehicle controller, charger, thermal system, and dashboard over a CAN bus — a digital communication network running through the vehicle.

In commercial vehicles, this follows the SAE J1939 protocol with 29-bit identifiers. Every message has a defined ID, timing, and data format. Typical BMS transmit rates are:

Pack voltage, current, and SOC at 100 ms intervals
Individual cell voltages at 500 ms intervals
Temperature readings at 1000 ms intervals
Maximum charge and discharge power limits at 100 ms intervals
Fault alerts immediately on occurrence, then repeated every 20 ms until resolved
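Each of those messages goes out under a 29-bit CAN identifier whose fields J1939 defines: a 3-bit priority, an 18-bit parameter group number (PGN), and an 8-bit source address. A minimal sketch of packing that identifier; the PGN, source address, and priority in the example are hypothetical, chosen from J1939's proprietary ranges rather than any real BMS message set:

```python
def j1939_can_id(priority: int, pgn: int, source_address: int,
                 destination: int = 0xFF) -> int:
    """Pack a 29-bit J1939 CAN identifier.

    Bit layout: priority (3) | EDP (1) | DP (1) | PF (8) | PS (8) | SA (8).
    The 18-bit PGN supplies EDP, DP, PF and, for PDU2 formats, PS.
    """
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    if not 0 <= pgn <= 0x3FFFF:
        raise ValueError("PGN must fit in 18 bits")
    pdu_format = (pgn >> 8) & 0xFF
    if pdu_format < 240:
        # PDU1 (destination-specific): PS carries the destination address
        pgn = (pgn & 0x3FF00) | (destination & 0xFF)
    # PDU2 (PF >= 240, broadcast): PS is the group extension already in the PGN
    return (priority << 26) | (pgn << 8) | (source_address & 0xFF)


# Hypothetical BMS broadcast: Proprietary B PGN 0xFF10, source address 0xF3
can_id = j1939_can_id(priority=6, pgn=0xFF10, source_address=0xF3)
print(hex(can_id))
```

The example prints 0x18ff10f3. PDU2 identifiers like this one are broadcast to every node on the bus, which fits periodic telemetry such as the pack and cell messages listed above.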

The power limits — often called SOP (State of Power) — are critical. They tell the motor controller exactly how hard it can push the battery at any instant. Too aggressive and cells get damaged. Too conservative and performance suffers. The BMS updates these limits continuously based on SOC, temperature, and aging state.
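One common first-order way to turn SOC, temperature, and aging into a power limit is a resistance-based calculation: find the largest current that keeps the weakest cell above its minimum voltage, cap it at a current limit, and multiply by pack voltage. A sketch under that assumption; the function and cell values are hypothetical, and many production SOP algorithms use richer dynamic models over a prediction horizon rather than a single resistance:

```python
def discharge_power_limit_w(ocv_v: float, r_cell_ohm: float, v_min_v: float,
                            i_max_a: float, n_series: int) -> float:
    """Resistance-based discharge power limit for a series string.

    ocv_v and r_cell_ohm describe the weakest cell: OCV follows from SOC,
    resistance from temperature and aging. i_max_a is a separate current cap.
    Pack voltage is approximated as n_series times the weakest cell's loaded
    voltage, a conservative simplification.
    """
    i_voltage_limited = (ocv_v - v_min_v) / r_cell_ohm   # keeps OCV - I*R >= v_min
    i_allowed = max(0.0, min(i_voltage_limited, i_max_a))
    v_cell = ocv_v - i_allowed * r_cell_ohm              # loaded cell voltage
    return n_series * v_cell * i_allowed


# Hypothetical 200-cell LFP string: fresh vs aged (doubled resistance)
fresh_w = discharge_power_limit_w(3.30, 0.001, 2.80, 300.0, 200)
aged_w = discharge_power_limit_w(3.30, 0.002, 2.80, 300.0, 200)
print(f"fresh: {fresh_w / 1000:.0f} kW, aged: {aged_w / 1000:.0f} kW")
```

Doubling resistance cuts the limit from 180 kW to 140 kW here; a cold pack shows the same effect through higher resistance, and a low SOC through a lower OCV.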

Key Takeaways

SOC cannot be directly measured; it is estimated from voltage, current, and temperature using a combination of Coulomb counting, OCV lookup, and Kalman filtering.
Coulomb counting is the continuous backbone but drifts without correction. OCV lookup corrects the drift but only works at rest. The Kalman filter fuses both in real time.
LFP's flat OCV plateau makes voltage-based SOC correction nearly useless during normal operation. Current sensor quality becomes the dominant factor in SOC accuracy.
Cell balancing corrects SOC divergence between cells: passive balancing by dissipating heat, active balancing by transferring energy. Passive balancing handles slow drift; large imbalances or high thermal gradients warrant active balancing.
Protection is always two-layer: hardware AFE that operates independently of software, and firmware with a graduated alert ladder.
SOH tracks the long-term health of the pack — capacity fade and resistance growth — updated slowly across hundreds of cycles.

References

Plett, G.L. (2015). Battery Management Systems, Volume I & II. Artech House.
Hu, X. et al. (2012). A Comparative Study of Equivalent Circuit Models for Li-Ion Batteries. Journal of Power Sources, 198, 359–367. https://doi.org/10.1016/j.jpowsour.2011.10.013
Dubarry, M. et al. (2012). Identify Capacity Fading Mechanism in a Commercial LiFePO4 Cell. Journal of Power Sources, 214, 46–56.
SAE International. (2015). SAE J1939-21: Data Link Layer.
IEC. (2017). IEC 62133-2: Secondary Cells and Batteries Containing Alkaline or Other Non-Acid Electrolytes, Part 2: Lithium Systems.

This is the Intermediate level of the VoltPulse BMS series.

Published on VoltPulse — the most technically rigorous source for battery technology and EV engineering coverage.

Sai Chaitanya Dasari

Part of the deepdive Series
