Where Q_remaining is the charge currently stored (in ampere-hours) and Q_max is the maximum charge the pack can hold at its current age and temperature.
The problem: neither Q_remaining nor Q_max has a sensor. You cannot probe a lithium cell and get a direct readout of how full it is. The BMS has to infer SOC from things it can measure — voltage, current, and temperature — using mathematical models.
This is the fundamental challenge of BMS engineering. Every other problem flows downstream from this one.
The BMS uses three methods in combination. None of them is sufficient alone.
Method 1: Coulomb Counting
The simplest method: measure current flowing in and out of the battery and integrate it over time.
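SOC(t) = SOC(t0) + (1 / Q_max) × ∫ I(t) dt
Here I(t) is positive while charging and negative while discharging, Q_max is in ampere-hours, and t is in hours.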
If you start at 80% SOC and 10 amperes flows out for 1 hour from a 100 Ah cell, you've used 10 Ah — so you're now at 70% SOC. Simple arithmetic.
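The arithmetic above can be written as a discrete Coulomb counter. This is a minimal sketch. It assumes current samples in amperes taken at a fixed interval (positive when charging, negative when discharging) and a simple rectangle-rule integral. `coulomb_count` is a hypothetical helper, not an API from any BMS product.

```python
def coulomb_count(soc0: float, q_max_ah: float, currents_a: list[float], dt_s: float) -> float:
    """Return SOC (as a fraction) after integrating sampled current.

    soc0       -- starting SOC as a fraction (0.8 means 80%)
    q_max_ah   -- capacity in ampere-hours
    currents_a -- current samples in A; positive = charge, negative = discharge
    dt_s       -- sample interval in seconds
    """
    charge_ah = sum(currents_a) * dt_s / 3600.0  # rectangle-rule integral, A*s -> Ah
    soc = soc0 + charge_ah / q_max_ah
    return min(max(soc, 0.0), 1.0)               # clamp to the physical range


# The worked example: 10 A out of a 100 Ah cell for 1 hour, sampled once per second.
samples = [-10.0] * 3600
print(f"{coulomb_count(0.80, 100.0, samples, 1.0):.2f}")  # 0.70
```

A real BMS would accumulate sample by sample in firmware rather than summing a stored list. Either way, the result depends entirely on the starting SOC.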
Why it works: It's continuous. It updates every millisecond. It doesn't care what the load profile looks like — city driving, highway, aggressive regen. It tracks all of it.
Why it drifts: The integral starts from an initial SOC estimate. If that estimate is wrong by 3%, every subsequent reading is wrong by 3% — permanently, until you get a correction from another method. Worse, current sensors have small DC offsets — a 50 mA bias on a 100 Ah cell accumulates 1.2 Ah of error per day. Over a week of parking, that's a visible SOC error.
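The offset arithmetic is easy to check. This sketch assumes a constant 50 mA bias that is integrated for the whole period, which is a worst case. Whether a real BMS keeps integrating while the vehicle is parked depends on its design. The function names are hypothetical.

```python
def offset_drift_ah(offset_a: float, hours: float) -> float:
    """Charge error (Ah) from integrating a constant current-sensor offset."""
    return offset_a * hours


def offset_drift_soc_pct(offset_a: float, hours: float, q_max_ah: float) -> float:
    """The same error expressed in percentage points of SOC."""
    return 100.0 * offset_drift_ah(offset_a, hours) / q_max_ah


bias_a = 0.050  # 50 mA DC offset, as in the text
print(f"{offset_drift_ah(bias_a, 24):.1f} Ah per day")                  # 1.2 Ah per day
print(f"{offset_drift_soc_pct(bias_a, 24 * 7, 100.0):.1f} % SOC per week")  # 8.4 % SOC per week
```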
Coulomb counting is not a standalone estimator. It is a predictor that needs periodic correction.
Method 2: OCV Lookup
Open Circuit Voltage (OCV) is the voltage a cell settles to after resting with zero current for a sufficient period. At rest, the electrochemical reactions inside the cell reach equilibrium, and the voltage that results is a reliable, repeatable function of SOC.
The BMS stores an OCV-SOC lookup table — measured during cell characterization at the factory — and uses it to correct Coulomb counting whenever the vehicle is parked and the pack has rested.
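The table lookup can be sketched as piecewise-linear interpolation over two matching arrays. The table values below are hypothetical placeholders, chosen only to increase monotonically. They are not characterization data for any real cell, and a production table would typically also be indexed by temperature.

```python
from bisect import bisect_left

# HYPOTHETICAL OCV-SOC table, for illustration only.
SOC_POINTS = [0.00, 0.10, 0.20, 0.50, 0.80, 0.90, 1.00]
OCV_POINTS = [3.00, 3.45, 3.55, 3.70, 3.95, 4.05, 4.20]  # volts, strictly increasing


def soc_from_ocv(ocv_v: float) -> float:
    """Invert the OCV-SOC table by piecewise-linear interpolation."""
    if ocv_v <= OCV_POINTS[0]:
        return SOC_POINTS[0]
    if ocv_v >= OCV_POINTS[-1]:
        return SOC_POINTS[-1]
    i = bisect_left(OCV_POINTS, ocv_v)  # first index with OCV_POINTS[i] >= ocv_v
    v0, v1 = OCV_POINTS[i - 1], OCV_POINTS[i]
    s0, s1 = SOC_POINTS[i - 1], SOC_POINTS[i]
    return s0 + (s1 - s0) * (ocv_v - v0) / (v1 - v0)


print(f"{soc_from_ocv(3.825):.2f}")  # 0.65
```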
Why it works: OCV is a thermodynamic property. It doesn't drift. It doesn't accumulate error. One clean OCV reading after a 15-minute rest resets whatever error Coulomb counting has accumulated.
The catch: The cell must be at rest. Under load, the measured voltage is not OCV — it's OCV minus voltage drops from internal resistance and polarization effects. Using terminal voltage during driving as if it were OCV will give large SOC errors.
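One simple way to enforce the rest condition is to put a gate in front of the OCV correction. This sketch assumes a hypothetical rest-current threshold of 0.5 A and reuses the 15-minute rest figure from the text. `RestGate` and `corrected_soc` are illustrative names. When the gate opens, the sketch replaces the Coulomb-counted SOC outright, which is the simplest possible correction policy.

```python
from dataclasses import dataclass
from typing import Callable

REST_CURRENT_A = 0.5      # ASSUMED |I| below which the pack counts as resting
REST_TIME_S = 15 * 60     # the 15-minute rest figure quoted in the text


@dataclass
class RestGate:
    """Tracks how long the pack has been resting before OCV is trusted."""
    rest_elapsed_s: float = 0.0

    def update(self, current_a: float, dt_s: float) -> bool:
        """Feed one current sample; return True once the rest period is met."""
        if abs(current_a) < REST_CURRENT_A:
            self.rest_elapsed_s += dt_s
        else:
            self.rest_elapsed_s = 0.0   # any load restarts the rest timer
        return self.rest_elapsed_s >= REST_TIME_S


def corrected_soc(soc_cc: float, terminal_v: float, rested: bool,
                  ocv_to_soc: Callable[[float], float]) -> float:
    """Use the OCV-based SOC only when rested; otherwise keep the counted SOC."""
    return ocv_to_soc(terminal_v) if rested else soc_cc
```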
For most chemistries (NMC, NCA), the OCV-SOC curve has a useful slope that makes OCV a reliable SOC indicator across most of the range. For LFP — which dominates commercial EVs in India — the curve is nearly flat between 20–80% SOC. More on this shortly.
Image source: Batterydesign.net
Method 3: The Kalman Filter
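The Kalman filter is the algorithm that fuses Coulomb counting and voltage measurement into a single SOC estimate in real time, including while the vehicle is driving. Because the OCV-SOC curve is nonlinear, BMS implementations commonly use a nonlinear variant such as the extended Kalman filter (EKF), which linearizes the cell model around the latest estimate at each step. The estimate is optimal only to the extent that the model and noise assumptions hold.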
To understand it, you need a simple mental model.
The BMS runs a mathematical model of the cell — equations that predict what the cell voltage should be at any given SOC, current, and temperature. At every timestep, it does two things:
Predict: Use the model to propagate the SOC estimate forward from the previous timestep using the measured current. This is Coulomb counting, so uncertainty grows with every step.
Update: Compare the predicted cell voltage against the measured cell voltage. The difference — called the innovation — tells the filter how wrong its prediction is. It corrects the SOC estimate proportionally.
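The predict/update loop above can be sketched as a scalar extended Kalman filter. The sketch rests on several assumptions: a one-resistor cell model (terminal voltage = OCV(SOC) + R0 × I, with current positive when charging), a hypothetical smooth OCV curve, and hand-picked noise values. None of the names come from a real BMS library, and practical filters often use richer cell models, for example with RC polarization terms.

```python
from dataclasses import dataclass


def ocv(soc: float) -> float:
    """HYPOTHETICAL smooth OCV curve in volts, for illustration only."""
    return 3.2 + 0.6 * soc + 0.3 * soc ** 2


def docv_dsoc(soc: float) -> float:
    """Slope of the curve above: H in the Kalman equations (volts per unit SOC)."""
    return 0.6 + 0.6 * soc


@dataclass
class SocEkf:
    soc: float             # SOC estimate as a fraction (0..1)
    p: float               # variance of the SOC estimate
    q_max_ah: float        # capacity in Ah
    r0_ohm: float          # ASSUMED one-resistor model: V = OCV(SOC) + R0 * I
    q_proc: float = 1e-7   # ASSUMED process noise added per step
    r_meas: float = 1e-4   # ASSUMED voltage-measurement noise variance (V^2)

    def step(self, current_a: float, v_meas: float, dt_s: float) -> float:
        # Predict: Coulomb counting (current positive when charging); uncertainty grows.
        self.soc += current_a * dt_s / 3600.0 / self.q_max_ah
        self.p += self.q_proc
        # Update: compare measured and predicted terminal voltage (the innovation).
        innovation = v_meas - (ocv(self.soc) + self.r0_ohm * current_a)
        h = docv_dsoc(self.soc)                      # linearize the OCV curve here
        k = self.p * h / (h * self.p * h + self.r_meas)
        self.soc += k * innovation
        self.p *= 1.0 - k * h
        return self.soc


# Start the filter 30 points wrong and feed it noise-free data from the same model.
true_soc, ekf = 0.60, SocEkf(soc=0.30, p=0.1, q_max_ah=100.0, r0_ohm=0.002)
for _ in range(600):                        # 600 one-second steps at 5 A discharge
    i = -5.0
    true_soc += i * 1.0 / 3600.0 / 100.0
    v = ocv(true_soc) + 0.002 * i
    est = ekf.step(i, v, 1.0)
print(f"true {true_soc:.3f}, estimated {est:.3f}")  # true 0.592, estimated 0.592
```

The filter converges quickly here because this hypothetical curve is steep everywhere. On a flat, LFP-style curve, the same loop would correct far more slowly.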
The key insight: how much to trust the correction depends on the slope of the OCV-SOC curve at the current SOC. A steep slope means a small voltage error maps to a small SOC correction — the measurement is informative. A flat slope means a small voltage error could mean a large SOC range — the measurement is not informative, and the filter correctly trusts its model more.
This is expressed through the Kalman gain:
K = P × H^T × (H × P × H^T + R)^−1
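Where H is the slope of the OCV curve (dV/dSOC), P is the estimate uncertainty (variance), and R is the measurement noise variance. For a single SOC state these are all scalars, and the fraction of the voltage innovation that becomes a state correction is K × H = (H² × P) / (H² × P + R). When H is large (steep OCV curve), K × H approaches 1: trust the measurement. When H is small (flat OCV curve, i.e., the LFP plateau), K and K × H both approach zero: trust the model.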
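The effect of the slope can be checked numerically with the scalar gain. This sketch assumes P and R values (a 1% SOC standard deviation and a 3 mV voltage-noise standard deviation). It also uses average slopes derived from the approximate 20–80% SOC OCV spans this article quotes for NMC 811 (~250 mV) and LFP (~30 mV). The values are illustrative, not tuned filter parameters.

```python
def kalman_gain(p: float, h: float, r: float) -> float:
    """Scalar form of K = P x H^T x (H x P x H^T + R)^-1."""
    return p * h / (h * p * h + r)


P = 1e-4   # ASSUMED SOC variance (1% standard deviation)
R = 9e-6   # ASSUMED voltage-noise variance (3 mV standard deviation)

# Average slopes across 20-80% SOC, in volts per unit SOC (a fraction, not percent).
slopes = {"NMC 811 (steep)": 0.250 / 0.6, "LFP (flat)": 0.030 / 0.6}
for name, h in slopes.items():
    k = kalman_gain(P, h, R)
    print(f"{name:16s} H={h:.3f} V  K={k:.3f}  K*H={k * h:.3f}")
```

With these assumed noise levels, the steep curve turns about 66% of each innovation into a correction, and the flat curve less than 3%. K itself is not monotonic in H: it peaks where H² × P equals R. That is why K × H is the cleaner measure of how much the filter trusts the voltage.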
The result is an SOC estimate that is more accurate than Coulomb counting alone, more continuous than OCV lookup alone, and self-correcting over time.
Image source: Wireless Pi
Why LFP Makes All of This Harder
Lithium Iron Phosphate (LFP) is the dominant chemistry in Indian commercial EVs — buses, trucks, LCVs — because of its safety, cycle life, and cost. It is also the hardest chemistry to estimate SOC for.
The reason is the flat OCV plateau:
| Chemistry | OCV change across 20–80% SOC | SOC error from 2 mV voltage noise |
|---|---|---|
| NMC 811 | ~250 mV | ~0.8% SOC |
| NMC 622 | ~220 mV | ~0.9% SOC |
| LFP | ~30 mV | ~6–8% SOC |
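To first order, SOC uncertainty is the voltage error divided by the local OCV slope: ΔSOC ≈ ΔV / (dV/dSOC). This sketch applies that relation using the average slope across the 60-point span in the table. The helper name is hypothetical.

```python
def soc_error_pct(voltage_error_mv: float, ocv_span_mv: float,
                  soc_span_pct: float = 60.0) -> float:
    """First-order SOC uncertainty, dSOC = dV / (dV/dSOC), using an average slope."""
    slope_mv_per_pct = ocv_span_mv / soc_span_pct
    return voltage_error_mv / slope_mv_per_pct


for chem, span_mv in [("NMC 811", 250), ("NMC 622", 220), ("LFP", 30)]:
    print(f"{chem:8s} {soc_error_pct(2.0, span_mv):.2f} % SOC")
# prints 0.48, 0.55 and 4.00 % SOC respectively
```

The average slope understates the error in the flattest part of each curve. The table's larger figures are consistent with the local slope there being lower than the 20–80% average.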
In the plateau region, which is most of normal operating SOC for LFP, a 2 mV voltage measurement error corresponds to a 6–8% SOC uncertainty. The Kalman filter's voltage-based correction becomes nearly useless. The filter falls back to Coulomb counting.
This means for LFP packs: current sensor quality is everything. A high-quality shunt with low DC offset, temperature-compensated, with a good zero-current detection algorithm, is more important for SOC accuracy than any algorithm sophistication.
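One possible zero-offset scheme can be sketched under explicit assumptions. It assumes the BMS knows moments when the true current is zero (here, hypothetically, whenever the main contactors are open). It averages the raw shunt readings at those moments and subtracts that average afterwards. The class is illustrative, not a known product's algorithm, and temperature compensation of the offset is not modeled.

```python
from statistics import fmean


class OffsetCompensator:
    """HYPOTHETICAL zero-offset tracker for a current shunt.

    Assumes the true current is known to be zero at certain moments (for example,
    contactors open) and that the offset drifts slowly enough to be re-learned then.
    """

    def __init__(self) -> None:
        self.offset_a = 0.0

    def learn(self, zero_current_samples_a: list[float]) -> None:
        """Average raw readings taken while the true current is known to be zero."""
        if zero_current_samples_a:
            self.offset_a = fmean(zero_current_samples_a)

    def correct(self, raw_a: float) -> float:
        """Subtract the learned offset from a raw reading."""
        return raw_a - self.offset_a


comp = OffsetCompensator()
comp.learn([0.049, 0.051, 0.050])      # raw readings while current is assumed zero
print(f"{comp.correct(-9.95):.3f} A")  # -10.000 A
```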
Cell Balancing: Passive vs Active
A series pack carries the same current through every cell. But cells are never identical — manufacturing tolerances create small capacity and resistance differences at birth, and different temperatures and usage patterns cause them to age at different rates. SOC diverges.
Why it matters: The cell with the lowest SOC hits the discharge voltage limit first, terminating the discharge for the entire pack. All the energy in the other cells above that cell's SOC level is stranded — inaccessible until the weak cell is recharged. Over time, imbalance directly reduces usable pack capacity.
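The stranding effect can be computed directly. This sketch assumes each cell's 0% SOC coincides with the discharge voltage limit. It uses remaining charge (SOC × capacity), which reduces to "lowest SOC" when capacities are equal. The four-cell string is hypothetical.

```python
def usable_discharge_ah(socs: list[float], capacities_ah: list[float]) -> float:
    """Ah the series string can deliver before its emptiest cell reaches the limit.

    All cells carry the same current, so discharge ends when the cell with the
    least remaining charge (SOC x capacity) is empty.
    """
    return min(s * q for s, q in zip(socs, capacities_ah))


def stranded_ah(socs: list[float], capacities_ah: list[float]) -> list[float]:
    """Charge left in each cell at the moment the string must stop discharging."""
    floor = usable_discharge_ah(socs, capacities_ah)
    return [s * q - floor for s, q in zip(socs, capacities_ah)]


socs = [0.80, 0.78, 0.80, 0.75]   # HYPOTHETICAL four-cell string of 90 Ah cells
caps = [90.0] * 4
print(usable_discharge_ah(socs, caps))                 # 67.5
print([round(x, 2) for x in stranded_ah(socs, caps)])  # [4.5, 2.7, 4.5, 0.0]
```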
Passive balancing
The most common approach: switch a resistor across cells that are above the target SOC, bleeding their excess charge as heat until they match the lowest cell.
t_balance = (ΔSOC × Q_cell) / I_balance
At 100 mA balancing current on a 90 Ah cell with 3% SOC imbalance: t = (0.03 × 90) / 0.1 = 27 hours. Passive balancing is slow. It corrects gradual drift but cannot fix a large initial imbalance quickly. And it wastes energy as heat.
It usually runs at the top of charge, when cells are nearly full and the voltage spread between cells is most visible.
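The balancing-time formula and the heat it implies fit in a few lines. This sketch assumes a roughly constant cell voltage during bleeding. The 3.4 V figure and the 1 A comparison current are assumptions, not values from the text.

```python
def passive_balance_hours(delta_soc: float, q_cell_ah: float, i_balance_a: float) -> float:
    """t_balance = (delta_SOC x Q_cell) / I_balance, returned in hours."""
    return delta_soc * q_cell_ah / i_balance_a


def bleed_heat_wh(delta_soc: float, q_cell_ah: float, cell_v: float) -> float:
    """Heat dissipated bleeding that charge, assuming a roughly constant cell voltage."""
    return delta_soc * q_cell_ah * cell_v


print(f"{passive_balance_hours(0.03, 90.0, 0.1):.1f} h")  # 27.0 h (the text's example)
print(f"{passive_balance_hours(0.03, 90.0, 1.0):.1f} h")  # 2.7 h at a HYPOTHETICAL 1 A
print(f"{bleed_heat_wh(0.03, 90.0, 3.4):.2f} Wh")         # 9.18 Wh at an ASSUMED 3.4 V
```

Time scales inversely with balancing current, which is why low-current passive balancers can only keep up with slow drift.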
Active balancing
Rather than dissipating excess energy, active balancing transfers it from high cells to low cells using inductors, capacitors, or transformers. Efficiency of 80–92% is typical — significantly better than the 0% efficiency of passive balancing.
Active balancing is faster and wastes less energy, but it costs significantly more and adds circuit complexity. For well-matched prismatic cells in commercial packs with good thermal management, passive balancing is usually sufficient. Where cells see large temperature gradients — and therefore differential aging — active balancing starts to make economic sense.
Image source: Monolithic Power Systems
Protection Architecture: Two Layers, One Purpose
BMS protection operates at two independent levels. This separation is not an accident — it exists because software can fail.
Layer 1: Hardware (AFE)
The Analog Front End IC measures cell voltages and temperatures directly and can trigger a contactor open signal independently of the microcontroller. If the microcontroller hangs, crashes, or is running the wrong firmware version, the hardware protection still operates.
Typical thresholds for LFP:
| Fault | Threshold | Action |
|---|---|---|
| Cell overvoltage | >3.65 V | Immediate contactor open |
| Cell undervoltage | <2.50 V | Immediate contactor open |
| Overtemperature (charge) | >45°C | Stop charging |
| Overtemperature (discharge) | >60°C | Immediate contactor open |
| Short circuit | >3–5C instantaneous | Immediate contactor open |
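The table maps naturally onto a decision function. In a real pack, the AFE performs these comparisons in dedicated hardware precisely so they do not depend on firmware. This Python sketch only illustrates the decision logic, and the names are hypothetical. The 5C short-circuit default is an assumed pick from the table's 3–5C range.

```python
from enum import Enum


class Action(Enum):
    NONE = "no action"
    STOP_CHARGING = "stop charging"
    OPEN_CONTACTOR = "immediate contactor open"


# Typical LFP thresholds from the table above.
CELL_OV_V, CELL_UV_V = 3.65, 2.50
OT_CHARGE_C, OT_DISCHARGE_C = 45.0, 60.0


def hw_protection(cell_v: list[float], temps_c: list[float], charging: bool,
                  c_rate: float, short_circuit_c: float = 5.0) -> Action:
    """Return the most severe action demanded by any threshold.

    c_rate is pack current divided by rated capacity (sign ignored);
    short_circuit_c defaults to an ASSUMED 5C, the top of the 3-5C range.
    """
    if max(cell_v) > CELL_OV_V or min(cell_v) < CELL_UV_V:
        return Action.OPEN_CONTACTOR
    if abs(c_rate) > short_circuit_c:
        return Action.OPEN_CONTACTOR
    if not charging and max(temps_c) > OT_DISCHARGE_C:
        return Action.OPEN_CONTACTOR
    if charging and max(temps_c) > OT_CHARGE_C:
        return Action.STOP_CHARGING
    return Action.NONE
```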
Layer 2: Firmware (graduated response)
The firmware implements a four-level alert ladder rather than binary trip or no-trip:
Warning: Threshold is being approached. Log, broadcast on CAN. No action.
Derate: Sustained approach. Reduce available power by 20–40%. Continue operation.
Controlled shutdown: Active fault. Ramp power down via CAN, then open contactors after current reaches zero.
Immediate trip: Hardware limit exceeded. Open contactors now.
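The four-level ladder can be sketched for a single signal such as discharge temperature. Only the 60°C trip comes from the hardware table. The warning and fault temperatures, the 10 s "sustained" window, and the 30% derate (picked from the 20–40% range above) are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    NORMAL = 0
    WARNING = 1
    DERATE = 2
    CONTROLLED_SHUTDOWN = 3
    IMMEDIATE_TRIP = 4


# HYPOTHETICAL discharge-temperature ladder; only the 60 degC trip is from the table.
WARN_C, FAULT_C, TRIP_C = 50.0, 55.0, 60.0
SUSTAIN_S = 10.0         # ASSUMED time above WARN_C before derating
DERATE_FRACTION = 0.30   # within the 20-40% range quoted in the text


def ladder_level(temp_c: float, seconds_above_warn: float) -> Level:
    """Classify the current condition into one rung of the ladder."""
    if temp_c > TRIP_C:
        return Level.IMMEDIATE_TRIP
    if temp_c > FAULT_C:
        return Level.CONTROLLED_SHUTDOWN
    if temp_c > WARN_C:
        return Level.DERATE if seconds_above_warn >= SUSTAIN_S else Level.WARNING
    return Level.NORMAL


def allowed_power_kw(level: Level, nominal_kw: float) -> float:
    """Power limit to broadcast for each level."""
    if level <= Level.WARNING:
        return nominal_kw                              # warning: log and broadcast only
    if level == Level.DERATE:
        return nominal_kw * (1.0 - DERATE_FRACTION)
    return 0.0  # shutdown target is zero (the ramp is not modeled); trip opens contactors
```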
The graduated ladder prevents the sudden power cut that is unacceptable in a vehicle carrying passengers. The driver gets a warning, then reduced performance, before a shutdown — not a cliff.
State of Health: The Slow Drift
SOC tells you how full the battery is right now. SOH — State of Health — tells you how much of the original capacity is still available at all.
SOH = Q_actual / Q_nominal × 100%
A cell at 85% SOH holds 85% of its original capacity. The end-of-life threshold for automotive applications is typically 80% SOH, not because the cell is dead, but because range loss and power fade at that point become unacceptable for the application.
The BMS tracks SOH primarily through two indicators:
Capacity fade: Over many charge-discharge cycles, the BMS compares how many ampere-hours it took to charge the pack against the SOC change. As capacity fades, the same SOC swing requires fewer Ah — and the BMS updates Q_max accordingly.
Resistance growth: The BMS periodically calculates DC internal resistance from voltage steps during current transitions. As cells age, resistance grows — this reduces power capability even when capacity is still healthy.
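Both indicators reduce to simple ratios. This sketch uses the charge-positive current convention. It assumes the SOC endpoints for the capacity estimate come from trusted readings, such as rested OCV lookups. All data values and function names are hypothetical.

```python
def estimate_capacity_ah(counted_ah: float, soc_start: float, soc_end: float) -> float:
    """Q_max estimate: Ah counted between two trusted SOC readings / their SOC change."""
    return counted_ah / (soc_end - soc_start)


def soh_pct(q_actual_ah: float, q_nominal_ah: float) -> float:
    """SOH = Q_actual / Q_nominal x 100%."""
    return 100.0 * q_actual_ah / q_nominal_ah


def dcir_ohm(v_before: float, v_after: float, i_before: float, i_after: float) -> float:
    """DC internal resistance from a current step, R = dV / dI.

    With charge-positive current, a discharge step (dI < 0) that pulls the
    terminal voltage down (dV < 0) yields a positive resistance.
    """
    return (v_after - v_before) / (i_after - i_before)


# HYPOTHETICAL: 76.5 Ah counted while SOC rose from 10% to 95% on a 100 Ah pack.
q_est = estimate_capacity_ah(76.5, 0.10, 0.95)
print(f"Q_max ~ {q_est:.1f} Ah, SOH ~ {soh_pct(q_est, 100.0):.1f}%")  # Q_max ~ 90.0 Ah, SOH ~ 90.0%
# HYPOTHETICAL step: 0 A -> 100 A discharge drops the cell from 3.30 V to 3.10 V.
print(f"R ~ {dcir_ohm(3.30, 3.10, 0.0, -100.0) * 1000:.1f} mOhm")     # R ~ 2.0 mOhm
```

The resistance value depends on how long after the step the voltage is sampled. A very short interval captures mostly the ohmic part, while a longer one adds polarization. The sampling timing must therefore stay consistent for the trend to be meaningful.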
SOH tracking requires accurate data across hundreds of cycles. It is a slow-moving estimate compared to SOC, updated over days and weeks rather than milliseconds.
How the BMS Communicates
The BMS talks to the vehicle controller, charger, thermal system, and dashboard over a CAN bus — a digital communication network running through the vehicle.
In commercial vehicles, this follows the SAE J1939 protocol with 29-bit identifiers. Every message has a defined ID, timing, and data format. The BMS transmits:
Pack voltage, current, and SOC at 100 ms intervals
Individual cell voltages at 500 ms intervals
Temperature readings at 1000 ms intervals
Maximum charge and discharge power limits at 100 ms intervals
Fault alerts immediately on occurrence, then repeated every 20 ms until resolved
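The transmit timing in the list above can be sketched as a simple tick-based scheduler. This assumes a 10 ms scheduler tick starting at t = 0 and fault onsets that land on a tick. The message names are hypothetical labels, not real J1939 parameter group numbers, and the 29-bit identifiers are not modeled. The caller passes `None` once the fault is resolved.

```python
from typing import Optional

# Periods (ms) from the list above; the names are HYPOTHETICAL labels, not J1939 PGNs.
PERIODS_MS = {
    "pack_v_i_soc": 100,
    "cell_voltages": 500,
    "temperatures": 1000,
    "power_limits": 100,
}
FAULT_REPEAT_MS = 20


def due_messages(t_ms: int, fault_since_ms: Optional[int] = None) -> list[str]:
    """Messages to transmit at t_ms on a 10 ms scheduler tick."""
    due = [name for name, period in PERIODS_MS.items() if t_ms % period == 0]
    # A fault frame goes out at onset, then every FAULT_REPEAT_MS while still active.
    if (fault_since_ms is not None and t_ms >= fault_since_ms
            and (t_ms - fault_since_ms) % FAULT_REPEAT_MS == 0):
        due.append("fault_alert")
    return due


ticks = range(0, 1000, 10)
print(sum(len(due_messages(t)) for t in ticks))                   # 23 frames in 1 s
print(sum("fault_alert" in due_messages(t, 130) for t in ticks))  # 44 fault frames
```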
The power limits — often called SOP (State of Power) — are critical. They tell the motor controller exactly how hard it can push the battery at any instant. Too aggressive and cells get damaged. Too conservative and performance suffers. The BMS updates these limits continuously based on SOC, temperature, and aging state.
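One common first-order way to compute a discharge power limit is to use the same simple OCV + R0 model. The largest current that keeps the terminal voltage at or above a floor is (OCV − V_min) / R0, capped by a current limit. This is a sketch, not the method of any particular BMS. For simplicity, v_min is set to the 2.50 V undervoltage limit from the protection table, and the 300 A cap and resistance values are hypothetical.

```python
def discharge_power_limit_w(ocv_v: float, r0_ohm: float, v_min_v: float,
                            i_max_a: float) -> float:
    """Voltage-limited discharge power for one cell under an OCV + R0 model.

    The largest discharge current keeping terminal voltage >= v_min is
    (ocv - v_min) / r0, capped by i_max and floored at zero.
    """
    i_lim = min((ocv_v - v_min_v) / r0_ohm, i_max_a)
    i_lim = max(i_lim, 0.0)
    return (ocv_v - r0_ohm * i_lim) * i_lim   # terminal voltage x current


print(f"{discharge_power_limit_w(3.30, 0.002, 2.50, 300.0):.0f} W")  # 810 W, fresh cell
print(f"{discharge_power_limit_w(3.30, 0.004, 2.50, 300.0):.0f} W")  # 500 W, doubled R0
```

In this model, OCV carries the dependence on SOC, while R0 carries temperature and aging. Doubling the resistance cuts the limit from 810 W to 500 W even though capacity is unchanged.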
Key Takeaways
SOC cannot be directly measured — it is estimated from voltage and current using a combination of Coulomb counting, OCV lookup, and Kalman filtering.
Coulomb counting is the continuous backbone but drifts without correction. OCV lookup corrects the drift but only works at rest. The Kalman filter fuses both in real time.
LFP's flat OCV plateau makes voltage-based SOC correction nearly useless during normal operation. Current sensor quality becomes the dominant factor in SOC accuracy.
Cell balancing corrects SOC divergence between cells — passive by dissipating heat, active by transferring energy. Slow drift is handled by passive; large imbalance or high thermal gradients warrant active.
Protection is always two-layer: hardware AFE that operates independently of software, and firmware with a graduated alert ladder.
SOH tracks the long-term health of the pack — capacity fade and resistance growth — updated slowly across hundreds of cycles.