How Good Is Your Sleep Tracker, Really? Wearables Tested Head-to-Head in Older Adults
A new head-to-head validation of consumer sleep wearables and nearables shows where the devices earn their place on your nightstand — and where, especially after midlife, they quietly overreach.
The morning ritual is now familiar to millions of professionals: before the first coffee, before the inbox, a glance at the phone to see how well — or how badly — the night went. A ring, a watch, a mat under the mattress have all promised to translate eight hours of unconsciousness into a tidy score. But a new head-to-head evaluation of four popular consumer sleep devices, measured against the clinical gold standard, suggests the numbers on your screen deserve a more skeptical read, particularly if you are past fifty.
Researchers at the University of Massachusetts Amherst put two wearables — the Fitbit Sense 2 and the Oura Ring — and two nearables — the Withings Sleep Mat and the Sleep Score Max — through a single overnight test in a sleep lab, comparing each device's output to polysomnography, the multi-sensor reference standard used in clinical sleep medicine. Thirteen young adults aged 19 to 24 and nineteen older adults aged 56 to 80 wore or slept beside the devices for one night. The question was straightforward: how closely do consumer trackers match the lab — and does that accuracy hold up as we age?
The short answer, drawn from the published performance evaluation, is that devices that look broadly reasonable in young adults become noticeably less reliable in older ones. The pattern is consistent enough to matter for anyone using a tracker to make decisions about recovery, training load, or whether last night's sleep is something to worry about.
What the lab saw
In the older cohort, every device underestimated total sleep time. Fitbit shorted the night by an average of roughly 74 minutes; Oura by about 75; Sleep Score Max by around 56; the Withings mat by roughly 46. That is not a rounding error. For an executive checking whether they cleared seven hours, a 75-minute miss can be the difference between a green ring and a red one.
Wake after sleep onset — the minutes spent awake once the night has begun — was also undercounted, by as much as 71 minutes in the case of Sleep Score Max and 44 for Fitbit. Meanwhile, the more flattering metric — deep sleep — tended to go the other way. Oura, Withings and Sleep Score Max overestimated deep sleep by between roughly 71 and 97 minutes in older adults. Across the board, the devices struggled to correctly classify individual sleep stages, with deep sleep the hardest call to make.
Polysomnography remains the reference standard against which consumer devices were measured.
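For readers who want to see what an "average underestimate" means in practice, here is a minimal sketch of how a mean bias against polysomnography can be computed. It is an illustration only: the function name `mean_bias` and the five-sleeper figures are hypothetical, not the study's data, and the study's actual statistical methods may differ.

```python
from statistics import mean


def mean_bias(device_minutes, psg_minutes):
    """Average of (device - PSG) per sleeper; negative means the device undercounts."""
    if len(device_minutes) != len(psg_minutes):
        raise ValueError("need one device value per PSG value")
    diffs = [d - p for d, p in zip(device_minutes, psg_minutes)]
    return mean(diffs)


# Hypothetical total sleep time in minutes for five sleepers (NOT study data).
psg_tst = [420, 390, 405, 450, 375]      # lab reference
device_tst = [350, 330, 320, 380, 290]   # what the tracker reported

bias = mean_bias(device_tst, psg_tst)
print(f"mean bias: {bias:+.0f} min")  # prints "mean bias: -74 min"
```

Note that the individual misses in this made-up sample range from 60 to 85 minutes. An average bias describes a group, so it cannot simply be added back to one person's nightly reading.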
Why aging changes the math
The researchers framed the study around a sober fact: changes in sleep architecture with age are associated with risk for Alzheimer's and other neurological diseases, with accidents, and with broader health decline. That is precisely why continuous monitoring at home is appealing, and why older adults are the worst group in which to misread the signal.
Older sleep tends to be lighter, more fragmented, and shorter in deep stages. Consumer devices infer those stages from heart rate, heart-rate variability, movement and, in the case of mats, ballistocardiography. The algorithms behind them were largely trained on the young. When the underlying physiology shifts — quieter heart-rate signals, more micro-arousals, subtler movement patterns — the model's assumptions get shakier. The result, in this evaluation, is a tracker that tells a sixty-five-year-old they slept less than they did, woke less than they did, and went deeper than they did.
"A tracker that flatters your deep sleep while shorting your total is the wrong kind of wrong." (PinnacleLife analysis)
How to read your own data more honestly
None of this means the gadget on your wrist or finger is useless. Even imperfect trackers can surface patterns over weeks — bedtime drift, weekend recovery debt, the cost of a late dinner or a third glass of wine — that no annual physical will catch. The trouble starts when a single night's score is treated as a verdict.
The evidence here is moderate, not definitive: one night of measurement, modest sample sizes, and a particular generation of devices. Algorithms update; newer firmware may narrow some of these gaps. Still, the direction of the misses is consistent enough that the practical posture is clear. Trends, not nightly numbers. Total sleep time and consistency, not stage breakdowns. And for anyone whose sleep has changed meaningfully — new fragmentation, persistent daytime fatigue, loud snoring — a conversation with a clinician, not a firmware update, is the right next step.
- Expect undercounted sleep. In older adults, every device tested underestimated total sleep time, by roughly 45 to 75 minutes on average.
- Distrust the deep-sleep number. Three of four devices materially overestimated deep sleep; stage-level accuracy was the weakest area overall.
- Age widens the gap. Devices were less accurate in older adults than in young adults across commonly reported measures.
- Use trends, not verdicts. Week-over-week consistency and total time are more defensible signals than any single night's score.
- Escalate persistent issues. Ongoing fragmentation, fatigue or breathing concerns belong with a clinician, not an app.
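The "trends, not verdicts" advice can be sketched as a rolling seven-night average over exported tracker data. This is one simple way to do it, not a method from the study: the function name `rolling_mean`, the seven-night window, and the fourteen nights of figures are all assumptions chosen for illustration.

```python
from statistics import mean


def rolling_mean(values, window=7):
    """Mean of each run of `window` consecutive nights, oldest first."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [mean(values[i - window:i]) for i in range(window, len(values) + 1)]


# Hypothetical tracker-reported total sleep time in minutes for 14 nights.
nights = [380, 365, 410, 350, 395, 440, 425,
          370, 355, 400, 340, 385, 430, 415]

weekly = rolling_mean(nights)
change = weekly[-1] - weekly[0]
print(f"first week avg: {weekly[0]:.0f} min")    # prints 395
print(f"latest week avg: {weekly[-1]:.0f} min")  # prints 385
print(f"week-over-week change: {change:+.0f} min")  # prints -10
```

The reasoning behind comparing averages is that if a device's miss is roughly the same size every night, it largely cancels out when one week is subtracted from another. That stability is an assumption, not something this evaluation measured.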
The promise of consumer sleep tech has always been democratization: a clinic on your nightstand, quietly logging the most important third of your day. This evaluation does not retire that promise. It refines it. The wearables and nearables on the market are useful instruments for noticing change in yourself over time. They are not yet, especially for the readers who arguably need them most, reliable narrators of a single night.
Sources
- Performance evaluation of consumer sleep-tracking wearables and nearables in healthy young and older adults. Sleep Advances: A Journal of the Sleep Research Society.