Eye Tracking Explained: How Gaze Data Is Transforming UX Research, Healthcare, and AI
Eye tracking reveals what people actually look at — not what they say they look at. Here's a clear breakdown of how the technology works, what the data means, and why gaze is becoming one of the most powerful signals in product design and AI research.
Why Should You Care?
Eye tracking is moving from expensive lab hardware to browser-native tools accessible to any product team. As webcam-based gaze tracking matures — and as AR headsets put sensors millimeters from users' eyes — gaze data is becoming one of the most honest, unfiltered behavioral signals available to researchers and designers. Understanding how it works, what the data actually tells you, and where it breaks is essential for anyone doing serious UX research or building products for the next generation of interfaces.
Key Takeaways
- Eye tracking measures where a person looks, for how long, and in what sequence — revealing attention, confusion, and intent that self-reported feedback misses
- Modern eye tracking uses infrared or webcam-based techniques to estimate gaze coordinates from pupil and corneal reflection patterns
- The four core metrics — fixations, saccades, pupil dilation, and scan paths — each tell a different part of the behavioral story
- Applications span UX research, accessibility, healthcare diagnosis, gaming, AR/VR, and AI training data collection
- Webcam-based gaze tracking has made the technology accessible without dedicated hardware, but accuracy, lighting, and calibration remain real constraints
Users say they love your design. The heatmap shows they never looked at the CTA. A radiologist misses a tumour that was in their visual field for two seconds. A gamer's focus narrows to a single screen region under pressure. Eye tracking captures all of it — the gap between where people think they look and where their eyes actually go. That gap is where the most useful behavioral insights live.
What is Eye Tracking?
Eye tracking is the measurement of where a person's eyes are directed — their point of gaze — over time. It captures not just the final landing point but the full path: every fixation, every jump, every return. The result is a timestamped record of visual attention that can be correlated with on-screen content, user actions, and physiological signals.
Why gaze is different from other behavioral signals
Quick Answer
Behavioral signals like clicks and scroll depth tell you what users did. Eye tracking tells you what they considered — including things they noticed but didn't act on.
What eye tracking captures that other methods can't:
• Attention before action — what a user looked at before clicking, scrolling, or abandoning
• Confusion — prolonged fixation or repeated returns to the same area
• Ignored elements — CTAs, error messages, or onboarding prompts users never saw
• Reading patterns — how users scan text, menus, and form fields
• Peripheral awareness — what was noticed but not focused on
The core insight: People cannot accurately report what they looked at. Self-reported attention is unreliable — participants describe what they expected to look at, or what they think they should have looked at. Eye tracking bypasses self-reporting entirely and captures the ground truth of visual behavior.
The complement: Eye tracking answers WHERE and WHEN. It doesn't answer WHY. That's why the most powerful research combines gaze data with think-aloud audio (what users are saying), facial sentiment (what they're feeling), and behavioral data (what they ultimately did).
How Eye Tracking Works
Modern eye tracking systems use one of three primary approaches: infrared (IR) illumination-based tracking, webcam-based estimation, and electrooculography (EOG). Each involves a tradeoff between accuracy, cost, and deployment context.
Infrared (IR) Eye Tracking
Quick Answer
The gold standard for research-grade accuracy. Near-infrared light creates a stable corneal reflection pattern that, combined with pupil detection, enables precise gaze estimation.
How it works:
1. Near-infrared LEDs illuminate the eye at wavelengths invisible to humans but detectable by the tracking camera
2. IR light creates a bright, stable reflection on the cornea (the 'glint') and illuminates the pupil
3. Computer vision algorithms track the relative position of the pupil center and the corneal reflection
4. The vector between pupil and glint is used to calculate the gaze direction — which changes predictably as the eye rotates
5. A calibration step (typically 5–9 points) maps gaze direction to screen coordinates for each individual user
Why IR works well: Because the tracker supplies its own illumination, the corneal reflection stays stable and is largely unaffected by ordinary variations in ambient light. The pupil-glint vector is a reliable proxy for eye rotation that tolerates moderate changes in head position within the tracking volume.
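The calibration step can be sketched as a regression from pupil-glint vectors to screen coordinates. The example below is a minimal illustration, not any vendor's actual method: it assumes a second-order polynomial mapping fitted by least squares, and the 9-point grid, the synthetic vectors, and the function names (design_matrix, fit_calibration, to_screen) are all hypothetical.

```python
import numpy as np

def design_matrix(v):
    """Second-order polynomial terms of pupil-glint vectors; returns shape (n, 6)."""
    vx, vy = v[:, 0], v[:, 1]
    return np.column_stack([np.ones_like(vx), vx, vy, vx * vy, vx**2, vy**2])

def fit_calibration(vectors, targets):
    """Least-squares polynomial coefficients, one column per screen axis."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(vectors), targets, rcond=None)
    return coeffs

def to_screen(vectors, coeffs):
    """Map pupil-glint vectors to estimated screen coordinates in pixels."""
    return design_matrix(vectors) @ coeffs

# Hypothetical 9-point calibration grid on a 1920x1080 screen
gx, gy = np.meshgrid([160.0, 960.0, 1760.0], [90.0, 540.0, 990.0])
targets = np.column_stack([gx.ravel(), gy.ravel()])

# Synthetic pupil-glint vectors (arbitrary camera units) with a little noise
rng = np.random.default_rng(0)
vectors = (targets - [960.0, 540.0]) / 400.0 + rng.normal(0, 0.01, targets.shape)

coeffs = fit_calibration(vectors, targets)
errors_px = np.linalg.norm(to_screen(vectors, coeffs) - targets, axis=1)
print(f"mean calibration error: {errors_px.mean():.1f} px")
```

Nine targets give more equations than the six coefficients per axis, which is one reason a denser calibration grid tends to produce a more stable mapping than a sparse one.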
Hardware examples: Tobii Pro Fusion (research, 250Hz), Pupil Labs Neon (wearable, glasses-mounted), Apple Vision Pro (inside-out eye tracking for foveated rendering), Meta Quest Pro (social eye contact features).
Typical accuracy: 0.3°–0.5° of visual angle under controlled conditions, which works out to roughly 3–5 mm on a screen viewed from 60 cm. In practice, accuracy degrades with head movement, contact lenses, and environmental lighting changes.
Webcam-Based Gaze Estimation
Quick Answer
Uses a standard webcam and machine learning models to estimate gaze direction from facial landmarks, pupil position, and head pose — no special hardware required.
How it works:
1. A computer vision pipeline detects the face and extracts facial landmarks (eye corners, iris center, eyelid boundaries) using a standard RGB webcam
2. Head pose is estimated from 3D facial geometry — how the head is tilted and rotated in space
3. A neural network combines eye appearance features and head pose to predict gaze direction
4. Calibration (5–9 points) maps estimated gaze to screen coordinates
Key tools:
• WebGazer.js — open-source, browser-native, runs entirely client-side with no data sent to a server
• GazeRecorder — commercial SaaS platform for unmoderated studies
• Inferring gaze from MediaPipe Face Mesh — used in custom pipelines
Accuracy vs. IR: Webcam-based tracking typically achieves 1°–3° accuracy under good conditions, roughly 2–10x less accurate than the 0.3°–0.5° of IR hardware. On a 24-inch 1080p monitor viewed from about 60 cm, that translates to roughly 40–115 pixels of uncertainty. That is enough to identify which section of a page a user was looking at, but not which specific word or UI element within a dense layout.
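How many pixels a given angular error covers depends on viewing distance and pixel density, so it is worth computing for a specific setup. A minimal sketch of the conversion, assuming a hypothetical 24-inch 16:9 monitor (about 531 mm wide, 1920 pixels across) viewed from 600 mm; angle_to_pixels is an illustrative helper, not part of any tracking library.

```python
import math

def angle_to_pixels(angle_deg, distance_mm, screen_width_mm, screen_width_px):
    """On-screen extent, in pixels, of a visual angle centered straight ahead."""
    size_mm = 2 * distance_mm * math.tan(math.radians(angle_deg) / 2)
    return size_mm * screen_width_px / screen_width_mm

# Assumed setup: 531 mm wide panel, 1920 px across, viewed from 600 mm
for error_deg in (0.5, 1.0, 3.0):
    px = angle_to_pixels(error_deg, 600, 531, 1920)
    print(f"{error_deg} deg is about {px:.0f} px")
```

A laptop screen with denser pixels, or a participant sitting further back, pushes the same angular error to a larger pixel count, which is why pixel figures should always be quoted with the setup they assume.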
What degrades accuracy:
• Poor lighting — especially uneven or low-contrast conditions
• Glasses (reflections), contact lenses (pupil distortion), heavy eyeliner
• Extreme head angles or head movement outside the camera frame
• Low-resolution webcams or cameras positioned off-axis
• Incorrect calibration or calibration drift over time
Why it matters despite the accuracy gap: Webcam gaze tracking enables unmoderated research at scale. Participants can join a study from any browser without installing software or buying hardware. For identifying large-scale attention patterns across page sections, it's a practical tool that didn't exist at accessible price points five years ago.
Electrooculography (EOG)
Quick Answer
EOG measures the electrical potential difference between the cornea and retina using skin electrodes placed around the eyes — capturing eye movements as an electrical signal rather than a visual one.
How it works: The eye has a standing electrical potential — the cornea is electrically positive relative to the retina. As the eye rotates, this dipole moves, and electrodes placed around the eye detect the resulting voltage change.
Where it's used:
• Clinical settings for sleep studies (detecting REM vs. NREM sleep stages)
• Accessibility devices for users with motor disabilities (eye-controlled communication boards)
• Wearable research platforms where cameras aren't practical
• Detecting blinks and measuring saccade timing in vision science
Limitations: EOG captures movement amplitude and direction but can't easily determine where on a screen the user is looking without calibration. It's rarely used in standard UX research but remains important in medical and accessibility contexts.
The Four Core Eye Tracking Metrics
Raw gaze data — a time-series of x,y screen coordinates at 30–1000 samples per second — isn't directly interpretable. It must be processed into meaningful events. Four core metrics form the foundation of most eye tracking analysis.
Fixations
Quick Answer
A fixation is a period of relatively stable gaze on a specific area — typically defined as less than 1° of gaze movement for more than 100–200ms. Fixations are where visual processing actually happens.
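One common way to turn raw samples into fixations is a dispersion-threshold (I-DT) pass: find a window that lasts at least the minimum duration, check that its points stay within a dispersion limit, and extend it until they no longer do. The sketch below is an illustration under assumptions: the 40-pixel limit stands in for roughly 1° on an assumed desktop setup, the 100 ms minimum is taken from the range above, and the sample data is synthetic.

```python
def detect_fixations(samples, max_dispersion=40.0, min_duration=0.1):
    """samples: list of (t_seconds, x, y). Returns (start, end, center_x, center_y) tuples."""
    def dispersion(window):
        xs = [s[1] for s in window]
        ys = [s[2] for s in window]
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    fixations, i, n = [], 0, len(samples)
    while i < n:
        # Grow an initial window that spans at least min_duration
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration:
            j += 1
        if j >= n:
            break  # not enough data left for another fixation
        if dispersion(samples[i:j + 1]) <= max_dispersion:
            # Extend the window while the points stay within the limit
            while j + 1 < n and dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            cx = sum(s[1] for s in window) / len(window)
            cy = sum(s[2] for s in window) / len(window)
            fixations.append((window[0][0], window[-1][0], cx, cy))
            i = j + 1
        else:
            i += 1  # slide past a sample that belongs to a movement
    return fixations

# Synthetic 60 Hz recording: a cluster, a three-sample jump, then a second cluster
samples = []
for k in range(43):
    if k < 20:
        x, y = 100 + (k % 3) - 1, 100
    elif k < 23:
        x, y = 100 + 100 * (k - 19), 100 + 50 * (k - 19)
    else:
        x, y = 500, 300
    samples.append((k / 60, x, y))

fixations = detect_fixations(samples)
print(len(fixations), "fixations detected")
```

The thresholds matter: a looser dispersion limit merges neighbouring fixations, and a longer minimum duration discards brief glances, so both should be reported alongside any fixation metric.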
What fixations tell you:
• Duration: Longer fixations indicate deeper processing — either high interest or confusion. Short fixations suggest efficient scanning or familiarity.
• Count: More fixations on an element = more attention devoted to it
• First fixation: Which elements capture attention first — a measure of visual salience and layout effectiveness
• Time to first fixation: How long before the user noticed a specific element — critical for CTAs and error messages
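These measures are usually computed per area of interest (AOI), a rectangle drawn around an element such as a CTA. A minimal sketch, assuming fixations have already been detected and that the AOI is an axis-aligned rectangle in screen pixels; the Fixation class, the aoi_metrics helper, and the coordinates are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Fixation:
    start: float  # seconds since stimulus onset
    end: float
    x: float      # screen pixels
    y: float

def aoi_metrics(fixations, aoi, onset=0.0):
    """aoi: (left, top, right, bottom). Returns count, dwell time, and time to first fixation."""
    left, top, right, bottom = aoi
    hits = [f for f in fixations if left <= f.x <= right and top <= f.y <= bottom]
    if not hits:
        return {"count": 0, "dwell_s": 0.0, "time_to_first_s": None}
    return {
        "count": len(hits),
        "dwell_s": sum(f.end - f.start for f in hits),
        "time_to_first_s": min(f.start for f in hits) - onset,
    }

# Hypothetical session: the CTA is only fixated on the third fixation
fixations = [
    Fixation(0.20, 0.45, 400, 120),
    Fixation(0.50, 0.70, 900, 300),
    Fixation(0.75, 1.30, 410, 610),
    Fixation(1.35, 1.60, 405, 615),
]
cta = aoi_metrics(fixations, aoi=(300, 580, 520, 640))
print(cta)
```

Returning None for time to first fixation when the AOI was never fixated keeps "never seen" distinct from "seen immediately", which a default of zero would conflate.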
The fixation paradox: A long fixation on your CTA button could mean the user is carefully reading and deciding to click — or it could mean the button's label is ambiguous and the user is trying to understand what it does. Fixation duration alone doesn't distinguish interest from confusion. You need the full behavioral context to interpret it correctly.
Saccades
Quick Answer
Saccades are rapid eye movements between fixations — the 'jumps' that reposition the fovea from one point of interest to another. During a saccade, vision is suppressed — we are essentially blind between fixations.
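Because saccades are so much faster than the small movements within a fixation, a simple velocity threshold (often called I-VT) can separate the two. The sketch below assumes gaze positions are already expressed in degrees of visual angle and uses a 30°/s threshold purely as an illustrative value; the function name and the synthetic 100 Hz trace are hypothetical.

```python
import numpy as np

def label_saccades(t, x_deg, y_deg, velocity_threshold=30.0):
    """Flag each sample-to-sample interval whose angular speed exceeds the threshold (deg/s)."""
    t, x, y = (np.asarray(a, dtype=float) for a in (t, x_deg, y_deg))
    speed = np.hypot(np.diff(x), np.diff(y)) / np.diff(t)
    return speed > velocity_threshold

# Synthetic trace: fixate at 0 deg, jump 10 deg to the right, fixate again
x = [0.0] * 10 + [2.5, 5.0, 7.5] + [10.0] * 10
y = [0.0] * len(x)
t = np.arange(len(x)) / 100.0

flags = label_saccades(t, x, y)
print("intervals flagged as saccade:", int(flags.sum()))
```

Real recordings are noisier than this trace, so practical pipelines typically smooth the signal and discard implausibly short saccade candidates before trusting the labels.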
What saccades tell you:
• Scan path: The sequence of saccades reveals the order in which elements were processed — does attention flow logically through your layout hierarchy?
• Regressions: Backwards saccades (returning to content already passed) indicate confusion, missed information, or re-evaluation
• Amplitude: Large saccades suggest users are jumping between distant elements; small saccades indicate detailed local reading
F-pattern and Z-pattern reading: Eye tracking studies of web pages have repeatedly found that users scan in recognizable patterns, most famously the F-pattern (horizontal sweeps followed by a vertical scan down the left side) in text-heavy content. The Z-pattern is commonly described for sparser marketing pages. Eye tracking was the tool that revealed the F-pattern, and these patterns continue to inform layout decisions.
Pupil Dilation
Quick Answer
The pupil constricts in response to light and dilates in response to cognitive load — effortful mental processing causes measurable pupil enlargement independent of lighting conditions.
What pupil dilation tells you:
• Cognitive load: Greater task difficulty or mental effort → larger pupil
• Arousal and emotional response: Surprising, engaging, or emotionally salient content produces dilation
• Decision moments: Pupil size peaks at high-uncertainty decision points and relaxes after a choice is made
Critical caveat: Pupil data must always be baseline-corrected. Each person has a different resting pupil size. Changes from each individual's personal baseline are what matter — not absolute diameter. Lighting also powerfully drives pupil response, so controlled or consistent lighting is essential for valid pupil data.
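Baseline correction is typically a subtraction: take the mean pupil size in a short window just before the event of interest and subtract it from the trace. A minimal sketch, assuming pupil diameter in millimetres and a 0.5-second pre-event window; both the window length and the synthetic data are illustrative assumptions.

```python
import numpy as np

def baseline_correct(t, pupil_mm, event_time, baseline_window=0.5):
    """Subtract the mean pupil size in the window just before event_time."""
    t = np.asarray(t, dtype=float)
    pupil = np.asarray(pupil_mm, dtype=float)
    in_window = (t >= event_time - baseline_window) & (t < event_time)
    baseline = np.nanmean(pupil[in_window])  # nanmean skips samples lost to blinks
    return pupil - baseline

# Synthetic 10 Hz trace: 3.0 mm at rest, 3.4 mm after an event at t = 1.0 s
t = np.arange(30) / 10
pupil = np.where(t < 1.0, 3.0, 3.4)

corrected = baseline_correct(t, pupil, event_time=1.0)
print(f"change from baseline: {corrected[-1]:.2f} mm")
```

Working with the change from baseline lets a participant with a 3 mm resting pupil be compared with one at 5 mm, provided lighting stayed constant across the window and the event.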
Why it matters for UX: Pupil dilation can reveal friction that users don't verbalize. A checkout flow where pupil size consistently peaks at the shipping cost reveal step is a measurable signal — even if users don't mention it in post-session interviews.
Blink Rate and Scan Paths
Quick Answer
Blink rate decreases when attention and cognitive engagement are high — suppressed blinking is an involuntary response to visual focus. Scan paths show the sequential route attention takes through an interface.
Blink rate as an engagement signal:
A typical resting blink rate is about 15–20 blinks per minute. Intense visual focus, such as reading, watching video, or playing a game, can suppress blinking to around 3–8 per minute. A high blink rate can indicate low engagement or fatigue. This signal is used in driver monitoring systems (drowsiness detection) and attention monitoring in eLearning platforms.
Scan paths for layout evaluation:
Recording the full sequence of fixations — not just the heatmap aggregate — reveals whether individual users process your layout in the intended order. A well-designed page guides the eye from headline → subheading → value proposition → CTA in a deliberate sequence. Scan path analysis shows whether that sequence actually happens, or whether attention is hijacked by a competing visual element before the intended flow is complete.
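One way to quantify how far a scan path strays from the intended order is to convert fixations into a sequence of AOI names and measure the edit distance to the designed sequence. This is a sketch of one possible approach, not a standard the article prescribes; the AOI rectangles, names, and fixation coordinates are hypothetical.

```python
def aoi_sequence(fixations, aois):
    """fixations: list of (x, y); aois: dict of name -> (left, top, right, bottom).
    Returns the visited AOI names in order, with consecutive repeats collapsed."""
    sequence = []
    for x, y in fixations:
        for name, (left, top, right, bottom) in aois.items():
            if left <= x <= right and top <= y <= bottom:
                if not sequence or sequence[-1] != name:
                    sequence.append(name)
                break
    return sequence

def edit_distance(a, b):
    """Levenshtein distance between two sequences of AOI names."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, 1):
        cur = [i]
        for j, item_b in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (item_a != item_b)))
        prev = cur
    return prev[-1]

aois = {
    "headline": (0, 0, 800, 99),
    "subheading": (0, 100, 800, 159),
    "value_prop": (0, 160, 800, 399),
    "cta": (300, 400, 500, 460),
    "banner": (801, 0, 1000, 460),
}
intended = ["headline", "subheading", "value_prop", "cta"]
observed = aoi_sequence([(400, 50), (420, 60), (900, 200), (400, 130), (400, 430)], aois)

print(observed, "distance from intended:", edit_distance(observed, intended))
```

A distance of zero means the participant followed the designed order exactly. Larger values flag sessions worth replaying to see which element pulled attention away.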
Real-World Applications
Eye tracking has moved from vision science labs into commercial product development, clinical medicine, accessibility technology, and AI training pipelines.
UX Research and Product Design
Quick Answer
Eye tracking is the closest thing UX research has to reading a user's mind — it reveals what they considered before they acted, and what they never noticed at all.
Core use cases:
• Heatmap generation: Aggregate fixation data across participants to identify hotspots of attention — and cold zones where important content goes unnoticed
• First fixation analysis: Is the CTA the first thing users notice, or does something else hijack attention?
• Navigation path analysis: Do users follow the intended flow through the product, or do they take unexpected routes?
• Form and checkout optimization: Where do users hesitate, re-read, or abandon in multi-step flows?
• A/B test enrichment: Two layouts with the same conversion rate may have completely different attention patterns — eye tracking reveals which version builds more intentional engagement
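Heatmap generation amounts to accumulating a smooth blob at each fixation, weighted by how long it lasted. A minimal sketch using Gaussian blobs; the sigma value, the small canvas, and the sample fixations are illustrative assumptions rather than settings from any particular tool.

```python
import numpy as np

def fixation_heatmap(fixations, width, height, sigma=10.0):
    """fixations: iterable of (x, y, duration_s). Returns a (height, width) array scaled to [0, 1]."""
    ys, xs = np.mgrid[0:height, 0:width]
    heat = np.zeros((height, width))
    for x, y, duration in fixations:
        # Each fixation contributes a Gaussian blob weighted by its duration
        heat += duration * np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma**2))
    peak = heat.max()
    return heat / peak if peak > 0 else heat

# Hypothetical fixations pooled across participants on a 200x100 canvas
pooled = [(50, 50, 0.4), (52, 48, 0.3), (150, 20, 0.1)]
heat = fixation_heatmap(pooled, width=200, height=100)

row, col = np.unravel_index(heat.argmax(), heat.shape)
print(f"hottest point at x={col}, y={row}")
```

The choice of sigma is a presentation decision as much as a technical one: it should be no smaller than the tracker's accuracy, or the map will imply more precision than the data supports.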
The think-aloud combination: Think-aloud protocol — where users narrate what they're thinking during a session — combined with gaze data is particularly powerful. The gaze shows what triggered a thought; the audio captures what the thought was. Neither is complete without the other.
Healthcare and Clinical Applications
Quick Answer
Eye movements are a window into neurological and cognitive function — many conditions alter gaze behavior in measurable, diagnostically useful ways.
Diagnostic applications:
• Autism Spectrum Disorder (ASD) screening: Children with ASD show different social gaze patterns — reduced fixation on eyes and faces relative to objects. Eye tracking enables earlier, more objective screening than behavioral observation alone.
• ADHD assessment: Gaze instability and increased saccade variability correlate with ADHD diagnosis
• Parkinson's disease: Abnormal saccade patterns — particularly square-wave jerks and antisaccade errors — are detectable markers of Parkinson's and related disorders
• Concussion and traumatic brain injury: Post-concussion gaze instability can be quantified objectively — unlike self-reported symptom scales
• Radiology training: Expert radiologists fixate differently from novices — tracking gaze during image review reveals where diagnostic errors occur and informs training
Ophthalmology: Eye tracking is foundational to diagnosing conditions like nystagmus, strabismus, and vergence disorders — quantifying movements that clinicians previously estimated by observation.
Accessibility and Assistive Technology
Quick Answer
For users with severe motor disabilities, the eyes are often the highest-bandwidth communication channel available. Eye-controlled interfaces enable text entry, device control, and communication at meaningful speeds.
Eye-controlled interfaces:
• Communication boards for ALS, locked-in syndrome, and cerebral palsy patients
• Tobii Dynavox and similar AAC (augmentative and alternative communication) devices
• Windows Eye Control (built into Windows 10/11) enabling PC control via gaze
• Apple's Eye Tracking accessibility feature on iPhone and iPad (introduced in iOS 18 and iPadOS 18)
The technology stack: Specialized AAC eye tracking hardware typically runs at 60–120Hz with high accuracy optimized for dwell-based selection (the user fixates on a target for a set duration to 'click' it) — a fundamentally different interaction model from cursor control.
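Dwell-based selection can be sketched as a small state machine: track which target the gaze is on, and fire a selection once it has rested there for the dwell time. The example is a simplified illustration, not how any specific AAC product works; the 0.8-second dwell time, the target rectangles, and the 60 Hz synthetic stream are assumptions.

```python
def dwell_select(samples, targets, dwell_time=0.8):
    """samples: iterable of (t_seconds, x, y); targets: dict of name -> (left, top, right, bottom).
    Yields (t, name) once each time gaze rests on a target for dwell_time."""
    current, entered, fired = None, None, False
    for t, x, y in samples:
        hit = next(
            (name for name, (left, top, right, bottom) in targets.items()
             if left <= x <= right and top <= y <= bottom),
            None,
        )
        if hit != current:
            # Gaze moved to a different target (or off all targets): restart the timer
            current, entered, fired = hit, t, False
        elif hit is not None and not fired and t - entered >= dwell_time:
            fired = True  # select once per visit, not on every later sample
            yield t, hit

targets = {"A": (0, 0, 100, 100), "B": (100, 0, 200, 100)}
# 60 Hz stream: one second on A, then half a second on B
stream = [(k / 60, 50, 50) for k in range(60)] + [(k / 60, 150, 50) for k in range(60, 90)]

selections = list(dwell_select(stream, targets))
print([name for _, name in selections])
```

The dwell time is the key tuning parameter: too short and every glance becomes an accidental click, too long and text entry becomes tiring.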
Gaming accessibility: Eye tracking allows players with motor disabilities to aim, navigate menus, and interact in games that would otherwise require a controller. Tobii's integration with PC games is expanding this space, alongside adaptive hardware such as Sony's Access controller for PS5.
Gaming and Interactive Entertainment
Quick Answer
Eye tracking enables foveated rendering, dynamic difficulty adjustment, and new interaction mechanics — and is now shipping in consumer hardware at scale.
Foveated rendering: The human eye only sees in high resolution at the fovea, a small central region of the retina. Peripheral vision has much lower resolution than we subjectively experience. Foveated rendering exploits this: render the region the user is currently looking at in full resolution, and reduce quality in the periphery where the user won't notice. This can substantially reduce GPU load in VR, enabling higher frame rates or visual fidelity without proportional compute cost. Eye tracking lets the high-resolution region follow the gaze in real time (dynamic, gaze-contingent foveated rendering), a key optimization for wireless VR headsets with power constraints.
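The core decision in gaze-contingent foveated rendering is how coarsely to shade each screen tile given its distance from the gaze point. A simplified sketch, assuming a flat pixels-per-degree approximation and three quality bands; the 5° and 15° boundaries and the shading_level helper are illustrative assumptions, not values from any shipping headset.

```python
import math

def shading_level(tile_center, gaze, pixels_per_degree, inner_deg=5.0, mid_deg=15.0):
    """Return a coarseness factor for a tile: 1 = full resolution, 2 = half, 4 = quarter."""
    eccentricity_deg = math.dist(tile_center, gaze) / pixels_per_degree
    if eccentricity_deg <= inner_deg:
        return 1
    if eccentricity_deg <= mid_deg:
        return 2
    return 4

gaze = (1000, 1000)
ppd = 20  # assumed display density in pixels per degree
for tile in [(1000, 1000), (1200, 1000), (1800, 1000)]:
    print(tile, "->", shading_level(tile, gaze, ppd))
```

Because the bands are recomputed every frame from the latest gaze sample, tracker latency matters: if the estimate lags a saccade, the user can briefly land on a low-quality region.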
Interaction mechanics:
• Aim assist calibrated to where you're looking
• Enemy awareness and alerting systems responding to player attention
• Dialogue systems that respond differently based on what you looked at in a conversation
• Dynamic tutorials that trigger only for elements the player hasn't fixated on yet
Platform integration: PlayStation VR2, Meta Quest Pro, and Apple Vision Pro all include eye tracking hardware. Eye contact in avatar-to-avatar social VR — driven by real gaze data — is a meaningful social presence improvement over animated approximations.
Marketing Research and Advertising
Quick Answer
Where does attention go on a shelf, an ad, or a website? Eye tracking in marketing answers the question that brand managers have always asked — does anyone actually see our logo?
Shelf and retail research: In-store eye tracking studies — using mobile eye tracking glasses — reveal which products catch attention on a crowded shelf, whether a price label is noticed before or after the product, and how the eye navigates the shopping decision. This directly informs packaging design, shelf positioning, and in-store display strategy.
Advertising effectiveness:
• Which elements of a print or digital ad are fixated on, and in what order
• Whether the brand name and logo receive fixation before attention leaves the ad
• How attention differs between ad formats and placements
• Where attention falls in video — are viewers looking at the product or the background?
Website and e-commerce:
• Above-the-fold attention: what do users see before they scroll?
• Product image viewing patterns on PDPs (product detail pages)
• Search results: which listings receive fixation vs. which are skipped entirely
AI and Machine Learning
Quick Answer
Eye tracking data is becoming a valuable training signal for AI models — teaching them what humans consider visually important, and enabling gaze-conditioned model behavior.
Gaze as a training signal:
• Saliency model training: Where humans fixate in an image is the ground truth for visual saliency — the regions that a model should attend to. Eye tracking datasets enable supervised saliency prediction, which improves image compression, thumbnail generation, and visual search ranking.
• Medical AI: Radiologist gaze data during diagnosis reveals diagnostic reasoning pathways — which regions were inspected, in what order, and how long. Models trained on gaze-annotated medical images can learn to attend to clinically relevant regions rather than statistical artifacts in the training data.
• Autonomous driving: Human driver gaze during road decisions provides training signal for attention models that predict which scene elements a human driver would prioritize.
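When human fixations serve as ground truth, a predicted saliency map can be scored by how high its values are at the fixated pixels. One widely used score is Normalized Scanpath Saliency (NSS): z-score the map, then average it at the fixation locations. The sketch below uses a toy map and hypothetical fixation coordinates.

```python
import numpy as np

def nss(saliency, fixations):
    """Normalized Scanpath Saliency: mean z-scored saliency at fixated (row, col) pixels."""
    z = (saliency - saliency.mean()) / saliency.std()
    rows, cols = zip(*fixations)
    return float(z[list(rows), list(cols)].mean())

# Toy prediction: one salient square in an otherwise empty 50x50 map
saliency = np.zeros((50, 50))
saliency[20:30, 20:30] = 1.0

on_target = nss(saliency, [(22, 25), (27, 21)])   # fixations inside the predicted region
off_target = nss(saliency, [(5, 5), (45, 40)])    # fixations the model failed to predict

print(f"NSS on target: {on_target:.2f}, off target: {off_target:.2f}")
```

A score above zero means the model assigns above-average saliency where people actually looked, and a score at or below zero means it does no better than chance.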
Gaze-conditioned interfaces: Eye tracking enables interfaces where the system responds intelligently to what the user is looking at — proactively providing information about focused elements, adjusting rendering quality in real time, or triggering contextual actions without explicit user input.
Limitations and Challenges
Eye tracking is a powerful tool — but it's not a behavioral oracle. Understanding where it breaks is as important as knowing what it reveals.
The Interpretation Problem
Quick Answer
Eye tracking tells you WHERE and WHEN. It cannot tell you WHY. A long fixation could mean interest, confusion, or distraction — you need additional context to distinguish them.
Common misinterpretations:
• Long fixation = interest — Actually may indicate confusion, small text, or decision difficulty
• Low fixation count = disinterest — Actually may indicate high familiarity and efficient scanning
• No fixation = not noticed — Actually may be processed in peripheral vision without foveal fixation
• Fixation on CTA = intent to click — Users look at many things they don't click
The solution: Combine gaze data with think-aloud audio, sentiment analysis, behavioral data (clicks, scroll depth, form abandonment), and post-session interviews. Any single data stream is interpretively incomplete.
Accuracy and Calibration Limitations
Quick Answer
Even research-grade eye trackers have accuracy limits — and webcam-based tracking has meaningful uncertainty that constrains the granularity of analysis you can perform.
Hardware limits: Even under controlled conditions, fixation coordinates have accuracy variability of 0.3°–1.0° — at typical monitor distances, that's 10–35 pixels. Fine-grained analysis of word-level reading or adjacent UI elements requires careful interpretation.
Calibration drift: Accuracy degrades over the course of a session as participants shift position or tire, and the original calibration no longer matches their posture. Long sessions need recalibration or drift-correction algorithms.
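A simple form of drift correction is to show a validation target at a known position partway through the session, measure the offset between the target and the recorded gaze, and subtract that offset from later samples. This sketch assumes a constant offset across the screen, which is a simplification, and the function names and coordinates are hypothetical.

```python
import numpy as np

def estimate_drift(gaze_samples, target_xy):
    """Median offset between recorded gaze and a validation target at a known position."""
    return np.median(np.asarray(gaze_samples, dtype=float), axis=0) - np.asarray(target_xy, dtype=float)

def apply_drift_correction(gaze_samples, drift):
    """Shift gaze samples by the estimated drift."""
    return np.asarray(gaze_samples, dtype=float) - drift

# Gaze recorded while the participant looked at a target shown at (960, 540)
validation_gaze = [[972, 548], [970, 551], [975, 549], [971, 550]]
drift = estimate_drift(validation_gaze, (960, 540))

corrected = apply_drift_correction([[512, 310]], drift)
print("drift:", drift, "corrected sample:", corrected[0])
```

The median is used instead of the mean so that a stray sample during the validation fixation does not skew the estimate.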
Individual differences: Contact lens wearers, participants with certain ocular conditions, and older adults with smaller pupils are consistently harder to track accurately. Some participants simply can't be tracked reliably with standard hardware.
Webcam-specific constraints: 1°–3° accuracy means approximately 40–115 pixels of uncertainty on a 24-inch 1080p monitor at 60cm viewing distance. That is enough for section-level analysis but insufficient for element-level precision within dense layouts.
Ecological Validity
Quick Answer
Behavior in a lab eye tracking study may differ from real-world behavior — participants know they're being watched, they're in an unfamiliar environment, and the task is artificial.
The observer effect: Participants in eye tracking studies are often more deliberate and careful than they would be in natural use — they read more thoroughly, skip less, and perform fewer accidental actions. This can inflate fixation durations and reduce natural scanning efficiency.
Task artificiality: 'Please find the Contact page' produces different gaze behavior than 'I actually need to contact this company right now.' Motivation, urgency, and environmental context all shape eye movement patterns in ways that lab tasks don't fully capture.
The unmoderated solution: Webcam-based unmoderated studies, conducted in participants' own environments on their own devices, improve ecological validity at the cost of accuracy. Neither moderated-lab nor unmoderated-remote is strictly superior — the right choice depends on the research question.
The Future of Eye Tracking
Eye tracking is transitioning from a specialized research tool to ubiquitous infrastructure — built into consumer devices, shipped at scale, and increasingly paired with AI to interpret gaze in real time.
Spatial Computing and AR/VR
Quick Answer
Several major spatial computing platforms, including Apple Vision Pro, Meta Quest Pro, and PlayStation VR2, ship with eye tracking, not as a research feature but as a core interaction primitive for rendering, input, and social presence.
Why it's foundational to spatial computing:
• Foveated rendering: Essential for high-quality VR at sustainable frame rates and power budgets. The dynamic, gaze-contingent form can't be done without real-time gaze data
• Eye-based input: In Apple Vision Pro, eye tracking is the primary selection mechanism — users look at a target and pinch to select
• Social presence: Retargeted eye contact in avatar-mediated communication requires knowing where each person is looking
• Accessibility: Gaze-based navigation is a first-class accessibility feature in Vision Pro, and is possible on other headsets with eye tracking hardware such as Meta Quest Pro
As spatial computing scales, eye tracking accuracy, latency, and robustness requirements will drive hardware development in ways that will benefit other domains — better sensors, lower-power processing, and more robust algorithms in all form factors.
Multimodal Behavioral Research
Quick Answer
The future of behavioral research combines gaze with emotion, biometrics, language, and AI synthesis — moving from single-signal snapshots to integrated pictures of human experience.
What convergence looks like:
• Gaze + facial sentiment → what users look at AND how they feel about it
• Gaze + think-aloud audio → what triggered a thought AND what the thought was
• Gaze + physiological data (EEG, galvanic skin response) → attention AND arousal and cognitive engagement
• AI synthesis of all streams → automated research summaries replacing hours of manual analysis
The practical opportunity: Today's multimodal research requires expensive hardware stacks and specialist analysts. Tools that deliver research-grade behavioral insight at browser-native price points — combining webcam gaze, facial sentiment, and AI-generated synthesis — represent a meaningful product opportunity in the research platform space.
Privacy and Ethical Considerations
Quick Answer
Gaze data is deeply personal — it reveals what people look at, what holds their attention, and indirectly what they're thinking and feeling. Its collection and use require careful ethical consideration.
Why gaze data is sensitive:
• Gaze patterns can reveal reading disorders, neurological conditions, and cognitive state
• Fixation on faces reveals social attention patterns with diagnostic implications
• Commercial collection of gaze data in retail or advertising contexts is tracking attention in ways users may not be aware of or have consented to
• In AR headsets worn all day, gaze data is continuous and intimate
Emerging norms:
• Apple states that Vision Pro eye input is processed on-device and is not shared with Apple, third-party apps, or websites; apps learn only what the user ultimately selects
• GDPR treats biometric data processed to uniquely identify a person as a special category, and eye movement data can qualify. Processing it generally requires explicit consent or another narrow legal basis
• Research ethics require informed consent with clear explanation of what gaze data captures and how it will be used
As eye tracking becomes ubiquitous in consumer devices, the norms around gaze data collection, retention, and use will need to evolve alongside the hardware deployment — just as location data norms evolved when GPS became standard in smartphones.