Research Figure AI's robot models (Figure 01, Figure 02, and any successors), their publicly demonstrated capabilities, the role of their AI/neural network systems (including the OpenAI collaboration and their in-house Helix VLA model), and how their technical approach differs from competitors. Summarize key technical differentiators and known limitations based on public demos and press releases.
Figure AI was founded in 2022 by Brett Adcock in San Jose to build general-purpose humanoid robots for industrial settings. Its initial focus is on factories and warehouses, with expansion into homes planned later. As of the May 2026 overview, the company continues to advance general-purpose humanoid technology.
Figure AI’s robot lineup has progressed rapidly from the prototype Figure 01 (2023–early 2024) through the production-oriented Figure 02 (late 2024) to the home-focused Figure 03 (October 2025), with the company now operating fleets of Figure 02/03 units in factories and conducting residential pilots.[1]
The core technical shift is the move to a single end-to-end Vision-Language-Action (VLA) neural system—Helix—that maps raw pixels, language, and proprioception directly to whole-body motor commands, replacing hand-engineered control stacks.
Hardware Generations: From Industrial Prototype to Home-Ready Platform
Figure 01 was a proof-of-concept humanoid that performed basic chores (e.g., moving dishes) while relying on external compute and early OpenAI vision-language models. Figure 02 introduced a polished exoskeleton design, 16-DoF human-scale hands, six RGB cameras for 360° coverage, three onboard NVIDIA RTX GPUs, and a 2.25 kWh battery enabling up to 10 hours of operation with 25 kg payload capacity per arm; it entered real factory deployment at BMW.[2]
Figure 03, announced October 2025, is a complete redesign optimized for unpredictable home environments: 9% lighter; tactile fingertip sensors detecting forces as small as 3 g; embedded palm cameras; cameras with a 60% wider field of view, double the frame rate, and one-quarter the latency; soft multi-density foam and washable textiles for safety; 2 kW wireless inductive charging via foot coils; and a larger, 4× more powerful speaker. These changes enable safer, more dexterous interaction in cluttered domestic spaces while supporting continuous operation through periodic docking.[3]
This hardware evolution directly supports longer-horizon autonomy because richer, lower-latency sensing (tactile + palm vision) feeds the neural controller without requiring separate perception pipelines.
- Figure 02 achieved 400% faster task completion than Figure 01 at BMW and contributed to the production of 30,000 cars.[4]
- Figure 03’s reduced part count and shift to tooled manufacturing (die-casting, injection molding) enable BotQ, Figure’s dedicated high-volume facility targeting 12,000 units/year initially and 100,000 robots over four years.[3]
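Two of the hardware figures quoted above are easier to interpret after a unit conversion. The sketch below is a back-of-envelope check only: it assumes the full 2.25 kWh pack is consumed over the 10-hour runtime, and that the 3 g tactile threshold is a gram-force value. The function names are illustrative and do not come from Figure.

```python
# Back-of-envelope conversions for the quoted hardware figures.
# Assumptions (not from the source): the whole battery is used over the
# stated runtime, and "3 g" means gram-force under standard gravity.

STANDARD_GRAVITY = 9.80665  # m/s^2


def average_power_watts(battery_kwh: float, runtime_hours: float) -> float:
    """Average electrical draw if the whole pack is used over the runtime."""
    return battery_kwh * 1000.0 / runtime_hours


def gram_force_to_newtons(grams: float) -> float:
    """Convert a force quoted in grams (gram-force) to newtons."""
    return grams / 1000.0 * STANDARD_GRAVITY


if __name__ == "__main__":
    # Figure 02: 2.25 kWh over up to 10 hours
    print(f"{average_power_watts(2.25, 10):.0f} W average draw")
    # Figure 03: 3 g fingertip sensitivity
    print(f"{gram_force_to_newtons(3):.4f} N minimum detectable force")
```

Under these assumptions the battery figure implies an average draw of about 225 W, and the tactile threshold corresponds to roughly 0.03 N.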
For any competitor entering the space, matching Figure’s closed-loop hardware–AI co-design (sensors, actuators, and compute purpose-built for the Helix neural stack) is now table stakes; off-the-shelf components will lag in latency and data quality for end-to-end learning.
AI Evolution: OpenAI Collaboration → In-House Helix VLA
Figure initially partnered with OpenAI for speech-to-speech reasoning and early vision-language capabilities on Figure 01 and Figure 02. The collaboration ended in early 2025, when Figure cited integration friction and pivoted to fully in-house models after a major internal breakthrough.[5]
Helix, publicly introduced February 2025, is a generalist Vision-Language-Action model that unifies perception, language understanding, and continuous control. It uses a two-system architecture: System 2 (7 B-parameter VLM running at 7–9 Hz) produces a semantic latent vector from images + language; System 1 (80 M-parameter visuomotor transformer at 200 Hz) maps that latent plus raw pixels and state into 35-DoF continuous actions for the upper body. A single set of weights handles all tasks without per-task fine-tuning.[6]
Helix 02 (January 2026) extends this to full-body loco-manipulation with a three-tier hierarchy: System 0 (10 M-parameter neural prior trained on >1,000 hours of retargeted human motion data, running at 1 kHz for balance and contact) provides human-like whole-body priors; System 1 connects every sensor (head cameras, palm cameras, fingertip tactile, proprioception) to every actuator; System 2 handles longer-horizon semantic sequencing. This replaces over 109,000 lines of hand-engineered C++ with a unified neural system.[7]
The mechanism—direct pixels-to-torque via hierarchical latents—eliminates brittle handoffs between perception, planning, and control modules, which is why Figure can demonstrate multi-minute autonomous tasks that previously required resets or teleoperation.
- Helix enables zero-shot generalization to thousands of unseen household objects via natural language (e.g., “pick up the desert item”).
- Multi-robot collaboration runs identical weights on two robots for shared tasks such as collaborative grocery storage.[6]
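The rate hierarchy described above (System 2 at 7–9 Hz, System 1 at 200 Hz, System 0 at 1 kHz) can be illustrated with a toy multi-rate loop in which each slower tier's latest output is held and reused by the faster tier below it. This is a minimal sketch, not Figure's implementation: the three `s*_` functions are hypothetical stand-ins for the learned networks, the 8 Hz value is an assumed midpoint of the quoted range, and a real system would likely run the tiers asynchronously rather than on one synchronous clock.

```python
from dataclasses import dataclass

BASE_HZ = 1000  # base clock = System 0 rate quoted in the text (1 kHz)


@dataclass
class Tier:
    name: str
    rate_hz: float
    calls: int = 0

    @property
    def period_ticks(self) -> int:
        # Base-clock ticks between successive updates of this tier.
        return round(BASE_HZ / self.rate_hz)


# Hypothetical stand-ins for the learned networks; they only pass labels along.
def s2_plan(tick):
    return f"latent@{tick}"


def s1_policy(latent):
    return f"joint_targets<{latent}>"


def s0_track(targets):
    return f"torques<{targets}>"


def run(seconds: float = 1.0, s2_hz: float = 8.0) -> dict:
    """Simulate the three tiers on one clock and count calls per tier."""
    s2, s1, s0 = Tier("S2", s2_hz), Tier("S1", 200.0), Tier("S0", 1000.0)
    latent = targets = torques = None
    for tick in range(int(seconds * BASE_HZ)):
        if tick % s2.period_ticks == 0:  # slow tier refreshes the goal latent
            latent = s2_plan(tick)
            s2.calls += 1
        if tick % s1.period_ticks == 0:  # mid tier reuses the latest latent
            targets = s1_policy(latent)
            s1.calls += 1
        torques = s0_track(targets)      # fast tier runs on every tick
        s0.calls += 1
    return {t.name: t.calls for t in (s2, s1, s0)}


if __name__ == "__main__":
    print(run())
```

Over one simulated second with the assumed 8 Hz planner, the counts come out to 8 System 2 updates, 200 System 1 updates, and 1,000 System 0 updates, so each semantic latent is reused for 25 consecutive visuomotor steps.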
Publicly Demonstrated Capabilities
Figure 02/Helix units have completed 8-hour autonomous factory shifts sorting packages and 24-hour continuous livestreams at ~21 packages per minute. Helix 02 on Figure 03 hardware has executed a 4-minute, 61-action dishwasher unload/reload cycle across a full kitchen with locomotion, bimanual transfers, whole-body coordination (e.g., hip/foot use), and implicit error recovery—no human intervention. Additional dexterity demos include unscrewing bottle caps with force-regulated tactile grip, extracting single pills from clutter using palm vision, and dispensing precise 5 ml syringe volumes.[7]
Home-oriented demos show autonomous living-room and bedroom tidying: spraying/wiping surfaces, scooping toys, operating a TV remote, rearranging pillows, and reorganizing objects.[8]
These results demonstrate that a single neural VLA, trained on ~500–1,000 hours of high-quality teleoperated human data plus simulation, can already deliver commercially relevant endurance and dexterity in both structured factory and semi-structured home settings.
Key Technical Differentiators vs. Competitors
Figure’s approach is vertically integrated end-to-end learning: one neural network family (Helix) trained on human demonstration data scales across tasks, embodiments, and multi-robot scenarios without task-specific code or robot-specific retraining. In contrast:
- Tesla Optimus emphasizes manufacturing scale and its own factory data loops but has not yet shown equivalent long-horizon VLA autonomy or multi-robot zero-shot collaboration.
- Apptronik (Apollo) leverages NASA-derived hardware heritage and modular designs for logistics but relies more on traditional control stacks.
- Boston Dynamics Atlas excels in dynamic athletic locomotion yet has not demonstrated comparable language-conditioned, data-driven generalization for manipulation sequences.
- 1X prioritizes tendon-driven compliance for home safety but uses a different actuation paradigm and has not published equivalent unified VLA results.[9]
Figure’s reported data efficiency (less than 5% of the data used in prior VLA datasets) and its deployment on onboard embedded GPUs further differentiate it for fleet-scale learning.
Known Limitations and Realistic Outlook
Public demos, while impressive, remain in relatively controlled or semi-structured environments; truly open-ended home chaos at scale is still under validation. Early Figure 02 units required human oversight for edge cases, and full 24/7 home autonomy will depend on continued data scaling and sim-to-real robustness. Production is ramping but costs are estimated at $50k–$100k per unit initially, addressed via robot-as-a-service models. Dexterity continues to advance with tactile sensing, yet critics note that material compliance and high-bandwidth tactile feedback remain active research areas across the industry.[10]
For competitors, the bar has been raised: any viable entrant must now demonstrate comparable end-to-end neural control, multi-hour autonomy, and data-efficient generalization rather than relying on classical robotics pipelines or narrow task specialization. Figure’s trajectory shows that closing the loop between high-quality human data, hierarchical VLAs, and purpose-built hardware can compress years of traditional robotics progress into months of iteration.
Recent Findings Supplement (May 2026)
Figure AI has shifted to fully in-house Helix models post-OpenAI partnership, with Helix 02 (January 2026) introducing a three-tier neural hierarchy for end-to-end whole-body control.[1]
This replaces modular controllers with a unified pixels-to-torque system, enabling longer autonomous tasks on Figure 03 hardware.
- Helix 02 adds System 0 (S0): a 10M-parameter network running at 1 kHz that learns balance, contact, and coordination from >1,000 hours of retargeted human motion data via simulation-trained reinforcement learning.[1]
- System 1 (S1) runs at 200 Hz as a visuomotor policy linking all sensors (head/palm cameras, fingertip tactile, proprioception) to 35+ joint actuators.[1]
- System 2 (S2) handles slower semantic reasoning and goal sequencing at 7–9 Hz.[2]
- New Figure 03 sensors (palm cameras for occluded views, fingertip force sensing down to 3 g) unlock dexterity like unscrewing caps, extracting pills, or precise syringe dispensing.[1]
This architecture means Figure no longer needs external language models for core control; all behaviors emerge from a single learned policy stack. Competitors relying on separate locomotion/manipulation stacks or heavy teleoperation face integration friction that Figure sidesteps through unified training.
Figure 03 production scaled dramatically in early 2026, delivering >350 units and reaching 1 robot/hour output by late April.[3]
The BotQ facility achieved this via dedicated module lines, 150+ networked workstations, >50 inspection points, and >80 functional tests per unit (including stress/burn-in).
- First-pass yield >80% and battery yield at 99.3%; >9,000 actuators produced across SKUs.[3]
- Perception-conditioned S0 now ingests RGB stereo for 3D world models, enabling zero-shot stair traversal and terrain recovery without calibration.[3]
- OTA updates, Fleet Management System, and real-world data loops from the growing fleet accelerate long-tail robustness.[3]
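The reported line rate can be compared against the BotQ targets cited earlier (12,000 units/year initially, 100,000 robots over four years) with simple arithmetic. The sketch below assumes the 1 robot/hour rate is sustained around the clock, which the source does not state; the `uptime` parameter is a hypothetical knob for relaxing that assumption.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; assumes round-the-clock operation


def annual_output(units_per_hour: float, uptime: float = 1.0) -> float:
    """Units per year at a given hourly rate and fraction of hours running."""
    return units_per_hour * HOURS_PER_YEAR * uptime


def required_hourly_rate(units_per_year: float, uptime: float = 1.0) -> float:
    """Hourly rate needed to hit an annual target at the given uptime."""
    return units_per_year / (HOURS_PER_YEAR * uptime)


if __name__ == "__main__":
    print(annual_output(1.0))                           # 1 robot/hour, 24/7
    print(round(required_hourly_rate(12_000), 2))       # initial BotQ target
    print(round(required_hourly_rate(100_000 / 4), 2))  # four-year average
```

At continuous operation, 1 robot/hour annualizes to 8,760 units, so the 12,000 units/year target implies about 1.37 robots/hour and the four-year goal averages about 2.85 robots/hour.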
For new entrants, the barrier is no longer just hardware cost but matching Figure’s closed-loop data flywheel: every deployed robot generates training data that directly improves the next fleet’s autonomy.
Helix 02 demonstrated the first multi-humanoid collaborative loco-manipulation from a single shared neural policy in May 2026.[4]
Two robots reset an entire bedroom in under two minutes (doors, hanging clothes, making the bed, trash removal, etc.) while inferring each other’s intent purely from motion, with no shared planner or messaging.
- Tasks include handling deformable objects (comforters) and dynamic re-planning in fast-changing scenes.[4]
- Earlier January 2026 demo: uninterrupted 4-minute dishwasher unload/reload across a full kitchen, using whole-body coordination (hip/foot assists) and implicit error recovery.[1]
This shows Figure’s end-to-end approach scales to multi-agent settings without explicit coordination code—something modular or teleop-heavy competitors have not publicly matched at this horizon length.
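One way to see how coordination can arise without a shared planner or messaging is a toy in which both agents run the same deterministic function on the same observation and therefore reach compatible decisions independently. The sketch below is purely illustrative and is not how Helix works: `shared_policy`, the one-dimensional positions, and the distance-minimizing rule are all assumptions made for the example, and Helix is described as a learned policy that infers intent from motion rather than a hand-written rule.

```python
from itertools import permutations


def shared_policy(my_index, robot_positions, task_positions):
    """Return the index of the task this robot should pursue.

    Every robot evaluates the same function on the same observed positions,
    so all robots arrive at the same assignment without exchanging messages.
    """
    n_robots = len(robot_positions)

    def cost(assignment):
        total = sum(
            abs(robot_positions[r] - task_positions[t])
            for r, t in enumerate(assignment)
        )
        # The assignment tuple itself breaks ties deterministically.
        return (total, assignment)

    best = min(permutations(range(len(task_positions)), n_robots), key=cost)
    return best[my_index]


if __name__ == "__main__":
    robots = [0.0, 10.0]  # observed positions of robot 0 and robot 1
    tasks = [8.0, 2.0]    # positions of task 0 and task 1
    # Each robot calls the policy on its own; no information is passed between calls.
    choices = [shared_policy(i, robots, tasks) for i in range(len(robots))]
    print(choices)
```

In this example robot 0 takes task 1 and robot 1 takes task 0, and the two never select the same task because the tie-breaking is identical on both sides.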
In mid-May 2026, Figure 03 robots powered by Helix 02 completed >17-hour fully autonomous warehouse shifts, processing >22,000 packages.[5]
Livestreamed tasks included barcode scanning, box picking and reorientation, and workflow movement at human-level speed, with robots automatically changing roles at battery swaps to keep operation continuous.
- Related 24-hour livestreams reportedly exceeded 30,000 packages sorted.[6]
- No teleoperation; all control via the unified Helix 02 policy.[5]
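The endurance figures can be cross-checked against the roughly 21 packages per minute quoted for the earlier sorting livestreams. The sketch below treats the reported totals as exact, although the source gives them as lower bounds (">17-hour", ">22,000", ">30,000"), so the resulting rates are approximate.

```python
def packages_per_minute(packages: float, hours: float) -> float:
    """Average sorting rate over a shift."""
    return packages / (hours * 60)


if __name__ == "__main__":
    # 17-hour warehouse shift, 22,000 packages
    print(round(packages_per_minute(22_000, 17), 1))
    # 24-hour livestream, 30,000 packages
    print(round(packages_per_minute(30_000, 24), 1))
    # Packages in 24 hours at the quoted ~21 per minute
    print(21 * 60 * 24)
```

The two runs work out to about 21.6 and 20.8 packages per minute respectively, and 21 packages per minute sustained for 24 hours gives 30,240 packages, so the reported totals are mutually consistent.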
Endurance at this scale validates the data advantage: fleet hours directly harden policies against edge cases that simulation alone cannot cover.
Figure’s core differentiator remains the shift from hand-engineered controllers (109k+ lines of C++ replaced by S0 neural prior) to a single learned VLA hierarchy.[1]
Unlike many competitors that layer separate vision, planning, and control modules or rely on teleoperation for complex behaviors, Figure trains one policy stack end-to-end on human demonstration data and simulation, with simulation-trained components transferred zero-shot to hardware, now augmented by onboard perception and tactile feedback.
- OpenAI collaboration ended in early 2025; all recent advances are in-house Helix only.[2]
New competitors must either replicate this full-stack neural integration or accept slower iteration when adding hardware features that Figure already folds into its unified training loop.
Public demos still reveal limitations: occasional glitches requiring self-reset in long runs, and the company-run nature of endurance tests means independent verification is limited.[5]
Long-tail failures are addressed via fleet data rather than solved outright, and deformable-object or high-speed multi-robot scenes remain challenging despite progress.
Anyone entering the space should treat these as signals of remaining sim-to-real and robustness gaps—investing in closed-loop real-world data collection at scale is now table stakes rather than optional.