Blog

Technical guides on robotics data pipelines, dataset formats, and training data infrastructure.

April 23, 2026 · 8 min read

We Audited 5 Popular LeRobot Datasets. 4 Ship Stats That Produce Inf.

We ran traceplane check on five of the most-downloaded LeRobot datasets on HuggingFace. Every single-task dataset ships normalization stats that silently produce Inf the moment your dataloader uses them.

Data Quality, LeRobot, Normalization, Datasets

April 8, 2026 · 10 min read

GEN-1 Proved Human Data Trains Robots. Here's the Infrastructure You Need.

Generalist AI achieved 99% success rates using 500K hours of human activity data and zero robot data. Here's the data infrastructure required to replicate this approach.

Human Data, VLA, Data Pipeline, Infrastructure

April 7, 2026 · 12 min read

We Audited 10 Popular Open-Source Robot Datasets. Here's What We Found.

We ran automated quality checks on 10 widely-used robotics datasets including Bridge V2, Open X-Embodiment, ALOHA, and LeRobot datasets. Every single one had issues that could silently degrade your policy.

Data Quality, Datasets, Imitation Learning, LeRobot

April 4, 2026 · 10 min read

Automated QA for Robot Trajectory Data: A Three-Layer Framework

Why "record more demos" doesn't fix training failures. A practical framework for structural, kinematic, and semantic quality checks on every episode — automatically.

Data Quality, QA, Imitation Learning

April 4, 2026 · 8 min read

How to Convert rosbag2 Data to LeRobot Format

A practical guide to converting ROS 2 bag files (.mcap, .db3) to HuggingFace LeRobot's Parquet + MP4 format for policy training. Covers timestamp alignment, video encoding, and schema mapping.

rosbag2, LeRobot, Data Conversion