Now in private beta

The data platform for
robot trajectories

Ingest any format. Auto-QA every episode. Materialize training-ready datasets in minutes.
Stop wrangling data. Start training policies.

Request early access · See how it works
# Materialize a training dataset in one call
from traceplane import Client

tp = Client()
dataset = tp.materialize(
    task="pick-and-place",
    embodiment="humanoid-23dof",
    qa_score_min=0.8,
    hz=30,
    format="lerobot-v2",
)

# Stream directly into your training loop
for batch in dataset.stream(batch_size=64):
    loss = policy.train_step(batch)

Robotics data infrastructure is stuck in 2015

Ad-hoc scripts everywhere

Every lab writes custom scripts to convert, validate, and load data. Months of engineering that doesn't train a single policy.

No quality guarantees

"Record more demos" is the industry default. Nobody knows which episodes are bad until training fails. No automated QA exists.

Format fragmentation

LeRobot, HDF5, rosbag2, Zarr, RLDS — every dataset speaks a different dialect. Converting between them is a full-time job.

Days to start training

Download TBs to local disk. Write a custom dataloader. Resample, normalize, shard. All before a single gradient update.

From raw data to training-ready in minutes

01

Ingest

Upload episodes in any format — LeRobot, HDF5, rosbag2, Zarr, simulation logs, teleop recordings. We normalize everything to a canonical schema.

LeRobot v2/v3 · HDF5 · rosbag2 · Zarr · RLDS · Custom
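As an illustration, the normalization step can be sketched as mapping format-specific episode records onto one canonical layout. The field names and schema below are assumptions for the sketch, not the actual Traceplane schema:

```python
# Minimal sketch of normalizing heterogeneous episode records into a
# canonical schema. All field names are illustrative assumptions.

def normalize_episode(raw: dict, source_format: str) -> dict:
    """Map a format-specific episode dict onto a canonical layout."""
    if source_format == "lerobot":
        return {
            "observations": raw["observation.state"],
            "actions": raw["action"],
            "fps": raw.get("fps", 30),
        }
    if source_format == "hdf5":
        return {
            "observations": raw["obs"],
            "actions": raw["acts"],
            "fps": raw.get("frequency", 30),
        }
    raise ValueError(f"unsupported format: {source_format}")

lerobot_ep = {"observation.state": [[0.1, 0.2]], "action": [[0.0]], "fps": 30}
canonical = normalize_episode(lerobot_ep, "lerobot")
```

Once every source speaks the canonical schema, QA, annotation, and querying only have to be built once.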
02

Auto-QA

Every episode is scored automatically: structural checks (FPS, dropped frames, action dims), kinematic validation, and semantic checks (object presence, hand-object interaction).

Pass · Needs review · Reject
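A structural gate of this kind might look like the following minimal sketch. The thresholds, score weights, and pass/review/reject bands are assumptions, not Traceplane's actual scoring rules:

```python
# Illustrative structural QA gate: checks frame rate against the expected
# Hz and verifies action dimensionality, then maps the score to a verdict.
# Thresholds and bands are assumptions for this sketch.

def structural_qa(timestamps, actions, expected_hz=30, action_dim=23):
    """Score an episode on frame timing and action shape; return a verdict."""
    n = len(timestamps)
    duration = timestamps[-1] - timestamps[0]
    actual_hz = (n - 1) / duration if duration > 0 else 0.0
    hz_ok = abs(actual_hz - expected_hz) / expected_hz < 0.05
    dims_ok = all(len(a) == action_dim for a in actions)
    score = (0.5 if hz_ok else 0.0) + (0.5 if dims_ok else 0.0)
    if score >= 0.8:
        return "pass", score
    if score >= 0.5:
        return "needs_review", score
    return "reject", score

# 31 frames spanning one second -> 30 Hz, 23-dim actions
ts = [i / 30 for i in range(31)]
acts = [[0.0] * 23 for _ in range(31)]
verdict, score = structural_qa(ts, acts)  # -> ("pass", 1.0)
```

Because the verdict is computed at ingest, a bad episode is flagged before it ever reaches a training run.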
03

Auto-annotate

Subtask segmentation, object annotations, scene context, frame embeddings, and discretized action tokens — computed once at ingest, served instantly at query time.
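Of these, discretized action tokens are the easiest to sketch: continuous action values are mapped to integer bins once at ingest. The bin count and value range below are illustrative assumptions:

```python
# Sketch of discretizing continuous actions into tokens by uniform
# binning. Bin count and value range are illustrative assumptions.

def tokenize_actions(actions, low=-1.0, high=1.0, n_bins=256):
    """Map each continuous action value into an integer bin index."""
    tokens = []
    for a in actions:
        clipped = min(max(a, low), high)
        idx = int((clipped - low) / (high - low) * (n_bins - 1))
        tokens.append(idx)
    return tokens

tokenize_actions([-1.0, 0.0, 1.0])  # -> [0, 127, 255]
```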

04

Query & materialize

Slice by task, embodiment, quality score, environment, or object class. Get training-ready datasets materialized as LeRobot Parquet + MP4 — sharded for your GPU count.
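Conceptually, the slicing step is predicate filtering over episode metadata before anything heavy is materialized. The metadata fields below are assumptions for illustration:

```python
# Sketch of slicing episode metadata by predicates before materialization.
# Field names and values are illustrative assumptions.

episodes = [
    {"id": 1, "task": "pick-and-place", "embodiment": "humanoid-23dof", "qa_score": 0.92},
    {"id": 2, "task": "pick-and-place", "embodiment": "arm-7dof", "qa_score": 0.95},
    {"id": 3, "task": "pick-and-place", "embodiment": "humanoid-23dof", "qa_score": 0.61},
]

def query(episodes, task, embodiment, qa_score_min):
    """Return only the episodes matching every predicate."""
    return [
        ep for ep in episodes
        if ep["task"] == task
        and ep["embodiment"] == embodiment
        and ep["qa_score"] >= qa_score_min
    ]

selected = query(episodes, "pick-and-place", "humanoid-23dof", 0.8)
# only episode 1 survives the quality gate
```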

Built for how robotics teams actually work

Embodiment-aware storage

The platform knows what each dimension means — 7-DoF left arm, 3-DoF waist, gripper state. Store once from a 23-DoF humanoid, query as 7-DoF arm-only. Cross-embodiment search just works.
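The idea can be sketched as a dimension map that projects a full state vector onto the requested body parts. The 23-DoF layout below is a hypothetical example, not the actual embodiment definition:

```python
# Sketch of embodiment-aware slicing: a stored 23-DoF humanoid state is
# served as a 7-DoF arm-only view. This dimension layout is a
# hypothetical assumption for illustration.

DIM_MAP = {
    "left_arm": slice(0, 7),
    "right_arm": slice(7, 14),
    "waist": slice(14, 17),
    "head": slice(17, 19),
    "left_gripper": slice(19, 20),
    "right_gripper": slice(20, 21),
    "base": slice(21, 23),
}

def as_embodiment(state, parts):
    """Project a full state vector onto the requested body parts."""
    view = []
    for part in parts:
        view.extend(state[DIM_MAP[part]])
    return view

full_state = list(range(23))  # stored once from the 23-DoF humanoid
arm_only = as_embodiment(full_state, ["left_arm"])  # a 7-DoF view
```

Because the map is metadata, the same stored episode can serve arm-only, bimanual, or full-body views without re-ingesting anything.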

Automated QA — the moat

Structural, kinematic, and semantic quality gates on every episode. CI-gatable strict mode. "QA-scored by Traceplane" becomes the industry stamp of quality.

Training acceleration

Pre-tokenized actions, pre-computed frame embeddings (SigLIP/DINOv2), pre-sharded for DDP/FSDP. Your training pipeline starts immediately — no preprocessing.

Dataset versioning

Full lineage tracking. "Training run #47 used dataset v12. Run #48 added 200 episodes." Reproduce any training run from any point in time.
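Lineage of this kind can be sketched as a chain of versions, each recording its parent and the episodes it added, so any pinned version resolves to an exact episode set. The record shape is an illustrative assumption:

```python
# Sketch of dataset lineage: each version records its parent and the
# episodes added, so a pinned version resolves to an exact episode set.
# The record shape is an illustrative assumption.

versions = {}

def commit(version, parent, added_episodes):
    versions[version] = {"parent": parent, "added": list(added_episodes)}

def episodes_at(version):
    """Walk the lineage chain to resolve a version's full episode set."""
    ids = set()
    while version is not None:
        ids |= set(versions[version]["added"])
        version = versions[version]["parent"]
    return ids

commit("v12", None, range(1000))         # dataset used by training run #47
commit("v13", "v12", range(1000, 1200))  # +200 episodes, used by run #48
```

Pinning run #47 to v12 and run #48 to v13 makes both runs reproducible from their exact data.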

Streaming dataloaders

Stream Arrow batches directly into PyTorch/JAX. No "download 2TB first." Resample on the fly — stored at 30Hz, served at 10Hz or 50Hz.
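On-the-fly resampling to a lower rate can be sketched as stride-based subsampling. A real implementation would also handle non-integer ratios and upsampling via interpolation; this minimal sketch covers only the evenly-divisible case:

```python
# Sketch of serving 30 Hz frames at 10 Hz by stride-based subsampling.
# Non-integer ratios and upsampling are out of scope for this sketch.

def resample(frames, stored_hz, target_hz):
    """Subsample frames when the target rate evenly divides the stored rate."""
    if stored_hz % target_hz != 0:
        raise ValueError("non-integer ratio needs interpolation")
    stride = stored_hz // target_hz
    return frames[::stride]

frames = list(range(30))       # one second of 30 Hz data
served = resample(frames, 30, 10)  # -> 10 frames
```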

Fast materialization

Go from query to training-ready dataset in minutes. Common views are pre-computed and cached. Predicate pushdown at TB scale via Delta Lake.

One platform, every stage of the data lifecycle

01

Managed data platform

Bring your own data — teleop logs, simulation runs, fleet recordings. We store, QA, annotate, version, and serve it back training-ready. The core Traceplane experience.

02

Pre-labeled datasets

Don't have data yet? Access our library of QA-scored, annotated trajectory datasets — ready to fine-tune your policy out of the box.

03

Python SDK & API

Programmatic access to everything. Query, materialize, and stream datasets directly into your training loop. pip install traceplane and go.

04

Simulation pipeline

Ingest synthetic data from Isaac Sim, MuJoCo, or Genesis. Auto-QA catches sim artifacts. Mix real and sim data with provenance tracking.

05

Data marketplace

Sell your datasets or buy from others. Companies opt-in to share data on the platform — Traceplane handles licensing, access control, and billing.

06

Data ops consulting

Need help setting up your data pipeline? Our team works with you to design ingest flows, QA policies, and training data strategies tailored to your stack.

From foundation model labs to research teams

VLA companies

Training foundation models for robots? Query millions of episodes by task, embodiment, and quality. Pre-sharded for your 64-GPU cluster. No more data wrangling.

Academic labs

Need 500 high-quality pick-and-place demos for fine-tuning? Query, filter by QA score, materialize as LeRobot v2. Published datasets with reproducible versioning.

Data collectors

Capturing demonstrations at scale? Upload raw episodes, get automated QA + annotation. Your data becomes instantly queryable and licensable.

Get early access

We're onboarding design partners now. Tell us about your data and we'll get you set up.