Community · Data Flywheel

Help make this model
better for everyone.

DreamZero-SO101 was trained on 715 episodes contributed by the SO-101 community. Every new episode improves the model's generalisation — especially zero-shot performance on novel objects, camera rigs, and task phrasings. If you own an SO-101, you can contribute.

Why contribute

Your data makes a measurable difference

📈

Better zero-shot generalisation

Our zero-shot RMSE is 11.9°, compared with 2.3° on held-out episodes from the training datasets: roughly a 5× gap. More diverse scenes, objects, and camera positions directly close this gap.
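For readers who want to compare their own rollouts against these numbers, here is a minimal sketch of a joint-angle RMSE in degrees. It assumes the error is pooled over every timestep and all six joints; the page does not state the exact averaging used for the reported figures, and the function name `joint_rmse_deg` is hypothetical.

```python
import math
from typing import Sequence


def joint_rmse_deg(
    predicted: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
) -> float:
    """RMSE in degrees, pooled over all timesteps and joints.

    Both inputs are [T, 6] sequences of joint positions in degrees.
    Pooling over every element is an assumption, not the confirmed metric.
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same number of frames")
    squared_errors = []
    for pred_row, target_row in zip(predicted, target):
        if len(pred_row) != len(target_row):
            raise ValueError("each frame must have the same number of joints")
        squared_errors.extend((p - t) ** 2 for p, t in zip(pred_row, target_row))
    if not squared_errors:
        raise ValueError("inputs are empty")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Ratio between the two reported figures (zero-shot vs held-out).
gap_ratio = 11.9 / 2.3
print(f"{gap_ratio:.1f}x")  # → 5.2x
```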

🔄

Flywheel effect

More data → better model → more SO-101 adopters → more contributors. We are at the beginning of this curve: 30 episodes is about 4% of today's 715-episode training set but only 0.3% of a 10,000-episode one, so your contribution matters more now than it will later.

🏆

Credit and recognition

Every contributor is credited in the model card, README, and release notes. Your HuggingFace handle appears in the dataset table of every future checkpoint.

How it works

Five steps from recording to release

Step 01

Record

Use LeRobot v0.4+ with your SO-101. Record at least 30 episodes of any manipulation task.

Step 02

Format

Push to HuggingFace Hub with push_dataset_to_hub() using the LeRobot schema.

Step 03

Submit

Open a PR on GitHub adding your dataset to the manifest, or email us the HF URL.

Step 04

Retrain

Vizuara downloads, converts, and retrains from the latest checkpoint. ~127h on 2× H100.

Step 05

Release

New checkpoint released to Vizuara/dreamzero-so101-lora. You're in the credits.

Data format

What your dataset needs

We accept any dataset in LeRobot v2+ format. Fields marked required must be present for a dataset to enter the training pipeline; optional fields improve quality.

Field	Required?	Spec
`observation.images.*`	required	At least 1 RGB camera stream at ≥ 15 FPS. Resolution ≥ 240p. Front camera preferred.
`action`	required	6-DOF joint positions in degrees: [shoulder_pan, shoulder_lift, elbow_flex, wrist_flex, wrist_roll, gripper]. Shape: [T, 6].
`task_description` / `language_instruction`	required	A plain-English description of the task, e.g. "Pick up the red block and place it in the bin".
`fps` in meta	required	Frame rate. We resample to 30 FPS during conversion.
`observation.images.top`	optional	Top-down camera. Strongly recommended — the current model was trained with 3 cameras.
`observation.images.gripper`	optional	Wrist/gripper camera. Improves fine-motor predictions.
`observation.state`	optional	Proprioceptive joint state at each frame. Helps with mid-episode chunk evaluations.
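Before submitting, you may want to sanity-check your dataset against the required rows of the table above. The sketch below works on a simplified, hypothetical summary dictionary (the keys `features`, `height`, and `dim` are illustrative and are not the LeRobot metadata layout). It also assumes "240p" means a frame height of at least 240 pixels and that the 15 FPS minimum applies to the `fps` value in meta.

```python
from typing import Any

MIN_FPS = 15          # from the table: camera stream at >= 15 FPS
MIN_HEIGHT_PX = 240   # assumption: "240p" means frame height >= 240 pixels
ACTION_DIM = 6        # five arm joints plus gripper
TASK_KEYS = ("task_description", "language_instruction")


def check_dataset(summary: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means all required fields pass."""
    problems: list[str] = []
    features = summary.get("features", {})

    # At least one camera stream, each tall enough.
    cameras = {k: v for k, v in features.items() if k.startswith("observation.images.")}
    if not cameras:
        problems.append("no observation.images.* camera stream")
    for name, spec in cameras.items():
        if spec.get("height", 0) < MIN_HEIGHT_PX:
            problems.append(f"{name}: height below {MIN_HEIGHT_PX}px")

    # Action vector with six values per frame.
    action = features.get("action")
    if action is None:
        problems.append("missing action")
    elif action.get("dim") != ACTION_DIM:
        problems.append(f"action must have {ACTION_DIM} values per frame")

    # Either of the two accepted task-text fields.
    if not any(summary.get(key) for key in TASK_KEYS):
        problems.append("missing task description")

    # Frame rate recorded in meta.
    fps = summary.get("fps")
    if fps is None:
        problems.append("missing fps in meta")
    elif fps < MIN_FPS:
        problems.append(f"fps below {MIN_FPS}")

    return problems


example = {
    "fps": 30,
    "task_description": "Pick up the red block and place it in the bin",
    "features": {
        "observation.images.front": {"height": 480, "width": 640},
        "action": {"dim": 6},
    },
}
print(check_dataset(example))  # → []
```

Returning a list of problems rather than raising on the first one lets a contributor fix everything in a single pass.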

Recording recipe

LeRobot quickstart

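# Record with LeRobot v0.4+ (assumes SO-101 is connected and calibrated).
# Replace the two placeholder ports with the serial ports of your own arms;
# to record video, also pass your camera configuration via --robot.cameras.
lerobot-record \
    --robot.type=so101_follower \
    --robot.port=/dev/ttyACM0 \
    --teleop.type=so101_leader \
    --teleop.port=/dev/ttyACM1 \
    --dataset.repo_id=your-handle/so101-my-task-50ep \
    --dataset.fps=30 \
    --dataset.num_episodes=50 \
    --dataset.single_task="Pick up the red block and place it in the bin"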

# Push to HuggingFace Hub
python lerobot/scripts/push_dataset_to_hub.py \
    --dataset.repo_id=your-handle/so101-my-task-50ep

# Then open a PR on GitHub, adding your dataset to scripts/enumerate_so101.py:
COMMUNITY_DATASETS = [
    "whosricky/so101-megamix-v1",
    "lipsop/so101-block-in-bin-100ep",
    # ... existing entries ...
    "your-handle/so101-my-task-50ep",  # ← add your line here
]
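Before opening the PR, a quick check of the manifest list can catch typos and accidental duplicates. The sketch below is hypothetical and not part of the repository; its regular expression is a simplified approximation of the `handle/dataset-name` shape, not the Hub's full naming rules.

```python
import re

# Simplified approximation of "handle/dataset-name"; not the Hub's full rules.
REPO_ID_PATTERN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*$")


def check_manifest(entries: list[str]) -> list[str]:
    """Return a list of problems found in a list of dataset repo ids."""
    problems: list[str] = []
    seen: set[str] = set()
    for entry in entries:
        if not REPO_ID_PATTERN.match(entry):
            problems.append(f"malformed repo id: {entry!r}")
        if entry in seen:
            problems.append(f"duplicate entry: {entry!r}")
        seen.add(entry)
    return problems


manifest = [
    "whosricky/so101-megamix-v1",
    "lipsop/so101-block-in-bin-100ep",
    "your-handle/so101-my-task-50ep",
]
print(check_manifest(manifest))  # → []
```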

Roadmap

Where we're taking this

Current · checkpoint 72K

LoRA baseline

715 episodes, 108M LoRA
Policy mode: 1.6–2.3° held-out RMSE
Zero-shot: 11.9° mean RMSE
DreamGen: task visually completed in imagination

Next · community contributions

Expanded LoRA (target 2K+ episodes)

Community data drive — your data here
Target: zero-shot RMSE under 5°
More task categories (pour, sort, assemble)
Checkpoint released once 500 new episodes are in

Phase 2 · full fine-tune

14B full FT (target 10K+ episodes)

Full backbone fine-tune (4× H100, ~28h)
Task-completion token to fix post-episode drift
Multi-camera ablation study
Release: Vizuara/dreamzero-so101-14b

Phase 3 · embodiment expansion

Multi-arm support

Koch v1.1 arm support
Moss bimanual arm support
HuggingFace Hub dataset partnership
Live inference endpoint

Get in touch

HuggingFace partnership & custom requests

We're looking for a HuggingFace Hub dataset partnership to streamline contributor onboarding — automatic GEAR conversion, contributor leaderboard, and co-branded model releases. If you work at HuggingFace or know someone who does, please get in touch.

For organisations with larger SO-101 deployments (10+ arms, enterprise data), we offer custom fine-tuning runs and evaluation support.

Contribute data or partner with Vizuara

Tell us your HuggingFace dataset URL, the task category, and the number of episodes.

Email team@vizuara.ai

Community datasets

Current training set — checkpoint 72K

Dataset	Contributor	Episodes	Tasks	Cameras
`whosricky/so101-megamix-v1`	whosricky	400	8	3
`lipsop/so101-block-in-bin-100ep`	lipsop	100	1	2
`youliangtan/so101-table-cleanup`	youliangtan	80	4	2
`G3ND3K/so101_picking_up_green_lego_big`	G3ND3K	60	1	2
`lerobot/svla_so101_pickplace`	LeRobot team	50	1	2
`observabot/so101_cloth_folding1`	observabot	25	1	3
YOUR DATASET COULD BE HERE — open a PR
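As a quick cross-check, the episode counts in the table add up to the 715 episodes quoted for checkpoint 72K. The sketch below also shows what share a 30-episode contribution would hold today and in a much larger set; the helper name `episode_share` is hypothetical.

```python
# Episode counts copied from the table above.
TRAINING_SET = {
    "whosricky/so101-megamix-v1": 400,
    "lipsop/so101-block-in-bin-100ep": 100,
    "youliangtan/so101-table-cleanup": 80,
    "G3ND3K/so101_picking_up_green_lego_big": 60,
    "lerobot/svla_so101_pickplace": 50,
    "observabot/so101_cloth_folding1": 25,
}


def episode_share(new_episodes: int, existing_total: int) -> float:
    """Fraction of the combined set that a new contribution would make up."""
    return new_episodes / (existing_total + new_episodes)


total_episodes = sum(TRAINING_SET.values())
print(total_episodes)                                # → 715
print(f"{episode_share(30, total_episodes):.1%}")    # → 4.0%
print(f"{episode_share(30, 10_000):.1%}")            # → 0.3%
```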
