moka_pot_libero_sft

A π₀ (pi0) RECAP Vision-Language-Action (VLA) model, finetuned on the LIBERO robotic manipulation benchmark using the OpenTau training framework. The model follows natural language instructions to perform manipulation tasks in a simulated tabletop environment, achieving an ~83% success rate measured over 212 evaluation episodes.

For full documentation, evaluation results, and inference code, please visit the repository:
👉 https://github.com/TensorAuto/OpenTau


Model Details

Description

  • Model Type: Vision-Language-Action (VLA) Model
  • Base Architecture: π₀ (pi0) by Physical Intelligence
  • Backbone: PaliGemma-3B (VLM) + Gemma-300M (Action Expert) + advantage-indicator conditioning (RECAP)
  • Training Data: Moka Pot Task on LIBERO (Lifelong Robot Learning) Benchmark
  • Framework: OpenTau

Architecture

The π₀ RECAP architecture combines a flow-matching policy with reinforcement learning and is designed for open-world generalization. A Vision-Language Model (VLM) provides high-level semantic understanding, while a smaller "action expert" model generates continuous joint trajectories (10-step action chunks) via flow matching. RL-style advantage conditioning lets the policy learn from both good and bad episodes.
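The flow-matching step above can be sketched as follows. This is a minimal illustration, not the OpenTau implementation: the `velocity_field` is a toy stand-in for the action expert, and the context vector stands in for VLM features; only the chunk shape (10 steps) comes from the card.

```python
import numpy as np

CHUNK_LEN = 10   # actions per chunk (from the card)
ACTION_DIM = 7   # assumed 7-DoF action space (illustrative)

def velocity_field(actions, t, context):
    """Stand-in for the action expert: predicts d(actions)/dt.
    The real model conditions on VLM features; here a toy linear pull
    toward a context-derived target is used purely for illustration."""
    target = np.tanh(context)[: actions.size].reshape(actions.shape)
    return target - actions

def sample_action_chunk(context, steps=10, rng=None):
    """Flow-matching sampling: integrate the velocity field from noise
    (t=0) to an action chunk (t=1) with simple Euler steps."""
    rng = rng or np.random.default_rng(0)
    actions = rng.standard_normal((CHUNK_LEN, ACTION_DIM))
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        actions = actions + dt * velocity_field(actions, t, context)
    return actions

context = np.linspace(-1, 1, CHUNK_LEN * ACTION_DIM)  # stand-in VLM embedding
chunk = sample_action_chunk(context)
print(chunk.shape)  # (10, 7)
```

At inference time the robot executes the chunk (or a prefix of it) before re-querying the policy, which amortizes the cost of the VLM forward pass over several control steps.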


Training and Evaluation

The advantage indicator (Iₜ) was set to True for all datapoints, since this SFT run trained only on expert demonstrations.
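In RECAP-style advantage conditioning, the indicator is an extra policy input that separates good episodes from bad ones; under pure SFT it is constant. A minimal sketch of how a training example might carry it (field names are assumptions, not OpenTau's schema):

```python
def make_training_example(obs, action_chunk, advantage_indicator=True):
    """Attach the advantage indicator to one training example (sketch).
    With only expert demonstrations, every example gets indicator=True,
    so the policy is always conditioned on the 'good episode' signal."""
    return {
        "observation": obs,
        "actions": action_chunk,
        "advantage": 1.0 if advantage_indicator else 0.0,
    }

example = make_training_example({"image": None, "instruction": "..."}, [0.0] * 10)
print(example["advantage"])  # 1.0
```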

Dataset

This model was finetuned on the Moka Pot task from the LIBERO-10 benchmark, using approximately 29 expert teleoperated episodes.
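To train a model that predicts 10-step action chunks, each episode's action sequence is typically sliced into one fixed-length chunk per timestep. The sketch below shows one common convention (repeat-pad the final action); the exact chunking and padding OpenTau uses may differ.

```python
import numpy as np

CHUNK_LEN = 10  # matches the model's 10-step action chunks

def chunk_episode(actions, chunk_len=CHUNK_LEN):
    """Split one episode's (T, action_dim) action sequence into training
    targets: one chunk starting at every timestep, padded at the end by
    repeating the final action (an assumed, common convention)."""
    T, _ = actions.shape
    padded = np.concatenate([actions, np.repeat(actions[-1:], chunk_len, axis=0)])
    return np.stack([padded[t : t + chunk_len] for t in range(T)])

# A toy 25-step episode with 7-dim actions yields 25 chunks of shape (10, 7).
chunks = chunk_episode(np.zeros((25, 7)))
print(chunks.shape)  # (25, 10, 7)
```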

Results

For detailed usage instructions, success rates, baseline comparisons, and evaluation protocols, refer to the OpenTau GitHub repository. The model achieves an ~83% success rate measured over 212 evaluation episodes.

