SberRoboticsCenter/GreenVLA-5b-base-stride-4

vision-language-action

action-prediction


lakomchik committed on Feb 12

Commit 65505f2 · verified · 1 Parent(s): c087168

Update README.md

Files changed (1)

README.md +33 -2


@@ -78,17 +78,48 @@ cd GreenVLA
 uv sync  # or: pip install -e .
 ```
-### Action Inference (with fine-tuned checkpoint)
 ```python
 from lerobot.common.policies.factory import load_pretrained_policy
 policy, input_transforms, output_transforms = load_pretrained_policy(
-    "SberRoboticsCenter/GreenVLA-5b-base",
     data_config_name="bridge",
 )
 ```
 ### VLM Inference (VQA, Pointing, BBox)
 The base model retains full VLM capabilities:

 uv sync  # or: pip install -e .
 ```
+### Action Inference
 ```python
+import numpy as np
+import torch
 from lerobot.common.policies.factory import load_pretrained_policy
+from lerobot.common.utils.torch_observation import (
+    move_dict_to_batch_for_inference,
+    torch_preprocess_dict_inference,
+)
+# 1. Load policy and transforms.
 policy, input_transforms, output_transforms = load_pretrained_policy(
+    "SberRoboticsCenter/GreenVLA-5b-R1-bridge",
     data_config_name="bridge",
 )
+policy.to("cuda").eval()
+# 2. Build an observation (replace with real sensor data).
+raw_obs = {
+    "observation/state": np.random.rand(8).astype(np.float32),  # x y z roll pitch yaw _pad_ gripper
+    "observation/image": np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8),
+    "prompt": "pick up the green block and place it on the plate",
+}
+# 3. Transform, preprocess, and batch.
+obs = input_transforms(raw_obs)
+obs = torch_preprocess_dict_inference(obs)
+batch = move_dict_to_batch_for_inference(obs, device="cuda")
+# 4. Predict actions and post-process.
+with torch.inference_mode():
+    raw_actions = policy.select_action(batch).cpu().numpy()
+actions = output_transforms(
+    {"actions": raw_actions, "state": batch["state"].cpu().numpy()}
+)["actions"]
+# actions shape: (action_horizon, 7) — [x, y, z, roll, pitch, yaw, gripper]
 ```
+See [`examples/example_inference_bridge.py`](https://github.com/greenvla/GreenVLA/blob/main/examples/example_inference_bridge.py) for the full runnable script with argument parsing.
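The comments in the snippet above describe an 8-element state (`x y z roll pitch yaw _pad_ gripper`) and 7-element action rows (`[x, y, z, roll, pitch, yaw, gripper]`). The following is a minimal sketch of that layout in plain Python. The helper names `pack_state` and `unpack_actions` are hypothetical and not part of GreenVLA or lerobot, and the pad value of `0.0` is an assumption, since the README does not say what the padding slot should contain.

```python
from typing import NamedTuple, Sequence

STATE_DIM = 8   # x y z roll pitch yaw _pad_ gripper (layout taken from the README comment)
ACTION_DIM = 7  # x y z roll pitch yaw gripper (layout taken from the README comment)


def pack_state(pose6: Sequence[float], gripper: float, pad: float = 0.0) -> list:
    """Hypothetical helper: build the 8-element state list.

    The pad value is an assumption; the README only names the slot `_pad_`.
    """
    if len(pose6) != 6:
        raise ValueError("pose6 must hold x, y, z, roll, pitch, yaw")
    return [float(v) for v in pose6] + [float(pad), float(gripper)]


class ActionStep(NamedTuple):
    xyz: tuple
    rpy: tuple
    gripper: float


def unpack_actions(actions) -> list:
    """Hypothetical helper: split (action_horizon, 7) rows into named fields."""
    steps = []
    for row in actions:
        values = [float(v) for v in row]
        if len(values) != ACTION_DIM:
            raise ValueError(f"expected {ACTION_DIM} values per step, got {len(values)}")
        steps.append(ActionStep(tuple(values[:3]), tuple(values[3:6]), values[6]))
    return steps


state = pack_state([0.1, 0.2, 0.3, 0.0, 0.0, 1.5], gripper=1.0)
steps = unpack_actions([[1, 2, 3, 4, 5, 6, 7], [0, 0, 0, 0, 0, 0, 1]])
```

To feed the result into `raw_obs`, the list would presumably be converted with `np.asarray(state, dtype=np.float32)`. `unpack_actions` also accepts a NumPy array of shape `(action_horizon, 7)`, since it only iterates over rows.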
 ### VLM Inference (VQA, Pointing, BBox)
 The base model retains full VLM capabilities: