---
pipeline_tag: text-generation
tags:
- minecraft
- action-prediction
- grounded-instruction-following
- task-oriented-dialog
- blocks-world
- embodied-ai
- synthetic-data
- spatial-reasoning
language:
- en
license: llama3
base_model:
- meta-llama/Meta-Llama-3-8B
metrics:
- f1
---

# Llama-CRAFTS: A Minecraft Builder Action Prediction Model

**Llama-CRAFTS** (**C**ontext **R**ich **A**nd **F**ine-**T**uned On **S**ynthetic Data) is a Llama-3-8B model fine-tuned for the **Builder Action Prediction (BAP)** task in Minecraft. The model predicts a sequence of block placements or removals based on the current game context.

This model establishes a new **state of the art** on the task, achieving an F1 score of **53.0**, a 6-point improvement over the previous SOTA ([Nebula](https://arxiv.org/abs/2406.18164)). Its development is part of a holistic re-examination of the BAP task itself, introducing an improved evaluation framework, new synthetic datasets, and enhanced modeling techniques, which together form **BAP v2**, an enhanced task framework.

### Key Features

* **State-of-the-art performance**: Achieves the highest score on the BAP v2 benchmark.
* **Trained on rich synthetic data**: In addition to the original Minecraft BAP data, Llama-CRAFTS was fine-tuned on three novel synthetic datasets designed specifically to teach complex spatial reasoning and instruction following.
* **Context-rich inputs**: The model leverages richer textual representations of the game context, which proved crucial for improving spatial awareness.

## Model Details

### Model Description

* **Model type**: A Llama-3-8B model fine-tuned using QLoRA.
* **Language(s)**: English
* **Finetuned from model**: `meta-llama/Meta-Llama-3-8B`

### Training Data

Llama-CRAFTS was trained on the **BAP v2 training set**, which combines:

- **The original BAP dataset**: the original human-human dialogues and game logs from the Minecraft Dialogue Corpus.
- **Three synthetic datasets**: novel datasets generated to provide rich, targeted examples of spatial language for instruction following. These were crucial for overcoming data scarcity and teaching the model spatial skills.

### Evaluation

The model was evaluated on the **BAP v2 benchmark**, which features a cleaner test set and fairer, more insightful metrics that better assess model capabilities, including spatial reasoning.

## Model Sources

- **Paper:** [*BAP v2: An Enhanced Task Framework for Instruction Following in Minecraft Dialogues*](https://arxiv.org/abs/2501.10836)
- **Code and Data:** [https://github.com/prashant-jayan21/bap-v2](https://github.com/prashant-jayan21/bap-v2)
- **Blog:** [https://www.alphaxiv.org/overview/2501.10836v3](https://www.alphaxiv.org/overview/2501.10836v3)

## Citation

If you use this model, please cite our work:

```bibtex
@misc{jayannavar2025bapv2enhancedtask,
  title={BAP v2: An Enhanced Task Framework for Instruction Following in Minecraft Dialogues},
  author={Prashant Jayannavar and Liliang Ren and Marisa Hudspeth and Risham Sidhu and Charlotte Lambert and Ariel Cordes and Elizabeth Kaplan and Anjali Narayan-Chen and Julia Hockenmaier},
  year={2025},
  eprint={2501.10836},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2501.10836},
}
```
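## How to Use

Below is a minimal inference sketch using 🤗 Transformers. Note the assumptions: `model_id` is a placeholder for this repository's Hub id, and the prompt shown is a hypothetical stand-in; the actual BAP input representation used at training time (dialogue history, world state, etc.) is defined in the [code repository](https://github.com/prashant-jayan21/bap-v2). If the checkpoint is published as QLoRA adapters rather than merged weights, load it with `peft.AutoPeftModelForCausalLM.from_pretrained` instead.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<this-repo-id>"  # placeholder: replace with the actual Hub id of this model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision; Llama-3-8B fits on a single 24 GB GPU
    device_map="auto",
)

# Hypothetical game-context prompt; the exact context encoding
# (dialogue, grid state, builder position) comes from the BAP v2 repo.
prompt = "<Architect> put a red block on top of the blue one\n<Builder>"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated action sequence, not the prompt.
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```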