Noah

notune

AI & ML interests

None yet

Recent Activity

updated a model 20 minutes ago

notune/lc0-nets-backup

published a model 32 minutes ago

notune/lc0-nets-backup

updated a dataset about 14 hours ago

notune/lc0-odds-bots

Organizations

liked a dataset 3 days ago

HuggingFaceFW/finetranslations

Viewer • Updated 9 days ago • 3.33B • 22.3k • 224

published a dataset about 1 month ago

notune/lc0-odds-bots

Updated about 14 hours ago • 155 • 1

liked a model about 1 month ago

istupakov/parakeet-tdt-0.6b-v3-onnx

Automatic Speech Recognition • Updated Aug 16, 2025 • 1.58k • 33

liked a model 11 months ago

Wan-AI/Wan2.1-T2V-14B

Text-to-Video • Updated Mar 12, 2025 • 37.1k • 1.45k

reacted to freddyaboulton's post with 🔥 about 1 year ago

Post

3204

Version 0.0.21 of gradio-pdf now properly loads Chinese characters!

liked a dataset over 1 year ago

XAI/vlmsareblind

Viewer • Updated Nov 22, 2024 • 8.02k • 285 • 27

liked a Space over 1 year ago

Convert to Safetensors

🐶

263

Convert models to Safetensors and open a PR

liked a model over 1 year ago

mistralai/Codestral-22B-v0.1

22B • Updated Jul 24, 2025 • 6.31k • 1.32k

liked a Space over 1 year ago

Open VLM Leaderboard

🌎

968

VLMEvalKit Evaluation Results Collection

liked 4 models over 1 year ago

reacted to vikhyatk's post with ❤️ almost 2 years ago

Post

3928

Just released a notebook showing how to finetune moondream: https://github.com/vikhyat/moondream/blob/main/notebooks/Finetuning.ipynb

reacted to akhaliq's post with 🚀 almost 2 years ago

Post

2278

Mora: Enabling Generalist Video Generation via A Multi-Agent Framework (2403.13248)

Sora is the first large-scale generalist video generation model that garnered significant attention across society. Since its launch by OpenAI in February 2024, no other video generation models have paralleled Sora's performance or its capacity to support a broad spectrum of video generation tasks. Additionally, there are only a few fully published video generation models, with the majority being closed-source. To address this gap, this paper proposes a new multi-agent framework Mora, which incorporates several advanced visual AI agents to replicate generalist video generation demonstrated by Sora. In particular, Mora can utilize multiple visual agents and successfully mimic Sora's video generation capabilities in various tasks, such as (1) text-to-video generation, (2) text-conditional image-to-video generation, (3) extend generated videos, (4) video-to-video editing, (5) connect videos and (6) simulate digital worlds. Our extensive experimental results show that Mora achieves performance that is proximate to that of Sora in various tasks. However, there exists an obvious performance gap between our work and Sora when assessed holistically. In summary, we hope this project can guide the future trajectory of video generation through collaborative AI agents.

liked a Space almost 2 years ago

LMArena Leaderboard

🏆

4.71k

Display LMArena Leaderboard

reacted to akhaliq's post with ❤️ almost 2 years ago

Post

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks (2403.00522)

Large language models are built on top of a transformer-based architecture to process textual inputs. For example, the LLaMA stands out among many open-source implementations. Can the same transformer be used to process 2D images? In this paper, we answer this question by unveiling a LLaMA-like vision transformer in plain and pyramid forms, termed VisionLLaMA, which is tailored for this purpose. VisionLLaMA is a unified and generic modelling framework for solving most vision tasks. We extensively evaluate its effectiveness using typical pre-training paradigms in a good portion of downstream tasks of image perception and especially image generation. In many cases, VisionLLaMA has exhibited substantial gains over the previous state-of-the-art vision transformers. We believe that VisionLLaMA can serve as a strong new baseline model for vision generation and understanding.
