All HF Hub posts

Kseniase 
posted an update about 17 hours ago
6 Comprehensive Resources on AI Coding

AI coding is moving fast, and it’s getting harder to tell what actually works. Agents, workflows, context management and many other aspects are reshaping how software gets built.

We’ve collected a set of resources to help you understand how AI coding is evolving today and what building strategies work best:

1. AI Agentic Programming: A Survey of Techniques, Challenges, and Opportunities (2508.11126)
Provides a clear taxonomy, compares agent architectures, and exposes practical gaps in tools, benchmarks, and reliability that AI coding agents now struggle with

2. Does AI-Assisted Coding Deliver? A Difference-in-Differences Study of Cursor's Impact on Software Projects (2511.04427)
This study from Carnegie Mellon University provides causal evidence that LLM agent assistants deliver short-term productivity gains but carry lasting quality costs that can slow development over time

3. A Survey of Vibe Coding with Large Language Models (2510.12399)
Turns Vibe Coding from hype into a structured field, categorizing real development workflows. It shows which models, infrastructure, tool requirements, context, and collaboration setups affect real software development outcomes

4. From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence (2511.18538) (from Chinese institutes and companies like ByteDance and Alibaba)
Compares real code LLMs, shows how training and alignment choices affect code quality and security, and connects academic benchmarks to everyday software development

5. Build Your Own Coding Agent via a Step-by-Step Workshop ⟶ https://github.com/ghuntley/how-to-build-a-coding-agent
A great guide that covers the basics of building an AI-powered coding assistant – from a chatbot to a file reader/explorer/editor and code search

6. State of AI Coding: Context, Trust, and Subagents ⟶ https://www.turingpost.com/p/aisoftwarestack
Our in-depth analysis of where AI coding is heading and the new directions we see today, such as agent swarms and the growing importance of context management, offering an emerging playbook beyond the IDE

If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
daqc 
posted an update 1 day ago
Check out your 2025 Hugging Face Wrapped, a small experimental recap
hf-wrapped/2025
·
martinsu 
posted an update 3 days ago
I wasted days on a GPU node on a bug that shouldn't exist

So I was fine-tuning TildeOPEN-30B and the outputs were... weird. Token ID 179 (<0x00>) kept appearing between almost every token pair. Took me a bit to figure out what was going on.

Turns out I used the fast tokenizer for fine-tuning, but the base model had been trained with the slow one. Silent failure.

Long story short: TGI forces the fast tokenizer, no questions asked. If the model was trained with the slow one, you get agile's kryptonite: a silent failure.

I got curious and wrote a quick script to check how common this is. Ran it on 6,014 LLM HF models overnight.

Roughly 10% of HF model downloads have mismatched tokenizers. Not all mismatches are catastrophic, but some are brutal, like chat template markers inflating from 1 token to 3, silently wrecking context windows and causing the model to act weird.

This wasn't rigorous research, but the drift is real. And the worst part? 968 models (among those with 500+ downloads) have both fast and slow tokenizers present, but they still produce different outputs. No missing files, no errors, just silent degradation.

TGI defaults to the fast tokenizer, as does AutoTokenizer.from_pretrained(). If a fast tokenizer doesn't exist, it auto-generates one. If your model was trained with the slow one, you get silent degradation. Output looks fine; the model just performs worse. Sometimes much worse. You'd never know.
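
For illustration only, here is a minimal sketch of how such a fast-vs-slow comparison could look. It is not the script mentioned above. The helper `find_mismatches` and the two toy encoders are hypothetical names invented for this example, and the toy "fast" encoder merely imitates the stray-token symptom so the snippet runs without downloading a model.

```python
from typing import Callable, Iterable, List, Tuple

# Any function that maps text to a list of token IDs.
Encoder = Callable[[str], List[int]]


def find_mismatches(
    encode_fast: Encoder,
    encode_slow: Encoder,
    samples: Iterable[str],
) -> List[Tuple[str, List[int], List[int]]]:
    """Return (text, fast_ids, slow_ids) for each sample whose IDs differ."""
    mismatches = []
    for text in samples:
        fast_ids = list(encode_fast(text))
        slow_ids = list(encode_slow(text))
        if fast_ids != slow_ids:
            mismatches.append((text, fast_ids, slow_ids))
    return mismatches


# Hypothetical toy encoders (assumptions, not real tokenizers).
def toy_slow(text: str) -> List[int]:
    # One ID per character.
    return [ord(ch) for ch in text]


def toy_fast(text: str) -> List[int]:
    # Same IDs, but with a stray ID 0 between every token pair,
    # imitating the kind of symptom described in the post.
    ids: List[int] = []
    for i, ch in enumerate(text):
        if i > 0:
            ids.append(0)
        ids.append(ord(ch))
    return ids


if __name__ == "__main__":
    for text, fast_ids, slow_ids in find_mismatches(toy_fast, toy_slow, ["a", "ab"]):
        print(repr(text), fast_ids, slow_ids)

# With the transformers library installed, the same helper could be fed
# real tokenizers (model_id is a placeholder you would supply):
#   from transformers import AutoTokenizer
#   fast = AutoTokenizer.from_pretrained(model_id, use_fast=True)
#   slow = AutoTokenizer.from_pretrained(model_id, use_fast=False)
#   find_mismatches(fast.encode, slow.encode, samples)
```

Running a check like this on chat template markers and a few multilingual sentences before training or serving is one way to surface a mismatch that would otherwise stay silent.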

If the model was trained with the fast tokenizer, it's fine, but how do you know?

The root cause? Either model authors run the HF conversion and upload both tokenizers without verifying them, or users run TGI, which always forces (converts to) the fast tokenizer.

The result of this fight with tokenizers is martinsu/tildeopen-30b-mu-instruct

It's based on TildeOPEN-30B (a solid EU HPC multilingual base). Nothing fancy—just a proper instruction fine-tune where I didn't mess up the tokenizer this time.

Full article: https://github.com/martins-u/tokenmagedon
·
sanaka87 
posted an update 1 day ago
🚀 Introducing VideoCoF: Unified Video Editing with a Temporal Reasoner (Chain-of-Frames)!

We’re excited to introduce VideoCoF, a unified framework for instruction-based video editing that enables temporal reasoning and ~4× video length extrapolation, trained with only 50k video pairs. 🔥

🔍 What makes VideoCoF different?
🧠 Chain-of-Frames reasoning: mimics the human thinking process (Seeing → Reasoning → Editing) to apply edits accurately over time without external masks, ensuring physically plausible results.
📈 Strong length generalization — trained on 33-frame clips, yet supports multi-shot editing and long-video extrapolation (~4×).
🎯 Unified fine-grained editing — Object Removal, Addition, Swap, and Local Style Transfer, with instance-level & part-level, spatial-aware control.

⚡ Fast inference update
🚀 H100: ~20s / video with 4-step inference, making high-quality video editing far more practical for real-world use.

🔗 Links
📄 Paper: https://arxiv.org/abs/2512.07469
💻 Code: https://github.com/knightyxp/VideoCoF
🤗 Demo: XiangpengYang/VideoCoF
🧩 Models: XiangpengYang/VideoCoF
🌐 Project Page: https://videocof.github.io/

#VideoEditing #DiffusionModels #GenerativeAI #ComputerVision #AI
·
XiangpengYang 
posted an update 2 days ago
🚀 Introducing VideoCoF: Unified Video Editing with a Temporal Reasoner (Chain-of-Frames)!

We’re excited to introduce VideoCoF, a unified framework for instruction-based video editing that enables temporal reasoning and ~4× video length extrapolation, trained with only 50k video pairs. 🔥

🔍 What makes VideoCoF different?
🧠 Chain-of-Frames reasoning: mimics the human thinking process (Seeing → Reasoning → Editing) to apply edits accurately over time without external masks, ensuring physically plausible results.
📈 Strong length generalization — trained on 33-frame clips, yet supports multi-shot editing and long-video extrapolation (~4×).
🎯 Unified fine-grained editing — Object Removal, Addition, Swap, and Local Style Transfer, with instance-level & part-level, spatial-aware control.

⚡ Fast inference update
🚀 H100: ~20s / video with 4-step inference, making high-quality video editing far more practical for real-world use.

🔗 Links
📄 Paper: https://arxiv.org/abs/2512.07469
💻 Code: https://github.com/knightyxp/VideoCoF
🤗 Demo: XiangpengYang/VideoCoF
🧩 Models: XiangpengYang/VideoCoF
🌐 Project Page: https://videocof.github.io/

#VideoEditing #DiffusionModels #GenerativeAI #ComputerVision #AI
·
martinsu 
posted an update 1 day ago
https://huggingface.co/blog/martinsu/potus-broke-my-pipeline

How POTUS Completely Broke My Flash 2.5-Based Guardrail

Did quite a bit of deep research on this one, since it IMHO matters. At first I used this story to amuse fellow MLOps guys, but then I went deeper and was surprised.

To those who don't want to read too much, in plain English: when you give the model a high-stakes statement that clashes with what it "knows" about the world, it gets more brittle. Sometimes to a point of being unusable.

Or an even shorter version: do not clash with the model's given worldview—it will degrade to some extent.

And in practice, it means that in lower-resource languages like Latvian and Finnish (and probably others), Flash 2.5 is an unreliable guardrail model when something clashes with the model's general "worldview".

However, I'm sure this degradation applies to other languages and models as well to varying extents.

In one totally normal week of MLOps, my news summarization pipeline started failing intermittently. Nothing was changed. No deploys. No prompt edits. No model version bump (as far as I could tell). Yet the guardrail would suddenly turn into a grumpy judge and reject outputs for reasons that felt random, sometimes even contradicting itself between runs. It was the worst kind of failure: silent, flaky, and impossible to reproduce on demand.

Then I noticed the pattern: it started when one specific named entity appeared in the text: Donald Trump (and later, in tests, Bernie Sanders too).

And then down the rabbit hole I went.
·
YatharthS 
posted an update 2 days ago
I just released LayaCodec, a highly efficient neural audio tokenizer/codec for TTS models, far better than most previous audio tokenizers.

🤯 Next-gen TTS models that use this could achieve several hundred times real-time speed while producing clearer audio! 🤯

GitHub repo: https://github.com/ysharma3501/LayaCodec
Model: YatharthS/LayaCodec
etemiz 
posted an update 2 days ago
Today's winner is Ling 1T with a score of 38!

Btw AHA2 is in the works, with more domains, better comparison LLMs and questions, overall better signal.

·
sergiopaniego 
posted an update 3 days ago
🎄 last talk of the year about open AI and HF today at Universidad Rey Juan Carlos for undergrad students

always a pleasure to be back at my alma mater

🎅 slides: https://github.com/sergiopaniego/talks
·
prabhatkr 
posted an update 3 days ago
Language Dexterity Benchmark

I am working on a new benchmark to establish human language dexterity. My hypothesis is that certain languages allow for more accurate dexterous behaviour: pointed, unambiguous, and confusion-free references to parts of speech in small and large contexts. Some languages, like Sanskrit, Esperanto, and Turkish, have a highly precise grammar. I am a native Sanskrit speaker.
I plan to establish this benchmark and test the hypothesis across 100 languages. I have created 25 task prompts for text, image, video, and robotics manipulation. We can test languages across multiple popular models. Here is the GitHub link: https://github.com/ParamTatva-org/Linguistic-Dexterity-Benchmark
·