Improve model card: correct license, add tags, abstract, overview, usage, and update paper link

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +59 -4
README.md CHANGED
@@ -1,9 +1,64 @@
  ---
- license: cc-by-nc-4.0
+ license: cc-by-4.0
+ pipeline_tag: OTHER
+ tags:
+ - molecular-docking
+ - drug-design
+ - flow-matching
  ---

- This is a model card for Matcha docking model introduced in the paper [Matcha: Multi-Stage Riemannian Flow Matching for Accurate and Physically Valid Molecular Docking](https://arxiv.org/abs/2510.14586).
+ # Matcha: Multi-Stage Riemannian Flow Matching for Accurate and Physically Valid Molecular Docking
+
+ This is a model card for the Matcha docking model introduced in the paper [Matcha: Multi-Stage Riemannian Flow Matching for Accurate and Physically Valid Molecular Docking](https://huggingface.co/papers/2510.14586).
  It has 9M parameters.

- It can be loaded using the [Matcha repository](https://github.com/LigandPro/Matcha), where all the instructions are provided.
- You need to download the `pipeline` folder from the Files and versions tab.
+ It can be loaded using the [Matcha repository](https://github.com/LigandPro/Matcha), where all the instructions are provided.
+ You need to download the `pipeline` folder from the Files and versions tab.
+
+ ## Abstract
+ Accurate prediction of protein-ligand binding poses is crucial for structure-based drug design, yet existing methods struggle to balance speed, accuracy, and physical plausibility. We introduce Matcha, a novel molecular docking pipeline that combines multi-stage flow matching with learned scoring and physical validity filtering. Our approach consists of three sequential stages applied consecutively to refine docking predictions, each implemented as a flow matching model operating on appropriate geometric spaces ($\mathbb{R}^3$, $\mathrm{SO}(3)$, and $\mathrm{SO}(2)$). We enhance the prediction quality through a dedicated scoring model and apply unsupervised physical validity filters to eliminate unrealistic poses. Compared to various approaches, Matcha demonstrates superior performance on Astex and PDBbind test sets in terms of docking success rate and physical plausibility. Moreover, our method works approximately 25 times faster than modern large-scale co-folding models.
+
+ ## Overview
+ Matcha is a molecular docking pipeline that combines multi-stage flow matching with learned scoring and physical validity filtering. Our approach consists of three sequential stages applied consecutively to progressively refine docking predictions, each implemented as a flow matching model operating on appropriate geometric spaces (R^3, SO(3), and SO(2)).
+ We enhance the prediction quality through a dedicated scoring model and apply unsupervised physical validity filters to eliminate unrealistic poses.
+
+ More details can be found in the [GitHub repository](https://github.com/LigandPro/Matcha).
+
+ ![pipeline](https://github.com/LigandPro/Matcha/raw/main/data/img/matcha_pipeline.png)
+ ![architecture](https://github.com/LigandPro/Matcha/raw/main/data/img/matcha_architecture.png)
+
+ Compared to various approaches, Matcha demonstrates superior performance on Astex and PDBBind test sets in terms of docking success rate and physical plausibility. Moreover, our method works approximately 25× faster than modern large-scale co-folding models.
+
+ <img src="https://github.com/LigandPro/Matcha/raw/main/data/img/time.png" alt="results" width="500"/>
+
+ ## Installation
+ To install the `matcha` package, do the following:
+
+ ```bash
+ cd matcha
+ pip install -e .
+ ```
+
+ ## Sample Usage
+ To run inference with one script, computing all preprocessing steps and docking predictions, use the following command. Provide `--compute_final_metrics` if your dataset has true ligand positions, so we can compute RMSD metrics and PoseBusters filters.
+ Argument `-n inference_folder_name` is a name of a folder where to store inference results for dataset.
+
+ ```bash
+ CUDA_VISIBLE_DEVICES=0 python scripts/full_inference.py -c configs/base.yaml -p configs/paths/paths.yaml -n inference_folder_name --n_samples 40 --compute_final_metrics
+ ```
+ This script will provide a step-by-step computation of protein ESM embeddings, docking predictions, physically-aware unsupervised post-filtration, scoring and saving predictions to sdf.
+
+ ## Citation
+ If you use Matcha in your work, please cite our paper:
+
+ ```bibtex
+ @misc{frolova2025matchamultistageriemannianflow,
+ title={Matcha: Multi-Stage Riemannian Flow Matching for Accurate and Physically Valid Molecular Docking},
+ author={Daria Frolova and Talgat Daulbaev and Egor Sevryugov and Sergei A. Nikolenko and Dmitry N. Ivankov and Ivan Oseledets and Marina A. Pak},
+ year={2025},
+ eprint={2510.14586},
+ archivePrefix={arXiv},
+ primaryClass={cs.LG},
+ url={https://arxiv.org/abs/2510.14586},
+ }
+ ```
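
A note on the Sample Usage command in this diff: when the inference script is launched from Python (for example, from a batch job), the same call can be assembled as an argument list instead of a shell string. The sketch below is only an illustration and is not part of the Matcha repository. The helper names `build_inference_command` and `build_env` are hypothetical, and the only flags used are those shown in the model card (`-c`, `-p`, `-n`, `--n_samples`, `--compute_final_metrics`). It assumes the script is run from the repository root.

```python
import os
import shlex
import sys


def build_inference_command(run_name, n_samples=40, compute_final_metrics=False,
                            config="configs/base.yaml",
                            paths="configs/paths/paths.yaml"):
    """Assemble the argv list for scripts/full_inference.py (hypothetical helper).

    Only flags that appear in the model card are emitted.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be a positive integer")
    cmd = [
        sys.executable, "scripts/full_inference.py",
        "-c", config,
        "-p", paths,
        "-n", run_name,  # name of the folder that will hold inference results
        "--n_samples", str(n_samples),
    ]
    # Pass this flag only when the dataset contains true ligand positions.
    if compute_final_metrics:
        cmd.append("--compute_final_metrics")
    return cmd


def build_env(gpu_index=0, base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


cmd = build_inference_command("inference_folder_name", compute_final_metrics=True)
# Show the command without the interpreter path, for comparison with the card.
print(shlex.join(cmd[1:]))
```

To actually launch the run, the list can be passed to `subprocess.run(cmd, env=build_env(0), check=True)`. Keeping the command as a list avoids shell quoting problems when the results folder name contains spaces.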