We propose VLRM, a method for fine-tuning an existing image captioning model using reinforcement learning with vision-language models as reward models. The method significantly improves generation quality without any human-labeled data and is applicable to any image captioning model. Our model reaches an impressive 0.90 R@1 CLIP Recall score on the MS-COCO Karpathy Test Split.
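
Below is a minimal sketch, not the authors' implementation, of the core idea of using a vision-language model as a reward model: each generated caption is scored by its CLIP image-text similarity, and that score can serve as the reward in a policy-gradient update of the captioning model. The checkpoint name and the exact reward definition are illustrative assumptions.

```python
# Sketch: CLIP image-text similarity as an RL reward for generated captions.
# The model choice and reward formulation are assumptions for illustration.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip.eval()

@torch.no_grad()
def clip_reward(image: Image.Image, captions: list[str]) -> torch.Tensor:
    """Return one scalar reward per candidate caption for the given image."""
    inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
    image_emb = clip.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = clip.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )
    # Normalize embeddings so the dot product is a cosine similarity.
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    # Cosine similarity between the image and each caption, used as the reward.
    return (text_emb @ image_emb.T).squeeze(-1)

# Usage: rewards = clip_reward(img, ["a dog on a beach", "a cat on a sofa"])
# These per-caption rewards would then drive the RL fine-tuning step.
```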