See It from My Perspective: Diagnosing the Western Cultural Bias of Large Vision-Language Models in Image Understanding
Paper: arXiv:2406.11665
This is the Chinese Baichuan2-7B-Chat VLM, trained via LoRA for https://arxiv.org/abs/2406.11665.
The training data used for multimodal alignment and visual instruction tuning is from here.