OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation Paper • 2512.06589 • Published Dec 2025 • 17
Imperceptible Jailbreaking against Large Language Models Paper • 2510.05025 • Published Oct 6, 2025 • 33
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26, 2025 • 70
Oyster-I: Beyond Refusal -- Constructive Safety Alignment for Responsible Language Models Paper • 2509.01909 • Published Sep 2, 2025 • 6
Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment Paper • 2505.21494 • Published May 27, 2025 • 8
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published Mar 31, 2025 • 301