We have published in several top NLP/AI conferences, such as ACL, EMNLP, AAAI, and ICWSM.
- SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models
  Paper • 2406.12274
- Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations
  Paper • 2406.11801
- How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries
  Paper • 2402.15302
- Sowing the Wind, Reaping the Whirlwind: The Impact of Editing Language Models
  Paper • 2401.10647