publications

2024

  1. h4rm3l: A Dynamic Benchmark of Composable Jailbreak Attacks for LLM Safety Assessment
    Moussa Koulako Bala Doumbouya, Ananjan Nandi, Gabriel Poesia, and 5 more authors
    2024
  2. Accelerating Sparse Autoencoder Training via Layer-Wise Transfer Learning in Large Language Models
    Davide Ghilardi, Federico Belotti, Marco Molinari, and 1 more author
    In Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, Nov 2024
  3. Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups
    Davide Ghilardi, Federico Belotti, and Marco Molinari
    Nov 2024