Article title

Unconditional Token Forcing: Extracting Text Hidden Within LLM

Identifiers
Title variants
Conference
Federated Conference on Computer Science and Information Systems (19 ; 08-11.09.2024 ; Belgrade, Serbia)
Publication languages
EN
Abstracts
EN
With the help of simple fine-tuning, one can artificially embed hidden text into large language models (LLMs). This text is revealed only when triggered by a specific query to the LLM. Two primary applications are LLM fingerprinting and steganography. In the context of LLM fingerprinting, a unique text identifier (fingerprint) is embedded within the model to verify licensing compliance. In the context of steganography, the LLM serves as a carrier for hidden messages that can be disclosed through a designated trigger. Our work demonstrates that although embedding hidden text in an LLM via fine-tuning may initially appear secure, owing to the vast number of possible triggers, it is susceptible to extraction through analysis of the LLM's output decoding process. We propose a novel extraction approach called Unconditional Token Forcing. It is premised on the hypothesis that iteratively feeding each token from the LLM's vocabulary into the model should reveal sequences with abnormally high token probabilities, indicating potential embedded text candidates. Additionally, our experiments show that when the first token of a hidden fingerprint is used as the input, the LLM not only produces an output sequence with high token probabilities, but also repeatedly generates the fingerprint itself. Code is available at github.com/jhoscilowic/zurek-stegano.
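The idea described in the abstract can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration of Unconditional Token Forcing, not the authors' implementation (which is available in the linked repository): it assumes the Hugging Face transformers API, greedy decoding, and an ad hoc threshold on the average log-probability of generated tokens; the model name and threshold value are placeholders.

```python
# Minimal sketch of Unconditional Token Forcing as described in the abstract.
# Assumptions (not taken from the paper's code): Hugging Face transformers,
# greedy decoding, and an arbitrary log-probability threshold.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # hypothetical model choice
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def force_token(token_id, max_new_tokens=16):
    """Feed a single vocabulary token (after BOS, i.e. without any prompt),
    greedily decode a continuation, and return the decoded text together with
    the average log-probability of the generated tokens."""
    input_ids = torch.tensor([[tokenizer.bos_token_id, token_id]], device=model.device)
    with torch.no_grad():
        out = model.generate(
            input_ids,
            attention_mask=torch.ones_like(input_ids),
            max_new_tokens=max_new_tokens,
            do_sample=False,
            return_dict_in_generate=True,
            output_scores=True,
        )
    gen_ids = out.sequences[0, input_ids.shape[1]:]
    logprobs = [torch.log_softmax(score[0], dim=-1)[tid].item()
                for score, tid in zip(out.scores, gen_ids)]
    avg_logprob = sum(logprobs) / len(logprobs)
    return tokenizer.decode(out.sequences[0], skip_special_tokens=True), avg_logprob

# Iterate over the vocabulary and keep abnormally confident continuations
# as candidate hidden-text (fingerprint) sequences.
THRESHOLD = -0.05  # hypothetical cutoff on average log-probability
candidates = []
for token_id in range(len(tokenizer)):
    text, avg_lp = force_token(token_id)
    if avg_lp > THRESHOLD:
        candidates.append((avg_lp, text))

for avg_lp, text in sorted(candidates, reverse=True)[:20]:
    print(f"{avg_lp:.3f}  {text!r}")
```

In this sketch, a sequence whose tokens are generated with near-certain probability stands out from ordinary unconditional continuations, which is what flags it as a potential embedded fingerprint.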
Year
Volume
Pages
621–624
Physical description
Bibliography: 14 items, figures
Authors
  • Institute of Telecommunications, Warsaw University of Technology, Nowowiejska 15/19, Warsaw, 00-665, Poland
  • Institute of Telecommunications, Warsaw University of Technology, Nowowiejska 15/19, Warsaw, 00-665, Poland
  • Institute of Telecommunications, Warsaw University of Technology, Nowowiejska 15/19, Warsaw, 00-665, Poland
  • Institute of Telecommunications, Warsaw University of Technology, Nowowiejska 15/19, Warsaw, 00-665, Poland
  • Institute of Telecommunications, Warsaw University of Technology, Nowowiejska 15/19, Warsaw, 00-665, Poland
Notes
1. Code is available at github.com/jhoscilowic/zurek-stegano
2. Thematic Sessions: Short Papers
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-c5278ccf-b80a-46ad-bb99-982f324ce5fa