Abstract
Large language models (LLMs) excel at a wide range of natural language tasks, including many beyond their explicit training. Fine-tuning these models on smaller datasets improves their performance on specific tasks, but it also increases the risk of training data memorization, raising privacy concerns. This study explores the extraction of private training data from fine-tuned LLMs through a series of experiments. The focus is on assessing how easily data can be extracted with various techniques and on examining how factors such as training dataset size, number of epochs, training sample length and content, and fine-tuning parameters influence this process. Our results indicate that data extraction is relatively straightforward with direct model access, especially when the training loss is computed over entire prompts. Models fine-tuned at higher precision (8-bit and 16-bit) memorize more than 4-bit quantized models. Even without direct access, insights into the training data can be obtained by comparing output probability scores across multiple queries. The study also reveals that, for a fixed number of epochs, the proportion of extractable data grows with training dataset size. These findings highlight the privacy risks faced by individuals whose data is used in fine-tuning, as well as by organizations deploying fine-tuned models in public applications.
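As a minimal illustration of the kind of memorization probing summarized above, the sketch below scores a candidate training string by its average negative log-likelihood under a fine-tuned causal language model, using the Hugging Face transformers API. This is an assumed, simplified setup, not the paper's exact protocol; the checkpoint path and the candidate record are placeholders.

```python
# Sketch (assumption, not the paper's exact method): lower mean negative
# log-likelihood for a candidate string suggests stronger memorization.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path/to/finetuned-model"  # placeholder fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def mean_nll(text: str) -> float:
    """Average negative log-likelihood of `text` under the model."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels equal to input_ids, the returned loss is the mean
        # cross-entropy over all (shifted) tokens, i.e. the loss computed
        # over the entire prompt.
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

candidate = "Jane Doe, 123 Main St, phone 555-0100"  # hypothetical record
print(f"mean NLL: {mean_nll(candidate):.3f}")
```

Comparing such scores across many candidate strings, or across repeated queries to a deployed model, is one way to rank which training samples are most likely to be recoverable.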
Keywords:
fine-tuning, large language models, data extraction, quantized low-rank adaptation (QLoRA)