Abstract
Large language models (LLMs) excel at a wide range of natural language tasks, including many beyond their explicit training. Fine-tuning these models on smaller datasets improves their performance on specific tasks, but it also increases the risk of training data memorization, raising privacy concerns. This study explores the extraction of private training data from fine-tuned LLMs through a series of experiments. The focus is on assessing how easily data can be extracted with various techniques and on examining how factors such as training dataset size, number of epochs, training sample length and content, and fine-tuning parameters influence this process. Our results indicate that data extraction is relatively straightforward with direct model access, especially when the training loss is computed over entire prompts. Models fine-tuned at higher precision (8-bit and 16-bit) memorize more than 4-bit quantized models. Even without direct access, insights into the training data can be obtained by comparing output probability scores across multiple queries. The study also reveals that, for a fixed number of epochs, the proportion of extractable data grows with training dataset size. These findings highlight the privacy risks faced by individuals whose data is used in fine-tuning, as well as by organizations deploying fine-tuned models in public applications.
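As a minimal illustration of the kind of memorization probing summarized above, the sketch below scores a candidate training string by its average negative log-likelihood under a fine-tuned causal language model, using the Hugging Face transformers API. This is an assumed, simplified setup, not the paper's exact protocol; the checkpoint path and the candidate record are placeholders.

```python
# Sketch (assumption, not the paper's exact method): lower mean negative
# log-likelihood for a candidate string suggests stronger memorization.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path/to/finetuned-model"  # placeholder fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def mean_nll(text: str) -> float:
    """Average negative log-likelihood of `text` under the model."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels equal to input_ids, the returned loss is the mean
        # cross-entropy over all (shifted) tokens, i.e. the loss computed
        # over the entire prompt.
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

candidate = "Jane Doe, 123 Main St, phone 555-0100"  # hypothetical record
print(f"mean NLL: {mean_nll(candidate):.3f}")
```

Comparing such scores across many candidate strings, or across repeated queries to a deployed model, is one way to rank which training samples are most likely to be recoverable.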
Keywords:
fine-tuning, large language models, data extraction, quantized low-rank adaptation (QLoRA)