ADAPTATION OF BIG DATA TO LOCAL INFORMATION LANGUAGE MODELS: DEVELOPMENT OF THE BIGTOR CHATBOT SYSTEM

Authors

  • F.R. Adgozalov, Azerbaijan State Oil and Industry University, Baku, Azerbaijan

DOI:

https://doi.org/10.56525/w3bwhp04

Keywords:

Large Language Models, Fine-tuning, Synthetic Data, Specialized Chatbots, Cultural Preservation

Abstract

This paper presents the development of BigTor, a domain-specific chatbot designed to address cultural, administrative, and social information gaps in Azerbaijan. To overcome the limitations of general-purpose models in low-resource languages, the DeepSeek-R1-Distill-Llama-8B model was selected as the base model. The system was fine-tuned on a high-quality synthetic dataset using Parameter-Efficient Fine-Tuning (PEFT) methods. Training employed LoRA adaptation, 4-bit quantization, and bfloat16 precision to keep computational cost low. Experimental results show that BigTorV1 achieved 92 percent accuracy in the national music domain, significantly outperforming the baseline model.
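
The LoRA adaptation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' training code: the abstract does not report the rank, scaling factor, or target layers, so the values below (rank 16, a 4096 x 4096 projection matrix, and the helper names `lora_param_counts` and `lora_effective_weight`) are assumptions chosen only for illustration. The sketch shows the standard LoRA update, W' = W + (alpha / r) * B A, and why it leaves only a small share of the parameters trainable.

```python
# Illustrative sketch of the LoRA idea in pure Python.
# All dimensions, ranks, and function names here are hypothetical;
# the abstract does not state the hyperparameters actually used.

def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    Full fine-tuning updates every entry of the d_out x d_in matrix.
    LoRA trains only B (d_out x rank) and A (rank x d_in).
    """
    full = d_out * d_in
    lora = rank * (d_in + d_out)
    return full, lora


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the number of rows of A."""
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)  # low-rank update with the same shape as W
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


if __name__ == "__main__":
    # Assumed example: one 4096 x 4096 projection with rank 16.
    full, lora = lora_param_counts(4096, 4096, 16)
    print(f"full: {full}, LoRA: {lora}, fraction: {lora / full:.4%}")

    # Tiny numeric example: frozen W plus a rank-1 update.
    w = [[1.0, 2.0], [3.0, 4.0]]
    b = [[1.0], [2.0]]   # d_out x r
    a = [[1.0, 1.0]]     # r x d_in
    print(lora_effective_weight(w, a, b, alpha=2.0))
```

Under these assumed values, the adapter for a single matrix trains under 1 percent of the parameters that full fine-tuning would update. In this setting the frozen base weights can be stored in 4-bit form while the small A and B matrices are trained at higher precision such as bfloat16.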

Published

2026-05-29