ADAPTATION OF BIG DATA TO LOCAL INFORMATION LANGUAGE MODELS: DEVELOPMENT OF THE BIGTOR CHATBOT SYSTEM
DOI:
https://doi.org/10.56525/w3bwhp04
Keywords:
Large Language Models, Fine-tuning, Synthetic Data, Specialized Chatbots, Cultural Preservation
Abstract
This paper presents the development of BigTor, a domain-specific chatbot designed to address cultural, administrative, and social information gaps in Azerbaijan. To overcome the limitations of general-purpose models in low-resource languages, DeepSeek-R1-Distill-Llama-8B was selected as the base model. The system was fine-tuned on a high-quality synthetic dataset using Parameter-Efficient Fine-Tuning (PEFT) methods. Training employed LoRA adaptation, 4-bit quantization, and bfloat16 precision to keep computational costs low. Experimental results show that BigTorV1 achieved 92 percent accuracy in the national music domain, significantly outperforming the baseline model.
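As a rough illustration of why LoRA adaptation keeps fine-tuning cheap, the sketch below counts the trainable parameters LoRA adds to a single weight matrix. LoRA freezes the original matrix of shape d_out x d_in and trains two low-rank factors, B (d_out x r) and A (r x d_in). The dimensions and rank used here are hypothetical values chosen for illustration only; the abstract does not state the rank or the target layers used for BigTor.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def full_params(d_out: int, d_in: int) -> int:
    """Parameters in the frozen dense weight matrix."""
    return d_out * d_in


# Hypothetical values for illustration; not taken from the paper.
d_out = d_in = 4096   # assumed projection size
rank = 16             # assumed LoRA rank

lora = lora_trainable_params(d_out, d_in, rank)
full = full_params(d_out, d_in)
fraction = lora / full

print(f"{lora} trainable vs {full} frozen parameters ({fraction:.2%})")
```

Under these assumed values, the adapter trains under one percent as many parameters as the frozen matrix contains, which is the source of the efficiency gain; 4-bit quantization of the frozen weights reduces memory further.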




