Bayazitov Fanur Anurovich (Ufa State Petroleum Technological University)
Modern large language models (LLMs) perform well across a wide range of tasks, but their scale makes deployment difficult when computing resources are limited. Existing reviews of LLM compression methods focus primarily on a technical taxonomy (pruning, quantization, distillation) and address neither the timing of an intervention in the model lifecycle nor the nature of the modification. This makes it harder to select a compression strategy deliberately, based on the stages at which the model can be modified and the required balance between efficiency and flexibility.
Research Hypothesis
A two-dimensional classification by application phase and intervention type allows us to identify a fundamental tradeoff between structural robustness and contextual adaptability, enabling an informed choice of compression method based on the available intervention stage and the nature of the target constraints.
Results
The proposed classification reveals a consistent trend: a shift from static methods in early phases to dynamic methods at the inference stage. The analysis shows that static methods provide predictable resource reduction but require model modification, while dynamic methods preserve the original weights but are context- and hardware-dependent. The most significant gap is the lack of dynamic methods in early stages. The findings form the basis for an informed choice of LLM compression strategy based on available intervention stages and practical constraints.
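The two-dimensional view described above can be sketched as a small lookup grid. This is a minimal illustration and not the article's actual table: the phase names, the method placements, and the helper names (`build_grid`, `candidates`, `empty_cells`) are all assumptions introduced here for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    # Hypothetical lifecycle stages; the article's exact phase names may differ.
    TRAINING = "training"
    POST_TRAINING = "post-training"
    INFERENCE = "inference"


class Nature(Enum):
    STATIC = "static"    # the model itself is modified ahead of time
    DYNAMIC = "dynamic"  # original weights are kept; behavior adapts at run time


@dataclass(frozen=True)
class Method:
    name: str
    phase: Phase
    nature: Nature


# Illustrative placements only (an assumption, not the article's classification).
METHODS = [
    Method("quantization-aware training", Phase.TRAINING, Nature.STATIC),
    Method("knowledge distillation", Phase.TRAINING, Nature.STATIC),
    Method("post-training quantization", Phase.POST_TRAINING, Nature.STATIC),
    Method("one-shot pruning", Phase.POST_TRAINING, Nature.STATIC),
    Method("dynamic token pruning", Phase.INFERENCE, Nature.DYNAMIC),
    Method("KV-cache eviction", Phase.INFERENCE, Nature.DYNAMIC),
]


def build_grid(methods):
    """Group method names by their (phase, nature) cell."""
    grid = defaultdict(list)
    for m in methods:
        grid[(m.phase, m.nature)].append(m.name)
    return grid


def candidates(grid, available_phases, may_modify_weights):
    """List methods usable at the given phases.

    Static methods are skipped when the weights may not be modified,
    mirroring the tradeoff stated in the Results paragraph.
    """
    result = []
    for (phase, nature), names in grid.items():
        if phase not in available_phases:
            continue
        if nature is Nature.STATIC and not may_modify_weights:
            continue
        result.extend(names)
    return result


def empty_cells(grid):
    """Return the (phase, nature) cells that contain no method."""
    return [(p, n) for p in Phase for n in Nature if not grid.get((p, n))]
```

Calling `candidates(build_grid(METHODS), {Phase.INFERENCE}, may_modify_weights=False)` returns only the dynamic inference-time entries, and `empty_cells` lists the unpopulated cells. With this illustrative data the dynamic cells of the early phases come out empty, but that follows from the assumed placements and is not independent evidence for the gap the article reports.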
Keywords: large language models, model compression, dynamic compression, static compression, life cycle.
Citation link: Bayazitov F. A. SYSTEMATIZATION OF METHODS FOR COMPRESSION OF LARGE LANGUAGE MODELS BY PHASE OF APPLICATION AND NATURE OF IMPACT // Современная наука: актуальные проблемы теории и практики. Серия: Естественные и Технические Науки. -2026. -№01. -С. 59-63 DOI 10.37882/2223-2966.2026.01.09