Abstract:
In recent years, leveraging large-scale pre-trained deep neural network models to obtain better generalization and task-specific performance has become a major trend in deep-learning-based artificial intelligence. However, the high architectural complexity, large storage overhead, and heavy computational cost of these models make them difficult to deploy on resource-constrained edge hardware platforms. To address this issue, model compression and acceleration techniques have emerged. Among them, model quantization is a key enabler of large-scale commercial deployment: by reducing the bit width of the network parameters and intermediate outputs, it compresses the model and accelerates inference. This paper surveys model quantization methods from multiple perspectives, summarizes and evaluates the advantages and disadvantages of different approaches, and finally discusses open problems in neural network quantization and directions for future development.
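To make the core idea concrete, the following is a minimal sketch of uniform affine quantization, the basic scheme underlying many of the methods this survey covers. It is an illustrative example, not any specific method from the paper: a float32 tensor is mapped to 8-bit integers via a scale and zero-point, then dequantized back, trading a small reconstruction error for a 4x reduction in storage.

```python
import numpy as np

def quantize(x, num_bits=8):
    # Uniform affine quantization: map float values onto the
    # integer grid [0, 2^b - 1] using a scale and a zero-point.
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover approximate float values from the integer codes.
    return scale * (q.astype(np.float32) - zero_point)

# Hypothetical weight tensor, standing in for one layer of a network.
weights = np.random.randn(4, 4).astype(np.float32)
q, s, z = quantize(weights)
recon = dequantize(q, s, z)
# Per-element reconstruction error is bounded by roughly one scale step.
max_err = np.abs(weights - recon).max()
```

Storing `q` (uint8) instead of `weights` (float32) cuts memory by 4x; lower bit widths (4-bit, binary) push the trade-off between compression and accuracy further, which is where the surveyed methods differ.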