Coarse Gradient Descent Method and Quantization of Deep Neural Networks
Quantization is an effective approach to accelerating deep neural networks by restricting their weights and activation functions to low precision. However, the training objective (loss function) then becomes discontinuous, so the standard gradient either vanishes or does not exist. We discuss a notion of coarse gradient (also known as the straight-through estimator) that acts on smooth proxies of the discontinuous functions and, with proper design, leads to descent of the training loss as well as satisfactory generalization accuracy. We present convergence analysis on simplified models and experiments on image classification, some in conjunction with a feature-affinity-assisted, multi-level knowledge distillation that extracts an efficient student network from a larger teacher network using label-free data.
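As a concrete illustration (not part of the abstract), the sketch below shows one common form of a coarse gradient for the binary sign quantizer in PyTorch: the forward pass uses the discontinuous sign function, whose true gradient is zero almost everywhere, while the backward pass uses the derivative of a clipped-identity (hard-tanh) proxy. The class name SignSTE and the choice of proxy are illustrative assumptions, not the specific construction analyzed in the talk.

```python
import torch

class SignSTE(torch.autograd.Function):
    """Binary quantizer trained with a coarse (straight-through) gradient.

    Forward: the discontinuous sign function.
    Backward: the gradient of a smooth/clipped proxy (here hard tanh),
    so the loss can still decrease during training.
    """

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Coarse gradient: pass the incoming gradient through where |x| <= 1,
        # as if the forward map were the clipped identity proxy.
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)


# Minimal usage: gradients flow through the quantizer despite sign'(x) = 0 a.e.
x = torch.randn(4, requires_grad=True)
loss = SignSTE.apply(x).sum()
loss.backward()
print(x.grad)  # nonzero inside [-1, 1], zero outside
```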
Department of Mathematics and Statistics