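Post-Training Quantization (PTQ) [1]
- Start from a pre-trained model and quantize it using a calibration dataset
- The calibration data is used only to quantize the model; it can be a subset of the training dataset
- Calibration: compute the dynamic range of the weights and activations (gather layer statistics), which determines the quantization parameters (q-params)
- Quantize the model using those q-params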
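
The calibration steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, assuming asymmetric (affine) 8-bit quantization with simple min/max range statistics; the helper names `compute_qparams`, `quantize`, and `dequantize` are hypothetical and are not taken from [1] or from any library.

```python
from typing import List, Tuple


def compute_qparams(values: List[float], num_bits: int = 8) -> Tuple[float, int]:
    """Derive (scale, zero_point) from the observed dynamic range (min/max)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if hi == lo:  # all-zero tensor: any scale works
        return 1.0, 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values: List[float], scale: float, zero_point: int,
             num_bits: int = 8) -> List[int]:
    """Map floats to integers in [0, 2**num_bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values: List[int], scale: float, zero_point: int) -> List[float]:
    """Map integers back to approximate float values."""
    return [(q - zero_point) * scale for q in q_values]


# Assumed calibration sample standing in for one layer's activations.
calibration = [-1.0, -0.2, 0.0, 0.7, 3.0]
scale, zero_point = compute_qparams(calibration)
q = quantize(calibration, scale, zero_point)
recovered = dequantize(q, scale, zero_point)
```

Min/max is only one possible way to pick the range; other statistics (for example percentiles or histogram-based criteria) trade clipping error against rounding error differently.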
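Quantization-Aware Training (QAT) [1]
- Start from a pre-trained model and insert quantization operations into its layers
- Fine-tune the model for a few epochs, simulating the quantization that happens during inference
- Learn the quantization parameters through training, reducing the accuracy loss between the quantized model and the pre-trained model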
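
To illustrate what "simulating quantization during training" can look like, here is a toy sketch in plain Python. It assumes a single scalar weight, a fixed quantization step `scale`, and a straight-through estimator (the rounding step is treated as the identity in the backward pass). These choices, and the names `fake_quantize` and `train_qat`, are assumptions made for illustration; real QAT schemes may also learn the scale itself.

```python
from typing import List, Tuple


def fake_quantize(x: float, scale: float, qmin: int = -128, qmax: int = 127) -> float:
    """Quantize then immediately dequantize: the result is a float on the int8 grid."""
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale


def train_qat(data: List[Tuple[float, float]], w: float = 0.0, scale: float = 0.1,
              lr: float = 0.05, epochs: int = 200) -> Tuple[float, float]:
    """Fit y = w * x while the forward pass only ever sees the quantized weight."""
    for _ in range(epochs):
        w_q = fake_quantize(w, scale)  # forward pass uses the quantized weight
        grad = 0.0
        for x, y in data:
            err = w_q * x - y
            # Straight-through estimator: treat d(w_q)/dw as 1.
            grad += 2.0 * err * x
        w -= lr * grad / len(data)  # update the latent full-precision weight
    return w, fake_quantize(w, scale)


# Hypothetical data generated by y = 0.73 * x; 0.73 is not on the 0.1 grid.
data = [(x, 0.73 * x) for x in (-2.0, -1.0, 1.0, 2.0)]
w_float, w_quant = train_qat(data)
```

Because the loss is computed with the quantized weight, the latent weight is pushed toward a value whose quantized version fits the data, which is the effect QAT relies on. In this toy setup the quantized weight ends within one quantization step of the target.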
PTQ vs. QAT [1]
PTQ | QAT |
---|---|
Usually fast | Slow |
No re-training of the model | Model needs to be trained/finetuned |
Plug and play of quantization schemes | Plug and play of quantization schemes (requires re-training) |
Less control over final accuracy of the model | More control over final accuracy, since q-params are learned during training |