Getting Started with SNPE: Overriding DLC Quantization Parameters
The ability to override DLC quantization parameters was added to SNPE long ago, but perhaps because the documentation is incomplete, many people still don't know how to use it. This article covers:
How to write the min/max parameters of TF FakeQuant nodes into a DLC
How to write quantization parameters into a DLC via an encoding.json file
How snpe-dlc-quantize handles the written quantization parameters
This article is based on SNPE 1.50.
Qualcomm Neural Processing SDK for AI: developer.qualcomm.com/software/qualcomm-neural-processing-sdk
Converting a PB with FakeQuant nodes to DLC
snpe-tensorflow-to-dlc --input_network test.pb --input_dim noisy_image 1,512,512,3 --out_node downsample_0/conv2d_0/Relu --output_path test.dlc
# <snpe-sdk>/lib/python/qti/aisw/converters/tensorflow/layers/fake_quant.py
class FakeQuantLayerBuilder(LayerBuilder):
...
# save quantization encodings for previous layer. ie quantization is done on the outputs of the previous
# layer. node_x -> fakequant_node -> node_y
# save quantization encodings for next layer. ie quantization is done on the const inputs of the next
# layer. weights_node -> fakequant_node -> node_x
....
# <snpe-sdk>/lib/python/qti/aisw/converters/tensorflow/layers/convolution.py
def get_weights_tensor(self, graph_helper, weights_source_op):
if graph_helper.check_tensor_const_origin(weights_source_op.outputs[0])[0]:
return graph_helper.evaluate_tensor_output(weights_source_op.outputs[0])
return None
Pass the --override_params option to snpe-dlc-quantize to use these encodings.
Without --override_params, SNPE performs regular post-training quantization:
snpe-dlc-quantize --input_dlc test.dlc --input_list raw_list.txt --output_dlc test_quantized.dlc
# Dump the DLC's info
snpe-dlc-info -i test_quantized.dlc > test_quantized.dlc.info.txt
With --override_params, SNPE still performs post-training quantization, then overwrites the parameters of any layer that has a FakeQuant encoding into the quantized DLC:
snpe-dlc-quantize --input_dlc test.dlc --input_list raw_list.txt --output_dlc test_quantized_override.dlc --override_params
snpe-dlc-info -i test_quantized_override.dlc > test_quantized_override.dlc.info.txt
Checking the encodings in the quantized DLC
The values in the green box are the min/max after using --override_params; they essentially match the min/max of Fakequant_1 in the PB. The slight difference comes from zero-point alignment, explained later.
The values in the green box are the min/max after using --override_params; they match the min/max of Fakequant_2 in the PB.
Writing quantization parameters into a DLC via encoding.json
The SNPE converters have a --quantization_overrides option for supplying the quantization parameters a model needs; the parameters are passed in as a JSON file.
╰─ snpe-tensorflow-to-dlc -h
╰─ snpe-pytorch-to-dlc -h
....
Quantizer Options:
--quantization_overrides QUANTIZATION_OVERRIDES
Use this option to specify a json file with parameters
to use for quantization. These will override any
quantization data carried from conversion (eg TF fake
quantization) or calculated during the normal
quantization process. Format defined as per AIMET
specification.
The required JSON format is shown below:
"activation_encodings" holds the quantization parameters of activation tensors, keyed by tensor name.
"param_encodings" holds the quantization parameters of the weight and bias tensors.
{
"activation_encodings": {
"inference/coefficients/splat_cond/conv1/BiasAdd:0": [
{
"bitwidth": 16,
"is_symmetric": "False",
"max": 263.1574777999113,
"min": -3.2438893875886934,
"offset": -798,
"scale": 0.0040650242952239264
}
],
"inference/coefficients/splat_cond/conv1/LeakyRelu:0": [
{
"bitwidth": 16,
"is_symmetric": "False",
"max": 176.4537167516027,
"min": -0.6377291712488437,
"offset": -236,
"scale": 0.002702242251054422
}
],
"inference/coefficients/splat_cond/res1_1/conv1/BiasAdd:0": [
{
"bitwidth": 16,
"is_symmetric": "False",
"max": 297.32563562059215,
"min": -2.0971243891734557,
"offset": -459,
"scale": 0.004568898451358291
}
]
},
"param_encodings": {
"inference/coefficients/splat_cond/conv1/weights/read:0": [
{
"bitwidth": 8,
"is_symmetric": "False",
"max": 0.7084210564108456,
"min": -0.7028865169076359,
"offset": -127,
"scale": 0.0055345395032097315
}
],
"inference/coefficients/splat_cond/res1_1/conv1/weights/read:0": [
{
"bitwidth": 8,
"is_symmetric": "False",
"max": 1.921162235035616,
"min": -1.4808958895066204,
"offset": -111,
"scale": 0.013341404409969554
}
],
"inference/coefficients/splat_cond/res1_1/conv2/weights/read:0": [
{
"bitwidth": 8,
"is_symmetric": "False",
"max": 1.702315330505371,
"min": -2.4318790435791016,
"offset": -150,
"scale": 0.01621252695719401
}
]
}
}
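Such a file can also be generated programmatically. A minimal sketch below builds the two sections and writes them out; `make_encoding` is a hypothetical helper (not part of SNPE or AIMET), and the tensor names must match the names in your converted graph:

```python
import json

# Hypothetical helper: build one encoding entry from min/max and bitwidth,
# mirroring the fields shown in the JSON above. offset/scale are derived
# with the usual asymmetric-quantization formulas.
def make_encoding(t_min, t_max, bitwidth):
    num_steps = 2 ** bitwidth - 1
    scale = (t_max - t_min) / num_steps
    offset = round(t_min / scale)
    return {
        "bitwidth": bitwidth,
        "is_symmetric": "False",
        "max": t_max,
        "min": t_min,
        "offset": offset,
        "scale": scale,
    }

overrides = {
    "activation_encodings": {
        # Activation tensor, 16-bit (values taken from the example above).
        "inference/coefficients/splat_cond/conv1/BiasAdd:0":
            [make_encoding(-3.2438893875886934, 263.1574777999113, 16)],
    },
    "param_encodings": {
        # Weight tensor, 8-bit.
        "inference/coefficients/splat_cond/conv1/weights/read:0":
            [make_encoding(-0.7028865169076359, 0.7084210564108456, 8)],
    },
}

with open("encoding.json", "w") as f:
    json.dump(overrides, f, indent=4)
```

For the sample values above, the derived scale/offset come out to the same numbers listed in the hand-written JSON (offset -798, scale ≈ 0.0040650243 for the activation), so a generator like this is interchangeable with editing the file by hand.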
Usage
Here we take a test PB without FakeQuant nodes and use the encoding.json above.
╰─ snpe-tensorflow-to-dlc --input_network test2.pb --quantization_overrides encoding.json --input_dim Placeholder 1,256,256,4 --out_node inference/coefficients/splat_cond/res1_1/conv1/BiasAdd --output_path test2.dlc
....
2022-02-10 19:52:09,963 - 214 - INFO - Processing user provided quantization encodings:
2022-02-10 19:52:10,008 - 214 - INFO - INFO_ALL_BUILDING_NETWORK:
╰─ snpe-dlc-quantize --input_dlc test2.dlc --input_list net_run_test2/raw_list.txt --output_dlc test_quantized_json.dlc --override_params --act_bitwidth 16
[INFO] Setting activation for layer: inference/coefficients/splat_cond/conv1/Conv2D and buffer: inference/coefficients/splat_cond/conv1/BiasAdd:0
[INFO] bw: 16, min: -3.243889, max: 263.157471, delta: 0.004065, offset: -798.000000
[INFO] Setting activation for layer: inference/coefficients/splat_cond/conv1/LeakyRelu and buffer: inference/coefficients/splat_cond/conv1/LeakyRelu:0
[INFO] bw: 16, min: -0.637729, max: 176.453720, delta: 0.002702, offset: -236.000000
[INFO] Setting activation for layer: inference/coefficients/splat_cond/res1_1/conv1/Conv2D and buffer: inference/coefficients/splat_cond/res1_1/conv1/BiasAdd:0
[INFO] bw: 16, min: -2.097124, max: 297.325623, delta: 0.004569, offset: -459.000000
[INFO] Writing quantized model to: test_quantized_json.dlc
[INFO] DebugLog shutting down.
Looking at the DLC info, the quantization parameters essentially match those in the JSON; the differences in the trailing decimal places are likely due to different precision when the data is stored.
| Layer | Output encodings | Weights encodings |
|---|---|---|
| inference/coefficients/splat_cond/conv1/Conv2D (bias folded into Conv2D) | min -3.243889331818, max 263.157470703125, delta 0.004065024201, offset -798.000000000000, bitwidth 16 | min -0.702886521816, max 0.708421051502, delta 0.005534539465, offset -127.000000000000, bitwidth 8 |
| inference/coefficients/splat_cond/conv1/LeakyRelu | min -0.637729167938, max 176.453720092773, delta 0.002702242229, offset -236.000000000000, bitwidth 16 | |
| inference/coefficients/splat_cond/res1_1/conv1/Conv2D | min -2.097124338150, max 297.325622558594, delta 0.004568898585, offset -459.000000000000, bitwidth 16 | min -1.480895876884, max 1.921162247658, delta 0.013341404498, offset -111.000000000000, bitwidth 8 |
When converting ONNX, PyTorch, or TFLite models to DLC, the same two options apply. The most direct way to generate an encoding.json is with AIMET, which provides an API for it; you can also produce one yourself following the format above.
How snpe-dlc-quantize processes quantization parameters
Computing scale/offset from min/max
In the first case, where the FakeQuant encodings from the PB are written into the DLC, no scale, offset, or is_symmetric is provided. In that case the following computation is performed:
template <typename DTYPE>
void TfQuantizer<DTYPE>::MinAndMaxToFxpFormat(const StatsTf& stats, int bw, TfEncoding& encoding)
{
double num_steps = pow(2, bw) - 1;
// Make sure zero value is within the range
double new_min = std::min(0.0, stats.min);
double new_max = std::max(0.0, stats.max);
// When the min and max are too close together, nudge the maximum to meet the
// minimum range requirement
// This also handles the case where min==max==0 to avoid division by zero
new_max = std::max(new_max, new_min + MIN_RANGE);
encoding.delta = (new_max - new_min) / num_steps;
if (new_min < 0 && new_max > 0)
{
// Need to make sure 0-value is exactly quantizable
// Quantization of q into b is given by:
// b = q / delta - offset, where
// delta = (max - min)/#steps
// offset = min / delta
// For q = 0: b = -min / delta
// Find the closest round b, and set q=0 for it
double b_zero = round(-new_min / encoding.delta);
b_zero = std::min(num_steps, std::max(0.0, b_zero)); // just to be safe
encoding.offset = -b_zero;
}
else
{
// One of min or max is guaranteed to be zero, so 0 is exactly quantizable already
encoding.offset = round(new_min / encoding.delta);
}
// Calculate "min" and "max" based on "delta" and "offset".
// Note this min and max can vary from the one in "stats". This min and max
// can really be represented with the integer offset.
encoding.min = encoding.delta * encoding.offset;
// We want to calculate: max = delta * num_steps + min.
// To avoid numerical accuracy issues on Linaro, we simplify the math.
encoding.max = new_max - new_min + encoding.min;
encoding.bw = bw;
}
As you can see, min and max get adjusted; the adjustment is mainly for zero-point alignment.
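The C++ routine above can be mirrored in Python to check the numbers from the encoding.json earlier. A minimal sketch, with the caveat that MIN_RANGE is an assumed value (the actual constant in SNPE may differ):

```python
MIN_RANGE = 0.01  # assumption: SNPE's actual minimum-range constant may differ

def min_max_to_fxp(t_min, t_max, bw):
    """Python port of the TfQuantizer::MinAndMaxToFxpFormat routine above."""
    num_steps = 2 ** bw - 1
    # Make sure the zero value is within the range.
    new_min = min(0.0, t_min)
    new_max = max(0.0, t_max)
    # Nudge the max if min and max are too close (also avoids division by zero).
    new_max = max(new_max, new_min + MIN_RANGE)
    delta = (new_max - new_min) / num_steps
    if new_min < 0 < new_max:
        # Make sure 0 is exactly quantizable: find the nearest integer bin for q=0.
        b_zero = round(-new_min / delta)
        b_zero = min(num_steps, max(0.0, b_zero))  # just to be safe
        offset = -b_zero
    else:
        # One of min/max is zero, so 0 is already exactly quantizable.
        offset = round(new_min / delta)
    # Recompute min/max so they are exactly representable with the integer offset.
    q_min = delta * offset
    q_max = new_max - new_min + q_min
    return {"bw": bw, "delta": delta, "offset": offset, "min": q_min, "max": q_max}

# The conv1/BiasAdd activation from the encoding.json above:
enc = min_max_to_fxp(-3.2438893875886934, 263.1574777999113, 16)
```

Running this on the BiasAdd min/max reproduces offset -798 and delta ≈ 0.0040650243, the same values snpe-dlc-info reported, and the returned min is slightly shifted from the input min, which is exactly the zero-point alignment mentioned above.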
[ --use_symmetric_quantize_weights ]
Use the symmetric quantizer feature when quantizing the weights of the model. It makes sure min and max have the
same absolute values about zero. Symmetrically quantized data will also be stored as int#_t data such that the offset is always 0.
If is_symmetric in your encodings is false but you also pass this option, you get an error. So to use this option, is_symmetric in the encodings must be set to True.
snpe-dlc-quantize --input_dlc test2.dlc --input_list net_run_test2/raw_list.txt --output_dlc test_quantized_json.dlc --override_params --act_bitwidth 16 --debug3 --use_symmetric_quantize_weights
[ERROR] Requested symmetric weights but instead got is_symmetric==False.
If the min/max in the encoding are not symmetric, then even with is_symmetric set to True the computation will force min = -max.
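As a sketch of what forcing min = -max looks like: the formulas below are assumptions (offset fixed at 0, delta = max_abs / (2^(bw-1) - 1), consistent with int8_t storage) and SNPE's exact arithmetic may differ:

```python
def symmetric_weight_encoding(t_min, t_max, bw):
    # Assumed symmetric scheme: center the range on zero, offset fixed at 0.
    max_abs = max(abs(t_min), abs(t_max))
    num_steps = 2 ** (bw - 1) - 1  # e.g. 127 for int8_t
    delta = max_abs / num_steps
    return {"min": -max_abs, "max": max_abs, "delta": delta,
            "offset": 0, "bw": bw}

# The res1_1/conv1 weights from the encoding.json above (asymmetric input):
enc = symmetric_weight_encoding(-1.4808958895066204, 1.921162235035616, 8)
```

For this tensor the range is widened to (-1.9211622, 1.9211622): the smaller side is stretched to match the larger one, which is the "min = -max" behavior described above.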
A tip: for W8A8 models, symmetric weight quantization-aware training tends to give somewhat better accuracy on HTP.
At this point, we have seen how to overwrite the quantization parameters in a DLC with parameters obtained from quantization-aware training.