Tengine: Model Inference Quantization Implementation, Part 2: The KL Symmetric Quantization Algorithm Explained


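Hi everyone, this is 极智视界. This post walks through an implementation of the KL symmetric quantization algorithm, using Tengine's implementation as the example.

I previously wrote "[Model Inference] Quantization Implementation, Part 1: The min-max Symmetric Quantization Algorithm Explained"; interested readers can look it up. This post is the sequel, and the second in the series on quantization implementations.

I won't go over the background of quantization again, since earlier posts covered it at length. Let's get straight to it.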

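KL quantization uses KL divergence to measure how similar the real data distribution is to the quantized data distribution. It is the strategy NVIDIA TensorRT adopts for activations. The main logic of KL quantization is as follows:

Unlike MIN-MAX, KL does not map [min, max] directly to [-127, 127]. Instead it searches for a threshold |T| < max(|max|, |min|) and maps [-T, T] to [-127, 127]. The idea is that, as long as the threshold is chosen well, the values outside it can be discarded without a large loss of accuracy;
Values beyond ±|T| are mapped directly to the threshold. For example, the three red points in the original post's figure are mapped directly to -127. This kind of mapping is called saturating.

KL quantization treats the float32 values and the int8 values as two distributions, updates both distributions with the threshold |T|, and measures their similarity with KL divergence. The smaller the KL divergence, the more similar the two distributions, and the better the choice of |T|. For symmetric quantization, the Scale follows directly from this threshold, and Zero_point is always zero.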
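To make the saturating mapping concrete, here is a minimal C++ sketch. `quantize_symmetric` is a hypothetical helper written for illustration, not a Tengine API; it assumes the threshold |T| has already been chosen and that values are rounded to the nearest integer.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not a Tengine API): symmetric int8 quantization with saturation.
// `threshold` is |T| > 0; anything outside [-T, T] clamps to -127 or 127.
int8_t quantize_symmetric(float x, float threshold)
{
    assert(threshold > 0.0f);
    const float scale = threshold / 127.0f; // symmetric: the zero point is always 0
    const float q = std::round(x / scale);
    return static_cast<int8_t>(std::max(-127.0f, std::min(127.0f, q)));
}
```

With a threshold of 2.0, an input of -5.0 lies outside [-T, T] and saturates to -127, which is the behaviour described for the red points.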

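The figure below is the pseudocode for KL-divergence calibration from TensorRT, and it captures the whole KLD quantization process. (Call it Figure 2; it is referred to again later.)

As before, Tengine's KL quantization implementation serves as the example.

The main steps are:

(1) Activation quantization: first find min and max, then use the KL strategy to search for the quantization parameters and generate the activation calibration table. fp32 to int8;

(2) Weight quantization: uses the min-max quantization strategy. fp32 to int8;

(3) Bias quantization: reuses the activation quantization scale to quantize to int32. fp32 to int32;

Weight and bias quantization take one more step than activation quantization: besides computing the Scale, the values themselves must be quantized with that Scale to produce the int8 tmfile.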
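As a rough illustration of step (2), the sketch below computes one min-max Scale per output channel. It is not taken from Tengine: the function name is made up, and the channel-by-channel contiguous weight layout is an assumption made only to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not a Tengine API: one min-max Scale per output channel.
// Assumes the weights are stored channel by channel in one contiguous array.
std::vector<float> minmax_weight_scales(const std::vector<float>& weights, std::size_t channels)
{
    assert(channels > 0 && weights.size() % channels == 0);
    const std::size_t per_channel = weights.size() / channels;
    std::vector<float> scales(channels, 0.0f);
    for (std::size_t c = 0; c < channels; c++)
    {
        float abs_max = 0.0f;
        for (std::size_t k = 0; k < per_channel; k++)
            abs_max = std::max(abs_max, std::fabs(weights[c * per_channel + k]));
        scales[c] = abs_max / 127.0f; // symmetric: the zero point stays 0
    }
    return scales;
}
```

Unlike the KL search used for activations, nothing is clipped here: the largest absolute weight in each channel maps to 127.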

The main code implementing KL quantization in Tengine is:

case ALGORITHM_KL:
{
    if (quant_tool.scale_file.empty())
    {
        quant_tool.scale_file = "table_kl.scale";
        quant_tool.activation_quant_tool();
    }
    save_graph_i8_perchannel(quant_tool.model_file.c_str(), quant_tool.scale_file.c_str(), quant_tool.output_file, quant_tool.inplace, false);
    /* Evaluate quantitative losses */
    if (quant_tool.evaluate)
    {
        fprintf(stderr, "[Quant Tools Info]: Step Evaluate, evaluate quantitative losses\n");
        quant_tool.assess_quant_loss(0);
    }
    break;
}

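The key interfaces here are quant_tool.activation_quant_tool() and save_graph_i8_perchannel. For KL quantization they do two things, respectively:

(1) Activation quantization, generating table_kl.scale;

(2) Weight & bias quantization, generating scale_weight.txt, scale_bias.txt and the int8 tmfile;

The min/max computation in activation quantization and the weight & bias quantization follow the same logic as MIN-MAX quantization and share the same code, so they are not covered here; interested readers can consult Part 1 of this series. This post focuses on the KL search strategy in activation quantization.

The entry point of the KL search strategy is: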

quant_tool.activation_quant_tool();

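It first searches for min and max by comparison, mainly with std::max_element and std::min_element; no more on that here. Once min and max are known, the KL search begins.

The first round builds the probability histogram and runs the first KL computation. From the second round on, the histogram is not rebuilt; later rounds accumulate into the histogram built in the first round. So the more calibration images you use, the closer the final histogram gets to the real distribution.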

/* calculate hist */
uint32_t inum = 0;
for (int i = 0; i < ir_graph->tensor_num; i++)
{
    struct tensor* ir_tensor = ir_graph->tensor_list[i];
    if (ir_tensor->tensor_type == TENSOR_TYPE_VAR || ir_tensor->tensor_type == TENSOR_TYPE_INPUT)
    {
        float step_max = std::abs(max_activation[i]);
        if (std::abs(min_activation[i]) > step_max)
            step_max = std::abs(min_activation[i]);
        float step_bin = step_max / 2048.0f;

        std::vector<float> every_edge;
        if (nums == imgs_list.size() - 1)
        {
            for (int j = 0; j < 2048; j++)
            {
                float edge_float = (step_bin * (j + 0.5f));
                every_edge.push_back(edge_float);
            }
            hist_edge.push_back(every_edge);
            hist_gram.push_back(histCount((float*)ir_tensor->data, ir_tensor->elem_num, step_max));
        }
        else
        {
            std::vector<uint32_t> hist_tmp;
            hist_tmp = histCount((float*)ir_tensor->data, ir_tensor->elem_num, step_max);
            for (int j = 0; j < 2048; j++)
            {
                hist_gram[inum][j] += hist_tmp[j];
            }
        }
        tensor_hist[i] = inum;
        hist_tensor[inum] = i;
        inum++;
    }
}

Now look at the histCount interface:

std::vector<uint32_t> histCount(float* data, uint32_t elem_num, float abs_max)
{
    float bin_scale = abs_max / 2047.f;
    int bin_zp = 0;
    std::vector<uint32_t> hist(2048);
    for (int i = 0; i < elem_num; i++)
    {
        if (data[i] != 0)
        {
            uint32_t hist_idx = round(std::abs(data[i]) / bin_scale);
            hist[hist_idx]++;
        }
    }
    return hist;
}
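The accumulation across calibration images can be tried on a toy scale. The sketch below is a stand-in for histCount with a configurable bin count, so that the numbers are easy to follow. The names `hist_count` and `accumulate_hist`, the `bins` parameter, and the clamp on out-of-range values are assumptions of this sketch, not part of Tengine.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative stand-in for histCount; the article's version fixes the bin count at 2048.
std::vector<uint32_t> hist_count(const std::vector<float>& data, float abs_max, std::size_t bins)
{
    assert(abs_max > 0.0f && bins > 1);
    const float bin_scale = abs_max / static_cast<float>(bins - 1);
    std::vector<uint32_t> hist(bins, 0);
    for (float v : data)
    {
        if (v == 0.0f)
            continue; // zeros are skipped, as in the article's code
        std::size_t idx = static_cast<std::size_t>(std::round(std::fabs(v) / bin_scale));
        if (idx >= bins)
            idx = bins - 1; // guard added here: values above abs_max land in the last bin
        hist[idx]++;
    }
    return hist;
}

// Add one image's histogram into the running total, bin by bin.
void accumulate_hist(std::vector<uint32_t>& total, const std::vector<uint32_t>& one)
{
    assert(total.size() == one.size());
    for (std::size_t i = 0; i < total.size(); i++)
        total[i] += one[i];
}
```

With 5 bins and abs_max = 4, a first image {0, 1, -1, 4, 2.4} gives {0, 2, 1, 0, 1}; adding a second image {3, -3} gives {0, 2, 1, 2, 1}. Positive and negative values share a bin because only the absolute value is counted.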

Finally, the resulting probability histogram is normalized:

distribution = normalize_histogram(distribution_in);

The histogram normalization interface is also simple:

std::vector<float> normalize_histogram(std::vector<uint32_t>& histogram)
{
    std::vector<float> histogram_out(histogram.size());
    const size_t length = histogram.size();
    float sum = 0;
    for (size_t i = 1; i < length; i++)
        sum += histogram[i];

    for (size_t i = 1; i < length; i++)
        histogram_out[i] = float(histogram[i] / sum);

    return histogram_out;
}

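The next part of the logic is best read alongside Figure 2: compute P first, then Q, and finally the KL divergence.

First compute the simulated quantized distribution P, searching incrementally from target_bin = 128 to 2048, with the overflow part mapped onto the edge. P can be regarded as the fp32 data distribution before quantization, i.e. the real distribution: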

// get P
fill(quantize_distribution.begin(), quantize_distribution.end(), 0.0f);
const float num_per_bin = static_cast<float>(threshold) / static_cast<float>(target_bin);

for (int i = 0; i < target_bin; i++)
{
    const float start = static_cast<float>(i) * num_per_bin;
    const float end = start + num_per_bin;

    const int left_upper = static_cast<int>(ceil(start));
    if (static_cast<float>(left_upper) > start)
    {
        const float left_scale = static_cast<float>(left_upper) - start;
        quantize_distribution[i] += left_scale * distribution[left_upper - 1];
    }

    const int right_lower = static_cast<int>(floor(end));

    if (static_cast<float>(right_lower) < end)
    {
        const float right_scale = end - static_cast<float>(right_lower);
        quantize_distribution[i] += right_scale * distribution[right_lower];
    }

    for (int j = left_upper; j < right_lower; j++)
    {
        quantize_distribution[i] += distribution[j];
    }
}
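The fractional bookkeeping in this loop is easier to see on a tiny input. The sketch below wraps the same loop in a standalone function so it can be run on six bins instead of 2048. The function name `merge_bins` and the extra bounds guard on the right-hand piece are additions of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative wrapper around the "get P" loop (the function name is made up).
// Merges the first `threshold` source bins into `target_bin` coarser bins; a source bin
// that straddles a boundary is shared between both sides in proportion to the overlap.
std::vector<float> merge_bins(const std::vector<float>& distribution, int threshold, int target_bin)
{
    assert(target_bin > 0 && threshold >= target_bin);
    assert(static_cast<std::size_t>(threshold) <= distribution.size());
    std::vector<float> merged(target_bin, 0.0f);
    const float num_per_bin = static_cast<float>(threshold) / static_cast<float>(target_bin);
    for (int i = 0; i < target_bin; i++)
    {
        const float start = static_cast<float>(i) * num_per_bin;
        const float end = start + num_per_bin;
        const int left_upper = static_cast<int>(std::ceil(start));
        if (static_cast<float>(left_upper) > start) // fractional piece on the left
            merged[i] += (static_cast<float>(left_upper) - start) * distribution[left_upper - 1];
        const int right_lower = static_cast<int>(std::floor(end));
        // fractional piece on the right; the first test is a guard against float rounding
        if (right_lower < threshold && static_cast<float>(right_lower) < end)
            merged[i] += (end - static_cast<float>(right_lower)) * distribution[right_lower];
        for (int j = left_upper; j < right_lower; j++) // whole source bins in between
            merged[i] += distribution[j];
    }
    return merged;
}
```

Merging the six source bins {1, 2, 3, 4, 5, 6} into four gives num_per_bin = 1.5 and the result {2, 4, 6.5, 8.5}. The total of 21 is preserved, which is the point of splitting boundary bins by overlap.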

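Then compute the real quantized distribution Q, which follows P through the same incremental search from target_bin = 128 to 2048. Q can be regarded as the int8 data distribution after quantization, i.e. the quantized distribution: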

// get Q
std::vector<float> expand_distribution(threshold, 0);
for (int i = 0; i < target_bin; i++)
{
    const float start = static_cast<float>(i) * num_per_bin;
    const float end = start + num_per_bin;
    float count = 0;

    const int left_upper = static_cast<int>(ceil(start));
    float left_scale = 0;
    if (static_cast<float>(left_upper) > start)
    {
        left_scale = static_cast<float>(left_upper) - start;
        if (distribution[left_upper - 1] != 0)
            count += left_scale;
    }

    const int right_lower = static_cast<int>(floor(end));
    float right_scale = 0;
    if (static_cast<float>(right_lower) < end)
    {
        right_scale = end - static_cast<float>(right_lower);
        if (distribution[right_lower] != 0)
            count += right_scale;
    }

    for (int j = left_upper; j < right_lower; j++)
    {
        if (distribution[j] != 0)
            count++;
    }

    const float expand_value = quantize_distribution[i] / count;

    if (static_cast<float>(left_upper) > start)
    {
        if (distribution[left_upper - 1] != 0)
            expand_distribution[left_upper - 1] += expand_value * left_scale;
    }
    if (static_cast<float>(right_lower) < end)
    {
        if (distribution[right_lower] != 0)
            expand_distribution[right_lower] += expand_value * right_scale;
    }
    for (int j = left_upper; j < right_lower; j++)
    {
        if (distribution[j] != 0)
            expand_distribution[j] += expand_value;
    }
}
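The expansion step can also be exercised on a small input. The sketch below restates the "get Q" loop as a standalone function. The name `expand_bins`, the way the two partial-overlap weights are folded into `left_scale` and `right_scale`, and the guard for `count == 0` are choices made for this sketch rather than Tengine's code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative wrapper around the "get Q" loop (the function name is made up).
// Spreads each coarse bin's mass back over the non-empty source bins it covers.
std::vector<float> expand_bins(const std::vector<float>& distribution,
                               const std::vector<float>& quantized, int threshold)
{
    const int target_bin = static_cast<int>(quantized.size());
    assert(target_bin > 0 && threshold >= target_bin);
    assert(static_cast<std::size_t>(threshold) <= distribution.size());
    std::vector<float> expanded(threshold, 0.0f);
    const float num_per_bin = static_cast<float>(threshold) / static_cast<float>(target_bin);
    for (int i = 0; i < target_bin; i++)
    {
        const float start = static_cast<float>(i) * num_per_bin;
        const float end = start + num_per_bin;
        const int left_upper = static_cast<int>(std::ceil(start));
        const int right_lower = static_cast<int>(std::floor(end));
        // Overlap weights of the partially covered source bins (0 if absent or empty).
        float left_scale = 0.0f, right_scale = 0.0f;
        if (static_cast<float>(left_upper) > start && distribution[left_upper - 1] != 0)
            left_scale = static_cast<float>(left_upper) - start;
        if (right_lower < threshold && static_cast<float>(right_lower) < end
            && distribution[right_lower] != 0)
            right_scale = end - static_cast<float>(right_lower);
        float count = left_scale + right_scale;
        for (int j = left_upper; j < right_lower; j++)
            if (distribution[j] != 0) count += 1.0f;
        if (count == 0) continue; // guard added in this sketch: nothing to spread over
        const float expand_value = quantized[i] / count;
        if (left_scale > 0) expanded[left_upper - 1] += expand_value * left_scale;
        if (right_scale > 0) expanded[right_lower] += expand_value * right_scale;
        for (int j = left_upper; j < right_lower; j++)
            if (distribution[j] != 0) expanded[j] += expand_value;
    }
    return expanded;
}
```

Expanding the coarse bins {2, 4, 6.5, 8.5} back over the source bins {1, 2, 3, 4, 5, 6} with threshold 6 again yields six values that sum to 21, but smoothed within each coarse bin. Source bins that were empty stay empty, which is why the loop checks `distribution[j] != 0` before writing.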

Next, compute the KL divergence between the real distribution P and the quantized distribution Q:

const float kl_divergence = compute_kl_divergence(t_distribution, expand_distribution);

The interface that computes the KL divergence is also simple:

float compute_kl_divergence(std::vector<float>& dist_a, std::vector<float>& dist_b)
{
    const size_t length = dist_a.size();
    float result = 0;
    for (size_t i = 0; i < length; i++)
    {
        if (dist_a[i] != 0)
        {
            if (dist_b[i] == 0)
                result += 1;
            else
                result += dist_a[i] * log(dist_a[i] / dist_b[i]);
        }
    }
    return result;
}
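For reference, the function above follows the standard discrete KL divergence, written here with P for dist_a and Q for dist_b. It departs from the textbook definition in one place: where Q(i) is zero but P(i) is not, the code adds a fixed penalty of 1 instead of an unbounded term.

```latex
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{i} P(i) \, \log \frac{P(i)}{Q(i)}
```

Terms with P(i) = 0 contribute nothing, which matches the `dist_a[i] != 0` check.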

In the end we want the target_bin that minimizes the KL divergence. Since the search runs in a loop over 128 to 2048, it can be written like this:

// the best num of bin
if (kl_divergence < min_kl_divergence)
{
    min_kl_divergence = kl_divergence;
    target_threshold = threshold;
}

This gives us the target_bin we were after, which is target_threshold here.
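Put together, the search is a plain argmin over candidate thresholds. The sketch below shows only that outer loop. `search_threshold` and the `kl_at` callback are hypothetical names, with the callback standing in for the P, Q and KL computation described above; treating the upper bound as exclusive is also an assumption.

```cpp
#include <cassert>
#include <cfloat>
#include <functional>

// Hypothetical driver, not Tengine code: scan candidate thresholds in [first, last)
// and keep the one with the smallest KL divergence.
int search_threshold(int first, int last, const std::function<float(int)>& kl_at)
{
    assert(first < last);
    int target_threshold = first;
    float min_kl_divergence = FLT_MAX;
    for (int threshold = first; threshold < last; threshold++)
    {
        const float kl_divergence = kl_at(threshold);
        if (kl_divergence < min_kl_divergence) // strict "<": the earliest minimum wins ties
        {
            min_kl_divergence = kl_divergence;
            target_threshold = threshold;
        }
    }
    return target_threshold;
}
```

Calling `search_threshold(128, 2048, kl_at)` with a callback whose minimum sits at 500 returns 500. Because the comparison is strict, a flat KL curve returns the smallest candidate.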

With target_threshold computed, the Scale is easy to get:

float act_scale = hist_edge[i][threshold_bin] / fake_quant_set;    // fake_quant_set = 127
int act_zero_point = 0;

Again, because this is symmetric quantization, only the Scale needs to be computed; Zero_point is always zero.
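The Scale computation can be reproduced without the hist_edge table, since each entry was built as step_bin * (j + 0.5). The helper below is a hypothetical illustration. It assumes `threshold_bin` is the bin index returned by the KL search and `abs_max` is the larger of |min| and |max| for that tensor.

```cpp
#include <cassert>

// Hypothetical helper: rebuilds the value the article stores in hist_edge[i][threshold_bin]
// and divides it by 127 (fake_quant_set) to get the activation Scale.
float activation_scale(float abs_max, int threshold_bin)
{
    assert(abs_max > 0.0f && threshold_bin >= 0 && threshold_bin < 2048);
    const float step_bin = abs_max / 2048.0f;             // width of one histogram bin
    const float edge = step_bin * (threshold_bin + 0.5f); // same formula as hist_edge
    return edge / 127.0f;
}
```

Because the threshold bin always lies below the 2048th bin edge, the resulting Scale is smaller than the min-max Scale abs_max / 127, which is how KL quantization trades clipped outliers for finer resolution.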

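The activation calibration table table_kl.scale can then be saved. Once more, the weight & bias quantization that follows is the same as in MIN-MAX, which I covered in the previous post, so I won't repeat it here.

That completes a practical implementation of the KL-divergence quantization algorithm. I hope this is of some help in your studies.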


Posted in: tengine

2021-12-17
