This post follows up on the survey of human pose estimation models from the previous installment with some deployment attempts. Below I share my experience with the several models I tried.
First, Google's lightweight human pose estimation model MoveNet, mentioned last time. There is no paper and no official training code, so I trained it from this open-source project:
```bash
git clone https://github.com/fire717/movenet.pytorch.git
```
As noted last time, the training results were underwhelming: the model only detects a single person, and its quality falls far short of OpenPose. I ran a few more training rounds, and keypoint localization was still inaccurate.
Later I found that Google has released official MoveNet models: TFLite models, already quantized to 8-bit, can be downloaded from movenet | Kaggle. I eagerly set out to convert one to ONNX and then convert the ONNX model to axmodel, only to discover a problem: converting TFLite to ONNX is not straightforward. The most conspicuous issue is data layout: TFLite models use NHWC while ONNX uses NCHW, so the internal operators and tensor layouts all differ. The usual route is the tflite2onnx library, but it does not yet support every operator: NotImplementedError: Unsupported TFLite OP: 53 CAST!
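For reference, the failing conversion is a one-liner with tflite2onnx (the .tflite filename below is just a placeholder for the Kaggle download):

```python
import tflite2onnx

# Fails on the int8 MoveNet model with the CAST operator error quoted above
tflite2onnx.convert('movenet_lightning_int8.tflite', 'movenet.onnx')
```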
So the MoveNet experiments had to be shelved for now.
The previous post covered OpenPose, using this open-source project:
```bash
git clone https://github.com/Hzzone/pytorch-openpose.git
```
I then tried the Lightweight OpenPose model, which also has a solid open-source implementation, and built on top of it:
```bash
git clone https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch.git
```
A quick comparison of OpenPose and Lightweight OpenPose:

Lightweight OpenPose replaces the VGG backbone of OpenPose with MobileNet, a lightweight CNN. It also merges OpenPose's two branches (heatmaps and PAFs) into a single branch, and replaces the 7×7 convolutions with a block built on dilated convolutions. This drastically reduces the parameter count while achieving nearly the same accuracy.
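As a rough illustration (my own sketch, not the exact code in the repo's with_mobilenet.py), a stack of 1×1, 3×3, and dilated 3×3 convolutions covers the same 7×7 receptive field at a fraction of the cost:

```python
import torch.nn as nn

def dilated_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Receptive field 7x7: 1x1 (+0) -> 3x3 (+2) -> 3x3 with dilation 2 (+4)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=2, dilation=2),
        nn.ReLU(inplace=True),
    )
```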
For the Lightweight OpenPose model, the first problem when exporting to ONNX is the input shape. The model accepts variable input sizes: for a 1×3×H×W image, the two outputs used downstream are a 1×38×(H/4)×(W/4) PAFs map and a 1×19×(H/4)×(W/4) heatmaps map. Since ONNX supports both dynamic and static input shapes, I first exported a dynamic model with the code below and then tried to convert it to axmodel:
```python
import argparse

import torch

from models.with_mobilenet import PoseEstimationWithMobileNet
from modules.load_state import load_state


def convert_to_onnx(net, output_name):
    # Dummy input; height/width are marked dynamic below
    input = torch.randn(1, 3, 256, 456)
    dynamic_axes = {
        'data': {2: 'height', 3: 'width'},  # dynamic input height and width
        'stage_0_output_1_heatmaps': {2: 'height_out', 3: 'width_out'},
        'stage_0_output_0_pafs': {2: 'height_out', 3: 'width_out'},
        'stage_1_output_1_heatmaps': {2: 'height_out', 3: 'width_out'},
        'stage_1_output_0_pafs': {2: 'height_out', 3: 'width_out'}
    }
    input_names = ['data']
    output_names = ['stage_0_output_1_heatmaps', 'stage_0_output_0_pafs',
                    'stage_1_output_1_heatmaps', 'stage_1_output_0_pafs']
    torch.onnx.export(net, input, output_name, verbose=True,
                      input_names=input_names, output_names=output_names,
                      dynamic_axes=dynamic_axes, opset_version=15)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--checkpoint-path', type=str, required=True, help='path to the checkpoint')
    parser.add_argument('--output-name', type=str, default='human-pose-estimation.onnx',
                        help='name of output model in ONNX format')
    args = parser.parse_args()

    net = PoseEstimationWithMobileNet()
    checkpoint = torch.load(args.checkpoint_path)
    load_state(net, checkpoint)
    convert_to_onnx(net, args.output_name)
```
However, building the axmodel failed: it turns out pulsar2 does not accept dynamic input shapes, so the model has to take a fixed static input:

```
2024-01-20 21:52:00.043 | WARNING | yamain.command.build:fill_default:320 - ignore data csc config because of src_format is AutoColorSpace or src_format and tensor_format are the same
Traceback (most recent call last):
File "<frozen yamain.common.error>", line 11, in wrapper
File "<frozen yamain.command.build>", line 631, in optimize_onnx
File "<frozen yamain.command.load_model>", line 633, in optimize_onnx_model
File "<frozen frontend.parsers.onnx_parser>", line 71, in parse_onnx_model
File "<frozen frontend.parsers.onnx_parser>", line 122, in parse_onnx_model_proto
File "<frozen frontend.parser_utils>", line 34, in parse_value_info
File "<frozen frontend.parser_utils>", line 28, in check_value_info
AssertionError: illegal value_info data: [1, 3, 'height', 'width']
```
So I fixed the input size at 1×3×256×456. This requires some changes to the original project's code: first scale the image by a factor that brings its height to 256 or its width to 456 (whichever is reached first), then pad the remaining side with (0, 0, 0) up to the target size. For example, a 720×1280 image is scaled by min(256/720, 456/1280) ≈ 0.356 to 256×455, and the width is then padded by one pixel to 456. I changed the source in the following places:
```python
def infer_fast(net, img, net_input_size, stride, upsample_ratio, cpu,
               pad_value=(0, 0, 0), img_mean=np.array([128, 128, 128], np.float32),
               img_scale=np.float32(1/256)):
    height, width, _ = img.shape
    # Scale so that height reaches 256 or width reaches 456, whichever comes first
    scale = min(net_input_size[0] / height, net_input_size[1] / width)

    scaled_img = cv2.resize(img, (0, 0), fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
    scaled_img = normalize(scaled_img, img_mean, img_scale)
    min_dims = [net_input_size[0], net_input_size[1]]
    padded_img, pad = pad_width(scaled_img, pad_value, min_dims)

    tensor_img = torch.from_numpy(padded_img).permute(2, 0, 1).unsqueeze(0).float()


def pad_width(img, pad_value, min_dims):
    h, w, _ = img.shape
    # Split the padding evenly between the two sides of each dimension
    pad = []
    pad.append(int(math.floor((min_dims[0] - h) / 2.0)))
    pad.append(int(math.floor((min_dims[1] - w) / 2.0)))
    pad.append(int(min_dims[0] - h - pad[0]))
    pad.append(int(min_dims[1] - w - pad[1]))
    padded_img = cv2.copyMakeBorder(img, pad[0], pad[2], pad[1], pad[3],
                                    cv2.BORDER_CONSTANT, value=pad_value)
    return padded_img, pad
```
The ONNX model can then be exported with the script below. I also tried simplifying it with the onnx-simplifier library, but the difference was negligible, so I stuck with the plain ONNX model:
```python
import argparse

import torch

from models.with_mobilenet import PoseEstimationWithMobileNet
from modules.load_state import load_state


def convert_to_onnx(net, output_name):
    # Fixed 1x3x256x456 input: no dynamic_axes this time
    input = torch.randn(1, 3, 256, 456)
    input_names = ['data']
    output_names = ['stage_0_output_1_heatmaps', 'stage_0_output_0_pafs',
                    'stage_1_output_1_heatmaps', 'stage_1_output_0_pafs']
    torch.onnx.export(net, input, output_name, verbose=True,
                      input_names=input_names, output_names=output_names)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--checkpoint-path', type=str, required=True, help='path to the checkpoint')
    parser.add_argument('--output-name', type=str, default='human-pose-estimation.onnx',
                        help='name of output model in ONNX format')
    args = parser.parse_args()

    net = PoseEstimationWithMobileNet()
    checkpoint = torch.load(args.checkpoint_path)
    load_state(net, checkpoint)
    convert_to_onnx(net, args.output_name)
```
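For reference, the onnx-simplifier pass mentioned above looks roughly like this (a sketch; file names are placeholders):

```python
import onnx
from onnxsim import simplify

model = onnx.load('human-pose-estimation.onnx')
model_simplified, ok = simplify(model)
assert ok, "simplified model failed the consistency check"
onnx.save(model_simplified, 'human-pose-estimation-sim.onnx')
```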
The exported model then needs to be converted with the pulsar2 toolchain. There was no ready-made calibration dataset, so I picked images from the COCO dataset that OpenPose is trained on. Download it from COCO - Common Objects in Context (cocodataset.org), then filter by annotation for the person category. I chose 30 images of reasonably complete bodies, some with one person and some with several; to match the model input, I first ran them through the preprocessing described above, then packed them into calibration_data.tar.
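A hedged sketch of that selection step with pycocotools (the annotation file, directory names, and the random sampling are my assumptions, not necessarily how the 30 images were actually picked):

```python
import math
import os
import random

import cv2
from pycocotools.coco import COCO

NET_H, NET_W = 256, 456

coco = COCO('annotations/instances_val2017.json')
person_ids = coco.getImgIds(catIds=coco.getCatIds(catNms=['person']))
os.makedirs('calib_images', exist_ok=True)

for img_id in random.sample(person_ids, 30):
    info = coco.loadImgs(img_id)[0]
    img = cv2.imread(os.path.join('val2017', info['file_name']))
    h, w, _ = img.shape
    # Same resize + pad preprocessing as at inference time
    scale = min(NET_H / h, NET_W / w)
    img = cv2.resize(img, (0, 0), fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
    h, w, _ = img.shape
    top = math.floor((NET_H - h) / 2)
    left = math.floor((NET_W - w) / 2)
    img = cv2.copyMakeBorder(img, top, NET_H - h - top, left, NET_W - w - left,
                             cv2.BORDER_CONSTANT, value=(0, 0, 0))
    cv2.imwrite(os.path.join('calib_images', info['file_name']), img)

# Afterwards: tar -cf calibration_data.tar -C calib_images .
```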
Next, following the earlier procedure, write a config.json. Mine is as follows:
```json
{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "data",
        "calibration_dataset": "./dataset/calibration_data.tar",
        "calibration_size": 30,
        "calibration_mean": [128, 128, 128],
        "calibration_std": [256, 256, 256]
      }
    ],
    "calibration_method": "MinMax",
    "precision_analysis": true,
    "precision_analysis_method": "EndToEnd"
  },
  "input_processors": [
    {
      "tensor_name": "data",
      "tensor_format": "BGR",
      "src_format": "BGR",
      "src_dtype": "U8"
    }
  ],
  "output_processors": [
    { "tensor_name": "stage_0_output_1_heatmaps" },
    { "tensor_name": "stage_0_output_0_pafs" },
    { "tensor_name": "stage_1_output_1_heatmaps" },
    { "tensor_name": "stage_1_output_0_pafs" }
  ],
  "compiler": {
    "check": 0
  }
}
```
Then build the axmodel with the following command:

```bash
pulsar2 build --input model/human-pose-estimation.onnx --output_dir output --config config/lightweight_openpose_config.json
```
Here is the build output:

```
2024-01-21 00:23:17.469 | WARNING | yamain.command.build:fill_default:320 - ignore data csc config because of src_format is AutoColorSpace or src_format and tensor_format are the same
Building onnx ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
2024-01-21 00:23:19.247 | INFO | yamain.command.build:build:444 - save optimized onnx to [output/frontend/optimized.onnx]
2024-01-21 00:23:19.254 | INFO | yamain.common.util:extract_archive:21 - extract [dataset/calibration_data.tar] to [output/quant/dataset/data]...
Quant Config Table
┏━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ Input ┃ Shape ┃ Dataset Directory ┃ Data Format ┃ Tensor Format ┃ Mean ┃ Std ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ data │ [1, 3, 256, 456] │ data │ Image │ BGR │ [128.0, 128.0, │ [128.0, 128.0, │
│ │ │ │ │ │ 128.0] │ 128.0] │
└───────┴──────────────────┴───────────────────┴─────────────┴───────────────┴────────────────────┴────────────────────┘
Transformer optimize level: 0
30 File(s) Loaded.
[00:23:24] AX LSTM Operation Format Pass Running ... Finished.
[00:23:24] AX Set MixPrecision Pass Running ... Finished.
[00:23:24] AX Refine Operation Config Pass Running ... Finished.
[00:23:24] AX Reset Mul Config Pass Running ... Finished.
[00:23:24] AX Tanh Operation Format Pass Running ... Finished.
[00:23:24] AX Confused Op Refine Pass Running ... Finished.
[00:23:24] AX Quantization Fusion Pass Running ... Finished.
[00:23:24] AX Quantization Simplify Pass Running ... Finished.
[00:23:24] AX Parameter Quantization Pass Running ... Finished.
Calibration Progress(Phase 1): 100%|████████████████████████████████████████████████████| 30/30 [00:05<00:00, 5.03it/s]
Finished.
[00:23:31] AX Passive Parameter Quantization Running ... Finished.
[00:23:31] AX Parameter Baking Pass Running ... Finished.
[00:23:31] AX Refine Int Parameter Pass Running ... Finished.
[00:23:31] AX Refine Weight Parameter Pass Running ... Finished.
--------- Network Snapshot ---------
Num of Op: [117]
Num of Quantized Op: [117]
Num of Variable: [226]
Num of Quantized Var: [226]
------- Quantization Snapshot ------
Num of Quant Config: [350]
BAKED: [54]
OVERLAPPED: [171]
ACTIVATED: [74]
PASSIVE_BAKED: [51]
Network Quantization Finished.
quant.axmodel export success: output/quant/quant_axmodel.onnx
===>export per layer debug_data(float data) to folder: output/quant/debug/float
Writing npy... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
===>export input/output data to folder: output/quant/debug/test_data_set_0
Building native ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
/usr/local/lib/python3.9/site-packages/scipy/spatial/distance.py:620: RuntimeWarning: invalid value encountered in float_scalars
dist = 1.0 - uv / np.sqrt(uu * vv)
Building native ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
2024-01-21 00:23:37.654 | WARNING | yamain.command.load_model:pre_process:454 - preprocess tensor [data]
2024-01-21 00:23:37.654 | INFO | yamain.command.load_model:pre_process:456 - tensor: data, (1, 3, 256, 456), U8
2024-01-21 00:23:37.654 | INFO | yamain.command.load_model:pre_process:456 - op: op:pre_dequant_1, AxDequantizeLinear, {'const_inputs': {'x_zeropoint': array(0, dtype=int32), 'x_scale': array(1., dtype=float32)}, 'output_dtype': <class 'numpy.float32'>, 'quant_method': 0}
2024-01-21 00:23:37.654 | INFO | yamain.command.load_model:pre_process:456 - tensor: tensor:pre_norm_1, (1, 3, 256, 456), FP32
2024-01-21 00:23:37.654 | INFO | yamain.command.load_model:pre_process:456 - op: op:pre_norm_1, AxNormalize, {'dim': 1, 'mean': [128.0, 128.0, 128.0], 'std': [128.0, 128.0, 128.0]}
2024-01-21 00:23:37.654 | WARNING | yamain.command.load_model:post_process:475 - postprocess tensor [stage_0_output_1_heatmaps]
2024-01-21 00:23:37.654 | WARNING | yamain.command.load_model:post_process:475 - postprocess tensor [stage_0_output_0_pafs]
2024-01-21 00:23:37.654 | WARNING | yamain.command.load_model:post_process:475 - postprocess tensor [stage_1_output_1_heatmaps]
2024-01-21 00:23:37.654 | WARNING | yamain.command.load_model:post_process:475 - postprocess tensor [stage_1_output_0_pafs]
tiling op... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 92/92 0:00:00
new_ddr_tensor = []
build op... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 389/389 0:00:00
add ddr swap... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 738/738 0:00:00
calc input dependencies... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1039/1039 0:00:00
calc output dependencies... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1039/1039 0:00:00
assign eu heuristic ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1039/1039 0:00:00
assign eu onepass ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1039/1039 0:00:00
assign eu greedy ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1039/1039 0:00:00
2024-01-21 00:23:39.899 | INFO | yasched.test_onepass:results2model:2004 - max_cycle = 4,752,230
2024-01-21 00:23:40.217 | INFO | yamain.command.build:compile_npu_subgraph:1076 - QuantAxModel macs: 7,656,013,824
2024-01-21 00:23:40.218 | INFO | yamain.command.build:compile_npu_subgraph:1084 - use random data as gt input: data, uint8, (1, 3, 256, 456)
2024-01-21 00:23:42.095 | INFO     | yamain.command.build:compile_ptq_model:1003 - fuse 1 subgraph(s)
```
The build succeeded, but on the board, although the model did produce results, they did not match the earlier ones. To track down the cause, I first ran the model on the PC.
Run the following command:

```bash
python3 pose_detection.py --pre_processing --image_path sim_images/cxk.jpg \
    --axmodel_path models/compiled.axmodel --intermediate_path sim_inputs/0
```
The pre_processing function of pose_detection.py is shown below. Note that, unlike infer_fast, it does not call normalize(): the mean/std normalization is applied inside the compiled axmodel by the AxNormalize pre-op visible in the build log.
```python
def pre_processing(args):
    image_path = Path(args.image_path)
    if not image_path.exists():
        raise FileNotFoundError(f"Not found image file at '{image_path}'")
    axmodel_path = Path(args.axmodel_path)
    if not axmodel_path.exists():
        raise FileNotFoundError(f"Not found compiled axmodel at '{axmodel_path}'")

    # Same resize + pad preprocessing as before; the uint8 image is written as-is
    pad_value = (0, 0, 0)
    img = cv2.imread(str(image_path), cv2.IMREAD_COLOR)
    height, width, _ = img.shape
    net_input_size = np.array([256, 456])
    scale = min(net_input_size[0] / height, net_input_size[1] / width)
    scaled_img = cv2.resize(img, (0, 0), fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
    min_dims = [net_input_size[0], net_input_size[1]]
    padded_img, pad = pad_width(scaled_img, pad_value, min_dims)

    input_names = get_input_info(axmodel_path)
    if len(input_names) != 1:
        raise NotImplementedError(f"Currently only supports length 1, but got {input_names}")

    intermediate_path = Path(args.intermediate_path)
    intermediate_path.mkdir(exist_ok=True, parents=True)
    output_path = intermediate_path / f"{sanitize(input_names[0])}.bin"
    output_path.write_bytes(padded_img.tobytes())
    LOGGER.info(f"Write [{input_names[0]}] to '{output_path}' successfully.")
```
The result:

```
[I] Write [data] to 'sim_inputs/0/data.bin' successfully.
```
Then run:

```bash
pulsar2 run --model models/compiled.axmodel --input_dir sim_inputs --output_dir sim_outputs --list list.txt
```
The four output tensors are stored as four .bin files in the output directory. For post-processing, you can go on to modify the post_processing function of pose_detection.py; its main job is just reading the data back from the .bin files. I wrote my own loading code:
```python
from pathlib import Path
from typing import Dict

import cv2
import numpy as np
import onnx

from pulsar2_run_helper.utils import get_tensor_value_info, sanitize


def get_output_info(model_path: str):
    """Returns the shape and tensor type of all outputs."""
    model_obj = onnx.load(model_path)
    model_graph = model_obj.graph
    output_info = {}
    for tensor_info in model_graph.output:
        output_info.update({tensor_info.name: get_tensor_value_info(tensor_info)})
    return output_info


output_info = get_output_info('pulsar2-run-helper/models/compiled.axmodel')
output_data: Dict[str, np.ndarray] = {}
for k, v in output_info.items():
    data_path = Path(f"pulsar2-run-helper/sim_outputs/0/{sanitize(k)}.bin")
    if not data_path.exists():
        raise FileNotFoundError(
            f"Could not find the expected key '{k}', please double check your pulsar run output directory.",
        )
    data = data_path.read_bytes()
    output_data[k] = np.frombuffer(data, dtype=v["tensor_type"]).reshape(v["shape"]).copy()

# Only the second-stage outputs are needed for decoding
stage2_heatmaps = output_data['stage_1_output_1_heatmaps']
stage2_pafs = output_data['stage_1_output_0_pafs']

upsample_ratio = 4
# These are already numpy arrays, so no .cpu().data.numpy() detour is needed
heatmaps = np.transpose(stage2_heatmaps.squeeze(), (1, 2, 0))
heatmaps = cv2.resize(heatmaps, (0, 0), fx=upsample_ratio, fy=upsample_ratio, interpolation=cv2.INTER_CUBIC)
pafs = np.transpose(stage2_pafs.squeeze(), (1, 2, 0))
pafs = cv2.resize(pafs, (0, 0), fx=upsample_ratio, fy=upsample_ratio, interpolation=cv2.INTER_CUBIC)
```
I then compared the two outputs used downstream, stage_1_output_0_pafs and stage_1_output_1_heatmaps, between the original ONNX model and the axmodel. The rest of the pipeline selects keypoints from the heatmaps via NMS and then pairs them using the PAFs, so as long as the PAFs and heatmaps agree, everything downstream should be fine. The comparison, however, shows a sizable gap:
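To obtain the ONNX reference outputs, something like the following onnxruntime snippet works (my own sketch, not the exact script used; the normalization matches infer_fast above):

```python
import cv2
import numpy as np
import onnxruntime as ort

img = cv2.imread('sim_images/cxk.jpg')
h, w, _ = img.shape
scale = min(256 / h, 456 / w)
img = cv2.resize(img, (0, 0), fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
top = (256 - img.shape[0]) // 2
left = (456 - img.shape[1]) // 2
img = cv2.copyMakeBorder(img, top, 256 - img.shape[0] - top,
                         left, 456 - img.shape[1] - left,
                         cv2.BORDER_CONSTANT, value=(0, 0, 0))

# The float model expects (x - 128) / 256 in NCHW; the axmodel normalizes internally
x = ((img.astype(np.float32) - 128.0) / 256.0).transpose(2, 0, 1)[None]

sess = ort.InferenceSession('human-pose-estimation.onnx', providers=['CPUExecutionProvider'])
outs = sess.run(None, {'data': x})
onnx_heatmaps, onnx_pafs = outs[2], outs[3]  # stage_1 heatmaps / pafs
```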
The original ONNX model's output:

```
[[[-1.52968132e-05 4.39210387e-04 4.67733509e-04 ... 1.89101411e-04
2.96855404e-04 9.99360263e-01]
[-2.03267518e-05 4.37346724e-04 4.73105902e-04 ... 1.85395606e-04
2.94736557e-04 9.99273658e-01]
[-4.39256692e-05 4.27683495e-04 4.98184003e-04 ... 1.69247418e-04
2.85554648e-04 9.98842716e-01]
...
[ 1.48566623e-05 6.13198732e-04 1.83847776e-04 ... -1.04615065e-04
8.77983111e-05 9.97699857e-01]
[ 2.27145174e-05 6.29885879e-04 2.23268275e-04 ... -8.43701637e-05
1.05829211e-04 9.97494936e-01]
[ 2.40934714e-05 6.33706921e-04 2.31671875e-04 ... -8.03298753e-05
1.09530112e-04 9.97451067e-01]]
[[-1.54063455e-05 4.41880926e-04 4.65514458e-04 ... 1.83200624e-04
2.94604048e-04 9.99336004e-01]
[-2.04233584e-05 4.39986703e-04 4.70740197e-04 ... 1.79557086e-04
2.92459794e-04 9.99250889e-01]
[-4.39912328e-05 4.30239772e-04 4.95205459e-04 ... 1.63641351e-04
2.83144385e-04 9.98827040e-01]
...
[ 1.50406322e-05 6.05818816e-04 1.78435293e-04 ... -1.01274185e-04
8.60481086e-05 9.97708738e-01]
[ 2.18633922e-05 6.21417770e-04 2.17492474e-04 ... -8.17274704e-05
1.03465252e-04 9.97506917e-01]
[ 2.30273308e-05 6.25000568e-04 2.25825745e-04 ... -7.78278918e-05
1.07039421e-04 9.97463822e-01]]
[[-1.56860224e-05 4.57458053e-04 4.53745743e-04 ... 1.56425449e-04
2.84618087e-04 9.99214590e-01]
[-2.06957775e-05 4.55345347e-04 4.58224851e-04 ... 1.53032437e-04
2.82332650e-04 9.99136090e-01]
[-4.43626486e-05 4.44885925e-04 4.79539827e-04 ... 1.38012576e-04
2.72285804e-04 9.98743176e-01]
...
[ 1.64749854e-05 5.74149657e-04 1.52241788e-04 ... -8.50291181e-05
7.78756585e-05 9.97737229e-01]
[ 1.84111286e-05 5.84639900e-04 1.89980768e-04 ... -6.85549094e-05
9.25216373e-05 9.97549713e-01]
[ 1.85594690e-05 5.87106333e-04 1.98066118e-04 ... -6.52728268e-05
9.55235009e-05 9.97509658e-01]]
...
[[ 1.60911601e-04 7.16453709e-04 1.12063228e-03 ... 1.95305420e-05
2.95104983e-04 9.93238151e-01]
[ 1.55670175e-04 7.07174186e-04 1.10480236e-03 ... 1.61740845e-05
2.93474703e-04 9.93373156e-01]
[ 1.32725501e-04 6.63083512e-04 1.03514048e-03 ... 1.66020197e-06
2.86501076e-04 9.93972063e-01]
...
[-1.28306638e-04 3.12105549e-04 -1.59716117e-04 ... 3.75063137e-05
3.05537433e-05 9.99176919e-01]
[-1.35803712e-04 3.12552031e-04 -1.45055514e-04 ... 1.80985189e-05
3.50762166e-05 9.99347508e-01]
[-1.37439521e-04 3.12695687e-04 -1.42450066e-04 ... 1.42308554e-05
3.60244339e-05 9.99381363e-01]]
[[ 2.10002720e-04 8.25065654e-04 1.23704271e-03 ... 4.70072628e-05
3.27726681e-04 9.93095756e-01]
[ 2.03197109e-04 8.12154845e-04 1.21757423e-03 ... 4.28054227e-05
3.24971421e-04 9.93218422e-01]
[ 1.73336040e-04 7.51663989e-04 1.13101187e-03 ... 2.43920495e-05
3.12652002e-04 9.93755341e-01]
...
[-1.51157365e-04 2.43631308e-04 -1.81165044e-04 ... 5.54859871e-05
2.44969706e-05 9.99618769e-01]
[-1.55363508e-04 2.51566060e-04 -1.59615520e-04 ... 3.67974717e-05
3.21547923e-05 9.99756753e-01]
[-1.56319831e-04 2.53219740e-04 -1.55611167e-04 ... 3.31435440e-05
3.37603196e-05 9.99783993e-01]]
[[ 2.20244023e-04 8.48500291e-04 1.26282580e-03 ... 5.30014295e-05
3.34725075e-04 9.93052840e-01]
[ 2.13104737e-04 8.34821782e-04 1.24257524e-03 ... 4.86175632e-05
3.31732503e-04 9.93173242e-01]
[ 1.81768570e-04 7.70858896e-04 1.15238572e-03 ... 2.93666471e-05
3.18286417e-04 9.93698537e-01]
...
[-1.56104667e-04 2.29996047e-04 -1.85998841e-04 ... 5.89654264e-05
2.30604492e-05 9.99710977e-01]
[-1.59631090e-04 2.39599918e-04 -1.62955912e-04 ... 4.04056809e-05
3.13900709e-05 9.99843001e-01]
[-1.60447336e-04 2.41591828e-04 -1.58648603e-04 ... 3.67916145e-05
   3.31362789e-05 9.99869108e-01]]]
```
But the converted axmodel's output:

```
[[[ 0.0000000e+00 0.0000000e+00 4.2681405e-03 ... 0.0000000e+00
0.0000000e+00 9.9400777e-01]
[ 0.0000000e+00 0.0000000e+00 4.2681405e-03 ... 0.0000000e+00
0.0000000e+00 9.9417037e-01]
[ 0.0000000e+00 0.0000000e+00 4.2681405e-03 ... 0.0000000e+00
0.0000000e+00 9.9492270e-01]
...
[ 0.0000000e+00 0.0000000e+00 4.6832468e-03 ... 0.0000000e+00
0.0000000e+00 9.9824995e-01]
[ 0.0000000e+00 0.0000000e+00 4.7707101e-03 ... 0.0000000e+00
0.0000000e+00 9.9908477e-01]
[ 0.0000000e+00 0.0000000e+00 4.7885687e-03 ... 0.0000000e+00
0.0000000e+00 9.9926531e-01]]
[[ 0.0000000e+00 0.0000000e+00 4.2681401e-03 ... 0.0000000e+00
0.0000000e+00 9.9400777e-01]
[ 0.0000000e+00 0.0000000e+00 4.2681405e-03 ... 0.0000000e+00
0.0000000e+00 9.9417037e-01]
[ 0.0000000e+00 0.0000000e+00 4.2681405e-03 ... 0.0000000e+00
0.0000000e+00 9.9492270e-01]
...
[ 0.0000000e+00 0.0000000e+00 4.5393431e-03 ... 0.0000000e+00
0.0000000e+00 9.9826699e-01]
[ 0.0000000e+00 0.0000000e+00 4.5964858e-03 ... 0.0000000e+00
0.0000000e+00 9.9907321e-01]
[ 0.0000000e+00 0.0000000e+00 4.6081534e-03 ... 0.0000000e+00
0.0000000e+00 9.9924749e-01]]
[[ 0.0000000e+00 0.0000000e+00 4.2681401e-03 ... 0.0000000e+00
0.0000000e+00 9.9395919e-01]
[ 0.0000000e+00 0.0000000e+00 4.2681405e-03 ... 0.0000000e+00
0.0000000e+00 9.9412346e-01]
[ 0.0000000e+00 0.0000000e+00 4.2681405e-03 ... 0.0000000e+00
0.0000000e+00 9.9488354e-01]
...
[ 0.0000000e+00 0.0000000e+00 3.8737776e-03 ... 0.0000000e+00
0.0000000e+00 9.9838972e-01]
[ 0.0000000e+00 0.0000000e+00 3.7901415e-03 ... 0.0000000e+00
0.0000000e+00 9.9906290e-01]
[ 0.0000000e+00 0.0000000e+00 3.7731556e-03 ... 0.0000000e+00
0.0000000e+00 9.9920863e-01]]
...
[[ 0.0000000e+00 0.0000000e+00 4.2681401e-03 ... 0.0000000e+00
0.0000000e+00 9.9916482e-01]
[ 0.0000000e+00 0.0000000e+00 4.2681405e-03 ... 0.0000000e+00
0.0000000e+00 9.9901927e-01]
[ 0.0000000e+00 0.0000000e+00 4.2681405e-03 ... 0.0000000e+00
0.0000000e+00 9.9830627e-01]
...
[ 0.0000000e+00 0.0000000e+00 -3.1781812e-05 ... 0.0000000e+00
0.0000000e+00 9.9830580e-01]
[ 0.0000000e+00 0.0000000e+00 -7.8918245e-05 ... 0.0000000e+00
0.0000000e+00 9.9901927e-01]
[ 0.0000000e+00 0.0000000e+00 -9.7570926e-05 ... 0.0000000e+00
0.0000000e+00 9.9916482e-01]]
[[ 0.0000000e+00 0.0000000e+00 4.2681401e-03 ... 0.0000000e+00
0.0000000e+00 9.9924749e-01]
[ 0.0000000e+00 0.0000000e+00 4.2681405e-03 ... 0.0000000e+00
0.0000000e+00 9.9907321e-01]
[ 0.0000000e+00 0.0000000e+00 4.2681405e-03 ... 0.0000000e+00
0.0000000e+00 9.9821997e-01]
...
[ 0.0000000e+00 0.0000000e+00 -7.8918245e-05 ... 0.0000000e+00
0.0000000e+00 9.9821991e-01]
[ 0.0000000e+00 0.0000000e+00 2.1989405e-05 ... 0.0000000e+00
0.0000000e+00 9.9907327e-01]
[ 0.0000000e+00 0.0000000e+00 3.3657252e-05 ... 0.0000000e+00
0.0000000e+00 9.9924749e-01]]
[[ 0.0000000e+00 0.0000000e+00 4.2681401e-03 ... 0.0000000e+00
0.0000000e+00 9.9926525e-01]
[ 0.0000000e+00 0.0000000e+00 4.2681401e-03 ... 0.0000000e+00
0.0000000e+00 9.9908489e-01]
[ 0.0000000e+00 0.0000000e+00 4.2681401e-03 ... 0.0000000e+00
0.0000000e+00 9.9820137e-01]
...
[ 0.0000000e+00 0.0000000e+00 -9.7570926e-05 ... 0.0000000e+00
0.0000000e+00 9.9820119e-01]
[ 0.0000000e+00 0.0000000e+00 3.3657252e-05 ... 0.0000000e+00
0.0000000e+00 9.9908483e-01]
[ 0.0000000e+00 0.0000000e+00 5.1516203e-05 ... 0.0000000e+00
   0.0000000e+00 9.9926525e-01]]]
```
The difference is clearly large. A simple conjecture: outliers during quantization stretched the dynamic range, so small values (on the order of 1e-4 to 1e-5) were quantized straight to 0, which is why the quantized output is full of zeros.
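To make "the difference is large" concrete, a per-tensor cosine similarity between the float and quantized outputs is a quick check (my own sketch, separate from pulsar2's built-in precision analysis):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Flatten both tensors and compare them as vectors
    a = a.ravel().astype(np.float64)
    b = b.ravel().astype(np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# e.g. cosine_similarity(onnx_pafs, axmodel_pafs); values well below ~0.99
# usually mean the quantized model will not reproduce the float results
```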
I then tried a few workarounds, none of which panned out:

1. Using an FP-precision model directly. For some reason pulsar2 does not seem to accept FP32/FP16 models; the model apparently must be quantized to integer precision.
2. The official documentation says a quantized ONNX model can be imported, but that requires QAT (quantization-aware training) first, which I am not familiar with, so I gave up on this route.
I have packed up my materials here for anyone who wants to investigate further:
https://drive.google.com/file/d/1ON6SWXVpFrJBKrn9OXREepWn4q0raNbi/view?usp=drive_link