Rockchip Developer Community
李红

[Q&A]

RK3588: the ReduceMax op runs on the CPU and takes too long. How can this be fixed?

Problem description and steps to reproduce:

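In a simple custom network, I need to reduce a tensor of shape (B, C, H, W) to shape (B, C, W).
I implemented this with a ReduceMax op followed by a Reshape op, and found that the ReduceMax op runs on the CPU and is very slow (about 140 ms).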
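To make the intended reduction concrete, here is a minimal pure-Python sketch of taking the maximum over the H axis so that (B, C, H, W) becomes (B, C, W). The function name `reduce_max_over_h` and the tiny input are illustrative assumptions, not code from the original model.

```python
def reduce_max_over_h(x):
    """Reduce a nested list of shape (B, C, H, W) to shape (B, C, W).

    For every batch item and channel, the maximum is taken over the H axis.
    """
    return [
        [
            # zip(*channel) groups the H values that share the same W index.
            [max(column) for column in zip(*channel)]
            for channel in batch
        ]
        for batch in x
    ]


# Tiny illustrative input: B=1, C=1, H=2, W=3.
x = [[[[1, 5, 2],
       [4, 0, 7]]]]

print(reduce_max_over_h(x))  # [[[4, 5, 7]]]
```

In the exported graph this corresponds to ReduceMax over H followed by a Reshape that drops the leftover axis of size 1.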

Measured results on the RK3588 development board:

D RKNN: [11:25:59.947] ID   OpType           DataType Target InputShape                                   OutputShape            DDR Cycles     NPU Cycles     Total Cycles   Time(us)       MacUsage(%)    RW(KB)         FullName
D RKNN: [11:25:59.947] 0    InputOperator    INT8     CPU    \                                            (1,10,32,10000)        0              0              0              4              \              5000.00        InputOperator:voxels_input
D RKNN: [11:25:59.947] 1    ConvRelu         INT8     NPU    (1,10,32,10000),(64,10,1,1),(64)             (1,64,32,10000)        811751         200000         811751         3871           6.89           25001.50       Conv:Conv_0
D RKNN: [11:25:59.947] 2    ReduceMax        INT8     CPU    (1,64,32,10000)                              (1,64,1,10000)         0              0              0              139036         \              20625.00       ReduceMax:ReduceMax_2
D RKNN: [11:25:59.947] 3    Reshape          INT8     CPU    (1,64,1,10000),(4)                           (1,64,10000,1)         0              0              0              1048           \              1250.03        Reshape:Squeeze_3_2reshape
D RKNN: [11:25:59.947] 4    OutputOperator   INT8     CPU    (1,64,10000,1)                               \                      0              0              0              40             \              625.00         OutputOperator:pillar_features
D RKNN: [11:25:59.947] Total Operator Elapsed Time(us): 143999
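As a rough aid for reading a profile like the one above, the following sketch sums the Time(us) column per target. It assumes the whitespace-separated column layout shown in this log (16 fields per operator row); the helper name `time_by_target` is hypothetical, and the rows are copied from the log with column padding shortened.

```python
# Operator rows copied from the profile above (column padding shortened).
LOG = r"""
D RKNN: [11:25:59.947] 0 InputOperator  INT8 CPU \ (1,10,32,10000) 0 0 0 4 \ 5000.00 InputOperator:voxels_input
D RKNN: [11:25:59.947] 1 ConvRelu       INT8 NPU (1,10,32,10000),(64,10,1,1),(64) (1,64,32,10000) 811751 200000 811751 3871 6.89 25001.50 Conv:Conv_0
D RKNN: [11:25:59.947] 2 ReduceMax      INT8 CPU (1,64,32,10000) (1,64,1,10000) 0 0 0 139036 \ 20625.00 ReduceMax:ReduceMax_2
D RKNN: [11:25:59.947] 3 Reshape        INT8 CPU (1,64,1,10000),(4) (1,64,10000,1) 0 0 0 1048 \ 1250.03 Reshape:Squeeze_3_2reshape
D RKNN: [11:25:59.947] 4 OutputOperator INT8 CPU (1,64,10000,1) \ 0 0 0 40 \ 625.00 OutputOperator:pillar_features
D RKNN: [11:25:59.947] Total Operator Elapsed Time(us): 143999
"""


def time_by_target(log):
    """Sum the Time(us) column per target (CPU/NPU) over operator rows."""
    totals = {}
    for line in log.splitlines():
        tokens = line.split()
        # Operator rows have 16 fields and a numeric ID in field 3;
        # header and summary lines are skipped.
        if len(tokens) != 16 or not tokens[3].isdigit():
            continue
        target, time_us = tokens[6], int(tokens[12])
        totals[target] = totals.get(target, 0) + time_us
    return totals


totals = time_by_target(LOG)
print(totals)  # {'CPU': 140128, 'NPU': 3871}
```

The two sums add up to the reported total of 143999 us, and almost all of it is spent in the CPU-side ReduceMax.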

---

I also tried replacing the ReduceMax op with MaxPool, and it likewise runs on the CPU and is very slow (about 130 ms). Measured results on the RK3588 development board:

D RKNN: [13:11:54.589] ID   OpType           DataType Target InputShape                                   OutputShape            DDR Cycles     NPU Cycles     Total Cycles   Time(us)       MacUsage(%)    RW(KB)         FullName
D RKNN: [13:11:54.589] 0    InputOperator    INT8     CPU    \                                            (1,10,32,10000)        0              0              0              4              \              5000.00        InputOperator:voxels_input
D RKNN: [13:11:54.589] 1    ConvRelu         INT8     NPU    (1,10,32,10000),(64,10,1,1),(64)             (1,64,32,10000)        811751         200000         811751         3873           6.89           25001.50       Conv:Conv_0
D RKNN: [13:11:54.589] 2    MaxPool          INT8     CPU    (1,64,32,10000)                              (1,64,1,10000)         0              0              0              130099         \              20625.00       MaxPool:MaxPool_2
D RKNN: [13:11:54.589] 3    Reshape          INT8     CPU    (1,64,1,10000),(4)                           (1,64,10000,1)         0              0              0              779            \              1250.03        Reshape:Squeeze_3_2reshape
D RKNN: [13:11:54.589] 4    OutputOperator   INT8     CPU    (1,64,10000,1)                               \                      0              0              0              28             \              625.00         OutputOperator:pillar_features
D RKNN: [13:11:54.589] Total Operator Elapsed Time(us): 134783
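For reference, the MaxPool variant computes the same values when the pooling window covers the whole H axis, as a (H, 1) kernel does. The sketch below illustrates this in pure Python; the function name `max_pool_over_h` and the tiny input are assumptions for illustration only.

```python
def max_pool_over_h(x, kernel_h):
    """Max-pool a (B, C, H, W) nested list with a (kernel_h, 1) window.

    The stride along H equals kernel_h, so windows do not overlap.
    """
    out = []
    for batch in x:
        out_batch = []
        for channel in batch:
            rows = []
            for top in range(0, len(channel) - kernel_h + 1, kernel_h):
                window = channel[top:top + kernel_h]
                # Column-wise maximum inside the current window.
                rows.append([max(column) for column in zip(*window)])
            out_batch.append(rows)
        out.append(out_batch)
    return out


# Tiny illustrative input: B=1, C=1, H=2, W=3.
x = [[[[1, 5, 2],
       [4, 0, 7]]]]

# With kernel_h equal to H, the output keeps an H axis of size 1.
print(max_pool_over_h(x, kernel_h=2))  # [[[[4, 5, 7]]]]
```

The output shape is (B, C, 1, W), which is why a Reshape still follows MaxPool in the profile.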

---

Can this be optimized so that the ReduceMax op runs on the NPU and is faster? Also, why does the MaxPool op run on the CPU rather than the NPU?

Replies (1)

王英

2022-8-24 17:15:52
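It looks like you are using the wrong tool. The RK3588 should use the second-generation toolkit, rknn-toolkit2-v1.3.0; you are using the first-generation one.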
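A minimal sketch of what converting the model with rknn-toolkit2 might look like, assuming the toolkit's Python package (`rknn.api`) is installed and the network has been exported to ONNX. The function name `convert_for_rk3588` and all file paths are hypothetical, and the thread does not confirm whether this moves ReduceMax or MaxPool onto the NPU.

```python
def convert_for_rk3588(onnx_path, dataset_path, out_path):
    """Convert an ONNX model to an RKNN model targeting the RK3588.

    Assumption: rknn-toolkit2 is installed; it provides rknn.api.RKNN.
    """
    from rknn.api import RKNN  # imported lazily so the sketch loads without it

    rknn = RKNN(verbose=True)
    try:
        # Select the RK3588 as the target platform.
        rknn.config(target_platform="rk3588")

        if rknn.load_onnx(model=onnx_path) != 0:
            raise RuntimeError("load_onnx failed")

        # INT8 quantization needs a text file listing calibration samples.
        if rknn.build(do_quantization=True, dataset=dataset_path) != 0:
            raise RuntimeError("build failed")

        if rknn.export_rknn(out_path) != 0:
            raise RuntimeError("export_rknn failed")
    finally:
        rknn.release()


# Hypothetical usage; these paths are placeholders:
# convert_for_rk3588("pillar_net.onnx", "dataset.txt", "pillar_net.rknn")
```

After conversion, re-running the per-operator profile on the board would show whether the Target column for the reduction changes from CPU to NPU.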