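This article comes from HPMicro developer @Xusiwei1236 and describes how to run an edge AI framework on the HPM6750. If you are interested, read on.
--------------- Review content follows ---------------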
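What is TFLM?
You have probably heard of TensorFlow, the open-source machine learning library developed by Google, which supports both model training and model inference.
TFLM, the subject of this article, stands for TensorFlow Lite for Microcontrollers. So what is TensorFlow Lite?
TensorFlow Lite (usually abbreviated TFLite) is the solution the TensorFlow team developed for deploying models to mobile devices; put simply, it is the mobile version of TensorFlow. The TensorFlow website introduces TFLite as follows:
"TensorFlow Lite is a set of tools that helps developers run models on mobile, embedded, and IoT devices, enabling on-device machine learning."
TensorFlow Lite for Microcontrollers (TFLM), in turn, is the microcontroller version of TensorFlow Lite. Here is how the official website describes it:
"TensorFlow Lite for Microcontrollers (TFLM below) is an experimental port of TensorFlow Lite designed for microcontrollers and other devices with only kilobytes of memory. It runs directly on 'bare metal' and requires no operating system support, no standard C/C++ libraries, and no dynamic memory allocation. The core runtime fits in 16 KB on a Cortex M3, and with enough operators to run a speech keyword detection model it takes a total of only 22 KB."
All three share a common lineage and all come from Google. The difference is that TensorFlow supports both training and inference, while the other two support inference only. TFLite mainly targets mobile devices such as phones and tablets, whereas TFLM targets microcontrollers. Historically, the latter two are offshoots of the TensorFlow project, and the three form a tree: TFLite branched off from TensorFlow, TFLM (the tflite-micro project) branched off from TFLite, and all three are now developed in parallel. For a long time the source code of all three lived in a single repository; in terms of directory containment, TensorFlow contained the other two, and TFLite contained tflite-micro.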
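TFLM in the HPM SDK
The HPM SDK integrates TFLM as middleware (similar to a library, but not built as a standalone library), located in the hpm_sdk\middleware subdirectory:
The code in this subdirectory is a trimmed-down copy of the open-source TFLM project, with many unneeded files removed.
TFLM example
The HPM SDK also provides a TFLM example, located in the hpm_sdk\samples\tflm subdirectory: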

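The example code is adapted from the official person_detection example, with camera image capture and LCD result display added.
Since I do not have the matching camera and display on hand, this article does not use that example for the experiment.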
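Running the TFLM benchmark on the HPM6750
Next, using the person detection benchmark as an example, I will explain how to run the TFLM benchmark on the HPM6750.
Follow these steps to add the person detection benchmark source files to the HPM SDK environment:
Create a tflm_person_detect_benchmark directory under the samples subdirectory of the HPM SDK, and create a src directory inside it;
From the tflite-micro directory described earlier, in which the person detection benchmark has already been run, copy the following files into the src directory:
tensorflow\lite\micro\benchmarks\person_detection_benchmark.cc
tensorflow\lite\micro\benchmarks\micro_benchmark.h
tensorflow\lite\micro\examples\person_detection\model_settings.h
tensorflow\lite\micro\examples\person_detection\model_settings.cc
Create a testdata subdirectory under src, and copy all files from the following directory under the tflite-micro directory into testdata:
tensorflow\lite\micro\tools\make\gen\linux_x86_64_default\genfiles\tensorflow\lite\micro\examples\person_detection\testdata
In person_detection_benchmark.cc, model_settings.cc, no_person_image_data.cc, and person_image_data.cc, update the file paths in some of the #include preprocessor directives (adjust them to match the relative paths after copying);
In person_detection_benchmark.cc, add the line board_init(); at the very beginning of the main function, and add the line #include "board.h" at the top of the file;
Create a CMakeLists.txt file at the same level as src, with the following content: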
cmake_minimum_required(VERSION 3.13)
set(CONFIG_TFLM 1)
find_package(hpm-sdk REQUIRED HINTS $ENV{HPM_SDK_BASE})
project(tflm_person_detect_benchmark)
set(CMAKE_CXX_STANDARD 11)
sdk_app_src(src/model_settings.cc)
sdk_app_src(src/person_detection_benchmark.cc)
sdk_app_src(src/testdata/no_person_image_data.cc)
sdk_app_src(src/testdata/person_image_data.cc)
sdk_app_inc(src)
sdk_ld_options("-lm")
sdk_ld_options("--std=c++11")
sdk_compile_definitions(__HPMICRO__)
sdk_compile_definitions(-DINIT_EXT_RAM_FOR_DATA=1)
# sdk_compile_options("-mabi=ilp32f")
# sdk_compile_options("-march=rv32imafc")
sdk_compile_options("-O2")
# sdk_compile_options("-O3")
set(SEGGER_LEVEL_O3 1)
generate_ses_project()
Create an app.yaml file at the same level as src, with the following content:
dependency:
  - tflm
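The file-copy steps in the list above can also be scripted. The following is a minimal Python sketch, not part of the HPM SDK or of tflite-micro: the function name `copy_benchmark_sources` and the directory arguments are hypothetical, and the paths are written with forward slashes so that `pathlib` resolves them on both Windows and Linux. It copies files only; the #include paths and the board_init() call still have to be edited by hand as described above.

```python
import shutil
import tempfile
from pathlib import Path

# Files named in the steps above, relative to the tflite-micro checkout.
SOURCE_FILES = [
    "tensorflow/lite/micro/benchmarks/person_detection_benchmark.cc",
    "tensorflow/lite/micro/benchmarks/micro_benchmark.h",
    "tensorflow/lite/micro/examples/person_detection/model_settings.h",
    "tensorflow/lite/micro/examples/person_detection/model_settings.cc",
]
# Generated test images; this directory exists only after the benchmark
# has been built on the host.
TESTDATA_DIR = (
    "tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/"
    "tensorflow/lite/micro/examples/person_detection/testdata"
)


def copy_benchmark_sources(tflm_root, app_dir):
    """Copy the benchmark sources into app_dir/src and return the new paths."""
    tflm_root, app_dir = Path(tflm_root), Path(app_dir)
    src_dir = app_dir / "src"
    testdata_dst = src_dir / "testdata"
    testdata_dst.mkdir(parents=True, exist_ok=True)  # also creates src/
    copied = []
    for rel in SOURCE_FILES:
        copied.append(Path(shutil.copy2(tflm_root / rel, src_dir)))
    for item in sorted((tflm_root / TESTDATA_DIR).iterdir()):
        if item.is_file():
            copied.append(Path(shutil.copy2(item, testdata_dst)))
    return copied


if __name__ == "__main__":
    # Demo against a throwaway stub tree so the sketch runs anywhere;
    # pass real directories to copy_benchmark_sources() for actual use.
    with tempfile.TemporaryDirectory() as tmp:
        tflm_root = Path(tmp) / "tflite-micro"
        stubs = SOURCE_FILES + [
            TESTDATA_DIR + "/person_image_data.cc",
            TESTDATA_DIR + "/no_person_image_data.cc",
        ]
        for rel in stubs:
            path = tflm_root / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("// stub\n")
        app_dir = Path(tmp) / "tflm_person_detect_benchmark"
        for path in copy_benchmark_sources(tflm_root, app_dir):
            print(path.relative_to(app_dir))
```

For real use, call `copy_benchmark_sources()` with the tflite-micro checkout and the new sample directory under hpm_sdk\samples.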
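Next comes the familiar part: build and run. First, use generate_project to generate the project:
Then connect the HPM6750 development board to the PC and open the newly generated project in Embedded Studio:
Because this project pulls in the TFLM sources, it contains many files, so the Indexing step in the source navigator window on the right takes a long time to finish.
After that, you can build the project with F7 and debug it with F5: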

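Once the build completes, first open a serial terminal and connect to the device's serial port at a baud rate of 115200. After starting the debug session, simply continue execution and the benchmark output will appear in the serial terminal: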
==============================
 hpm6750evkmini clock summary
==============================
cpu0: 816000000Hz
cpu1: 816000000Hz
axi0: 200000000Hz
axi1: 200000000Hz
axi2: 200000000Hz
ahb: 200000000Hz
mchtmr0: 24000000Hz
mchtmr1: 1000000Hz
xpi0: 133333333Hz
xpi1: 400000000Hz
dram: 166666666Hz
display: 74250000Hz
cam0: 59400000Hz
cam1: 59400000Hz
jpeg: 200000000Hz
pdma: 200000000Hz
==============================
----------------------------------------------------------------------
[HPMicro ASCII-art banner]
----------------------------------------------------------------------
InitializeBenchmarkRunner took 114969 ticks (4 ms).
WithPersonDataIterations(1) took 10694521 ticks (445 ms)
DEPTHWISE_CONV_2D took 275798 ticks (11 ms).
DEPTHWISE_CONV_2D took 280579 ticks (11 ms).
CONV_2D took 516051 ticks (21 ms).
DEPTHWISE_CONV_2D took 139000 ticks (5 ms).
CONV_2D took 459646 ticks (19 ms).
DEPTHWISE_CONV_2D took 274903 ticks (11 ms).
CONV_2D took 868518 ticks (36 ms).
DEPTHWISE_CONV_2D took 68180 ticks (2 ms).
CONV_2D took 434392 ticks (18 ms).
DEPTHWISE_CONV_2D took 132918 ticks (5 ms).
CONV_2D took 843014 ticks (35 ms).
DEPTHWISE_CONV_2D took 33228 ticks (1 ms).
CONV_2D took 423288 ticks (17 ms).
DEPTHWISE_CONV_2D took 62040 ticks (2 ms).
CONV_2D took 833033 ticks (34 ms).
DEPTHWISE_CONV_2D took 62198 ticks (2 ms).
CONV_2D took 834644 ticks (34 ms).
DEPTHWISE_CONV_2D took 62176 ticks (2 ms).
CONV_2D took 838212 ticks (34 ms).
DEPTHWISE_CONV_2D took 62206 ticks (2 ms).
CONV_2D took 832857 ticks (34 ms).
DEPTHWISE_CONV_2D took 62194 ticks (2 ms).
CONV_2D took 832882 ticks (34 ms).
DEPTHWISE_CONV_2D took 16050 ticks (0 ms).
CONV_2D took 438774 ticks (18 ms).
DEPTHWISE_CONV_2D took 27494 ticks (1 ms).
CONV_2D took 974362 ticks (40 ms).
AVERAGE_POOL_2D took 2323 ticks (0 ms).
CONV_2D took 1128 ticks (0 ms).
RESHAPE took 184 ticks (0 ms).
SOFTMAX took 2249 ticks (0 ms).
NoPersonDataIterations(1) took 10694160 ticks (445 ms)
DEPTHWISE_CONV_2D took 274922 ticks (11 ms).
DEPTHWISE_CONV_2D took 281095 ticks (11 ms).
CONV_2D took 515380 ticks (21 ms).
DEPTHWISE_CONV_2D took 139428 ticks (5 ms).
CONV_2D took 460039 ticks (19 ms).
DEPTHWISE_CONV_2D took 275255 ticks (11 ms).
CONV_2D took 868787 ticks (36 ms).
DEPTHWISE_CONV_2D took 68384 ticks (2 ms).
CONV_2D took 434537 ticks (18 ms).
DEPTHWISE_CONV_2D took 133071 ticks (5 ms).
CONV_2D took 843202 ticks (35 ms).
DEPTHWISE_CONV_2D took 33291 ticks (1 ms).
CONV_2D took 423388 ticks (17 ms).
DEPTHWISE_CONV_2D took 62190 ticks (2 ms).
CONV_2D took 832978 ticks (34 ms).
DEPTHWISE_CONV_2D took 62205 ticks (2 ms).
CONV_2D took 834636 ticks (34 ms).
DEPTHWISE_CONV_2D took 62213 ticks (2 ms).
CONV_2D took 838212 ticks (34 ms).
DEPTHWISE_CONV_2D took 62239 ticks (2 ms).
CONV_2D took 832850 ticks (34 ms).
DEPTHWISE_CONV_2D took 62217 ticks (2 ms).
CONV_2D took 832856 ticks (34 ms).
DEPTHWISE_CONV_2D took 16040 ticks (0 ms).
CONV_2D took 438779 ticks (18 ms).
DEPTHWISE_CONV_2D took 27481 ticks (1 ms).
CONV_2D took 974354 ticks (40 ms).
AVERAGE_POOL_2D took 1812 ticks (0 ms).
CONV_2D took 1077 ticks (0 ms).
RESHAPE took 341 ticks (0 ms).
SOFTMAX took 901 ticks (0 ms).
WithPersonDataIterations(10) took 106960312 ticks (4456 ms)
NoPersonDataIterations(10) took 106964554 ticks (4456 ms)
As shown, on the HPM6750EVKMINI development board, running the person detection model 10 times in a row takes 4456 ms in total, an average of 445.6 ms per run.
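To check these figures, the tick counts in the log can be converted to milliseconds and summed per operator. The sketch below is illustrative Python, not part of TFLM or the HPM SDK. It assumes the benchmark tick counter runs at the 24 MHz mchtmr0 rate shown in the clock summary. The article does not state this, but the logged millisecond values are consistent with it (for example, 10694521 ticks at 24 MHz is about 445.6 ms). `SAMPLE` is a short excerpt of the output, not the full log.

```python
import re
from collections import defaultdict

# Assumption: ticks are counted at the mchtmr0 rate from the clock summary.
TICK_HZ = 24_000_000

# Matches lines such as "CONV_2D took 516051 ticks (21 ms)."
LINE_RE = re.compile(r"^(\S+) took (\d+) ticks \((\d+) ms\)\.?$")


def ticks_to_ms(ticks, tick_hz=TICK_HZ):
    """Convert a tick count to milliseconds (fractional part kept)."""
    return ticks * 1000 / tick_hz


def parse_log(text):
    """Return (name, ticks) pairs for every 'X took N ticks (M ms)' line."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append((match.group(1), int(match.group(2))))
    return entries


def ticks_by_op(entries):
    """Sum tick counts per operator name."""
    totals = defaultdict(int)
    for name, ticks in entries:
        totals[name] += ticks
    return dict(totals)


# Short excerpt of the serial output above (not the full log).
SAMPLE = """\
WithPersonDataIterations(1) took 10694521 ticks (445 ms)
DEPTHWISE_CONV_2D took 275798 ticks (11 ms).
DEPTHWISE_CONV_2D took 280579 ticks (11 ms).
CONV_2D took 516051 ticks (21 ms).
SOFTMAX took 2249 ticks (0 ms).
WithPersonDataIterations(10) took 106960312 ticks (4456 ms)
"""

if __name__ == "__main__":
    entries = parse_log(SAMPLE)
    # Whole-run timings first, then per-operator totals.
    for name, ticks in entries:
        if "Iterations" in name:
            print(f"{name}: {ticks_to_ms(ticks):.1f} ms")
    ops = [(n, t) for n, t in entries if "Iterations" not in n]
    for name, ticks in sorted(ticks_by_op(ops).items()):
        print(f"{name}: {ticks} ticks, {ticks_to_ms(ticks):.2f} ms")
```

Feeding the complete serial log to `parse_log()` instead of `SAMPLE` gives the per-operator totals for a whole inference, which makes it easy to see that the CONV_2D layers dominate the run time.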
Running the TFLM benchmark on the Raspberry Pi 3B+
On the Raspberry Pi 3B+, much as on a PC, you can run the PC-side test command directly to get the benchmark results:

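As shown, on the Raspberry Pi 3B+, for the image containing a person, running the person detection model 10 times in a row takes 4186 ms in total, an average of 418.6 ms per run; for the image without a person, 10 consecutive runs take 4190 ms, an average of 419 ms per run.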
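Benchmark comparison: HPM6750, Raspberry Pi 3B+, and AMD R7 4800H
The average time to run the person detection model on the HPM6750EVKMINI development board, the Raspberry Pi 3B+, and the AMD R7 4800H is summarized below: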

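As shown, for the TFLM person detection workload, the HPM6750EVKMINI and the Raspberry Pi 3B+ perform comparably. Although the HPM6750's 816 MHz CPU clock is lower than the 1.4 GHz of the BCM2837B0 Cortex-A53 in the Raspberry Pi 3B+, its single-core compute performance is not far behind.
Note that the TFLM benchmark on the Raspberry Pi 3B+ ran on a 64-bit Debian Linux distribution, whereas the test program on the HPM6750 ran directly on bare metal. The task scheduler in the operating system kernel costs some CPU compute capacity, so this is not a strictly controlled comparison and the results are for reference only.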
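One rough way to put the two results on a common footing is to multiply each average time by the nominal clock frequency, which gives an approximate cycle count per inference. The Python sketch below uses only the figures quoted in the text. It assumes both CPUs actually ran at their nominal clocks for the whole test, which the article does not verify for the Raspberry Pi, so the output is an estimate and not a measurement.

```python
# Figures quoted in the text: nominal core clock and average time per
# inference for the image containing a person.
HPM6750 = {"clock_hz": 816_000_000, "avg_ms": 445.6}
RPI_3B_PLUS = {"clock_hz": 1_400_000_000, "avg_ms": 418.6}


def cycles_per_inference(clock_hz, avg_ms):
    """Approximate CPU cycles spent on one inference at a fixed clock."""
    return clock_hz * avg_ms / 1000


def ratio(a, b):
    """Return a / b as a float."""
    return a / b


if __name__ == "__main__":
    hpm_cycles = cycles_per_inference(**HPM6750)
    rpi_cycles = cycles_per_inference(**RPI_3B_PLUS)
    time_ratio = ratio(HPM6750["avg_ms"], RPI_3B_PLUS["avg_ms"])
    print(f"HPM6750:  {hpm_cycles / 1e6:.1f} M cycles per inference")
    print(f"Pi 3B+:   {rpi_cycles / 1e6:.1f} M cycles per inference")
    print(f"HPM6750 wall-clock time is {time_ratio:.3f}x the Pi 3B+ time")
    print(f"Pi 3B+ cycle count is {ratio(rpi_cycles, hpm_cycles):.2f}x the HPM6750 count")
```

Under these assumptions the HPM6750 needs roughly 364 million cycles per inference against roughly 586 million on the Raspberry Pi 3B+, so it is about 6% slower in wall-clock time while running at a clock about 42% lower.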