zealsoft

[ALIENTEK (正点原子) i.MX93 Development Board Trial Series] Local Voice Control Based on Deep Learning

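It has been a while since I last took part in an Elecfans development board review, mainly because I did not want to keep repeating things I had already done; I would rather use a review to learn something new and push myself. Thanks to the Elecfans forum and ALIENTEK (正点原子) for this review opportunity. I hope to make full use of the i.MX93 development board to implement intelligent speech recognition.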

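Project plan

1) Following the documentation, learn the i.MX AI development environment and the related program frameworks.
2) Build and train a speech recognition model with the TensorFlow Lite framework.
3) Port the trained model to the NXP hardware platform.
4) Use voice commands to control other peripherals.
5) Debug the project and share the design experience.

Unboxing

ALIENTEK products have long been known for careful workmanship and extensive documentation. I liked the board as soon as I opened the box: it looks great, has a rich set of interfaces, and is well suited to beginners.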

微信图片_20240630103316.jpg

Audio playback test

The main functionality of this project relies on audio, so I tested the board's audio features first.

微信图片_20240630103330.jpg

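After powering on the system, log in with MobaXterm. The factory system image includes audio configuration and test files; run the audio test script with the following commands.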

cd shell/audio
./atk_audioTest.sh

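The first time the script runs, it prints the audio device initialization steps; on later runs it no longer prints them. Press Ctrl+C to exit the script.

After the audio device has been initialized, enter 2 and confirm to run the playback test. The playback output is shown below.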
MobaXterm screenshot.png

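During the test, the on-board speaker plays the audio, and it is loud.

Recording test

Run the same script again. After the audio device has been initialized, enter 1 and confirm, then choose the microphone to test: if a headset with a microphone is plugged into the board's PHONE jack, choose "1. Headset microphone"; if no headset is connected and you are using the on-board microphone (MIC), choose "2. On-board microphone". I used the on-board microphone, so I chose the second option. Once the microphone configuration is selected, the script starts recording automatically; speak close to the microphone. When recording finishes, a record.wav file containing the audio just recorded is created in the current directory, and the script then plays it back automatically. The on-board microphone seemed somewhat noisy to me, so if audio quality matters a lot, a headset microphone is the better choice.

That is all for the basic tests; programming tests will follow.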

Replies (6)

zealsoft

2024-7-2 21:40:08

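The main goal this time is to learn AI application development on NXP platforms. I first read the document 《05【正点原子】ATK-DLIMX93嵌入式AI开发手册V1.0》 (ATK-DLIMX93 Embedded AI Development Manual V1.0), which is clearly written, but I also recommend reading NXP's own "i.MX Machine Learning User's Guide", which explains some technical details more clearly. Below are my tests on the i.MX 93 platform.

The i.MX 93 supports CPU inference on the Cortex-A cores as well as inference on Arm's Ethos-U65 NPU; the latter is of course much faster. The i.MX 93 selects between inference backends through delegates: the XNNPACK delegate means CPU inference, while the Ethos-U delegate means NPU inference. NXP provides more inference options on other hardware platforms, which I will not discuss here.

I tested the vendor-provided image classification program, located in /usr/bin/eiq-examples-git/image_classification on the board. To run it, the model files have to be downloaded first. The script /usr/bin/eiq-examples-git/download_models.py does this, but it needs access to Google Drive; if that is inconvenient, the required files can be downloaded from ALIENTEK's network drive.

With the model files installed, running python3 label_image.py performs CPU inference, which takes 63.815 ms.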

root@atk-imx93:/usr/bin/eiq-examples-git/image_classification# python3 label_image.py

0.878431: military uniform

0.027451: Windsor tie

0.011765: mortarboard

0.011765: bulletproof vest

0.007843: sax

time: 63.815ms

To use NPU inference, specify the delegate library on the command line with -d:

python3 label_image.py  -d /usr/lib/libethosu_delegate.so
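A minimal sketch of how a script might honor a -d style option, assuming the tflite_runtime Python package that the board's TFLite runtime provides. The post does not show how label_image.py does this, and the helper names delegate_path and make_interpreter are hypothetical. With no external delegate the runtime uses its CPU path; that this path is XNNPACK on the board's build is taken from the description above.

```python
NPU_DELEGATE = "/usr/lib/libethosu_delegate.so"  # path used with -d in the post


def delegate_path(use_npu, npu_lib=NPU_DELEGATE):
    """Return the external delegate library to load, or None for CPU inference."""
    return npu_lib if use_npu else None


def make_interpreter(model_path, use_npu=False):
    """Create a TFLite interpreter, optionally with the Ethos-U delegate.

    The import is done lazily so this file can be read and tested on a PC
    where tflite_runtime is not installed.
    """
    from tflite_runtime.interpreter import Interpreter, load_delegate

    path = delegate_path(use_npu)
    delegates = [load_delegate(path)] if path else []
    interpreter = Interpreter(model_path=model_path,
                              experimental_delegates=delegates)
    interpreter.allocate_tensors()
    return interpreter
```

On the board, make_interpreter("model.tflite", use_npu=True) would correspond to passing -d on the command line; the model file name here is only a placeholder.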

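According to ALIENTEK's documentation, NPU inference requires using the vela tool on the board to compile the tflite model into a vela model that the NPU can run, and only 8-bit or 16-bit quantized models are supported.

With the NPU, inference time drops to about 4 ms.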

root@atk-imx93:/usr/bin/eiq-examples-git/image_classification# python3 label_image.py  -d /usr/lib/libethosu_delegate.so

INFO: Ethosu delegate: device_name set to /dev/ethosu0.

INFO: Ethosu delegate: cache_file_path set to .

INFO: Ethosu delegate: timeout set to 60000000000.

INFO: Ethosu delegate: enable_cycle_counter set to 0.

INFO: Ethosu delegate: enable_profiling set to 0.

INFO: Ethosu delegate: profiling_buffer_size set to 2048.

INFO: Ethosu delegate: pmu_event0 set to 0.

INFO: Ethosu delegate: pmu_event1 set to 0.

INFO: Ethosu delegate: pmu_event2 set to 0.

INFO: Ethosu delegate: pmu_event3 set to 0.

INFO: EthosuDelegate: 31 nodes delegated out of 31 nodes with 1 partitions.

0.874510: military uniform

0.031373: Windsor tie

0.015686: mortarboard

0.011765: bulletproof vest

0.007843: bow tie

time: 4.016ms
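As a quick check of the two timings printed above, the ratio can be worked out directly. This is only arithmetic on the single-run numbers from the logs, not a benchmark.

```python
cpu_ms = 63.815  # "time:" printed by the CPU run
npu_ms = 4.016   # "time:" printed by the run with the Ethos-U delegate

speedup = cpu_ms / npu_ms
print(f"NPU run is about {speedup:.1f}x faster than the CPU run")
```

On these numbers the NPU run comes out roughly 16 times faster; repeated runs would be needed for a reliable figure.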

zealsoft

2024-7-4 19:43:23

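Today's test is simple audio classification. For voice control, we need to build and train a basic automatic speech recognition (ASR) model that recognizes different words. For background, see the official TensorFlow documentation: Simple audio recognition: Recognizing keywords | TensorFlow Core.

The pre-trained model comes from "Simple Audio Recognition on a Raspberry Pi using Machine Learning (I2S, TensorFlow Lite) - Electronut Labs", and I modified the code it provides. NXP's official Model Zoo offers similar code, but it requires the full TensorFlow package, whereas the board ships with the TFLite runtime inference framework by default, so I did not use NXP's approach.

The model uses part of the Speech Commands dataset, which contains short (one second or less) audio clips of commands such as "down", "go", "left", "no", "right", "stop", "up" and "yes".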
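A classifier trained on these eight words outputs one score per word, and the recognized command is the word with the highest score. The sketch below illustrates that step. The label order is an assumption (the order in which the words are listed above), the scores are made up, and the ">>> YES" output format mirrors what the shell script later in this thread searches for.

```python
# Assumed label order; the real model's order is not stated in the post.
LABELS = ["down", "go", "left", "no", "right", "stop", "up", "yes"]


def top_label(scores):
    """Return (label, score) for the highest-scoring class."""
    if len(scores) != len(LABELS):
        raise ValueError("expected one score per label")
    best = max(range(len(scores)), key=scores.__getitem__)
    return LABELS[best], scores[best]


# Made-up scores for illustration only.
label, score = top_label([0.01, 0.02, 0.01, 0.05, 0.01, 0.0, 0.1, 0.8])
print(f">>> {label.upper()}")
```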

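The waveforms in the dataset are represented in the time domain. Computing the short-time Fourier transform (STFT) converts each waveform from a time-domain signal into a time-frequency representation, a spectrogram, which shows how the frequency content changes over time and can be treated as a 2D image. The spectrogram images are then fed to the neural network to train the model.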
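To make the STFT step concrete, here is a small pure-Python sketch that slices a signal into overlapping frames, applies a Hann window, and takes the magnitude of a direct DFT per frame. It is written for clarity rather than speed. The frame length and hop size are arbitrary illustration values, not the ones used by the model, and the function name stft_magnitude is hypothetical.

```python
import cmath
import math


def stft_magnitude(samples, frame_len=64, hop=32):
    """Return a spectrogram as a list of frames of DFT magnitudes.

    Each frame holds frame_len // 2 + 1 values (bins 0 .. Nyquist).
    """
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# A tone with exactly 8 cycles per 64-sample frame should peak in bin 8.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = stft_magnitude(tone)
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spec]
```

Each row of spec is one time slice and each column one frequency bin, which is the 2D image the text refers to.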

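The web page mentioned above describes how to train the model; here I use the already trained model. For inference, the program first reads the audio data from a wav file; if the file is stereo, only one channel is used. The default sample rate is 16 kHz, and only 1 s of audio is extracted for the test. The extracted data is normalized, converted to a spectrogram with the STFT, and then fed to the neural network.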
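The loading steps described above (read the wav file, keep one channel, take 1 s at 16 kHz, normalize) can be sketched with the standard library alone. This is not the code from the attachment: the function name load_one_second is hypothetical, the sketch assumes 16-bit PCM, and peak normalization is an assumption because the post does not say which normalization is used.

```python
import array
import sys
import wave

SAMPLE_RATE = 16000  # sample rate stated in the post


def load_one_second(path):
    """Return 16000 floats in [-1, 1] taken from the first channel of a WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = wav.readframes(SAMPLE_RATE)  # at most one second of frames

    pcm = array.array("h")
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian

    # Samples are interleaved per frame, so a stride keeps the first channel.
    mono = list(pcm[::channels])
    mono += [0] * (SAMPLE_RATE - len(mono))  # zero-pad short recordings

    # Peak normalization (assumed); guard against an all-zero clip.
    peak = max(abs(s) for s in mono) or 1
    return [s / peak for s in mono]
```

The returned list would then go through the STFT before being passed to the network.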

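The program uses the scipy library for the STFT, so scipy must be installed first:

pip3 install scipy

I ran the Python test program on the i.MX 93 board, and it correctly recognized YES and NO. I recorded yes.wav myself.

For the test code and wav files, see the attachment: simple_audio.zip

The current model is a floating-point model, so it can only be used for CPU inference, not NPU inference. To use the NPU, the model has to be converted and the program modified to handle quantization.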

zealsoft

2024-7-5 19:59:45

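Yesterday I mentioned that a model must be quantized before it can run on the NPU. If the vela tool is run on a model that has not been quantized, it prints warnings, and the generated model can still only run on the CPU, not on the NPU.

Below is the result of running vela on simple_audio_model_numpy.tflite.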

root@atk-imx93:~/shell/simple# vela simple_audio_model_numpy.tflite

Warning: Unsupported TensorFlow Lite semantics for RESIZE_BILINEAR 'sequential_2/resizing_2/resize/ResizeBilinear;StatefulPartitionedCall/sequential_2/resizing_2/resize/ResizeBilinear'. Placing on CPU instead

 - Input(s), Output and Weight tensors must have quantization parameters

   Op has tensors with missing quantization parameters: input_3, sequential_2/resizing_2/resize/ResizeBilinear;StatefulPartitionedCall/sequential_2/resizing_2/resize/ResizeBilinear

Warning: Unsupported TensorFlow Lite semantics for CONV_2D 'sequential_2/conv2d_4/Relu;StatefulPartitionedCall/sequential_2/conv2d_4/Relu;sequential_2/conv2d_4/BiasAdd;StatefulPartitionedCall/sequential_2/conv2d_4/BiasAdd;sequential_2/conv2d_4/Conv2D;StatefulPartitionedCall/sequential_2/conv2d_4/Conv2D;StatefulPartitionedCall/sequential_2/conv2d_4/BiasAdd/ReadVariableOp2'. Placing on CPU instead

 - Input(s), Output and Weight tensors must have quantization parameters

   Op has tensors with missing quantization parameters: sequential_2/resizing_2/resize/ResizeBilinear;StatefulPartitionedCall/sequential_2/resizing_2/resize/ResizeBilinear, sequential_2/conv2d_4/Relu;StatefulPartitionedCall/sequential_2/conv2d_4/Relu;sequential_2/conv2d_4/BiasAdd;StatefulPartitionedCall/sequential_2/conv2d_4/BiasAdd;sequential_2/conv2d_4/Conv2D;StatefulPartitionedCall/sequential_2/conv2d_4/Conv2D;StatefulPartitionedCall/sequential_2/conv2d_4/BiasAdd/ReadVariableOp1_reshape, sequential_2/conv2d_4/Relu;StatefulPartitionedCall/sequential_2/conv2d_4/Relu;sequential_2/conv2d_4/BiasAdd;StatefulPartitionedCall/sequential_2/conv2d_4/BiasAdd;sequential_2/conv2d_4/Conv2D;StatefulPartitionedCall/sequential_2/conv2d_4/Conv2D;StatefulPartitionedCall/sequential_2/conv2d_4/BiasAdd/ReadVariableOp2

Warning: Unsupported TensorFlow Lite semantics for CONV_2D 'sequential_2/conv2d_5/Relu;StatefulPartitionedCall/sequential_2/conv2d_5/Relu;sequential_2/conv2d_5/BiasAdd;StatefulPartitionedCall/sequential_2/conv2d_5/BiasAdd;sequential_2/conv2d_5/Conv2D;StatefulPartitionedCall/sequential_2/conv2d_5/Conv2D;StatefulPartitionedCall/sequential_2/conv2d_5/BiasAdd/ReadVariableOp'. Placing on CPU instead

 - Input(s), Output and Weight tensors must have quantization parameters

   Op has tensors with missing quantization parameters: sequential_2/conv2d_4/Relu;StatefulPartitionedCall/sequential_2/conv2d_4/Relu;sequential_2/conv2d_4/BiasAdd;StatefulPartitionedCall/sequential_2/conv2d_4/BiasAdd;sequential_2/conv2d_4/Conv2D;StatefulPartitionedCall/sequential_2/conv2d_4/Conv2D;StatefulPartitionedCall/sequential_2/conv2d_4/BiasAdd/ReadVariableOp2, sequential_2/conv2d_5/Conv2D;StatefulPartitionedCall/sequential_2/conv2d_5/Conv2D_reshape, sequential_2/conv2d_5/Relu;StatefulPartitionedCall/sequential_2/conv2d_5/Relu;sequential_2/conv2d_5/BiasAdd;StatefulPartitionedCall/sequential_2/conv2d_5/BiasAdd;sequential_2/conv2d_5/Conv2D;StatefulPartitionedCall/sequential_2/conv2d_5/Conv2D;StatefulPartitionedCall/sequential_2/conv2d_5/BiasAdd/ReadVariableOp

Warning: Unsupported TensorFlow Lite semantics for MAX_POOL_2D 'sequential_2/max_pooling2d_2/MaxPool;StatefulPartitionedCall/sequential_2/max_pooling2d_2/MaxPool'. Placing on CPU instead

 - Input(s), Output and Weight tensors must have quantization parameters

   Op has tensors with missing quantization parameters: sequential_2/conv2d_5/Relu;StatefulPartitionedCall/sequential_2/conv2d_5/Relu;sequential_2/conv2d_5/BiasAdd;StatefulPartitionedCall/sequential_2/conv2d_5/BiasAdd;sequential_2/conv2d_5/Conv2D;StatefulPartitionedCall/sequential_2/conv2d_5/Conv2D;StatefulPartitionedCall/sequential_2/conv2d_5/BiasAdd/ReadVariableOp, sequential_2/max_pooling2d_2/MaxPool;StatefulPartitionedCall/sequential_2/max_pooling2d_2/MaxPool

Warning: Unsupported TensorFlow Lite semantics for RESHAPE 'sequential_2/flatten_2/Reshape;StatefulPartitionedCall/sequential_2/flatten_2/Reshape'. Placing on CPU instead

 - Input(s), Output and Weight tensors must have quantization parameters

   Op has tensors with missing quantization parameters: sequential_2/max_pooling2d_2/MaxPool;StatefulPartitionedCall/sequential_2/max_pooling2d_2/MaxPool, sequential_2/flatten_2/Reshape;StatefulPartitionedCall/sequential_2/flatten_2/Reshape

Warning: Unsupported TensorFlow Lite semantics for FULLY_CONNECTED 'sequential_2/dense_4/Relu;StatefulPartitionedCall/sequential_2/dense_4/Relu;sequential_2/dense_4/BiasAdd;StatefulPartitionedCall/sequential_2/dense_4/BiasAdd'. Placing on CPU instead

 - Input(s), Output and Weight tensors must have quantization parameters

   Op has tensors with missing quantization parameters: sequential_2/flatten_2/Reshape;StatefulPartitionedCall/sequential_2/flatten_2/Reshape, sequential_2/dense_4/MatMul;StatefulPartitionedCall/sequential_2/dense_4/MatMul_reshape, sequential_2/dense_4/Relu;StatefulPartitionedCall/sequential_2/dense_4/Relu;sequential_2/dense_4/BiasAdd;StatefulPartitionedCall/sequential_2/dense_4/BiasAdd

Warning: Unsupported TensorFlow Lite semantics for FULLY_CONNECTED 'Identity'. Placing on CPU instead

 - Input(s), Output and Weight tensors must have quantization parameters

   Op has tensors with missing quantization parameters: sequential_2/dense_4/Relu;StatefulPartitionedCall/sequential_2/dense_4/Relu;sequential_2/dense_4/BiasAdd;StatefulPartitionedCall/sequential_2/dense_4/BiasAdd, sequential_2/dense_5/MatMul;StatefulPartitionedCall/sequential_2/dense_5/MatMul_reshape, Identity



Network summary for simple_audio_model_numpy

Accelerator configuration               Ethos_U65_256

System configuration                 internal-default

Memory mode                          internal-default

Accelerator clock                                1000 MHz





CPU operators = 7 (100.0%)

NPU operators = 0 (0.0%)



Neural network macs                                 0 MACs/batch

Network Tops/s                                    nan Tops/s



NPU cycles                                          0 cycles/batch

SRAM Access cycles                                  0 cycles/batch

DRAM Access cycles                                  0 cycles/batch

On-chip Flash Access cycles                         0 cycles/batch

Off-chip Flash Access cycles                        0 cycles/batch

Total cycles                                        0 cycles/batch



Batch Inference time                 0.00 ms,     nan inferences/s (batch size 1)



Warning: Could not write the following attributes to RESHAPE 'sequential_2/flatten_2/Reshape;StatefulPartitionedCall/sequential_2/flatten_2/Reshape' ReshapeOptions field: new_shape

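These warnings show that Vela cannot place certain TensorFlow Lite operations on the NPU. The specific cause is missing quantization parameters: the messages state that input, output and weight tensors must have quantization parameters, but some tensors in these operations (such as input_3 and sequential_2/resizing_2/resize/ResizeBilinear) lack them. Because they are unsupported, these operations are placed on the CPU instead of using the more efficient hardware accelerator (the NPU).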
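When checking several models, it can help to pull the CPU/NPU operator split out of the summary automatically. The sketch below parses the two "operators =" lines in the format printed above. It relies only on that text format as shown in this log, and the function name operator_split is hypothetical.

```python
import re


def operator_split(vela_output):
    """Extract the CPU and NPU operator counts from vela's network summary."""
    counts = {}
    pattern = r"^(CPU|NPU) operators = (\d+)"
    for target, number in re.findall(pattern, vela_output, flags=re.MULTILINE):
        counts[target] = int(number)
    return counts


# The two summary lines from the log above.
sample = "CPU operators = 7 (100.0%)\nNPU operators = 0 (0.0%)\n"
split = operator_split(sample)
if split.get("NPU", 0) == 0:
    print("No operators were placed on the NPU; the model still runs on the CPU.")
```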

We can inspect the model file with netron.app.

FireShot Capture 134 - simple_audio_model_numpy.tflite - netron.app.png

It shows that input_3 is of type float32.

In a model that vela supports, by contrast, the input has been quantized and is of type int8.
FireShot Capture 135 - kws_ref_model.tflite - netron.app.png

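To make use of the i.MX 93's NPU, the model file has to be quantized first. Of course, if the CPU inference performance of the i.MX 93 is already sufficient, this step can be skipped.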
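For readers new to quantization, the sketch below shows the affine int8 mapping that TensorFlow Lite uses for quantized tensors, real = scale * (q - zero_point), applied to single values. The scale and zero_point numbers are made up for illustration; a real model stores its own values per tensor, and the program would read them from the model rather than hard-coding them.

```python
def quantize(x, scale, zero_point):
    """Map a float to int8 using q = round(x / scale) + zero_point, clamped."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize(q, scale, zero_point):
    """Map an int8 value back to a float: real = scale * (q - zero_point)."""
    return scale * (q - zero_point)


# Hypothetical parameters for illustration only.
scale, zero_point = 0.5, 10
q = quantize(1.0, scale, zero_point)
x = dequantize(q, scale, zero_point)
```

Values outside the representable range saturate at -128 or 127, which is one reason the input normalization and the quantization parameters have to match.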

zealsoft

2024-7-8 19:50:50

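Next, I wanted to combine recording and keyword recognition in a single program.

For audio handling in Python, PyAudio is the first thing that comes to mind. However, installing PyAudio on the board ran into some trouble. The Python package repository has no prebuilt package for this board, so it has to be built on the board; PyAudio depends on PortAudio, and PortAudio has not been ported to the board, so PyAudio cannot be used for now. I will look for a solution later.

As a temporary workaround, I modified the audio test shell script mentioned earlier. It records 1 second of audio and then calls the Python program for keyword recognition: if the result is YES, it turns on the board's LED; if NO, it turns the LED off. A prompt sound is played after the LED is switched. To make debugging easier, the recording is played back right after it is captured, to confirm that it was recorded correctly.

The complete program is in the attached archive: yes-no-test.zip. The core script is as follows: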

#!/bin/bash

INIT_FLAG="/home/root/shell/audio/.initialized_audio_device"
RECORD_FILE="test.wav"
PLAY_FILE="/home/root/shell/audio/short.mp3"
RC_LOCAL_FILE="/etc/rc.local"
DELETE_COMMAND="rm -f $INIT_FLAG"

# Add the command that deletes the audio-initialization flag file to the boot script
add_command() {
    if ! grep -qFx "$DELETE_COMMAND" "$RC_LOCAL_FILE"; then
        echo "$DELETE_COMMAND" >> "$RC_LOCAL_FILE"
        sync
    fi
}

# Check whether a command exists
check_command() {
    command -v "$1" > /dev/null 2>&1
}

# Initialize the audio device
init_audio_device() {
    amixer cset name='PCM Volume' 192
    amixer cset name='Mono Mux' 'Stereo'
……
    amixer cset name='Left PGA Mux' 'DifferentialL'
    amixer cset name='Right PGA Mux' 'DifferentialR'
    # touch "$INIT_FLAG"
}

# Check whether the audio device has already been initialized
check_initialized() {
    [ -e "$INIT_FLAG" ]
}

function init_board_mic() {
    # Initialize the on-board microphone
    check_command amixer && {
        amixer -q cset name='Differential Mux' 'Line 2'
        amixer -q cset name='Left Line Mux' 'Line 2L'
        amixer -q cset name='Right Line Mux' 'Line 2R'
    }
}

function init_headphone_mic() {
    # Initialize the headset microphone
    check_command amixer && {
        amixer cset name='Differential Mux' 'Line 1'
        amixer cset name='Left Line Mux' 'Line 1L'
        amixer cset name='Right Line Mux' 'NC'
    }
}

function cleanup() {
    # Clean up and exit
    printf "\nCleaning up and exiting...\n"
    stty sane  # restore the terminal state
    exit 0
}

function switch_mode() {
    # Switch between recording and playback modes
    case $1 in
        1)
            # Enter recording mode
            check_command amixer && {
                amixer -q cset name='Left Mixer Left Bypass Switch' 'on'
                amixer -q cset name='Right Mixer Right Bypass Switch' 'on'
                amixer -q cset name='Left Mixer Left Playback Switch' 'off'
                amixer -q cset name='Right Mixer Right Playback Switch' 'off'
            }
            ;;
        2)
            # Enter playback mode
            check_command amixer && {
                amixer -q cset name='Left Mixer Left Bypass Switch' 'off'
                amixer -q cset name='Right Mixer Right Bypass Switch' 'off'
                amixer -q cset name='Left Mixer Left Playback Switch' 'on'
                amixer -q cset name='Right Mixer Right Playback Switch' 'on'
            }
            ;;
        3)
            # Turn off both recording and playback modes
            check_command amixer && {
                amixer -q cset name='Left Mixer Left Bypass Switch' 'off'
                amixer -q cset name='Right Mixer Right Bypass Switch' 'off'
                amixer -q cset name='Left Mixer Left Playback Switch' 'off'
                amixer -q cset name='Right Mixer Right Playback Switch' 'off'
                amixer -q cset name='Left Line Mux' 'NC'
                amixer -q cset name='Right Line Mux' 'NC'
            }
            ;;
    esac
}

function apply_config() {
    # printf "\nAvailable microphone test options:\n"
    # printf "1. Headset microphone\n"
    # printf "2. On-board microphone\n"

    # while true; do
    #     read -r -p "Enter your choice: " choice

    #     if [[ "$choice" == "1" || "$choice" == "2" ]]; then
    #         break
    #     else
    #         printf "Invalid input. Please enter 1 or 2.\n"
    #     fi
    # done
    choice=2  # always use the on-board microphone
    printf "\nApplying microphone configuration %s\n" "$choice"
    case $choice in
        1)
            switch_mode 1
            init_headphone_mic
            ;;
        2)
            switch_mode 1
            init_board_mic
            ;;
        *)
            printf "Invalid option\n"
            ;;
    esac
}

# Catch Ctrl+C and call cleanup
trap cleanup SIGINT

# The initialization check is disabled, so the audio device is initialized on every run
# if ! check_initialized; then
#     printf "First run, initializing the audio device...\n"
    init_audio_device
#     add_command
# fi

while true; do
    while true; do
        command=1
        case $command in
            1)
                apply_config
                printf "\nRecording...\n"
                #sleep 1
                check_command arecord && arecord -f cd -d 1 -r 16000 "$RECORD_FILE"
                switch_mode 2
                printf "\nPlaying back the recording...\n"
                check_command aplay && aplay "$RECORD_FILE"
                switch_mode 3
                # Call the Python program and capture its output
                output=$(python3 simple_audio.py --input=test.wav)
                echo "$output"

                # Check whether the output contains ">>> YES"
                if echo "$output" | grep -q ">>> YES"; then
                    echo "Python program output YES, running the corresponding code..."
                    echo 1 > /sys/class/leds/sys-led/brightness
                    echo heartbeat > /sys/class/leds/sys-led/trigger
                    # Add code to run when the output is YES here
                    switch_mode 2
                    #gst-play-1.0 haodeyiweinindakai.mp3
                    aplay haodeyiweinindakai.wav
                    switch_mode 3

                elif echo "$output" | grep -q ">>> NO"; then
                    echo "Python program output NO, running the other code..."
                    echo none > /sys/class/leds/sys-led/trigger
                    echo 0 > /sys/class/leds/sys-led/brightness
                    # Add code to run when the output is NO here
                    switch_mode 2
                    #gst-play-1.0 haodeyiweininguanbi.mp3
                    aplay haodeyiweininguanbi.wav
                    switch_mode 3
                else
                    echo "Python program output an unknown result, or no result."
                    # Optionally add code to handle unknown output here
                fi
                #break
                ;;
            2)
                switch_mode 2
                printf "\nStarting playback, press Ctrl+C to stop\n"
                gst-play-1.0 --audiosink="alsasink" "$PLAY_FILE"
                switch_mode 3
                #break
                ;;
            *)
                cleanup
                ;;
        esac
    done

    break
done
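The decision step of the script above (look for ">>> YES" or ">>> NO" in the classifier output, then write to the LED's sysfs files) could also live in Python once recording is moved there. The sketch below mirrors the same sysfs paths and values used in the script. The function names led_writes and apply_writes are hypothetical, and separating "decide" from "write" is only a design choice that lets the logic be tested off the board.

```python
LED_DIR = "/sys/class/leds/sys-led"  # LED path used by the shell script


def led_writes(classifier_output):
    """Return the (path, value) sysfs writes for a given classifier output."""
    if ">>> YES" in classifier_output:
        # Same order as the script: turn on, then enable the heartbeat trigger.
        return [(f"{LED_DIR}/brightness", "1"),
                (f"{LED_DIR}/trigger", "heartbeat")]
    if ">>> NO" in classifier_output:
        # Same order as the script: clear the trigger, then turn off.
        return [(f"{LED_DIR}/trigger", "none"),
                (f"{LED_DIR}/brightness", "0")]
    return []  # unknown or empty result: leave the LED unchanged


def apply_writes(writes):
    """Perform the writes; only meaningful on the board, typically as root."""
    for path, value in writes:
        with open(path, "w") as handle:
            handle.write(value)
```

On the board, apply_writes(led_writes(output)) would replace the if/elif block, with the prompt sounds still played by aplay.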

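Judging from the video below, the intended behavior is basically achieved. I was worried that the on-board microphone's recording quality would hurt recognition, but so far it does not seem to be a problem. Because the audio is first recorded to a file and the recording lasts only 1 second, it is still awkward to use: sometimes, if you speak a little too slowly, the word is not fully captured. This calls for improving the audio handling in Python later.

Video:

Also, the current model is pre-trained; I plan to train it on Chinese command words later to make it easier to use.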
