Project overview: Faster Whisper is an efficient reimplementation of the OpenAI Whisper model built on CTranslate2, a fast inference engine for Transformer models. Compared with the original openai/whisper implementation, it is up to 4x faster. The project supports Windows, Linux, and macOS, and offers several optimization options, such as the FP16 and INT8 compute types, to suit different hardware.
Hardware platform: QCS6490
1. Environment Setup
Open a terminal and run the following commands to install Faster Whisper:
sudo apt update && sudo apt install -y python3-pip ffmpeg
# Inference runs on the CPU here, so install the CPU-optimized CTranslate2 and Faster-Whisper
pip install ctranslate2 --no-deps  # make sure no GPU-related dependencies get pulled in
pip install faster-whisper
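Optionally, you can sanity-check the CPU-only install before writing any script by verifying that both packages import cleanly (this check is an addition, not part of the original steps):

python3 -c "import ctranslate2, faster_whisper; print(ctranslate2.__version__)"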
2. Prepare the Inference Script
Write (or pull) a script, which can be named test.py:
from faster_whisper import WhisperModel
import sys
import time

def main():
    # Get the audio file name
    if len(sys.argv) > 1:
        filename = sys.argv[1]
    else:
        filename = input("Enter the name of the audio file to transcribe: ")

    # Choose a model size, e.g. "base", "small", "medium", "large-v3"
    model_size = "small"

    # Load the model and time the load
    load_start = time.perf_counter()
    model = WhisperModel(
        model_size,
        device="cpu",
        compute_type="int8"
    )
    load_duration = time.perf_counter() - load_start
    print(f"Model load time: {load_duration:.2f}s")

    # Start timing the transcription
    transcribe_start = time.perf_counter()
    # Transcribe with automatic language detection
    segments, info = model.transcribe(filename, beam_size=5)
    # transcribe() returns a lazy generator; consume all segments now
    # so the timing below is accurate
    segments = list(segments)
    # Stop timing
    transcribe_duration = time.perf_counter() - transcribe_start

    # Print the results
    print(f"\nDetected language: {info.language} (probability: {info.language_probability:.2f})")
    print(f"Audio duration: {info.duration:.2f}s")
    print(f"Transcription time: {transcribe_duration:.2f}s")
    print(f"Total time (incl. load): {load_duration + transcribe_duration:.2f}s\n")

    # Print each transcribed segment
    for segment in segments:
        print(f"[{segment.start:6.2f}s -> {segment.end:6.2f}s] {segment.text.strip()}")

if __name__ == "__main__":
    main()
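The transcribe() call above only sets beam_size; faster-whisper exposes a few more options that can help on long or noisy recordings. A minimal sketch of common variations (the parameter names below exist in the faster-whisper API; the values are just illustrative):

# optional variations of the transcribe() call in test.py
segments, info = model.transcribe(
    filename,
    beam_size=5,
    language="zh",         # force a language instead of auto-detection
    vad_filter=True,       # use the built-in VAD to skip long silences
    word_timestamps=True,  # per-word timing, exposed via segment.words
)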
3. Run the Test
Download any audio file from your browser and place it in the same directory as the test script. The script detects the language automatically:
python3 test.py youshengshu.wma
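Note that on the first run WhisperModel downloads the weights from Hugging Face, so the board needs network access at that point. To keep the weights in a known directory and reuse them later, you can pass download_root (a WhisperModel parameter; the "./models" path here is just an example):

model = WhisperModel("small", device="cpu", compute_type="int8", download_root="./models")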
Using sherpa-onnx directly to run speech recognition on a phone under AidLux beats Faster Whisper many times over.
Presumably that's because only the CPU is used for inference here; the GPU dependencies have specific system-environment requirements, so this avoids unnecessary environment conflicts.
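As an optional check, CTranslate2 can report which compute types the CPU build actually supports on a given board (get_supported_compute_types is part of its Python API):

python3 -c "import ctranslate2; print(ctranslate2.get_supported_compute_types('cpu'))"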
aidlux@localhost:~$ uname -a
Linux localhost 5.4.0-aidlite #2 SMP PREEMPT Tue Apr 13 12:47:09 KST 2021 aarch64 aarch64 aarch64 GNU/Linux
aidlux@localhost:~$ python3 -V
Python 3.8.10
aidlux@localhost:~$ pip install ctranslate2 --no-deps
Collecting ctranslate2
Using cached ctranslate2-4.5.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (10 kB)
Downloading ctranslate2-4.5.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (17.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.2/17.2 MB 6.0 MB/s eta 0:00:00
Installing collected packages: ctranslate2
Successfully installed ctranslate2-4.5.0
aidlux@localhost:~$ pip install faster-whisper
Collecting faster-whisper
Using cached faster_whisper-1.1.0-py3-none-any.whl.metadata (16 kB)
Requirement already satisfied: ctranslate2<5,>=4.0 in /usr/local/lib/python3.8/dist-packages (from faster-whisper) (4.5.0)
Collecting huggingface-hub>=0.13 (from faster-whisper)
Using cached huggingface_hub-0.34.4-py3-none-any.whl.metadata (14 kB)
Collecting tokenizers<1,>=0.13 (from faster-whisper)
Using cached tokenizers-0.21.0.tar.gz (343 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... error
error: subprocess-exited-with-error
× pip subprocess to install backend dependencies did not run successfully.
│ exit code: 1
╰─> [2 lines of output]
ERROR: Could not find a version that satisfies the requirement puccinialin (from versions: none)
ERROR: No matching distribution found for puccinialin
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× pip subprocess to install backend dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
This might be pip cache or dependency-index pollution causing the tokenizers build to mistakenly resolve puccinialin as a dependency. Try clearing the possibly polluted cache first with pip cache purge, then reinstall. (In this log, another likely cause is that tokenizers 0.21 ships no Python 3.8 wheels for aarch64, so pip falls back to a source build whose backend needs puccinialin.) Finally, remember that a proxy/VPN may be needed when the model is downloaded at run time.
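A sketch of that recovery path, with a fallback pin as a last resort (the pin assumes an older tokenizers release still ships Python 3.8 aarch64 wheels, which has not been verified here):

pip cache purge                                # clear any polluted wheel/metadata cache
pip install --no-cache-dir "tokenizers<0.21"   # fallback: avoid the tokenizers 0.21 source build
pip install faster-whisper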