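5. Using iFlytek Voice Wake-up on the Voice Board

0x00 Introduction to the iFlytek Wake-up Engine

Voice wake-up lets a device (phone, toy, home appliance, and so on) detect the user's voice, specifically a preset voice command called the wake word, even while it is asleep or locked. The device then moves straight from sleep into a state where it waits for commands, which is the first step of any voice interaction. What advantages does iFlytek's wake-up engine offer, and where can it be used? The following information comes from the iFlytek website and is provided for reference:

[Figure: Advantages and application scenarios of the iFlytek wake-up engine]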

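0x01 Downloading the Voice Wake-up SDK

Before using iFlytek's wake-up engine, you need to register an account on the iFlytek Open Platform. After logging in, go to your console and create an application. Creating the application gives you an AppID, which is used to call iFlytek's voice services. The AppID matters because every SDK you download is bundled with it: an SDK must be used together with the AppID it was downloaded for, otherwise calls are likely to fail. Registration is straightforward and is not covered here; sign up directly on the iFlytek Open Platform:

https://www.xfyun.cn/

New accounts currently still come with a range of free resources. The services that the website lists as free are shown below:

[Figure: iFlytek Open Platform website]
[Figure: Free package for personal accounts]

After registering, log in to your console. The first thing to do is "Create new application"; after that you can start using the "Voice Wake-up" service:

[Figure: Creating an application]

Click "Voice Wake-up" to enter the service. Here you can set your own wake word and download the matching wake-up SDK, as shown below:

[Figure: Setting the wake word]

Before submitting a wake word, use the "wake word evaluation tool" to check how well your candidate is likely to perform, and submit it only once it scores well, as shown below:

[Figure: Evaluation results for several wake words]

As you can see, the wake word "阿里巴巴" (Alibaba) does not score well: it gets only four stars, and it is better to pick a five-star wake word. Pay particular attention to the rules for choosing a wake word:

1. Cover as many different syllables as possible, use at least four syllables, avoid similar-sounding adjacent syllables, and choose characters that are pronounced clearly and loudly.

2. Prefer phrases that rarely come up in everyday speech, which effectively lowers the false wake-up rate. For example, "凯越在线" (kǎi yuè zài xiàn) is a high-quality wake word: it covers many syllables, the syllables differ widely, and people rarely say it. By contrast, "语音在线" (yǔ yīn zài xiàn) is a poor wake word because its first two syllables sound alike.

3. English wake words are limited to a fixed vocabulary and must not go beyond the dictionary (the platform provides an English wake-word dictionary for download).

Once the wake word is set, go to the SDK download page. Be sure to select the Linux platform, as shown below:

[Figure: Downloading the voice wake-up SDK]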

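0x02 Testing the Wake-up Effect

After downloading the SDK for your wake word, copy it to the Raspberry Pi, then unpack it and look at the source. From your local computer, the following command sends the downloaded SDK to the Raspberry Pi:

scp Linux_awaken1226_5d5b9efd.zip corvin@192.168.*.*:~/

Replace the IP address at the end with your own Raspberry Pi's address. The command copies the file into the home directory on the Raspberry Pi; note that the user name on my Raspberry Pi is corvin. The full command for unpacking the zip file is:

unzip -q Linux_awaken1226_5d5b9efd.zip -d xf_awaken/

[Figure: Layout of the voice wake-up source code]

The most important step comes next: replacing the wake-up library shipped with the default code. The bundled libraries are all x86 builds intended for desktop-class machines, so a Raspberry Pi build of the library is needed before the wake-up code can be compiled and run on the Raspberry Pi. If you have already requested the Raspberry Pi version of the wake-up library from iFlytek, swap it in now. If not, you can download the Raspberry Pi library that I requested. However, the wake-up library is bound to an AppID, so every download of my copy counts against my wake-up installation quota. For that reason my version of the Raspberry Pi library is a paid download; I hope you understand.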


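Once downloaded, the library can be used permanently. Place the dynamic library in the libs directory of the source tree, as follows:

[Figure: Placing the Raspberry Pi build of the dynamic library]

With the library in place, a couple of changes are needed before compiling. First edit the 32bit_make.sh script so that the library search path points at the new location: in export LD_LIBRARY_PATH=$(pwd)/../../libs/x86/, delete the trailing x86. Because the script exports an environment variable, run it with source (for example, source 32bit_make.sh) if the setting should persist in your current shell. The resulting 32bit_make.sh is:

# Build the 32-bit executable
make clean;make
# Set the search path for the libmsc.so library
export LD_LIBRARY_PATH=$(pwd)/../../libs/

Next, edit the Makefile in the same way, changing the path of the linked library. The line to change is LDFLAGS := -L$(DIR_LIB)/x86; delete the trailing x86. The resulting Makefile is: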

#common makefile header

DIR_INC = ../../include
DIR_BIN = ../../bin
DIR_LIB = ../../libs

TARGET    = awaken_offline_sample
BIN_TARGET = $(DIR_BIN)/$(TARGET)

CROSS_COMPILE = 
CFLAGS = -g -Wall -I$(DIR_INC)

ifdef LINUX64
LDFLAGS := -L$(DIR_LIB)/x64
else
LDFLAGS := -L$(DIR_LIB)/
endif
LDFLAGS += -lmsc -lrt -ldl -lpthread -lstdc++

OBJECTS := $(patsubst %.c,%.o,$(wildcard *.c))

# recipe lines below must start with a tab character
$(BIN_TARGET) : $(OBJECTS)
	$(CROSS_COMPILE)gcc $(CFLAGS) $^ -o $@ $(LDFLAGS)

%.o : %.c
	$(CROSS_COMPILE)gcc -c $(CFLAGS) $< -o $@
clean:
	@rm -f *.o $(BIN_TARGET)

.PHONY:clean

#common makefile foot

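With these changes made, we are ready to compile. Note, however, that the wake-up demo provided by iFlytek is fairly basic. You have to place a pre-recorded audio file in the designated audio directory and then start the detection program, which checks whether the file contains the wake word and prints a log message if it does. The whole procedure is shown below; at this point no audio file has been recorded yet:

[Figure: Running the wake-up demo after compiling]

The required file format and file name can be found in the source: open awaken_offline_sample.c, as shown below:

[Figure: Path and name of the audio file]

In the audio directory, record a sample in PCM format with the command below, then run the program to check whether it contains the wake word: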

arecord -d 3 -r 16000 -c 1 -t raw -f S16_LE awake.pcm

The -d option tells arecord to record for 3 seconds, so say the wake word immediately after running the command; recording stops automatically when the 3 seconds are up.

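[Figure: Recording the wake-up test sample]

Once the sample is recorded, run the wake-up test program. A log like the one below means the program has detected the wake word:

[Figure: Wake word detected]

Note the fields reported in the wake-up result; their meanings are shown below:

[Figure: Explanation of the fields in the wake-up result]

What does the log look like when no wake word is detected? See below:

[Figure: No wake word detected in the sample]

The video below demonstrates the whole process, from recording the sample to running the detection program, to make the steps clearer:

[Video: Testing voice wake-up]

0x03 Modifying the Wake-up Program

The test above shows that the demo program is incomplete: it cannot detect the wake word in real time, and every run requires recording a sample first and then starting the detection program. That is inconvenient, so we will modify the demo to detect the wake word in real time, as the snowboy test program does. First, here is the source of the original wake-up program: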

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>

#include "../../include/msp_cmn.h"
#include "../../include/qivw.h"
#include "../../include/msp_errors.h"

#define IVW_AUDIO_FILE_NAME "audio/awake.pcm"
#define FRAME_LEN    640 // one frame of 16 kHz, 16-bit audio is 640 bytes (20 ms)


void sleep_ms(int ms)
{
    usleep(ms * 1000);
}

int cb_ivw_msg_proc( const char *sessionID, int msg, int param1, int param2, const void *info, void *userData )
{
    if (MSP_IVW_MSG_ERROR == msg) // wake-up error message
    {
        printf("\n\nMSP_IVW_MSG_ERROR errCode = %d\n\n", param1);
    }
    else if (MSP_IVW_MSG_WAKEUP == msg) // wake-up success message
    {
        printf("\n\nMSP_IVW_MSG_WAKEUP result = %s\n\n", (const char *)info);
    }
    return 0;
}

void run_ivw(const char *grammar_list, const char* audio_filename ,  const char* session_begin_params)
{
    const char *session_id = NULL;
    int err_code = MSP_SUCCESS;
    FILE *f_aud = NULL;
    long audio_size = 0;
    long real_read = 0;
    long audio_count = 0;
    int count = 0;
    int audio_stat = MSP_AUDIO_SAMPLE_CONTINUE;
    char *audio_buffer=NULL;
    char sse_hints[128];
    if (NULL == audio_filename)
    {
        printf("params error\n");
        return;
    }

    f_aud=fopen(audio_filename, "rb");
    if (NULL == f_aud)
    {
        printf("audio file open failed! \n");
        return;
    }
    fseek(f_aud, 0, SEEK_END);
    audio_size = ftell(f_aud);
    fseek(f_aud, 0, SEEK_SET);
    audio_buffer = (char *)malloc(audio_size);
    if (NULL == audio_buffer)
    {
        printf("malloc failed! \n");
        goto exit;
    }
    real_read = fread((void *)audio_buffer, 1, audio_size, f_aud);
    if (real_read != audio_size)
    {
        printf("read audio file failed!\n");
        goto exit;
    }

    session_id=QIVWSessionBegin(grammar_list, session_begin_params, &err_code);
    if (err_code != MSP_SUCCESS)
    {
        printf("QIVWSessionBegin failed! error code:%d\n",err_code);
        goto exit;
    }

    err_code = QIVWRegisterNotify(session_id, cb_ivw_msg_proc,NULL);
    if (err_code != MSP_SUCCESS)
    {
        snprintf(sse_hints, sizeof(sse_hints), "QIVWRegisterNotify errorCode=%d", err_code);
        printf("QIVWRegisterNotify failed! error code:%d\n",err_code);
        goto exit;
    }
    while(1)
    {
        long len = 10*FRAME_LEN; // 16 kHz audio, 10 frames (200 ms)
        audio_stat = MSP_AUDIO_SAMPLE_CONTINUE;
        if(audio_size <= len)
        {
            len = audio_size;
            audio_stat = MSP_AUDIO_SAMPLE_LAST; // last chunk
        }
        if (0 == audio_count)
        {
            audio_stat = MSP_AUDIO_SAMPLE_FIRST;
        }

        printf("csid=%s,count=%d,aus=%d\n",session_id, count++, audio_stat);
        err_code = QIVWAudioWrite(session_id, (const void *)&audio_buffer[audio_count], len, audio_stat);
        if (MSP_SUCCESS != err_code)
        {
            printf("QIVWAudioWrite failed! error code:%d\n",err_code);
            snprintf(sse_hints, sizeof(sse_hints), "QIVWAudioWrite errorCode=%d", err_code);
            goto exit;
        }
        if (MSP_AUDIO_SAMPLE_LAST == audio_stat)
        {
            break;
        }
        audio_count += len;
        audio_size -= len;

        sleep_ms(200); // simulate the pace of speech: 10 frames last 200 ms
    }
    snprintf(sse_hints, sizeof(sse_hints), "success");

exit:
    if (NULL != session_id)
    {
        QIVWSessionEnd(session_id, sse_hints);
    }
    if (NULL != f_aud)
    {
        fclose(f_aud);
    }
    if (NULL != audio_buffer)
    {
        free(audio_buffer);
    }
}


int main(int argc, char* argv[])
{
    int         ret       = MSP_SUCCESS;
    const char *lgi_param = "appid = 5d5b9efd,work_dir = .";
    const char *ssb_param = "ivw_threshold=0:1450,sst=wakeup,ivw_res_path =fo|res/ivw/wakeupresource.jet";

    ret = MSPLogin(NULL, NULL, lgi_param);
    if (MSP_SUCCESS != ret)
    {
        printf("MSPLogin failed, error code: %d.\n", ret);
        goto exit ; // login failed, log out and quit
    }
    printf("\n###############################################################################################################\n");
    printf("## Note: record the wake-up audio yourself, rename it to the name given by IVW_AUDIO_FILE_NAME, and put it in bin/audio ##\n");
    printf("###############################################################################################################\n\n");
    run_ivw(NULL, IVW_AUDIO_FILE_NAME, ssb_param); 

    sleep_ms(2000);
exit:
    printf("Press Enter to exit ...\n");
    getchar();
    MSPLogout(); // log out
    return 0;
}

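The main API call flow of the wake-up program above is shown in the figure below, which makes the overall flow easier to follow:

[Figure: API call flow for wake-word detection]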

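As the flow shows, detecting the wake word continuously means calling QIVWAudioWrite() over and over, feeding in recorded audio as it arrives. Here we use the linuxrec.c recording code and make small changes to the detection code so that it can listen without interruption. The linuxrec.c recording code is: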

/*
@file
@brief  record demo for linux
@author        taozhang9
@date        2016/05/27
*/
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <alsa/asoundlib.h>
#include <signal.h>
#include <sys/stat.h>
#include <pthread.h>
#include "../../include/formats.h"
#include "../../include/linuxrec.h"
#define DBG_ON 1
#if DBG_ON
#define dbg  printf
#else
#define dbg
#endif
/* Do not change the sequence */
enum {
    RECORD_STATE_CREATED,   /* Init     */
    RECORD_STATE_CLOSING,
    RECORD_STATE_READY,     /* Opened   */
    RECORD_STATE_STOPPING,  /* During Stop  */
    RECORD_STATE_RECORDING, /* Started  */
};
#define SAMPLE_RATE  16000
#define SAMPLE_BIT_SIZE 16
#define FRAME_CNT   10
//#define BUF_COUNT   1
#define DEF_BUFF_TIME  500000
#define DEF_PERIOD_TIME 100000
static int show_xrun = 1;
static int start_record_internal(snd_pcm_t *pcm)
{
    return snd_pcm_start(pcm);
}
static int stop_record_internal(snd_pcm_t *pcm)
{
    return snd_pcm_drop(pcm);
}
static int is_stopped_internal(struct recorder *rec)
{
    snd_pcm_state_t state;
    state =  snd_pcm_state((snd_pcm_t *)rec->wavein_hdl);
    switch (state) {
    case SND_PCM_STATE_RUNNING:
    case SND_PCM_STATE_DRAINING:
        return 0;
    default: break;
    }
    return 1;
}
static int format_ms_to_alsa(const WAVEFORMATEX * wavfmt,
                        snd_pcm_format_t * format)
{
    snd_pcm_format_t tmp;
    tmp = snd_pcm_build_linear_format(wavfmt->wBitsPerSample,
            wavfmt->wBitsPerSample, wavfmt->wBitsPerSample == 8 ? 1 : 0, 0);
    if ( tmp == SND_PCM_FORMAT_UNKNOWN )
        return -EINVAL;
    *format = tmp;
    return 0;
}
/* set hardware and software params */
static int set_hwparams(struct recorder * rec,  const WAVEFORMATEX *wavfmt,
            unsigned int buffertime, unsigned int periodtime)
{
    snd_pcm_hw_params_t *params;
    int err;
    unsigned int rate;
    snd_pcm_format_t format;
    snd_pcm_uframes_t size;
    snd_pcm_t *handle = (snd_pcm_t *)rec->wavein_hdl;
    rec->buffer_time = buffertime;
    rec->period_time = periodtime;
    snd_pcm_hw_params_alloca(&params);
    err = snd_pcm_hw_params_any(handle, params);
    if (err < 0) {
        dbg("Broken configuration for this PCM");
        return err;
    }
    err = snd_pcm_hw_params_set_access(handle, params,
                       SND_PCM_ACCESS_RW_INTERLEAVED);
    if (err < 0) {
        dbg("Access type not available");
        return err;
    }
    err = format_ms_to_alsa(wavfmt, &format);
    if (err) {
        dbg("Invalid format");
        return - EINVAL;
    }
    err = snd_pcm_hw_params_set_format(handle, params, format);
    if (err < 0) {
        dbg("Sample format non available");
        return err;
    }
    err = snd_pcm_hw_params_set_channels(handle, params, wavfmt->nChannels);
    if (err < 0) {
        dbg("Channels count non available");
        return err;
    }
    rate = wavfmt->nSamplesPerSec;
    err = snd_pcm_hw_params_set_rate_near(handle, params, &rate, 0);
    if (err < 0) {
        dbg("Set rate failed");
        return err;
    }
    if(rate != wavfmt->nSamplesPerSec) {
        dbg("Rate mismatch");
        return -EINVAL;
    }
    if (rec->buffer_time == 0 || rec->period_time == 0) {
        err = snd_pcm_hw_params_get_buffer_time_max(params,
                            &rec->buffer_time, 0);
        assert(err >= 0);
        if (rec->buffer_time > 500000)
            rec->buffer_time = 500000;
        rec->period_time = rec->buffer_time / 4;
    }
    err = snd_pcm_hw_params_set_period_time_near(handle, params,
                         &rec->period_time, 0);
    if (err < 0) {
        dbg("set period time fail");
        return err;
    }
    err = snd_pcm_hw_params_set_buffer_time_near(handle, params,
                         &rec->buffer_time, 0);
    if (err < 0) {
        dbg("set buffer time failed");
        return err;
    }
    err = snd_pcm_hw_params_get_period_size(params, &size, 0);
    if (err < 0) {
        dbg("get period size fail");
        return err;
    }
    rec->period_frames = size;
    err = snd_pcm_hw_params_get_buffer_size(params, &size);
    if (size == rec->period_frames) {
        dbg("Can't use period equal to buffer size (%lu == %lu)",
                      size, rec->period_frames);
        return -EINVAL;
    }
    rec->buffer_frames = size;
    rec->bits_per_frame = wavfmt->wBitsPerSample;
    /* set to driver */
    err = snd_pcm_hw_params(handle, params);
    if (err < 0) {
        dbg("Unable to install hw params:");
        return err;
    }
    return 0;
}
static int set_swparams(struct recorder * rec)
{
    int err;
    snd_pcm_sw_params_t *swparams;
    snd_pcm_t * handle = (snd_pcm_t*)(rec->wavein_hdl);
    /* sw para */
    snd_pcm_sw_params_alloca(&swparams);
    err = snd_pcm_sw_params_current(handle, swparams);
    if (err < 0) {
        dbg("get current sw para fail");
        return err;
    }
    err = snd_pcm_sw_params_set_avail_min(handle, swparams,
                        rec->period_frames);
    if (err < 0) {
        dbg("set avail min failed");
        return err;
    }
    /* set a value bigger than the buffer frames to prevent the auto start.
     * we use the snd_pcm_start to explicit start the pcm */
    err = snd_pcm_sw_params_set_start_threshold(handle, swparams,
            rec->buffer_frames * 2);
    if (err < 0) {
        dbg("set start threshold fail");
        return err;
    }
    if ( (err = snd_pcm_sw_params(handle, swparams)) < 0) {
        dbg("unable to install sw params:");
        return err;
    }
    return 0;
}
static int set_params(struct recorder *rec, WAVEFORMATEX *fmt,
        unsigned int buffertime, unsigned int periodtime)
{
    int err;
    WAVEFORMATEX defmt =
{WAVE_FORMAT_PCM,    1, 16000, 32000, 2, 16, sizeof(WAVEFORMATEX)};
    if (fmt == NULL) {
        fmt = &defmt;
    }
    err = set_hwparams(rec, fmt, buffertime, periodtime);
    if (err)
        return err;
    err = set_swparams(rec);
    if (err)
        return err;
    return 0;
}
/*
 *   Underrun and suspend recovery
 */
static int xrun_recovery(snd_pcm_t *handle, int err)
{
    if (err == -EPIPE) {    /* over-run */
        if (show_xrun)
            printf("!!!!!!overrun happened!!!!!!");
        err = snd_pcm_prepare(handle);
        if (err < 0) {
            if (show_xrun)
                printf("Can't recover from overrun,"
                "prepare failed: %s\n", snd_strerror(err));
            return err;
        }
        return 0;
    } else if (err == -ESTRPIPE) {
        while ((err = snd_pcm_resume(handle)) == -EAGAIN)
            usleep(200000); /* wait until the suspend flag is released */
        if (err < 0) {
            err = snd_pcm_prepare(handle);
            if (err < 0) {
                if (show_xrun)
                    printf("Can't recover from suspend,"
                    "prepare failed: %s\n", snd_strerror(err));
                return err;
            }
        }
        return 0;
    }
    return err;
}
static ssize_t pcm_read(struct recorder *rec, size_t rcount)
{
    ssize_t r;
    size_t count = rcount;
    char *data;
    snd_pcm_t *handle = (snd_pcm_t *)rec->wavein_hdl;
    if(!handle)
        return -EINVAL;
    data = rec->audiobuf;
    while (count > 0) {
        r = snd_pcm_readi(handle, data, count);
        if (r == -EAGAIN || (r >= 0 && (size_t)r < count)) {
            snd_pcm_wait(handle, 100);
        } else if (r < 0) {
            if(xrun_recovery(handle, r) < 0) {
                return -1;
            }
        }
        if (r > 0) {
            count -= r;
            data += r * rec->bits_per_frame / 8;
        }
    }
    return rcount;
}
static void * record_thread_proc(void * para)
{
    struct recorder * rec = (struct recorder *) para;
    size_t frames, bytes;
    sigset_t mask, oldmask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    sigaddset(&mask, SIGTERM);
    pthread_sigmask(SIG_BLOCK, &mask, &oldmask);
    while(1) {
        frames = rec->period_frames;
        bytes = frames * rec->bits_per_frame / 8;
        /* closing, exit the thread */
        if (rec->state == RECORD_STATE_CLOSING)
            break;
        if(rec->state < RECORD_STATE_RECORDING)
            usleep(100000);
        if (pcm_read(rec, frames) != frames) {
            return NULL;
        }
        if (rec->on_data_ind)
            rec->on_data_ind(rec->audiobuf, bytes,
                    rec->user_cb_para);
    }
    return rec;
}
static int create_record_thread(void * para, pthread_t * tidp)
{
    int err;
    err = pthread_create(tidp, NULL, record_thread_proc, (void *)para);
    if (err != 0)
        return err;
    return 0;
}
static void free_rec_buffer(struct recorder * rec)
{
    if (rec->audiobuf) {
        free(rec->audiobuf);
        rec->audiobuf = NULL;
    }
}
static int prepare_rec_buffer(struct recorder * rec )
{
    /* the read and QISRWrite is blocked, currently only support one buffer,
     * if overrun too much, need more buffer and another new thread
     * to write the audio to network */
    size_t sz = (rec->period_frames * rec->bits_per_frame / 8);
    rec->audiobuf = (char *)malloc(sz);
    if(!rec->audiobuf)
        return -ENOMEM;
    return 0;
}
static int open_recorder_internal(struct recorder * rec,
        record_dev_id dev, WAVEFORMATEX * fmt)
{
    int err = 0;
    err = snd_pcm_open((snd_pcm_t **)&rec->wavein_hdl, dev.u.name,
            SND_PCM_STREAM_CAPTURE, 0);
    if(err < 0)
        goto fail;
    err = set_params(rec, fmt, DEF_BUFF_TIME, DEF_PERIOD_TIME);
    if(err)
        goto fail;
    assert(rec->bufheader == NULL);
    err = prepare_rec_buffer(rec);
    if(err)
        goto fail;
    err = create_record_thread((void*)rec,
            &rec->rec_thread);
    if(err)
        goto fail;
    return 0;
fail:
    if(rec->wavein_hdl)
        snd_pcm_close((snd_pcm_t *) rec->wavein_hdl);
    rec->wavein_hdl = NULL;
    free_rec_buffer(rec);
    return err;
}
static void close_recorder_internal(struct recorder *rec)
{
    snd_pcm_t * handle;
    handle = (snd_pcm_t *) rec->wavein_hdl;
    /* may be the thread is blocked at read, cancel it */
    pthread_cancel(rec->rec_thread);
    /* wait for the pcm thread quit first */
    pthread_join(rec->rec_thread, NULL);
    if(handle) {
        snd_pcm_close(handle);
        rec->wavein_hdl = NULL;
    }
    free_rec_buffer(rec);
}
/* return the count of pcm device */
/* list all cards */
static int get_pcm_device_cnt(snd_pcm_stream_t stream)
{
    void **hints, **n;
    char *io, *filter, *name;
    int cnt = 0;
    if (snd_device_name_hint(-1, "pcm", &hints) < 0)
        return 0;
    n = hints;
    filter = stream == SND_PCM_STREAM_CAPTURE ? "Input" : "Output";
    while (*n != NULL) {
        io = snd_device_name_get_hint(*n, "IOID");
        name = snd_device_name_get_hint(*n, "NAME");
        if (name && (io == NULL || strcmp(io, filter) == 0))
            cnt ++;
        if (io != NULL)
            free(io);
        if (name != NULL)
            free(name);
        n++;
    }
    snd_device_name_free_hint(hints);
    return cnt;
}
/* -------------------------------------
 * Interfaces
 --------------------------------------*/
/* the device id is a pcm string name in linux */
record_dev_id  get_default_input_dev()
{
    record_dev_id id;
    id.u.name = "default";
    return id;
}
record_dev_id * list_input_device()
{
    // TODO: unimplemented
    return NULL;
}
int get_input_dev_num()
{
    return get_pcm_device_cnt(SND_PCM_STREAM_CAPTURE);
}
/* callback will be run on a new thread */
int create_recorder(struct recorder ** out_rec,
                void (*on_data_ind)(char *data, unsigned long len, void *user_cb_para),
                void* user_cb_para)
{
    struct recorder * myrec;
    myrec = (struct recorder *)malloc(sizeof(struct recorder));
    if(!myrec)
        return -RECORD_ERR_MEMFAIL;
    memset(myrec, 0, sizeof(struct recorder));
    myrec->on_data_ind = on_data_ind;
    myrec->user_cb_para = user_cb_para;
    myrec->state = RECORD_STATE_CREATED;
    *out_rec = myrec;
    return 0;
}
void destroy_recorder(struct recorder *rec)
{
    if(!rec)
        return;
    free(rec);
}
int open_recorder(struct recorder * rec, record_dev_id dev, WAVEFORMATEX * fmt)
{
    int ret = 0;
    if(!rec )
        return -RECORD_ERR_INVAL;
    if(rec->state >= RECORD_STATE_READY)
        return 0;
    ret = open_recorder_internal(rec, dev, fmt);
    if(ret == 0)
        rec->state = RECORD_STATE_READY;
    return ret;
}
void close_recorder(struct recorder *rec)
{
    if(rec == NULL || rec->state < RECORD_STATE_READY)
        return;
    if(rec->state == RECORD_STATE_RECORDING)
        stop_record(rec);
    rec->state = RECORD_STATE_CLOSING;
    close_recorder_internal(rec);
    rec->state = RECORD_STATE_CREATED;
}
int start_record(struct recorder * rec)
{
    int ret;
    if(rec == NULL)
        return -RECORD_ERR_INVAL;
    if( rec->state < RECORD_STATE_READY)
        return -RECORD_ERR_NOT_READY;
    if( rec->state == RECORD_STATE_RECORDING)
        return 0;
    ret = start_record_internal((snd_pcm_t *)rec->wavein_hdl);
    if(ret == 0)
        rec->state = RECORD_STATE_RECORDING;
    return ret;
}
int stop_record(struct recorder * rec)
{
    int ret;
    if(rec == NULL)
        return -RECORD_ERR_INVAL;
    if( rec->state < RECORD_STATE_RECORDING)
        return 0;
    rec->state = RECORD_STATE_STOPPING;
    ret = stop_record_internal((snd_pcm_t *)rec->wavein_hdl);
    if(ret == 0) {
        rec->state = RECORD_STATE_READY;
    }
    return ret;
}
int is_record_stopped(struct recorder *rec)
{
    if(rec->state == RECORD_STATE_RECORDING)
        return 0;
    return is_stopped_internal(rec);
}

The corresponding header file is linuxrec.h, with the following contents:

/*
 * @file
 * @brief a record demo in linux
 *
 * a simple record code. using alsa-lib APIs.
 * keep the function same as winrec.h
 *
 * Common steps:
 *    create_recorder,
 *    open_recorder, 
 *    start_record, 
 *    stop_record, 
 *    close_recorder,
 *    destroy_recorder
 *
 * @author        taozhang9
 * @date        2016/06/01
 */

#ifndef __IFLY_WINREC_H__
#define __IFLY_WINREC_H__

#include <pthread.h> /* pthread_t, used in struct recorder */
#include "formats.h"
/* error code */
enum {
    RECORD_ERR_BASE = 0,
    RECORD_ERR_GENERAL,
    RECORD_ERR_MEMFAIL,
    RECORD_ERR_INVAL,
    RECORD_ERR_NOT_READY
};

typedef struct {
    union {
        char *  name;
        int index;
        void *  resv;
    }u;
}record_dev_id;

/* recorder object. */
struct recorder {
    void (*on_data_ind)(char *data, unsigned long len, void *user_para);
    void * user_cb_para;
    volatile int state;     /* internal record state */

    void * wavein_hdl;
    /* thread id may be a struct. by implementation 
     * void * will not be ported!! */
    pthread_t rec_thread; 
    /*void * rec_thread_hdl;*/

    void * bufheader;
    unsigned int bufcount; 

    char *audiobuf;
    int bits_per_frame;
    unsigned int buffer_time;
    unsigned int period_time;
    size_t period_frames;
    size_t buffer_frames;
};

#ifdef __cplusplus
extern "C" {
#endif /* C++ */

/** 
 * @fn
 * @brief    Get the default input device ID
 *
 * @return    returns "default" in linux.
 *
 */
record_dev_id get_default_input_dev();

/**
 * @fn 
 * @brief    Get the total number of active input devices.
 * @return    
 */
int get_input_dev_num();

/**
 * @fn 
 * @brief    Create a recorder object.
 *
 * Never call the close_recorder in the callback function. as close
 * action will wait for the callback thread to quit. 
 *
 * @return    int         - Return 0 in success, otherwise return error code.
 * @param    out_rec     - [out] recorder object holder
 * @param    on_data_ind - [in]  callback. called when data coming.
 * @param    user_cb_para    - [in] user params for the callback.
 * @see
 */
int create_recorder(struct recorder ** out_rec, 
                void (*on_data_ind)(char *data, unsigned long len, void *user_para), 
                void* user_cb_para);

/**
 * @fn 
 * @brief    Destroy recorder object. free memory. 
 * @param    rec - [in]recorder object
 */
void destroy_recorder(struct recorder *rec);

/**
 * @fn 
 * @brief    open the device.
 * @return    int         - Return 0 in success, otherwise return error code.
 * @param    rec         - [in] recorder object
 * @param    dev         - [in] device id, from 0.
 * @param    fmt         - [in] record format.
 * @see
 *     get_default_input_dev()
 */
int open_recorder(struct recorder * rec, record_dev_id dev, WAVEFORMATEX * fmt);

/**
 * @fn
 * @brief    close the device.
 * @param    rec         - [in] recorder object
 */

void close_recorder(struct recorder *rec);

/**
 * @fn
 * @brief    start record.
 * @return    int         - Return 0 in success, otherwise return error code.
 * @param    rec         - [in] recorder object
 */
int start_record(struct recorder * rec);

/**
 * @fn
 * @brief    stop record.
 * @return    int         - Return 0 in success, otherwise return error code.
 * @param    rec         - [in] recorder object
 */
int stop_record(struct recorder * rec);

/**
 * @fn
 * @brief    test if the recording has been stopped.
 * @return    int         - 1: stopped. 0 : recording.
 * @param    rec         - [in] recorder object
 */
int is_record_stopped(struct recorder *rec);

#ifdef __cplusplus
} /* extern "C" */    
#endif /* C++ */

#endif

There is also a companion header, formats.h, which describes the WAV format. Its source is:

#ifndef FORMATS_H_160601_TT
#define FORMATS_H_160601_TT        1

#ifndef WAVE_FORMAT_PCM  
#define WAVE_FORMAT_PCM  1
typedef struct tWAVEFORMATEX {
    unsigned short    wFormatTag;
    unsigned short    nChannels;
    unsigned int      nSamplesPerSec;
    unsigned int      nAvgBytesPerSec;
    unsigned short    nBlockAlign;
    unsigned short    wBitsPerSample;
    unsigned short    cbSize;
} WAVEFORMATEX;
#endif

#endif

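Pay attention to where these three new files go: the two headers belong in the include directory, and linuxrec.c goes alongside the wake-up test source. With that in place, we can call the recording functions in linuxrec.c to start recording and keep passing the recorded audio to the wake-up detection function. The complete code is: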

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>


#include "../../include/msp_cmn.h"
#include "../../include/qivw.h"
#include "../../include/msp_errors.h"
#include "../../include/linuxrec.h"
#include "../../include/formats.h"


#define SAMPLE_RATE_16K     (16000)

#define DEFAULT_FORMAT    \
{\
    WAVE_FORMAT_PCM,\
    1,          \
    16000,      \
    32000,      \
    2,          \
    16,         \
    sizeof(WAVEFORMATEX)\
}

struct recorder *recorder = NULL;

void sleep_ms(int ms)
{
    usleep(ms * 1000);
}

/* the record call back */
void record_data_cb(char *data, unsigned long len, void *user_para)
{
    int errcode = 0;
    const char *session_id = (const char *)user_para;

    if(len == 0 || data == NULL)
        return;

    errcode = QIVWAudioWrite(session_id, (const void *)data, len, MSP_AUDIO_SAMPLE_CONTINUE);
    if (MSP_SUCCESS != errcode)
    {
        printf("QIVWAudioWrite failed! error code:%d\n",errcode);
        int ret = stop_record(recorder);
        if (ret != 0) {
            printf("Stop failed! \n");
        }
        QIVWAudioWrite(session_id, NULL, 0, MSP_AUDIO_SAMPLE_LAST);
    }
}

int cb_ivw_msg_proc( const char *sessionID, int msg, int param1, int param2, const void *info, void *userData )
{
  if (MSP_IVW_MSG_ERROR == msg) // wake-up error message
  {
    printf("\n\nMSP_IVW_MSG_ERROR errCode = %d\n\n", param1);
  }else if (MSP_IVW_MSG_WAKEUP == msg) // wake-up success message
  {
    //printf("\n\nMSP_IVW_MSG_WAKEUP result = %s\n\n", (char*)info);
    system("play ~/Music/ding.wav"); // play a prompt tone; the play command comes from SoX
  }

  return 0;
}


void run_ivw(const char* session_begin_params)
{
    const char *session_id = NULL;
    int err_code = MSP_SUCCESS;
    char sse_hints[128] = ""; // initialized so the exit path never reads an unset buffer

    WAVEFORMATEX wavfmt = DEFAULT_FORMAT;
    wavfmt.nSamplesPerSec = SAMPLE_RATE_16K;
    wavfmt.nAvgBytesPerSec = wavfmt.nBlockAlign * wavfmt.nSamplesPerSec;

    //start QIVW
    session_id=QIVWSessionBegin(NULL, session_begin_params, &err_code);
    if (err_code != MSP_SUCCESS)
    {
        printf("QIVWSessionBegin failed! error code:%d\n",err_code);
        goto exit;
    }

    err_code = QIVWRegisterNotify(session_id, cb_ivw_msg_proc, NULL);
    if (err_code != MSP_SUCCESS)
    {
        snprintf(sse_hints, sizeof(sse_hints), "QIVWRegisterNotify errorCode=%d", err_code);
        printf("QIVWRegisterNotify failed! error code:%d\n",err_code);
        goto exit;
    }

    //1.create recorder
    err_code = create_recorder(&recorder, record_data_cb, (void*)session_id);
    if (recorder == NULL || err_code != 0)
    {
            printf("create recorder failed: %d\n", err_code);
            err_code = MSP_ERROR_FAIL;
            goto exit;
    }

    //2.open_recorder
    err_code = open_recorder(recorder, get_default_input_dev(), &wavfmt);
    if (err_code != 0)
    {
        printf("recorder open failed: %d\n", err_code);
        err_code = MSP_ERROR_FAIL;
        goto exit;
    }

    //3.start record
    err_code = start_record(recorder);
    if (err_code != 0) {
        printf("start record failed: %d\n", err_code);
        err_code = MSP_ERROR_FAIL;
        goto exit;
    }

    while(1)
    {
        sleep_ms(2000); 
        printf("Listening... Press Ctrl+C to exit\n");
    }
    snprintf(sse_hints, sizeof(sse_hints), "success");

exit:
    if (recorder)
    {
        if(!is_record_stopped(recorder))
            stop_record(recorder);
        close_recorder(recorder);
        destroy_recorder(recorder);
        recorder = NULL;
    }
    if (NULL != session_id)
    {
        QIVWSessionEnd(session_id, sse_hints);
    }
}


int main(int argc, char* argv[])
{
    int         ret       = MSP_SUCCESS;
    const char *lgi_param = "appid = 5d5b9efd, work_dir = .";
    const char *ssb_param = "ivw_threshold=0:1450, sst=wakeup, ivw_res_path =fo|res/ivw/wakeupresource.jet";


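    ret = MSPLogin(NULL, NULL, lgi_param);
    if (MSP_SUCCESS != ret)
    {
        printf("MSPLogin failed, error code: %d.\n", ret);
        MSPLogout(); // login failed, log out
        return 1;    // do not start detection without a valid login
    }

    run_ivw(ssb_param);
    MSPLogout(); // log out once detection has finished
    return 0;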
}

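The logic of the code above is as follows. Using the recording functions implemented in linuxrec.c, the program creates a recorder, opens the recording device, and then starts recording. A callback is registered for the recording: whenever audio data arrives, record_data_cb() is called to handle it, and it simply hands the data to QIVWAudioWrite() for wake-word detection. Detection has its own callback too: when the recorded audio contains the wake word, cb_ivw_msg_proc() is invoked. That callback does not fire only on a detection, though. If the detection engine hits an error, it delivers an MSP_IVW_MSG_ERROR message; only an MSP_IVW_MSG_WAKEUP message means the wake word was actually detected. The wake-up prompt is therefore issued inside cb_ivw_msg_proc(), and any follow-up action after wake-up can be added in that callback.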

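0x04 Compiling the Program and Testing the Wake-up Effect

The last step is compiling the program, but the Makefile needs one more change first. Because the code now calls ALSA library APIs, the -lasound option must be added so that the program links against the corresponding dynamic library (on Raspberry Pi OS the ALSA headers typically come from the libasound2-dev package). The complete Makefile is: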

#common makefile header

DIR_INC = ../../include
DIR_BIN = ../../bin
DIR_LIB = ../../libs

TARGET    = awaken_offline_sample
BIN_TARGET = $(DIR_BIN)/$(TARGET)

CROSS_COMPILE = 
CFLAGS = -g -Wall -I$(DIR_INC)

LDFLAGS := -L$(DIR_LIB)/
LDFLAGS += -lmsc -lrt -ldl -lpthread -lstdc++ -lasound

OBJECTS := $(patsubst %.c,%.o,$(wildcard *.c))

# recipe lines below must start with a tab character
$(BIN_TARGET) : $(OBJECTS)
	$(CROSS_COMPILE)gcc $(CFLAGS) $^ -o $@ $(LDFLAGS)

%.o : %.c
	$(CROSS_COMPILE)gcc -c $(CFLAGS) $< -o $@
clean:
	@rm -f *.o $(BIN_TARGET)

.PHONY:clean

#common makefile foot

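Before compiling, take one more look at where each file lives and at the build process, as shown below:

[Figure: Final source layout]

Now compile the source. I renamed the earlier bash script; the whole compile-and-run process is shown below:

[Figure: Compiling and running wake-word detection]

Screenshots cannot convey how well iFlytek's wake-up works, so the video below demonstrates it. The wake word I set is "灵龟机器人", the one that scored five stars in the evaluation shown earlier:

[Video: iFlytek wake-up demonstration]

As the video shows, with both iFlytek's wake-up engine and snowboy, the wake-up threshold has to be tuned for each wake word; otherwise false wake-ups occur. Getting the best result still takes repeated adjustment and testing. For iFlytek's wake-up threshold, refer to the official documentation, as shown below:

[Figure: iFlytek wake-word threshold settings]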

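0x05 Downloading the Wake-up Source

The modified test source described above can be downloaded from the following repository:

https://code.corvin.cn/corvin_zhang/AIVoiceSystem

[Figure: Downloading the source code]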

0x06 References

[1] iFlytek Open Platform website. https://www.xfyun.cn/

[2] iFlytek voice wake-up SDK documentation. https://www.xfyun.cn/doc/asr/awaken/Linux-SDK.html

[3] Introduction to iFlytek voice wake-up. https://www.xfyun.cn/services/awaken?type=awaken

[4] iFlytek real-time voice wake-up and offline command-word recognition on Linux and ROS. https://haoqchen.site/2018/04/26/iflytek-awaken-asr/

[5] API reference for the voice wake-up header qivw.h. http://mscdoc.xfyun.cn/windows/api/iFlytekMSCReferenceManual/qivw_8h.html#details


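0x07 Feedback

If you run into any problems while following this tutorial, leave me a comment at the end of the article, or follow the official ROS小课堂 WeChat account and send me a message there. I go through the messages on the account nearly every day. If you would also like to tip ROS小课堂, I would be very grateful; a tip of 30 yuan also gets you an invitation to the ROS小课堂 WeChat group, where you can learn and exchange ideas with more like-minded people.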
