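To give a robot voice question answering, this post calls Baidu AI's speech recognition service.
The approach is simple: use Baidu's API, initialize a client, then pass in the parameters and make the call.
Code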
- import wave
- import pyaudio
- from aip import AipSpeech, AipNlp
- from playsound import playsound
-
- # Your APP ID, API key (AK), and secret key (SK)
- APP_ID = '****'
- API_KEY = '****'
- SECRET_KEY = '****'
-
- # Read a file and return its raw bytes
- def get_file_content(filePath):
-     with open(filePath, 'rb') as fp:
-         return fp.read()
-
-
- # Record from the microphone and save the audio as a WAV file
- def record_content():
-     CHUNK = 1024              # frames read per buffer
-     FORMAT = pyaudio.paInt16  # 16-bit samples
-     CHANNELS = 1              # mono
-     RATE = 16000              # 16 kHz sample rate
-     RECORD_SECONDS = 3        # fixed recording length
-
-     WAVE_OUTPUT_FILENAME = "audio.wav"
-     p = pyaudio.PyAudio()
-     stream = p.open(format=FORMAT, channels=CHANNELS,
-                     rate=RATE, input=True,
-                     frames_per_buffer=CHUNK)
-     print("* recording")
-
-     frames = []
-     for j in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
-         data = stream.read(CHUNK)
-         frames.append(data)
-
-     print("* done recording")
-
-     stream.stop_stream()
-     stream.close()
-     p.terminate()
-
-     wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
-     wf.setnchannels(CHANNELS)
-     wf.setsampwidth(p.get_sample_size(FORMAT))
-     wf.setframerate(RATE)
-     wf.writeframes(b''.join(frames))
-     wf.close()
-     print("done ------------------------------ ")
-     return WAVE_OUTPUT_FILENAME
-
-
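- # Create the speech client
- client_audio = AipSpeech(APP_ID, API_KEY, SECRET_KEY)
-
- # Record the speech
- filePath = record_content()
-
- # Speech recognition
- result_audio = client_audio.asr(get_file_content(filePath), 'wav', 16000, {
-     'dev_pid': 1536,
- })
- # A failed request may come back without a 'result' key, so check before indexing
- if 'result' not in result_audio:
-     raise SystemExit('Recognition failed: {}'.format(result_audio))
- content_audio = result_audio['result'][0]
- print(content_audio)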
-
-
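- # Natural language processing client
- client_nlp = AipNlp(APP_ID, API_KEY, SECRET_KEY)
-
- # text = "百度是一家高科技公司"  # "Baidu is a high-tech company"
- text = content_audio
-
- # Call lexical analysis and use the first token as the answer text
- xx = client_nlp.lexer(text)
- content_answer = xx['items'][0]['item']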
-
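- # Speech synthesis
- result_answer = None  # stays None if the call below raises
- try:
-     result_answer = client_audio.synthesis(content_answer, 'zh', 1, {
-         'vol': 5,
-     })
- except Exception as e:
-     print(e)
-
- # Write the audio to disk (a dict result means the synthesis failed)
- if result_answer is not None and not isinstance(result_answer, dict):
-     with open('audio.mp3', 'wb') as f:
-         f.write(result_answer)
-
-     # Play the audio, only when a fresh file was written
-     playsound('audio.mp3')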
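Recording
First, record the speaker's voice and save it as a .wav audio file. The raw PCM recording parameters must be a 16 kHz sample rate, 16-bit depth, and a single (mono) channel. Supported formats are pcm (uncompressed), wav (uncompressed, PCM-encoded), and amr (compressed).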
- # Record from the microphone and save the audio as a WAV file
- def record_content():
-     CHUNK = 1024              # frames read per buffer
-     FORMAT = pyaudio.paInt16  # 16-bit samples
-     CHANNELS = 1              # mono
-     RATE = 16000              # 16 kHz sample rate
-     RECORD_SECONDS = 3        # fixed recording length
-
-     WAVE_OUTPUT_FILENAME = "audio.wav"
-     p = pyaudio.PyAudio()
-     stream = p.open(format=FORMAT, channels=CHANNELS,
-                     rate=RATE, input=True,
-                     frames_per_buffer=CHUNK)
-     print("* recording")
-
-     frames = []
-     for j in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
-         data = stream.read(CHUNK)
-         frames.append(data)
-
-     print("* done recording")
-
-     stream.stop_stream()
-     stream.close()
-     p.terminate()
-
-     wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
-     wf.setnchannels(CHANNELS)
-     wf.setsampwidth(p.get_sample_size(FORMAT))
-     wf.setframerate(RATE)
-     wf.writeframes(b''.join(frames))
-     wf.close()
-     print("done ------------------------------ ")
-     return WAVE_OUTPUT_FILENAME
-
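Since the recording has to be 16 kHz, 16-bit, and mono, it can be worth checking the saved file before uploading it. The following is a minimal sketch using only the standard `wave` module; the helper name `check_wav_format` is made up for illustration and is not part of the Baidu SDK.

```python
import wave


def check_wav_format(path, rate=16000, sample_width=2, channels=1):
    """Compare a WAV header with the expected format and return a list of mismatches.

    sample_width is in bytes, so 2 means 16-bit samples.
    An empty list means the file matches.
    """
    problems = []
    with wave.open(path, 'rb') as wf:
        if wf.getframerate() != rate:
            problems.append('sample rate is {} Hz, expected {}'.format(
                wf.getframerate(), rate))
        if wf.getsampwidth() != sample_width:
            problems.append('sample width is {} bytes, expected {}'.format(
                wf.getsampwidth(), sample_width))
        if wf.getnchannels() != channels:
            problems.append('channel count is {}, expected {}'.format(
                wf.getnchannels(), channels))
    return problems
```

Calling `check_wav_format("audio.wav")` right after `record_content()` should return an empty list if the constants in the recording function were left unchanged.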
Recognition
Next, send the recorded file for recognition.
The code is as follows:
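- # Speech recognition
- result_audio = client_audio.asr(get_file_content(filePath), 'wav', 16000, {
-     'dev_pid': 1536,
- })
- # A failed request may come back without a 'result' key, so check before indexing
- if 'result' not in result_audio:
-     raise SystemExit('Recognition failed: {}'.format(result_audio))
- content_audio = result_audio['result'][0]
- print(content_audio)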
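Indexing `result_audio['result'][0]` directly assumes the request succeeded. Below is a small sketch of a more defensive way to read the response. It assumes the response is a dict that carries the candidates under `result` (as in the code above) and, on failure, an error code under `err_no`; that error field is an assumption here and should be checked against the SDK documentation. The helper name `extract_transcript` is hypothetical.

```python
def extract_transcript(response):
    """Return the first recognition candidate, or None if nothing usable came back."""
    if not isinstance(response, dict):
        return None
    # Assumption: a non-zero 'err_no' marks a failed request.
    if response.get('err_no', 0) != 0:
        return None
    candidates = response.get('result') or []
    return candidates[0] if candidates else None
```

With this helper, `content_audio = extract_transcript(result_audio)` yields `None` on failure instead of raising `KeyError`, so the caller can decide whether to record again.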
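Processing
- # Natural language processing client
- client_nlp = AipNlp(APP_ID, API_KEY, SECRET_KEY)
-
- # text = "百度是一家高科技公司"  # "Baidu is a high-tech company"
- text = content_audio
-
- # Call lexical analysis and use the first token as the answer text
- xx = client_nlp.lexer(text)
- content_answer = xx['items'][0]['item']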
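The code above keeps only the first token of the lexical analysis. If the later steps need every token, the response can be walked as in the sketch below. It relies only on the `items` and `item` keys already used above; the helper name `extract_words` and the sample dict in the usage lines are hand-written for illustration, not real API output.

```python
def extract_words(lexer_response):
    """Collect the 'item' text of every entry in a lexer response, in order."""
    items = lexer_response.get('items') or []
    return [entry['item'] for entry in items if entry.get('item')]


# Hand-written sample in the shape the code above expects
sample = {'items': [{'item': '百度'}, {'item': '是'}, {'item': '公司'}]}
words = extract_words(sample)
first_word = words[0] if words else ''
```

An empty `items` list then gives an empty result instead of an `IndexError`.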
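Answering
Once recognition and processing are done, synthesize the answer as speech, write the audio to a local file, and play it (see "several ways to play audio in Python").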
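- # Speech synthesis
- result_answer = None  # stays None if the call below raises
- try:
-     result_answer = client_audio.synthesis(content_answer, 'zh', 1, {
-         'vol': 5,
-     })
- except Exception as e:
-     print(e)
-
- # Write the audio to disk (a dict result means the synthesis failed)
- if result_answer is not None and not isinstance(result_answer, dict):
-     with open('audio.mp3', 'wb') as f:
-         f.write(result_answer)
-
-     # Play the audio, only when a fresh file was written
-     playsound('audio.mp3')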
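Open problems (if you have suggestions, please leave a comment, thanks!):
1. Handling audio of variable length (the clip length is not fixed and should follow how long the speaker actually talks)
2. Picking out a single speaker from a group of people to take commands from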
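For the first problem, one common starting point is energy-based endpointing: keep reading chunks and stop once the speaker has been quiet for a while. The sketch below is only one possible approach, not a tested solution for this robot. The function names, the threshold of 500, and the chunk limits are all assumptions that would need tuning for a real microphone, and it expects 16-bit mono PCM chunks in native byte order.

```python
import array
import math


def rms(chunk):
    """Root-mean-square amplitude of a chunk of 16-bit PCM bytes."""
    samples = array.array('h', chunk)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def collect_until_silence(chunks, threshold=500, max_silent_chunks=15,
                          max_chunks=150):
    """Collect chunks until enough consecutive quiet chunks follow speech.

    Leading silence is kept but does not count towards stopping, and
    max_chunks is a hard upper bound on the recording length.
    """
    frames = []
    silent_run = 0
    heard_speech = False
    for chunk in chunks:
        frames.append(chunk)
        if rms(chunk) >= threshold:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= max_silent_chunks:
                break
        if len(frames) >= max_chunks:
            break
    return frames
```

In `record_content()`, the fixed-length loop could be replaced by passing a generator that keeps yielding `stream.read(CHUNK)`. With `RATE = 16000` and `CHUNK = 1024`, each chunk lasts 0.064 s, so 15 quiet chunks is about 0.96 s of silence and 150 chunks caps the recording at about 9.6 s.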