Speech Recognition and Speech Synthesis with Baidu AI

A robot needs to answer spoken questions, so this post calls Baidu AI's speech recognition and speech synthesis services.

The approach is simple: use Baidu's API, initialize a client, and call it with the required parameters.

Code

    import wave

    import pyaudio
    from aip import AipSpeech, AipNlp
    from playsound import playsound

    # Your APP_ID, API key (AK) and secret key (SK)
    APP_ID = '****'
    API_KEY = '****'
    SECRET_KEY = '****'

    # Read a file and return its contents as bytes
    def get_file_content(filePath):
        with open(filePath, 'rb') as fp:
            return fp.read()

    # Record from the microphone and save the audio as a WAV file
    def record_content():
        CHUNK = 1024
        FORMAT = pyaudio.paInt16      # 16-bit samples
        CHANNELS = 1                  # mono
        RATE = 16000                  # 16 kHz
        RECORD_SECONDS = 3
        WAVE_OUTPUT_FILENAME = "audio.wav"
        p = pyaudio.PyAudio()
        stream = p.open(format=FORMAT, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)
        print("* recording")
        frames = []
        for j in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
            data = stream.read(CHUNK)
            frames.append(data)
        print("* done recording")
        stream.stop_stream()
        stream.close()
        # Look up the sample width before releasing PortAudio
        sample_width = p.get_sample_size(FORMAT)
        p.terminate()
        wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(sample_width)
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))
        wf.close()
        print("done ------------------------------ ")
        return WAVE_OUTPUT_FILENAME

    # Create the speech client
    client_audio = AipSpeech(APP_ID, API_KEY, SECRET_KEY)
    # Record speech
    filePath = record_content()
    # Speech recognition
    result_audio = client_audio.asr(get_file_content(filePath), 'wav', 16000, {
        'dev_pid': 1536,
    })
    # 'result' is only present when recognition succeeded
    content_audio = result_audio['result'][0]
    print(content_audio)

    # Natural language processing client
    client_nlp = AipNlp(APP_ID, API_KEY, SECRET_KEY)
    # text = "百度是一家高科技公司"
    text = content_audio
    # Call lexical analysis and use the first token as the reply text
    xx = client_nlp.lexer(text)
    content_answer = xx['items'][0]['item']

    # Speech synthesis
    result_answer = None  # stays None if the call below raises
    try:
        result_answer = client_audio.synthesis(content_answer, 'zh', 1, {
            'vol': 5,
        })
    except Exception as e:
        print(e)
    # Write the audio to disk; a dict return value means synthesis failed
    if result_answer is not None and not isinstance(result_answer, dict):
        with open('audio.mp3', 'wb') as f:
            f.write(result_answer)
        # Play the audio
        playsound('audio.mp3')

 

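Recording

First, record the other party's speech and save it as a .wav audio file. The raw PCM recording must use a 16 kHz sample rate, 16-bit depth, and a single channel (mono). Supported formats are pcm (uncompressed), wav (uncompressed, PCM-encoded), and amr (compressed).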

    # Record from the microphone and save the audio as a WAV file
    def record_content():
        CHUNK = 1024
        FORMAT = pyaudio.paInt16      # 16-bit samples
        CHANNELS = 1                  # mono
        RATE = 16000                  # 16 kHz
        RECORD_SECONDS = 3
        WAVE_OUTPUT_FILENAME = "audio.wav"
        p = pyaudio.PyAudio()
        stream = p.open(format=FORMAT, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)
        print("* recording")
        frames = []
        for j in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
            data = stream.read(CHUNK)
            frames.append(data)
        print("* done recording")
        stream.stop_stream()
        stream.close()
        # Look up the sample width before releasing PortAudio
        sample_width = p.get_sample_size(FORMAT)
        p.terminate()
        wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(sample_width)
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))
        wf.close()
        print("done ------------------------------ ")
        return WAVE_OUTPUT_FILENAME
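
The format requirements above can be checked locally before a file is uploaded. The following is a minimal sketch using only the standard `wave` module; the helper name `check_wav_format` is hypothetical and not part of the Baidu SDK.

```python
import wave


def check_wav_format(path, rate=16000, sample_width=2, channels=1):
    """Return a list of mismatches; an empty list means the file matches.

    Defaults correspond to 16 kHz, 16-bit (2 bytes per sample), mono.
    """
    problems = []
    with wave.open(path, 'rb') as wf:
        if wf.getframerate() != rate:
            problems.append(f"sample rate is {wf.getframerate()}, expected {rate}")
        if wf.getsampwidth() != sample_width:
            problems.append(f"sample width is {wf.getsampwidth()} bytes, expected {sample_width}")
        if wf.getnchannels() != channels:
            problems.append(f"channel count is {wf.getnchannels()}, expected {channels}")
    return problems
```

Calling `check_wav_format("audio.wav")` right after `record_content()` should return an empty list if the recording parameters were applied as intended.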

 

Recognition

Next, send the recorded file for recognition.

The code is as follows:

    # Speech recognition
    result_audio = client_audio.asr(get_file_content(filePath), 'wav', 16000, {
        'dev_pid': 1536,
    })
    # 'result' is only present when recognition succeeded
    content_audio = result_audio['result'][0]
    print(content_audio)
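
Indexing `result_audio['result']` directly raises a `KeyError` when recognition fails, because a failed response carries an error code instead of a `result` list. The sketch below reads the response defensively. It assumes the response is a dict with `err_no`, `err_msg`, and `result` fields, which is my understanding of the service's response shape; the helper name `extract_asr_text` is hypothetical.

```python
def extract_asr_text(response):
    """Return the top transcript from an ASR response, or None on failure.

    Assumes a dict with 'err_no' (0 on success), 'err_msg', and
    'result' (a list of candidate transcripts).
    """
    if response.get('err_no', 0) != 0:
        print(f"ASR failed: {response.get('err_no')} {response.get('err_msg')}")
        return None
    candidates = response.get('result') or []
    return candidates[0] if candidates else None
```

With this helper, `content_audio = extract_asr_text(result_audio)` yields `None` instead of an exception, so the caller can decide whether to record again.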

Processing

    # Natural language processing client
    client_nlp = AipNlp(APP_ID, API_KEY, SECRET_KEY)
    # text = "百度是一家高科技公司"
    text = content_audio
    # Call lexical analysis and use the first token as the reply text
    xx = client_nlp.lexer(text)
    content_answer = xx['items'][0]['item']
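
The lexical analysis step keeps only the first token of the recognized sentence. To inspect every token, or to avoid an exception when the call fails, something like the following sketch may help. It assumes the response holds an `items` list whose entries each have an `item` string, and that a failed call returns a dict containing `error_code`; the helper name `lexer_tokens` and the sample response are hand-written illustrations, not captured API output.

```python
def lexer_tokens(response):
    """Return the token strings from a lexer response, or [] on failure.

    Assumes a dict whose 'items' entries each carry an 'item' string,
    and that an error response contains 'error_code'.
    """
    if 'error_code' in response:
        return []
    return [entry['item'] for entry in response.get('items', []) if entry.get('item')]


# Hand-written sample in the assumed shape (illustration only)
sample = {'text': '百度是一家高科技公司',
          'items': [{'item': '百度'}, {'item': '是'}, {'item': '一家'},
                    {'item': '高科技'}, {'item': '公司'}]}
tokens = lexer_tokens(sample)
first_token = tokens[0] if tokens else None
```

Here `first_token` plays the role of `content_answer` in the code above, but falls back to `None` when no tokens come back.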

 

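Answer

Once the reply has been synthesized into speech, write it to a local file and play it. Python offers several ways to play audio; playsound is used here.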

    # Speech synthesis
    result_answer = None  # stays None if the call below raises
    try:
        result_answer = client_audio.synthesis(content_answer, 'zh', 1, {
            'vol': 5,
        })
    except Exception as e:
        print(e)
    # Write the audio to disk; a dict return value means synthesis failed
    if result_answer is not None and not isinstance(result_answer, dict):
        with open('audio.mp3', 'wb') as f:
            f.write(result_answer)
        # Play the audio
        playsound('audio.mp3')
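
The write step can be wrapped in a small helper so that playback only happens when a file was actually written. This is a minimal sketch that relies on the same convention as the code above (audio bytes on success, a dict on failure); the helper name `save_speech` is hypothetical.

```python
def save_speech(result, path='audio.mp3'):
    """Write synthesized audio bytes to path and return True.

    Return False, without touching the file, when result is not raw
    bytes (for example an error dict, or None after an exception).
    """
    if not isinstance(result, (bytes, bytearray)):
        return False
    with open(path, 'wb') as f:
        f.write(result)
    return True
```

A caller could then write `if save_speech(result_answer): playsound('audio.mp3')`, which keeps a stale `audio.mp3` from being played after a failed synthesis.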

 

 

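Open problems (suggestions are welcome in the comments, thanks!):

1. Handling recordings of variable length (the audio duration is not fixed; it should follow how long the speaker actually talks).

2. Picking out one speaker from a group of people to take commands from.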
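
For the first open problem, one common approach is energy-based silence detection: keep reading chunks, and stop once speech has been heard and is followed by a sustained quiet period. The sketch below is one possible way to do this, not a tested solution for this robot. The function names, the RMS threshold of 500, and the chunk counts are all assumptions that would need tuning for a real microphone and room.

```python
import array
import math


def chunk_rms(chunk):
    """Root-mean-square amplitude of a chunk of 16-bit mono PCM bytes."""
    samples = array.array('h', chunk)  # 'h' = signed 16-bit, native byte order
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def collect_until_silence(chunks, threshold=500, max_silent_chunks=15,
                          max_chunks=600):
    """Collect chunks until speech is followed by a run of quiet chunks.

    chunks: any iterable of PCM byte strings, e.g. a generator that
            yields stream.read(CHUNK) from PyAudio.
    threshold: RMS level treated as speech (assumed value).
    max_silent_chunks: consecutive quiet chunks that end the recording.
    max_chunks: hard cap so the loop always terminates.
    """
    frames = []
    heard_speech = False
    silent_run = 0
    for chunk in chunks:
        frames.append(chunk)
        if chunk_rms(chunk) >= threshold:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= max_silent_chunks:
                break
        if len(frames) >= max_chunks:
            break
    return frames
```

With 1024-frame chunks at 16 kHz, each chunk covers about 64 ms, so `max_silent_chunks=15` corresponds to roughly one second of trailing silence. The returned `frames` list could replace the fixed-length loop in `record_content()`.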
