Preface
In earlier posts I covered how the iOS and Mac sides of CloudRTC use and initialize their audio devices; today we will look at how the Windows side does it.
On Windows, audio capture comes in two flavors. One is Core Audio, which is only available on Vista and later; the other is Wave, an older capture method.
Both approaches are introduced below.
Which capture method is used?
Before diving in, let's review CloudRTC's audio processing flow; its sequence diagram is shown below:
As the figure shows, audio device initialization calls the CreatePlatformSpecificObjects method of the AudioDeviceModuleImpl class, which creates a different device object for each operating system platform; on iOS, for instance, it creates an AudioDeviceIOS object.
Windows is subdivided one step further: on Vista and later it creates an AudioDeviceWindowsCore object, while on pre-Vista systems it creates an AudioDeviceWindowsWave object.
How does it tell whether the Windows version is Vista or later? Through the function AudioDeviceWindowsCore::CoreAudioIsSupported().
Here is the overall flow:
int32_t AudioDeviceModuleImpl::CreatePlatformSpecificObjects()
{
...
#if defined(HJAV_WINDOWS_CORE_AUDIO_BUILD)
...
if (AudioDeviceWindowsCore::CoreAudioIsSupported())
{
// create *Windows Core Audio* implementation
ptrAudioDevice = new AudioDeviceWindowsCore(Id());
...
}
else
{
// create *Windows Wave Audio* implementation
ptrAudioDevice = new AudioDeviceWindowsWave(Id());
...
}
...
}
From the code above we can see that CloudRTC uses the return value of CoreAudioIsSupported to decide whether audio capture and playout go through AudioDeviceWindowsCore or AudioDeviceWindowsWave.
OK, with that logic clear, let's take a closer look at how CoreAudioIsSupported decides whether Core Audio is supported.
Background
Before reading CoreAudioIsSupported, we need a bit of background:
The MMDevice (Windows Multimedia Device) API discovers audio endpoint devices, determines their capabilities, and creates driver instances for them.
The MMDevice API interfaces are declared in the Mmdeviceapi.h header. The API consists of several interfaces, the first of which is IMMDeviceEnumerator. A client obtains a reference to the IMMDeviceEnumerator interface of a device-enumerator object by calling CoCreateInstance; through IMMDeviceEnumerator it can then reach the other interfaces of the MMDevice API. The MMDevice API implements the following interfaces:
- IMMDevice: represents an audio device.
- IMMDeviceCollection: represents a collection of audio devices.
- IMMDeviceEnumerator: provides methods for enumerating audio devices.
- IMMEndpoint: represents an audio endpoint device.
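To make these interfaces concrete, here is a minimal, self-contained sketch (my own illustration, not CloudRTC code) that creates the device-enumerator object with CoCreateInstance and asks it for the default capture endpoint:
#include <windows.h>
#include <mmdeviceapi.h>
#include <cstdio>
int main()
{
    // COM must be initialized before the MMDevice API can be used.
    HRESULT hr = CoInitializeEx(NULL, COINIT_MULTITHREADED);
    if (FAILED(hr)) return -1;
    IMMDeviceEnumerator* pEnum = NULL;
    hr = CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_ALL,
                          __uuidof(IMMDeviceEnumerator), (void**)&pEnum);
    if (SUCCEEDED(hr))
    {
        // The enumerator hands out IMMDevice objects; here we ask for the
        // default capture endpoint, i.e. the device used for recording.
        IMMDevice* pDevice = NULL;
        hr = pEnum->GetDefaultAudioEndpoint(eCapture, eConsole, &pDevice);
        if (SUCCEEDED(hr))
        {
            printf("default capture endpoint acquired\n");
            pDevice->Release();
        }
        pEnum->Release();
    }
    CoUninitialize();
    return 0;
}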
The CoreAudioIsSupported function
CoreAudioIsSupported does roughly the following:
- Check the current system version via the system function VerifyVersionInfo; only Vista and later can use Core Audio.
- Initialize COM, using the MTA (multithreaded) model.
- Check whether the MMDevice API is available.
- Verify that we can create and initialize our Core Audio class.
Now the code itself:
bool AudioDeviceWindowsCore::CoreAudioIsSupported()
{
...
// 1) Check if Windows version is Vista SP1 or later.
//
// CoreAudio is only available on Vista SP1 and later.
//
OSVERSIONINFOEX osvi;
DWORDLONG dwlConditionMask = 0;
int op = VER_LESS_EQUAL;
// Initialize the OSVERSIONINFOEX structure.
ZeroMemory(&osvi, sizeof(OSVERSIONINFOEX));
osvi.dwOSVersionInfoSize = sizeof(OSVERSIONINFOEX);
osvi.dwMajorVersion = 6;
osvi.dwMinorVersion = 0;
osvi.wServicePackMajor = 0;
osvi.wServicePackMinor = 0;
osvi.wProductType = VER_NT_WORKSTATION;
// Initialize the condition mask.
VER_SET_CONDITION(dwlConditionMask, VER_MAJORVERSION, op);
VER_SET_CONDITION(dwlConditionMask, VER_MINORVERSION, op);
VER_SET_CONDITION(dwlConditionMask, VER_SERVICEPACKMAJOR, op);
VER_SET_CONDITION(dwlConditionMask, VER_SERVICEPACKMINOR, op);
VER_SET_CONDITION(dwlConditionMask, VER_PRODUCT_TYPE, VER_EQUAL);
DWORD dwTypeMask = VER_MAJORVERSION | VER_MINORVERSION |
VER_SERVICEPACKMAJOR | VER_SERVICEPACKMINOR |
VER_PRODUCT_TYPE;
// Perform the test.
BOOL isVistaRTMorXP = VerifyVersionInfo(&osvi, dwTypeMask,
dwlConditionMask);
...
// 2) Initializes the COM library for use by the calling thread.
// The COM init wrapper sets the thread's concurrency model to MTA,
// and creates a new apartment for the thread if one is required. The
// wrapper also ensures that each call to CoInitializeEx is balanced
// by a corresponding call to CoUninitialize.
//
ScopedCOMInitializer comInit(ScopedCOMInitializer::kMTA);
if (!comInit.succeeded()) {
// Things will work even if an STA thread is calling this method but we
// want to ensure that MTA is used and therefore return false here.
return false;
}
// 3) Check if the MMDevice API is available.
IMMDeviceEnumerator* pIMMD(NULL);
const CLSID CLSID_MMDeviceEnumerator = __uuidof(MMDeviceEnumerator);
const IID IID_IMMDeviceEnumerator = __uuidof(IMMDeviceEnumerator);
hr = CoCreateInstance(
CLSID_MMDeviceEnumerator, // GUID value of MMDeviceEnumerator coclass
NULL,
CLSCTX_ALL,
IID_IMMDeviceEnumerator, // GUID value of the IMMDeviceEnumerator interface
(void**)&pIMMD );
...
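// (Elided: MMDeviceIsAvailable is presumably set from the HRESULT returned
// by CoCreateInstance above.)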
// 4) Verify that we can create and initialize our Core Audio class.
//
// Also, perform a limited "API test" to ensure that Core Audio is supported for all devices.
//
if (MMDeviceIsAvailable)
{
coreAudioIsSupported = false;
AudioDeviceWindowsCore* p = new AudioDeviceWindowsCore(-1);
...
ok |= p->Init();
int16_t numDevsRec = p->RecordingDevices();
for (uint16_t i = 0; i < numDevsRec; i++)
{
...
}
int16_t numDevsPlay = p->PlayoutDevices();
for (uint16_t i = 0; i < numDevsPlay; i++)
{
...
}
}
...
return (coreAudioIsSupported);
}
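One detail worth noting: step 1 above phrases the version check in the inverse direction. It tests whether the OS is less than or equal to Vista RTM (major version 6, minor version 0, service pack 0), and treats Core Audio as unsupported when that test passes. Written in the direct form, a "Vista SP1 or later" test looks roughly like the standalone sketch below (my illustration, mirroring the pattern used by IsWindowsVersionOrGreater in VersionHelpers.h, not CloudRTC code):
#include <windows.h>
// Returns true when the running OS is Windows Vista SP1 (6.0 SP1) or later.
// VerifyVersionInfo compares major/minor/service-pack hierarchically, so a
// newer major.minor (e.g. Windows 7 = 6.1) passes regardless of service pack.
bool IsVistaSp1OrLater()
{
    OSVERSIONINFOEX osvi;
    ZeroMemory(&osvi, sizeof(osvi));
    osvi.dwOSVersionInfoSize = sizeof(osvi);
    osvi.dwMajorVersion = 6;
    osvi.dwMinorVersion = 0;
    osvi.wServicePackMajor = 1;
    DWORDLONG mask = 0;
    VER_SET_CONDITION(mask, VER_MAJORVERSION, VER_GREATER_EQUAL);
    VER_SET_CONDITION(mask, VER_MINORVERSION, VER_GREATER_EQUAL);
    VER_SET_CONDITION(mask, VER_SERVICEPACKMAJOR, VER_GREATER_EQUAL);
    return VerifyVersionInfo(&osvi,
                             VER_MAJORVERSION | VER_MINORVERSION |
                             VER_SERVICEPACKMAJOR,
                             mask) != FALSE;
}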
A few important functions
Just as on iOS and Mac, Windows audio also revolves around two important functions, InitRecording and InitPlayout. Below we walk through the Core Audio and Wave capture/playout paths in turn:
CoreAudio
First, let's see how Core Audio captures and plays audio.
InitRecording
Let's start with what InitRecording does:
- Initialize the microphone.
- Create an IAudioClient COM object through MMDevice.
- Use IAudioClient to query the audio format (sample rate, channel count, sample size).
- Initialize IAudioClient with the audio format configured by the user.
- Get the size of IAudioClient's buffer, which will hold the captured data.
- Set the event that is signaled once an audio buffer is ready.
- Get the IAudioCaptureClient interface from IAudioClient; it is used by the DoCaptureThread thread.
int32_t AudioDeviceWindowsCore::InitRecording()
{
#ifdef ASYNC_INIT_RECORDING
...
HANDLE recInitThread = CreateThread(NULL,
0,
WSAPICaptureInitThread,
this,
0,
NULL);
...
// Set thread priority to highest possible
SetThreadPriority(recInitThread, THREAD_PRIORITY_TIME_CRITICAL);
...
}
DWORD WINAPI AudioDeviceWindowsCore::WSAPICaptureInitThread(LPVOID context)
{
return reinterpret_cast<AudioDeviceWindowsCore*>(context)->
DoCaptureInitThread();
}
DWORD AudioDeviceWindowsCore::DoCaptureInitThread()
{
// Initialize COM as MTA in this thread.
ScopedCOMInitializer comInit(ScopedCOMInitializer::kMTA);
InitRecording_DoIt();
...
}
int32_t AudioDeviceWindowsCore::InitRecording_DoIt()
{
CriticalSectionScoped lock(&_critSect);
...
// Initialize the microphone (devices might have been added or removed)
if (InitMicrophone() == -1)
...
if (_builtInAudioProcessEnabled)
{
// The DMO will configure the capture device.
return InitRecordingDMO();
}
HRESULT hr = S_OK;
WAVEFORMATEX* pWfxIn = NULL;
WAVEFORMATEX Wfx = WAVEFORMATEX();
WAVEFORMATEX* pWfxClosestMatch = NULL;
// Create COM object with IAudioClient interface.
SAFE_RELEASE(_ptrClientIn);
hr = _ptrDeviceIn->Activate(
__uuidof(IAudioClient),
CLSCTX_ALL,
NULL,
(void**)&_ptrClientIn);
EXIT_ON_ERROR(hr);
// Retrieve the stream format that the audio engine uses for its internal
// processing (mixing) of shared-mode streams.
hr = _ptrClientIn->GetMixFormat(&pWfxIn);
...
// Set wave format
Wfx.wFormatTag = WAVE_FORMAT_PCM;
Wfx.wBitsPerSample = 16;
Wfx.cbSize = 0;
const int freqs[6] = {48000, 44100, 16000, 96000, 32000, 8000};
hr = S_FALSE;
// Iterate over frequencies and channels, in order of priority
for (int freq = 0; freq < sizeof(freqs)/sizeof(freqs[0]); freq++)
{
for (int chan = 0; chan < sizeof(_recChannelsPrioList)/sizeof(_recChannelsPrioList[0]); chan++)
{
Wfx.nChannels = _recChannelsPrioList[chan];
Wfx.nSamplesPerSec = freqs[freq];
Wfx.nBlockAlign = Wfx.nChannels * Wfx.wBitsPerSample / 8;
Wfx.nAvgBytesPerSec = Wfx.nSamplesPerSec * Wfx.nBlockAlign;
// If the method succeeds and the audio endpoint device supports the specified stream format,
// it returns S_OK. If the method succeeds and provides a closest match to the specified format,
// it returns S_FALSE.
hr = _ptrClientIn->IsFormatSupported(
AUDCLNT_SHAREMODE_SHARED,
&Wfx,
&pWfxClosestMatch);
...
}
if (hr == S_OK)
{
_recAudioFrameSize = Wfx.nBlockAlign;
_recSampleRate = Wfx.nSamplesPerSec;
_recBlockSize = Wfx.nSamplesPerSec/100;
_recChannels = Wfx.nChannels;
...
}
// Create a capturing stream.
hr = _ptrClientIn->Initialize(
AUDCLNT_SHAREMODE_SHARED, // share Audio Engine with other applications
AUDCLNT_STREAMFLAGS_EVENTCALLBACK | // processing of the audio buffer by the client will be event driven
AUDCLNT_STREAMFLAGS_NOPERSIST, // volume and mute settings for an audio session will not persist across system restarts
0, // required for event-driven shared mode
0, // periodicity
&Wfx, // selected wave format
NULL); // session GUID
...
// Get the actual size of the shared (endpoint buffer).
// Typical value is 960 audio frames <=> 20ms @ 48kHz sample rate.
UINT bufferFrameCount(0);
hr = _ptrClientIn->GetBufferSize(
&bufferFrameCount);
...
// Set the event handle that the system signals when an audio buffer is ready
// to be processed by the client.
hr = _ptrClientIn->SetEventHandle(
_hCaptureSamplesReadyEvent);
EXIT_ON_ERROR(hr);
// Get an IAudioCaptureClient interface.
SAFE_RELEASE(_ptrCaptureClient);
hr = _ptrClientIn->GetService(
__uuidof(IAudioCaptureClient),
(void**)&_ptrCaptureClient);
EXIT_ON_ERROR(hr);
// Mark capture side as initialized
_recIsInitialized = true;
CoTaskMemFree(pWfxIn);
CoTaskMemFree(pWfxClosestMatch);
...
return -1;
}
As the code shows, InitRecording_DoIt configures the audio parameters and registers the _hCaptureSamplesReadyEvent event. Once an audio buffer is ready, the audio device signals _hCaptureSamplesReadyEvent, and a separate thread, DoCaptureThread, reacts by copying the data from the device buffer into the user buffer.
So what exactly does DoCaptureThread do?
DoCaptureThread
The DoCaptureThread thread does the following:
- Initialize COM.
- Get the size of the capture buffer through IAudioClient.
- Allocate a buffer in user space.
- Start IAudioClient to begin capturing audio data.
- Wait for the _hCaptureSamplesReadyEvent event; on receiving it, call CopyMemory to copy the data from the device buffer into the user buffer.
- Finally, hand the user buffer off via DeliverRecordedData.
The code is as follows:
DWORD AudioDeviceWindowsCore::DoCaptureThread()
{
...
// Initialize COM as MTA in this thread.
ScopedCOMInitializer comInit(ScopedCOMInitializer::kMTA);
...
hr = InitCaptureThreadPriority();
...
_Lock();
...
// Get size of capturing buffer (length is expressed as the number of audio frames the buffer can hold).
// This value is fixed during the capturing session.
UINT32 bufferLength = 0;
hr = _ptrClientIn->GetBufferSize(&bufferLength);
...
// Allocate memory for sync buffer.
// It is used for compensation between native 44.1 and internal 44.0 and
// for cases when the capture buffer is larger than 10ms.
const UINT32 syncBufferSize = 2*(bufferLength * _recAudioFrameSize);
syncBuffer = new BYTE[syncBufferSize];
...
// Start up the capturing stream.
hr = _ptrClientIn->Start();
...
_UnLock();
// Set event which will ensure that the calling thread modifies the recording state to true
SetEvent(_hCaptureStartedEvent);
while (keepRecording)
{
// Wait for a capture notification event or a shutdown event
DWORD waitResult = WaitForMultipleObjects(2, waitArray, FALSE, 10000);
switch (waitResult)
{
case WAIT_OBJECT_0 + 0: // _hShutdownCaptureEvent
keepRecording = false;
break;
case WAIT_OBJECT_0 + 1: // _hCaptureSamplesReadyEvent
break;
case WAIT_TIMEOUT: // timeout notification
HJAV_TRACE(kTraceWarning, kTraceAudioDevice, _id, "capture event timed out after 10 seconds");
goto Exit;
default: // unexpected error
HJAV_TRACE(kTraceWarning, kTraceAudioDevice, _id, "unknown wait termination on capture side");
goto Exit;
}
while (keepRecording)
{
...
_Lock();
...
// Find out how much capture data is available
//
hr = _ptrCaptureClient->GetBuffer(&pData, // packet which is ready to be read by user
&framesAvailable, // #frames in the captured packet (can be zero)
&flags, // support flags (check)
&recPos, // device position of first audio frame in data packet
&recTime); // value of performance counter at the time of recording the first audio frame
if (SUCCEEDED(hr))
{
...
if (pData)
{
CopyMemory(&syncBuffer[syncBufIndex*_recAudioFrameSize], pData, framesAvailable*_recAudioFrameSize);
}
else
{
ZeroMemory(&syncBuffer[syncBufIndex*_recAudioFrameSize], framesAvailable*_recAudioFrameSize);
}
...
while (syncBufIndex >= _recBlockSize)
{
if (_ptrAudioBuffer)
{
...
_UnLock(); // release lock while making the callback
_ptrAudioBuffer->DeliverRecordedData();
_Lock(); // restore the lock
...
}
...
}
...
}
...
_UnLock();
}
}
hr = _ptrClientIn->Stop();
...
}
The code above makes the capture flow clear: each captured packet is appended to syncBuffer, and whenever at least one 10 ms block (_recBlockSize frames) has accumulated, it is delivered through DeliverRecordedData.
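To see why the delivery step is a while loop rather than a single call, here is a tiny standalone sketch (illustrative numbers, not CloudRTC code) of the same framing logic: capture packets of arbitrary size accumulate in a sync buffer and are drained in fixed 10 ms blocks, exactly as syncBufIndex and _recBlockSize interact above:
#include <cstdio>
int main()
{
    const int sampleRate = 48000;
    const int blockSize = sampleRate / 100; // 480 frames == 10 ms, cf. _recBlockSize
    int syncBufIndex = 0;                   // frames currently buffered
    // Simulated capture packets of varying length (in frames).
    const int packets[] = {448, 480, 512, 480};
    for (size_t i = 0; i < sizeof(packets)/sizeof(packets[0]); i++)
    {
        // The real code CopyMemory()s the packet into syncBuffer here;
        // we only track the frame count.
        syncBufIndex += packets[i];
        // One packet can complete zero, one, or several 10 ms blocks.
        while (syncBufIndex >= blockSize)
        {
            printf("deliver one 10 ms block (%d frames)\n", blockSize);
            syncBufIndex -= blockSize; // leftover waits for the next packet
        }
    }
    printf("%d frames remain buffered\n", syncBufIndex);
    return 0;
}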
InitPlayout
Now let's look at what InitPlayout does:
- Initialize the speaker.
- Create an IAudioClient COM object through MMDevice.
- Get the current audio format (sample rate, sample size, channel count) from IAudioClient.
- Apply the audio parameters configured by the user and initialize IAudioClient.
- Get the size of the endpoint buffer.
- Set the event that is signaled once an audio buffer is ready.
- Get the IAudioRenderClient interface from IAudioClient; it is used by the audio render thread.
Here is the code:
int32_t AudioDeviceWindowsCore::InitPlayout()
{
CriticalSectionScoped lock(&_critSect);
...
// Initialize the speaker (devices might have been added or removed)
if (InitSpeaker() == -1)
...
HRESULT hr = S_OK;
WAVEFORMATEX* pWfxOut = NULL;
WAVEFORMATEX Wfx = WAVEFORMATEX();
WAVEFORMATEX* pWfxClosestMatch = NULL;
// Create COM object with IAudioClient interface.
SAFE_RELEASE(_ptrClientOut);
hr = _ptrDeviceOut->Activate(
__uuidof(IAudioClient),
CLSCTX_ALL,
NULL,
(void**)&_ptrClientOut);
EXIT_ON_ERROR(hr);
// Retrieve the stream format that the audio engine uses for its internal
// processing (mixing) of shared-mode streams.
hr = _ptrClientOut->GetMixFormat(&pWfxOut);
...
// Set wave format
Wfx.wFormatTag = WAVE_FORMAT_PCM;
Wfx.wBitsPerSample = 16;
Wfx.cbSize = 0;
const int freqs[] = {48000, 44100, 16000, 96000, 32000, 8000};
hr = S_FALSE;
// Iterate over frequencies and channels, in order of priority
for (int freq = 0; freq < sizeof(freqs)/sizeof(freqs[0]); freq++)
{
for (int chan = 0; chan < sizeof(_playChannelsPrioList)/sizeof(_playChannelsPrioList[0]); chan++)
{
Wfx.nChannels = _playChannelsPrioList[chan];
Wfx.nSamplesPerSec = freqs[freq];
Wfx.nBlockAlign = Wfx.nChannels * Wfx.wBitsPerSample / 8;
Wfx.nAvgBytesPerSec = Wfx.nSamplesPerSec * Wfx.nBlockAlign;
// If the method succeeds and the audio endpoint device supports the specified stream format,
// it returns S_OK. If the method succeeds and provides a closest match to the specified format,
// it returns S_FALSE.
hr = _ptrClientOut->IsFormatSupported(
AUDCLNT_SHAREMODE_SHARED,
&Wfx,
&pWfxClosestMatch);
...
}
// TODO(andrew): what happens in the event of failure in the above loop?
// Is _ptrClientOut->Initialize expected to fail?
// Same in InitRecording().
if (hr == S_OK)
{
_playAudioFrameSize = Wfx.nBlockAlign;
_playBlockSize = Wfx.nSamplesPerSec/100;
_playSampleRate = Wfx.nSamplesPerSec;
_devicePlaySampleRate = Wfx.nSamplesPerSec; // The device itself continues to run at 44.1 kHz.
_devicePlayBlockSize = Wfx.nSamplesPerSec/100;
_playChannels = Wfx.nChannels;
...
}
// Create a rendering stream.
//
// ****************************************************************************
// For a shared-mode stream that uses event-driven buffering, the caller must
// set both hnsPeriodicity and hnsBufferDuration to 0. The Initialize method
// determines how large a buffer to allocate based on the scheduling period
// of the audio engine. Although the client's buffer processing thread is
// event driven, the basic buffer management process, as described previously,
// is unaltered.
// Each time the thread awakens, it should call IAudioClient::GetCurrentPadding
// to determine how much data to write to a rendering buffer or read from a capture
// buffer. In contrast to the two buffers that the Initialize method allocates
// for an exclusive-mode stream that uses event-driven buffering, a shared-mode
// stream requires a single buffer.
// ****************************************************************************
//
REFERENCE_TIME hnsBufferDuration = 0; // ask for minimum buffer size (default)
if (_devicePlaySampleRate == 44100)
{
// Ask for a larger buffer size (30ms) when using 44.1kHz as render rate.
// There seems to be a larger risk of underruns for 44.1 compared
// with the default rate (48kHz). When using default, we set the requested
// buffer duration to 0, which sets the buffer to the minimum size
// required by the engine thread. The actual buffer size can then be
// read by GetBufferSize() and it is 20ms on most machines.
hnsBufferDuration = 30*10000;
}
hr = _ptrClientOut->Initialize(
AUDCLNT_SHAREMODE_SHARED, // share Audio Engine with other applications
AUDCLNT_STREAMFLAGS_EVENTCALLBACK, // processing of the audio buffer by the client will be event driven
hnsBufferDuration, // requested buffer capacity as a time value (in 100-nanosecond units)
0, // periodicity
&Wfx, // selected wave format
NULL); // session GUID
...
if (_ptrAudioBuffer)
{
// Update the audio buffer with the selected parameters
_ptrAudioBuffer->SetPlayoutSampleRate(_playSampleRate);
_ptrAudioBuffer->SetPlayoutChannels((uint8_t)_playChannels);
}
else
{
// We can enter this state during CoreAudioIsSupported() when no AudioDeviceImplementation
// has been created, hence the AudioDeviceBuffer does not exist.
// It is OK to end up here since we don't initiate any media in CoreAudioIsSupported().
HJAV_TRACE(kTraceInfo, kTraceAudioDevice, _id, "AudioDeviceBuffer must be attached before streaming can start");
}
// Get the actual size of the shared (endpoint buffer).
// Typical value is 960 audio frames <=> 20ms @ 48kHz sample rate.
UINT bufferFrameCount(0);
hr = _ptrClientOut->GetBufferSize(
&bufferFrameCount);
...
// Set the event handle that the system signals when an audio buffer is ready
// to be processed by the client.
hr = _ptrClientOut->SetEventHandle(
_hRenderSamplesReadyEvent);
EXIT_ON_ERROR(hr);
// Get an IAudioRenderClient interface.
SAFE_RELEASE(_ptrRenderClient);
hr = _ptrClientOut->GetService(
__uuidof(IAudioRenderClient),
(void**)&_ptrRenderClient);
EXIT_ON_ERROR(hr);
// Mark playout side as initialized
_playIsInitialized = true;
CoTaskMemFree(pWfxOut);
CoTaskMemFree(pWfxClosestMatch);
...
}
That covers what InitPlayout does. Next, the render thread's work:
DoRenderThread
The DoRenderThread thread does the following:
- Set the COM threading model to MTA.
- Get the buffer size through IAudioClient.
- Get the address of the device's render buffer through IAudioRenderClient.
- Start IAudioClient.
- On each _hRenderSamplesReadyEvent event, fetch audio data from the AudioBuffer into the device buffer.
- Finally, the audio device plays the sound.
DWORD AudioDeviceWindowsCore::DoRenderThread()
{
...
// Initialize COM as MTA in this thread.
ScopedCOMInitializer comInit(ScopedCOMInitializer::kMTA);
...
_SetThreadName(0, "hjav_core_audio_render_thread");
...
_Lock();
...
// Get size of rendering buffer (length is expressed as the number of audio frames the buffer can hold).
// This value is fixed during the rendering session.
//
UINT32 bufferLength = 0;
hr = _ptrClientOut->GetBufferSize(&bufferLength);
...
// Before starting the stream, fill the rendering buffer with silence.
//
BYTE *pData = NULL;
hr = _ptrRenderClient->GetBuffer(bufferLength, &pData);
EXIT_ON_ERROR(hr);
...
// Start up the rendering audio stream.
hr = _ptrClientOut->Start();
_UnLock();
// Set event which will ensure that the calling thread modifies the playing state to true.
//
SetEvent(_hRenderStartedEvent);
while (keepPlaying)
{
// Wait for a render notification event or a shutdown event
DWORD waitResult = WaitForMultipleObjects(2, waitArray, FALSE, 500);
switch (waitResult)
{
case WAIT_OBJECT_0 + 0: // _hShutdownRenderEvent
keepPlaying = false;
break;
case WAIT_OBJECT_0 + 1: // _hRenderSamplesReadyEvent
break;
case WAIT_TIMEOUT: // timeout notification
HJAV_TRACE(kTraceWarning, kTraceAudioDevice, _id, "render event timed out after 0.5 seconds");
goto Exit;
default: // unexpected error
HJAV_TRACE(kTraceWarning, kTraceAudioDevice, _id, "unknown wait termination on render side");
goto Exit;
}
while (keepPlaying)
{
_Lock();
...
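// (Elided above: framesAvailable is presumably obtained by calling
// IAudioClient::GetCurrentPadding and subtracting the padding from
// bufferLength, as the comment block in InitPlayout describes.)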
// Write n*10ms buffers to the render buffer
const uint32_t n10msBuffers = (framesAvailable / _playBlockSize);
for (uint32_t n = 0; n < n10msBuffers; n++)
{
// Get pointer (i.e., grab the buffer) to next space in the shared render buffer.
hr = _ptrRenderClient->GetBuffer(_playBlockSize, &pData);
...
if (_ptrAudioBuffer)
{
// Request data to be played out (#bytes = _playBlockSize*_audioFrameSize)
_UnLock();
int32_t nSamples = _ptrAudioBuffer->RequestPlayoutData(_playBlockSize);
_Lock();
...
// Get the actual (stored) data
nSamples = _ptrAudioBuffer->GetPlayoutData((int8_t*)pData);
}
...
}
...
_UnLock();
}
}
...
}
OK, that wraps up Core Audio; now let's see how Wave does it.
Wave
Next, let's look at how Wave captures and plays audio. Audio capture with Wave mainly involves three important functions:
- Init
- InitRecording
- InitPlayout
Init
First, what does the Init function do?
- It starts several threads, the key one being ThreadFunc, which does the following (see the code below):
  - calls PrepareStartPlayout, which primes the sound card buffer with 30 ms of silence;
  - calls PrepareStartRecording, which creates the capture buffers in a loop, queues them to the audio device, and then starts the Wave recording device;
  - calls PlayProc, which pulls data from the AudioBuffer and writes it into the audio device buffer for playout;
  - calls RecProc in a loop, which copies the audio device's data into the user buffer and ultimately delivers it.
- It starts the GetCaptureVolumeThread thread.
- It starts the SetCaptureVolumeThread thread.
int32_t AudioDeviceWindowsWave::Init()
{
CriticalSectionScoped lock(&_critSect);
...
const char* threadName = "hjav_audio_module_thread";
_ptrThread = ThreadWrapper::CreateThread(ThreadFunc,
this,
threadName);
...
_ptrThread->SetPriority(kRealtimePriority);
_threadID = _ptrThread->GetThreadId();
...
_hGetCaptureVolumeThread = CreateThread(NULL,
0,
GetCaptureVolumeThread,
this,
0,
NULL);
...
SetThreadPriority(_hGetCaptureVolumeThread, THREAD_PRIORITY_NORMAL);
_hSetCaptureVolumeThread = CreateThread(NULL,
0,
SetCaptureVolumeThread,
this,
0,
NULL);
...
SetThreadPriority(_hSetCaptureVolumeThread, THREAD_PRIORITY_NORMAL);
...
}
bool AudioDeviceWindowsWave::ThreadFunc(void* pThis)
{
return (static_cast<AudioDeviceWindowsWave*>(pThis)->ThreadProcess());
}
bool AudioDeviceWindowsWave::ThreadProcess()
{
...
switch (_timeEvent.Wait(1000))
{
case kEventSignaled:
break;
case kEventError:
HJAV_TRACE(kTraceWarning, kTraceAudioDevice, _id, "EventWrapper::Wait() failed => restarting timer");
_timeEvent.StopTimer();
_timeEvent.StartTimer(true, TIMER_PERIOD_MS);
return true;
case kEventTimeout:
return true;
}
time = AudioDeviceUtility::GetTimeInMS();
if (_startPlay)
{
if (PrepareStartPlayout() == 0)
{
...
}
}
if (_startRec)
{
if (PrepareStartRecording() == 0)
{
...
}
}
...
if (_playing &&
((playDiff > (uint32_t)(_dTcheckPlayBufDelay - 1)) ||
(playDiff < 0)))
{
Lock();
if (_playing)
{
if (PlayProc(playTime) == -1)
...
}
UnLock();
}
if (_playing && (playDiff > 12))
{
// It has been a long time since we were able to play out, try to
// compensate by calling PlayProc again.
//
Lock();
if (_playing)
{
if (PlayProc(playTime))
...
}
UnLock();
}
if (_recording &&
((recDiff > REC_CHECK_TIME_PERIOD_MS) ||
(recDiff < 0)))
{
Lock();
if (_recording)
{
...
// Deliver all available recorded buffers and update the CPU load measurement.
// We use a while loop here to compensate for the fact that the multi-media timer
// can sometimes enter a "bad state" after hibernation where the resolution is
// reduced from ~1ms to ~10-15 ms.
//
while ((nRecordedBytes = RecProc(recTime)) > 0)
{
...
}
...
// Monitor the recording process and generate error/warning callbacks if needed
MonitorRecording(time);
}
UnLock();
}
...
}
int32_t AudioDeviceWindowsWave::PrepareStartPlayout()
{
CriticalSectionScoped lock(&_critSect);
...
// A total of 30ms of data is immediately placed in the SC buffer
//
int8_t zeroVec[4*PLAY_BUF_SIZE_IN_SAMPLES]; // max allocation
memset(zeroVec, 0, 4*PLAY_BUF_SIZE_IN_SAMPLES);
{
Write(zeroVec, PLAY_BUF_SIZE_IN_SAMPLES);
Write(zeroVec, PLAY_BUF_SIZE_IN_SAMPLES);
Write(zeroVec, PLAY_BUF_SIZE_IN_SAMPLES);
}
...
return 0;
}
int32_t AudioDeviceWindowsWave::Write(int8_t* data, uint16_t nSamples)
{
...
if (_playIsInitialized)
{
...
const uint16_t bufCount(_playBufCount);
...
// Send a data block to the given waveform-audio output device.
//
// When the buffer is finished, the WHDR_DONE bit is set in the dwFlags
// member of the WAVEHDR structure. The buffer must be prepared with the
// waveOutPrepareHeader function before it is passed to waveOutWrite.
// Unless the device is paused by calling the waveOutPause function,
// playback begins when the first data block is sent to the device.
//
res = waveOutWrite(_hWaveOut, &_waveHeaderOut[bufCount], sizeof(_waveHeaderOut[bufCount]));
...
}
return 0;
}
int32_t AudioDeviceWindowsWave::PrepareStartRecording()
{
CriticalSectionScoped lock(&_critSect);
...
res = waveInGetPosition(_hWaveIn, &mmtime, sizeof(mmtime));
...
_read_samples = mmtime.u.sample;
_read_samples_old = _read_samples;
_rec_samples_old = mmtime.u.sample;
_wrapCounter = 0;
for (int n = 0; n < N_BUFFERS_IN; n++)
{
const uint8_t nBytesPerSample = 2*_recChannels;
// set up the input wave header
_waveHeaderIn[n].lpData = reinterpret_cast<LPSTR>(&_recBuffer[n]);
_waveHeaderIn[n].dwBufferLength = nBytesPerSample * REC_BUF_SIZE_IN_SAMPLES;
_waveHeaderIn[n].dwFlags = 0;
_waveHeaderIn[n].dwBytesRecorded = 0;
_waveHeaderIn[n].dwUser = 0;
memset(_recBuffer[n], 0, nBytesPerSample * REC_BUF_SIZE_IN_SAMPLES);
// prepare a buffer for waveform-audio input
res = waveInPrepareHeader(_hWaveIn, &_waveHeaderIn[n], sizeof(WAVEHDR));
...
// send an input buffer to the given waveform-audio input device
res = waveInAddBuffer(_hWaveIn, &_waveHeaderIn[n], sizeof(WAVEHDR));
...
}
// start input on the given waveform-audio input device
res = waveInStart(_hWaveIn);
...
}
int32_t AudioDeviceWindowsWave::RecProc(LONGLONG& consumedTime)
{
...
bufCount = _recBufCount;
// take mono/stereo mode into account when deriving size of a full buffer
const uint16_t bytesPerSample = 2*_recChannels;
const uint32_t fullBufferSizeInBytes = bytesPerSample * REC_BUF_SIZE_IN_SAMPLES;
// read number of recorded bytes for the given input-buffer
nBytesRecorded = _waveHeaderIn[bufCount].dwBytesRecorded;
if (nBytesRecorded == fullBufferSizeInBytes ||
(nBytesRecorded > 0))
{
...
uint32_t nSamplesRecorded = (nBytesRecorded/bytesPerSample); // divide by 2 or 4 depending on mono or stereo
...
// store the recorded buffer (no action will be taken if the #recorded samples is not a full buffer)
_ptrAudioBuffer->SetRecordedBuffer(_waveHeaderIn[bufCount].lpData, nSamplesRecorded);
...
if (send)
{
...
// deliver recorded samples at specified sample rate, mic level etc. to the observer using callback
UnLock();
_ptrAudioBuffer->DeliverRecordedData();
Lock();
...
}
...
// increase main buffer count since one complete buffer has now been delivered
_recBufCount++;
...
} // if ((nBytesRecorded == fullBufferSizeInBytes))
return nBytesRecorded;
}
int AudioDeviceWindowsWave::PlayProc(LONGLONG& consumedTime)
{
...
// Get number of ms of sound that remains in the sound card buffer for playback.
//
remTimeMS = GetPlayoutBufferDelay(writtenSamples, playedSamples);
// The threshold can be adaptive or fixed. The adaptive scheme is updated
// also for fixed mode but the updated threshold is not utilized.
//
const uint16_t thresholdMS =
(_playBufType == AudioDeviceModule::kAdaptiveBufferSize) ? _playBufDelay : _playBufDelayFixed;
if (remTimeMS < thresholdMS + 9)
{
...
// Ask for new PCM data to be played out using the AudioDeviceBuffer.
// Ensure that this callback is executed without taking the audio-thread lock.
//
UnLock();
uint32_t nSamples = _ptrAudioBuffer->RequestPlayoutData(PLAY_BUF_SIZE_IN_SAMPLES);
Lock();
...
nSamples = _ptrAudioBuffer->GetPlayoutData(playBuffer);
...
Write(playBuffer, PLAY_BUF_SIZE_IN_SAMPLES);
}
...
return (0);
}
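PlayProc's feed decision hinges on how much audio is still queued in the sound card, as reported by GetPlayoutBufferDelay. That function is elided above; assuming it derives the delay from samples written versus samples already played (the quantities its two arguments suggest, and what waveOutGetPosition reports), a back-of-the-envelope version would look like this sketch:
#include <cstdint>
// Hypothetical sketch of the elided GetPlayoutBufferDelay() logic: the delay
// is taken to be the queued sample count converted to milliseconds.
uint32_t RemainingPlayoutMs(uint32_t writtenSamples,
                            uint32_t playedSamples,
                            uint32_t samplesPerSec)
{
    const uint32_t queuedSamples = writtenSamples - playedSamples;
    return queuedSamples / (samplesPerSec / 1000); // samples -> ms
}
With, say, 1440 samples still queued at 48 kHz this yields 30 ms; PlayProc only writes another 10 ms buffer once the remaining time drops below its threshold (remTimeMS < thresholdMS + 9).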
InitRecording
Now let's see what InitRecording does:
- Set the capture audio format.
- Open the audio device.
- Query the device capabilities.
int32_t AudioDeviceWindowsWave::InitRecording()
{
...
// Initialize the microphone (devices might have been added or removed)
if (InitMicrophone() == -1)
{
HJAV_TRACE(kTraceWarning, kTraceAudioDevice, _id, "InitMicrophone() failed");
}
...
// Set the input wave format
//
WAVEFORMATEX waveFormat;
waveFormat.wFormatTag = WAVE_FORMAT_PCM;
waveFormat.nChannels = _recChannels; // mono <=> 1, stereo <=> 2
waveFormat.nSamplesPerSec = N_REC_SAMPLES_PER_SEC;
waveFormat.wBitsPerSample = 16;
waveFormat.nBlockAlign = waveFormat.nChannels * (waveFormat.wBitsPerSample/8);
waveFormat.nAvgBytesPerSec = waveFormat.nSamplesPerSec * waveFormat.nBlockAlign;
waveFormat.cbSize = 0;
// Open the given waveform-audio input device for recording
//
HWAVEIN hWaveIn(NULL);
...
// verify settings first
res = waveInOpen(NULL, _inputDeviceIndex, &waveFormat, 0, 0, CALLBACK_NULL | WAVE_FORMAT_QUERY);
if (MMSYSERR_NOERROR == res)
{
// open the given waveform-audio input device for recording
res = waveInOpen(&hWaveIn, _inputDeviceIndex, &waveFormat, 0, 0, CALLBACK_NULL);
HJAV_TRACE(kTraceInfo, kTraceAudioDevice, _id, "opening input device corresponding to device ID %u", _inputDeviceIndex);
}
...
// Log information about the acquired input device
//
WAVEINCAPS caps;
res = waveInGetDevCaps((UINT_PTR)hWaveIn, &caps, sizeof(WAVEINCAPS));
...
UINT deviceID(0);
res = waveInGetID(hWaveIn, &deviceID);
...
return 0;
}
InitPlayout
InitPlayout does the following:
- First, initialize the speaker.
- Enumerate all playout devices.
- Set the playout audio parameters.
- Open the Wave device.
- Loop over the playout buffers, zeroing them and preparing the wave-out headers.
int32_t AudioDeviceWindowsWave::InitPlayout()
{
CriticalSectionScoped lock(&_critSect);
...
// Initialize the speaker (devices might have been added or removed)
if (InitSpeaker() == -1)
...
// Enumerate all available output devices
EnumeratePlayoutDevices();
...
// Set the output wave format
//
WAVEFORMATEX waveFormat;
waveFormat.wFormatTag = WAVE_FORMAT_PCM;
waveFormat.nChannels = _playChannels; // mono <=> 1, stereo <=> 2
waveFormat.nSamplesPerSec = N_PLAY_SAMPLES_PER_SEC;
waveFormat.wBitsPerSample = 16;
waveFormat.nBlockAlign = waveFormat.nChannels * (waveFormat.wBitsPerSample/8);
waveFormat.nAvgBytesPerSec = waveFormat.nSamplesPerSec * waveFormat.nBlockAlign;
waveFormat.cbSize = 0;
// Open the given waveform-audio output device for playout
//
HWAVEOUT hWaveOut(NULL);
...
// verify settings first
res = waveOutOpen(NULL, _outputDeviceIndex, &waveFormat, 0, 0, CALLBACK_NULL | WAVE_FORMAT_QUERY);
if (MMSYSERR_NOERROR == res)
{
// open the given waveform-audio output device for playout
res = waveOutOpen(&hWaveOut, _outputDeviceIndex, &waveFormat, 0, 0, CALLBACK_NULL);
...
}
...
// Log information about the acquired output device
//
WAVEOUTCAPS caps;
res = waveOutGetDevCaps((UINT_PTR)hWaveOut, &caps, sizeof(WAVEOUTCAPS));
...
UINT deviceID(0);
res = waveOutGetID(hWaveOut, &deviceID);
...
// Store valid handle for the open waveform-audio output device
_hWaveOut = hWaveOut;
// Store the output wave format as well
_waveFormatOut = waveFormat;
// Prepare wave-out headers
//
const uint8_t bytesPerSample = 2*_playChannels;
for (int n = 0; n < N_BUFFERS_OUT; n++)
{
// set up the output wave header
_waveHeaderOut[n].lpData = reinterpret_cast<LPSTR>(&_playBuffer[n]);
_waveHeaderOut[n].dwBufferLength = bytesPerSample*PLAY_BUF_SIZE_IN_SAMPLES;
_waveHeaderOut[n].dwFlags = 0;
_waveHeaderOut[n].dwLoops = 0;
memset(_playBuffer[n], 0, bytesPerSample*PLAY_BUF_SIZE_IN_SAMPLES);
// The waveOutPrepareHeader function prepares a waveform-audio data block for playback.
// The lpData, dwBufferLength, and dwFlags members of the WAVEHDR structure must be set
// before calling this function.
//
res = waveOutPrepareHeader(_hWaveOut, &_waveHeaderOut[n], sizeof(WAVEHDR));
...
}
...
return 0;
}