CloudRTC Windows Audio Device Setup

Preface

In earlier articles I described how the iOS and Mac sides of CloudRTC use and initialize their audio devices. Today let's see how the Windows side does it.

On Windows there are two ways to capture audio: Core Audio, which is only available on Windows Vista and later, and Wave, an older audio capture API.

Both approaches are introduced below.

Which method is used for audio capture?

Before diving in, let's review CloudRTC's audio processing flow; its sequence diagram is shown below:


As the figure above shows, when the audio device is initialized, the CreatePlatformSpecificObjects method of the AudioDeviceModuleImpl class is called. In that method, a different device object is created depending on the operating system; on iOS, for example, an AudioDeviceIOS object is created.

For Windows, the choice is refined further. On Vista and later, an AudioDeviceWindowsCore object is created; on pre-Vista systems, an AudioDeviceWindowsWave object is created instead.

How does it distinguish Windows versions after Vista from earlier ones? It uses the function AudioDeviceWindowsCore::CoreAudioIsSupported().

Let's look at the overall flow:

int32_t AudioDeviceModuleImpl::CreatePlatformSpecificObjects()
{
  ...

#if defined(HJAV_WINDOWS_CORE_AUDIO_BUILD)
    ...

        if (AudioDeviceWindowsCore::CoreAudioIsSupported())
        {
            // create *Windows Core Audio* implementation
            ptrAudioDevice = new AudioDeviceWindowsCore(Id());
            ...
        }
        else
        {
            // create *Windows Wave Audio* implementation
            ptrAudioDevice = new AudioDeviceWindowsWave(Id());
            ...
        }

    ...
}

From the code above we can see that CloudRTC uses the return value of CoreAudioIsSupported to decide whether audio is captured and played with AudioDeviceWindowsCore or handled with AudioDeviceWindowsWave.

OK, with that logic in mind, let's look in detail at how CoreAudioIsSupported decides whether Core Audio is supported.

Background

Before reading CoreAudioIsSupported, we need a bit of background knowledge.

The MMDevice (Windows Multimedia Device) API enumerates audio endpoint devices, determines their capabilities, and creates driver instances for those devices.

The MMDevice API interfaces are defined in the Mmdeviceapi.h header. The API consists of several interfaces, the first of which is IMMDeviceEnumerator. A client obtains a reference to the IMMDeviceEnumerator interface of a device-enumerator object by calling CoCreateInstance.

Through the IMMDeviceEnumerator interface, the client can then obtain references to the other interfaces in the MMDevice API. The MMDevice API implements the following interfaces (a small usage sketch follows the list):

  • IMMDevice: represents an audio device.
  • IMMDeviceCollection: represents a collection of audio devices.
  • IMMDeviceEnumerator: provides methods for enumerating audio devices.
  • IMMEndpoint: represents an audio endpoint device.
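
To make these interfaces concrete, here is a minimal sketch (not CloudRTC code) that obtains an IMMDeviceEnumerator via CoCreateInstance and walks the active capture endpoints. It assumes COM has already been initialized on the calling thread and omits most error handling:

#include <mmdeviceapi.h>

void ListCaptureEndpoints()
{
    IMMDeviceEnumerator* pEnumerator = NULL;
    IMMDeviceCollection* pCollection = NULL;

    // Obtain a reference to the IMMDeviceEnumerator interface of the
    // device-enumerator object.
    HRESULT hr = CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_ALL,
                                  __uuidof(IMMDeviceEnumerator),
                                  (void**)&pEnumerator);
    if (FAILED(hr))
        return;

    // Enumerate all active audio capture endpoints.
    hr = pEnumerator->EnumAudioEndpoints(eCapture, DEVICE_STATE_ACTIVE, &pCollection);
    if (SUCCEEDED(hr))
    {
        UINT count = 0;
        pCollection->GetCount(&count);
        for (UINT i = 0; i < count; i++)
        {
            IMMDevice* pDevice = NULL;
            if (SUCCEEDED(pCollection->Item(i, &pDevice)))
            {
                // pDevice represents one endpoint; IMMEndpoint, IPropertyStore,
                // etc. can be queried from it as needed.
                pDevice->Release();
            }
        }
        pCollection->Release();
    }
    pEnumerator->Release();
}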

The CoreAudioIsSupported function

CoreAudioIsSupported does roughly the following:

  1. Use the system function VerifyVersionInfo to check the Windows version; Core Audio can only be used on Vista SP1 and later.
  2. Initialize the COM library for the calling thread in MTA (multithreaded) mode (a sketch of the ScopedCOMInitializer wrapper it uses appears after the function code below).
  3. Check whether the MMDevice API is available.
  4. Verify that we can create and initialize the Core Audio class.

Now let's look at the code:

bool AudioDeviceWindowsCore::CoreAudioIsSupported()
{
    ...

    // 1) Check if Windows version is Vista SP1 or later.
    //
    // CoreAudio is only available on Vista SP1 and later.
    //
    OSVERSIONINFOEX osvi;
    DWORDLONG dwlConditionMask = 0;
    int op = VER_LESS_EQUAL;

    // Initialize the OSVERSIONINFOEX structure.
    ZeroMemory(&osvi, sizeof(OSVERSIONINFOEX));
    osvi.dwOSVersionInfoSize = sizeof(OSVERSIONINFOEX);
    osvi.dwMajorVersion = 6;
    osvi.dwMinorVersion = 0;
    osvi.wServicePackMajor = 0;
    osvi.wServicePackMinor = 0;
    osvi.wProductType = VER_NT_WORKSTATION;

    // Initialize the condition mask.
    VER_SET_CONDITION(dwlConditionMask, VER_MAJORVERSION, op);
    VER_SET_CONDITION(dwlConditionMask, VER_MINORVERSION, op);
    VER_SET_CONDITION(dwlConditionMask, VER_SERVICEPACKMAJOR, op);
    VER_SET_CONDITION(dwlConditionMask, VER_SERVICEPACKMINOR, op);
    VER_SET_CONDITION(dwlConditionMask, VER_PRODUCT_TYPE, VER_EQUAL);

    DWORD dwTypeMask = VER_MAJORVERSION | VER_MINORVERSION |
                       VER_SERVICEPACKMAJOR | VER_SERVICEPACKMINOR |
                       VER_PRODUCT_TYPE;

    // Perform the test.
    BOOL isVistaRTMorXP = VerifyVersionInfo(&osvi, dwTypeMask,
                                            dwlConditionMask);
    ...

    // 2) Initializes the COM library for use by the calling thread.

    // The COM init wrapper sets the thread's concurrency model to MTA,
    // and creates a new apartment for the thread if one is required. The
    // wrapper also ensures that each call to CoInitializeEx is balanced
    // by a corresponding call to CoUninitialize.
    //
    ScopedCOMInitializer comInit(ScopedCOMInitializer::kMTA);
    if (!comInit.succeeded()) {
      // Things will work even if an STA thread is calling this method but we
      // want to ensure that MTA is used and therefore return false here.
      return false;
    }

    // 3) Check if the MMDevice API is available.
    IMMDeviceEnumerator* pIMMD(NULL);
    const CLSID CLSID_MMDeviceEnumerator = __uuidof(MMDeviceEnumerator);
    const IID IID_IMMDeviceEnumerator = __uuidof(IMMDeviceEnumerator);

    hr = CoCreateInstance(
            CLSID_MMDeviceEnumerator,   // GUID value of MMDeviceEnumerator coclass
            NULL,
            CLSCTX_ALL,
            IID_IMMDeviceEnumerator,    // GUID value of the IMMDeviceEnumerator interface
            (void**)&pIMMD );

    ...
    // 4) Verify that we can create and initialize our Core Audio class.
    //
    // Also, perform a limited "API test" to ensure that Core Audio is supported for all devices.
    //
    if (MMDeviceIsAvailable)
    {
        coreAudioIsSupported = false;

        AudioDeviceWindowsCore* p = new AudioDeviceWindowsCore(-1);
        ...

        ok |= p->Init();

        int16_t numDevsRec = p->RecordingDevices();
        for (uint16_t i = 0; i < numDevsRec; i++)
        {
           ...
        }

        int16_t numDevsPlay = p->PlayoutDevices();
        for (uint16_t i = 0; i < numDevsPlay; i++)
        {
           ...
        }
    }

    ...

    return (coreAudioIsSupported);
}
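
Step 2 relies on ScopedCOMInitializer, a small RAII wrapper around CoInitializeEx/CoUninitialize. Its CloudRTC implementation is not shown in this article; the following is only a minimal sketch of what such a wrapper typically looks like, assuming the kMTA tag maps to COINIT_MULTITHREADED:

#include <objbase.h>

class ScopedCOMInitializer
{
public:
    enum SelectMTA { kMTA };

    // Initialize COM on the current thread using the multithreaded model.
    explicit ScopedCOMInitializer(SelectMTA)
    {
        _hr = CoInitializeEx(NULL, COINIT_MULTITHREADED);
    }

    // Each successful CoInitializeEx call must be balanced by CoUninitialize.
    ~ScopedCOMInitializer()
    {
        if (succeeded())
        {
            CoUninitialize();
        }
    }

    bool succeeded() const { return SUCCEEDED(_hr); }

private:
    HRESULT _hr;
};

With a wrapper like this, calling the check from a thread that has already entered an STA makes CoInitializeEx return RPC_E_CHANGED_MODE, so succeeded() reports false and CoreAudioIsSupported returns false, which matches the comment in the code above.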

Several important functions

Just as on iOS and Mac, audio on Windows revolves around two important functions, InitRecording and InitPlayout. Below we introduce the two capture/playout paths, Core Audio and Wave, in turn.

CoreAudio

First, let's look at how Core Audio captures and plays audio.

InitRecording

Let's start with what InitRecording does:

  • Initialize the microphone.
  • Use MMDevice to create a COM object with the IAudioClient interface.
  • Use IAudioClient to get the audio format (sample rate, channel count, sample size).
  • Initialize IAudioClient with the audio format chosen by the user.
  • Get the IAudioClient buffer size used to hold the data.
  • Set an event handle that is signaled when an audio buffer is ready.
  • Get the IAudioCaptureClient interface from IAudioClient; this interface is used in the DoCaptureThread thread.

The code is as follows:

int32_t AudioDeviceWindowsCore::InitRecording()
{
#ifdef ASYNC_INIT_RECORDING
    ...

    HANDLE recInitThread = CreateThread(NULL,
                               0,
                               WSAPICaptureInitThread,
                               this,
                               0,
                               NULL);
    ...
    // Set thread priority to highest possible
    SetThreadPriority(recInitThread, THREAD_PRIORITY_TIME_CRITICAL);
    ...
}

DWORD WINAPI AudioDeviceWindowsCore::WSAPICaptureInitThread(LPVOID context)
{
    return reinterpret_cast<AudioDeviceWindowsCore*>(context)->
        DoCaptureInitThread();
}

DWORD AudioDeviceWindowsCore::DoCaptureInitThread()
{
    // Initialize COM as MTA in this thread.
    ScopedCOMInitializer comInit(ScopedCOMInitializer::kMTA);

    InitRecording_DoIt();

    ...
}

int32_t AudioDeviceWindowsCore::InitRecording_DoIt()
{
    CriticalSectionScoped lock(&_critSect);
    ...
    // Initialize the microphone (devices might have been added or removed)
    if (InitMicrophone() == -1)
    ...

    if (_builtInAudioProcessEnabled)
    {
        // The DMO will configure the capture device.
        return InitRecordingDMO();
    }

    HRESULT hr = S_OK;
    WAVEFORMATEX* pWfxIn = NULL;
    WAVEFORMATEX Wfx = WAVEFORMATEX();
    WAVEFORMATEX* pWfxClosestMatch = NULL;

    // Create COM object with IAudioClient interface.
    SAFE_RELEASE(_ptrClientIn);
    hr = _ptrDeviceIn->Activate(
                          __uuidof(IAudioClient),
                          CLSCTX_ALL,
                          NULL,
                          (void**)&_ptrClientIn);
    EXIT_ON_ERROR(hr);

    // Retrieve the stream format that the audio engine uses for its internal
    // processing (mixing) of shared-mode streams.
    hr = _ptrClientIn->GetMixFormat(&pWfxIn);
    ...

    // Set wave format
    Wfx.wFormatTag = WAVE_FORMAT_PCM;
    Wfx.wBitsPerSample = 16;
    Wfx.cbSize = 0;

    const int freqs[6] = {48000, 44100, 16000, 96000, 32000, 8000};
    hr = S_FALSE;

    // Iterate over frequencies and channels, in order of priority
    for (int freq = 0; freq < sizeof(freqs)/sizeof(freqs[0]); freq++)
    {
        for (int chan = 0; chan < sizeof(_recChannelsPrioList)/sizeof(_recChannelsPrioList[0]); chan++)
        {
            Wfx.nChannels = _recChannelsPrioList[chan];
            Wfx.nSamplesPerSec = freqs[freq];
            Wfx.nBlockAlign = Wfx.nChannels * Wfx.wBitsPerSample / 8;
            Wfx.nAvgBytesPerSec = Wfx.nSamplesPerSec * Wfx.nBlockAlign;
            // If the method succeeds and the audio endpoint device supports the specified stream format,
            // it returns S_OK. If the method succeeds and provides a closest match to the specified format,
            // it returns S_FALSE.
            hr = _ptrClientIn->IsFormatSupported(
                                  AUDCLNT_SHAREMODE_SHARED,
                                  &Wfx,
                                  &pWfxClosestMatch);
            ...
        }
    }

    if (hr == S_OK)
    {
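        // Cache the negotiated capture format: bytes per frame, sample rate,
        // samples per 10 ms block, and channel count.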
        _recAudioFrameSize = Wfx.nBlockAlign;
        _recSampleRate = Wfx.nSamplesPerSec;
        _recBlockSize = Wfx.nSamplesPerSec/100;
        _recChannels = Wfx.nChannels;
        ...
    }

    // Create a capturing stream.
    hr = _ptrClientIn->Initialize(
                          AUDCLNT_SHAREMODE_SHARED,             // share Audio Engine with other applications
                          AUDCLNT_STREAMFLAGS_EVENTCALLBACK |   // processing of the audio buffer by the client will be event driven
                          AUDCLNT_STREAMFLAGS_NOPERSIST,        // volume and mute settings for an audio session will not persist across system restarts
                          0,                                    // required for event-driven shared mode
                          0,                                    // periodicity
                          &Wfx,                                 // selected wave format
                          NULL);                                // session GUID
    ...

    // Get the actual size of the shared (endpoint buffer).
    // Typical value is 960 audio frames <=> 20ms @ 48kHz sample rate.
    UINT bufferFrameCount(0);
    hr = _ptrClientIn->GetBufferSize(
                          &bufferFrameCount);
    ...

    // Set the event handle that the system signals when an audio buffer is ready
    // to be processed by the client.
    hr = _ptrClientIn->SetEventHandle(
                          _hCaptureSamplesReadyEvent);
    EXIT_ON_ERROR(hr);

    // Get an IAudioCaptureClient interface.
    SAFE_RELEASE(_ptrCaptureClient);
    hr = _ptrClientIn->GetService(
                          __uuidof(IAudioCaptureClient),
                          (void**)&_ptrCaptureClient);
    EXIT_ON_ERROR(hr);

    // Mark capture side as initialized
    _recIsInitialized = true;

    CoTaskMemFree(pWfxIn);
    CoTaskMemFree(pWfxClosestMatch);

    return 0;

Exit:
    ...
    return -1;
}

As the code above shows, InitRecording_DoIt configures the audio parameters and registers the _hCaptureSamplesReadyEvent event handle. When an audio buffer is ready, the audio device signals _hCaptureSamplesReadyEvent; another thread, DoCaptureThread, wakes up on that event and copies the data from the device buffer into the user buffer.

So what exactly does DoCaptureThread do?

DoCaptureThread

The DoCaptureThread thread does the following:

  • Initialize the COM library.
  • Get the size of the capture buffer via IAudioClient.
  • Allocate a buffer in user space.
  • Start the IAudioClient and begin capturing audio data.
  • Wait for the _hCaptureSamplesReadyEvent event; when it is signaled, call CopyMemory to copy the data from the device buffer into the user buffer.
  • Finally, deliver the data in the user buffer downstream via DeliverRecordedData.

The code is as follows:

DWORD AudioDeviceWindowsCore::DoCaptureThread()
{
    ...
    // Initialize COM as MTA in this thread.
    ScopedCOMInitializer comInit(ScopedCOMInitializer::kMTA);
    ...
    hr = InitCaptureThreadPriority();
    ...
    _Lock();
    ...

    // Get size of capturing buffer (length is expressed as the number of audio frames the buffer can hold).
    // This value is fixed during the capturing session.
    UINT32 bufferLength = 0;
    hr = _ptrClientIn->GetBufferSize(&bufferLength);
    ...

    // Allocate memory for sync buffer.
    // It is used for compensation between native 44.1 and internal 44.0 and
    // for cases when the capture buffer is larger than 10ms.
    const UINT32 syncBufferSize = 2*(bufferLength * _recAudioFrameSize);
    syncBuffer = new BYTE[syncBufferSize];
    ...

    // Start up the capturing stream.
    hr = _ptrClientIn->Start();
    ...

    _UnLock();

    // Set event which will ensure that the calling thread modifies the recording state to true
    SetEvent(_hCaptureStartedEvent);

    while (keepRecording)
    {
        // Wait for a capture notification event or a shutdown event
        DWORD waitResult = WaitForMultipleObjects(2, waitArray, FALSE, 10000);
        switch (waitResult)
        {
        case WAIT_OBJECT_0 + 0:        // _hShutdownCaptureEvent
            keepRecording = false;
            break;
        case WAIT_OBJECT_0 + 1:        // _hCaptureSamplesReadyEvent
            break;
        case WAIT_TIMEOUT:            // timeout notification
            HJAV_TRACE(kTraceWarning, kTraceAudioDevice, _id, "capture event timed out after 10 seconds");
            goto Exit;
        default:                    // unexpected error
            HJAV_TRACE(kTraceWarning, kTraceAudioDevice, _id, "unknown wait termination on capture side");
            goto Exit;
        }

        while (keepRecording)
        {
            ...

            _Lock();

            ...

            //  Find out how much capture data is available
            //
            hr = _ptrCaptureClient->GetBuffer(&pData,           // packet which is ready to be read by user
                                              &framesAvailable, // #frames in the captured packet (can be zero)
                                              &flags,           // support flags (check)
                                              &recPos,          // device position of first audio frame in data packet
                                              &recTime);        // value of performance counter at the time of recording the first audio frame

            if (SUCCEEDED(hr))
            {
               ...

                if (pData)
                {
                    CopyMemory(&syncBuffer[syncBufIndex*_recAudioFrameSize], pData, framesAvailable*_recAudioFrameSize);
                }
                else
                {
                    ZeroMemory(&syncBuffer[syncBufIndex*_recAudioFrameSize], framesAvailable*_recAudioFrameSize);
                }
                ...

                while (syncBufIndex >= _recBlockSize)
                {
                    if (_ptrAudioBuffer)
                    {
                        ...

                        _UnLock();  // release lock while making the callback
                        _ptrAudioBuffer->DeliverRecordedData();
                        _Lock();    // restore the lock

                        ...
                    }
                    ...
                }

                ...
            }
            ...

            _UnLock();
        }
    }
    hr = _ptrClientIn->Stop();
    ...
}

The code above makes the capture flow quite clear.

InitPlayout

Now let's see what the InitPlayout function does:

  • Initialize the speaker.
  • Use MMDevice to create a COM object with the IAudioClient interface.
  • Get the current audio format (sample rate, sample size, channel count) via IAudioClient.
  • Apply the audio parameters chosen by the user and initialize IAudioClient.
  • Get the size of the endpoint buffer.
  • Set an event handle that is signaled when the audio buffer is ready.
  • Get the IAudioRenderClient interface from IAudioClient; this interface is used in the audio render thread.

Here is the code:

int32_t AudioDeviceWindowsCore::InitPlayout()
{

    CriticalSectionScoped lock(&_critSect);

    ...

    // Initialize the speaker (devices might have been added or removed)
    if (InitSpeaker() == -1)
    ...

    HRESULT hr = S_OK;
    WAVEFORMATEX* pWfxOut = NULL;
    WAVEFORMATEX Wfx = WAVEFORMATEX();
    WAVEFORMATEX* pWfxClosestMatch = NULL;

    // Create COM object with IAudioClient interface.
    SAFE_RELEASE(_ptrClientOut);
    hr = _ptrDeviceOut->Activate(
                          __uuidof(IAudioClient),
                          CLSCTX_ALL,
                          NULL,
                          (void**)&_ptrClientOut);
    EXIT_ON_ERROR(hr);

    // Retrieve the stream format that the audio engine uses for its internal
    // processing (mixing) of shared-mode streams.
    hr = _ptrClientOut->GetMixFormat(&pWfxOut);
    ...

    // Set wave format
    Wfx.wFormatTag = WAVE_FORMAT_PCM;
    Wfx.wBitsPerSample = 16;
    Wfx.cbSize = 0;

    const int freqs[] = {48000, 44100, 16000, 96000, 32000, 8000};
    hr = S_FALSE;

    // Iterate over frequencies and channels, in order of priority
    for (int freq = 0; freq < sizeof(freqs)/sizeof(freqs[0]); freq++)
    {
        for (int chan = 0; chan < sizeof(_playChannelsPrioList)/sizeof(_playChannelsPrioList[0]); chan++)
        {
            Wfx.nChannels = _playChannelsPrioList[chan];
            Wfx.nSamplesPerSec = freqs[freq];
            Wfx.nBlockAlign = Wfx.nChannels * Wfx.wBitsPerSample / 8;
            Wfx.nAvgBytesPerSec = Wfx.nSamplesPerSec * Wfx.nBlockAlign;
            // If the method succeeds and the audio endpoint device supports the specified stream format,
            // it returns S_OK. If the method succeeds and provides a closest match to the specified format,
            // it returns S_FALSE.
            hr = _ptrClientOut->IsFormatSupported(
                                  AUDCLNT_SHAREMODE_SHARED,
                                  &Wfx,
                                  &pWfxClosestMatch);
            ...
        }
    }

    // TODO(andrew): what happens in the event of failure in the above loop?
    //   Is _ptrClientOut->Initialize expected to fail?
    //   Same in InitRecording().
    if (hr == S_OK)
    {
        _playAudioFrameSize = Wfx.nBlockAlign;
        _playBlockSize = Wfx.nSamplesPerSec/100;
        _playSampleRate = Wfx.nSamplesPerSec;
        _devicePlaySampleRate = Wfx.nSamplesPerSec; // The device itself continues to run at 44.1 kHz.
        _devicePlayBlockSize = Wfx.nSamplesPerSec/100;
        _playChannels = Wfx.nChannels;

        ...
    }

    // Create a rendering stream.
    //
    // ****************************************************************************
    // For a shared-mode stream that uses event-driven buffering, the caller must
    // set both hnsPeriodicity and hnsBufferDuration to 0. The Initialize method
    // determines how large a buffer to allocate based on the scheduling period
    // of the audio engine. Although the client's buffer processing thread is
    // event driven, the basic buffer management process, as described previously,
    // is unaltered.
    // Each time the thread awakens, it should call IAudioClient::GetCurrentPadding
    // to determine how much data to write to a rendering buffer or read from a capture
    // buffer. In contrast to the two buffers that the Initialize method allocates
    // for an exclusive-mode stream that uses event-driven buffering, a shared-mode
    // stream requires a single buffer.
    // ****************************************************************************
    //
    REFERENCE_TIME hnsBufferDuration = 0;  // ask for minimum buffer size (default)
    if (_devicePlaySampleRate == 44100)
    {
        // Ask for a larger buffer size (30ms) when using 44.1kHz as render rate.
        // There seems to be a larger risk of underruns for 44.1 compared
        // with the default rate (48kHz). When using default, we set the requested
        // buffer duration to 0, which sets the buffer to the minimum size
        // required by the engine thread. The actual buffer size can then be
        // read by GetBufferSize() and it is 20ms on most machines.
        hnsBufferDuration = 30*10000;
    }
    hr = _ptrClientOut->Initialize(
                          AUDCLNT_SHAREMODE_SHARED,             // share Audio Engine with other applications
                          AUDCLNT_STREAMFLAGS_EVENTCALLBACK,    // processing of the audio buffer by the client will be event driven
                          hnsBufferDuration,                    // requested buffer capacity as a time value (in 100-nanosecond units)
                          0,                                    // periodicity
                          &Wfx,                                 // selected wave format
                          NULL);                                // session GUID

    ...

    if (_ptrAudioBuffer)
    {
        // Update the audio buffer with the selected parameters
        _ptrAudioBuffer->SetPlayoutSampleRate(_playSampleRate);
        _ptrAudioBuffer->SetPlayoutChannels((uint8_t)_playChannels);
    }
    else
    {
        // We can enter this state during CoreAudioIsSupported() when no AudioDeviceImplementation
        // has been created, hence the AudioDeviceBuffer does not exist.
        // It is OK to end up here since we don't initiate any media in CoreAudioIsSupported().
        HJAV_TRACE(kTraceInfo, kTraceAudioDevice, _id, "AudioDeviceBuffer must be attached before streaming can start");
    }

    // Get the actual size of the shared (endpoint buffer).
    // Typical value is 960 audio frames <=> 20ms @ 48kHz sample rate.
    UINT bufferFrameCount(0);
    hr = _ptrClientOut->GetBufferSize(
                          &bufferFrameCount);
    ...

    // Set the event handle that the system signals when an audio buffer is ready
    // to be processed by the client.
    hr = _ptrClientOut->SetEventHandle(
                          _hRenderSamplesReadyEvent);
    EXIT_ON_ERROR(hr);

    // Get an IAudioRenderClient interface.
    SAFE_RELEASE(_ptrRenderClient);
    hr = _ptrClientOut->GetService(
                          __uuidof(IAudioRenderClient),
                          (void**)&_ptrRenderClient);
    EXIT_ON_ERROR(hr);

    // Mark playout side as initialized
    _playIsInitialized = true;

    CoTaskMemFree(pWfxOut);
    CoTaskMemFree(pWfxClosestMatch);

    ...
}

The code above shows what InitPlayout does. Next, let's look at what the render thread does.

DoRenderThread

The DoRenderThread thread does the following:

  • Set the COM threading model to MTA.
  • Get the buffer size via IAudioClient.
  • Get the address of the audio device buffer via IAudioRenderClient.
  • Start the IAudioClient.
  • When the _hRenderSamplesReadyEvent event is signaled, fetch audio data from the AudioBuffer into the device buffer.
  • Finally, the audio device plays the sound.

The code is as follows:

DWORD AudioDeviceWindowsCore::DoRenderThread()
{
    ...

    // Initialize COM as MTA in this thread.
    ScopedCOMInitializer comInit(ScopedCOMInitializer::kMTA);
    ...
    _SetThreadName(0, "hjav_core_audio_render_thread");
    ...
    _Lock();
    ...

    // Get size of rendering buffer (length is expressed as the number of audio frames the buffer can hold).
    // This value is fixed during the rendering session.
    //
    UINT32 bufferLength = 0;
    hr = _ptrClientOut->GetBufferSize(&bufferLength);
    ...
    // Before starting the stream, fill the rendering buffer with silence.
    //
    BYTE *pData = NULL;
    hr = _ptrRenderClient->GetBuffer(bufferLength, &pData);
    EXIT_ON_ERROR(hr);

    ...
    // Start up the rendering audio stream.
    hr = _ptrClientOut->Start();
    
    _UnLock();

    // Set event which will ensure that the calling thread modifies the playing state to true.
    //
    SetEvent(_hRenderStartedEvent);
    while (keepPlaying)
    {
        // Wait for a render notification event or a shutdown event
        DWORD waitResult = WaitForMultipleObjects(2, waitArray, FALSE, 500);
        switch (waitResult)
        {
        case WAIT_OBJECT_0 + 0:     // _hShutdownRenderEvent
            keepPlaying = false;
            break;
        case WAIT_OBJECT_0 + 1:     // _hRenderSamplesReadyEvent
            break;
        case WAIT_TIMEOUT:          // timeout notification
            HJAV_TRACE(kTraceWarning, kTraceAudioDevice, _id, "render event timed out after 0.5 seconds");
            goto Exit;
        default:                    // unexpected error
            HJAV_TRACE(kTraceWarning, kTraceAudioDevice, _id, "unknown wait termination on render side");
            goto Exit;
        }

        while (keepPlaying)
        {
            _Lock();

            ...

            // Write n*10ms buffers to the render buffer
            const uint32_t n10msBuffers = (framesAvailable / _playBlockSize);
            for (uint32_t n = 0; n < n10msBuffers; n++)
            {
                // Get pointer (i.e., grab the buffer) to next space in the shared render buffer.
                hr = _ptrRenderClient->GetBuffer(_playBlockSize, &pData);
                ...

                if (_ptrAudioBuffer)
                {
                    // Request data to be played out (#bytes = _playBlockSize*_audioFrameSize)
                    _UnLock();
                    int32_t nSamples = _ptrAudioBuffer->RequestPlayoutData(_playBlockSize);
                    _Lock();

                    ...

                    // Get the actual (stored) data
                    nSamples = _ptrAudioBuffer->GetPlayoutData((int8_t*)pData);
                }

                ...
            }

            ...
            _UnLock();
        }
    }

    ...
}

OK, that concludes the Core Audio path. Now let's see how Wave does it.

Wave

Next, let's look at how Wave captures and plays audio. The Wave path mainly involves three important functions:

  • Init
  • InitRecording
  • InitPlayout

Init

First, let's see what the Init function does:

  • The Init function starts several threads, the most important of which runs ThreadFunc. That thread mainly does the following:
    • Call PrepareStartPlayout: it immediately writes 30 ms of silence into the sound card buffer.
    • Call PrepareStartRecording: it prepares a set of buffers in a loop, adds them to the audio input device, and then starts the Wave recording device.
    • Call PlayProc: it fetches data from the AudioBuffer and writes it to the audio device buffer for playback.
    • Call RecProc in a loop: it copies the data from the audio device into the user buffer and ultimately delivers it.
  • Start the GetCaptureVolumeThread thread.
  • Start the SetCaptureVolumeThread thread.

The code is as follows:

int32_t AudioDeviceWindowsWave::Init()
{

    CriticalSectionScoped lock(&_critSect);

    ...
    const char* threadName = "hjav_audio_module_thread";
    _ptrThread = ThreadWrapper::CreateThread(ThreadFunc,
                                             this,
                                             threadName);
    ...
    _ptrThread->SetPriority(kRealtimePriority);
    _threadID = _ptrThread->GetThreadId();

    ...

    _hGetCaptureVolumeThread = CreateThread(NULL,
                                            0,
                                            GetCaptureVolumeThread,
                                            this,
                                            0,
                                            NULL);
    ...
    SetThreadPriority(_hGetCaptureVolumeThread, THREAD_PRIORITY_NORMAL);

    _hSetCaptureVolumeThread = CreateThread(NULL,
                                            0,
                                            SetCaptureVolumeThread,
                                            this,
                                            0,
                                            NULL);
    ...

    SetThreadPriority(_hSetCaptureVolumeThread, THREAD_PRIORITY_NORMAL);

    ...
}

bool AudioDeviceWindowsWave::ThreadFunc(void* pThis)
{
    return (static_cast<AudioDeviceWindowsWave*>(pThis)->ThreadProcess());
}

bool AudioDeviceWindowsWave::ThreadProcess()
{
    ...

    switch (_timeEvent.Wait(1000))
    {
    case kEventSignaled:
        break;
    case kEventError:
        HJAV_TRACE(kTraceWarning, kTraceAudioDevice, _id, "EventWrapper::Wait() failed => restarting timer");
        _timeEvent.StopTimer();
        _timeEvent.StartTimer(true, TIMER_PERIOD_MS);
        return true;
    case kEventTimeout:
        return true;
    }

    time = AudioDeviceUtility::GetTimeInMS();

    if (_startPlay)
    {
        if (PrepareStartPlayout() == 0)
        {
            ...
        }
    }

    if (_startRec)
    {
        if (PrepareStartRecording() == 0)
        {
            ...
        }
    }
    ...
    if (_playing &&
        (playDiff > (uint32_t)(_dTcheckPlayBufDelay - 1)) ||
        (playDiff < 0))
    {
        Lock();
        if (_playing)
        {
            if (PlayProc(playTime) == -1)
            ...
        }
        UnLock();
    }

    if (_playing && (playDiff > 12))
    {
        // It has been a long time since we were able to play out, try to
        // compensate by calling PlayProc again.
        //
        Lock();
        if (_playing)
        {
            if (PlayProc(playTime))
            ...
        }
        UnLock();
    }

    if (_recording &&
       (recDiff > REC_CHECK_TIME_PERIOD_MS) ||
       (recDiff < 0))
    {
        Lock();
        if (_recording)
        {
            ...

            // Deliver all available recorded buffers and update the CPU load measurement.
            // We use a while loop here to compensate for the fact that the multi-media timer
            // can sometimes enter a "bad state" after hibernation where the resolution is
            // reduced from ~1ms to ~10-15 ms.
            //
            while ((nRecordedBytes = RecProc(recTime)) > 0)
            {
                ...
            }
            ...

            // Monitor the recording process and generate error/warning callbacks if needed
            MonitorRecording(time);
        }
        UnLock();
    }
    ...
}

int32_t AudioDeviceWindowsWave::PrepareStartPlayout()
{

    CriticalSectionScoped lock(&_critSect);

    ...
    // A total of 30ms of data is immediately placed in the SC buffer
    //
    int8_t zeroVec[4*PLAY_BUF_SIZE_IN_SAMPLES];  // max allocation
    memset(zeroVec, 0, 4*PLAY_BUF_SIZE_IN_SAMPLES);

    {
        Write(zeroVec, PLAY_BUF_SIZE_IN_SAMPLES);
        Write(zeroVec, PLAY_BUF_SIZE_IN_SAMPLES);
        Write(zeroVec, PLAY_BUF_SIZE_IN_SAMPLES);
    }

    ...

    return 0;
}

int32_t AudioDeviceWindowsWave::Write(int8_t* data, uint16_t nSamples)
{
    ...
    if (_playIsInitialized)
    {
        ...
        const uint16_t bufCount(_playBufCount);
        ...
        // Send a data block to the given waveform-audio output device.
        //
        // When the buffer is finished, the WHDR_DONE bit is set in the dwFlags
        // member of the WAVEHDR structure. The buffer must be prepared with the
        // waveOutPrepareHeader function before it is passed to waveOutWrite.
        // Unless the device is paused by calling the waveOutPause function,
        // playback begins when the first data block is sent to the device.
        //
        res = waveOutWrite(_hWaveOut, &_waveHeaderOut[bufCount], sizeof(_waveHeaderOut[bufCount]));
        ...
    }

    return 0;
}

int32_t AudioDeviceWindowsWave::PrepareStartRecording()
{

    CriticalSectionScoped lock(&_critSect);

    ...

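    // Query the device's current input position (in samples) so that the
    // sample counters below start from the device's current state.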
    res = waveInGetPosition(_hWaveIn, &mmtime, sizeof(mmtime));
    ...

    _read_samples = mmtime.u.sample;
    _read_samples_old = _read_samples;
    _rec_samples_old = mmtime.u.sample;
    _wrapCounter = 0;

    for (int n = 0; n < N_BUFFERS_IN; n++)
    {
        const uint8_t nBytesPerSample = 2*_recChannels;

        // set up the input wave header
        _waveHeaderIn[n].lpData          = reinterpret_cast<LPSTR>(&_recBuffer[n]);
        _waveHeaderIn[n].dwBufferLength  = nBytesPerSample * REC_BUF_SIZE_IN_SAMPLES;
        _waveHeaderIn[n].dwFlags         = 0;
        _waveHeaderIn[n].dwBytesRecorded = 0;
        _waveHeaderIn[n].dwUser          = 0;

        memset(_recBuffer[n], 0, nBytesPerSample * REC_BUF_SIZE_IN_SAMPLES);

        // prepare a buffer for waveform-audio input
        res = waveInPrepareHeader(_hWaveIn, &_waveHeaderIn[n], sizeof(WAVEHDR));
        ...

        // send an input buffer to the given waveform-audio input device
        res = waveInAddBuffer(_hWaveIn, &_waveHeaderIn[n], sizeof(WAVEHDR));
        ...
    }

    // start input on the given waveform-audio input device
    res = waveInStart(_hWaveIn);
    ...    
}

int32_t AudioDeviceWindowsWave::RecProc(LONGLONG& consumedTime)
{
    ...

    bufCount = _recBufCount;

    // take mono/stereo mode into account when deriving size of a full buffer
    const uint16_t bytesPerSample = 2*_recChannels;
    const uint32_t fullBufferSizeInBytes = bytesPerSample * REC_BUF_SIZE_IN_SAMPLES;

    // read number of recorded bytes for the given input-buffer
    nBytesRecorded = _waveHeaderIn[bufCount].dwBytesRecorded;

    if (nBytesRecorded == fullBufferSizeInBytes ||
       (nBytesRecorded > 0))
    {
        ...

        uint32_t nSamplesRecorded = (nBytesRecorded/bytesPerSample);  // divide by 2 or 4 depending on mono or stereo

        ...
        // store the recorded buffer (no action will be taken if the #recorded samples is not a full buffer)
        _ptrAudioBuffer->SetRecordedBuffer(_waveHeaderIn[bufCount].lpData, nSamplesRecorded);

        ...

        if (send)
        {
            ...

            // deliver recorded samples at specified sample rate, mic level etc. to the observer using callback
            UnLock();
            _ptrAudioBuffer->DeliverRecordedData();
            Lock();

           ...
        }

       ...

        // increase main buffer count since one complete buffer has now been delivered
        _recBufCount++;

        ...

    }  // if ((nBytesRecorded == fullBufferSizeInBytes))

    return nBytesRecorded;
}

int AudioDeviceWindowsWave::PlayProc(LONGLONG& consumedTime)
{
    ...

    // Get number of ms of sound that remains in the sound card buffer for playback.
    //
    remTimeMS = GetPlayoutBufferDelay(writtenSamples, playedSamples);

    // The threshold can be adaptive or fixed. The adaptive scheme is updated
    // also for fixed mode but the updated threshold is not utilized.
    //
    const uint16_t thresholdMS =
        (_playBufType == AudioDeviceModule::kAdaptiveBufferSize) ? _playBufDelay : _playBufDelayFixed;

    if (remTimeMS < thresholdMS + 9)
    {
        ...

        // Ask for new PCM data to be played out using the AudioDeviceBuffer.
        // Ensure that this callback is executed without taking the audio-thread lock.
        //
        UnLock();
        uint32_t nSamples = _ptrAudioBuffer->RequestPlayoutData(PLAY_BUF_SIZE_IN_SAMPLES);
        Lock();

        ...

        nSamples = _ptrAudioBuffer->GetPlayoutData(playBuffer);
        ...  

        Write(playBuffer, PLAY_BUF_SIZE_IN_SAMPLES);

    } 
    ...

    return (0);
}

InitRecording

Now let's see what the Wave InitRecording function does:

  • Set the audio capture format.
  • Open the audio device.
  • Query the device capabilities.

The code is as follows:

int32_t AudioDeviceWindowsWave::InitRecording()
{

    ...

    // Initialize the microphone (devices might have been added or removed)
    if (InitMicrophone() == -1)
    {
        HJAV_TRACE(kTraceWarning, kTraceAudioDevice, _id, "InitMicrophone() failed");
    }
    ...
    // Set the input wave format
    //
    WAVEFORMATEX waveFormat;

    waveFormat.wFormatTag      = WAVE_FORMAT_PCM;
    waveFormat.nChannels       = _recChannels;  // mono <=> 1, stereo <=> 2
    waveFormat.nSamplesPerSec  = N_REC_SAMPLES_PER_SEC;
    waveFormat.wBitsPerSample  = 16;
    waveFormat.nBlockAlign     = waveFormat.nChannels * (waveFormat.wBitsPerSample/8);
    waveFormat.nAvgBytesPerSec = waveFormat.nSamplesPerSec * waveFormat.nBlockAlign;
    waveFormat.cbSize          = 0;

    // Open the given waveform-audio input device for recording
    //
    HWAVEIN hWaveIn(NULL);

    ...
        // verify settings first
        res = waveInOpen(NULL, _inputDeviceIndex, &waveFormat, 0, 0, CALLBACK_NULL | WAVE_FORMAT_QUERY);
        if (MMSYSERR_NOERROR == res)
        {
            // open the given waveform-audio input device for recording
            res = waveInOpen(&hWaveIn, _inputDeviceIndex, &waveFormat, 0, 0, CALLBACK_NULL);
            HJAV_TRACE(kTraceInfo, kTraceAudioDevice, _id, "opening input device corresponding to device ID %u", _inputDeviceIndex);
        }
   
    ...

    // Log information about the acquired input device
    //
    WAVEINCAPS caps;

    res = waveInGetDevCaps((UINT_PTR)hWaveIn, &caps, sizeof(WAVEINCAPS));
    ...

    UINT deviceID(0);
    res = waveInGetID(hWaveIn, &deviceID);
    ...
    
    return 0;
}

InitPlayout

The InitPlayout function does the following:

  • First, initialize the speaker.
  • Enumerate all playout devices.
  • Set the playout audio format.
  • Open the Wave device.
  • Prepare and zero the playout buffers in a loop.

The code is as follows:

int32_t AudioDeviceWindowsWave::InitPlayout()
{

    CriticalSectionScoped lock(&_critSect);

    ...

    // Initialize the speaker (devices might have been added or removed)
    if (InitSpeaker() == -1)
    ...

    // Enumerate all available output devices
    EnumeratePlayoutDevices();
    ...

    // Set the output wave format
    //
    WAVEFORMATEX waveFormat;

    waveFormat.wFormatTag      = WAVE_FORMAT_PCM;
    waveFormat.nChannels       = _playChannels;  // mono <=> 1, stereo <=> 2
    waveFormat.nSamplesPerSec  = N_PLAY_SAMPLES_PER_SEC;
    waveFormat.wBitsPerSample  = 16;
    waveFormat.nBlockAlign     = waveFormat.nChannels * (waveFormat.wBitsPerSample/8);
    waveFormat.nAvgBytesPerSec = waveFormat.nSamplesPerSec * waveFormat.nBlockAlign;
    waveFormat.cbSize          = 0;

    // Open the given waveform-audio output device for playout
    //
    HWAVEOUT hWaveOut(NULL);

    ...
        // verify settings first
        res = waveOutOpen(NULL, _outputDeviceIndex, &waveFormat, 0, 0, CALLBACK_NULL | WAVE_FORMAT_QUERY);
        if (MMSYSERR_NOERROR == res)
        {
            // open the given waveform-audio output device for playout
            res = waveOutOpen(&hWaveOut, _outputDeviceIndex, &waveFormat, 0, 0, CALLBACK_NULL);
            ...
        }
    ...

    // Log information about the acquired output device
    //
    WAVEOUTCAPS caps;

    res = waveOutGetDevCaps((UINT_PTR)hWaveOut, &caps, sizeof(WAVEOUTCAPS));
    ...

    UINT deviceID(0);
    res = waveOutGetID(hWaveOut, &deviceID);
    ...

    // Store valid handle for the open waveform-audio output device
    _hWaveOut = hWaveOut;

    // Store the input wave header as well
    _waveFormatOut = waveFormat;

    // Prepare wave-out headers
    //
    const uint8_t bytesPerSample = 2*_playChannels;

    for (int n = 0; n < N_BUFFERS_OUT; n++)
    {
        // set up the output wave header
        _waveHeaderOut[n].lpData          = reinterpret_cast<LPSTR>(&_playBuffer[n]);
        _waveHeaderOut[n].dwBufferLength  = bytesPerSample*PLAY_BUF_SIZE_IN_SAMPLES;
        _waveHeaderOut[n].dwFlags         = 0;
        _waveHeaderOut[n].dwLoops         = 0;

        memset(_playBuffer[n], 0, bytesPerSample*PLAY_BUF_SIZE_IN_SAMPLES);

        // The waveOutPrepareHeader function prepares a waveform-audio data block for playback.
        // The lpData, dwBufferLength, and dwFlags members of the WAVEHDR structure must be set
        // before calling this function.
        //
        res = waveOutPrepareHeader(_hWaveOut, &_waveHeaderOut[n], sizeof(WAVEHDR));
        ...
    }

    ...

    return 0;
}

Summary

On Windows, CloudRTC chooses between two audio paths at runtime: CoreAudioIsSupported decides whether the Core Audio path (built on the MMDevice API and IAudioClient, available from Vista SP1 onwards) or the older Wave path is used. In the Core Audio path, InitRecording and InitPlayout negotiate the stream format and obtain IAudioCaptureClient / IAudioRenderClient, while DoCaptureThread and DoRenderThread move data between the device buffers and the AudioDeviceBuffer. In the Wave path, Init starts the worker threads, and InitRecording / InitPlayout open the wave devices and prepare the capture and playout buffers.
