Learning DirectX 12: Rendering (Part 1)

Introduction

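In DirectX 12, ID3D12Resource is the only interface used to describe resources. The ID3D12Heap interface enables different memory-mapping techniques that can be used to optimize GPU memory usage. To improve state management, DirectX 12 uses a pipeline state object (PSO) to describe the rendering and compute pipelines. A PSO combines the state of the input assembly (IA), vertex shader (VS), hull shader (HS), domain shader (DS), geometry shader (GS), stream output (SO), rasterizer stage (RS), pixel shader (PS), and output merger (OM) stages, but some state, such as viewports and scissor rectangles, must still be set through separate API calls. Another new concept is the root signature, which describes the parameters passed to the programmable stages of the rendering pipeline. For example, later on we will pass a constant buffer through the root signature to rotate a cube in the scene.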

In this lesson, we will cover the following topics:

 

  • Uploading Buffer Resources to the GPU
  • Heaps
  • Pipeline State Objects
  • Root signatures

Heaps and Resources

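In the previous lesson, we used the IDXGIFactory2::CreateSwapChainForHwnd method to create the swap chain, and its texture resources were created automatically. In this lesson, we will create some buffer resources by hand. There are several ways to allocate GPU resources, which differ in how they use ID3D12Heap: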

  • Committed Resources
  • Placed Resources
  • Reserved Resources

Committed Resources

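A committed resource is created with the ID3D12Device::CreateCommittedResource method. This method creates both the resource and an implicit heap that is just large enough to hold the entire resource, and the resource is mapped to that heap. Committed resources are also the easiest to work with, because the developer does not need to decide where in a heap the resource should be placed.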

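Committed resources are a good fit for large resources such as textures, or for resources whose size does not change. They can also be used to create large resources in an upload heap, such as dynamic vertex or index buffers, which is useful when rendering UI or when uploading constant buffer data that changes for every draw call.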

[Figure: Committed Resources]

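Placed Resources

A placed resource is explicitly placed in a heap at a specific offset within that heap. Before a placed resource can be created, the heap must first be created with the ID3D12Device::CreateHeap method. The resource is then created inside that heap with the ID3D12Device::CreatePlacedResource method.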

Placed-Resource.png

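Placed resources can provide better performance because the heap does not need to be allocated from global GPU memory for each resource, but they also demand more discipline from the graphics programmer. Placed resources open up more options for implementing different memory management strategies, but with great power comes great responsibility, and they need more care to use. The size of the heap that will hold the placed resources must be known in advance. Creating a heap that is larger than necessary is wasteful, because the only way to reclaim its GPU memory is to evict the heap or destroy it completely. Since a heap can only be evicted or destroyed as a whole, none of the resources placed in it may be referenced by a command list that is still executing on the GPU at that time.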
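
As a rough illustration of the bookkeeping this implies, the sketch below computes offsets for placed resources inside a single heap. It makes no D3D12 calls: HeapLayout, AlignUp, kNoSpace and kPlacementAlignment are hypothetical names introduced here, and the 64 KB alignment is assumed to match D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT. In a real application the sizes and alignments would come from ID3D12Device::GetResourceAllocationInfo.

```cpp
#include <cassert>
#include <cstdint>

// 64 KB, assumed to match D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT.
constexpr uint64_t kPlacementAlignment = 64 * 1024;

// Returned by HeapLayout::Place when the resource does not fit.
constexpr uint64_t kNoSpace = UINT64_MAX;

// Round value up to the next multiple of alignment (must be a power of two).
constexpr uint64_t AlignUp(uint64_t value, uint64_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: hands out increasing offsets inside one heap.
class HeapLayout
{
public:
    explicit HeapLayout(uint64_t heapSize) : m_HeapSize(heapSize) {}

    // Returns the heap offset for a resource of the given size in bytes,
    // or kNoSpace if the remaining heap space is too small.
    uint64_t Place(uint64_t resourceSize)
    {
        assert(resourceSize > 0);
        const uint64_t offset = AlignUp(m_NextOffset, kPlacementAlignment);
        if (offset > m_HeapSize || resourceSize > m_HeapSize - offset)
            return kNoSpace;
        m_NextOffset = offset + resourceSize;
        return offset;
    }

private:
    uint64_t m_HeapSize;       // Total heap size in bytes, fixed up front.
    uint64_t m_NextOffset = 0; // First byte not yet used by any resource.
};
```

Because the heap size is fixed when the HeapLayout is constructed, a request that does not fit simply fails, which mirrors the requirement that the heap size be known in advance.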

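Depending on the GPU architecture, the types of resources that can be created in a particular heap may be limited. For example, buffer resources (vertex buffers, index buffers, constant buffers, structured buffers, and so on) can only be placed in a heap created with the ALLOW_ONLY_BUFFERS heap flag. Render target and depth/stencil resources can only be placed in a heap created with the ALLOW_ONLY_RT_DS_TEXTURES heap flag, and all other textures can only be placed in a heap created with the ALLOW_ONLY_NON_RT_DS_TEXTURES heap flag. Adapters that support heap tier 2 can create heaps with the ALLOW_ALL_BUFFERS_AND_TEXTURES heap flag, which allows any type of resource to be placed in the heap. Since the heap tier depends on the GPU architecture, most applications will probably be written assuming only heap tier 1 support.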
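
The rules above can be summarized as a small lookup. This is only a sketch: the HeapTier, ResourceKind and HeapFlag enums and the ChooseHeapFlag function are hypothetical stand-ins defined here, named after the D3D12_HEAP_FLAG_* values from the text, and they do not carry the real flag values.

```cpp
#include <cassert>

enum class HeapTier { Tier1, Tier2 };
enum class ResourceKind { Buffer, RenderTargetOrDepthStencil, NonRenderTargetTexture };

// Stand-ins named after the heap flags discussed in the text (not real values).
enum class HeapFlag
{
    AllowOnlyBuffers,
    AllowOnlyRtDsTextures,
    AllowOnlyNonRtDsTextures,
    AllowAllBuffersAndTextures
};

// Pick the heap flag for a heap that should hold the given kind of resource.
HeapFlag ChooseHeapFlag(HeapTier tier, ResourceKind kind)
{
    // Tier 2 adapters may mix every resource type in one heap.
    if (tier == HeapTier::Tier2)
        return HeapFlag::AllowAllBuffersAndTextures;

    // Tier 1 adapters need one heap per resource category.
    switch (kind)
    {
    case ResourceKind::Buffer:                     return HeapFlag::AllowOnlyBuffers;
    case ResourceKind::RenderTargetOrDepthStencil: return HeapFlag::AllowOnlyRtDsTextures;
    case ResourceKind::NonRenderTargetTexture:     return HeapFlag::AllowOnlyNonRtDsTextures;
    }
    assert(false && "unknown resource kind");
    return HeapFlag::AllowOnlyBuffers;
}
```

An application that assumes only heap tier 1 would always take the switch branch and keep three separate heaps.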

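Multiple placed resources can be aliased in the same heap as long as they do not use the same region of the heap at the same time. Aliasing can help reduce oversubscription of GPU memory, because the size of the heap can be limited to the size of the largest resource that will be placed in the heap at any moment in time. Aliased resources can be swapped using a resource aliasing barrier.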
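
To make the memory saving concrete, here is a small sketch that compares the heap size needed with and without aliasing. It assumes the simplest case, where every resource aliases the same region and only one of them is in use at a time, and it ignores alignment. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Heap size when every resource gets its own region: the sum of all sizes.
uint64_t HeapSizeWithoutAliasing(const std::vector<uint64_t>& sizes)
{
    return std::accumulate(sizes.begin(), sizes.end(), uint64_t{0});
}

// Heap size when all resources alias one region and are never used at the
// same time: the size of the largest resource.
uint64_t HeapSizeWithAliasing(const std::vector<uint64_t>& sizes)
{
    return sizes.empty() ? 0 : *std::max_element(sizes.begin(), sizes.end());
}

// Bytes saved by aliasing under the assumptions above.
uint64_t BytesSavedByAliasing(const std::vector<uint64_t>& sizes)
{
    const uint64_t separate = HeapSizeWithoutAliasing(sizes);
    const uint64_t aliased = HeapSizeWithAliasing(sizes);
    assert(aliased <= separate);
    return separate - aliased;
}
```

For three resources of 4, 16 and 8 MB, the aliased heap only needs 16 MB instead of 28 MB.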

Aliasing-Placed-Resources.png

Reserved Resources

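A reserved resource is created with the ID3D12Device::CreateReservedResource method, without specifying a heap to place it in. Before a reserved resource can be used, it must be mapped to a heap using the ID3D12CommandQueue::UpdateTileMappings method. Portions of a reserved resource can be mapped to one or more heaps residing in physical GPU memory, so a reserved resource can be larger than any single heap.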

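With reserved resources, a very large volume texture can be created using virtual memory, while only the resident regions of the volume texture need to be mapped to physical memory. This makes it possible to implement techniques such as sparse voxel octrees without exceeding the GPU memory budget.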
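
The difference between virtual and physical size can be sketched with a simple residency table. This is a toy model and not the D3D12 API: TileResidency and TileCount are hypothetical names, and the 64 KB tile size is assumed to match D3D12_TILED_RESOURCE_TILE_SIZE_IN_BYTES.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 64 KB, assumed to match D3D12_TILED_RESOURCE_TILE_SIZE_IN_BYTES.
constexpr uint64_t kTileSize = 64 * 1024;

// Number of tiles needed to cover a resource of the given size.
constexpr uint64_t TileCount(uint64_t resourceBytes)
{
    return (resourceBytes + kTileSize - 1) / kTileSize;
}

// Hypothetical model of which tiles of a reserved resource are mapped.
class TileResidency
{
public:
    explicit TileResidency(uint64_t resourceBytes)
        : m_Mapped(static_cast<std::size_t>(TileCount(resourceBytes)), false) {}

    void Map(std::size_t tile)   { assert(tile < m_Mapped.size()); m_Mapped[tile] = true; }
    void Unmap(std::size_t tile) { assert(tile < m_Mapped.size()); m_Mapped[tile] = false; }

    // Size of the virtual address range reserved for the resource.
    uint64_t VirtualBytes() const { return m_Mapped.size() * kTileSize; }

    // Physical memory actually backing the resource.
    uint64_t ResidentBytes() const
    {
        uint64_t bytes = 0;
        for (bool mapped : m_Mapped)
            if (mapped) bytes += kTileSize;
        return bytes;
    }

private:
    std::vector<bool> m_Mapped; // One flag per tile.
};
```

A 1 GiB resource reserves 16384 tiles of virtual space, but mapping three of them consumes only 192 KB of physical memory.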

[Figure: Reserved Resources]

Pipeline State Object

The pipeline state object (PSO) contains most of the state needed to configure the rendering (or compute) pipeline. It includes the following:

  • Shader bytecode (vertex, pixel, domain, hull, and geometry shaders)
  • Vertex format input layout
  • Primitive topology type (point, line, triangle, or patch)
  • Blend state
  • Rasterizer state
  • Depth-stencil state
  • Number of render targets and render target formats
  • Depth-stencil format
  • Multisample description
  • Stream output buffer description
  • Root signature

 

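The pipeline state object contains a lot of state. If any of it needs to change between draw calls, for example to use a different pixel shader or blend state, then a new pipeline state object is needed. Even so, a few additional parameters must be set outside of the pipeline state object: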

  • Vertex and Index buffers
  • Stream output buffer
  • Render targets
  • Descriptor heaps
  • Shader parameters (constant buffers, read-write buffers, and read-write textures)
  • Viewports
  • Scissor rectangles
  • Constant blend factor
  • Stencil reference value
  • Primitive topology and adjacency information

The pipeline state object can optionally be specified for a graphics command list when the command list is reset using the ID3D12GraphicsCommandList::Reset method but it can also be changed for the command list at any time using the ID3D12GraphicsCommandList::SetPipelineState method.
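
Since every distinct combination of state needs its own pipeline state object, applications commonly keep the objects they have already created in a cache. The sketch below shows that idea under simplifying assumptions: PipelineKey and PipelineCache are hypothetical, the key holds only two pieces of state, and an integer id stands in for the ID3D12PipelineState object that a real cache would create and store.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>

// Hypothetical subset of the state baked into a pipeline state object.
struct PipelineKey
{
    std::string pixelShader; // Name of the pixel shader.
    bool blendEnabled;       // Whether alpha blending is on.
};

// Ordering so that PipelineKey can be used as a std::map key.
bool operator<(const PipelineKey& a, const PipelineKey& b)
{
    return std::tie(a.pixelShader, a.blendEnabled) <
           std::tie(b.pixelShader, b.blendEnabled);
}

class PipelineCache
{
public:
    // Returns the id of the pipeline for this state, creating it on first use.
    int GetOrCreate(const PipelineKey& key)
    {
        assert(!key.pixelShader.empty());
        auto it = m_Pipelines.find(key);
        if (it != m_Pipelines.end())
            return it->second;
        const int id = static_cast<int>(m_Pipelines.size());
        m_Pipelines.emplace(key, id);
        return id;
    }

    std::size_t Count() const { return m_Pipelines.size(); }

private:
    std::map<PipelineKey, int> m_Pipelines; // State combination -> pipeline id.
};
```

Changing only the blend state yields a second entry in the cache, while asking for the original state again returns the pipeline that was created first.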

Root Signatures

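A root signature is similar to a C++ function signature: it defines the parameters that are passed to the shader pipeline. The values bound to the pipeline are the arguments, and they can change without changing the root signature itself. The root parameters in the root signature define not only the types of the parameters expected by the shader, but also the shader registers and register spaces to which the arguments are bound.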

Shader Register & Register Spaces

Shader parameters must be bound to a register. For example, constant buffers are bound to b registers (b0 – bN), shader resource views (textures and non-constant buffer types) are bound to t registers (t0 – tN), unordered access views (writeable textures and buffer types) are bound to u registers (u0 – uN), and texture samplers are bound to s registers (s0 – sN), where N is the maximum number of shader registers. Shader Model 5.1 removes the limit on the number of shader registers that can be used. I think it is limited to \(2^{32}-1\) but I haven’t tested this upper limit.

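In previous versions of DirectX, different resources could be bound to the same register slot as long as they were used in different shader stages of the rendering pipeline. For example, one constant buffer could be bound to register b0 in the vertex shader and a different constant buffer could be bound to b0 in the pixel shader without causing a conflict. With Shader Model 5.1 in DirectX 12, resources can share a register slot even without being in different shader stages, as long as they are in different register spaces.

Registers-and-Spaces.png

Prior to Shader Model 5.1, resource registers could overlap across shader stages (left). Shader Model 5.1 introduces register spaces which can be used to overlap register slots (right).

It is important for the graphics programmer to understand the shader register and register spaces overlapping rules when porting legacy shaders to DirectX 12.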
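
The overlap rule for register spaces can be expressed as a small check. This sketch follows the simplified rule described here, where two bindings clash only if register type, slot and space are all equal, and it deliberately ignores shader-stage visibility. RegisterType, Binding and HasRegisterConflict are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <tuple>
#include <vector>

// b, t, u and s registers (CBV, SRV, UAV and sampler).
enum class RegisterType { B, T, U, S };

struct Binding
{
    RegisterType type;
    unsigned slot;  // The N in bN, tN, uN or sN.
    unsigned space; // The register space, e.g. space0.
};

// True if two bindings use the same register type, slot and space.
bool HasRegisterConflict(const std::vector<Binding>& bindings)
{
    std::set<std::tuple<RegisterType, unsigned, unsigned>> seen;
    for (const Binding& b : bindings)
    {
        // insert().second is false when the same triple was already present.
        if (!seen.insert(std::make_tuple(b.type, b.slot, b.space)).second)
            return true;
    }
    assert(seen.size() == bindings.size());
    return false;
}
```

Two constant buffers at b0 are accepted when one lives in space0 and the other in space1, but are reported as a conflict when both use space0.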

 

 

Root Signature Parameters

一個根簽名可以包含很多參數,這些參數可以是如下種類:

32-BIT CONSTANTS

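By using 32-bit constants, constant buffer data can be passed to the shader without creating a constant buffer resource. Constant data stored in the root signature does not support dynamic indexing. For example, the following constant buffer definition can be mapped to 32-bit constants stored in the root signature: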

cbuffer TransformsCB : register(b0, space0)
{
    matrix Model;
    matrix View;
    matrix Projection;
}

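Each root constant in the root signature costs 1 DWORD (32 bits). An array, however, cannot be mapped to root constants, so the following definition must be accessed through an inline descriptor or a descriptor heap instead: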

cbuffer TransformsCB : register(b0, space0)
{
    matrix MVPMatrices[3];
}
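
To get a feel for how quickly root constants use up root signature space, the sketch below counts the DWORDs needed by a C++ mirror of the first TransformsCB definition above (the one with three separate matrices). Matrix4x4, RootConstantCount and RemainingRootDwords are hypothetical helpers, and the struct layout is an assumption that each HLSL matrix is sixteen 32-bit floats. The 64 DWORD limit is the root signature size limit.

```cpp
#include <cassert>
#include <cstddef>

// CPU-side mirror of an HLSL matrix: sixteen 32-bit floats (assumed layout).
struct Matrix4x4 { float m[16]; };

// CPU-side mirror of the three-matrix TransformsCB constant buffer.
struct TransformsCB
{
    Matrix4x4 Model;
    Matrix4x4 View;
    Matrix4x4 Projection;
};

// Root signatures are limited to 64 DWORDs.
constexpr std::size_t kMaxRootSignatureDwords = 64;

// Number of 32-bit root constants needed to hold a value of type T.
template <typename T>
constexpr std::size_t RootConstantCount()
{
    static_assert(sizeof(T) % 4 == 0, "root constants are whole 32-bit values");
    return sizeof(T) / 4;
}

// DWORDs left in the root signature after `used` DWORDs are taken.
std::size_t RemainingRootDwords(std::size_t used)
{
    assert(used <= kMaxRootSignatureDwords);
    return kMaxRootSignatureDwords - used;
}
```

Three matrices already take 48 DWORDs, which leaves only 16 for every other root parameter. A single matrix, at 16 DWORDs, is a much more comfortable fit.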

INLINE DESCRIPTORS

Descriptors can be placed directly in the root signature without requiring a descriptor heap [6]. Only constant buffers (CBV) and buffer resources (SRV, UAV) containing 32-bit (FLOAT, UINT, or SINT) components can be accessed using inline descriptors in the root signature. Inline UAV descriptors for buffer resources cannot contain counters (for example, if a RWStructuredBuffer contains a counter resource, it may not be accessed through an inline descriptor in the root signature). Texture resources cannot be referenced using inline descriptors in the root signature and must be placed in a descriptor heap and referenced through a descriptor table.

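Unlike root constants, constant buffers that contain arrays can be accessed through an inline descriptor in the root signature. Each inline descriptor costs 2 DWORDs (64 bits). For example, the following constant buffer can be referenced through an inline descriptor: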

cbuffer SceneData : register(b0, space0)
{
   uint foo;
   float bar[2];
   int moo;
};

DESCRIPTOR TABLES

A descriptor table defines several descriptor ranges that are stored consecutively in a GPU-visible descriptor heap.

Descriptor-Tables.png

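The image above shows a root signature with a single descriptor table parameter (A). The descriptor table contains three descriptor ranges (B): 3 Constant Buffer Views (CBV), 4 Shader Resource Views (SRV), and 2 Unordered Access Views (UAV). CBVs, SRVs and UAVs can be referenced by the same descriptor table because all three types of descriptor can be stored in the same kind of descriptor heap. The GPU-visible descriptors (C) must be stored consecutively in a GPU-visible descriptor heap, but the resources they refer to (D) do not need to be contiguous and may even live in different heaps.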

Each descriptor table in the root signature costs 1 DWORD (32-bits) [5].
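
Because the ranges of a descriptor table are laid out back to back, the position of each range follows from the sizes of the ranges before it. The sketch below works this out for the 3 CBV, 4 SRV and 2 UAV example. RangeOffsets and TotalDescriptors are hypothetical helpers; the layout they compute is meant to mirror what appending each range directly after the previous one produces.

```cpp
#include <cassert>
#include <vector>

// Offset of each range from the start of the table, measured in descriptors,
// when every range is appended directly after the previous one.
std::vector<unsigned> RangeOffsets(const std::vector<unsigned>& rangeCounts)
{
    std::vector<unsigned> offsets;
    offsets.reserve(rangeCounts.size());
    unsigned next = 0;
    for (unsigned count : rangeCounts)
    {
        assert(count > 0); // An empty range would not describe anything.
        offsets.push_back(next);
        next += count;
    }
    return offsets;
}

// Number of consecutive descriptors the table occupies in the heap.
unsigned TotalDescriptors(const std::vector<unsigned>& rangeCounts)
{
    unsigned total = 0;
    for (unsigned count : rangeCounts)
        total += count;
    return total;
}
```

For the example, the CBV range starts at descriptor 0, the SRV range at 3 and the UAV range at 7, so the table needs 9 consecutive descriptors in the GPU-visible heap while still costing a single DWORD in the root signature.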

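STATIC SAMPLERS

A texture sampler describes how a texture is sampled. Samplers can be defined directly in the root signature without requiring a descriptor heap, using the D3D12_STATIC_SAMPLER_DESC structure. Static samplers do not use any space in the root signature and do not count against its size limit.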

Root Signature Constraints

Root signatures are limited to 64 DWORDs (2048 bits) [5]:

  • 32-bit constants each cost 1 DWORD
  • Inline descriptors each cost 2 DWORDs
  • Descriptor tables each cost 1 DWORD
  • Static samplers each cost 0 DWORDs

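The developer should balance these costs against performance. Root parameters that change frequently should appear first in the root signature, and those that change rarely should appear towards the end. Since 32-bit constants and inline descriptors have better performance in terms of level of indirection, they should be favored over using descriptor tables as long as the size of the root signature does not become too large.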
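
The cost rules above are easy to check in code before a root signature is created. The following is a minimal sketch with hypothetical types (RootParameterKind, RootParameter) and helper functions; it only adds up DWORDs and does not build a real root signature.

```cpp
#include <cassert>
#include <vector>

enum class RootParameterKind { Constants32Bit, InlineDescriptor, DescriptorTable };

struct RootParameter
{
    RootParameterKind kind;
    unsigned num32BitValues; // Only used for Constants32Bit.
};

// Cost of one root parameter in DWORDs.
unsigned CostInDwords(const RootParameter& parameter)
{
    switch (parameter.kind)
    {
    case RootParameterKind::Constants32Bit:
        assert(parameter.num32BitValues > 0);
        return parameter.num32BitValues; // 1 DWORD per 32-bit constant.
    case RootParameterKind::InlineDescriptor:
        return 2;
    case RootParameterKind::DescriptorTable:
        return 1;
    }
    return 0;
}

// Total cost of all root parameters; static samplers are free and not counted.
unsigned RootSignatureCost(const std::vector<RootParameter>& parameters)
{
    unsigned total = 0;
    for (const RootParameter& parameter : parameters)
        total += CostInDwords(parameter);
    return total;
}

// True if the parameters fit within the 64 DWORD limit.
bool FitsInRootSignature(const std::vector<RootParameter>& parameters)
{
    return RootSignatureCost(parameters) <= 64;
}
```

For example, 48 root constants plus one inline descriptor and one descriptor table cost 51 DWORDs and fit, but adding another 16 root constants brings the total to 67 and exceeds the limit.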

DirectX 12 Demo

The previous lesson showed how to initialize a DirectX 12 application without using any C++ classes. Some of the source code from the first lesson was refactored in order to simplify the source code for this and future lessons. There are four primary classes that are used for this lesson:

  1. Application
  2. Window
  3. CommandQueue
  4. Game

The Application class is responsible for initializing application specific data such as the DirectX 12 device and command queues. The Application class is also responsible for creating the Window instances and it is also the owner of the Window instances (Window instances can only be created and destroyed using the Application class). The Application class also exposes a Run method which is used to run the game and execute the message loop. The Quit method is used to quit the running application.

The Window class creates the swap chain which contains the final rendered image that will be presented to the screen. The Window class also contains functions to resize the window and toggle vsync, and fullscreen state.

The source code for the Application and the Window classes is not discussed in this lesson. You are encouraged to go back to the previous lesson if you are not familiar with the functionality of these classes. You may also refer to the source code for this lesson on GitHub. A link to the source code for this lesson is also provided at the end of the tutorial.

The CommandQueue and the Game class on the other hand may not be immediately clear and therefore will be discussed in greater detail in this lesson.

If you would prefer to skip the discussion on refactoring of the CommandQueue and Game classes then you can continue directly to the section about Shaders. Be aware that you may see code later in the lesson that you are not familiar with if you skip these sections.

The Command Queue Class

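The CommandQueue class wraps the ID3D12CommandQueue interface together with the synchronization primitives used to synchronize the CPU with the GPU. It must support the following operations: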

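  • Get a command list that can be used to record draw commands
  • Execute a command list on the command queue
  • Signal a fence on the command queue
  • Check whether a fence has reached a specific value
  • Wait for the fence to reach that value if it has not yet done so
  • Flush any commands that are pending on the command queue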

CommandQueue-1.png
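
The fence operations in this list can be modelled on the CPU without any GPU at all. The sketch below is a simulation under stated assumptions and not the lesson's CommandQueue implementation: SimulatedQueue is hypothetical, GpuAdvance stands in for the GPU finishing work, and waiting simply lets the simulated GPU make progress instead of blocking on an event.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical CPU-only model of a command queue and its fence.
class SimulatedQueue
{
public:
    // Signal the fence: returns the value that marks all work queued so far.
    uint64_t Signal() { return ++m_FenceValue; }

    // Simulate the GPU completing up to `steps` signaled fence values.
    void GpuAdvance(uint64_t steps)
    {
        m_Completed = std::min(m_Completed + steps, m_FenceValue);
    }

    // Has the GPU reached the given fence value?
    bool IsFenceComplete(uint64_t fenceValue) const
    {
        return m_Completed >= fenceValue;
    }

    // Wait until the fence value is reached. In this model, waiting lets the
    // simulated GPU progress one step at a time.
    void WaitForFenceValue(uint64_t fenceValue)
    {
        assert(fenceValue <= m_FenceValue); // Never wait on an unsignaled value.
        while (!IsFenceComplete(fenceValue))
            GpuAdvance(1);
    }

    // Flush: signal, then wait for that signal to be reached.
    void Flush() { WaitForFenceValue(Signal()); }

private:
    uint64_t m_FenceValue = 0; // Last value signaled by the CPU.
    uint64_t m_Completed = 0;  // Last value reached by the simulated GPU.
};
```

The key property is that fence values only ever increase, so checking whether a piece of work has finished reduces to comparing two integers.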

/**
 * Wrapper class for a ID3D12CommandQueue.
 */
#pragma once

#include <d3d12.h>  // For ID3D12CommandQueue, ID3D12Device2, and ID3D12Fence
#include <wrl.h>    // For Microsoft::WRL::ComPtr
#include <cstdint>  // For uint64_t
#include <queue>    // For std::queue

 

COMMANDQUEUE CLASS DEFINITION

class CommandQueue
{
public:
    CommandQueue(Microsoft::WRL::ComPtr<ID3D12Device2> device, D3D12_COMMAND_LIST_TYPE type);
    virtual ~CommandQueue();
    // Get an available command list from the command queue.
    Microsoft::WRL::ComPtr<ID3D12GraphicsCommandList2> GetCommandList();
    // Execute a command list.
    // Returns the fence value to wait for for this command list.
    uint64_t ExecuteCommandList(Microsoft::WRL::ComPtr<ID3D12GraphicsCommandList2> commandList);
    uint64_t Signal();
    bool IsFenceComplete(uint64_t fenceValue);
    void WaitForFenceValue(uint64_t fenceValue);
    void Flush();
    Microsoft::WRL::ComPtr<ID3D12CommandQueue> GetD3D12CommandQueue() const;

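The GetCommandList method returns a command list that can be used to issue draw commands. The returned command list can be used immediately: there is no need to reset it or to create a command allocator for it. Once commands have been recorded into the command list, the ExecuteCommandList method executes them on the command queue. It returns the fence value that can be used to check whether the commands in the command list have finished executing on the command queue. The GetD3D12CommandQueue method returns the underlying ID3D12CommandQueue interface.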

protected:
    Microsoft::WRL::ComPtr<ID3D12CommandAllocator> CreateCommandAllocator();
    Microsoft::WRL::ComPtr<ID3D12GraphicsCommandList2> CreateCommandList(Microsoft::WRL::ComPtr<ID3D12CommandAllocator> allocator);
private:

    // Keep track of command allocators that are "in-flight"
    struct CommandAllocatorEntry
    {
        uint64_t fenceValue;
        Microsoft::WRL::ComPtr<ID3D12CommandAllocator> commandAllocator;
    };
    using CommandAllocatorQueue = std::queue<CommandAllocatorEntry>;
    using CommandListQueue = std::queue< Microsoft::WRL::ComPtr<ID3D12GraphicsCommandList2> >;
    D3D12_COMMAND_LIST_TYPE                     m_CommandListType;
    Microsoft::WRL::ComPtr<ID3D12Device2>       m_d3d12Device;
    Microsoft::WRL::ComPtr<ID3D12CommandQueue>  m_d3d12CommandQueue;
    Microsoft::WRL::ComPtr<ID3D12Fence>         m_d3d12Fence;
    HANDLE                                      m_FenceEvent;
    uint64_t                                    m_FenceValue;
    CommandAllocatorQueue                       m_CommandAllocatorQueue;
    CommandListQueue                            m_CommandListQueue;
};

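The CommandAllocatorEntry structure associates a fence value with a command allocator. As mentioned in the previous lesson, a command list can be reused immediately after it has been executed on the command queue, but a command allocator cannot be reused until the commands stored in it have finished executing on the command queue. To check whether those commands have finished, a fence value is signaled on the command queue and stored together with the allocator. CommandAllocatorQueue is a std::queue that holds the command allocators that are in flight. Similarly, CommandListQueue is a std::queue that holds the command lists that are available for reuse.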
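
The reuse rule for command allocators can be sketched without D3D12 as follows. This is an illustration only, not the lesson's implementation: AllocatorEntry and AllocatorPool are hypothetical, integer ids stand in for ID3D12CommandAllocator objects, and the completed fence value is passed in by the caller instead of being read from a fence.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Pairs an allocator with the fence value that marks the end of its commands.
struct AllocatorEntry
{
    uint64_t fenceValue;
    int allocatorId; // Stand-in for a command allocator object.
};

class AllocatorPool
{
public:
    // Reuse the oldest in-flight allocator if the GPU has passed its fence
    // value; otherwise create a new allocator (a new id in this model).
    int Acquire(uint64_t completedFenceValue)
    {
        if (!m_InFlight.empty() &&
            m_InFlight.front().fenceValue <= completedFenceValue)
        {
            const int id = m_InFlight.front().allocatorId;
            m_InFlight.pop();
            return id;
        }
        return m_CreatedCount++;
    }

    // Record that the allocator's commands finish at the given fence value.
    void Release(int allocatorId, uint64_t fenceValue)
    {
        // Fence values grow over time, so the front of the queue is the oldest.
        assert(m_InFlight.empty() || m_InFlight.back().fenceValue <= fenceValue);
        const AllocatorEntry entry = {fenceValue, allocatorId};
        m_InFlight.push(entry);
    }

    int CreatedCount() const { return m_CreatedCount; }

private:
    std::queue<AllocatorEntry> m_InFlight; // Allocators still used by the GPU.
    int m_CreatedCount = 0;                // Allocators created so far.
};
```

Because fence values only increase, checking the front of the queue is enough: if the oldest allocator is still in use, every later one is too.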

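The m_d3d12Device member variable stores a pointer to the ID3D12Device2 interface, which is used to create the command queue, command lists, and command allocators.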
