Inside Geometry Instancing (Part 2)

3.3.3 Vertex Constants Instancing

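In the vertex constants instancing method, we use vertex constants to store the instance attributes. Vertex constants batches are very fast in terms of rendering performance, and they support instances that move around, but these features come at the cost of flexibility.

The main limitations of this method are:

- The number of instances per batch is limited by the number of constants used; usually no more than 50 to 100 instances fit in a single call. This is still enough to reduce the CPU load spent in draw calls.

- No skinning; the vertex constants are all used to store instance attributes.

- Hardware support for vertex shaders is required.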

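First, prepare a static vertex buffer (and an index buffer) that stores multiple copies of the same geometry packet. Each copy is stored in model space and corresponds to one instance in the batch.

The original vertex format must be extended with an integer index per vertex. This value is constant across all vertices of one instance and indicates which instance a given copy of the geometry packet belongs to. This is similar to palette skinning, where each vertex carries an index to the one or more bones that affect it.

The updated vertex format is as follows: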

struct InstanceVertex
{
    D3DXVECTOR3 mPosition;
    // other properties...
    WORD mInstanceIndex[4]; // Direct3D requires SHORT4
};

After all instance data has been added to the geometry batch, the Commit() method prepares the vertex buffer with the correct layout.
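The following is a minimal CPU-side sketch of what such a Commit() step might do, assuming the layout described above. The `InstanceVertex` here is a simplified stand-in (position plus index only), and `BuildBatchVertices` and `BuildBatchIndices` are hypothetical helper names, not part of the original article. The index is premultiplied by the number of constants per instance, as the vertex shader later in this section expects.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for the article's InstanceVertex (assumption).
struct InstanceVertex {
    float position[3];
    std::uint16_t instanceIndex[4];  // SHORT4; only element 0 is used here
};

constexpr unsigned kConstantsPerInstance = 5;  // 4 matrix rows + 1 color

// Replicates one geometry packet instanceCount times. Every copy stays in
// model space and stores the first constant register of its instance.
std::vector<InstanceVertex> BuildBatchVertices(
    const std::vector<InstanceVertex>& packet, unsigned instanceCount)
{
    assert(!packet.empty());
    std::vector<InstanceVertex> out;
    out.reserve(packet.size() * instanceCount);
    for (unsigned i = 0; i < instanceCount; ++i) {
        for (InstanceVertex v : packet) {
            v.instanceIndex[0] =
                static_cast<std::uint16_t>(i * kConstantsPerInstance);
            out.push_back(v);
        }
    }
    return out;
}

// The indices of copy i are the packet indices shifted by i * vertexCount.
std::vector<std::uint16_t> BuildBatchIndices(
    const std::vector<std::uint16_t>& packetIndices,
    unsigned vertexCount, unsigned instanceCount)
{
    std::vector<std::uint16_t> out;
    out.reserve(packetIndices.size() * instanceCount);
    for (unsigned i = 0; i < instanceCount; ++i)
        for (std::uint16_t idx : packetIndices)
            out.push_back(static_cast<std::uint16_t>(idx + i * vertexCount));
    return out;
}
```

In a real engine the resulting arrays would be copied once into the static vertex and index buffers.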

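The next step is to load the attributes of each instance to be rendered. We assume the attributes consist only of the model matrix, which describes the position and orientation of the instance, and the instance color.

GPUs that support the DirectX 9 family provide at most 256 vertex constants; we use 200 of them for instance attributes. In our example, each instance needs 4 constants for the model matrix and 1 constant for the color, so 5 constants per instance, which gives a maximum of 40 instances per batch.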
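The budget above is simple integer arithmetic. The sketch below restates it in code; the function and constant names are illustrative, and `BatchesNeeded` is an added helper showing how many draw calls a larger instance count would take under the same budget.

```cpp
#include <cassert>

// Number of instances that fit in one batch, given the vertex constants
// reserved for instancing and the constants each instance consumes.
constexpr unsigned MaxInstancesPerBatch(unsigned reservedConstants,
                                        unsigned constantsPerInstance)
{
    return reservedConstants / constantsPerInstance;
}

constexpr unsigned kReservedConstants = 200;  // out of 256, as in the text
constexpr unsigned kMatrixConstants = 4;      // one float4 per matrix row
constexpr unsigned kColorConstants = 1;
constexpr unsigned kConstantsPerInstance = kMatrixConstants + kColorConstants;

static_assert(
    MaxInstancesPerBatch(kReservedConstants, kConstantsPerInstance) == 40,
    "200 constants at 5 per instance give 40 instances per batch");

// Number of batches (draw calls) needed for instanceCount instances.
unsigned BatchesNeeded(unsigned instanceCount, unsigned maxPerBatch)
{
    assert(maxPerBatch > 0);
    return (instanceCount + maxPerBatch - 1) / maxPerBatch;
}
```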

The Update() method is shown below. The actual instancing is done in the vertex shader.

D3DXVECTOR4 instancesData[MAX_NUMBER_OF_CONSTANTS];
unsigned int count = 0;
for (unsigned int i = 0; i < GetInstancesCount(); ++i)
{
    // write the model matrix, one row per constant
    instancesData[count++] = *(D3DXVECTOR4*)&mInstances[i].mModelMatrix.m11;
    instancesData[count++] = *(D3DXVECTOR4*)&mInstances[i].mModelMatrix.m21;
    instancesData[count++] = *(D3DXVECTOR4*)&mInstances[i].mModelMatrix.m31;
    instancesData[count++] = *(D3DXVECTOR4*)&mInstances[i].mModelMatrix.m41;
    // write the instance color
    instancesData[count++] = ConvertColorToVec4(mInstances[i].mColor);
}
// upload all float4 constants with a single call
lpDevice->SetVertexShaderConstantF(INSTANCES_DATA_FIRST_CONSTANT, (const float*)instancesData, count);

The vertex shader follows:

// vertex input declaration
struct vsInput
{
    float4 position : POSITION;
    float3 normal : NORMAL;
    // other vertex data
    int4 instance_index : BLENDINDICES;
};

vsOutput VertexConstantsInstancingVS(in vsInput input)
{
    vsOutput output;
    // get the instance index; the index is premultiplied by 5 to take account of the number of constants used by each instance
    int instanceIndex = ((int[4])(input.instance_index))[0];
    // access each row of the instance model matrix
    float4 m0 = InstanceData[instanceIndex + 0];
    float4 m1 = InstanceData[instanceIndex + 1];
    float4 m2 = InstanceData[instanceIndex + 2];
    float4 m3 = InstanceData[instanceIndex + 3];
    // construct the model matrix
    float4x4 modelMatrix = {m0, m1, m2, m3};
    // get the instance color
    float4 instanceColor = InstanceData[instanceIndex + 4];
    // transform input position and normal to world space with the instance model matrix
    // (the normal only uses the upper 3x3 part of the matrix)
    float4 worldPosition = mul(input.position, modelMatrix);
    float3 worldNormal = mul(input.normal, (float3x3)modelMatrix);
    // output position, normal and color
    output.position = mul(worldPosition, ViewProjectionMatrix);
    output.normal = mul(worldNormal, (float3x3)ViewProjectionMatrix);
    output.color = instanceColor;
    // output other vertex data
    return output;
}

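The Render() method sets the view and projection matrices and submits all instances with a single DrawIndexedPrimitive() call.

In production code, the rotation part of the model transform can be stored as a quaternion, which saves 2 constants per instance and raises the maximum number of instances to around 70. The matrix is then reconstructed in the vertex shader, at the cost of more complex code and a longer execution time.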
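As an illustration of the reconstruction step, here is a minimal C++ sketch of the standard unit-quaternion to rotation-matrix conversion, written for the row-vector convention (v' = v * M) that the shaders in this article use. The type and function names are hypothetical, and the assumed layout of one constant each for quaternion, translation and color is an illustration, not something the article specifies. The same arithmetic would be carried out in the vertex shader.

```cpp
#include <cassert>
#include <cmath>

struct Quaternion { float x, y, z, w; };  // assumed to be unit length
struct Matrix3x3 { float m[3][3]; };      // row-vector convention: v' = v * M

// Assumed layout: quaternion + translation + color = 3 constants per instance.
static_assert(200 / 3 == 66, "about 70 instances fit in 200 constants");

// Builds the rotation matrix of a unit quaternion.
Matrix3x3 QuaternionToMatrix(const Quaternion& q)
{
    const float lenSq = q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w;
    assert(std::fabs(lenSq - 1.0f) < 1e-3f);  // precondition: normalized

    const float xx = q.x * q.x, yy = q.y * q.y, zz = q.z * q.z;
    const float xy = q.x * q.y, xz = q.x * q.z, yz = q.y * q.z;
    const float wx = q.w * q.x, wy = q.w * q.y, wz = q.w * q.z;

    Matrix3x3 r = {{
        {1.0f - 2.0f * (yy + zz), 2.0f * (xy + wz),        2.0f * (xz - wy)},
        {2.0f * (xy - wz),        1.0f - 2.0f * (xx + zz), 2.0f * (yz + wx)},
        {2.0f * (xz + wy),        2.0f * (yz - wx),        1.0f - 2.0f * (xx + yy)}
    }};
    return r;
}
```

For example, a 90 degree rotation about the z axis maps the x axis (row 0 of the matrix) onto the y axis.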

3.3.4 Batching with the Geometry Instancing API

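The last method is batching with the Geometry Instancing API, which was introduced in DirectX 9 and is fully implemented in hardware on GeForce 6 Series GPUs. As more and more hardware supports the Geometry Instancing API, this technique will become more attractive: it needs very little memory and very little CPU intervention. Its only drawback is that it can only handle instances of the same geometry packet.

DirectX 9 provides the following function to access the Geometry Instancing API: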

HRESULT SetStreamSourceFreq( UINT StreamNumber, UINT FrequencyParameter);

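StreamNumber is the index of the target stream, and FrequencyParameter indicates the number of instances each vertex contains. In practice FrequencyParameter combines a flag (D3DSTREAMSOURCE_INDEXEDDATA for the geometry stream, D3DSTREAMSOURCE_INSTANCEDATA for the per-instance stream) with a count.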

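We first create two vertex buffers: a static one that stores the single geometry packet to be instanced multiple times, and a dynamic one that stores the instance data. The two streams are shown in the figure below:

Commit() must make sure that all instances use the same geometry packet, and it copies the geometry into the static buffer.

Update() simply copies all instance attributes into the dynamic buffer. Although it resembles the Update() method of dynamic batching, it minimizes CPU intervention and graphics bus (AGP or PCI-E) bandwidth. In addition, we can allocate a vertex buffer large enough for all instance attributes without worrying about video memory consumption, because the attributes of one instance take only a fraction of the memory used by a whole geometry packet.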
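A minimal sketch of that copy step is shown below. `InstanceAttributes` and `WriteInstanceData` are hypothetical names, and the layout (a 4x4 matrix plus a packed 32-bit color) is an assumption chosen to match the attributes used in this article. In Direct3D 9 the destination pointer would typically come from locking the dynamic vertex buffer; here a plain byte array stands in for it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed per-instance layout for stream 1: model matrix plus packed color.
struct InstanceAttributes {
    float modelMatrix[16];
    std::uint32_t color;
};

static_assert(sizeof(InstanceAttributes) ==
                  16 * sizeof(float) + sizeof(std::uint32_t),
              "the layout is expected to have no padding");

// Copies all instance attributes into dst (for example a locked dynamic
// vertex buffer) and returns the number of bytes written.
std::size_t WriteInstanceData(const std::vector<InstanceAttributes>& instances,
                              void* dst, std::size_t dstBytes)
{
    const std::size_t bytes = instances.size() * sizeof(InstanceAttributes);
    assert(bytes <= dstBytes);  // the buffer must be large enough
    if (bytes != 0)
        std::memcpy(dst, instances.data(), bytes);
    return bytes;
}
```

Each instance costs 68 bytes in this layout, which is usually far less than the vertex data of the geometry packet it refers to.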

The Render() method sets up the two streams with the correct stream frequency, then calls DrawIndexedPrimitive() to render all instances of the batch. The code is as follows:

unsigned int instancesCount = GetInstancesCount();
// set up stream source frequency for the first stream to render instancesCount instances
// D3DSTREAMSOURCE_INDEXEDDATA tells Direct3D we'll use indexed geometry for instancing
lpDevice->SetStreamSourceFreq(0, D3DSTREAMSOURCE_INDEXEDDATA | instancesCount);
// set up first stream source with the vertex buffer containing geometry for the geometry packet
// (the last argument of SetStreamSource is the vertex stride in bytes)
lpDevice->SetStreamSource(0, mGeometryInstancingVB[0], 0, mGeometryPacketDecl);
// set up stream source frequency for the second stream; each set of instance attributes describes one instance to be rendered
lpDevice->SetStreamSourceFreq(1, D3DSTREAMSOURCE_INSTANCEDATA | 1);
// set up second stream source with the vertex buffer containing all instances' attributes
lpDevice->SetStreamSource(1, mGeometryInstancingVB[1], 0, mInstancesDataVertexDecl);

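The GPU virtually duplicates the vertices of the first stream and packs them together with the data of the second stream. The vertex shader receives each vertex with its position in model space, plus the instance attributes that provide the model matrix used to transform it to world space. The code is as follows: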

// vertex input declaration
struct vsInput
{
    // stream 0
    float4 position : POSITION;
    float3 normal : NORMAL;
    // stream 1
    float4 model_matrix0 : TEXCOORD0;
    float4 model_matrix1 : TEXCOORD1;
    float4 model_matrix2 : TEXCOORD2;
    float4 model_matrix3 : TEXCOORD3;
    float4 instance_color : COLOR0;
};

vsOutput geometryInstancingVS(in vsInput input)
{
    vsOutput output;
    // construct the model matrix
    float4x4 modelMatrix =
    {
        input.model_matrix0,
        input.model_matrix1,
        input.model_matrix2,
        input.model_matrix3
    };
    // transform input position and normal to world space with the instance model matrix
    // (the normal only uses the upper 3x3 part of the matrix)
    float4 worldPosition = mul(input.position, modelMatrix);
    float3 worldNormal = mul(input.normal, (float3x3)modelMatrix);
    // output position, normal, and color
    output.position = mul(worldPosition, ViewProjectionMatrix);
    output.normal = mul(worldNormal, (float3x3)ViewProjectionMatrix);
    output.color = input.instance_color;
    // output other vertex data...
    return output;
}
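To make the pairing of the two streams more concrete, the sketch below emulates it on the CPU. It is only a conceptual model of the behavior described above, not a description of how the hardware is implemented, and `EmulateInstancedFetch` is a hypothetical name. For each instance the index list of the geometry packet is replayed, and every vertex fetched from stream 0 is combined with the single stream 1 element of that instance.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns (vertexIndex, instanceIndex) pairs in the order in which a
// vertex shader would conceptually receive its combined inputs.
std::vector<std::pair<std::size_t, std::size_t>> EmulateInstancedFetch(
    const std::vector<std::size_t>& packetIndices,
    std::size_t vertexCount, std::size_t instanceCount)
{
    std::vector<std::pair<std::size_t, std::size_t>> fetches;
    fetches.reserve(packetIndices.size() * instanceCount);
    for (std::size_t instance = 0; instance < instanceCount; ++instance) {
        for (std::size_t index : packetIndices) {
            assert(index < vertexCount);            // index into stream 0
            fetches.emplace_back(index, instance);  // stream 1 element
        }
    }
    return fetches;
}
```

The geometry is stored only once, yet the shader sees one full copy of it per instance.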

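Because it minimizes CPU load and memory footprint, this technique renders large numbers of copies of the same geometry efficiently, which makes it an ideal solution for games. Its drawbacks are that it requires hardware support and that skinning is not easy to implement.

If skinning is needed, one option is to store the bone data of all instances in a texture and select the right bones for each instance in the shader, which requires the vertex texture fetch feature of Shader Model 3.0. The performance cost of vertex texture access with this technique is uncertain and should be measured beforehand.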

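3.4 Conclusion

This article described the concept of geometry instancing and presented four different techniques for rendering the same geometry many times efficiently. Each technique has advantages and disadvantages, and no single method solves every problem that can arise in a game scene. The choice should depend on the type of application and on the kind of objects being rendered.

Here are some suggested methods for typical scenes:

- For an indoor scene with many static instances of the same geometry that rarely move, static batching is the best choice.

- For an outdoor scene with many animated instances, such as a real-time strategy game with hundreds of soldiers, dynamic batching is probably the best choice.

- For an outdoor scene with lots of vegetation and trees, whose attributes often need to be modified (for example, to sway in the wind), as well as for particle systems, the Geometry Instancing API is probably the best choice.

The same application often uses two or more of these methods. In that case, hiding the concrete implementations behind an abstract geometry batch interface makes the engine more modular and easier to manage, and it greatly reduces the work of implementing geometry instancing across the whole application.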
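A minimal sketch of such an interface is given below. The method names Commit(), Update() and Render() follow the ones used in this article; everything else (the `Technique()` accessor, the two stub classes and `RenderFrame`) is hypothetical and only serves to show that the caller never needs to know which technique a batch uses.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Abstract interface hiding the concrete instancing technique.
class GeometryBatch {
public:
    virtual ~GeometryBatch() = default;
    virtual void Commit() = 0;  // build the buffers once
    virtual void Update() = 0;  // refresh per-frame instance data
    virtual void Render() = 0;  // submit the batch
    virtual std::string Technique() const = 0;
};

// Stub: a real implementation would fill a static vertex buffer in Commit().
class StaticBatch final : public GeometryBatch {
public:
    void Commit() override { committed_ = true; }
    void Update() override {}  // static data: nothing to refresh
    void Render() override { assert(committed_); /* draw call goes here */ }
    std::string Technique() const override { return "static batching"; }
private:
    bool committed_ = false;
};

// Stub: a real implementation would set up two streams in Render().
class InstancingApiBatch final : public GeometryBatch {
public:
    void Commit() override { committed_ = true; }
    void Update() override { /* copy instance attributes here */ }
    void Render() override { assert(committed_); /* draw call goes here */ }
    std::string Technique() const override { return "geometry instancing API"; }
private:
    bool committed_ = false;
};

// The renderer treats every batch the same way.
std::vector<std::string> RenderFrame(
    const std::vector<std::unique_ptr<GeometryBatch>>& batches)
{
    std::vector<std::string> submitted;
    for (const auto& batch : batches) {
        batch->Update();
        batch->Render();
        submitted.push_back(batch->Technique());
    }
    return submitted;
}
```

With this design, buildings could be placed in a `StaticBatch` and trees in an `InstancingApiBatch` while the render loop stays unchanged.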

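(In the figure, the static buildings use static batching, while the trees use the Geometry Instancing API.)

The complete chapter is available as a PDF download. For a complete demo, see the Instancing sample in the NVIDIA SDK, which can also be downloaded separately; the Instancing sample in the DirectX SDK is another useful reference.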
