An introduction to the High Level Shading Language with a simple example:
2-sided shader
(level: medium)


Contents

Introduction
The two-sided object problem
Vertex shader in assembly language
Integrating the vertex shader in assembly language
HLSL vertex shader
Integrating the HLSL vertex shader
HLSL pixel shader
Comparison
Sample program
References


Introduction

What is the High Level Shading Language (HLSL for short)? It is a programming language introduced by DirectX 9 and intended to replace the assembly language used since DirectX 8 to write increasingly complex vertex and pixel shaders (if you don't know what a vertex or pixel shader is, read the related DirectX documentation before coming back to this text). As its name implies, HLSL is a high-level language, similar to the C language in its principles but more limited, because it is dedicated solely to shader programming (for example, there is no concept of pointers).
[Note: HLSL and Cg are two equivalent languages, or more exactly the same language named differently by Microsoft and nVidia, who co-developed it.]

The goal of this article is not to describe this language, its instructions and its syntax exhaustively; rather, we are going to focus on its use and on the modifications required in the rendering code of a DirectX application. We will start from a real problem, develop a shader in assembly language to solve it, and finally rewrite this shader in HLSL and compare it with the assembly code.


The two-sided object problem

Two-sided objects are often used in video games to represent very thin objects: flags, curtains, water surfaces, sails, etc. They require fewer points and faces than the real volumes they represent, which is advantageous for memory use, rendering speed, and calculation time when these objects are animated. In 3D Studio Max, materials own a flag indicating whether the objects using them are two-sided or not.

The primary difference between a standard mesh and a 2-sided object lies in hidden face removal (or "backface culling"): for a standard mesh, a face is only visible if the camera watching it is located on the "good" side of this face, that is, the side the face's normal points to. On the contrary, the faces of a two-sided object are visible from both sides, which means only one side is needed to render the object in every situation (that is, from every viewpoint).

2 faces of a mesh (left) and of a 2-sided object (right), in a top view

Allowing a triangle to be drawn whether it is front facing or back facing is very simple; one only has to use the instruction:
pDevice->SetRenderState (D3DRS_CULLMODE, D3DCULL_NONE);
and the job is done. The result is not very satisfying though: the vertices' normals stay the same whichever side is viewed, which means lighting is wrong for the back side. Imagine a blue light L1 placed in front of the 2-sided object, and a red light L2 located behind it: the normals point toward L1, so the object will appear blue on both sides, and L2's contribution will equal zero. This is because the usual lighting formulas take into account the dot product between the vertex normal and the light's direction, and this dot product is negative for each vertex with regard to L2, which means by definition that L2 does not light any of these points.
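
In formula form, the diffuse contribution of a directional light, as computed by the shaders later in this article, is:

  diffuse = max(0, dot(N, -LightDir)) * MaterialDiffuse * LightDiffuse

With the normals pointing toward L1, dot(N, -LightDir2) is negative for every vertex, so the max() clamps L2's contribution to zero.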

top view of a surface facing L1

As shown schematically in the right picture, the back side's normals would need to be inverted for its lighting to be correct. Several methods are available to deal with this problem:

- creating a second mesh representing the back side of the 2-sided object. The vertices of this mesh have the same positions as those of the original object, the same mapping coordinates if there are any, and inverted normals for the lighting. The indices are also swapped for each face: if one face of the original object uses points A-B-C in this order, then the corresponding triangle in the new mesh references C-B-A to follow the rule chosen for backface culling (only the CW - clockwise - faces or CCW - counter-clockwise - faces are displayed).
Having two meshes to represent a single object is not very handy: their potential deformations and movements have to remain synchronized, intersection and collision tests are slowed down by the increased number of objects, etc.

- modifying the 2-sided object's mesh so that each of its faces has a corresponding back face. This solution is often used, because it does not require any special treatment in the rendering process, but it doubles the number of faces of the object. The number of points is not necessarily doubled: it is possible to use the same xyz and uv for both sides of the object, provided it is drawn in two steps (one for each side) and there are two different streams containing the normals and the inverted normals. The drawback of this method is the additional memory space needed by the object, and the requirement to use at least two streams to render it (to be able to change the normals alone).
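
For illustration, here is what such a two-stream vertex declaration could look like (a sketch, not taken from the sample program): stream 0 holds the shared positions and mapping coordinates, stream 1 the normals, and the vertex buffer bound to stream 1 is swapped for one containing the inverted normals before the second pass.

  D3DVERTEXELEMENT9 decl2Streams[] =
    {
      { 0,  0, D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0 },  // stream 0 : xyz
      { 0, 12, D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 0 },  // stream 0 : uv
      { 1,  0, D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_NORMAL,   0 },  // stream 1 : normal
      D3DDECL_END()
    };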

How could we solve this problem more elegantly? With the "fixed function pipeline" (defined and implemented by DirectX, and only modifiable within the limits fixed by the various render states), it seems nothing can be done. On the other hand, with a vertex shader, inverting the normals when needed is clearly no problem. However, it is not possible to know whether a given face is front facing or back facing, because the shader only lets us work on one vertex at a time; this means we have to render the object in two steps, one for the front faces and the other for the back faces. Between these steps we will change the backface culling test, which avoids having to swap the order of the indices of each face as explained previously.

The algorithm is then the following (a code sketch follows the list):
- initialize backface culling to CCW (counter-clockwise);
- render the mesh with a classic vertex shader;
- if the object is not two-sided, we're done with its rendering;
- otherwise, switch backface culling to CW (clockwise);
- render the mesh with a vertex shader that inverts the normals before the lighting calculation.
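
In code, and using the NFactor constant presented in the next section, this could look like the following sketch (RenderMesh and SetNFactor are hypothetical helpers standing for the DrawIndexedPrimitive call and the constant upload shown later in this article):

  pDevice->SetRenderState(D3DRS_CULLMODE, D3DCULL_CCW);
  SetNFactor(+1.f);                     // normals unchanged
  RenderMesh();                         // pass 1 : front faces

  if(bo2Sided)
    {
    pDevice->SetRenderState(D3DRS_CULLMODE, D3DCULL_CW);
    SetNFactor(-1.f);                   // normals inverted in the vertex shader
    RenderMesh();                       // pass 2 : back faces
    }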


Vertex shader in assembly language

Here is the code of a vertex shader computing the lighting of a mesh lit by two directional lights:

vs.1.1

//----------------------------------------------------------------------------------------
// vertex inputs
//----------------------------------------------------------------------------------------

#define iPos        v0                                      // vertex position
#define iNormal     v1                                      // vertex normal
#define iTex0       v2                                      // base texture coordinates

dcl_position        iPos
dcl_normal          iNormal
dcl_texcoord0       iTex0

//----------------------------------------------------------------------------------------
// constants
//----------------------------------------------------------------------------------------

def                 c0, 0, 0, 0, 0
#define Zero        c0                                      // c0       : 0;0;0;0

#define Matrix      c10                                     // c10-c13  : matrix
#define NFactor     c14                                     // c14      : normal factor (-1 or +1)
#define Ambient     c15                                     // c15      : global ambient * material ambient
#define MatDiff     c16                                     // c16      : material diffuse color
#define MatAlpha    c16.w

#define LightDiff1  c20                                     // c20      : light1 diffuse color
#define LightDir1   c21                                     // c21      : light1 dir in model space

#define LightDiff2  c30                                     // c30      : light2 diffuse color
#define LightDir2   c31                                     // c31      : light2 dir in model space

//----------------------------------------------------------------------------------------
// code
//----------------------------------------------------------------------------------------

m4x4                oPos, iPos, Matrix                      // transform position
mul                 r2, iNormal, NFactor                    // N or -N
mov                 oT0.xy, iTex0                           // copy tex coords

// directional light 1

dp3                 r0, r2, -LightDir1                      // N * -LightDir1
max                 r0, r0, Zero                            // clamp to [0;1]
mul                 r1, r0.x, MatDiff                       //   * material diffuse
mul                 r1, r1, LightDiff1                      //   * light1   diffuse

// directional light 2

dp3                 r0, r2, -LightDir2                      // N * -LightDir2
max                 r0, r0, Zero                            // clamp to [0;1]
mul                 r3, r0.x, MatDiff                       //   * material diffuse
mul                 r3, r3, LightDiff2                      //   * light2   diffuse

// final color

add                 r0, r1, r3
add                 oD0, r0, Ambient                        //   + ambient
mov                 oD0.w, MatAlpha                         // preserve alpha

Do not panic if you're not used to seeing this kind of thing: HLSL is here precisely to save you from having to learn or understand it. Let's review it quickly though, to get the overall idea:

- "vs.1.1" indicates that the code following it uses the features of vertex shaders 1.1, that is to say it can run on most current 3D cards supporting vertex shaders (GeForce2 and above, Radeon...).

- the "vertex inputs" block defines the format of the points entering the shader : position (xyz) in v0, normal in v1, then mapping coordinates in v2. This declaration is done by the "dcl_..." instructions, the "#define" are only here to give more readable names to v0, v1 and v2 (like in the C language). This format must of course match the one of the stream or streams used by the DrawIndexedPrimitive call invoking the shader.

- constants are values (each made of 4 floats) either defined in the shader itself (the case of c0, also named Zero) or passed to the shader by the main program. Here we have the matrix (taking 4 registers) that transforms vertices from object space to clip space, a multiplying coefficient applied to the normals, the ambient lighting of the scene, the diffuse color of the object's material, and finally the colors and directions of the two directional lights. These constants are not fixed for the whole application's lifetime: the term only means that they can't be modified inside the shader; the main program can change their values between calls to DrawIndexedPrimitive.

- the "code" block does the following operations : it transforms the point's coordinates to view space, inverts the normal if necessary, copies the mapping coordinates, calculates each light's contribution and finally determines the color of the vertex.

In the previous section, we saw that the presented algorithm uses two different shaders, depending on whether or not the normals need to be inverted. When you have already written a certain number of shaders, it is annoying to have to double this number and keep two different versions of each shader just to handle the special case of two-sided objects. This is where NFactor is useful: before the lighting calculations, the normal is multiplied by NFactor, which can take any value set before the call to DrawIndexedPrimitive. The interesting values are of course -1.f and 1.f, which invert the normals or leave them unmodified.

A note about the lights' directions: they are used in a dot product with the vertices' normals. These normals are defined in object space, so it is easier and faster to transform the lights' directions to object space once before the call to DrawIndexedPrimitive than to transform each normal in the vertex shader to the space where the lights are defined (in general this is world space, but the lights could be attached to other objects of the scene). Contrary to what happens when the fixed function pipeline is used, it is important to normalize these directions, otherwise the result of the dot product (used to obtain the angle - actually its cosine - between two vectors) will not mean anything.


Integrating the vertex shader in assembly language

For simplicity's sake I have not created a material class, etc., in the sample program: the vertex shader and the pixel shader (which we will see later), as well as the test object (a single quad) and the textures, are members of the DirectX 9 rendering class. So you will find them in the RendererDX9.h file. (Note: a pixel shader is not required to use the vertex shader we have just seen; each part of the fixed function pipeline can be replaced by a shader independently of the others.)

  // protected data

  protected:

    bool                      m_bo2Sided;

    // DX9 access

    LPDIRECT3DDEVICE9         m_pDevice;
    LPDIRECT3D9               m_pD3D;

    // IB/VB

    LPDIRECT3DINDEXBUFFER9    m_pIB;
    LPDIRECT3DVERTEXBUFFER9   m_pVB;

    // vertex shader

    LPDIRECT3DVERTEXSHADER9   m_pVertexShader;
    LPDIRECT3DVERTEXDECLARATION9 m_pVertexDeclaration;
    LPD3DXCONSTANTTABLE       m_pVertexConstants;

    LPDIRECT3DPIXELSHADER9    m_pPixelShader;
    LPD3DXCONSTANTTABLE       m_pPixelConstants;

    // textures

    LPDIRECT3DTEXTURE9        m_pTexFront;
    LPDIRECT3DTEXTURE9        m_pTexBack;

The vertex shader is created with the following lines:

  // vshader

  D3DVERTEXELEMENT9 decl[] =
    {
      { 0,  0, D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0 },
      { 0, 12, D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_NORMAL,   0 },
      { 0, 24, D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 0 },
        D3DDECL_END()
    };

  hrErr = m_pDevice->CreateVertexDeclaration(decl,&m_pVertexDeclaration);
  if(FAILED(hrErr))
    {
    MessageBox(NULL,"CreateVertexDeclaration failed","CRendererDX9::Create",MB_OK|MB_ICONEXCLAMATION);
    return false;
    }

  DWORD dwFlags = 0; 
  dwFlags |= D3DXSHADER_DEBUG;
  LPD3DXBUFFER pCode   = NULL;
  LPD3DXBUFFER pErrors = NULL;
  hrErr = D3DXAssembleShaderFromFile("dx9/vshader.vsh",NULL,NULL,dwFlags,&pCode,&pErrors);
  if(pErrors)
    {
    char* szErrors = (char*)pErrors->GetBufferPointer();
    pErrors->Release();
    }
  if(FAILED(hrErr)) 
    {
    MessageBox(NULL,"vertex shader creation failed","CRendererDX9::Create",MB_OK|MB_ICONEXCLAMATION);
    return false;
    }

  char* szCode = (char*)pCode->GetBufferPointer();
  hrErr = m_pDevice->CreateVertexShader((DWORD*)pCode->GetBufferPointer(),&m_pVertexShader);
  pCode->Release();
  if(FAILED(hrErr))
    {
    MessageBox(NULL,"CreateVertexShader failed","CRendererDX9::Create",MB_OK|MB_ICONEXCLAMATION);
    return false;
    }

This procedure is explained in the documentation of the DirectX SDK; refer to it for more details about the different parameters of the functions. Just notice that the first block is used to define the vertex format, and that the pErrors variable enables you to get some useful information (the error text and the line number) in case the compilation of the shader fails. You can also see that the shader is in a separate text file named vshader.vsh; you can include it in your cpp file instead, in the following way:

TCHAR szVShader[] = _T(""
"vs.1.1\n"
"\n"
"// vertex inputs\n"
"\n"
"#define iPos        v0                                      // vertex position\n"
"#define iNormal     v1                                      // vertex normal\n"
"#define iTex0       v2                                      // base texture coordinates\n"
"\n"
"dcl_position        iPos\n"
"dcl_normal          iNormal\n"
"dcl_texcoord0       iTex0\n"
"\n"
"// constants\n"
"\n"
[...etc...]
"mov                 oD0.w, MatAlpha                         // preserve alpha");

The shader's compilation is then performed with the function:

    hrErr = D3DXAssembleShader(szVShader,sizeof(szVShader),NULL,NULL,dwFlags,&pCode,&pErrors);

Once the shader is created, and before you can use it, you need to initialize its constants:

  m4Total = m4Proj*m4View*m4World;
  m4Total.Transpose();
  m_pDevice->SetVertexShaderConstantF(10,(float*)&m4Total,4);         // trf matrix

  CVect4D v4NFactor(1.f);
  m_pDevice->SetVertexShaderConstantF(14,(float*)&v4NFactor,1);       // normal factor

  CVect4D v4Ambient(0.25f,0.25f,0.25f,1.f);
  m_pDevice->SetVertexShaderConstantF(15,(float*)&v4Ambient,1);       // ambient color

  CVect4D v4MatDiffuse(1.f,1.f,1.f,1.f);
  m_pDevice->SetVertexShaderConstantF(16,(float*)&v4MatDiffuse,1);    // material diffuse

      // lights

  CVect4D v4LightDiffuse1(0.f,0.f,1.f,1.f);
  m_pDevice->SetVertexShaderConstantF(20,(float*)&v4LightDiffuse1,1); // light1 diffuse

  m4World.Invert();
  CVect4D v4LightDir1(0.f,0.f,-1.f,0.f);
  v4LightDir1 = m4World*v4LightDir1;
  v4LightDir1.Normalize1();
  m_pDevice->SetVertexShaderConstantF(21,(float*)&v4LightDir1,1);     // light1 direction

  CVect4D v4LightDiffuse2(1.f,0.f,0.f,1.f);
  m_pDevice->SetVertexShaderConstantF(30,(float*)&v4LightDiffuse2,1); // light2 diffuse

  CVect4D v4LightDir2(0.f,0.f,1.f,0.f);
  v4LightDir2 = m4World*v4LightDir2;
  v4LightDir2.Normalize1();
  m_pDevice->SetVertexShaderConstantF(31,(float*)&v4LightDir2,1);     // light2 direction

The first parameter of SetVertexShaderConstantF is the index of the constant to be modified (for example, 10 for the matrix, stored in c10), then comes the address of an array of floats, and finally the number of consecutive registers to initialize (4 for the matrix, which occupies registers c10 to c13 in the shader). Don't forget that each constant is made of 4 floats.

Everything is ready now, and the vertex shader can be used to draw objects:

  m_pDevice->SetVertexDeclaration(m_pVertexDeclaration);
  m_pDevice->SetVertexShader     (m_pVertexShader);
  m_pDevice->SetPixelShader      (m_pPixelShader);

  m_pDevice->SetStreamSource     (0,m_pVB,0,sizeof(VERTEX_SIMPLE));
  m_pDevice->SetIndices          (  m_pIB);
  m_pDevice->SetTexture          (TextureIndex,m_pTexFront);

  D3DPRIMITIVETYPE Type = D3DPT_TRIANGLELIST;

  HRESULT hrErr = m_pDevice->DrawIndexedPrimitive(Type,0,0,4,0,2);    // 4 vtx, 2 tris
  if(FAILED(hrErr)) return hrErr;


HLSL vertex shader

Let's move on to the real subject of this article, and see what the same shader looks like written in HLSL:

float4x4    Matrix;
float4      NFactor;
float4      Ambient;
float4      MatDiff;

float4      LightDiff1;
float4      LightDir1;

float4      LightDiff2;
float4      LightDir2;

struct VS_INPUT
{
    float4  Pos     : POSITION;
    float4  Normal  : NORMAL;
    float2  Tex0    : TEXCOORD0;
};

struct VS_OUTPUT
{
    float4  Pos     : POSITION;
    float4  Color   : COLOR;
    float2  Tex0    : TEXCOORD0;
};

////////////////////////////////////////

VS_OUTPUT VShade(VS_INPUT In)
{
    VS_OUTPUT Out = (VS_OUTPUT) 0; 

    Out.Pos       = mul(Matrix,In.Pos);
    Out.Tex0      = In.Tex0;
    float4 Normal = In.Normal*NFactor;                      // N or -N

    // directional light 1

    float4 Color1 = max(0,dot(Normal,-LightDir1)) *MatDiff*LightDiff1;

    // directional light 2

    float4 Color2 = max(0,dot(Normal,-LightDir2)) *MatDiff*LightDiff2;

    // final color

    Out.Color     = Color1+Color2+Ambient;
    Out.Color.a   = MatDiff.a;

    return Out;
}

The first lines declare the variables the main program can access. New types have been added compared to the C language: float2, float3 and float4, which correspond respectively to vectors of 2, 3 and 4 floats, and float4x4, which contains the 16 values of a 4x4 matrix (refer to the DirectX documentation for the complete list). Global variables known only to the shader can be declared with the static keyword, and constants can be defined too:

static       float4 GlobalVar;
static const float  SpecPower = 64.f;

After that, two structures are declared, which represent the incoming and outgoing data of the shader. The names of these structures and of their fields can be freely chosen; it is the POSITION, NORMAL, TEXCOORD0, COLOR, etc. semantics that tell the compiler about the source and destination of the values, and the links to establish with the v0, v1, v2... or oD0, oPos, oT0 registers we've seen in the assembly code.

In VS_INPUT, notice that the position and normal are defined as float4 values although the stream sent by the main program only contains 3 values (x, y and z) for each of them. Shader registers consist of 4 floats, so a w value is silently added when entering the shader; it equals 1.f for the position as well as for the normal. Using float3 variables in VS_INPUT for the position and normal is perfectly valid, but it is actually easier to work with float4 values: we need to transform the position by multiplying it by the 4x4 matrix, which is not directly possible with a 3-component vector (with a float3 position one would have to write something like mul(Matrix, float4(In.Pos, 1.f))).

VS_OUTPUT enables us to return 2D mapping coordinates, a homogeneous position (4 floats), and an RGBA color (4 floats too).

Next comes the shader's code, with its main function. It does not have to be named "main", provided its name is passed to the compiler so that it knows where to find the entry point of the program. There are several ways to return the results of the shader using VS_OUTPUT; the one used in the above code is the most natural in my opinion. As in assembly, a vertex shader must at least return a position.

As you can see, this code is extremely simple to understand for somebody who knows the C language; this is the big advantage of HLSL. The other benefit lies in the fact that registers r0, r1, etc. are no longer visible anywhere, which means various functions written in HLSL can easily be reused together without the risk of register conflicts. Even if the above code does not contain any, it is in fact possible to write functions and call them the C way:


inline float4 DirectionalLight(float4 Normal,float4 Dir,float4 Diffuse)
{
    return max(0,dot(Normal,-Dir))*Diffuse;
}

VS_OUTPUT VShade(VS_INPUT In)
{
    VS_OUTPUT Out = (VS_OUTPUT) 0; 

    Out.Pos       = mul(Matrix,In.Pos);
    Out.Tex0      = In.Tex0;
    float4 Normal = In.Normal*NFactor;                      // N or -N

    Out.Color     = Ambient+DirectionalLight(Normal,LightDir1,LightDiff1)*MatDiff
                           +DirectionalLight(Normal,LightDir2,LightDiff2)*MatDiff;
    Out.Color.a   = MatDiff.a;

    return Out;
}

The inline keyword is not absolutely necessary; at the time of writing, every function is inlined by the compiler. As in the C language, the function must be placed before the one calling it in the source, for its prototype to be known when the compiler reaches the call. I stated in the introduction that there is no concept of pointer in HLSL, and so there are no references either; parameters are passed by value, which is actually unimportant because there are no real function calls since everything is inlined. Note that recursive functions (that is, functions calling themselves) are not allowed.

In the above code some functions are used although their declarations are not visible anywhere: max, dot and mul. These are "intrinsics"; they are part of the language itself because they're often needed in shaders. There are many more (between 70 and 80), described in the documentation of the DirectX SDK. Among them one finds the mathematical functions atan2, cos, sin, cross (cross product), exp, length (of a vector), log, normalize, pow, sqrt, tan...


Integrating the HLSL vertex shader

The code creating the vertex shader is the same as for the assembly version, except that the call to D3DXAssembleShaderFromFile is replaced by:

  hrErr = D3DXCompileShaderFromFile("dx9/vshader.fx",NULL,NULL,"VShade","vs_1_1",dwFlags,&pCode,&pErrors,&m_pVertexConstants);

This time the shader's code is in a .fx file, the extension usually used for DirectX "effects". An effect (in the sense of "special effect") is a rendering method to be applied to objects (for example, cartoon rendering), made of one or more "techniques": each one corresponds to a more or less refined version of the desired effect, and the most adequate version is used according to the hardware available in the user's computer. Each technique can have one or more rendering passes. In the DirectX documentation, HLSL is presented in relation to the programming of effects, and it may seem that it cannot be used outside this context; this is not true, as the use of D3DXCompileShaderFromFile demonstrates.

The fourth parameter gives the name of the shader's entry point, here "VShade". The next one indicates the targeted vertex shader version, which enables the compiler to perform some specific checks and optimizations; however, if you compile the shader for a version (for example, 3.0) not supported by the graphics card in your system, the compilation will succeed but the creation of the shader by CreateVertexShader will fail. Finally, the last parameter allows the compiler to return a constant table, which is needed to initialize the shader's variables.
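
One way to avoid this pitfall is to check the device caps before choosing the target profile (a sketch, not part of the sample program):

  // pick a compile target actually supported by the device
  D3DCAPS9 caps;
  m_pDevice->GetDeviceCaps(&caps);
  const char* szTarget = (caps.VertexShaderVersion >= D3DVS_VERSION(2,0)) ? "vs_2_0" : "vs_1_1";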

These variables no longer correspond to registers c0, c1, etc. chosen by the programmer, but to registers selected by the compiler at its own convenience, so it is no longer possible to access them by their numbers. The new method consists in looking up the constant table to get a handle representing the desired variable, and using this handle to modify the register(s) associated with this variable:

  if(m_pVertexConstants)
    {
    D3DXHANDLE handle;
    if(handle = m_pVertexConstants->GetConstantByName(NULL,"Matrix"))
      m_pVertexConstants->SetMatrix(m_pDevice,handle,(D3DXMATRIX*)&m4Total);

    if(handle = m_pVertexConstants->GetConstantByName(NULL,"NFactor"))
      m_pVertexConstants->SetVector(m_pDevice,handle,(D3DXVECTOR4*)&v4NFactor);

    if(handle = m_pVertexConstants->GetConstantByName(NULL,"Ambient"))
      m_pVertexConstants->SetVector(m_pDevice,handle,(D3DXVECTOR4*)&v4Ambient);

    if(handle = m_pVertexConstants->GetConstantByName(NULL,"MatDiff"))
      m_pVertexConstants->SetVector(m_pDevice,handle,(D3DXVECTOR4*)&v4MatDiffuse);

      // lights

    if(handle = m_pVertexConstants->GetConstantByName(NULL,"LightDiff1"))
      m_pVertexConstants->SetVector(m_pDevice,handle,(D3DXVECTOR4*)&v4LightDiffuse1);

    if(handle = m_pVertexConstants->GetConstantByName(NULL,"LightDir1"))
      m_pVertexConstants->SetVector(m_pDevice,handle,(D3DXVECTOR4*)&v4LightDir1);

    if(handle = m_pVertexConstants->GetConstantByName(NULL,"LightDiff2"))
      m_pVertexConstants->SetVector(m_pDevice,handle,(D3DXVECTOR4*)&v4LightDiffuse2);

    if(handle = m_pVertexConstants->GetConstantByName(NULL,"LightDir2"))
      m_pVertexConstants->SetVector(m_pDevice,handle,(D3DXVECTOR4*)&v4LightDir2);
    }

This is not complicated; just notice the SetMatrix function used for the matrix, whereas SetVector is called for all the other variables (which are declared as float4 in the shader).
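
Note that ID3DXConstantTable offers other setters as well (SetFloat, SetFloatArray, SetValue...); for instance, had NFactor been declared as a plain float in the shader, the initialization could have looked like this (a sketch):

    if(handle = m_pVertexConstants->GetConstantByName(NULL,"NFactor"))
      m_pVertexConstants->SetFloat(m_pDevice,handle,1.f);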

The code preceding the call to DrawIndexedPrimitive does not change with respect to the assembly vertex shader. Here again you can include the HLSL shader in the CPP file instead, and compile it with D3DXCompileShader.


HLSL pixel shader

HLSL is not limited to vertex shaders; here is a simple pixel shader example:

sampler   baseTex;

struct PS_INPUT
{
    float4  Color   : COLOR0;
    float2  Tex0    : TEXCOORD0;
};

struct PS_OUTPUT
{
    float4  Color   : COLOR;
};

////////////////////////////////////////

PS_OUTPUT PShade(PS_INPUT In)
{
    PS_OUTPUT Out = (PS_OUTPUT) 0; 

    Out.Color = In.Color * tex2D(baseTex,In.Tex0);
    return Out;
}

This shader uses the mapping coordinates it receives (interpolated for each pixel from those of the vertices of the face it belongs to) to sample a color from a texture, and multiplies this color by the diffuse component (also interpolated) of the pixel. The intrinsic function tex2D is the one reading from the texture; it takes a "sampler" and 2D mapping coordinates as parameters. A sampler is an object introduced in DirectX 9 to read a value from a texture according to the filtering options (minification, magnification and mipmapping) set with the SetSamplerState function. The distinction between a sampler and a texture stage was added because it can be useful to sample several values from the same texture in a rendering pass, which means more samplers than textures are needed; at the time of writing the number of texture stages is still limited to 8, while a card can have as many as 16 samplers.
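
For reference, the filtering options of a sampler are set through calls such as these (a sketch; sampler index 0 is assumed here, see the discussion of TextureIndex below):

  m_pDevice->SetSamplerState(0, D3DSAMP_MINFILTER, D3DTEXF_LINEAR);
  m_pDevice->SetSamplerState(0, D3DSAMP_MAGFILTER, D3DTEXF_LINEAR);
  m_pDevice->SetSamplerState(0, D3DSAMP_MIPFILTER, D3DTEXF_LINEAR);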

The creation of the pixel shader in the main program does not contain anything worth mentioning:

  // pshader

  hrErr = D3DXCompileShaderFromFile("dx9/pshader.fx",NULL,NULL,"PShade","ps_1_1",dwFlags,&pCode,&pErrors,&m_pPixelConstants);
  if(pErrors)
    {
    char* szErrors = (char*)pErrors->GetBufferPointer();
    pErrors->Release();
    }
  if(FAILED(hrErr)) 
    {
    MessageBox(NULL,"pixel shader creation failed","CRendererDX9::Create",MB_OK|MB_ICONEXCLAMATION);
    return false;
    }

  szCode = (char*)pCode->GetBufferPointer();
  hrErr  = m_pDevice->CreatePixelShader((DWORD*)pCode->GetBufferPointer(),&m_pPixelShader);
  pCode->Release();
  if(FAILED(hrErr))
    {
    MessageBox(NULL,"CreatePixelShader failed","CRendererDX9::Create",MB_OK|MB_ICONEXCLAMATION);
    return false;
    }

But initializing a sampler is not exactly the same as initializing another variable:

    if(m_pPixelConstants && (handle = m_pPixelConstants->GetConstantByName(NULL,"baseTex")))
      {
      D3DXCONSTANT_DESC constDesc;
      UINT count = 1;
      m_pPixelConstants->GetConstantDesc(handle,&constDesc,&count);

      if(constDesc.RegisterSet == D3DXRS_SAMPLER)
        TextureIndex = constDesc.RegisterIndex;
      }

As before we get a handle from the constant table, but it is not used to modify a shader variable: it allows us to obtain (in TextureIndex) the index of the texture stage corresponding to the sampler. This index is then used as usual to associate a texture with the texture stage, like this:

  m_pDevice->SetTexture(TextureIndex,m_pTexFront);

In the above pixel shader only one sampler is used, and there is no reason for the compiler to choose another texture stage than the first one (index 0), so replacing TextureIndex by zero in the call to SetTexture has a good chance of working. But if you write a shader with several samplers, bad surprises can arise; it is a good idea to get used to the constant table even for simple shaders.

Note: if no texture is associated with the sampler used in the pixel shader, tex2D returns a black pixel. This behavior differs from that of the fixed function pipeline, which considers the default texture to be white, so that for an untextured object texture * diffuse = diffuse. This means that to correctly use a shader containing a tex2D instruction your objects must have a texture; if their material does not hold one, you can assign them a 1x1 white texture.
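
Creating such a texture is straightforward (a sketch, not part of the sample program):

  // create a 1x1 opaque white texture for untextured materials
  LPDIRECT3DTEXTURE9 pWhite = NULL;
  if(SUCCEEDED(m_pDevice->CreateTexture(1,1,1,0,D3DFMT_A8R8G8B8,D3DPOOL_MANAGED,&pWhite,NULL)))
    {
    D3DLOCKED_RECT rect;
    if(SUCCEEDED(pWhite->LockRect(0,&rect,NULL,0)))
      {
      *(DWORD*)rect.pBits = 0xFFFFFFFF;               // ARGB = opaque white
      pWhite->UnlockRect(0);
      }
    }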

To end this section, here is a function that converts the output of the graphics pipeline to shades of grey, and shows that you can use whatever calculations you want, as in a vertex shader:

PS_OUTPUT PShade(PS_INPUT In)
{
    PS_OUTPUT Out    = (PS_OUTPUT) 0; 

    Out.Color        = In.Color * tex2D(baseTex,In.Tex0);
    float fIntensity = Out.Color.r*0.30f + Out.Color.g*0.59f + Out.Color.b*0.11f;
    Out.Color        = float4(fIntensity,fIntensity,fIntensity,1.f);

    return Out;
}


Comparison

A command-line HLSL compiler (fxc.exe) is available in the DirectX 9 SDK. Besides compiling a shader to a binary format, it enables you to save the generated assembly code to a text file in order to examine it. This can be very instructive, in particular regarding the optimizations performed for the different shader versions (1.1, 2.0, 3.0...). You invoke fxc in the following way:

fxc /T:vs_2_0 /E:VShade /Zi /Fc:vshader.fxc vshader.fx
fxc /T:ps_2_0 /E:PShade /Zi /Fc:pshader.fxc pshader.fx

In this example, vshader.fx and pshader.fx are compiled for version 2.0, their main functions are respectively named VShade and PShade, and the generated code is written to vshader.fxc and pshader.fxc. Let's look at the contents of vshader.fxc, which we can compare with the assembly version of this article's vertex shader:

//
// Generated by Microsoft (R) D3DX9 Shader Compiler
//
//  Source: vshader.fx
//  Flags: /E:VShade /T:vs_2_0 /Zi 
//

// Parameters:
//
//     float4x4 Matrix;
//     float4 NFactor;
//     float4 Ambient;
//     float4 MatDiff;
//     float4 LightDiff1;
//     float4 LightDir1;
//     float4 LightDiff2;
//     float4 LightDir2;
//
//
// Registers:
//
//     Name         Reg   Size
//     ------------ ----- ----
//     Matrix       c0       4
//     NFactor      c4       1
//     Ambient      c5       1
//     MatDiff      c6       1
//     LightDiff1   c7       1
//     LightDir1    c8       1
//     LightDiff2   c9       1
//     LightDir2    c10      1
//

    vs_2_0
    def c11, 0, 0, 0, 0
    dcl_position v0  // In<0,1,2,3>
    dcl_normal v1  // In<4,5,6,7>
    dcl_texcoord v2  // In<8,9>

#line 32 "C:\temp\fairyengine\vshader.fx"
    mul r0, v0.x, c0
    mad r2, v0.y, c1, r0
    mad r4, v0.z, c2, r2
    mad oPos, v0.w, c3, r4  // ::VShade<0,1,2,3>
    mul r1, v1, c4  // Normal<0,1,2,3>

#line 38
    dp4 r8.w, r1, -c8
    max r3.w, r8.w, c11.x
    mul r10.xyz, r3.w, c6
    mul r5.xyz, r10, c7  // Color1<0,1,2>

#line 42
    dp4 r5.w, r1, -c10
    max r5.w, r5.w, c11.x
    mul r7.xyz, r5.w, c6

#line 46
    mad r9.xyz, r7, c9, r5
    add oD0.xyz, r9, c5  // ::VShade<4,5,6>
    mov oD0.w, c6.w  // ::VShade<7>

#line 33
    mov oT0.xy, v2  // ::VShade<8,9>

// approximately 16 instruction slots used


// 0000:  fffe0200  0098fffe  47554244  00000028  _......_DBUG(___
// 0010:  00000244  00000000  00000001  00000048  D.______.___H___
[etc...]

First we see that the compiler has selected registers that are not the same as in our code to store the variables, which was likely and expected, and confirms that we cannot initialize the variables with SetVertexShaderConstantF. On the contrary, the vertex formats are strictly identical, because the compiler simply translates the VS_INPUT structure from the HLSL code.

The first four instructions after "line 32" correspond to the m4x4 in the assembly code (m4x4 is a "complex" instruction which theoretically takes 4 cycles, that is to say the same time as the mul and 3 mad above), and the next one inverts the normal if required. The instructions after "line 38" calculate the lighting contribution of the first directional light, and those after "line 42" the contribution of the second light. "Line 46" is the calculation of the final color: we see that the compiler has replaced a mul and an add from our assembly code with a single mad operation, which is clever. Finally, "line 33" copies the mapping coordinates.

To sum up, the code generated by the compiler for this simple example is almost identical to the hand-written one, and it even saves one instruction. Knowing that HLSL is going to gain new instructions, and that shaders are expected to become longer and ever more complex, it is easy to understand the advantage of learning this language, which should quickly replace the assembly code used so far.


Sample program

The sample program (536 Kb) that accompanies this article is very simple: the camera rotates around a quad lit by a blue directional light in front of it and a red directional light behind it. The back side of the quad does not exist in the object's vertex and index buffers; it really is a two-sided object. Two commands in the "file" menu allow you to switch to wireframe and, especially, to disable the back side's rendering. Note that I take advantage of the rendering in two passes (one for each side) to change the texture of the object (an additional benefit); the mapping coordinates are the same for both sides though, which forces us to flip the back texture horizontally so that it is displayed as expected. Another option would be to modify the mapping coordinates in the vertex shader itself, as is done for the normals.

front facing quad
back facing quad
front texture
back texture


References

The HLSL documentation from the DirectX 9 SDK on MSDN.
"Taking It Higher with the High Level Shader Language", MSDN article. Mostly contains examples.
"Introduction to the DirectX 9 High Level Shading Language": a must-read excerpt from the book ShaderX2 (September 2003).

