MSIL - the language of the CLR (Part 1)

Introduction

In this part I'm going to explain what MSIL is, as well as the benefit of knowing it. Along the way (before we hit part 2) we will talk a little about what a managed module is composed of, which will stand you in good stead for part 2.
Although I'm putting off a proper look at MSIL until part 2, I will show you some MSIL here, mainly with regard to metadata, to establish key points.

What is MSIL?

Microsoft Intermediate Language (MSIL) is what all high level languages that target the CLR compile down to. For example, if you write some C# and compile it with the C# compiler (csc.exe), the output is not a native representation of your code (whether that is x86, x64 or IA64). What you get instead is an architecture neutral intermediate representation of that code: MSIL.

Note: MSIL is also sometimes referred to as Common Intermediate Language (CIL), or just plain Intermediate Language (IL).

Each high level language compiler understands one very specific language: the C# compiler knows how to lex and parse C# code and generate MSIL for it, and the VB.NET compiler does the same, albeit for the VB.NET language. The Common Language Runtime (CLR), however, does not understand C# or VB.NET; instead the CLR knows how to play with MSIL.

Not all high level languages expose all the CLR’s functionality via their respective language constructs; in fact none of the high level languages take advantage of everything the CLR offers.
MSIL is the language of the CLR.

Why learn MSIL?

The advantages of learning MSIL are great – you will learn how to deconstruct an application to a more atomic form and further your understanding of how your code is being implemented at the CLR level.

As well as seeing how your code is being implemented, you can also learn a great deal about debugging and code optimization from the MSIL representation of your code. No high level compiler is perfect. That's not to say the C# or VB.NET compilers are not good; the point is that few (if any!) compilers can perform the correct optimizations for all scenarios.

Learning MSIL will also give you a nice insight into a stack based evaluation runtime: values are pushed onto the evaluation stack, and each operation pops the operands it needs off the stack and pushes its result back on. We will talk about this a little later.
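To make the stack idea a bit more concrete before part 2, here is a minimal sketch of a stack based evaluator written in Python. The opcode names (ldc.i4, add, mul, ret) are borrowed from real MSIL mnemonics, but the run function and the tuple based program format are purely my own illustration; this is not how the CLR itself is implemented.

```python
# A toy evaluation stack, loosely modelled on how MSIL instructions behave.
# The opcode names mirror MSIL mnemonics; the interpreter itself is a
# hypothetical teaching aid, not the CLR's implementation.

def run(program):
    """Execute a list of (opcode, operand) pairs and return the value 'ret' pops."""
    stack = []
    for opcode, operand in program:
        if opcode == "ldc.i4":        # push an integer constant onto the stack
            stack.append(operand)
        elif opcode == "add":         # pop two values, push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "mul":         # pop two values, push their product
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif opcode == "ret":         # pop the return value and stop
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    raise ValueError("program ended without a ret instruction")


# (2 + 3) * 4 expressed as stack operations
program = [
    ("ldc.i4", 2),
    ("ldc.i4", 3),
    ("add", None),
    ("ldc.i4", 4),
    ("mul", None),
    ("ret", None),
]

print(run(program))  # 20
```

Notice that no instruction names its operands: add and mul simply work on whatever is on top of the stack, which is the property to keep in mind when reading real MSIL.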

The content of a managed module

• PE32 Header (or PE32+)
• CLR Header
• Metadata
• MSIL

PE32 Header

This is something that all Windows modules have. If the managed module has a PE32 header then the module can be run on both 32 bit and 64 bit versions of Microsoft Windows; if the header is PE32+ then the module requires a 64 bit version of Microsoft Windows. There are a few more things that a PE header defines but I'll leave them out for brevity.
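If you want to check which kind of header a module has, the following Python sketch shows one way to do it. It relies on the PE file layout: the 4 byte offset of the PE signature is stored at 0x3C, a 20 byte COFF header follows the signature, and the optional header begins with a magic number (0x10B for PE32, 0x20B for PE32+). The names pe_format and fake_image are my own, and fake_image only builds a stripped down buffer for demonstration, not a valid executable.

```python
import struct

PE32_MAGIC = 0x10B        # optional header magic for PE32
PE32_PLUS_MAGIC = 0x20B   # optional header magic for PE32+


def pe_format(image: bytes) -> str:
    """Return 'PE32' or 'PE32+' for the given module bytes."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # The DOS header stores the offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    # Skip the 4 byte signature and the 20 byte COFF header.
    (magic,) = struct.unpack_from("<H", image, pe_offset + 4 + 20)
    if magic == PE32_MAGIC:
        return "PE32"
    if magic == PE32_PLUS_MAGIC:
        return "PE32+"
    raise ValueError(f"unexpected optional header magic: {magic:#x}")


def fake_image(magic: int) -> bytes:
    """Build a minimal buffer with just enough header fields (demo only)."""
    pe_offset = 0x80
    image = bytearray(pe_offset + 4 + 20 + 2)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, pe_offset)
    image[pe_offset:pe_offset + 4] = b"PE\0\0"
    struct.pack_into("<H", image, pe_offset + 4 + 20, magic)
    return bytes(image)


print(pe_format(fake_image(PE32_MAGIC)))       # PE32
print(pe_format(fake_image(PE32_PLUS_MAGIC)))  # PE32+
```

To try it on a real module you would pass in the file's bytes instead, e.g. the result of open(path, "rb").read() for a path of your choosing.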

CLR Header

The most important thing about the CLR header is that it defines which version of the CLR the managed module requires. The CLR header contains quite a lot of other things as well, such as the module's entry point method.

Note: if you have a C++ program, compiling it with the VC++ compiler's /clr flag will allow your code to use features of the CLR. This is what C++/CLI does.

Metadata

The metadata of a managed assembly describes quite a lot about your assembly, including:

• Assembly version number
• Referenced assemblies (this will include mscorlib)
• Assembly information (this is usually to aid in reflection; it includes the assembly title, company, product, version, whether or not the assembly is COM visible etc.)
• Module name, e.g. MyConsoleApplication.exe, or MyTestLibrary.dll.


Here is a brief excerpt of metadata from a dll that references the System and System.Core assemblies as well as mscorlib. At the bottom of the excerpt you can see the version of the assembly and the name of the module.

// Metadata version: v2.0.50727
.assembly extern mscorlib
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )
  .ver 2:0:0:0
}
.assembly extern System.Core
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )
  .ver 3:5:0:0
}
.assembly extern System
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )
  .ver 2:0:0:0
}
.assembly Dsa
{
  // ...
  .ver 1:0:2815:30494
  // ...
}
.module Dsa.dll
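The layout of the excerpt above is regular enough to read mechanically. As a rough sketch, the following Python pulls out the referenced assemblies, the assembly's own version and the module name from that text. The parse_metadata function is a hypothetical helper of mine, and it only understands the three directives shown above (.assembly, .ver and .module); it is not a general parser for disassembler output.

```python
import re

# The directives from the excerpt above, as plain text.
METADATA_TEXT = """\
.assembly extern mscorlib
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )
  .ver 2:0:0:0
}
.assembly extern System.Core
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )
  .ver 3:5:0:0
}
.assembly extern System
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )
  .ver 2:0:0:0
}
.assembly Dsa
{
  // ...
  .ver 1:0:2815:30494
  // ...
}
.module Dsa.dll
"""


def parse_metadata(text):
    """Return (references, assembly, module) found in the metadata text."""
    references = {}    # referenced assembly name -> version tuple
    assembly = None    # (name, version tuple) of the assembly itself
    module = None      # module file name
    current = None     # (name, is_extern) of the block being read
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = re.match(r"\.assembly\s+(extern\s+)?(\S+)$", line)
        if match:
            current = (match.group(2), match.group(1) is not None)
            continue
        match = re.match(r"\.ver\s+(\d+):(\d+):(\d+):(\d+)$", line)
        if match and current is not None:
            version = tuple(int(part) for part in match.groups())
            name, is_extern = current
            if is_extern:
                references[name] = version
            else:
                assembly = (name, version)
            continue
        match = re.match(r"\.module\s+(\S+)$", line)
        if match:
            module = match.group(1)
    return references, assembly, module


references, assembly, module = parse_metadata(METADATA_TEXT)
print(references)
print(assembly)
print(module)
```

The "extern" keyword is what separates a reference to another assembly from the declaration of the assembly being described, which is why the sketch tracks it for each block.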

MSIL

This is quite simply the MSIL that your respective high level language compiler generates as a result of the compilation. We will talk about how you get from MSIL to something your processor can work with in the next part of this series.

Note: we will be analysing MSIL in depth in the next part; this is just a taster.

.method public hidebysig instance int32 Add(!T 'value') cil managed
{
  // Code size 171 (0xab)
  .maxstack 4
  .locals init ([0] class [mscorlib]System.Collections.Generic.EqualityComparer`1<!T> comparer,
           [1] int32 CS$1$0000,
           [2] bool CS$4$0001,
           [3] int32 CS$0$0002,
           [4] !T CS$0$0003)
  IL_0000: nop
  IL_0001: ldarg.0
  IL_0002: ldfld int32 class Dsa.DataStructures.ArrayListCollection`1<!T>::_count
  IL_0007: ldarg.0
  IL_0008: ldfld int32 class Dsa.DataStructures.ArrayListCollection`1<!T>::_capacity

  // ... a lot of code not here for brevity!

  IL_00a5: ldloc.3
  IL_00a6: stloc.1
  IL_00a7: br.s IL_00a9
  IL_00a9: ldloc.1
  IL_00aa: ret
} // end of method ArrayListCollection`1::Add
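One small thing you can already read off the listing above: the IL_xxxx labels are byte offsets in hexadecimal, so each label is the previous label plus the encoded size of the previous instruction. The Python sketch below checks this for the last five instructions and confirms that the method ends at the "Code size 171 (0xab)" reported in the comment. The SIZES table only covers the opcodes visible in the listing, and check_offsets is a hypothetical helper of mine, not part of any tool.

```python
# Encoded size in bytes of each instruction that appears in the listing.
SIZES = {
    "nop": 1,
    "ldarg.0": 1,
    "ldfld": 5,     # 1 byte opcode + 4 byte metadata token
    "ldloc.1": 1,
    "ldloc.3": 1,
    "stloc.1": 1,
    "br.s": 2,      # 1 byte opcode + 1 byte branch offset
    "ret": 1,
}


def offset_of(label):
    """Convert a label such as 'IL_00a5' to its integer byte offset."""
    return int(label[3:], 16)


def check_offsets(instructions):
    """Verify consecutive labels and return the offset just past the last one."""
    for (label, opcode), (next_label, _) in zip(instructions, instructions[1:]):
        expected = offset_of(label) + SIZES[opcode]
        if expected != offset_of(next_label):
            raise ValueError(f"{next_label} does not follow {label} ({opcode})")
    last_label, last_opcode = instructions[-1]
    return offset_of(last_label) + SIZES[last_opcode]


# The tail of the Add method, copied from the listing.
tail = [
    ("IL_00a5", "ldloc.3"),
    ("IL_00a6", "stloc.1"),
    ("IL_00a7", "br.s"),
    ("IL_00a9", "ldloc.1"),
    ("IL_00aa", "ret"),
]

print(check_offsets(tail))  # 171
```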

Summary

In this part I've given you the basic structure of a managed module, which you really need to know before you touch any MSIL. In the next part we will look more at the compilation model, i.e. how you go from MSIL (something your processor doesn't understand) to something that it does. And of course we will code some examples up in MSIL.
