2) DLR - Scanner
In the first chapter I described common architecture of .NET language compiler. There are usually three parts - scanner, parser and generator of CIL code. But building a language on the DLR slightly differs because instead of CIL code is produced DLR abstract syntax tree (DLR AST).

Here is an example demonstrating how code string is processed by a compiler built on the DLR:

As you can see DLR AST is quite similar to a language specific AST built by the parser, the difference is that DLR AST is build from nodes (operations) provided by the DLR and so DLR understands it and can interpret it while language specific AST is a tree structure defined by the language developer.
You could also build DLR AST directly from tokens, but I do not recommend it. It’s more convenient to separate the generation of AST into two steps, where at first parser builds IPL AST. Then generator not only transforms IPL AST into DLR AST, but also it manages variables and variable scopes, converts values into appropriate type, etc.
Implemented language
In this tutorial I’m going to describe implementation of new language I call IPL. It is a simple language similar to ToyScript (also language intended for learning the DLR and provided together with the DLR source codes but lacking any documentation).
First version of IPL language (let’s call it IPL1) will be expression language, that means there will be no statements but only an expression evaluated into a single value.
First version of the IPL (let’s call it IPL1) will support several essential features:
- variables
- values of type int and string
- let-in-end blocks
- if-else conditions
- lambda functions
Lambda Functions
Lambda function is the same as lambda expression in the C# 3.0. If you aren’t familiar with this concept, it’s a simple function, where body is an expression that is evaluated and returned when lambda function is called.
add = fun x y => x + y
Calling add(5, 6) results in 11.
Let-in-end blocks
Let-in-end blocks are used to declare and use variables inside the block. Blocks can be nested and each block has its own variable scope. Implementation of let-in-end blocks will demonstrate how to implement variable scopes in the DLR.
Here are some examples:

Here is an implementation of factorial in IPL1:
let val x = 4
val fac = fun n => if n > 1 then n * self(n - 1)
else 1
in fac(x) end
First version of the IPL (let’s call it IPL1) will be expression language, that means there will be no statements but only an expression evaluated into a single value. In the next chapters the language is going to be extended by new features such as mentioned statements, functions, .NET interoperability (calling and using .NET libraries), etc.
Building IPL language
Lexer and parser generators
How to build a scanner and a parser? There are several ways, you could implement those parts by yourself, but this can be quite time expensive and you need some experience. Another, much simpler way, is to use lexer and parser generators. I prefer using generators because those tools are specialized on building scanners and parsers and they considerably simplify the implementation. In .NET there are several generators available, I prefer F# generators fslex (lexer) and fsyacc (parser).
Don’t be affraid if you’re not familiar with F#, actually there won’t be more like 20 lines of F# code together for scanner and the parser because fslex and fsyacc follow it’s own specification. Rest, the generator, will be implemented in the C# and while F# as well as C# are .NET languages, there won’t be a problem to use those parts together.
Let’s get started
As I already mentioned, scanner will be implemented using F# lexer generator fslex, that is part of the F# distribution. You can download F# on http://research.microsoft.com/fsharp/release.aspx ^.
If you have F# already installed, you can simply add the F# Lex (fslex) and F# Yacc (fsyacc) specification files into the project:

Scanner
Here is the scanner’s specification file scanner.fsl of the IPL1 language:
// scanner.fsl
{
open System
open Parser
open Lexing
}
let digit = ['0'-'9']
let char = ['a'-'z' 'A'-'Z']
let string = '"'([^'"'])*'"'
let whitespace = [' ' '\t' '\r' '\n']
rule scan = parse
// Keywords
| "else" { KeyELSE }
| "end" { KeyEND }
| "false" { KeyFALSE }
| "fun" { KeyFUN }
| "if" { KeyIF }
| "in" { KeyIN }
| "let" { KeyLET }
| "then" { KeyTHEN }
| "true" { KeyTRUE }
| "val" { KeyVAL }
// Operators
| "<" { OpLESS }
| "*" { OpSTAR }
| "-" { OpDASH }
| "+" { OpPLUS }
| "=" { OpEQUAL }
| "==" { OpEQUALEQUAL }
| "=<" { OpEQUALLESS }
| "=>” { OpEQUALGREATER }
| “>” { OpGREATER }
| “>=” { OpGREATEREQUAL }
| “/” { OpSLASH }
| “!=” { OpEXCLAMATIONEQUAL }
// Special
| “(” { SpLEFTPAREN }
| “)” { SpRIGHTPAREN }
| “,” { SpCOMMA }
// others
| digit+ { NUM (Int32.Parse(lexeme lexbuf)) }
| string { STRING ((lexeme lexbuf).Trim([|’\”‘|])) }
| char(char | digit)* { IDENT (lexeme lexbuf) }
| whitespace { scan lexbuf }
| eof { EOF }
Scanner constists of plenty of rules. For each keyword, operator, special character and value type of the language, there is one rule. Each rule starts with ‘|’ character, after that follows regular expression parsing the string and on the right side of the rule, there is a token produced by the scanner. Tokens NUM, STRING and IDENT are generated for integer and string values and identificators and they keep the parsed value (that is specified in the brackets, e.g. NUM(5), IDENT(“x”), etc).
Example:
This scanner parses a string “x = 5 + 3 * 6” into the list of tokens:
[IDENT(“x”), OpEQUAL, NUM(5), OpADD, NUM(3), OpMUL, NUM(6)]
Parser
While this chapter is focused only on the scanner implementation, still parser specification must be created. That’s because in the parser file, there are defined tokens and without tokens definitions scanner wouldn’t be complete.
Here is the parser.fsy file. There aren’t parsing rules yet, those will be added in the next chapter focused on the parser implementation:
// parser.fsy
%token NUM
%token STRING
%token IDENT
// Keywords
%token KeyELSE KeyEND KeyFALSE KeyFUN KeyIF KeyIN KeyLET KeyTHEN KeyTRUE
KeyVAL
// Operators
%token OpLESS OpSTAR OpDASH OpPLUS OpEQUAL OpEQUALEQUAL
OpEQUALLESS OpEQUALGREATER OpGREATER OpGREATEREQUAL OpSLASH
OpEXCLAMATIONEQUAL
// Special
%token SpLEFTPAREN SpRIGHTPAREN SpCOMMA
// others
%token EOF
// ** ignore rest of this file for now
%start start
%type start
%%
start: { “” }
Now scanner implementation is complete and both files can be compiled into .NET using the fslex.exe and fsyacc.exe tools. Those tools are in the C:\Program Files\FSharp-[version]\bin\ folder.
Compiling scanner.fsl and parser.fsy:
fslex.exe scanner.fsl
fsyacc.exe parser.fsy
Two files are generated - scanner.fs and parser.fs that have to be added into the project.

Running a scanner
Here is a function in F# that calls the scanner on given code string and returns list of tokens:
Project
Project source codes are available for download (link is at the end of the chapter). Besides scanner implementation and token definitions, there are also unit test in the ScannerTests.cs.
Conclusion
If the scanner implementation is still not clear to you, you can look at much simpler project which is a parser of arithmetic expressions (download here).

IPL1_01scanner.zip

