Saturday, October 21, 2006

Boo Compilation Steps

This post contains notes about how booc compiles boo programs. The boo compilation process is not complicated, but it is mostly undocumented and therefore can only be understood by inspecting the source code. Also, boo's developer (Rodrigo B. de Oliveira) is, sadly, allergic to comments. Here are my detailed observations; more to come later. See also the boo build process.

A useful thing to know about the boo codebase is that (so far as I've seen) it follows the Java conventions of "one public class per file", "name the file after the class", and "the namespace name corresponds to the actual path in the directory tree". For example, src\Boo.Lang.Compiler\Resources\EmbeddedFileResource.cs contains a class called EmbeddedFileResource in the namespace Boo.Lang.Compiler.Resources. A source file may contain other classes, structs and enums, but they are either used only within the file where they are declared or play a minor role. BTW: this convention holds for the C# part of the compiler, but I don't yet know about the boo part.


A list follows of things that booc does. To kick things off, Main() calls App.Run().

App.Run():
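- Hooks the AssemblyResolve event, which the CLR raises when it cannot resolve an assembly on its own. I don't know why booc needs this: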
AppDomain.CurrentDomain.AssemblyResolve += new ResolveEventHandler(AssemblyResolve);
- CheckBooCompiler(): Emits a warning if a nonlocal Boo.Lang.Compiler.dll is being used
- _options = new CompilerParameters(false)
- ParseOptions():
- Builds "response file list" (App._responseFileList) from args that start with '@'. It is
empty when the cmdline args are just a src file name. The default response list is
loaded automatically from booc.rsp by AddDefaultResponseFile(), unless -noconfig is
specified.
- Processes command-line arguments. Most cmdline args alter the configuration
of _options; some alter flags/variables in App itself.
- The list of input files is added (one at a time) with _options.Input.Add()
- Assigns _options.Pipeline from the -p option, or else creates the default pipeline,
new CompileToFile(). In WSA (whitespace-agnostic) mode, the first step is then replaced:
_options.Pipeline[0] = new Boo.Lang.Parser.WSABooParsingStep();
- debug-steps adds App.StepDebugger as AfterStep event handler.
- prints header ("Boo Compiler version 0.7.6.2237 (CLR v2.0.50727.42)")
- fragility: assumes there are two standard library paths (which are moved to the bottom
of the lib path list)
- creates a Boo.Lang.Compiler.BooCompiler (.cs), "The compiler: a facade to the
CompilerParameters/CompilerContext/Pipeline subsystem."
- Unless -nostdlib specified, calls _options.LoadDefaultReferences() to add default references.
The default references are mscorlib, System, the currently loaded Boo.Lang, and the currently
loaded Boo.Lang.Compiler.
- LoadReferences(): loads assemblies previously requested with -r on cmd line; calls
_options.References.Add() to add each assembly
- CompilerContext context = compiler.Run();
This one line compiles the code and (I think) generates the output assembly, although
warnings/errors are not printed yet. Run() with no args creates a new CompileUnit (the
top-level AST class) and passes it to
public CompilerContext Run(CompileUnit compileUnit)
- Prints warnings from context.Warnings (a Boo.Lang.Compiler.CompilerWarningCollection) and
errors from context.Errors (a Boo.Lang.Compiler.CompilerErrorCollection)
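
About the AssemblyResolve line near the top of App.Run(): in general, a handler for that event receives the display name of the assembly that could not be found and may return an assembly it located elsewhere, or null. The sketch below shows that general mechanism only. It is not booc's handler (I have not traced what that one does), and ProbingResolver, CandidatePath and the probe directory are made-up names.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical helper, not part of booc: resolves assemblies from one extra directory.
public class ProbingResolver
{
    private readonly string _probeDir;

    public ProbingResolver(string probeDir)
    {
        _probeDir = probeDir;
    }

    // Turns a display name such as "Foo, Version=1.0.0.0, Culture=neutral, ..."
    // into "<probeDir>/Foo.dll".
    public static string CandidatePath(string probeDir, string assemblyDisplayName)
    {
        string simpleName = new AssemblyName(assemblyDisplayName).Name;
        return Path.Combine(probeDir, simpleName + ".dll");
    }

    // Has the shape required by the ResolveEventHandler delegate.
    public Assembly Resolve(object sender, ResolveEventArgs args)
    {
        string path = CandidatePath(_probeDir, args.Name);
        // Returning null tells the runtime that this handler could not help.
        return File.Exists(path) ? Assembly.LoadFrom(path) : null;
    }

    // Same registration style as the line quoted from App.Run().
    public void Install()
    {
        AppDomain.CurrentDomain.AssemblyResolve += new ResolveEventHandler(Resolve);
    }
}
```

Calling new ProbingResolver("libs").Install() once at startup would be enough; the runtime only consults the handler after its normal lookup has failed.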

Boo.Lang.Compiler.BooCompiler.Run(CompileUnit compileUnit):
- Creates a new CompilerContext(_parameters, compileUnit) where _parameters is the
Boo.Lang.Compiler.CompilerParameters object with which the BooCompiler was originally initialized.
- _parameters.Pipeline.Run(context): (Boo.Lang.Compiler.CompilerPipeline.Run())
- virtual stub: Prepare(context)
- calls RunStep(context, step) for each step in the pipeline. The first step is a parsing
step, e.g. BooParsingStep
- fires this.BeforeStep event
- calls step.Initialize()
- calls step.Run() & catches exception if any
- fires this.AfterStep event
- calls Dispose() on each step
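
The run loop described above can be sketched in a few lines. This is a minimal model under my own assumptions, not the real CompilerPipeline: IStep, MiniPipeline and NamedStep are invented names, a List<string> stands in for the CompilerContext, and recording a caught exception in that list is my guess at a reasonable way to "catch" it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the real step interface.
public interface IStep : IDisposable
{
    void Initialize(List<string> log);
    void Run();
}

// Hypothetical stand-in for CompilerPipeline; the log plays the role of the context.
public class MiniPipeline
{
    private readonly List<IStep> _steps = new List<IStep>();

    public event Action<IStep> BeforeStep;
    public event Action<IStep> AfterStep;

    public void Add(IStep step) { _steps.Add(step); }

    // Virtual stub, like Prepare(context) in the notes.
    protected virtual void Prepare(List<string> log) { }

    public void Run(List<string> log)
    {
        Prepare(log);
        foreach (IStep step in _steps) RunStep(log, step);
        foreach (IStep step in _steps) step.Dispose();
    }

    private void RunStep(List<string> log, IStep step)
    {
        if (BeforeStep != null) BeforeStep(step);
        step.Initialize(log);
        try
        {
            step.Run();
        }
        catch (Exception e)
        {
            // Assumption: record the failure and let the remaining steps run.
            log.Add("error: " + e.Message);
        }
        if (AfterStep != null) AfterStep(step);
    }
}

// A trivial step that either logs its name or throws.
public class NamedStep : IStep
{
    private readonly string _name;
    private readonly bool _fail;
    private List<string> _log;

    public NamedStep(string name, bool fail) { _name = name; _fail = fail; }

    public void Initialize(List<string> log) { _log = log; }

    public void Run()
    {
        if (_fail) throw new InvalidOperationException(_name + " failed");
        _log.Add("ran " + _name);
    }

    public void Dispose() { _log.Add("disposed " + _name); }
}
```

An AfterStep handler attached to such a pipeline sees every step, including failed ones, which is presumably what makes the event a convenient hook for something like App.StepDebugger.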

-------------------------------------------------------------------------------
ASTs:
- Automatically generated classes are named Boo.Lang.Compiler.Ast.Impl.*Impl
- Each has a derived class named Boo.Lang.Compiler.Ast.*
- Boo.Lang.Compiler.Ast.Node is the base class for every node in the AST. Its data content is
protected LexicalInfo _lexicalInfo = LexicalInfo.Empty;
protected SourceLocation _endSourceLocation = LexicalInfo.Empty;
protected Node _parent;
protected string _documentation;
protected System.Collections.Hashtable _annotations = new System.Collections.Hashtable();
protected bool _isSynthetic;
- Top-level AST element is called CompileUnit. At this point I am assuming, since only one
CompileUnit is created by Boo.Lang.Compiler.BooCompiler.Run(), that all boo source files
given to booc are merged into the same CompileUnit object. Data:
protected ModuleCollection _modules;

-------------------------------------------------------------------------------
The standard set of compiler steps follows directly from the implementation of
CompileToFile and its base classes.

CompileToFile's entire implementation is
class CompileToFile : CompileToMemory { CompileToFile() { Add(new SaveAssembly()); } }
which derives from:
class CompileToMemory : Compile { CompileToMemory() { Add(new EmitAssembly()); } }
which derives from:
public class Compile : ResolveExpressions {
    public Compile() {
        Add(new UnfoldConstants());
        Add(new OptimizeIterationStatements());

        Add(new CheckIdentifiers());
        Add(new StricterErrorChecking());

        Add(new ExpandDuckTypedExpressions());

        Add(new ProcessAssignmentsToValueTypeMembers());
        Add(new ExpandProperties());
        Add(new RemoveDeadCode());

        Add(new CheckMembersProtectionLevel());

        Add(new NormalizeIterationStatements());

        Add(new ProcessSharedLocals());
        Add(new ProcessClosures());
        Add(new ProcessGenerators());

        Add(new ExpandVarArgsMethodInvocations());

        Add(new InjectCallableConversions());
        Add(new ImplementICallableOnCallableDefinitions());

        // TODO:
        //Add(new InjectCastsAndConversions());
    }
}
which derives from:
public class ResolveExpressions : Parse {
    public ResolveExpressions() {
        Add(new InitializeTypeSystemServices());
        Add(new PreErrorChecking());

        Add(new MergePartialClasses());

        Add(new PreProcessExtensionMethods());
        Add(new InitializeNameResolutionService());
        Add(new IntroduceGlobalNamespaces());
        Add(new TransformCallableDefinitions());
        Add(new BindTypeDefinitions());
        Add(new BindNamespaces());
        Add(new BindBaseTypes());
        Add(new BindAndApplyAttributes());
        Add(new ExpandMacros());
        Add(new IntroduceModuleClasses());
        Add(new NormalizeStatementModifiers());
        Add(new NormalizeTypeAndMemberDefinitions());

        Add(new BindTypeDefinitions());
        Add(new BindEnumMembers());
        Add(new BindBaseTypes());

        Add(new ResolveTypeReferences());

        Add(new BindTypeMembers());
        Add(new ProcessInheritedAbstractMembers());
        Add(new CheckMemberNames());

        Add(new ExpandAstLiterals());
        Add(new ProcessMethodBodiesWithDuckTyping());
    }
}
which derives from:
public class Parse : CompilerPipeline {
    public Parse() { Add(NewParserStep()); }
    ...
    ...
}

The job of NewParserStep() is to dynamically load Boo.Lang.Parser.dll (using
System.Reflection.Assembly.Load()/LoadFrom()) and to create an instance of
Boo.Lang.Parser.BooParsingStep (using System.Activator). Fragility: the code
relies on the fact that the current assembly is named Boo.Lang.Compiler.dll.
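
The load-then-instantiate pattern that NewParserStep() uses looks roughly like this in isolation. This is a sketch of the general reflection technique, not a copy of the boo source; LateBound and Create are names I made up, and the real method may differ in details such as how it locates the DLL.

```csharp
using System;
using System.Reflection;

// Hypothetical helper illustrating late-bound construction of a type.
public static class LateBound
{
    // Loads the named assembly, looks up a type in it and calls its default constructor.
    public static object Create(string assemblyName, string typeName)
    {
        Assembly assembly = Assembly.Load(assemblyName);
        // The second argument (throwOnError) makes a missing type fail loudly here.
        Type type = assembly.GetType(typeName, true);
        return Activator.CreateInstance(type);
    }
}
```

With Boo.Lang.Parser.dll available to the runtime, LateBound.Create("Boo.Lang.Parser", "Boo.Lang.Parser.BooParsingStep") would be the analogous call. The apparent benefit of the indirection is that the calling assembly needs no compile-time reference to the parser assembly, though that is my reading rather than something the source states.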

All of the above classes are compiler pipelines, defined in
src\Boo.Lang.Compiler\Pipelines.

Parse, in turn, derives from CompilerPipeline, which serves as a collection of steps and
includes functions to run the steps (Run(), RunStep()).
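
The reason the step order "follows directly" from the class hierarchy is that C# runs base-class constructors before derived-class constructors, so each pipeline class appends its steps after everything its ancestors added. The miniature below shows only that ordering effect. The Mini* classes are invented for illustration, strings stand in for step objects, and only a few step names from the lists above are included.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical miniature of the pipeline hierarchy.
public class MiniSteps
{
    public readonly List<string> Steps = new List<string>();
    protected void Add(string step) { Steps.Add(step); }
}

public class MiniParse : MiniSteps
{
    public MiniParse() { Add("Parsing"); }
}

public class MiniResolve : MiniParse
{
    public MiniResolve()
    {
        Add("InitializeTypeSystemServices");
        Add("ProcessMethodBodiesWithDuckTyping");
    }
}

public class MiniCompile : MiniResolve
{
    public MiniCompile() { Add("UnfoldConstants"); }
}

public class MiniToMemory : MiniCompile
{
    public MiniToMemory() { Add("EmitAssembly"); }
}

public class MiniToFile : MiniToMemory
{
    public MiniToFile() { Add("SaveAssembly"); }
}
```

Constructing a MiniToFile therefore yields the parser step first and SaveAssembly last, which mirrors how new CompileToFile() ends up with the full list of steps in compilation order.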