C#, C++ and Rotor (2)
November 06, 2005 10:30To see how the C# compiler works, what is better to go and look at the sources? However, if you have ever written a compiler, you can understand that figure out who calls who is not so simple. I decided to simplify my investigation a little with the help of.. a debugger =)
I set two breakpoints in two functions that I believed to be correlated, but that seemed to not came in touch nowhere in the code. It was strange, because the two are parse function and are in the same class.
I placed the first breakpoint in ParseNamespaceBody. As you can see in parser.cpp, this function will in turn call ParseNamaspace, ParseClass,
which in turn will call ParseMethod, ParseAccessor, ParseConstructor...
The second breakpoint was placed instead in ParseBlock. This is the function that parses a block of code, i.e. all the statements enclosed by two curly braces { }.
The interesting thing is that the two breakpoints are not hit subsequentely, but in different moments with different stack traces.
This is the stack trace for the first breakpoint:
0006efe8 531fa3ad cscomp!CParser::ParseNamespaceBody+0x74 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\parser.cpp @ 1627]
0006effc 53216891 cscomp!CParser::ParseSourceModule+0x8d [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\parser.cpp @ 1602]
0006f18c 5320c226 cscomp!CSourceModule::ParseTopLevel+0x691 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\srcmod.cpp @ 3331]
0006f19c 53175940 cscomp!CSourceData::ParseTopLevel+0x16 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\srcdata.cpp @ 171]
0006f234 53175bd7 cscomp!COMPILER::ParseOneFile+0x130 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\compiler.cpp @ 1197]
0006f284 53174599 cscomp!COMPILER::DeclareOneFile+0x27 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\compiler.cpp @ 1228]
0006f324 5317767a cscomp!COMPILER::DeclareTypes+0x89 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\compiler.cpp @ 821]
0006f4d0 53173987 cscomp!COMPILER::CompileAll+0x13a [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\compiler.cpp @ 1676]
0006f65c 5317d0dd cscomp!COMPILER::Compile+0xd7 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\compiler.cpp @ 704]
0006fe48 0040db29 cscomp!CController::Compile+0xdd [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\controller.cpp @ 225]
0006ff50 0040e7ac csc!main+0x349 [d:\cxc\sscli\clr\src\csharp\csharp\scc\scc.cpp @ 2270]
0006ff64 79c8138d csc!run_main+0x1c [d:\cxc\sscli\pal\win32\crtstartup.c @ 49]
0006ff88 0040e70b rotor_pal!PAL_LocalFrame+0x3d [d:\cxc\sscli\pal\win32\exception.c @ 600]
0006ffc0 7c816d4f csc!mainCRTStartup+0x7b [d:\cxc\sscli\pal\win32\crtstartup.c @ 66]
0006fff0 00000000 kernel32!BaseProcessStart+0x23
And this is the trace for the second one:
0006eed4 5321849c cscomp!CSourceModule::ParseInteriorNode+0xbc [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\srcmod.cpp @ 4576]
0006ef7c 531e163b cscomp!CSourceModule::GetInteriorNode+0x32c [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\srcmod.cpp @ 3863]
0006ef98 531e14fb cscomp!CInteriorTree::Initialize+0x4b [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\inttree.cpp @ 80]
0006efc8 53217fe4 cscomp!CInteriorTree::CreateInstance+0x6b [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\inttree.cpp @ 61]
0006f068 5320c28a cscomp!CSourceModule::GetInteriorParseTree+0xd4 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\srcmod.cpp @ 3728]
0006f07c 5316d1e8 cscomp!CSourceData::GetInteriorParseTree+0x1a [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\srcdata.cpp @ 187]
0006f1f0 5316c44a cscomp!CLSDREC::compileMethod+0x6c8 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\clsdrec.cpp @ 6832]
0006f220 5316b728 cscomp!CLSDREC::CompileMember+0x9a [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\clsdrec.cpp @ 6569]
0006f268 5316c7b2 cscomp!CLSDREC::EnumMembersInEmitOrder+0x228 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\clsdrec.cpp @ 6327]
0006f2f0 5316c084 cscomp!CLSDREC::compileAggregate+0x252 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\clsdrec.cpp @ 6657]
0006f324 53177f1b cscomp!CLSDREC::compileNamespace+0xc4 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\clsdrec.cpp @ 6523]
0006f4d0 53173987 cscomp!COMPILER::CompileAll+0x9db [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\compiler.cpp @ 1922]
0006f65c 5317d0dd cscomp!COMPILER::Compile+0xd7 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\compiler.cpp @ 704]
0006fe48 0040db29 cscomp!CController::Compile+0xdd [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\controller.cpp @ 225]
0006ff50 0040e7ac csc!main+0x349 [d:\cxc\sscli\clr\src\csharp\csharp\scc\scc.cpp @ 2270]
0006ff64 79c8138d csc!run_main+0x1c [d:\cxc\sscli\pal\win32\crtstartup.c @ 49]
0006ff88 0040e70b rotor_pal!PAL_LocalFrame+0x3d [d:\cxc\sscli\pal\win32\exception.c @ 600]
0006ffc0 7c816d4f csc!mainCRTStartup+0x7b [d:\cxc\sscli\pal\win32\crtstartup.c @ 66]
0006fff0 00000000 kernel32!BaseProcessStart+0x23
As you can see, the lowest common ancestor is COMPILER::CompileAll. Then the traces diverge: the first one calls DeclareTypes, the second one compileNamespace. I started to guess what was happening, but I wanted to see it clearly.
So I examined the two functions and the functions called by them. The ParseMethod, ParseAccessor, ParseConstructor functions parse the declaration of classes, functions and properties. But then, when they reach the open curly brace, they stop. Or, better, they call a SkipBlock function that searches the body of a class/method/etc for other declaration (inner classes, for example) but skips the statements. They record the start of the method (the position af the first curly) and return.
Digging into ParseBlock, I saw that it will in turn call the function responsible to generate the intermediate code, a parse tree made of STATEMENTNODEs
STATEMENTNODE **ppNext = &pBlock->pStatements;
while (CurToken() != TID_ENDFILE)
{
if ((iClose != -1 && Mark() >= iClose) ||
(iClose == -1 &&
(CurToken() == TID_CLOSECURLY ||
CurToken() == TID_FINALLY ||
CurToken() == TID_CATCH)))
break;
if (CurToken() == TID_CLOSECURLY) {
Eat (TID_OPENCURLY);
NextToken();
continue;
}
*ppNext = ParseStatement (pBlock);
ppNext = &(*ppNext)->pNext;
ASSERT (*ppNext == NULL);
}
The ParseStatement function will drive the parsing calling every time the right function among
CallStatementParser
ParseLabeledStatement
ParseDeclarationStatement
ParseExpressionStatement
So this second phase will finally parse the statements. The pertinent code is in the ParseInteriorNode function:
m_pParser->Rewind (pMethod->iCond);
pMethod->pBody = (BLOCKNODE *)m_pParser->ParseBlock (pMethod);
Now, we can finally see what happens and why a C# compiler works without forward declarations or without the separation of declaration and implementation: it is a two-phase parser. The first phase collects informations about symbols (classes and function names) and builds the symbol table.
The second phase uses the symbol table (already complete) for name resolution, than builds the complete parse tree.