Friday, December 18, 2015

[IDL 2 CPP] Let Bison/Flex Generate C++ Parser

C++ may not be the most modern language, but plain C is even more frustrating. I'm developing a translator that converts IDL (Interactive Data Language) to C++, and I had had enough of ancient C because I need a variety of complex data structures to hold all the information required during translation. So letting Bison/Flex generate a C++ parser is clearly a good idea. Enabling C++ in Bison/Flex is hard, though. The documentation doesn't explain how to build a complete C++ parser, and once you turn on the C++ option, a flood of compile errors is waiting for you. I read a great many articles along the way and want to thank all of their authors. Here I'll share the whole cookbook for making Bison/Flex generate a C++ parser.

Download Win Flex Bison
I tried two Windows ports of Bison/Flex. This is the one that handles C++ parsers well.
http://sourceforge.net/projects/winflexbison/

Options for Bison
These two options tell Bison to generate a C++ parser. The skeleton file lalr1.cc ships with the downloaded win_flex_bison package.

%language "c++"
%skeleton "lalr1.cc"

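Set the Parser's Class Name
The following option gives the parser class a name. Newer Bison releases (3.3 and later, as far as I know) deprecate this spelling in favor of %define api.parser.class {idl_parser}.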

%define parser_class_name "idl_parser"

The generated parser class then begins like this:

  /// A Bison parser.
  class idl_parser
  {

Pass Essential Arguments to the Parser
You have to pass the parser a scanner that provides yylex, so the parser can call it and obtain one token at a time. "translator" can hold whatever shared data your translation needs. "arg_yyin" is the input source file, and "outputStream" is the output file.

%parse-param {yy::IdlTranslator& translator}
%parse-param {yy::IdlScanner& scanner}
%parse-param {std::ifstream* arg_yyin}
%parse-param {std::ofstream* outputStream}

The generated parser constructor looks like this:

idl_parser (yy::IdlTranslator& translator_yyarg, yy::IdlScanner& scanner_yyarg, std::ifstream* arg_yyin_yyarg, std::ofstream* outputStream_yyarg);

The parser stores every argument passed in as a member variable:

    /* User arguments.  */
    yy::IdlTranslator& translator;
    yy::IdlScanner& scanner;
    std::ifstream* arg_yyin;
    std::ofstream* outputStream;

Tell the Parser the Prototype of yylex
"yylval" carries the token's semantic value (for example, its text) from the scanner back to the parser. "yylloc" reports the token's location in the source file. It is enabled by %locations; without %locations there is no yylloc.

The following macro is usually defined in the scanner's header file, which is included by both the Bison .y file and the Flex .l file.

#ifndef YY_DECL
# define YY_DECL                                        \
    yy::idl_parser::token_type                         \
    yy::IdlScanner::lex(yy::idl_parser::semantic_type* yylval, yy::idl_parser::location_type *yylloc)
#endif

Then, in your grammar file (.y), define the yylex macro:

#define yylex scanner.lex

The generated parser calls the lexer like this:

YYCDEBUG << "Reading a token: ";
yychar = yylex (&yylval, &yylloc);

Because of the yylex macro, the parser actually calls scanner.lex().
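To see why a one-line macro is enough, here is a minimal, self-contained sketch of the redirection. MockScanner, MockParser, SemanticType, LocationType and the token values are hypothetical stand-ins I made up for illustration; the real types come from the Bison-generated header.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the Bison-generated types in idl.tab.hh.
struct SemanticType { std::string text; };
struct LocationType { int line = 1; };
enum TokenType { TOKEN_END = 0, IDENTIFIER = 258 };

// Mock scanner exposing the same lex() shape as yy::IdlScanner::lex.
class MockScanner {
public:
    TokenType lex(SemanticType* yylval, LocationType* yylloc) {
        assert(yylval != nullptr && yylloc != nullptr);
        if (calls_++ > 0) return TOKEN_END;  // second call: end of input
        yylval->text = "print";              // semantic value of the token
        yylloc->line = 3;                    // where the token was found
        return IDENTIFIER;
    }
private:
    int calls_ = 0;
};

// Same trick as in the grammar file: the preprocessor rewrites every
// "yylex(...)" below into "scanner.lex(...)".
#define yylex scanner.lex

class MockParser {
public:
    explicit MockParser(MockScanner& scanner_yyarg) : scanner(scanner_yyarg) {}

    // Counts the tokens delivered before TOKEN_END and remembers the last one.
    int parse() {
        int count = 0;
        SemanticType yylval;
        LocationType yylloc;
        while (yylex(&yylval, &yylloc) != TOKEN_END) {
            ++count;
            last_text = yylval.text;
            last_line = yylloc.line;
        }
        return count;
    }

    std::string last_text;
    int last_line = 0;

private:
    MockScanner& scanner;  // the member that the macro refers to
};

#undef yylex
```

The macro only works because the parser has a member that is literally named scanner, which is exactly what %parse-param {yy::IdlScanner& scanner} gives you.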

Implementing the Scanner
Pay attention to the first part of the following code. It redefines yyFlexLexer, so the class yyFlexLexer in "FlexLexer.h" becomes IdlFlexLexer. This is very important because it avoids many name conflicts, though it is admittedly an ugly technique.

The following is part of the scanner's header file.

#ifndef __FLEX_LEXER_H
#define yyFlexLexer IdlFlexLexer
#include "FlexLexer.h"
#undef yyFlexLexer
#endif

#include "idl.tab.hh"


namespace yy
{
    class IdlScanner : public IdlFlexLexer
    {
    public:
        IdlScanner(std::ifstream* arg_yyin);

        virtual ~IdlScanner();

        virtual idl_parser::token_type lex(
            idl_parser::semantic_type* yylval,
            yy::idl_parser::location_type *yylloc
            );
    };
}
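The renaming trick can be tried out without Flex at all. The sketch below is my own reduced imitation: FAKE_FLEXLEXER_H is a hypothetical stand-in for the contents of FlexLexer.h, and CfgFlexLexer and first_word are invented names for illustration. It shows that the same class declaration can be emitted twice under different names without a redefinition error.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical, heavily reduced stand-in for FlexLexer.h. Wrapping the
// declaration in a macro lets this single file imitate including the
// header more than once.
#define FAKE_FLEXLEXER_H                                          \
    class yyFlexLexer {                                           \
    public:                                                       \
        explicit yyFlexLexer(std::istream* in) : yyin(in) {}      \
        virtual ~yyFlexLexer() {}                                 \
    protected:                                                    \
        std::istream* yyin;                                       \
    };

// First "include": the class is emitted under the name IdlFlexLexer.
#define yyFlexLexer IdlFlexLexer
FAKE_FLEXLEXER_H
#undef yyFlexLexer

// Second "include" for an imaginary second grammar: no clash, because
// this time the class is called CfgFlexLexer.
#define yyFlexLexer CfgFlexLexer
FAKE_FLEXLEXER_H
#undef yyFlexLexer

class IdlScanner : public IdlFlexLexer {
public:
    explicit IdlScanner(std::istream* in) : IdlFlexLexer(in) {}

    // Reads one whitespace-delimited word; empty string at end of input.
    std::string next_word() {
        assert(yyin != nullptr);
        std::string word;
        *yyin >> word;
        return word;
    }
};

// Usage: scan the first word of an in-memory string.
std::string first_word(const std::string& text) {
    std::istringstream src(text);
    IdlScanner scanner(&src);
    return scanner.next_word();
}
```

The price is that the class name depends on preprocessor state at the point of inclusion, which is why the #undef right after the #include matters.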

Have a Look at the Implementation of IdlScanner::lex()
The file lex.???.cc, generated by Flex, is the lexer implementation. There, the YY_DECL macro defined earlier serves as the function header of the scanning routine:

YY_DECL
{
register yy_state_type yy_current_state;
register char *yy_cp, *yy_bp;
register int yy_act;
...
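The mechanism is plain macro substitution: the macro expands to a function header, and the braces that follow become the body of that function. Here is a minimal sketch of the same idea, where DemoScanner and DEMO_DECL are hypothetical names of mine and the hand-written body stands in for the code Flex would generate.

```cpp
#include <cassert>
#include <string>

class DemoScanner {
public:
    explicit DemoScanner(const std::string& text) : text_(text), pos_(0) {}

    // Declared here; the body is supplied through the macro below.
    int lex(std::string* yylval);

private:
    std::string text_;
    std::string::size_type pos_;
};

// Hypothetical counterpart of YY_DECL: nothing more than the function
// header, so "DEMO_DECL { ... }" defines DemoScanner::lex.
#define DEMO_DECL int DemoScanner::lex(std::string* yylval)

DEMO_DECL
{
    assert(yylval != nullptr);
    // Skip blanks before the next word.
    while (pos_ < text_.size() && text_[pos_] == ' ') ++pos_;
    if (pos_ == text_.size()) return 0;   // 0 = end of input
    std::string::size_type start = pos_;
    while (pos_ < text_.size() && text_[pos_] != ' ') ++pos_;
    *yylval = text_.substr(start, pos_ - start);
    return 1;                             // 1 = a word token
}
```

Because the body is compiled as a member function, it can use the scanner's members directly, just as the generated code uses the members of the lexer class.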

How the Input File Is Passed into the Scanner
The scanner class derives from yyFlexLexer, which is declared in FlexLexer.h. The input file "std::ifstream* in" is passed to the parent class through its constructor. Remember that yyFlexLexer was renamed to IdlFlexLexer.

    IdlScanner::IdlScanner(std::ifstream* in)
        : IdlFlexLexer(in)
    {
    }

The yyFlexLexer constructor is implemented in lex.???.cc, which Flex generates. The input file arg_yyin is stored in yyin, a member variable of yyFlexLexer. In the C version of the lexer, yyin was a global variable. If yyin is NULL, the lexer reads from stdin.

yyFlexLexer::yyFlexLexer( std::istream* arg_yyin, std::ostream* arg_yyout )
{
yyin = arg_yyin;
yyout = arg_yyout;
yy_c_buf_p = 0;
yy_init = 0;
yy_start = 0;
yy_flex_debug = 0;
yylineno = 1; // this will only get updated if %option yylineno
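The stream handling described above can be sketched in a few lines. MiniLexer, input(), count_lines() and count_lines_in() are hypothetical names of mine, not part of Flex; the sketch only imitates the behavior of storing the pointer and falling back to standard input when it is null.

```cpp
#include <cassert>
#include <iostream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical miniature of the stream handling in a Flex C++ lexer.
class MiniLexer {
public:
    // Mirrors the shape of the constructor above: just remember the pointer.
    explicit MiniLexer(std::istream* arg_yyin = nullptr) : yyin(arg_yyin) {}

    // Resolve the stream on first use; a null yyin means "read stdin".
    std::istream& input() {
        if (yyin == nullptr) yyin = &std::cin;
        assert(yyin != nullptr);  // from here on there is always a stream
        return *yyin;
    }

    // Counts the lines remaining in the input.
    int count_lines() {
        int n = 0;
        std::string line;
        while (std::getline(input(), line)) ++n;
        return n;
    }

private:
    std::istream* yyin;
};

// Usage: feed the lexer from an in-memory string instead of a file.
int count_lines_in(const std::string& text) {
    std::istringstream src(text);
    MiniLexer lexer(&src);
    return lexer.count_lines();
}
```

Since the member is a std::istream*, anything derived from std::istream works as input, which makes it easy to test a scanner against strings.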

Invoke the Parser
First create the scanner with the input file, and then pass the scanner to the parser. When you call parser.parse(), the parser will call scanner.lex() to get tokens one by one.

        std::ifstream *inputFile = new std::ifstream();
        inputFile->open("some source file here");

        std::ofstream *outputFile = new std::ofstream();
        outputFile->open("some output file here");

        IdlScanner scanner(inputFile);

        idl_parser parser(*this, scanner, inputFile, outputFile);

        int parse_ret = parser.parse();
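A variation I would consider is keeping the streams on the stack and checking that they opened, so nothing has to be deleted by hand. The sketch below uses hypothetical stand-ins (MockScanner, MockParser, translate) instead of the generated classes, and the mock parser simply copies words to the output. Like a Bison parser, its parse() returns 0 on success.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical stand-in for yy::IdlScanner: hands out whitespace-separated words.
class MockScanner {
public:
    explicit MockScanner(std::ifstream* in) : in_(in) { assert(in_ != nullptr); }
    bool lex(std::string* yylval) { return static_cast<bool>(*in_ >> *yylval); }
private:
    std::ifstream* in_;
};

// Hypothetical stand-in for yy::idl_parser: writes one word per line.
class MockParser {
public:
    MockParser(MockScanner& scanner, std::ofstream* out)
        : scanner_(scanner), out_(out) {}

    // Returns 0 on success and 1 on failure.
    int parse() {
        std::string word;
        while (scanner_.lex(&word)) *out_ << word << '\n';
        return out_->good() ? 0 : 1;
    }
private:
    MockScanner& scanner_;
    std::ofstream* out_;
};

// Both streams live on the stack, so they are closed automatically
// when the function returns, whatever the outcome.
int translate(const std::string& in_path, const std::string& out_path) {
    std::ifstream input(in_path.c_str());
    if (!input.is_open()) return 1;       // input file missing or unreadable

    std::ofstream output(out_path.c_str());
    if (!output.is_open()) return 1;      // output file cannot be created

    MockScanner scanner(&input);
    MockParser parser(scanner, &output);
    return parser.parse();
}
```

The generated parser still receives plain pointers, so the %parse-param declarations do not need to change; only the ownership of the streams moves to the caller.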