Calculator Evaluator

🚧 This is a work in progress, so it may not be complete or finalised.

We are going to create a small project that can lex, parse and evaluate an expression for a basic calculator. We will try to keep this nice and easy, so we are going to merge a few things together for simplicity. You can see the full example at the end.

Getting Started

For this example, we can just use a single script, which I am going to call calculator.c3. If you would rather use a project, you can run c3c init calculator, move into the directory and open src/main.c3 in your editor.

To begin, we will import everything the project requires up front, so we don't need to worry about it later. We will also write our main function:

module project::calculator;

import std::io;
import std::core::string;
import std::core::mem;
import std::core::mem::allocator;
import std::collections::range;

def CharRange = range::Range(<char>);
CharRange range = { '0', '9' };

fn void main() {
    String expression = "2 + 3 * 4 / 5";
    float! result = calculate(expression);
    if (catch excuse = result) {
        io::printfn("Failed with error: %s", excuse);
        return;
    }
    io::printfn("Result = %s", result);
}

We import all the modules the project requires: allocators, string functions, io for printing and the range type, which makes lexing numbers a little simpler. In our main function, we create an expression variable, which holds the code our evaluator will run. Next we pass it to the calculate function, which returns an optional float, as the calculation can fail. We handle the error case, then print the result if all goes well. This will make more sense once we get to implementing the code.
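
As a quick aside, if we didn't care why the calculation failed, C3's ?? operator lets us fall back to a default value instead of inspecting the excuse. This is just an alternative sketch, not what we do in the rest of the tutorial:

// Hypothetical alternative: collapse any fault into a default of 0.
float result = calculate(expression) ?? 0;
io::printfn("Result = %s", result);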

I've opted to write my own lexer and parser, as using a library would be overkill. Writing your own lexer and parser is also fun and can be quite simple.

Tokens

Tokens are small structures used to annotate pieces of code. In a programming language, these can be things like keywords, operators, identifiers and so on. Since we're making a calculator, this structure will be pretty simple, as we only have operators and numbers.

enum TokenKind : char {
    NUMBER,
    PLUS,
    MINUS,
    STAR,
    SLASH,
    EOF,
}

struct Token {
    TokenKind kind;
    String lexeme;
}

Our TokenKind is a simple enum, which denotes what the token represents. We also include EOF as a way to know we're done. The Token itself is also quite simple: it holds the kind and the lexeme. The lexeme is a slice of the source that we will use to parse the float for our numbers. We could also use it to print nicer error messages if we wanted to.
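
To make this concrete, here is roughly what the lexer we are about to write will produce for a small input (purely illustrative):

// Lexing "3.5 * 2" yields, one token at a time:
// { NUMBER, "3.5" }, { STAR, "*" }, { NUMBER, "2" }, { EOF, "" }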

The "Lexer"

Our lexer is the system that takes our source and converts it into a stream of tokens. This is the first place we merge types together: the same struct will also act as the parser. For a small example like our calculator this is fine, but you may want separate lexer and parser types for more serious projects. Here is how we will define the lexer/parser combo:

struct Parser {
    String source;
    usz ip;
    Token current;
}

Quite a small structure too: it holds our source, the ip for keeping track of where we are in the source, and the current Token, which we will use later for parsing. Let's create a small helper function to create a parser for us:

fn Parser new_parser(String source) {
    return { source, 0, { EOF, "" } };
}

We simply pass the source in and zero out the rest of the fields. We put a dummy token in the current field, as we will set it properly later. Before we write the lexing function, we will set up a few methods to make lexing easier.

fn char Parser.peek(&self) @inline {
    if (self.at_end()) {
        return '\0';
    }
    return self.source[self.ip];
}

fn void Parser.advance(&self) @inline  {
    self.ip++;
}

fn bool Parser.at_end(&self) @inline  {
    return self.ip >= self.source.len;
}

fn void Parser.skip_whitespace(&self) {
    // Basic whitespace
    while (!self.at_end() && self.peek() == ' ') {
        self.advance();
    }
}

These are some pretty simple methods: they let us peek at the current char, advance, check whether we're at the end of the source, and skip whitespace.

In a more complex lexer, we would account for tabs, newlines, breaks and so on in our skip_whitespace. We might also have a variable-distance peek to look ahead, rather than only viewing the current char.
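
As a rough sketch of what that could look like (these helper names are made up for the sketch and aren't used anywhere else in this tutorial):

// Hypothetical: skip all common whitespace, not just spaces.
fn void Parser.skip_all_whitespace(&self) {
    while (!self.at_end()) {
        char c = self.peek();
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
            self.advance();
        } else {
            return;
        }
    }
}

// Hypothetical: peek an arbitrary distance ahead instead of only
// looking at the current character.
fn char Parser.peek_at(&self, usz distance) @inline {
    if (self.ip + distance >= self.source.len) {
        return '\0';
    }
    return self.source[self.ip + distance];
}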

Now for the meat of our lexer, the function that will generate a token. For our calculator, we can write this pretty easily.

fn Token! Parser.get_token(&self) {
    self.skip_whitespace();
    if (self.at_end()) {
        return { EOF, "" };
    }

    switch (self.source[self.ip]) {
        // Operators
        case '+':
            self.advance();
            return { PLUS, "+" };

        case '-':
            self.advance();
            return { MINUS, "-" };

        case '*':
            self.advance();
            return { STAR, "*" };

        case '/':
            self.advance();
            return { SLASH, "/" };

        // Numbers
        case '0'..'9': return self.make_number()!;

        // Unknown character found
        default: return EvalErr.BAD_SYNTAX?;
    }
}

This skips whitespace and, if we're at the end, generates an EOF token. Otherwise we check the current char to see whether it's an operator or a number; anything else returns a BAD_SYNTAX fault. You may also have noticed the make_number method, which we will implement shortly, but first let us create the fault.

fault EvalErr {
    BAD_SYNTAX,
}

Nothing too crazy for our fault here. We will also reuse it in our parser when we get an unexpected symbol. Back to the lexer! We can now look at lexing a number:

fn Token! Parser.make_number(&self) {
    usz start = self.ip;
    self.advance(); // Increment as we know the first character

    // Consume numbers
    while (!self.at_end() && range.contains(self.peek())) {
        self.advance();
    }

    if (self.peek() == '.') {
        // Decimal number
        self.advance();

        // Consume numbers again
        while (!self.at_end() && range.contains(self.peek())) {
            self.advance();
        }
    }

    return { NUMBER, (String)self.source[start..self.ip-1] };
}

Almost the same size as our get_token method itself! We capture the current index of our source and then advance, as we have already checked the first character. We then keep advancing as long as the current char is a digit. Next we check for a . to support basic decimal numbers, and advance again over the trailing digits. After that, we slice our source to capture the number, using the start variable as our anchor.
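
If you want to sanity-check the lexer on its own before moving on, a throwaway helper like the one below works. It is not part of the final program, just a debugging sketch built from the pieces we already have:

// Print every token in the source until we hit EOF.
fn void! dump_tokens(String source) {
    Parser parser = new_parser(source);
    while (true) {
        Token token = parser.get_token()!;
        io::printfn("%s '%s'", token.kind, token.lexeme);
        if (token.kind == EOF) {
            break;
        }
    }
}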

That's the end of our lexing methods and now we head over to parsing.

Parsing Expressions

This is where the fun begins! Parsing is the step where our syntax matters. For a programming language, this is where we might collect variables, functions, structures and so on into an AST. We will still do this for our calculator, but our AST will be pretty simple. All we need for the calculator is a binary op and a literal.

As an exercise, you can also implement unary expressions, like negative numbers.

Since our AST is simple, we can define it in a few lines of code.

enum NodeType : char {
    BINARY,
    LITERAL,
}

struct BinaryOp {
    Token op;
    Node *lhs;
    Node *rhs;
}

struct Node {
    NodeType type;
    union data {
        BinaryOp binary;
        float number;
    }
}

We create an enum so we can tag our Node to know what it contains. Our BinaryOp is a node that represents something like 1 + 2. Then we have our number, which is just a float that we will collect from our token.
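
To see how the pieces fit together, here is the tree for 1 + 2 built by hand. This is purely illustrative (local variables inside some function); the parser will build these for us with mem::new:

Node one = { NodeType.LITERAL, { .number = 1.0f } };
Node two = { NodeType.LITERAL, { .number = 2.0f } };
// The BinaryOp holds the operator token plus pointers to both operands.
Node sum = { NodeType.BINARY, { .binary = { { PLUS, "+" }, &one, &two } } };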

As before with our lexer, we will create a helper method for consuming tokens:

fn Token! Parser.consume(&self, TokenKind kind) {
    if (self.current.kind == kind) {
        Token current = self.current;
        self.current = self.get_token()!;
        return current;
    }

    // Basic error
    io::printfn("Error: Expected '%s' but received '%s'", kind, self.current.kind);
    return EvalErr.BAD_SYNTAX?;
}

If the current token in our parser matches the kind provided, we consume it: fetch the next token and return the old one, so the caller can capture it and use it later. If the token doesn't match, we print a basic error message and return a fault, the same one used before in get_token.

This will be a large section of code, but it will implement the entire parsing of expressions.

fn Node*! Parser.parse(&self) {
    return self.term()!;
}

fn Node*! Parser.term(&self) {
    Node *node = self.factor()!;

    while (self.current.kind == PLUS || self.current.kind == MINUS) {
        Token op = self.consume(self.current.kind)!;
        node = mem::new(Node, {
            NodeType.BINARY,
            {
                .binary = {
                    op,
                    node,
                    self.factor()!,
                },
            },
        });
    }

    return node;
}

fn Node*! Parser.factor(&self) {
    Node *node = self.primary()!;

    while (self.current.kind == STAR || self.current.kind == SLASH) {
        Token op = self.consume(self.current.kind)!;
        node = mem::new(Node, {
            NodeType.BINARY,
            {
                .binary = {
                    op,
                    node,
                    self.primary()!,
                },
            },
        });
    }

    return node;
}

fn Node*! Parser.primary(&self) {
    // We only have one value, so we make sure it's a number and return
    if (self.current.kind != NUMBER) {
        io::printfn("Error: Expected number but received '%s'", self.current.kind);
        return EvalErr.BAD_SYNTAX?;
    }

    Token t = self.consume(NUMBER)!;

    return mem::new(Node, {
        NodeType.LITERAL,
        {
            .number = t.lexeme.to_float()!!,
        },
    });
}

We are using a recursive descent parser, which (roughly speaking) means each function calls the next one down, recursing back into the chain as needed. We start in parse, which calls term, then factor, then primary, and work our way back up the call stack. In both term and factor we have similar code: check for the operators at that level, then wrap the node in a BinaryOp. Notice how we pass node in as the left-hand side, then call the next function down for the right-hand side, e.g. term calls factor. This layering is what gives us operator precedence, so we can evaluate our expression correctly.
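
For our expression 2 + 3 * 4 / 5, that layering produces a tree roughly like this, with the multiplication and division grouped below the addition (which is why the result later comes out as 4.4):

        (+)
       /   \
      2     (/)
           /   \
         (*)    5
        /   \
       3     4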

That is now the end of the parser. A simple helper function for consuming tokens, then the parsing functions.

Evaluating our AST

To start our evaluation, we will implement the calculate function we saw in main. This is our entry point, handling both parsing and evaluating.

fn float! calculate(String source) {
    DynamicArenaAllocator dynamic_arena;
    defer dynamic_arena.free();
    
    dynamic_arena.init(1024, allocator::heap());

    // Create our parser
    Parser parser = new_parser(source);
    // Get initial token
    parser.current = parser.get_token()!;

    mem::@scoped(&dynamic_arena) {
        // Parse and evaluate
        Node *root = parser.parse()!;
        return evaluate(root);
    };
}

To keep our memory handling simple, we use the DynamicArenaAllocator, which will expand as needed and free all its memory at the end. Notice the mem::@scoped(&dynamic_arena) block: this is where our mem::new calls end up allocating. Let's dive into evaluate:

fn float evaluate(Node *node) {
    switch (node.type) {
        case NodeType.BINARY: return evaluate_binary(node);
        case NodeType.LITERAL: return evaluate_literal(node);
        default: unreachable("UNKNOWN NODE");
    }
}

This function is a pretty safe dispatcher: hitting the unreachable means we probably haven't implemented something yet. Our calculator only has two nodes, so our cases are covered here. Something to note is that every evaluate_* function returns a float. Since this is our main value type in the calculator, we keep things simple.

In a scripting language using tree-walk evaluation (which is what this implementation is), we might use something like a Value type to express more complex types like strings, bools and functions.
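
A rough sketch of what such a Value type could look like (entirely hypothetical, and not needed for this calculator):

enum ValueKind : char {
    NUMBER,
    BOOL,
    STRING,
}

struct Value {
    ValueKind kind;
    // Tagged union: kind tells us which field is active.
    union data {
        float number;
        bool boolean;
        String str;
    }
}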

We can take a look at the simplest function: evaluate_literal

fn float evaluate_literal(Node *node) @inline
    => node.data.number;

Since our AST node for the literal holds the value, we can simply return the number. Next is the evaluate_binary function, which does the work for our operators:

fn float evaluate_binary(Node *node) {
    BinaryOp *bin = &node.data.binary;
    switch (bin.op.kind) {
        case PLUS:  return evaluate(bin.lhs) + evaluate(bin.rhs);
        case MINUS: return evaluate(bin.lhs) - evaluate(bin.rhs);
        case STAR:  return evaluate(bin.lhs) * evaluate(bin.rhs);
        case SLASH: return evaluate(bin.lhs) / evaluate(bin.rhs);
        default: unreachable("UNKNOWN OPERATOR");
    }
}

For sanity reasons, we take a reference to the binary field so we can just use a variable throughout. We switch on the operator's token kind, then return the evaluation of lhs <op> rhs. If we were to add another operator without handling it here, the unreachable would trigger to let us know at run-time.

And that's it! We've implemented a basic parser and evaluator for a calculator! Below is the result of running our code and the full source in case something is broken. If you want to try more examples, change the expression variable to something else and see what happens.

If you want to do more, try to implement negative numbers and the % operator. Languages are cool and if you thought this was fun, take a look at Crafting Interpreters by Bob Nystrom.

Result

Running the project:

$ c3c run

Using the compiler directly:

$ c3c compile-run --run-once calculator.c3

Output:

Program linked to executable 'calculator'.
Launching ./calculator
Result = 4.400000
Program completed with exit code 0.

Final Code

module project::calculator;

import std::io;
import std::core::string;
import std::core::mem;
import std::core::mem::allocator;
import std::collections::range;

def CharRange = range::Range(<char>);
CharRange range = { '0', '9' };

fn void main() {
    String expression = "2 + 3 * 4 / 5";
    float! result = calculate(expression);
    if (catch excuse = result) {
        io::printfn("Failed with error: %s", excuse);
        return;
    }
    io::printfn("Result = %s", result);
}

fault EvalErr {
    BAD_SYNTAX,
}

struct Parser {
    String source;
    usz ip;
    Token current;
}

enum TokenKind : char {
    NUMBER,
    PLUS,
    MINUS,
    STAR,
    SLASH,
    EOF,
}

struct Token {
    TokenKind kind;
    String lexeme;
}

enum NodeType : char {
    BINARY,
    LITERAL,
}

struct BinaryOp {
    Token op;
    Node *lhs;
    Node *rhs;
}

struct Node {
    NodeType type;
    union data {
        BinaryOp binary;
        float number;
    }
}

fn Parser new_parser(String source) {
    return { source, 0, { EOF, "" } };
}

fn char Parser.peek(&self) @inline {
    if (self.at_end()) {
        return '\0';
    }
    return self.source[self.ip];
}

fn void Parser.advance(&self) @inline  {
    self.ip++;
}

fn bool Parser.at_end(&self) @inline  {
    return self.ip >= self.source.len;
}

fn void Parser.skip_whitespace(&self) {
    // Basic whitespace
    while (!self.at_end() && self.peek() == ' ') {
        self.advance();
    }
}

fn Token! Parser.get_token(&self) {
    self.skip_whitespace();
    if (self.at_end()) {
        return { EOF, "" };
    }

    switch (self.source[self.ip]) {
        // Operators
        case '+':
            self.advance();
            return { PLUS, "+" };

        case '-':
            self.advance();
            return { MINUS, "-" };

        case '*':
            self.advance();
            return { STAR, "*" };

        case '/':
            self.advance();
            return { SLASH, "/" };

        // Numbers
        case '0'..'9': return self.make_number()!;

        // Unknown character found
        default: return EvalErr.BAD_SYNTAX?;
    }
}

fn Token! Parser.make_number(&self) {
    usz start = self.ip;
    self.advance(); // Increment as we know the first character

    // Consume numbers
    while (!self.at_end() && range.contains(self.peek())) {
        self.advance();
    }

    if (self.peek() == '.') {
        // Decimal number
        self.advance();

        // Consume numbers again
        while (!self.at_end() && range.contains(self.peek())) {
            self.advance();
        }
    }

    return { NUMBER, (String)self.source[start..self.ip-1] };
}

fn Token! Parser.consume(&self, TokenKind kind) {
    if (self.current.kind == kind) {
        Token current = self.current;
        self.current = self.get_token()!;
        return current;
    }

    // Basic error
    io::printfn("Error: Expected '%s' but received '%s'", kind, self.current.kind);
    return EvalErr.BAD_SYNTAX?;
}

fn Node*! Parser.parse(&self) {
    return self.term()!;
}

fn Node*! Parser.term(&self) {
    Node *node = self.factor()!;

    while (self.current.kind == PLUS || self.current.kind == MINUS) {
        Token op = self.consume(self.current.kind)!;
        node = mem::new(Node, {
            NodeType.BINARY,
            {
                .binary = {
                    op,
                    node,
                    self.factor()!,
                },
            },
        });
    }

    return node;
}

fn Node*! Parser.factor(&self) {
    Node *node = self.primary()!;

    while (self.current.kind == STAR || self.current.kind == SLASH) {
        Token op = self.consume(self.current.kind)!;
        node = mem::new(Node, {
            NodeType.BINARY,
            {
                .binary = {
                    op,
                    node,
                    self.primary()!,
                },
            },
        });
    }

    return node;
}

fn Node*! Parser.primary(&self) {
    // We only have one value, so we make sure it's a number and return
    if (self.current.kind != NUMBER) {
        io::printfn("Error: Expected number but received '%s'", self.current.kind);
        return EvalErr.BAD_SYNTAX?;
    }

    Token t = self.consume(NUMBER)!;

    return mem::new(Node, {
        NodeType.LITERAL,
        {
            .number = t.lexeme.to_float()!!,
        },
    });
}

fn float evaluate(Node *node) {
    switch (node.type) {
        case NodeType.BINARY: return evaluate_binary(node);
        case NodeType.LITERAL: return evaluate_literal(node);
        default: unreachable("UNKNOWN NODE");
    }
}

fn float evaluate_binary(Node *node) {
    BinaryOp *bin = &node.data.binary;
    switch (bin.op.kind) {
        case PLUS:  return evaluate(bin.lhs) + evaluate(bin.rhs);
        case MINUS: return evaluate(bin.lhs) - evaluate(bin.rhs);
        case STAR:  return evaluate(bin.lhs) * evaluate(bin.rhs);
        case SLASH: return evaluate(bin.lhs) / evaluate(bin.rhs);
        default: unreachable("UNKNOWN OPERATOR");
    }
}

fn float evaluate_literal(Node *node) @inline
    => node.data.number;

fn float! calculate(String source) {
    DynamicArenaAllocator dynamic_arena;
    defer dynamic_arena.free();
    
    dynamic_arena.init(1024, allocator::heap());

    // Create our parser
    Parser parser = new_parser(source);
    // Get initial token
    parser.current = parser.get_token()!;

    mem::@scoped(&dynamic_arena) {
        // Parse and evaluate
        Node *root = parser.parse()!;
        return evaluate(root);
    };
}