The Vision
In this post we will:
- Set up a custom language which can
  - evaluate basic arithmetic expressions such as (5 + (5*2)/10)
  - support only number operands ( integer or decimal )
  - support only the [() * / - +] operators
- As part of it we will set up
  - a grammar for the language ( using ANTLR )
  - parsing of the language ( using ANTLR )
  - execution logic for the language ( using Go )
The Prep
- Java should be installed on the system ( for using ANTLR library )
- Go should be installed on the system
- Basic knowledge of Depth First Search
I will not be covering the installation of these as it varies from system to system.
I am using Oracle JDK 21 ( installed with sdkman ) and Go 1.22 on macOS.
To confirm that Java and Go are installed properly, run “java -version” and “go version” in your terminal.
The Plan
With a simple arithmetic expression as our input, we will pass it through three main components:
- Lexer: reads our arithmetic expression as raw text and cuts it up into identifiable tokens such as numbers and operators.
- Parser: takes in the tokens and constructs the logical structure of what is to be executed ( a tree, referred to in this post as the Abstract Syntax Tree ).
- Visitor: takes in that logical structure and executes it however we want.
Doing all this from scratch would take quite a bit of time. Thankfully we have ANTLR, a tool that generates the lexer and parser for us, given our language’s grammar in a specific format. So our work boils down to two things:
- Creating our grammar file - [Lang.g4] in the diagram
- Implementing a visitor that executes the logic - [LangVisitor.go] in the diagram
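To make the lexer stage more concrete, here is a minimal hand-written sketch of what "cutting text into tokens" means. It is not what ANTLR generates; the Token type and the tokenize function are hypothetical names used only for illustration, and numbers are matched far more loosely than in the grammar we write later.

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a hypothetical token type for this sketch; ANTLR generates its own.
type Token struct {
	Kind string // "Number", "Operator" or "Paren"
	Text string
}

// tokenize cuts an arithmetic expression into tokens, skipping spaces and tabs.
func tokenize(input string) ([]Token, error) {
	var tokens []Token
	for i := 0; i < len(input); {
		c := input[i]
		switch {
		case c == ' ' || c == '\t':
			i++ // whitespace carries no meaning here
		case strings.IndexByte("+-*/", c) >= 0:
			tokens = append(tokens, Token{Kind: "Operator", Text: string(c)})
			i++
		case c == '(' || c == ')':
			tokens = append(tokens, Token{Kind: "Paren", Text: string(c)})
			i++
		case c >= '0' && c <= '9':
			// Consume a run of digits and dots (a loose approximation of a number).
			start := i
			for i < len(input) && (input[i] >= '0' && input[i] <= '9' || input[i] == '.') {
				i++
			}
			tokens = append(tokens, Token{Kind: "Number", Text: input[start:i]})
		default:
			return nil, fmt.Errorf("unexpected character %q at position %d", c, i)
		}
	}
	return tokens, nil
}

func main() {
	tokens, err := tokenize("5 + (5*2)/10")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range tokens {
		fmt.Printf("%s(%s) ", t.Kind, t.Text)
	}
	fmt.Println()
}
```

Even this small version needs its own error handling and number rules, which is the kind of work the generated lexer takes off our hands.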
The Setup
Create your folder for the project
Inside root directory -> open terminal and type
go mod init customlang
Download ‘complete ANTLR java binaries jar’ from here ( 2mb ), rename it to ‘antlr.jar’
Inside root directory -> paste the antlr.jar file
Create ‘src’ directory in your root -> this will contain the logic for your project.
Inside src, create a file named ‘Lang.g4’ -> this will contain the grammar of your custom language
Your directory should look like this now:
antlr.jar
go.mod
src
|--Lang.g4
The Grammar
Our grammar will consist of lexer and parser logic in Lang.g4
We will first start with the lexer:
grammar Lang;
// Lexer
OpenParen: '(';
CloseParen: ')';
Plus: '+';
Minus: '-';
Multiply: '*';
Divide: '/';
// Ignore whitespaces
WhiteSpaces: [\t ]+ -> channel(HIDDEN);
// can be a decimal | integer
DecimalLiteral:
DecimalIntegerLiteral '.' [0-9]*
| DecimalIntegerLiteral;
// this looks odd, but it keeps numbers with leading zeros ( 0002, 0005 ) from matching as a single literal
fragment DecimalIntegerLiteral: '0' | [1-9] [0-9_]*;
The comments should help. The grammar uses the standard OR operator ‘|’ ( you can see its use in DecimalLiteral ). We also tell the lexer to ignore spaces and tabs by sending them to a hidden channel.
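To get a feel for which strings the DecimalLiteral rule accepts, the rule can be approximated with a Go regular expression. This is only a sketch: the regular expression is my own translation of the rule, isDecimalLiteral is a hypothetical helper, and the real lexer matches the longest token at each position rather than testing whole strings.

```go
package main

import (
	"fmt"
	"regexp"
)

// decimalLiteral approximates the DecimalLiteral lexer rule:
// an integer part without leading zeros, optionally followed by '.' and digits.
var decimalLiteral = regexp.MustCompile(`^(0|[1-9][0-9_]*)(\.[0-9]*)?$`)

// isDecimalLiteral reports whether s, taken as a whole, fits the rule.
func isDecimalLiteral(s string) bool {
	return decimalLiteral.MatchString(s)
}

func main() {
	for _, s := range []string{"0", "10", "5.5", "5.", "1_000", "0002", ".5"} {
		fmt.Printf("%-8q %v\n", s, isDecimalLiteral(s))
	}
}
```

Under this approximation "0002" and ".5" are rejected, while "5." and "1_000" are accepted, because the rule allows an empty fraction and underscores inside the integer part. The ANTLR lexer would most likely not reject "0002" outright but split it into several single-digit tokens, which the parser then fails to combine.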
Next, the tricky parser:
// Parser
unit: OpenParen bracketContent = unit CloseParen
| left = unit ( Multiply | Divide ) right = unit
| left = unit ( Plus | Minus ) right = unit
| base = DecimalLiteral
;
What we say here is that an expression can be (unit), unit/unit, unit+unit and so on, where a unit is either a decimal number or another expression. Thinking of it recursively helps.
The ordering follows BODMAS rules. ANTLR gives alternatives that appear earlier in a rule a higher precedence during parsing, so the multiply/divide alternative binds tighter than the plus/minus one. Operators of equal rank share a single alternative so that they group left to right: if Plus sat on its own line above Minus, 5 - 3 + 2 would be parsed as 5 - (3 + 2).
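A quick way to see why the relative ranking of + and - matters is to evaluate the same expression under two groupings. The sketch below is plain arithmetic and does not touch ANTLR; both function names are hypothetical.

```go
package main

import "fmt"

// leftToRight evaluates a - b + c as (a - b) + c, the conventional grouping
// when + and - have the same precedence.
func leftToRight(a, b, c float64) float64 {
	return (a - b) + c
}

// plusFirst evaluates a - b + c as a - (b + c), which is what happens
// when + is ranked above -.
func plusFirst(a, b, c float64) float64 {
	return a - (b + c)
}

func main() {
	fmt.Println(leftToRight(5, 3, 2), plusFirst(5, 3, 2)) // prints "4 0"
}
```

The two results differ, so it is worth testing an expression such as 5 - 3 + 2 against whatever grammar you end up with.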
To understand this better we can use the antlr visualizer for the expression: 5 + (5/10)*2
The diagram above shows the execution tree that our visitor is going to traverse ( ANTLR calls this a parse tree; it plays the role of the Abstract Syntax Tree here ).
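The shape of that tree can also be written out by hand. The sketch below builds a simplified version of the tree for 5 + (5/10)*2 and prints it with indentation. Node, num, op and render are hypothetical names, and the real ANTLR tree has extra nodes for the brackets that are left out here.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical stand-in for a tree node: either a number (leaf)
// or an operator with a left and a right child.
type Node struct {
	Op          string // empty for a number
	Value       float64
	Left, Right *Node
}

func num(v float64) *Node            { return &Node{Value: v} }
func op(o string, l, r *Node) *Node { return &Node{Op: o, Left: l, Right: r} }

// render returns the tree as indented text, one node per line.
func render(n *Node, depth int) string {
	indent := strings.Repeat("  ", depth)
	if n.Op == "" {
		return fmt.Sprintf("%s%v\n", indent, n.Value)
	}
	return indent + n.Op + "\n" + render(n.Left, depth+1) + render(n.Right, depth+1)
}

func main() {
	// 5 + (5/10)*2: the multiplication sits below the addition,
	// and the bracketed division sits below the multiplication.
	tree := op("+", num(5), op("*", op("/", num(5), num(10)), num(2)))
	fmt.Print(render(tree, 0))
}
```

Operations that must happen first end up deeper in the tree, which is why a depth first traversal evaluates them in the right order.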
The Build
Create a file named build.sh in the root directory.
This will be our script to generate the ANTLR files ( lexer, parser and base visitor ).
Paste the below into the file:
java -jar antlr.jar -Dlanguage=Go -Xexact-output-dir -o build -package BaseLang -visitor src/Lang.g4
The above line tells the system to run antlr.jar on the grammar file we created and generate the lexer, parser and visitor in Go, under the package name BaseLang.
Run the script ( ./build.sh in the terminal on macOS ).
If you face a “no permissions” error on Mac while running the script, run ‘chmod +x build.sh’ and try again.
You will now see a ‘build’ folder in your root directory containing the files generated by ANTLR.
Finally, run go mod tidy to install the dependencies that the generated files need.
The Execution
Inside the src folder, create a file named ‘lang.go’. This will contain the logic to traverse the tree and execute the code.
Paste the below contents into the file:
package lang

import (
	parser "customlang/build"
	"strconv"

	"github.com/antlr4-go/antlr/v4"
)

// LangVisitor embeds the generated base visitor and overrides VisitUnit.
type LangVisitor struct {
	parser.BaseLangVisitor
}

func (v *LangVisitor) Visit(tree antlr.ParseTree) interface{} {
	// traverse the tree
	return tree.Accept(v)
}

func (v *LangVisitor) VisitUnit(ctx *parser.UnitContext) interface{} {
	if ctx.OpenParen() != nil && ctx.CloseParen() != nil {
		// [brackets] case
		return ctx.GetBracketContent().Accept(v)
	} else if ctx.GetBase() != nil {
		// [plain number] case
		num, _ := strconv.ParseFloat(ctx.DecimalLiteral().GetText(), 64)
		return num
	} else {
		// [/, *, +, -] case
		// get left and right operands for the operation
		left := ctx.GetLeft().Accept(v).(float64)
		right := ctx.GetRight().Accept(v).(float64)
		if ctx.Divide() != nil {
			return left / right
		} else if ctx.Multiply() != nil {
			return left * right
		} else if ctx.Plus() != nil {
			return left + right
		} else if ctx.Minus() != nil {
			return left - right
		}
	}
	return nil
}
Here we embed the base visitor that ANTLR generated and build upon it. Any time .Accept(v) is called on a node, that node of the tree is visited.
GetLeft().Accept(v) and GetRight().Accept(v) call VisitUnit again on the child nodes and return the values needed for the arithmetic operation at hand ( in the style of depth first search ).
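The same depth first pattern can be tried without any generated code. In the sketch below, Expr is a hypothetical stand-in for the generated context type and eval plays the role of VisitUnit: it resolves the left subtree, then the right subtree, and only then applies the operator.

```go
package main

import "fmt"

// Expr is a hypothetical stand-in for the generated UnitContext.
type Expr struct {
	Op          byte    // '+', '-', '*', '/' or 0 for a plain number
	Value       float64 // used only when Op == 0
	Left, Right *Expr
}

// eval walks the tree depth first: children are evaluated before their parent.
func eval(e *Expr) float64 {
	if e.Op == 0 {
		return e.Value
	}
	left := eval(e.Left)
	right := eval(e.Right)
	switch e.Op {
	case '+':
		return left + right
	case '-':
		return left - right
	case '*':
		return left * right
	default: // '/'
		return left / right
	}
}

func main() {
	// (5+5.5)/2 + 3, built by hand instead of by a parser.
	sum := &Expr{Op: '+', Left: &Expr{Value: 5}, Right: &Expr{Value: 5.5}}
	half := &Expr{Op: '/', Left: sum, Right: &Expr{Value: 2}}
	tree := &Expr{Op: '+', Left: half, Right: &Expr{Value: 3}}
	fmt.Println(eval(tree)) // prints 8.25
}
```

The recursion bottoms out at the number leaves, so the innermost operation ( 5+5.5 ) is computed first and the root addition last.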
Finally, add a function to the same file that takes in the expression, runs the lexer, parser and visitor, and returns the result.
func Execute(code string) float64 {
	executor := &LangVisitor{}
	inputStream := antlr.NewInputStream(code)
	lexer := parser.NewLangLexer(inputStream)
	commonTokenStream := antlr.NewCommonTokenStream(lexer, 0)
	// named p so that it does not shadow the imported parser package
	p := parser.NewLangParser(commonTokenStream)
	tree := p.Unit()
	return executor.Visit(tree).(float64)
}
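One thing to be aware of: Execute asserts the visitor's result to float64, so an expression that the visitor cannot resolve ( for example an incomplete one such as "5 +" ) will presumably panic rather than return an error. A possible way to soften that is a small wrapper using recover. In the sketch below safeExecute is a hypothetical helper, and fakeExecute is a stand-in for Execute so that the example runs without the generated parser.

```go
package main

import "fmt"

// safeExecute runs an evaluator that may panic and turns the panic into an error.
func safeExecute(run func(string) float64, code string) (result float64, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("could not evaluate %q: %v", code, r)
		}
	}()
	return run(code), nil
}

// fakeExecute is a hypothetical stand-in for Execute. It only "understands"
// the input "1+1" and otherwise fails the same way a nil visitor result would.
func fakeExecute(code string) float64 {
	var result interface{} // stays nil for anything but "1+1"
	if code == "1+1" {
		result = 2.0
	}
	return result.(float64) // panics when result is nil
}

func main() {
	fmt.Println(safeExecute(fakeExecute, "1+1"))
	fmt.Println(safeExecute(fakeExecute, "5 +"))
}
```

In the real project you would pass lang.Execute instead of fakeExecute. Registering an ANTLR error listener is another route, which is not covered here.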
To tie it all together and run it, create a ‘main.go’ file in the root folder.
package main

import (
	lang "customlang/src"
	"fmt"
)

func main() {
	expression := "(5+5.5)/2 + 3"
	ans := lang.Execute(expression)
	fmt.Println(ans)
}
Type the below into the terminal and you should be able to see the result.
go run main.go
Result:
8.25
Just change the expressions in main.go file to test them.
Conclusions
This approach can be used to build small custom languages for use cases such as Jira Query Language style filters, Salesforce style custom operations, or spreadsheet formula languages. Combined with Go, it gives you a single compiled binary that takes in a string and returns a result.