
Custom language with ANTLR and GoLang

Published: at 02:05 AM

The Vision

In this post we will:

  1. Set up a custom language which can evaluate simple arithmetic expressions
  2. As part of it we will set up an ANTLR grammar and a Go visitor that executes it

The Prep

  1. Java should be installed on the system (needed to run the ANTLR tool)
  2. Go should be installed on the system
  3. Basic knowledge of Depth First Search

I will not be covering the installation of these as it varies from system to system.

I am using Oracle JDK 21 (installed with SDKMAN) and Go 1.22 on macOS.

To confirm Java and Go are installed properly, run “java -version” and “go version” in your terminal.


The Plan

With our input as a simple arithmetic expression, we will pass it through three main components

  1. Lexer: reads our arithmetic expression as raw text, and cuts it up into identifiable tokens like numbers and operators
  2. Parser: takes in the tokens and constructs the logical structure of things to be executed (a parse tree, also loosely known as an Abstract Syntax Tree)
  3. Visitor: takes in the logical structure and executes it however we want.
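To get a feel for the first step, here is a rough hand-written sketch of what a lexer does. This is not the code ANTLR generates; the Token type, the token names and the tokenize function are made up for this illustration, and the number scanning is looser than a real grammar would be.

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a hypothetical token type used only for this illustration;
// ANTLR generates its own token types.
type Token struct {
	Kind string // "Number", "Plus", "OpenParen", ...
	Text string
}

// operatorKinds maps single-character symbols to token names.
var operatorKinds = map[byte]string{
	'(': "OpenParen", ')': "CloseParen",
	'+': "Plus", '-': "Minus", '*': "Multiply", '/': "Divide",
}

// tokenize cuts raw text into tokens, skipping spaces and tabs.
func tokenize(input string) ([]Token, error) {
	var tokens []Token
	for i := 0; i < len(input); {
		c := input[i]
		switch {
		case c == ' ' || c == '\t':
			i++
		case operatorKinds[c] != "":
			tokens = append(tokens, Token{Kind: operatorKinds[c], Text: string(c)})
			i++
		case c >= '0' && c <= '9':
			// Simplified: consume digits and dots without validating the shape.
			start := i
			for i < len(input) && (input[i] == '.' || (input[i] >= '0' && input[i] <= '9')) {
				i++
			}
			tokens = append(tokens, Token{Kind: "Number", Text: input[start:i]})
		default:
			return nil, fmt.Errorf("unexpected character %q at position %d", c, i)
		}
	}
	return tokens, nil
}

func main() {
	tokens, err := tokenize("5 + (5/10)*2")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	var parts []string
	for _, t := range tokens {
		parts = append(parts, t.Kind+"("+t.Text+")")
	}
	fmt.Println(strings.Join(parts, " "))
}
```

Even this toy version needs care around numbers and error positions, and the parser on top of it would be more work still.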

Doing all this from scratch would take quite a bit of time. Thankfully we have ANTLR, a tool that generates the lexer and parser from our language’s grammar written in a specific format. So our work boils down to two things:

  1. Creating our grammar file - [Lang.g4] in the diagram
  2. Implementing a visitor that executes the logic - [LangVisitor.go] in the diagram

The Setup

Create a folder for the project.

Inside the root directory, open a terminal and run:

go mod init customlang

Download the ‘Complete ANTLR Java binaries jar’ from the ANTLR download page (about 2 MB) and rename it to ‘antlr.jar’.

Paste the antlr.jar file into the root directory.
Create a ‘src’ directory in the root; this will contain the logic for the project.
Inside src, create a file named ‘Lang.g4’; this will contain the grammar of the custom language.

Your directory should look like this now:

antlr.jar
go.mod
src
|--Lang.g4

The Grammar

Our grammar will consist of lexer and parser logic in Lang.g4

We will first start with the lexer:

// The grammar name must match the file name (Lang.g4)
grammar Lang;

// Lexer
OpenParen: '(';
CloseParen: ')';
Plus: '+';
Minus: '-';
Multiply: '*';
Divide: '/';

// Ignore whitespaces
WhiteSpaces: [\t ]+ -> channel(HIDDEN);

// can be a decimal | integer
DecimalLiteral:
	DecimalIntegerLiteral '.' [0-9]*
	| DecimalIntegerLiteral;

// this looks weird, but it's done so that numbers with leading zeros ( 0002, 0005 ) do not match as a single literal
fragment DecimalIntegerLiteral: '0' | [1-9] [0-9_]*;

The comments should help. ANTLR grammars support a standard OR operator, ‘|’ (you can see it in use in DecimalLiteral). We also tell the lexer to ignore tabs and spaces by sending them to the hidden channel.
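If the DecimalLiteral rule is hard to read, it can help to see a rough regular-expression equivalent. The sketch below is only an approximation written for this post: the names decimalLiteral and isDecimalLiteral are made up, and the regexp is anchored to check whether a whole string is one literal, whereas the real lexer matches tokens one after another inside a larger input.

```go
package main

import (
	"fmt"
	"regexp"
)

// decimalLiteral approximates the DecimalLiteral lexer rule as a Go regexp:
// an integer part ('0', or a non-zero digit followed by digits/underscores),
// optionally followed by '.' and zero or more digits.
var decimalLiteral = regexp.MustCompile(`^(0|[1-9][0-9_]*)(\.[0-9]*)?$`)

// isDecimalLiteral reports whether s as a whole looks like one literal.
func isDecimalLiteral(s string) bool {
	return decimalLiteral.MatchString(s)
}

func main() {
	for _, s := range []string{"0", "42", "10.5", "5.", "0002", ".5"} {
		fmt.Printf("%q is a single literal: %v\n", s, isDecimalLiteral(s))
	}
}
```

Note that inputs such as "0002" and ".5" are not a single literal under this rule, which is the effect the fragment is going for.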

Next, the tricky parser:

// Parser
unit: OpenParen bracketContent = unit CloseParen
| left = unit ( Multiply | Divide ) right = unit
| left = unit ( Plus | Minus ) right = unit
| base = DecimalLiteral
;

What we say here is that an expression can be (unit), unit/unit, unit+unit and so on, where a unit can be a decimal number or any expression itself. Thinking of it recursively helps.

The ordering follows BODMAS rules. The higher up an alternative is ( eg: unit Divide unit ), the higher its precedence during parsing / creation of the execution tree, so multiplication and division bind tighter than addition and subtraction. Operators of equal rank share one alternative so that they group left to right: 5 - 3 + 2 parses as (5 - 3) + 2. If Plus sat in its own alternative above Minus, the same input would parse as 5 - (3 + 2).
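The precedence idea can also be seen in a small hand-written evaluator. The sketch below is not what ANTLR generates; it is a minimal precedence-climbing loop, assuming a well-formed, parenthesis-free list of tokens. The names precedence and evalTokens are made up for this illustration. It gives * and / one shared level, gives + and - a lower shared level, and evaluates operators of the same level left to right.

```go
package main

import (
	"fmt"
	"strconv"
)

// precedence: higher numbers bind tighter. '*' and '/' share a level,
// as do '+' and '-'.
var precedence = map[string]int{"+": 1, "-": 1, "*": 2, "/": 2}

// evalTokens evaluates a flat token list such as ["5", "-", "3", "+", "2"].
// Assumption: the input is well formed (number, operator, number, ...).
func evalTokens(tokens []string) float64 {
	pos := 0
	var climb func(minPrec int) float64
	climb = func(minPrec int) float64 {
		// Every (sub)expression starts with a number; the parse error is
		// ignored here because the sketch assumes valid input.
		left, _ := strconv.ParseFloat(tokens[pos], 64)
		pos++
		for pos < len(tokens) {
			op := tokens[pos]
			prec, ok := precedence[op]
			if !ok || prec < minPrec {
				break
			}
			pos++
			// prec+1 makes operators of the same level group left to right.
			right := climb(prec + 1)
			switch op {
			case "+":
				left += right
			case "-":
				left -= right
			case "*":
				left *= right
			case "/":
				left /= right
			}
		}
		return left
	}
	return climb(1)
}

func main() {
	fmt.Println(evalTokens([]string{"5", "-", "3", "+", "2"})) // prints 4
	fmt.Println(evalTokens([]string{"2", "+", "3", "*", "4"})) // prints 14
}
```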

To understand this better, we can use the ANTLR visualizer for the expression: 5 + (5/10)*2

The diagram above is the execution tree that our visitor is going to traverse. ANTLR calls this structure a parse tree; it is also loosely referred to as an Abstract Syntax Tree.

The Build

Create a file named build.sh in the root directory. This will be our script to generate the ANTLR files (lexer, parser and base visitor).

Paste the below into the file:

java -jar antlr.jar -Dlanguage=Go -Xexact-output-dir -o build -package BaseLang -visitor src/Lang.g4

The above line tells Java to run antlr.jar on the grammar file we created and generate the lexer, parser and visitor in Go, under the package name BaseLang.

Run the script (./build.sh in the terminal on macOS).

If you face a “no permissions” error on Mac while running the script, run ‘chmod +x build.sh’ and try running it again.

You will now see a ‘build’ folder in your root directory with the files we need, generated by ANTLR.

Finally, run go mod tidy to install the dependencies that the generated files need.

The Execution

Inside the src folder, create a file named ‘lang.go’. This will contain the logic to traverse the tree and execute the code.

Paste the below contents into the file:

package lang

import (
	parser "customlang/build"
	"strconv"

	"github.com/antlr4-go/antlr/v4"
)

type LangVisitor struct {
	parser.BaseLangVisitor
}

func (v *LangVisitor) Visit(tree antlr.ParseTree) interface{} {
	// traverse the tree
	return tree.Accept(v)
}

func (v *LangVisitor) VisitUnit(ctx *parser.UnitContext) interface{} {
	if ctx.OpenParen() != nil && ctx.CloseParen() != nil {

		// [brackets] case
		return ctx.GetBracketContent().Accept(v)

	} else if ctx.GetBase() != nil {

		// [plain number] case
		num, _ := strconv.ParseFloat(ctx.DecimalLiteral().GetText(), 64)
		return num

	} else {

		// [/, *, +, -] case
		// get left and right operands for the operation
		left := ctx.GetLeft().Accept(v).(float64)
		right := ctx.GetRight().Accept(v).(float64)

		if ctx.Divide() != nil {
			return left / right
		} else if ctx.Multiply() != nil {
			return left * right
		} else if ctx.Plus() != nil {
			return left + right
		} else if ctx.Minus() != nil {
			return left - right
		}
	}

	return nil
}

Here we embed the base visitor that ANTLR generated and build upon it. Any time .Accept(v) is called on a node, that node of the tree is visited.

GetLeft().Accept(v) and GetRight().Accept(v) call the VisitUnit function again and return the evaluated operands for the arithmetic operation at hand (in the style of a depth-first search).
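The same depth-first idea can be shown without ANTLR at all. The sketch below uses a made-up Node type as a stand-in for the generated context objects, builds the tree for 5 + (5/10)*2 by hand, and evaluates the children before combining them, which mirrors what VisitUnit does.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for a parse tree node: either a
// number (leaf) or an operator with left and right children.
type Node struct {
	Op          byte // '+', '-', '*', '/', or 0 for a leaf
	Value       float64
	Left, Right *Node
}

// eval visits both children first, then combines their results
// (a post-order depth-first traversal).
func eval(n *Node) float64 {
	if n.Op == 0 {
		return n.Value
	}
	left := eval(n.Left)
	right := eval(n.Right)
	switch n.Op {
	case '+':
		return left + right
	case '-':
		return left - right
	case '*':
		return left * right
	default: // '/'
		return left / right
	}
}

// num builds a leaf node.
func num(v float64) *Node { return &Node{Value: v} }

func main() {
	// Tree for 5 + (5/10)*2
	tree := &Node{
		Op:   '+',
		Left: num(5),
		Right: &Node{
			Op:    '*',
			Left:  &Node{Op: '/', Left: num(5), Right: num(10)},
			Right: num(2),
		},
	}
	fmt.Println(eval(tree)) // prints 6
}
```

The division is evaluated first because it sits deepest in the tree, then the multiplication, and the addition at the root comes last.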

Finally, add a function to the same file that takes in the expression, runs the lexer, parser and visitor, and returns the result.

func Execute(code string) float64 {
	executor := &LangVisitor{}

	inputStream := antlr.NewInputStream(code)
	lexer := parser.NewLangLexer(inputStream)
	commonTokenStream := antlr.NewCommonTokenStream(lexer, 0)
	// named langParser so it does not shadow the imported parser package
	langParser := parser.NewLangParser(commonTokenStream)
	tree := langParser.Unit()

	return executor.Visit(tree).(float64)
}
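One thing to keep in mind: VisitUnit can return nil when none of its branches match, and a plain .(float64) assertion on nil panics. A possible way to guard against that is the two-value form of the type assertion. The helper below is only a sketch; asFloat is a made-up name and is not part of the generated code or the ANTLR runtime.

```go
package main

import (
	"errors"
	"fmt"
)

// asFloat converts a visitor result to float64 without panicking.
// It returns an error when the result is nil or not a float64.
func asFloat(result interface{}) (float64, error) {
	value, ok := result.(float64)
	if !ok {
		return 0, errors.New("expression did not evaluate to a number")
	}
	return value, nil
}

func main() {
	if v, err := asFloat(8.25); err == nil {
		fmt.Println("value:", v)
	}
	if _, err := asFloat(nil); err != nil {
		fmt.Println("error:", err)
	}
}
```

Execute could then return (float64, error) and let the caller decide what to do with a bad expression.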

To tie it all together and run it, create a ‘main.go’ file in the root folder.

package main

import (
	lang "customlang/src"
	"fmt"
)

func main() {
	expression := "(5+5.5)/2 + 3"
	ans := lang.Execute(expression)
	fmt.Println(ans)
}

Type the below into the terminal and you should be able to see the result.

go run main.go

Result:

8.25

Just change the expressions in main.go file to test them.

Conclusions

This approach can be used to build custom languages for use cases like Jira Query Language, Salesforce-style custom operations, or Excel sheet languages. Combined with Go, it could be a neat way to build a compact binary executable that takes in a string and returns results.

Github Link

https://github.com/Manjunatha-b/blog-simple-lang