Abstract Syntax Trees for fun and profit
7th February, 2019
14 min read
Part One - an overview
This is part one of a series of articles about abstract syntax trees and their use in javascript. The scope of this article is a quick introduction to ASTs, babel plugins and some simple 'toy' examples.
I will present the information and instruction using description, code/json and a diagram wherever possible in a bid to make the subject matter easier to comprehend for a wider range of types of learners.
Scope of this article
This is a very light introduction to abstract syntax trees and the transformation of very simple code. Future articles will deal with real-world code, investigate static analysis, and look at approaches to creating babel plugins that could be useful in your own codebase, also - codemods.
Code
A github repo accompanies this article https://github.com/mlennox/abstractsyntaxforfunandprofit
what are abstract syntax trees
These useful data structures represent the abstract structure of source code regardless of the language. This is possible because despite the syntactic differences, all languages have a very large overlap in terms of the code structure they express: variable assignment, conditions, logic branching etc.
An abstract syntax tree can be used to facilitate static analysis of code, rewriting code, compiling code (transform from one language to another), or very commonly in web development - transpiling code (transform from one language to another with similar level of abstraction ie. typescript to javascript, or es6+ to es5).
In this article I'll show some examples of simple source code presented as abstract syntax trees, and also give a working example (see the repo) by building simple babel plugins to transform basic code
Purity of abstraction
The AST examples I'll be showing are not pure abstractions as they contain metadata relating to the source code and the elements are named to reflect javascript syntax. In all respects they are abstract syntax trees and closely follow the EStree spec
The JSON representations in this article were generated by the AST explorer listed in the useful resources
section below.
Useful references
Javascript AST viewer - https://astexplorer.net/
Developer docs for babel plugin development - babel plugin handbook
Babel type reference - https://babeljs.io/docs/en/next/babel-types.html
AST examples
I'll provide some examples here to help visualise the resulting structure when code is parsed into an abstract syntax tree.
The first will change instances of var
to const
and the second will transform an array into an object.
Simple variable assignment
If we take the simple javascript code snippet below and process it with an AST parser.
const willIt = true;
The resulting AST can be expressed in a number of ways, most usefully as JSON . The snippet of code above transformed to an AST is represented by the following JSON.
{
"type": "Program",
"body": [
{
"type": "VariableDeclaration",
"start": 0,
"end": 20,
"loc": {
"start": {
"line": 1,
"column": 0
},
"end": {
"line": 1,
"column": 20
}
},
"declarations": [
{
"type": "VariableDeclarator",
"start": 6,
"end": 19,
"loc": {
"start": {
"line": 1,
"column": 6
},
"end": {
"line": 1,
"column": 19
}
},
"id": {
"type": "Identifier",
"start": 6,
"end": 12,
"loc": {
"start": {
"line": 1,
"column": 6
},
"end": {
"line": 1,
"column": 12
},
"identifierName": "willIt"
},
"name": "willIt"
},
"init": {
"type": "BooleanLiteral",
"start": 15,
"end": 19,
"loc": {
"start": {
"line": 1,
"column": 15
},
"end": {
"line": 1,
"column": 19
}
},
"value": true
}
}
],
"kind": "const"
}
],
"sourceType": "module"
}
The JSON is composed of a series of nodes each with a type
property. The JSON below strips all but the type
properties from the JSON above.
{
"type": "Program"
"body": {
"type": "VariableDeclaration"
"declarations": [
{
"type": "VariableDeclarator",
"id": {
"type": "Identifier"
},
"init": {
"type": "BooleanLiteral"
}
},
]
}
}
You'll also notice each node contains location data that refers to the position of the associated expression in the source code.
{
"type": "VariableDeclaration",
"start": 0,
"end": 20,
"loc": {
"start": {
"line": 1,
"column": 0
},
"end": {
"line": 1,
"column": 20
}
},
}
Also note, and this is the key point of interest for us, the variable declaration signifies the kind of variable - in this case a const
.
{
"type": "VariableDeclaration",
"kind": "const"
}
The graphical representation of the hierarchical nature of the tree is much easier to grasp.
Assigning an object
let favouriteBelt = {
material: "leather",
length: 40
};
The JSON in this case is much more complex. I've omitted some of the properties for clarity.
{
"type": "VariableDeclaration",
"declarations": [
{
"type": "VariableDeclarator",
"id": {
"type": "Identifier",
"name": "favouriteBelt"
},
"init": {
"type": "ObjectExpression",
"properties": [
{
"type": "ObjectProperty",
"key": {
"type": "Identifier",
"name": "material"
},
"value": {
"type": "StringLiteral",
"value": "leather"
}
},
{
"type": "ObjectProperty",
"key": {
"type": "Identifier",
"name": "length"
},
"value": {
"type": "NumericLiteral",
"value": 40
}
}
]
}
}
],
"kind": "let"
}
And the graphical representation.
You can see that the hierarchy breaks down into familiar arrangements of nodes despite the relative increase in complexity compared to the simple value assignment.
Transforming code
Hopefully, now you have some idea of what an AST looks like and how it relates to the source code. Next, I'll show how you can transform the source code using the AST. The familiar babel
library provides all the tools necessary to parse, transform and re-generate source code so for simplicity the examples provided will be babel plugins.
One caveat, due to the way babel references plugins, these particular plugins can't easily be integrated into your codebase - ideally you would need a publish npm package. The alternative would be to write scripts to move the 'plugin' to a location accessible to babel.
Babel plugins and traversing the AST
Babel plugins use the visitor pattern, an abstraction which facilitates adding extra functionality to objects without requiring a refactor of the original object. The idea is that the object can 'accept' a visitor function that can alter the properties and, as we'll see, the structure of the object.
As the AST is traversed each node gets passed to the babel plugin, a simplified version of which is shown below - an object literal implementing a visitor
property which consists of an object of methods named to match the node it should process. The example here has implemented a visitor that will act on all ArrayExpression
nodes.
const ourCustomVisitor = {
visitor: {
ArrayExpression(path, state) {
// do stuff
},
},
};
When the AST is traversed, data about all corresponding nodes will be passed into the corresponding handler method - the order in which they are passed in, how at what point in the hierarchy and how previous transformations might affect the code are concepts that need to be addressed with real-world code, but the simple, 'flat' examples in this article are chosen to focus on basic concepts.
It is not true to say that the each matching node itself is passed to the handler, each node handler is passed two parameters, path
(which does contain the node) and state
,which are detailed below.
Path
The path is an object that represents the link between nodes. As you alter the AST babel will update the paths between all nodes.
If we take the following example of an ObjectProperty
and the child StringLiteral
value
{
type: "ObjectProperty",
value: {
type: "StringLiteral",
value: "gummi bears"
}
}
The path that represents the relationship between the nodes would be:
{
"parent": {
"type": "ObjectProperty",
:
},
"node": {
"type": "StringLiteral",
"value": "gummi bears"
:
},
:
}
In this case node
is the current element being handled in a StringLiteral
handler in the plugin's visitor:
{
visitor: {
StringLiteral(path) {
// path is:
// {
// "parent": {
// "type": "ObjectProperty",
// :
// },
// "node": {
// "type": "StringLiteral",
// "value": "gummi bears"
// :
// }
// :
// }
}
}
}
Metadata and methods
The path also contains metadata and methods to allow deleting, adding, or updating nodes within the tree.
In the arrayToObject
example in the accompanying repo we use path.replaceWith
inside an ArrayExpression
handler to replace the node defining an array with a node defining an object.
ArrayExpression(path) {
// ArrayExpression has a property 'elements' that contains the array elements
const objectProps = path.node.elements.map((element, index) => {
return new t.objectProperty(new t.stringLiteral(`${index}`), element);
});
path.replaceWith(new t.objectExpression(objectProps));
}
State
This contains details of the plugin - including the visitor declaration, pre and post methods. It also contains details of the file being parsed, the state of the AST etc. These can all be accessed within the plugin. The most commonly used state property is opts
.
Options
If you are running the plugin as part of your babel stack, rather than through the 'runner' in the associated repo, you can provide options to the babel plugins using your .babelrc
file
{
plugins: [
["customPlugin", {
"doIt": true,
"decorate": "paint"
}]
]
}
These options will be available in the plugin from state.opts
.
state.opts === {
doIt: true,
decorate: "paint"
}
Plugin Examples
Keep in mind these are very simple examples that use a single variable assignment meaning we don't need to worry about scope, depth of code blocks etc. Future examples in other articles will use more complex code.
A good starting template for babel plugins is shown below
module.exports = function({ types: t }) {
return {
visitor: {
// add handlers here
},
};
};
Convert var
to const
In this example, I want to build a simple babel plugin to replace any instance of var
with const
in the example code - only var
should be affected.
// this 'var' should be replaced with a 'const'
var itemOne = ['items', 'things', 'gizmos', 'widgets'];
// this will stay a 'let'
let itemTwo = ['tchotchke', 'stuff', 'yokes'];
The AST for the itemOne
variable assignment is presented below. The AST below has all the location information removed for clarity.
{
"type": "VariableDeclaration",
"kind": "var"
"declarations": [
{
"type": "VariableDeclarator",
"id": {
"type": "Identifier",
"name": "itemOne"
},
"init": {
"type": "ArrayExpression",
"elements": [
{
"type": "StringLiteral",
"value": "items"
},
{
"type": "StringLiteral",
"value": "things"
},
{
"type": "StringLiteral",
"value": "gizmos"
},
{
"type": "StringLiteral",
"value": "widgets"
}
]
}
}
],
"leadingComments": [
{
"type": "CommentLine",
"value": " this 'var' should be replaced with a 'const'",
}
]
}
The node we are interested in is the top-level node VariableDeclaration
, so lets add a handler for that in the babel plugin
module.exports = function({ types: t }) {
return {
visitor: {
VariableDeclaration(path) {
},
},
};
};
We need to recall that the path
is not the node, but the relationship between nodes and metadata etc. To get at the VariableDeclaration
node we reference path.node
.
Let's have a quick look at the AST again, focussing on the point of interest for us
{
"type": "VariableDeclaration",
"kind": "var",
:
}
We want to update the kind
of variable declaration from a var
to const
. The only other valid option is of course let
. Babel will let you update that to anything you like, which seems like an oversight, I'm actually not sure why they don't throw an error, or limit the values in some way.
The updated plugin that updates the variable declaration to const
and ensures that only var
will be affected. I've removed the types
destructuring as I don't use it in this plugin.
module.exports = function() {
return {
visitor: {
VariableDeclaration(path) {
if (path.node.kind === 'var') {
path.node.kind = 'const';
}
},
},
};
};
You can run this example yourself from the accompanying repo. Assuming you have installed the dependencies with npm install
the command to run the transformation is
node compile.js varToConst vars.source.js
Try messing with the code, adding console.log
to see the structure of the path, change the code in vars.source.js
to see how the result is affected.
Object from Array
While this is slightly more complex than the 'var to const' example, it is still fairly simple. I'll include some diagrams to be sure the transformation is clear.
First, the source code that we will transform.
// we'll convert this from an array to an object literal
// that uses the position in the list as the key
const coins = ['thrupenny', { name: 'penny', value: 'a penny, ya dope' }, 2];
Once the transformation is complete we want to end up with the following.
const coins = {
"0": 'thrupenny',
"1": { name: 'penny', value: 'a penny, ya dope' },
"2": 2
};
This means that we will need to replace the ArrayExpression
with an ObjectExpression
and convert each of the elements
of the ArrayExpression
into an ObjectProperty
.
The AST of the source code is below, with some properties removed for clarity.
{
"type": "VariableDeclaration",
"declarations": [
{
"type": "VariableDeclarator",
"id": {
"type": "Identifier",
"name": "coins"
},
"init": {
"type": "ArrayExpression",
"elements": [
{
"type": "StringLiteral",
"value": "thrupenny"
},
{
"type": "ObjectExpression",
"properties": [
{
"type": "ObjectProperty",
"key": {
"type": "Identifier",
"name": "name"
},
"value": {
"type": "StringLiteral",
"value": "penny"
}
},
{
"type": "ObjectProperty",
"key": {
"type": "Identifier",
"name": "value"
},
"value": {
"type": "StringLiteral",
"value": "a penny, ya dope"
}
}
]
},
{
"type": "NumericLiteral",
"value": 2
}
]
}
}
],
"kind": "const"
}
Also, a simplified diagram of the AST showing each element - the ObjectExpression
in the second element has also been simplified for clarity.
I'm interested in the elements
of the ArrayExpression
. I will take each element and construct an ObjectProperty
that uses a StringLiteral
of the array index of the element as the Identifier
and uses the element itself as the value. Focussing on the first element in the array
// const coins = ['thrupenny', { name: 'penny', value: 'a penny, ya dope' }, 2];
{
"type": "StringLiteral",
"value": "thrupenny"
},
The index is zero, so the ObjectProperty - constructed here using babel.types
- looks like
const firstArrayElement = path.node.elements[0];
const firstObjectProperty = new t.objectProperty(new t.stringLiteral(`0`), firstArrayElement);
Although the other elements are different types, the approach is the same. The elements don't need any extra processing to convert them to a different type so we can convert the Array elements into Object properties in one step, using Array.map
const objectProps = path.node.elements.map((element, index) => {
return new t.objectProperty(new t.stringLiteral(`${index}`), element);
});
A simplified diagram of the resulting AST is shown below. The blue elements have all been created by the code outlined above:
The last step is to replace the ArrayExpression
node with an ObjectExpression
constructed using the new array of ObjectProperty
. Luckily the path
includes a number of methods to help in transforming the AST, including replaceWith(replacementNode)
which swaps the current node for the node provided as a param.
Constructing the ObjectExpression
is simple
const objectExpression = new t.objectExpression(objectProps);
Then I can use the replaceWith
method to swap out the ArrayExpression
for the new ObjectExpression
path.replaceWith(objectExpression);
Which will generate the expected result
const coins = {
"0": 'thrupenny',
"1": { name: 'penny', value: 'a penny, ya dope' },
"2": 2
};
You can run this example yourself from the accompanying repo. Assuming you have installed the dependencies with npm install
the command to run the transformation is
node compile.js arrayToObject array.source.js
Other blog posts in this series
AST selectors rule
April 28th 2019 - 10 min read
My previous article on abstract syntax trees ran through a quick, but relatively broad, overview of syntax trees and how to manipulate them. This second article will show you how to use a basic…