Age | Commit message | Author |
|
Since the next step for the pipa programming language is to add control
flow statements, we benefit from having a block node to control scope.
Functions now have a block node as their body instead of a vector, as
the ast-dump shows:
    FunctionDecl name='main'
    └─ body:
       └─ Block
          └─ ReturnStmt
             └─ Literal type=i32 value='69'
This same node kind can be used for parsing if, for and while blocks.
I could have used ast_block_t as the body for functions, but I opted to
use an ast_node_t instead. This leaves room for other kinds of function
bodies in the future, such as arrow functions if we want them:
fn add(a: i32, b: i32): i32 => a + b;
Signed-off-by: Carlos Maniero <carlos@maniero.me>
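A minimal sketch of the shape described in this commit, assuming a tagged-union ast_node_t. The kind names, the `data` layout and the helpers `node_new`, `block_push` and `node_destroy` are hypothetical and not taken from the pipa sources; a plain growable array stands in for the project's vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, reduced set of node kinds. */
typedef enum { AST_LITERAL, AST_RETURN_STMT, AST_BLOCK, AST_FUNCTION_DECL } ast_kind_t;

typedef struct ast_node ast_node_t;

struct ast_node {
    ast_kind_t kind;
    union {
        struct { int value; } literal;
        struct { ast_node_t *argument; } return_stmt;
        struct { ast_node_t **items; size_t count; } block;      /* stands in for a vector */
        struct { const char *name; ast_node_t *body; } function; /* body: any node kind */
    } data;
};

/* Allocates a zeroed node of the given kind. */
ast_node_t *node_new(ast_kind_t kind)
{
    ast_node_t *node = calloc(1, sizeof(*node));
    assert(node != NULL);
    node->kind = kind;
    return node;
}

/* Appends a statement to a block node, growing its item array by one slot. */
void block_push(ast_node_t *block, ast_node_t *stmt)
{
    assert(block->kind == AST_BLOCK);
    size_t count = block->data.block.count;
    ast_node_t **items = realloc(block->data.block.items, (count + 1) * sizeof(*items));
    assert(items != NULL);
    items[count] = stmt;
    block->data.block.items = items;
    block->data.block.count = count + 1;
}

/* Frees a node together with every node it owns. */
void node_destroy(ast_node_t *node)
{
    if (node == NULL) {
        return;
    }
    switch (node->kind) {
    case AST_LITERAL:
        break;
    case AST_RETURN_STMT:
        node_destroy(node->data.return_stmt.argument);
        break;
    case AST_BLOCK:
        for (size_t i = 0; i < node->data.block.count; i++) {
            node_destroy(node->data.block.items[i]);
        }
        free(node->data.block.items);
        break;
    case AST_FUNCTION_DECL:
        node_destroy(node->data.function.body);
        break;
    }
    free(node);
}
```

Because the function body is typed as a generic node, an expression node could later be stored there for an arrow function without changing the function declaration layout.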
|
|
After this commit, this is a valid expression:
1 || 2 && 3 > 4 < 5 >= 6 <= 7 == 8 != 9
Signed-off-by: Carlos Maniero <carlos@maniero.me>
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
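The commit does not state how these operators bind. The sketch below assumes C-like precedence (`||` weakest, then `&&`, then equality, then relational) and left associativity, and it evaluates integers directly instead of building AST nodes. All names (`op_t`, `parse_expr`, `eval`) are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical operator kinds. */
typedef enum { OP_NONE, OP_OR, OP_AND, OP_EQ, OP_NE, OP_GT, OP_GE, OP_LT, OP_LE } op_t;

/* Assumed binding strength: a higher number binds tighter. */
int op_precedence(op_t op)
{
    switch (op) {
    case OP_OR: return 1;
    case OP_AND: return 2;
    case OP_EQ: case OP_NE: return 3;
    case OP_GT: case OP_GE: case OP_LT: case OP_LE: return 4;
    default: return 0;
    }
}

/* Reads the operator at src without consuming it; *len receives its length. */
op_t peek_op(const char *src, size_t *len)
{
    *len = 2;
    if (src[0] == '|' && src[1] == '|') return OP_OR;
    if (src[0] == '&' && src[1] == '&') return OP_AND;
    if (src[0] == '=' && src[1] == '=') return OP_EQ;
    if (src[0] == '!' && src[1] == '=') return OP_NE;
    if (src[0] == '>' && src[1] == '=') return OP_GE;
    if (src[0] == '<' && src[1] == '=') return OP_LE;
    *len = 1;
    if (src[0] == '>') return OP_GT;
    if (src[0] == '<') return OP_LT;
    *len = 0;
    return OP_NONE;
}

int apply_op(op_t op, int lhs, int rhs)
{
    switch (op) {
    case OP_OR: return lhs || rhs;
    case OP_AND: return lhs && rhs;
    case OP_EQ: return lhs == rhs;
    case OP_NE: return lhs != rhs;
    case OP_GT: return lhs > rhs;
    case OP_GE: return lhs >= rhs;
    case OP_LT: return lhs < rhs;
    case OP_LE: return lhs <= rhs;
    default: assert(0 && "not an operator"); return 0;
    }
}

void skip_spaces(const char **src)
{
    while (isspace((unsigned char)**src)) {
        (*src)++;
    }
}

int parse_number(const char **src)
{
    int value = 0;
    skip_spaces(src);
    while (isdigit((unsigned char)**src)) {
        value = value * 10 + (**src - '0');
        (*src)++;
    }
    return value;
}

/* Precedence climbing: folds every operator that binds at least as tightly as min_prec. */
int parse_expr(const char **src, int min_prec)
{
    int lhs = parse_number(src);
    for (;;) {
        size_t len;
        skip_spaces(src);
        op_t op = peek_op(*src, &len);
        int prec = op_precedence(op);
        if (op == OP_NONE || prec < min_prec) {
            return lhs;
        }
        *src += len;
        int rhs = parse_expr(src, prec + 1); /* prec + 1 gives left associativity */
        lhs = apply_op(op, lhs, rhs);
    }
}

int eval(const char *src)
{
    return parse_expr(&src, 1);
}
```

Under these assumptions the expression from the commit message groups as `1 || (2 && ((((3 > 4) < 5) >= 6) <= 7) == 8) != 9))`, that is, relational operators first, then equality, then `&&`, then `||`.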
|
|
When assigning a variable or returning a value, it is now ensured that
the expression matches the expected type.
To make this possible, a %result_type% field was added to ast_node_t,
and this field is used to make the comparison.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
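A minimal sketch of such a check, assuming every expression node carries its `result_type`. The `type_t` values, the reduced `ast_node_t`, `type_name` and `type_check` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef enum { TYPE_VOID, TYPE_I32, TYPE_BOOL } type_t;

/* Reduced node: only the field relevant to the check is shown. */
typedef struct {
    type_t result_type;
} ast_node_t;

const char *type_name(type_t type)
{
    switch (type) {
    case TYPE_VOID: return "void";
    case TYPE_I32: return "i32";
    case TYPE_BOOL: return "bool";
    }
    return "unknown";
}

/* Compares the expression's result type with the type the context expects
 * (a variable's declared type or a function's return type). On mismatch,
 * writes a description into message and returns false. */
bool type_check(const ast_node_t *expression, type_t expected, char *message, size_t size)
{
    assert(expression != NULL);
    if (expression->result_type == expected) {
        return true;
    }
    snprintf(message, size, "expected %s but got %s", type_name(expected), type_name(expression->result_type));
    return false;
}
```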
|
|
This commit introduces a new type for booleans. There is no code
generation for this type yet. The intention of this commit is to enable
flow control in the near future.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
The following logical operators were added to the lexer:
TOKEN_EQUAL ==
TOKEN_NOT !
TOKEN_NOT_EQUAL !=
TOKEN_GT >
TOKEN_GT_EQUAL >=
TOKEN_LT <
TOKEN_LT_EQUAL <=
TOKEN_AND &&
TOKEN_OR ||
Bitwise operators were also added:
TOKEN_BITWISE_AND &
TOKEN_BITWISE_OR |
TOKEN_BITWISE_SHIFT_LEFT <<
TOKEN_BITWISE_SHIFT_RIGHT >>
TOKEN_BITWISE_XOR ^
TOKEN_BITWISE_NOT ~
The '=' token, previously TOKEN_EQUAL, was renamed to TOKEN_ASSIGN, and
TOKEN_EQUAL is now used for the comparison operator '=='.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
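Several of these tokens share a first character ('<', '<=' and '<<', for example), so the lexer has to try the longer spelling first. The sketch below shows one way to do that; the token names come from the commit message, while `TOKEN_UNKNOWN` and `lex_operator` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    TOKEN_UNKNOWN, /* hypothetical: no operator at this position */
    TOKEN_ASSIGN,
    TOKEN_EQUAL,
    TOKEN_NOT,
    TOKEN_NOT_EQUAL,
    TOKEN_GT,
    TOKEN_GT_EQUAL,
    TOKEN_LT,
    TOKEN_LT_EQUAL,
    TOKEN_AND,
    TOKEN_OR,
    TOKEN_BITWISE_AND,
    TOKEN_BITWISE_OR,
    TOKEN_BITWISE_SHIFT_LEFT,
    TOKEN_BITWISE_SHIFT_RIGHT,
    TOKEN_BITWISE_XOR,
    TOKEN_BITWISE_NOT
} token_kind_t;

/* Returns the operator that starts at src and stores its length in *len.
 * Two-character operators are tried before one-character ones. */
token_kind_t lex_operator(const char *src, size_t *len)
{
    assert(src != NULL && len != NULL);
    char c = src[0];
    char next = c != '\0' ? src[1] : '\0';

    *len = 2;
    if (c == '=' && next == '=') return TOKEN_EQUAL;
    if (c == '!' && next == '=') return TOKEN_NOT_EQUAL;
    if (c == '>' && next == '=') return TOKEN_GT_EQUAL;
    if (c == '<' && next == '=') return TOKEN_LT_EQUAL;
    if (c == '&' && next == '&') return TOKEN_AND;
    if (c == '|' && next == '|') return TOKEN_OR;
    if (c == '<' && next == '<') return TOKEN_BITWISE_SHIFT_LEFT;
    if (c == '>' && next == '>') return TOKEN_BITWISE_SHIFT_RIGHT;

    *len = 1;
    switch (c) {
    case '=': return TOKEN_ASSIGN;
    case '!': return TOKEN_NOT;
    case '>': return TOKEN_GT;
    case '<': return TOKEN_LT;
    case '&': return TOKEN_BITWISE_AND;
    case '|': return TOKEN_BITWISE_OR;
    case '^': return TOKEN_BITWISE_XOR;
    case '~': return TOKEN_BITWISE_NOT;
    default:
        *len = 0;
        return TOKEN_UNKNOWN;
    }
}
```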
|
|
This commit introduces a few changes to the pipalang syntax. Both
functions and variables now require a keyword to be defined.
before:
    main(): i32 {
        a: i32 = 2;
        return a;
    }
now:
    fn main(): i32 {
        let a: i32 = 2;
        return a;
    }
Signed-off-by: Carlos Maniero <carlos@maniero.me>
Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
When an error occurred while parsing a block, the vector that stores the
nodes was destroyed but its nodes were not. This commit fixes this by
replacing %vector_destroy% with %ast_node_destroy_vector%.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
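The difference between the two destroy functions can be sketched as follows. Only the names `vector_destroy` and `ast_node_destroy_vector` come from the commit message; the vector layout, the other helpers and the `live_nodes` counter (added purely to make the leak observable) are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { int value; } ast_node_t;
typedef struct { void **items; size_t size; size_t capacity; } vector_t;

/* Number of nodes allocated and not yet destroyed (illustration only). */
size_t live_nodes = 0;

ast_node_t *ast_node_new(void)
{
    ast_node_t *node = calloc(1, sizeof(*node));
    assert(node != NULL);
    live_nodes++;
    return node;
}

void ast_node_destroy(ast_node_t *node)
{
    if (node == NULL) {
        return;
    }
    live_nodes--;
    free(node);
}

vector_t *vector_new(void)
{
    vector_t *vector = calloc(1, sizeof(*vector));
    assert(vector != NULL);
    return vector;
}

void vector_push_back(vector_t *vector, void *item)
{
    if (vector->size == vector->capacity) {
        size_t capacity = vector->capacity == 0 ? 4 : vector->capacity * 2;
        void **items = realloc(vector->items, capacity * sizeof(*items));
        assert(items != NULL);
        vector->items = items;
        vector->capacity = capacity;
    }
    vector->items[vector->size++] = item;
}

/* Frees only the container; the stored pointers are left untouched. */
void vector_destroy(vector_t *vector)
{
    free(vector->items);
    free(vector);
}

/* Frees every stored node first, then the container itself. */
void ast_node_destroy_vector(vector_t *nodes)
{
    for (size_t i = 0; i < nodes->size; i++) {
        ast_node_destroy(nodes->items[i]);
    }
    vector_destroy(nodes);
}
```

Calling `vector_destroy` on the error path would leave `live_nodes` above zero, which is the leak described in this commit.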
|
|
All parsers have been following the pattern below:
ast_node_t *node = ast_node_new();
ast_node_init_node_kind(node, ...args);
return node;
This was an unnecessary distraction when reading. The pattern above was
replaced by:
return ast_node_new_node_kind(...args);
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
This commit makes the variable assignment parser allocate memory for the
node. It also moves the node initialization to ast.c to follow our
standard for node initialization.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
Since it is possible to look at a future token without consuming it, the
block parser could be split into small chunks of code.
There is a performance drawback, because the parser now makes multiple
lookups of the same token. However, IMO it is not a big concern given
the small amount of computation required to get a token. It can also be
easily addressed by computing all tokens in advance.
Memory Leak:
During the refactoring I found some extra memory leaks caused by scopes
that were not released. So, rather than just printing a message, I
introduced an assert in scope.c to make sure developers get this
feedback asap, because our testing framework suppresses messages from
stderr when the test passes.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
Previously, during block declaration, the parser consumed the token,
which caused some parsers (such as return and variable declaration) to
not be self-contained and to depend on the caller to start the parsing.
In this commit, I've refactored the parser to only look for future
tokens using lookahead, and delegate the consumption to child parser
functions. This results in a more modular and self-contained parser that
improves the overall maintainability and readability of the code.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
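One way to look at future tokens without consuming them is to save the lexer cursor, read ahead, and restore it. The sketch below assumes a cursor-based lexer; the type names, token kinds and the functions `lexer_next_token` and `lexer_lookahead` are hypothetical and may differ from the real pipa lexer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOKEN_EOF, TOKEN_NAME, TOKEN_NUMBER, TOKEN_KEYWORD_RETURN, TOKEN_OTHER } token_kind_t;

typedef struct { token_kind_t kind; const char *start; size_t length; } token_t;
typedef struct { const char *src; size_t cursor; } lexer_t;

/* Consumes one token: skips blanks, then reads a word, a number or a single character. */
void lexer_next_token(lexer_t *lexer, token_t *token)
{
    const char *src = lexer->src;
    while (isspace((unsigned char)src[lexer->cursor])) {
        lexer->cursor++;
    }
    size_t begin = lexer->cursor;
    if (src[begin] == '\0') {
        token->kind = TOKEN_EOF;
    } else if (isalpha((unsigned char)src[begin])) {
        while (isalnum((unsigned char)src[lexer->cursor])) {
            lexer->cursor++;
        }
        int is_return = lexer->cursor - begin == 6 && strncmp(src + begin, "return", 6) == 0;
        token->kind = is_return ? TOKEN_KEYWORD_RETURN : TOKEN_NAME;
    } else if (isdigit((unsigned char)src[begin])) {
        while (isdigit((unsigned char)src[lexer->cursor])) {
            lexer->cursor++;
        }
        token->kind = TOKEN_NUMBER;
    } else {
        lexer->cursor++;
        token->kind = TOKEN_OTHER;
    }
    token->start = src + begin;
    token->length = lexer->cursor - begin;
}

/* Reads the n-th upcoming token (n >= 1) and rewinds, so nothing is consumed. */
void lexer_lookahead(lexer_t *lexer, token_t *token, size_t n)
{
    assert(n >= 1);
    size_t saved = lexer->cursor;
    for (size_t i = 0; i < n; i++) {
        lexer_next_token(lexer, token);
    }
    lexer->cursor = saved;
}
```

A block parser can then peek at the next token, pick the matching child parser (return statement, variable declaration, and so on), and let that child consume the tokens itself. Each lookahead re-scans the same characters, which is the repeated-lookup cost a later commit in this log discusses.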
|
|
During the refactoring process, I identified a memory leak where the
return argument was allocated but not freed in case of an error.
This commit also introduces the concept of keyword tokens: return is now
a keyword, which simplifies the parser.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
In many situations, the parser is responsible for reserving memory for
nodes, particularly during function body parsing. This commit introduces
a new standard where parser functions not only allocate memory for
ast_nodes, but also return them. In case of a parser error, a NULL
pointer is returned.
This standard will be extended to other parsers in future commits,
ensuring consistency throughout the codebase.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
This commit introduces variable assignment making it possible to
change a variable value. Example:
myvar: i32 = 1;
myvar = 2;
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Co-authored-by: Carlos Maniero <carlos@maniero.me>
|
|
|
|
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
This patch adds variable compilation and uses a scope (a stack of maps)
to look up identifiers.
Today we use a vector + ref_entry structs to implement the scope. The
ref_entry lacks memory management; we are still not sure who will be the
owner of the pointer.
We also want to replace the scope with a hashtable_t type as soon as we
get one.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com>
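A rough sketch of a scope built as a stack of frames over a flat list of entries, in the spirit of the vector + ref_entry approach described above. Fixed-size arrays replace the vector to keep the example short, and every name (`ref_entry_t`, `scope_enter`, `scope_get`, ...) is an assumption. The entries only borrow their pointers, which mirrors the open ownership question mentioned in the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SCOPE_MAX_ENTRIES 64
#define SCOPE_MAX_DEPTH 16

/* Maps an identifier to whatever it refers to; both pointers are borrowed. */
typedef struct { const char *name; void *ref; } ref_entry_t;

typedef struct {
    ref_entry_t entries[SCOPE_MAX_ENTRIES];
    size_t entry_count;
    size_t frame_starts[SCOPE_MAX_DEPTH]; /* entry_count at the time each frame was entered */
    size_t depth;
} scope_t;

void scope_init(scope_t *scope)
{
    scope->entry_count = 0;
    scope->depth = 0;
}

/* Opens a nested frame, e.g. when a function body or block starts. */
void scope_enter(scope_t *scope)
{
    assert(scope->depth < SCOPE_MAX_DEPTH);
    scope->frame_starts[scope->depth++] = scope->entry_count;
}

/* Closes the innermost frame and forgets every entry declared inside it. */
void scope_leave(scope_t *scope)
{
    assert(scope->depth > 0);
    scope->entry_count = scope->frame_starts[--scope->depth];
}

void scope_push(scope_t *scope, const char *name, void *ref)
{
    assert(scope->entry_count < SCOPE_MAX_ENTRIES);
    scope->entries[scope->entry_count].name = name;
    scope->entries[scope->entry_count].ref = ref;
    scope->entry_count++;
}

/* Searches from the newest entry backwards, so inner declarations shadow outer ones. */
void *scope_get(const scope_t *scope, const char *name)
{
    for (size_t i = scope->entry_count; i-- > 0;) {
        if (strcmp(scope->entries[i].name, name) == 0) {
            return scope->entries[i].ref;
        }
    }
    return NULL;
}
```

The linear search is what a hashtable-based scope would eventually replace.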
|
|
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
We now parse variables/functions and check whether they are defined in
the scope. Otherwise, parsing fails with a nice message.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com>
|
|
The refactoring also replaces an if statement with a switch statement.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
The AST was using a string view to distinguish the operation kind. An
enum was created for this purpose simplifying code generation.
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Reviewed-by: Johnny Richard <johnny@johnnyrichar.com>
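To illustrate why an enum is easier to consume than a string view: later stages can switch on the kind instead of comparing characters. The enum and function names below are hypothetical, and the example evaluates the operation rather than emitting assembly.

```c
#include <assert.h>

/* Hypothetical operation kinds, resolved once while parsing. */
typedef enum {
    AST_BINOP_ADDITION,
    AST_BINOP_SUBTRACTION,
    AST_BINOP_MULTIPLICATION,
    AST_BINOP_DIVISION
} ast_binary_operation_kind_t;

/* Maps the operator character to its kind; done a single time by the parser. */
ast_binary_operation_kind_t binary_operation_kind_from_char(char op)
{
    switch (op) {
    case '+': return AST_BINOP_ADDITION;
    case '-': return AST_BINOP_SUBTRACTION;
    case '*': return AST_BINOP_MULTIPLICATION;
    case '/': return AST_BINOP_DIVISION;
    default:
        assert(0 && "unknown operator");
        return AST_BINOP_ADDITION;
    }
}

/* A consumer of the AST only needs a switch over the enum. */
int binary_operation_apply(ast_binary_operation_kind_t kind, int lhs, int rhs)
{
    switch (kind) {
    case AST_BINOP_ADDITION: return lhs + rhs;
    case AST_BINOP_SUBTRACTION: return lhs - rhs;
    case AST_BINOP_MULTIPLICATION: return lhs * rhs;
    case AST_BINOP_DIVISION:
        assert(rhs != 0);
        return lhs / rhs;
    }
    return 0;
}
```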
|
|
The +, -, *, and / tokens used to be TOKEN_OP, but the TOKEN_OP has been
removed and a token for each operation has been introduced. Python's
token names were followed: https://docs.python.org/3/library/token.html
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Reviewed-by: Johnny Richard <johnny@johnnyrichar.com>
|
|
We want to keep the code style consistent. This first commit adds a
.clang-format in order to "document" our code style.
This patch also adds a *linter* target to the Makefile, which will
complain if there is any style issue in the test and src dirs.
I ran the following command to create the .clang-format file:
$ clang-format -style=mozilla -dump-config > .clang-format
I also made some adjustments to .clang-format, changing the following
properties:
PointerAlignment: Right
ColumnLimit: 120
Commands executed to fix the current styling:
$ find . -name *.h | xargs clang-format -i
$ find . -name *.c | xargs clang-format -i
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
This commit adds support for variables and identifiers in the function
body of the parser, stored as a vector.
However, at this point, identifier resolution is not fully implemented,
and we currently accept identifiers without checking if they can be
resolved. This is a known limitation that will be addressed in a future
commit once hash-tables are added to the parser.
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Co-authored-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
This patch implements the AST creation for arithmetic expressions.
NOTE:
The implementation works only for integer numbers.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Reviewed-by: Carlos Maniero <carlosmaniero@gmail.com>
|
|
Since we want to extend our code to support multiple kinds of
expressions, it does not make sense for the return statement to always
return a number.
From now on, the return statement has an ast_node_t as its argument,
meaning that it could be anything. The literal_node_t was also
implemented in order to keep the application behavior.
Following C's calling convention, literal values are stored in %eax,
and the return statement uses that value as needed.
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
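A small sketch of that split: the literal is responsible for putting its value in %eax, and the return statement first generates its argument and then simply returns. The node layout, the buffer type and the exact instructions emitted are assumptions; the real code generator may emit something different around the return.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef enum { AST_LITERAL, AST_RETURN_STMT } ast_kind_t;

typedef struct ast_node ast_node_t;

struct ast_node {
    ast_kind_t kind;
    union {
        struct { int value; } literal;
        struct { ast_node_t *argument; } return_stmt;
    } data;
};

/* Fixed-size output buffer standing in for the assembly file. */
typedef struct { char text[256]; size_t length; } asm_buffer_t;

/* Appends one formatted line of assembly to the buffer. */
void asm_emit(asm_buffer_t *out, const char *format, ...)
{
    size_t room = sizeof(out->text) - out->length;
    va_list args;
    va_start(args, format);
    int written = vsnprintf(out->text + out->length, room, format, args);
    va_end(args);
    assert(written >= 0 && (size_t)written < room);
    out->length += (size_t)written;
}

/* Literals leave their value in %eax; return relies on it already being there. */
void gen_node(asm_buffer_t *out, const ast_node_t *node)
{
    switch (node->kind) {
    case AST_LITERAL:
        asm_emit(out, "    mov $%d, %%eax\n", node->data.literal.value);
        break;
    case AST_RETURN_STMT:
        gen_node(out, node->data.return_stmt.argument);
        asm_emit(out, "    ret\n");
        break;
    }
}
```

Any other expression kind that leaves its result in %eax can be used as the return argument without changing the return statement's code generation.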
|
|
Previously, when an error occurred during parsing, the application
would exit, making it difficult to test the parser and limiting the
compiler's extensibility. This commit improves the parser's error
handling by allowing for continued execution after an error, enabling
easier testing and increased flexibility.
The parser is prepared to handle multiple errors. Although the current
implementation always returns a single error, this may be useful once
there are multiple functions and errors can be shown by context.
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
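The idea of recording errors instead of exiting can be sketched like this. The parser struct, the error capacity and the function names are all hypothetical; the point is only that a failed expectation returns control to the caller, which lets a test inspect the recorded errors.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define PARSER_MAX_ERRORS 8

typedef struct { char message[128]; } parser_error_t;

typedef struct {
    const char *src;
    size_t cursor;
    parser_error_t errors[PARSER_MAX_ERRORS];
    size_t errors_len;
} parser_t;

void parser_init(parser_t *parser, const char *src)
{
    assert(src != NULL);
    parser->src = src;
    parser->cursor = 0;
    parser->errors_len = 0;
}

/* Records a formatted error; errors beyond the fixed capacity are dropped. */
void parser_error_push(parser_t *parser, const char *format, ...)
{
    if (parser->errors_len == PARSER_MAX_ERRORS) {
        return;
    }
    parser_error_t *error = &parser->errors[parser->errors_len++];
    va_list args;
    va_start(args, format);
    vsnprintf(error->message, sizeof(error->message), format, args);
    va_end(args);
}

/* Consumes the expected character, or records an error and returns false. */
bool parser_expect(parser_t *parser, char expected)
{
    assert(expected != '\0');
    char got = parser->src[parser->cursor];
    if (got == expected) {
        parser->cursor++;
        return true;
    }
    if (got == '\0') {
        parser_error_push(parser, "expected '%c' but reached end of input", expected);
    } else {
        parser_error_push(parser, "expected '%c' but got '%c'", expected, got);
    }
    return false;
}
```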
|
|
Previously, the abstract syntax tree (AST) used static types, meaning
that an ast_function_t would always have an ast_return_stmt_t as
its body. However, this assumption is not always true, as we may have
void functions that do not have a return statement. Additionally, the
ast_return_stmt_t always had a number associated with it, but this too
is not always the case.
To make this possible, I needed to make a few changes across the whole
project. One of the main changes is that the inheritance hack is gone.
That mechanism was replaced by composition, with pointers where required
for recursive type references.
It is important to mention that I decided to use a union type to
implement the composition. There are two main advantages to this
approach:
1. There is only one function to allocate memory for all kinds of nodes.
2. There is no need to cast the data.
In summary, this commit introduces changes to support dynamic typing
in the AST, by replacing the inheritance hack with composition and
using union types to simplify memory allocation and type casting.
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
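The two advantages listed in this commit can be illustrated with a small sketch. The struct and function names follow the naming style visible in this log (`ast_node_new`, `ast_node_init_...`), but their exact signatures and the `data` union layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct ast_node ast_node_t;

typedef enum { AST_NUMBER, AST_RETURN_STMT, AST_FUNCTION } ast_node_kind_t;

typedef struct { int value; } ast_number_t;
typedef struct { ast_node_t *argument; } ast_return_stmt_t;             /* pointer: recursive reference */
typedef struct { const char *name; ast_node_t *body; } ast_function_t;  /* body may be any node */

struct ast_node {
    ast_node_kind_t kind;
    union {
        ast_number_t number;
        ast_return_stmt_t return_stmt;
        ast_function_t function;
    } data;
};

/* The single allocation function shared by every node kind. */
ast_node_t *ast_node_new(void)
{
    ast_node_t *node = calloc(1, sizeof(*node));
    assert(node != NULL);
    return node;
}

void ast_node_init_number(ast_node_t *node, int value)
{
    node->kind = AST_NUMBER;
    node->data.number.value = value;
}

void ast_node_init_return_stmt(ast_node_t *node, ast_node_t *argument)
{
    node->kind = AST_RETURN_STMT;
    node->data.return_stmt.argument = argument;
}

void ast_node_init_function(ast_node_t *node, const char *name, ast_node_t *body)
{
    node->kind = AST_FUNCTION;
    node->data.function.name = name;
    node->data.function.body = body;
}
```

Members are reached through `node->data.<kind>` after checking `node->kind`, so no pointer casts between node types are needed.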
|
|
In the future we want to be able to traverse the tree and pretty print
it, generate binaries for other platforms like LLVM, or transpile to C.
This solution also implements the gas assembly x86_64 Linux code
generation by using the visitor interface.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
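A visitor interface of this kind can be sketched as a struct of callbacks, one per node kind, plus a dispatch function. The layout below is an assumption, not the project's actual interface, and a tiny pretty printer is used as the example backend instead of the assembly generator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct ast_node ast_node_t;
typedef struct ast_visitor ast_visitor_t;

typedef enum { AST_NUMBER, AST_RETURN_STMT, AST_FUNCTION } ast_node_kind_t;

struct ast_node {
    ast_node_kind_t kind;
    union {
        struct { int value; } number;
        struct { ast_node_t *argument; } return_stmt;
        struct { const char *name; ast_node_t *body; } function;
    } data;
};

/* One callback per node kind; context carries the backend's own state. */
struct ast_visitor {
    void *context;
    void (*visit_function)(ast_visitor_t *visitor, const ast_node_t *node);
    void (*visit_return_stmt)(ast_visitor_t *visitor, const ast_node_t *node);
    void (*visit_number)(ast_visitor_t *visitor, const ast_node_t *node);
};

/* Dispatches a node to the callback that matches its kind. */
void ast_node_accept(const ast_node_t *node, ast_visitor_t *visitor)
{
    switch (node->kind) {
    case AST_FUNCTION: visitor->visit_function(visitor, node); break;
    case AST_RETURN_STMT: visitor->visit_return_stmt(visitor, node); break;
    case AST_NUMBER: visitor->visit_number(visitor, node); break;
    }
}

/* Example backend: a pretty printer that writes into a fixed buffer. */
typedef struct { char text[128]; size_t length; } printer_t;

void printer_append(printer_t *printer, const char *piece)
{
    size_t size = strlen(piece);
    assert(printer->length + size < sizeof(printer->text));
    memcpy(printer->text + printer->length, piece, size + 1);
    printer->length += size;
}

void print_function(ast_visitor_t *visitor, const ast_node_t *node)
{
    printer_append(visitor->context, "fn ");
    printer_append(visitor->context, node->data.function.name);
    printer_append(visitor->context, " { ");
    ast_node_accept(node->data.function.body, visitor);
    printer_append(visitor->context, " }");
}

void print_return_stmt(ast_visitor_t *visitor, const ast_node_t *node)
{
    printer_append(visitor->context, "return ");
    ast_node_accept(node->data.return_stmt.argument, visitor);
    printer_append(visitor->context, ";");
}

void print_number(ast_visitor_t *visitor, const ast_node_t *node)
{
    char digits[16];
    snprintf(digits, sizeof(digits), "%d", node->data.number.value);
    printer_append(visitor->context, digits);
}
```

Another backend, such as an assembly generator or a C transpiler, would supply its own three callbacks and context while reusing `ast_node_accept` unchanged.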
|
|
This change fixes a memory leak that occurred when a token was created.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
In order to find out where a parsing error occurred, this patch
introduces the exact location, following the format 'file:row:col'.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
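For illustration, a location in that format can be derived from a byte offset into the source. The sketch assumes 1-based rows and columns, which the commit does not specify; the function names and the `main.pipa` file name used in the usage note are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct { size_t row; size_t col; } location_t;

/* Walks the source up to offset, counting newlines to find row and column (both 1-based). */
location_t location_from_offset(const char *src, size_t offset)
{
    assert(src != NULL);
    location_t location = { 1, 1 };
    for (size_t i = 0; i < offset && src[i] != '\0'; i++) {
        if (src[i] == '\n') {
            location.row++;
            location.col = 1;
        } else {
            location.col++;
        }
    }
    return location;
}

/* Writes "filepath:row:col" into out and returns snprintf's result. */
int location_format(char *out, size_t size, const char *filepath, location_t location)
{
    return snprintf(out, size, "%s:%zu:%zu", filepath, location.row, location.col);
}
```

For the source `"main(): i32 {\n  retur 69;\n}"` and the offset of the misspelled `retur` (16), this yields `main.pipa:2:3`. A lexer that tracks row and column while scanning would avoid the re-walk.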
|
|
This is a very limited parser implementation which parses a single
function with return type i32 and a body containing a statement that
returns a number.
The parser doesn't show the 'filepath:row:col' when it fails. A future
improvement would be to display it, making it easy to find where the
compilation problem is located.
The ast_nodes are taking ownership of token.value but never free it
(which is a really bad design, since not every token.value has its
ownership taken, causing memory leaks). As a future fix we could use a
string_view instead, since we never change the original source code. The
string_view will also improve performance a lot by avoiding unnecessary
heap memory allocation.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
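The string_view idea mentioned above can be sketched as a pointer plus a length into the unmodified source buffer. This is an assumed layout with hypothetical helper names, not the project's eventual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A non-owning slice of the source text: nothing to allocate, nothing to free. */
typedef struct { const char *str; size_t size; } string_view_t;

string_view_t string_view_new(const char *str, size_t size)
{
    assert(str != NULL);
    return (string_view_t){ .str = str, .size = size };
}

/* Compares two views by length and content. */
bool string_view_eq(string_view_t a, string_view_t b)
{
    return a.size == b.size && memcmp(a.str, b.str, a.size) == 0;
}

/* Compares a view with a NUL-terminated string. */
bool string_view_is(string_view_t view, const char *cstr)
{
    return string_view_eq(view, string_view_new(cstr, strlen(cstr)));
}
```

A token or AST node holding such a view stays valid for as long as the source buffer does, so ownership questions about token.value do not arise.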
|