|
We have always been parsing a single function. Since we want to support
multiple functions in the near future, this patch introduces a namespace
that represents an entire file.
To ensure a function is defined inside a namespace, a helper function
was created. Today our ast_node structure is highly exposed, which is
something Johnny and I have been discussing. This is a first step toward
protecting the code generation from the details of our AST.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
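The idea of a namespace that owns every function in a file, plus a helper that keeps code generation away from its internals, can be sketched as follows. This is a minimal sketch: `function_decl_t`, `namespace_t` and `namespace_find_function` are hypothetical, trimmed-down names, not the project's real types.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down shapes; the real ast_node_t carries more. */
typedef struct {
    const char *name;
} function_decl_t;

typedef struct {
    function_decl_t *functions; /* every function declared in the file */
    size_t count;
} namespace_t;

/* Looks a function up by name, so code generation never has to walk the
 * namespace layout itself. Returns NULL when the function is not defined
 * in this namespace. */
const function_decl_t *namespace_find_function(const namespace_t *ns,
                                               const char *name)
{
    for (size_t i = 0; i < ns->count; i++) {
        if (strcmp(ns->functions[i].name, name) == 0)
            return &ns->functions[i];
    }
    return NULL;
}
```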
|
|
Before this commit, function declarations made an exit syscall to
interrupt the flow. This is a fair approach considering all our examples
have only a main function, but it won't work if the namespace has more
than one function.
The return statement now always sets the return value in RAX and jumps to
the function's return label.
The function return label restores RBP and jumps back to the caller's
next instruction using the RET instruction.
Function labels are kept, which means that a function called my_fn will
have the assembly label my_fn, so functions can interoperate with other
languages.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
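The calling convention described above is sketched below for a function whose body is a single `return <value>;`. It puts the value in RAX, jumps to a per-function return label, restores RBP and executes RET. The emitter name `emit_function` and the `.L<name>_return` label scheme are assumptions for illustration. The jump looks redundant here, but it matters once a return appears in the middle of a body.

```c
#include <stdio.h>
#include <string.h>

/* Writes GAS (AT&T syntax) for `fn <name>(): i32 { return <value>; }`
 * into buf. The function keeps its source name as its label so other
 * languages can call it. Returns what snprintf returns. */
int emit_function(char *buf, size_t size, const char *name, long value)
{
    return snprintf(buf, size,
        ".global %s\n"
        "%s:\n"
        "    push %%rbp\n"
        "    mov %%rsp, %%rbp\n"
        "    mov $%ld, %%rax\n"   /* return value goes in RAX */
        "    jmp .L%s_return\n"   /* every return jumps to the epilogue */
        ".L%s_return:\n"
        "    pop %%rbp\n"         /* restore the caller's frame pointer */
        "    ret\n",              /* pop the return address, back to the caller */
        name, name, value, name, name);
}
```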
|
|
This commit abstract the complexity of an entry so then, the users of
the ref map does not need to understand how is it implemented.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
When the assigned value is a literal, it just stores zero or one in the
variable's stack location. If the value is an expression, it compiles
the expression and stores zero or one based on the expression's
result.
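Both cases above are sketched below. A literal is stored directly. For an expression, the condition code (not shown) is assumed to jump to a "true" or "false" label, and each label stores 1 or 0 into the stack slot. The function names and label names are hypothetical. The sketch uses the explicit `movq` suffix because an immediate-to-memory `mov` has no size to infer from.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Literal case: store 0 or 1 straight into the variable's stack slot. */
int emit_bool_literal(char *buf, size_t size, int offset, bool value)
{
    return snprintf(buf, size, "    movq $%d, %d(%%rbp)\n",
                    value ? 1 : 0, offset);
}

/* Expression case: the already-emitted condition is assumed to jump to
 * true_label or false_label; each branch stores 1 or 0 in the slot and
 * both meet again at end_label. */
int emit_bool_from_condition(char *buf, size_t size, int offset,
                             const char *true_label,
                             const char *false_label,
                             const char *end_label)
{
    return snprintf(buf, size,
        "%s:\n"
        "    movq $1, %d(%%rbp)\n"
        "    jmp %s\n"
        "%s:\n"
        "    movq $0, %d(%%rbp)\n"
        "%s:\n",
        true_label, offset, end_label, false_label, offset, end_label);
}
```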
|
|
Now if statements are complete! The function
%gas_assembly_generator_compile_condition% is generic and will be used
for any other flow-control statement. The only requirement for it to work
is two labels: one to jump to when the condition is true, and another
for when the condition is false.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
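The two-label contract above is sketched below for a condition that compares a stack variable with an immediate. The code branches to the "true" label when the condition holds and falls through to an unconditional jump to the "false" label otherwise. `compile_condition`, `cmp_t` and `jump_if_true` are hypothetical names, not the generator's real API. In AT&T syntax, `cmp $imm, %rax` sets the flags from `rax - imm`, so `jle` jumps when `rax <= imm` (signed).

```c
#include <stdio.h>
#include <string.h>

/* Comparison operator of a condition such as `n <= 11`. */
typedef enum { CMP_EQ, CMP_NE, CMP_LT, CMP_LE, CMP_GT, CMP_GE } cmp_t;

/* Conditional jump taken when `lhs op rhs` holds (signed integers). */
static const char *jump_if_true(cmp_t op)
{
    switch (op) {
    case CMP_EQ: return "je";
    case CMP_NE: return "jne";
    case CMP_LT: return "jl";
    case CMP_LE: return "jle";
    case CMP_GT: return "jg";
    case CMP_GE: return "jge";
    }
    return NULL; /* not reached for valid operators */
}

/* Emits `<slot at lhs_offset> op rhs`, jumping to true_label when it
 * holds and to false_label otherwise. Any flow-control statement can
 * reuse this by supplying its own two labels. */
int compile_condition(char *buf, size_t size, int lhs_offset, cmp_t op,
                      long rhs, const char *true_label,
                      const char *false_label)
{
    return snprintf(buf, size,
        "    mov %d(%%rbp), %%rax\n"
        "    cmp $%ld, %%rax\n"   /* flags := rax - rhs */
        "    %s %s\n"
        "    jmp %s\n",
        lhs_offset, rhs, jump_if_true(op), true_label, false_label);
}
```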
|
|
The logical operators && and || must have lower precedence than the
comparators (> < >= <= == !=), so that they end up higher in the tree.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
If statements are now working; the only exception is the operators
|| and &&, which will be addressed in a future commit. Checks tested:
fn main(): i32 {
    let n: i32 = 11;
    if (n == 11) {
        if n != 12 {
            if n < 12 {
                if n <= 11 {
                    if n > 10 {
                        if n >= 11 {
                            return 42;
                        }
                    }
                }
            }
        }
    }
    return n;
}
To compile && and ||, a precedence issue must be addressed: they must
have the lowest precedence, which is not the case now:
1 == 2 || 3 != 2
The || should be the root of the tree in the example above.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
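The precedence fix discussed above can be illustrated with precedence climbing. This technique may differ from how the pipa parser actually does it. In the sketch, || binds loosest, then &&, then the comparators, which share one level for brevity. As a result, `1 == 2 || 3 != 2` gets `||` at the root. `node_t`, `parse` and `free_tree` are hypothetical names. The input is a NULL-terminated array of token strings and is assumed to be well formed.

```c
#include <stdlib.h>
#include <string.h>

typedef struct node {
    const char *tok;        /* operator or operand text */
    struct node *lhs, *rhs; /* both NULL for operands */
} node_t;

/* Binding power of a binary operator; 0 means "not an operator". */
static int precedence(const char *tok)
{
    static const char *comparators[] = {"==", "!=", ">", ">=", "<", "<="};
    if (tok == NULL)
        return 0;
    if (strcmp(tok, "||") == 0)
        return 1;
    if (strcmp(tok, "&&") == 0)
        return 2;
    for (size_t i = 0; i < sizeof comparators / sizeof comparators[0]; i++)
        if (strcmp(tok, comparators[i]) == 0)
            return 3;
    return 0;
}

static node_t *new_node(const char *tok, node_t *lhs, node_t *rhs)
{
    node_t *n = malloc(sizeof *n);
    if (n == NULL)
        abort(); /* keeps the sketch short */
    n->tok = tok;
    n->lhs = lhs;
    n->rhs = rhs;
    return n;
}

/* Consumes an operand, then every operator binding at least as tightly
 * as min_prec. min_prec is always >= 1, so a non-operator (or the NULL
 * terminator) ends the loop. */
static node_t *parse_expr(const char ***cursor, int min_prec)
{
    node_t *lhs = new_node(*(*cursor)++, NULL, NULL);
    while (precedence(**cursor) >= min_prec) {
        const char *op = *(*cursor)++;
        /* +1 makes operators of equal precedence left-associative */
        node_t *rhs = parse_expr(cursor, precedence(op) + 1);
        lhs = new_node(op, lhs, rhs);
    }
    return lhs;
}

node_t *parse(const char **tokens)
{
    return parse_expr(&tokens, 1);
}

void free_tree(node_t *n)
{
    if (n == NULL)
        return;
    free_tree(n->lhs);
    free_tree(n->rhs);
    free(n);
}
```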
|
|
This commit parses an if statement following the grammar below:
if boolean_expression {
    n_expressions;
}
Neither else branches nor code generation are implemented yet.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
Since the next step for the pipa programming language is adding control
flow statements, we can benefit from having a block node to control
scope. Functions now have a block node as their body instead of a
vector, as you can see in the ast-dump:
FunctionDecl name='main'
└─ body:
   └─ Block
      └─ ReturnStmt
         └─ Literal type=i32 value='69'
This same node kind can be used for parsing if, for and while blocks.
I could have used ast_block_t as the function body, but instead I opted
to use an ast_node_t. This gives us the flexibility to support other
kinds of function bodies in the future, such as arrow functions if we
want them:
fn add(a: i32, b: i32): i32 => a + b;
Signed-off-by: Carlos Maniero <carlos@maniero.me>
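The design choice above, where a function body is a generic node rather than a concrete block type, is sketched below as a tagged union. Because `body` is an `ast_node_t *`, the same function node can hold a Block or a bare expression, such as an arrow-style body. The type layout and the `eval` helper are hypothetical simplifications, not the project's real `ast_node_t`.

```c
#include <stddef.h>

typedef enum { AST_BLOCK, AST_RETURN_STMT, AST_LITERAL, AST_FUNCTION_DECL } ast_kind_t;

typedef struct ast_node ast_node_t;

struct ast_node {
    ast_kind_t kind;
    union {
        struct { ast_node_t **stmts; size_t count; } block;
        struct { ast_node_t *value; } return_stmt;
        struct { long value; } literal;
        /* body may be any node kind: a Block today, maybe an expression later */
        struct { const char *name; ast_node_t *body; } function;
    } data;
};

/* Evaluates a constant function: a Block body yields the value of its
 * first return statement; any other body is evaluated directly. */
long eval(const ast_node_t *node)
{
    switch (node->kind) {
    case AST_LITERAL:
        return node->data.literal.value;
    case AST_RETURN_STMT:
        return eval(node->data.return_stmt.value);
    case AST_BLOCK:
        for (size_t i = 0; i < node->data.block.count; i++)
            if (node->data.block.stmts[i]->kind == AST_RETURN_STMT)
                return eval(node->data.block.stmts[i]);
        return 0;
    case AST_FUNCTION_DECL:
        return eval(node->data.function.body);
    }
    return 0;
}
```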
|
|
After this commit, this is a valid expression:
1 || 2 && 3 > 4 < 5 >= 6 <= 7 == 8 != 9
Signed-off-by: Carlos Maniero <carlos@maniero.me>
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
When assigning a variable or returning a value, the compiler now ensures
that the expression matches the expected type.
To make this possible, a %result_type% field was added to ast_node_t,
and this field is used whenever the comparison is made.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
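Below is one way a `result_type` could be derived for binary expressions and then checked against a declared or return type. The typing rules here are an assumption for illustration: arithmetic keeps i32, comparisons yield bool, && and || need bool operands. `binop_result_type` and `types_match` are hypothetical names.

```c
#include <stdbool.h>

typedef enum { TYPE_I32, TYPE_BOOL, TYPE_INVALID } type_t;
typedef enum { OP_ADD, OP_SUB, OP_MUL, OP_DIV,
               OP_EQ, OP_NE, OP_LT, OP_GT, OP_AND, OP_OR } binop_t;

/* Hypothetical rules for the result_type of `lhs op rhs`. */
type_t binop_result_type(binop_t op, type_t lhs, type_t rhs)
{
    if (lhs != rhs || lhs == TYPE_INVALID)
        return TYPE_INVALID;
    switch (op) {
    case OP_ADD: case OP_SUB: case OP_MUL: case OP_DIV:
        return lhs == TYPE_I32 ? TYPE_I32 : TYPE_INVALID;
    case OP_EQ: case OP_NE:
        return TYPE_BOOL;
    case OP_LT: case OP_GT:
        return lhs == TYPE_I32 ? TYPE_BOOL : TYPE_INVALID;
    case OP_AND: case OP_OR:
        return lhs == TYPE_BOOL ? TYPE_BOOL : TYPE_INVALID;
    }
    return TYPE_INVALID;
}

/* The check for `let x: T = expr;` and `return expr;`. */
bool types_match(type_t expected, type_t result_type)
{
    return result_type != TYPE_INVALID && expected == result_type;
}
```

With these rules, `let x: i32 = n == 11;` is rejected because the comparison's result_type is bool.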
|
|
This is a simple implementation of a general-purpose singly linked list.
Only the *prepend* function is implemented at the moment. We can
always revisit the code and implement missing functionality on demand.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
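A singly linked list whose only insertion is *prepend* can be sketched as below. Prepending is O(1) because it only rewires the head pointer. The names `list_t`, `list_prepend` and `list_destroy` are assumptions, and in this sketch the list does not own the data it points to.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct list_node {
    void *data;              /* not owned by the list */
    struct list_node *next;
} list_node_t;

typedef struct {
    list_node_t *head;
    size_t size;
} list_t;

/* Inserts data at the head in O(1). Returns false if malloc fails. */
bool list_prepend(list_t *list, void *data)
{
    list_node_t *node = malloc(sizeof *node);
    if (node == NULL)
        return false;
    node->data = data;
    node->next = list->head;
    list->head = node;
    list->size++;
    return true;
}

/* Frees every node; the data pointers are left to the caller. */
void list_destroy(list_t *list)
{
    list_node_t *node = list->head;
    while (node != NULL) {
        list_node_t *next = node->next;
        free(node);
        node = next;
    }
    list->head = NULL;
    list->size = 0;
}
```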
|
|
This commit introduces a new type for booleans. There is no code
generation for this type yet. The intention of this commit is to enable
flow control in the near future.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
The following logical operators were added to the lexer:
TOKEN_EQUAL ==
TOKEN_NOT !
TOKEN_NOT_EQUAL !=
TOKEN_GT >
TOKEN_GT_EQUAL >=
TOKEN_LT <
TOKEN_LT_EQUAL <=
TOKEN_AND &&
TOKEN_OR ||
Bitwise operators were also added:
TOKEN_BITWISE_AND &
TOKEN_BITWISE_OR |
TOKEN_BITWISE_SHIFT_LEFT <<
TOKEN_BITWISE_SHIFT_RIGHT >>
TOKEN_BITWISE_XOR ^
TOKEN_BITWISE_NOT ~
TOKEN_EQUAL '=' was renamed to TOKEN_ASSIGN, and TOKEN_EQUAL is now used
for the logical comparator '=='.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
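When the lexer recognizes these operators, the two-character forms have to be tried before their one-character prefixes (maximal munch). Otherwise `==` would be read as two TOKEN_ASSIGN tokens. The sketch below uses the token names listed above. The table-driven `lex_operator` function is an assumption, not the project's lexer.

```c
#include <stddef.h>
#include <string.h>

/* Subset of token kinds; the real lexer also has names, numbers, etc. */
typedef enum {
    TOKEN_ASSIGN, TOKEN_EQUAL, TOKEN_NOT, TOKEN_NOT_EQUAL,
    TOKEN_GT, TOKEN_GT_EQUAL, TOKEN_LT, TOKEN_LT_EQUAL,
    TOKEN_AND, TOKEN_OR,
    TOKEN_BITWISE_AND, TOKEN_BITWISE_OR, TOKEN_BITWISE_SHIFT_LEFT,
    TOKEN_BITWISE_SHIFT_RIGHT, TOKEN_BITWISE_XOR, TOKEN_BITWISE_NOT,
    TOKEN_UNKNOWN
} token_kind_t;

/* Two-character operators come first so "==" is never lexed as "=" "=". */
static const struct {
    const char *text;
    token_kind_t kind;
} operators[] = {
    {"==", TOKEN_EQUAL},    {"!=", TOKEN_NOT_EQUAL},
    {">=", TOKEN_GT_EQUAL}, {"<=", TOKEN_LT_EQUAL},
    {"&&", TOKEN_AND},      {"||", TOKEN_OR},
    {"<<", TOKEN_BITWISE_SHIFT_LEFT}, {">>", TOKEN_BITWISE_SHIFT_RIGHT},
    {"=", TOKEN_ASSIGN},    {"!", TOKEN_NOT},
    {">", TOKEN_GT},        {"<", TOKEN_LT},
    {"&", TOKEN_BITWISE_AND}, {"|", TOKEN_BITWISE_OR},
    {"^", TOKEN_BITWISE_XOR}, {"~", TOKEN_BITWISE_NOT},
};

/* Matches the longest operator at the start of src and stores its length
 * in *len; returns TOKEN_UNKNOWN with *len = 0 when nothing matches. */
token_kind_t lex_operator(const char *src, size_t *len)
{
    for (size_t i = 0; i < sizeof operators / sizeof operators[0]; i++) {
        size_t n = strlen(operators[i].text);
        if (strncmp(src, operators[i].text, n) == 0) {
            *len = n;
            return operators[i].kind;
        }
    }
    *len = 0;
    return TOKEN_UNKNOWN;
}
```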
|
|
In C, an unsuffixed integer literal such as 1 has type int, which is 32
bits wide on our targets, so arithmetic on it is done in 32 bits.
Unfortunately, this was causing incorrect values to be assigned to our
uint64_t variables, leading to unexpected behavior.
To resolve this issue, we have updated our code to explicitly set the
literal size using the "ULL" suffix (unsigned long long).
It's important to note that this implementation is limited to 64
levels of indentation. Beyond this point, we may encounter a 64-bit
overflow. However, at present, we don't anticipate the need to visualize
trees that exceed this depth. If this requirement arises in the future,
we can explore solutions such as dynamically allocating additional
64-bit words to accommodate deeper trees.
Overall, this change ensures that our code is functioning correctly and
improves the reliability of our codebase.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
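The shift bug and the 64-level limit described above can be seen in a bitmask sketch. Bit `level` of a `uint64_t` records whether that indentation column still needs a vertical bar. `1 << level` is an int shift, which is undefined behavior once `level` reaches 31 with a 32-bit int. `1ULL << level` stays valid up to level 63. The helper names are hypothetical, and the ast-dump code may organize its mask differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ast-dump helpers: bit `level` of `mask` is set while the
 * column at that depth still needs a vertical bar. level must be < 64. */
uint64_t mark_level(uint64_t mask, unsigned level)
{
    /* 1ULL is at least 64 bits wide; a plain 1 would be an int. */
    return mask | (1ULL << level);
}

uint64_t unmark_level(uint64_t mask, unsigned level)
{
    return mask & ~(1ULL << level);
}

bool level_is_marked(uint64_t mask, unsigned level)
{
    return (mask >> level) & 1ULL;
}
```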
|
|
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
Parsing can be a complex process, and it's not always easy to get a
clear picture of what's happening with the AST. This commit adds a new
feature to the CLI that allows us to pretty-print the AST (outputs to
stdout), making it easier to visualize the tree structure and understand
how the parser is working.
The new --ast-dump option generates a human-readable representation of
the AST, including node types, values, and child relationships. This
information can be invaluable for debugging and understanding the
parser's behavior.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Reviewed-by: Carlos Maniero <carlos@maniero.me>
|
|
Signed-off-by: Carlos Maniero <carlos@maniero.me>
Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
This commit introduces a few changes to the pipalang syntax. Now both
functions and variables require keywords to be defined.
before:
main(): i32 {
    a: i32 = 2;
    return a;
}
now:
fn main(): i32 {
    let a: i32 = 2;
    return a;
}
Signed-off-by: Carlos Maniero <carlos@maniero.me>
Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Munit uses the following format:
Error: file.c:line: error message
But this error format does not work well with Vim's :make (makeprg)
quickfix parsing.
This commit changes the error to the following format:
file.c:line: Error: error message
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
When looking ahead, there was no check for reaching EOF.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
I found it easier to understand the application entry point by calling
it main.c.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
When an error occurred during block parsing, the vector that stores the
nodes was destroyed but its nodes were not. This commit fixes this by
replacing %vector_destroy% with %ast_node_destroy_vector%.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
All parsers have been following the pattern below:
ast_node_t *node = ast_node_new();
ast_node_init_node_kind(node, ...args);
return node;
This brings an unnecessary distraction when reading. The pattern above
was replaced by:
return ast_node_new_node_kind(...args);
Signed-off-by: Carlos Maniero <carlos@maniero.me>
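The `ast_node_new_node_kind(...)` style can be sketched as below. Each constructor allocates and initializes in one call, and it returns NULL on failure so a parser can simply `return` its result. The node kinds, the union layout and the ownership rule (a return node takes ownership of its value) are assumptions for illustration.

```c
#include <stdlib.h>

typedef enum { AST_LITERAL, AST_RETURN_STMT } ast_kind_t;

typedef struct ast_node ast_node_t;
struct ast_node {
    ast_kind_t kind;
    union {
        long literal;             /* AST_LITERAL */
        ast_node_t *return_value; /* AST_RETURN_STMT, owned by the node */
    } data;
};

/* Frees a node and everything it owns; NULL is a no-op. */
void ast_node_destroy(ast_node_t *node)
{
    if (node == NULL)
        return;
    if (node->kind == AST_RETURN_STMT)
        ast_node_destroy(node->data.return_value);
    free(node);
}

ast_node_t *ast_node_new_literal(long value)
{
    ast_node_t *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->kind = AST_LITERAL;
    node->data.literal = value;
    return node;
}

/* Takes ownership of value; NULL in (a failed child) means NULL out. */
ast_node_t *ast_node_new_return(ast_node_t *value)
{
    if (value == NULL)
        return NULL;
    ast_node_t *node = malloc(sizeof *node);
    if (node == NULL) {
        ast_node_destroy(value); /* don't leak the child on failure */
        return NULL;
    }
    node->kind = AST_RETURN_STMT;
    node->data.return_value = value;
    return node;
}
```

A parser for `return 69;` would then reduce to a single `return ast_node_new_return(ast_node_new_literal(69));`.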
|
|
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
This commit makes the variable assignment parser allocate memory for the
node. It also moves the node initialization to ast.c to follow our
standard for node initialization.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
Since it is possible to look at a future token without consuming it, the
block parser could be split into smaller chunks of code.
There is a performance drawback, because the parser now makes multiple
lookups of the same token. However, IMO this is not a big concern
given the small computation required to get a token. It can also be
easily addressed by computing all tokens in advance.
Memory Leak:
During the refactor I found some extra memory leaks related to scopes
that were not released. So, rather than just printing a message, I
introduced an assert in scope.c to make sure developers get this
feedback ASAP, because our testing framework suppresses messages from
stderr when the test passes.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
Previously, during block declaration, the parser consumed the token,
which caused some parsers (such as return and variable declaration) not
to be self-contained and to depend on their caller to start parsing.
In this commit, I've refactored the parser to only look at upcoming
tokens using lookahead, and to delegate consumption to the child parser
functions. This results in more modular, self-contained parsers and
improves the overall maintainability and readability of the code.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
During the refactoring process, I identified a memory leak where the
return argument was allocated but not freed in case of an error.
It also introduces the concept of keyword tokens: return is now a
keyword, which simplifies the parser.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
In many situations, the parser is responsible for reserving memory for
nodes, particularly during function body parsing. This commit introduces
a new standard where parser functions not only allocate memory for
ast_nodes, but also return them. In case of a parser error, a NULL
pointer is returned.
This standard will be extended to other parsers in future commits,
ensuring consistency throughout the codebase.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
This also removes the identifier node since it was replaced by
variable.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
This commit introduces variable assignment, making it possible to
change a variable's value. Example:
myvar: i32 = 1;
myvar = 2;
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Co-authored-by: Carlos Maniero <carlos@maniero.me>
|
|
|
|
The only way to get the next token was by consuming it. As a result, our
parser was becoming hard to understand, since sometimes we just
want to look at the next token to decide what the next kind of
expression should be.
This commit introduces a new function that will help us improve our
parser implementation.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Reviewed-by: Carlos Maniero <carlos@maniero.me>
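Lookahead, together with the EOF guard that a later commit adds, can be sketched as below. `lexer_lookahead` returns the token `n` positions ahead without consuming it, and it returns an EOF token instead of reading past the end. `lexer_next` consumes. To keep the sketch short, it assumes the tokens were lexed in advance into an array; the names and that layout are hypothetical.

```c
#include <stddef.h>

typedef enum { TOKEN_EOF, TOKEN_NAME, TOKEN_NUMBER, TOKEN_ASSIGN, TOKEN_SEMICOLON } token_kind_t;

typedef struct {
    token_kind_t kind;
    const char *text;
} token_t;

typedef struct {
    const token_t *tokens; /* pre-lexed tokens (assumption for brevity) */
    size_t count;
    size_t pos;            /* index of the next token to consume */
} lexer_t;

/* Peeks n tokens ahead without consuming anything. Positions at or past
 * the end yield an EOF token rather than an out-of-bounds read. */
token_t lexer_lookahead(const lexer_t *lexer, size_t n)
{
    if (lexer->pos + n >= lexer->count) {
        token_t eof = { TOKEN_EOF, "" };
        return eof;
    }
    return lexer->tokens[lexer->pos + n];
}

/* Consumes and returns the next token; EOF is returned repeatedly. */
token_t lexer_next(lexer_t *lexer)
{
    token_t tok = lexer_lookahead(lexer, 0);
    if (tok.kind != TOKEN_EOF)
        lexer->pos++;
    return tok;
}
```

For `myvar = 2;`, a block parser can check `lexer_lookahead(&lexer, 1).kind == TOKEN_ASSIGN` and hand the whole statement to the assignment parser untouched.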
|
|
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
We were moving the stack data for a variable reference to another stack
position, ending up with an extra temporary stack slot holding the same
value.
// a: i32 = 1;
mov $1, -8(%rbp)
// b: i32 = a;
mov -8(%rbp), %rax
mov %rax, -24(%rbp)
mov -24(%rbp), %rax
mov %rax, -16(%rbp)
After this change, we won't create new temporary space on the stack if
we don't need it. The example below shows the result of the optimization:
// a: i32 = 1;
mov $1, -8(%rbp)
// b: i32 = a;
mov -8(%rbp), %rax
mov %rax, -16(%rbp)
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com>
|
|
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com>
|
|
Until now, every computation was pushed onto the stack, which creates
unnecessary stack manipulation and makes the generated code hard to read
and understand.
Now the latest computation is stored, and it can be either a literal or
a value in a register.
When it is a register, we may need to push the value onto the stack to
avoid data loss. When it is a literal, we can just move the value into
a register.
example/main.pipa before this commit:
.global _start
.text
_start:
push %rbp
mov %rsp, %rbp
mov $69, %rax # <- There is no reason to store data in rax
mov %rax, %rdi
mov $60, %rax
syscall
pop %rbp
example/main.pipa after this commit:
.global _start
.text
_start:
push %rbp
mov %rsp, %rbp
mov $69, %rdi # <- Fixed!
mov $60, %rax
syscall
pop %rbp
example/variables.pipa before this commit:
.global _start
.text
_start:
push %rbp
mov %rsp, %rbp
mov $12, %rax
mov %rax, -8(%rbp)
mov $32, %rax
mov %rax, -16(%rbp)
mov -8(%rbp), %rax
mov %rax, -32(%rbp)
mov -16(%rbp), %rax
mov -32(%rbp), %rcx
add %rcx, %rax
mov %rax, -32(%rbp)
mov $2, %rax
mov -32(%rbp), %rcx
mul %rcx
mov %rax, -24(%rbp)
mov $1, %rax
mov %rax, -40(%rbp)
mov $33, %rax
mov %rax, -48(%rbp)
mov -24(%rbp), %rax
mov -48(%rbp), %rcx
sub %rcx, %rax
mov -40(%rbp), %rcx
add %rcx, %rax
mov %rax, -32(%rbp)
mov $2, %rax
mov %rax, -40(%rbp)
mov -32(%rbp), %rax
mov -40(%rbp), %rcx
xor %rdx, %rdx
div %rcx
mov %rax, %rdi
mov $60, %rax
syscall
pop %rbp
example/variables.pipa after this commit:
.global _start
.text
_start:
push %rbp
mov %rsp, %rbp
mov $12, -8(%rbp)
mov $32, -16(%rbp)
mov -8(%rbp), %rax
mov %rax, -32(%rbp)
mov -16(%rbp), %rax
mov -32(%rbp), %rcx
add %rcx, %rax
mov %rax, -32(%rbp)
mov -32(%rbp), %rcx
mov $2, %rax
mul %rcx
mov %rax, -24(%rbp)
mov -24(%rbp), %rax
mov $33, %rcx
sub %rcx, %rax
mov $1, %rcx
add %rcx, %rax
mov %rax, -32(%rbp)
mov -32(%rbp), %rax
mov $2, %rcx
xor %rdx, %rdx
div %rcx
mov %rax, %rdi
mov $60, %rax
syscall
pop %rbp
8 fewer instructions!
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
This patch adds variable compilation and uses a scope (a stack of
maps) to look up identifiers.
Today we use a vector + ref_entry structs to implement the scope. The
ref_entry lacks memory management; we are still not sure who will own
the pointer.
We also want to replace the scope with a hashtable_t type as soon as we
have one.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com>
|
|
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
We are parsing variables/functions and checking whether they are defined
in scope. Otherwise, parsing fails with a nice message.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com>
|
|
After running `make CC=clang` we found the following problem:
error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
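The clang error above concerns declarations written with an empty parameter list. Before C23, `int f();` declares a function with unspecified parameters rather than a prototype, while `int f(void);` states that it takes no arguments. The fix is illustrated below with a hypothetical function name.

```c
/* Before (rejected with -Werror,-Wstrict-prototypes):
 *     int token_count();
 * Before C23, an empty list means "unspecified parameters". */

/* After: (void) makes this a prototype for a function taking no arguments. */
int token_count(void);

int token_count(void)
{
    return 0; /* placeholder body for the sketch */
}
```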
|
|
The refactoring also replaces an if statement with a switch statement.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Prior to this change, ast_variable_declaration_t and
ast_function_declaration_t used a string_view as an identifier. However,
to support scoped identifiers, it is more appropriate to use an
ast_identifier_t as a reference.
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Co-authored-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Before accepting an identifier, the parser should check whether that
identifier is available. This implementation makes that possible. Take
the following code example:
main(): i32 {
    return my_exit_code;
}
The parser must return an error informing that *my_exit_code* is not
defined in the example above. The ast scope is a support module for the
parser and the ast, simplifying identifier resolution.
When a curly bracket ({) is opened, *scope_enter()* is called, and when
it is closed (}), the innermost frame is popped with *scope_leave()*.
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
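The scope behavior described above can be sketched as a stack of frames. `scope_enter()` records where a new frame starts, and `scope_leave()` discards only the innermost frame's names. Lookups search from the innermost name outward. This is a minimal sketch with fixed capacities instead of the vector + ref_entry structures. Only `scope_enter` and `scope_leave` come from the text; `scope_define`, `scope_lookup` and the limits are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SCOPE_MAX_NAMES 64 /* arbitrary limits for the sketch */
#define SCOPE_MAX_DEPTH 16

typedef struct {
    const char *names[SCOPE_MAX_NAMES];  /* identifiers, innermost last */
    size_t frame_start[SCOPE_MAX_DEPTH]; /* first name index of each frame */
    size_t name_count;
    size_t depth;
} scope_t;

/* Called on '{': open a new, empty frame. */
void scope_enter(scope_t *s)
{
    assert(s->depth < SCOPE_MAX_DEPTH);
    s->frame_start[s->depth++] = s->name_count;
}

/* Called on '}': drop every name declared in the innermost frame. */
void scope_leave(scope_t *s)
{
    assert(s->depth > 0 && "scope_leave without scope_enter");
    s->name_count = s->frame_start[--s->depth];
}

bool scope_define(scope_t *s, const char *name)
{
    assert(s->depth > 0 && "define outside of any scope");
    if (s->name_count == SCOPE_MAX_NAMES)
        return false;
    s->names[s->name_count++] = name;
    return true;
}

/* Searches from the innermost declaration outward. */
bool scope_lookup(const scope_t *s, const char *name)
{
    for (size_t i = s->name_count; i-- > 0;)
        if (strcmp(s->names[i], name) == 0)
            return true;
    return false;
}
```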
|
|
I decided to remove the visitor pattern because C lacks support for
object-oriented programming. Now, to navigate through the AST, you
should use switch statements and recursion.
The code looks much simpler without the visitor pattern.
I have added the -Werror CFLAG, which turns the compiler's switch
coverage warning (-Wswitch) into an error, so a switch statement that
does not cover every value of an enum fails at compile time.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
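Traversal with a switch and recursion, as described above, looks like the example below. There is deliberately no `default:` label, so when compiled with `-Wswitch` (enabled by `-Wall` in GCC and on by default in Clang) plus `-Werror`, adding a new `ast_kind_t` value without a matching `case` stops the build. The node layout and `ast_count_nodes` are hypothetical simplifications.

```c
#include <stddef.h>

typedef enum { AST_LITERAL, AST_BINARY_OP, AST_RETURN_STMT } ast_kind_t;

typedef struct ast_node {
    ast_kind_t kind;
    struct ast_node *lhs, *rhs; /* children; RETURN_STMT uses lhs only */
} ast_node_t;

/* Counts the nodes of a tree by switching on the kind and recursing.
 * No default label: a missing case is reported by -Wswitch. */
size_t ast_count_nodes(const ast_node_t *node)
{
    switch (node->kind) {
    case AST_LITERAL:
        return 1;
    case AST_BINARY_OP:
        return 1 + ast_count_nodes(node->lhs) + ast_count_nodes(node->rhs);
    case AST_RETURN_STMT:
        return 1 + ast_count_nodes(node->lhs);
    }
    return 0; /* only reached for an out-of-range kind */
}
```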
|