Age | Commit message (Collapse) | Author |
|
Previously, during block declaration, the parser consumed the token
which caused some parsers (such as return and variable declaration) to
not be self-contained and to depend on the callee to start the parser.
In this commit, I've refactored the parser to only look for future
tokens using lookahead, and delegate the consumption to child parser
functions. This results in a more modular and self-contained parser that
improves the overall maintainability and readability of the code.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
During the refactoring process, I identified a memory leak where the
return argument was allocated but not freed in case of an error.
It also introduces the concept of keyword tokens. Where return is now a
keyword simplifying the parser.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
In many situations, the parser is responsible for reserving memory for
nodes, particularly during function body parsing. This commit introduces
a new standard where parser functions not only allocate memory for
ast_nodes, but also return them. In case of a parser error, a NULL
pointer is returned.
This standard will be extended to other parsers in future commits,
ensuring consistency throughout the codebase.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
This also removes the identifier node since it was replaced by
variable.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
This commit introduces variable assignment making it possible to
change a variable value. Example:
myvar: i32 = 1;
myvar = 2;
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Co-authored-by: Carlos Maniero <carlos@maniero.me>
|
|
|
|
The only way to get the next token was by consuming it. So then, our
parser starts to become hard to understand, once sometimes we just
want to take a look on the next token to understand what should be the
next kind of expression.
This commit introduces a new function that will help us to improve our
parser implementation.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Reviewed-by: Carlos Maniero <carlos@maniero.me>
|
|
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
We were moving the stack data for variable reference to another stack
position ending up with two pointer to the same value.
// a: i32 = 1;
mov $1, -8(%rbp)
// b: i32 = a;
mov -8(%rbp), %rax
mov %rax, -24(%rbp)
mov -24(%rbp), %rax
mov %rax, -16(%rbp)
After this changes, we wont create a new temp space on stack if we don't
need it. See bellow the example after the optimization:
// a: i32 = 1;
mov $1, -8(%rbp)
// b: i32 = a;
mov -8(%rbp), %rax
mov %rax, -16(%rbp)
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com>
|
|
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com>
|
|
Until now, every computation was pushed onto stack witch creates
unnecessary stack manipulation and makes the generated code hard to read
and understand.
Now, the latest computation is stored and could be either a literal or a
value on a register.
When it is a register we may need to push the value to stack to avoid
data loss. Now if it is a literal, hence, we can just set the value onto
a register.
example/main.pipa before this commit:
.global _start
.text
_start:
push %rbp
mov %rsp, %rbp
mov $69, %rax ; <- There is no reason to store data in rax
mov %rax, %rdi
mov $60, %rax
syscall
pop %rbp
example/main.pipa after this commit:
.global _start
.text
_start:
push %rbp
mov %rsp, %rbp
mov $69, %rdi ; <- Fixed!
mov $60, %rax
syscall
pop %rbp
example/variables.pipa before this commit:
.global _start
.text
_start:
push %rbp
mov %rsp, %rbp
mov $12, %rax
mov %rax, -8(%rbp)
mov $32, %rax
mov %rax, -16(%rbp)
mov -8(%rbp), %rax
mov %rax, -32(%rbp)
mov -16(%rbp), %rax
mov -32(%rbp), %rcx
add %rcx, %rax
mov %rax, -32(%rbp)
mov $2, %rax
mov -32(%rbp), %rcx
mul %rcx
mov %rax, -24(%rbp)
mov $1, %rax
mov %rax, -40(%rbp)
mov $33, %rax
mov %rax, -48(%rbp)
mov -24(%rbp), %rax
mov -48(%rbp), %rcx
sub %rcx, %rax
mov -40(%rbp), %rcx
add %rcx, %rax
mov %rax, -32(%rbp)
mov $2, %rax
mov %rax, -40(%rbp)
mov -32(%rbp), %rax
mov -40(%rbp), %rcx
xor %rdx, %rdx
div %rcx
mov %rax, %rdi
mov $60, %rax
syscall
pop %rbp
example/variables.pipa after this commit:
.global _start
.text
_start:
push %rbp
mov %rsp, %rbp
mov $12, -8(%rbp)
mov $32, -16(%rbp)
mov -8(%rbp), %rax
mov %rax, -32(%rbp)
mov -16(%rbp), %rax
mov -32(%rbp), %rcx
add %rcx, %rax
mov %rax, -32(%rbp)
mov -32(%rbp), %rcx
mov $2, %rax
mul %rcx
mov %rax, -24(%rbp)
mov -24(%rbp), %rax
mov $33, %rcx
sub %rcx, %rax
mov $1, %rcx
add %rcx, %rax
mov %rax, -32(%rbp)
mov -32(%rbp), %rax
mov $2, %rcx
xor %rdx, %rdx
div %rcx
mov %rax, %rdi
mov $60, %rax
syscall
pop %rbp
Less 8 instructions!
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
This patch adds the variable compilation and uses a scope (a stack of
map) to lookup for identities.
Today we use a vector + ref_entry structs in order to achieve the scope
implementation. The ref_entry lacks memory management, we are still no
sure who will be the owner of the pointer.
We also want to replace the scope a hashtable_t type as soon as we get
one.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com>
|
|
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
We are parsing variables/functions and checking if they are defined on
scope. Otherwise we fail the parsing with a nice message.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com>
|
|
After run `make CC=clang` we found the following problems:
error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
The refactoring also replace a if statement by switch statement.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Prior to this change, ast_variable_declaration_t and
ast_function_declaration_t used a string_view as an identifier. However,
to support scoped identifiers, it is more appropriate to use an
ast_identifier_t as a reference.
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Co-authored-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Before accepting an identifier, the parser should check if that
identifier will be available. With this implementation it will be
possible. Take the following code example:
main(): i32 {
return my_exit_code;
}
The parser must return an error informing that *my_exit_code* is not
defined in the example above. The ast scope is a support module for
parser and ast, simplifying identifier resolution.
Once a curly bracket ({) is open the *scope_enter()* is called and when
it is closed (}) we pop the entire stack with *scope_leave()*.
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
|
|
I decided to remove the visitor pattern due to the lack of Object
Oriented Programming support for C. Now if you want to navigate through
the AST, you should do it with switch case and recursion.
The code looks way simpler without visitor pattern.
I have added a CFLAG -Werror which validates if the switch statement
covers all branches for a given enum at compile time.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
The AST was using a string view to distinguish the operation kind. An
enum was created for this purpose simplifying code generation.
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Reviewed-by: Johnny Richard <johnny@johnnyrichar.com>
|
|
Since there is a guard-cause checking if the token is EOF there is no
need to check it again and again.
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Reviewed-by: Johnny Richard <johnny@johnnyrichar.com>
|
|
The +, -, *, and / tokens used to be TOKEN_OP, but the TOKEN_OP has been
removed and a token for each operation has been introduced. Python's
token names were followed: https://docs.python.org/3/library/token.html
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Reviewed-by: Johnny Richard <johnny@johnnyrichar.com>
|
|
We want to keep the code style consistent, this first commit adds a
.clang-format in order to "document" our style code.
This patch also adds a target *linter* to Makefile which will complain
if we have any style issue on test and src dirs.
I have run the follow command to create the .clang-format file:
$ clang-format -style=mozilla -dump-config > .clang-format
And I also made some adjusts to .clang-format changing the following
properties:
PointerAlignment: Right
ColumnLimit: 120
Commands executed to fix the current styling:
$ find . -name *.h | xargs clang-format -i
$ find . -name *.c | xargs clang-format -i
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
This commit adds support for variables and identifiers in the function
body of the parser, stored as a vector.
However, at this point, identifier resolution is not fully implemented,
and we currently accept identifiers without checking if they can be
resolved. This is a known limitation that will be addressed in a future
commit once hash-tables are added to the parser.
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Previously, we lacked a dynamic array for storing children elements in
our abstract syntax tree (AST). This commit introduces a new
implementation that dynamically adjusts its capacity as elements are
added, using a doubling strategy.
I considered two approaches for managing the vector's memory
allocation: allocating it on the heap, or providing a vector_init
function that allocates only the items array. Ultimately, I decided to
provide a vector_new function for instantiating the vector, as this
aligns with the expected usage pattern when there is a destroy function.
With this new implementation, we can efficiently store and manage AST
children, enabling more flexible and expressive tree structures.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
We decided for using push and pop to simplify the implementation, we
want to revisit the approach latter.
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Co-authored-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Co-authored-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
This patch implements the AST creation for arithmetic expressions.
NOTE:
The implementation works only for integer numbers.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Reviewed-by: Carlos Maniero <carlosmaniero@gmail.com>
|
|
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Since we want to extend our code to support multiple kind of expression
it does not make sense that the return statement always return a number.
For now on, return statement has an ast_node_t as argument, meaning that
it could be anything. The literal_node_t was also implemented in order
to keep the application behavior.
Following the C's calling convention the literal values are stored at
%eax and the return takes this argument to do anything it is needed.
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Previously, when an error occurred during parsing, the application
would exit, making it difficult to test the parser and limiting the
compiler's extensibility. This commit improves the parser's error
handling by allowing for continued execution after an error, enabling
easier testing and increased flexibility.
The parser is prepared to handle multiples errors, although the
current implementation always returns a single error, it may be
useful given multiples functions where we can show errors by context.
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Reviwed-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Previously, the abstract syntax tree (AST) used static types, meaning
that an ast_function_t would always have a ast_return_stmt_t as
its body. However, this assumption is not always true, as we may have
void functions that do not have a return statement. Additionally, the
ast_return_stmt_t always had a number associated with it, but this too
is not always the case.
To make this possible, I need to perform a few changes in the whole
project. One of the main changes is that there is no longer the
inheritance hack. That mechanism was replaced by composition and
pointers where required for recursive type reference.
It is important to mention that I decided to use union type to implement
the composition. There is two main advantages in this approach:
1. There is only one function to allocate memory for all kind of nodes.
2. There is no need to cast the data.
In summary, this commit introduces changes to support dynamic typing
in the AST, by replacing the inheritance hack with composition and
using union types to simplify memory allocation and type casting.
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Co-authored-by: Johnny Richard <johnny@johnnyrichard.com>
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Link: https://lists.sr.ht/~johnnyrichard/pipalang-devel/%3C20230418165847.3798-1-carlosmaniero%40gmail.com%3E
|
|
We want to tokenizer arithmetic expressions.
We are handling exceptional cases with UNKNOWN token.
Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com>
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
make the next token function small by extracting the
functions that make tokens.
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Extracted logic for skipping empty characters into a
separate function. No change in lexer behavior.
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
In the future we want to have the possibility of traverse the tree and
pretty print it or generate binary for other platform like LLVM or
transpile to C.
This solution also implements the gas assembly x86_64 Linux code
generation by using the visitor interface.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
This is an attempt of reducing code duplication.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
This change fixes the memory leak when token got created.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
We are allocating heap memory to create tokens value, we can minimize the
number of allocations if we start using string_view.
We have other problems, right now the tokens value ownership are quite
unclear once the AST nodes also share the memory allocation done by
token_get_next_token function.
It's important to clarify we also have memory leaks on the current
implementation. Hence, we are going to start using string_view to make
the memory management easier. :^)
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
In order to find out where a parsing error occurred, this patch
introduces the exactly location following the format 'file:row:col'.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
This is a very limited parser implementation which parses a single
function with return type i32 and body containing a return number statement.
The parser doesn't show the 'filepath:row:col' when it fails, a future
improvement would be display it to easy find where the compilation
problem is located.
The ast_nodes are taking the token.value ownership (which is a really
bad design since not all token.value ownership has been taken causing
memory leaking) but we never free them. For a future fix we could use a
string_view instead since we never change the original source code. The
string_view will also improve the performance a lot avoiding unnecessary
heap memory allocation.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
After enabling the warning flags, the compiler was firing the following
warnings:
warning: implicit declaration of function ‘strdup’; did you mean ‘strcmp’? [-Wimplicit-function-declaration]
token->value = strdup("(");
^~~~~~
strcmp
warning: assignment to ‘char *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
token->value = strdup("(");
^
In order to fix these warnings above, I have decided to replace *strdup*
and *strndup* by *strcpy* and *strncpy* functions.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
We want to have different folders for src and objs files.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|