Until now, the code below was not valid for pipac:
fn main(): i32 {
    return fib(13);
}

fn fib(n: i32): i32 {
    if n <= 1 {
        return n;
    }
    return fib(n - 1) + fib(n - 2);
}
Pipa's parser added a function to the scope only after the function was
fully parsed, which meant that fib's self-reference did not work.
Also, for the same scope reason, functions had to be declared before
they were called, so the main function had to be defined after fib.
How does it work now?
When a TOKEN_NAME is not found in the scope, the parser does not return
an error. Instead, it creates an unknown token as a placeholder and
stores the node reference along with the token it was trying to parse.
During type checks, if the parser finds an unknown node, it does not
return an error either. It stores the expected type in that node.
After the NS is fully parsed, those unknown nodes are reevaluated. The
lexer is moved back to each node's token position and the TOKEN_NAME is
parsed again.
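The two-pass resolution described above can be sketched as follows. This is a minimal illustration, not pipac's parser. The names node_t, namespace_t, parse_name, declare_function and resolve_pending are all hypothetical. Names are kept as strings rather than rewinding the lexer to a saved token position, and the expected-type bookkeeping from type checking is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model: a name node is either resolved to a
   function or is an UNKNOWN placeholder waiting for the second pass. */
typedef enum { NODE_UNKNOWN, NODE_FUNCTION_REF } node_kind_t;

typedef struct {
  node_kind_t kind;
  const char *name;   /* stands in for the saved token position */
  int function_index; /* resolved function, -1 while unknown */
} node_t;

#define NS_CAPACITY 16

typedef struct {
  const char *functions[NS_CAPACITY]; /* functions already in scope */
  size_t function_count;
  node_t *pending[NS_CAPACITY];       /* placeholders to reevaluate */
  size_t pending_count;
} namespace_t;

static int scope_lookup(const namespace_t *ns, const char *name) {
  for (size_t i = 0; i < ns->function_count; i++)
    if (strcmp(ns->functions[i], name) == 0)
      return (int)i;
  return -1;
}

/* Called once a function is fully parsed, as the parser does. */
static int declare_function(namespace_t *ns, const char *name) {
  if (ns->function_count == NS_CAPACITY)
    return -1;
  ns->functions[ns->function_count++] = name;
  return 0;
}

/* First pass: resolve now if possible, otherwise keep a placeholder
   instead of failing. Returns -1 only when the sketch runs out of room. */
static int parse_name(namespace_t *ns, node_t *node, const char *name) {
  node->name = name;
  node->function_index = scope_lookup(ns, name);
  node->kind = node->function_index >= 0 ? NODE_FUNCTION_REF : NODE_UNKNOWN;
  if (node->kind == NODE_UNKNOWN) {
    if (ns->pending_count == NS_CAPACITY)
      return -1;
    ns->pending[ns->pending_count++] = node;
  }
  return 0;
}

/* Second pass, after the whole namespace is parsed. Returns how many
   names are still undefined; only those are real errors. */
static size_t resolve_pending(namespace_t *ns) {
  size_t unresolved = 0;
  for (size_t i = 0; i < ns->pending_count; i++) {
    node_t *node = ns->pending[i];
    node->function_index = scope_lookup(ns, node->name);
    if (node->function_index < 0) {
      unresolved++;
      continue;
    }
    node->kind = NODE_FUNCTION_REF;
  }
  ns->pending_count = 0;
  return unresolved;
}
```

The sketch mirrors the fib example. The calls to fib inside main and inside fib are both parsed before fib is declared, so each one becomes a placeholder. resolve_pending then turns both into references. Any name still unresolved at that point is a genuine error.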
PS: There is a typo in the unknown token. It will be addressed in
another commit, since this issue was not introduced by this change.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
For now, function calls follow the C calling convention, which means
they use the following registers to pass function arguments:
rdi, rsi, rdx, rcx, r8, r9
If a function has more than 6 parameters, compilation will fail.
To support functions with more than 6 parameters, we will need to pass
the extra arguments on the stack.
Naming:
parameters: function parameters are the variables a function receives.
arguments: Arguments are the values passed to a function when calling
it.
Calling mechanism:
When a function is called, each expression passed as an argument is
evaluated. Its result is then stored in the register that represents its
argument position: the first argument is stored in rdi, the second in
rsi, and so on.
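Here is a minimal sketch of the calling side. It assumes each evaluated argument ends up in %rax before being moved to its argument register; the real generator may move literals straight into the register. arg_register and emit_argument_move are hypothetical helpers.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Integer argument registers, in the order listed above. */
static const char *const ARG_REGISTERS[] = {"%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"};
#define ARG_REGISTER_COUNT (sizeof ARG_REGISTERS / sizeof ARG_REGISTERS[0])

/* Register for the 0-based argument `position`, or NULL once registers
   run out and the stack would be needed (not supported yet). */
static const char *arg_register(size_t position) {
  return position < ARG_REGISTER_COUNT ? ARG_REGISTERS[position] : NULL;
}

/* Hypothetical emitter: assumes the evaluated argument is in %rax and
   writes the move into its argument register. Returns -1 for a 7th+
   argument (the compile failure described above) or a too-small buffer. */
static int emit_argument_move(char *out, size_t cap, size_t position) {
  const char *reg = arg_register(position);
  if (reg == NULL)
    return -1;
  int written = snprintf(out, cap, "mov %%rax, %s", reg);
  return (written < 0 || (size_t)written >= cap) ? -1 : 0;
}
```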
Receiving mechanism:
When a function starts, the first thing it does is store all the
argument registers on the stack: rdi is stored at -8(%rbp), rsi at
-16(%rbp), and so on. A ref_entry is then created, relating each
parameter to its stack offset.
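The receiving side can be sketched the same way. spill_parameters, find_parameter_offset and this ref_entry_t layout are assumptions made for illustration. The sketch also assumes the prologue has already reserved the stack space (for example by adjusting %rsp); it does not emit that step itself.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

static const char *const PARAM_REGISTERS[] = {"%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"};
#define MAX_REGISTER_PARAMS 6

/* Hypothetical ref entry relating a parameter to its stack slot. */
typedef struct {
  const char *name;
  int stack_offset; /* relative to %rbp */
} ref_entry_t;

/* Writes one store per parameter (-8, -16, ... from %rbp) into `out` and
   fills one ref entry each. Returns -1 for more than six parameters or
   when `out` is too small. */
static int spill_parameters(const char *const *names, size_t count,
                            ref_entry_t *entries, char *out, size_t cap) {
  if (count > MAX_REGISTER_PARAMS || cap == 0)
    return -1;
  size_t used = 0;
  out[0] = '\0';
  for (size_t i = 0; i < count; i++) {
    int offset = -8 * (int)(i + 1);
    int written = snprintf(out + used, cap - used, "mov %s, %d(%%rbp)\n",
                           PARAM_REGISTERS[i], offset);
    if (written < 0 || (size_t)written >= cap - used)
      return -1;
    used += (size_t)written;
    entries[i].name = names[i];
    entries[i].stack_offset = offset;
  }
  return 0;
}

/* Returns the parameter's offset, or 0 if unknown (slots start at -8,
   so 0 never names a real slot). */
static int find_parameter_offset(const ref_entry_t *entries, size_t count, const char *name) {
  for (size_t i = 0; i < count; i++)
    if (strcmp(entries[i].name, name) == 0)
      return entries[i].stack_offset;
  return 0;
}
```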
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
This is an initial commit that enables function calls. At this point,
only functions without parameters work.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
So far we have always parsed a single function. Since we want to support
multiple functions in the near future, this patch introduces a namespace
that represents an entire file.
To ensure a function is defined inside a namespace, a helper function
was created. Today our ast_node structure is highly exposed, which is
something Johnny and I have been discussing. This is a first step toward
protecting the code generation from our AST.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
Before this commit, function declarations made an exit syscall to end
the program. This is a fair approach considering that all our examples
have just a main function, but it won't work if the namespace has more
than one function.
The return statement now always sets the return value in RAX and jumps
to the function's return label.
The function return label restores RBP and uses the RET instruction to
jump back to the caller's next instruction.
Function labels are kept, which means that a function called my_fn will
have the assembly label my_fn. This lets functions interoperate with
other languages.
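A sketch of the resulting layout, generated as text. emit_function, emit_return and the .L<name>_return label naming are hypothetical. Only two ideas come from the commit: the function keeps its own name as its label, and a shared return label pops RBP and executes RET. The .global directive is added here because a label must be visible to the linker for other languages to call it.

```c
#include <stdio.h>
#include <string.h>

/* `return <literal>`: set RAX, then jump to the shared return label.
   Returns the number of characters written, or -1 if `out` is too small. */
static int emit_return(char *out, size_t cap, const char *fn, long value) {
  int n = snprintf(out, cap, "    mov $%ld, %%rax\n    jmp .L%s_return\n", value, fn);
  return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Whole function: prologue, body, then the return label that restores
   RBP and uses RET to resume the caller at its next instruction. */
static int emit_function(char *out, size_t cap, const char *fn, const char *body) {
  int n = snprintf(out, cap,
                   ".global %s\n"
                   "%s:\n"
                   "    push %%rbp\n"
                   "    mov %%rsp, %%rbp\n"
                   "%s"
                   ".L%s_return:\n"
                   "    pop %%rbp\n"
                   "    ret\n",
                   fn, fn, body, fn);
  return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```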
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
This commit abstracts the complexity of an entry, so users of the ref
map do not need to understand how it is implemented.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
When the assigned value is a literal, it simply stores zero or one in
the variable's stack location. If the value is an expression, it
compiles the expression and stores zero or one based on the
expression's result.
|
|
If statements are now complete! The function
%gas_assembly_generator_compile_condition% is generic and will be used
for other flow-control statements as well. Its only requirement is two
labels: one to jump to when the condition is true, and another to jump
to when it is false.
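Here is a minimal sketch of such a generic condition compiler. It assumes both operands have already been evaluated into %rax (left) and %rcx (right). It also assumes signed integer comparisons, which is why it uses jl/jg rather than jb/ja. cmp_kind_t and compile_condition are hypothetical names, not the real signature of %gas_assembly_generator_compile_condition%.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical comparison kinds; the real AST has its own enum. */
typedef enum { CMP_EQ, CMP_NE, CMP_LT, CMP_LE, CMP_GT, CMP_GE } cmp_kind_t;

/* Signed conditional jump taken when the comparison holds. */
static const char *jump_if_true(cmp_kind_t kind) {
  switch (kind) {
  case CMP_EQ: return "je";
  case CMP_NE: return "jne";
  case CMP_LT: return "jl";
  case CMP_LE: return "jle";
  case CMP_GT: return "jg";
  case CMP_GE: return "jge";
  }
  return NULL;
}

/* `cmp %rcx, %rax` sets flags from rax - rcx (left - right). Then jump to
   `true_label`, or fall through to an unconditional jump to `false_label`. */
static int compile_condition(char *out, size_t cap, cmp_kind_t kind,
                             const char *true_label, const char *false_label) {
  const char *jump = jump_if_true(kind);
  if (jump == NULL)
    return -1;
  int n = snprintf(out, cap, "    cmp %%rcx, %%rax\n    %s %s\n    jmp %s\n",
                   jump, true_label, false_label);
  return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

An if statement would then emit the true label right before its block and the false label right after it.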
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
If statements now work. The only exception is the logical operators
|| and &&, which will be addressed in a later commit. Checks tested:
fn main(): i32 {
    let n: i32 = 11;
    if (n == 11) {
        if n != 12 {
            if n < 12 {
                if n <= 11 {
                    if n > 10 {
                        if n >= 11 {
                            return 42;
                        }
                    }
                }
            }
        }
    }
    return n;
}
To compile && and ||, a precedence issue must be addressed. They must
bind more loosely than the comparison operators (the lowest precedence)
so that they end up at the top of the tree, which is not working yet:
1 == 2 || 3 != 2
In the example above, the || should be the root of the tree.
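One common way to get that tree shape is precedence climbing, sketched below over pre-split tokens. The parser_t type, the numeric precedence levels and the parenthesized string output are illustrative assumptions. Pipa's parser builds AST nodes instead of strings.

```c
#include <stdio.h>
#include <string.h>

/* Lower number = binds more loosely = sits higher in the tree. */
static int precedence(const char *op) {
  static const char *const comparisons[] = {"==", "!=", "<", "<=", ">", ">="};
  if (strcmp(op, "||") == 0) return 1;
  if (strcmp(op, "&&") == 0) return 2;
  for (size_t i = 0; i < sizeof comparisons / sizeof comparisons[0]; i++)
    if (strcmp(op, comparisons[i]) == 0) return 3;
  return 0; /* not a binary operator */
}

typedef struct {
  const char *const *tokens; /* pre-split operands and operators */
  size_t count;
  size_t pos;
} parser_t;

/* Precedence climbing: consumes operators binding at least as tightly as
   `min_prec` and writes a fully parenthesized tree into `out`.
   Returns -1 on a missing operand or when a toy buffer is too small. */
static int parse_expr(parser_t *p, int min_prec, char *out, size_t cap) {
  char lhs[256];
  if (p->pos >= p->count)
    return -1;
  int n = snprintf(lhs, sizeof lhs, "%s", p->tokens[p->pos++]);
  if (n < 0 || (size_t)n >= sizeof lhs)
    return -1;
  while (p->pos < p->count) {
    const char *op = p->tokens[p->pos];
    int prec = precedence(op);
    if (prec == 0 || prec < min_prec)
      break;
    p->pos++;
    char rhs[256], node[256];
    if (parse_expr(p, prec + 1, rhs, sizeof rhs) != 0) /* +1: left-associative */
      return -1;
    n = snprintf(node, sizeof node, "(%s %s %s)", lhs, op, rhs);
    if (n < 0 || (size_t)n >= sizeof node)
      return -1;
    memcpy(lhs, node, sizeof lhs);
  }
  n = snprintf(out, cap, "%s", lhs);
  return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

With these levels, 1 == 2 || 3 != 2 parses as ((1 == 2) || (3 != 2)), and a || b && c parses as (a || (b && c)).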
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
This commit parses an if statement following the grammar below:
if boolean_expression {
    n_expressions;
}
Neither else nor code generation is implemented yet.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
Since the next step for the pipa programming language is control-flow
statements, we can benefit from having a block node to control scope.
Functions now have a block node as their body instead of a vector, as
you can see in the ast-dump:
FunctionDecl name='main'
└─ body:
   └─ Block
      └─ ReturnStmt
         └─ Literal type=i32 value='69'
The same node kind can be used for parsing if, for, and while blocks.
I could have used ast_block_t as the function body, but I opted for an
ast_node_t instead. This gives us the flexibility to support other kinds
of function bodies in the future, such as arrow functions:
fn add(a: i32, b: i32): i32 => a + b;
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
This also removes the identifier node, since it was replaced by the
variable node.
Signed-off-by: Carlos Maniero <carlos@maniero.me>
|
|
This commit introduces variable assignment, making it possible to
change a variable's value. Example:
myvar: i32 = 1;
myvar = 2;
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Co-authored-by: Carlos Maniero <carlos@maniero.me>
|
|
When a variable was initialized from another variable, we copied the
value through an extra temporary stack slot before storing it in its
final position, ending up with redundant copies of the same value:
// a: i32 = 1;
mov $1, -8(%rbp)
// b: i32 = a;
mov -8(%rbp), %rax
mov %rax, -24(%rbp)
mov -24(%rbp), %rax
mov %rax, -16(%rbp)
After this change, we no longer create a temporary stack slot when we
don't need one. Below is the same example after the optimization:
// a: i32 = 1;
mov $1, -8(%rbp)
// b: i32 = a;
mov -8(%rbp), %rax
mov %rax, -16(%rbp)
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com>
|
|
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com>
|
|
Until now, every computation was pushed onto the stack, which creates
unnecessary stack manipulation and makes the generated code hard to read
and understand.
Now the latest computation is tracked, and it can be either a literal or
a value in a register.
When it is a register, we may need to push the value to the stack to
avoid data loss. When it is a literal, we can simply move the value
into the target register.
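Here is a sketch of tracking the latest computation as either a literal or a register value. value_t and emit_move_to are hypothetical. The case where a register value must first be pushed to avoid data loss is not covered.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record of where the latest computation lives. */
typedef enum { VALUE_LITERAL, VALUE_REGISTER } value_kind_t;

typedef struct {
  value_kind_t kind;
  long literal;    /* valid when kind == VALUE_LITERAL */
  const char *reg; /* valid when kind == VALUE_REGISTER, e.g. "%rax" */
} value_t;

/* Moves the latest value into `dest`. A literal becomes an immediate
   operand directly, with no scratch register; a register value is moved
   only if it is not already in `dest`. Returns the number of characters
   written (0 when nothing is needed), or -1 if `out` is too small. */
static int emit_move_to(char *out, size_t cap, value_t value, const char *dest) {
  int n;
  if (cap == 0)
    return -1;
  if (value.kind == VALUE_LITERAL)
    n = snprintf(out, cap, "mov $%ld, %s\n", value.literal, dest);
  else if (strcmp(value.reg, dest) != 0)
    n = snprintf(out, cap, "mov %s, %s\n", value.reg, dest);
  else {
    out[0] = '\0';
    n = 0;
  }
  return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

Treating the literal as an immediate is what collapses mov $69, %rax / mov %rax, %rdi into a single mov $69, %rdi in the main.pipa example below.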
example/main.pipa before this commit:
.global _start
.text
_start:
push %rbp
mov %rsp, %rbp
mov $69, %rax ; <- There is no reason to store data in rax
mov %rax, %rdi
mov $60, %rax
syscall
pop %rbp
example/main.pipa after this commit:
.global _start
.text
_start:
push %rbp
mov %rsp, %rbp
mov $69, %rdi ; <- Fixed!
mov $60, %rax
syscall
pop %rbp
example/variables.pipa before this commit:
.global _start
.text
_start:
push %rbp
mov %rsp, %rbp
mov $12, %rax
mov %rax, -8(%rbp)
mov $32, %rax
mov %rax, -16(%rbp)
mov -8(%rbp), %rax
mov %rax, -32(%rbp)
mov -16(%rbp), %rax
mov -32(%rbp), %rcx
add %rcx, %rax
mov %rax, -32(%rbp)
mov $2, %rax
mov -32(%rbp), %rcx
mul %rcx
mov %rax, -24(%rbp)
mov $1, %rax
mov %rax, -40(%rbp)
mov $33, %rax
mov %rax, -48(%rbp)
mov -24(%rbp), %rax
mov -48(%rbp), %rcx
sub %rcx, %rax
mov -40(%rbp), %rcx
add %rcx, %rax
mov %rax, -32(%rbp)
mov $2, %rax
mov %rax, -40(%rbp)
mov -32(%rbp), %rax
mov -40(%rbp), %rcx
xor %rdx, %rdx
div %rcx
mov %rax, %rdi
mov $60, %rax
syscall
pop %rbp
example/variables.pipa after this commit:
.global _start
.text
_start:
push %rbp
mov %rsp, %rbp
mov $12, -8(%rbp)
mov $32, -16(%rbp)
mov -8(%rbp), %rax
mov %rax, -32(%rbp)
mov -16(%rbp), %rax
mov -32(%rbp), %rcx
add %rcx, %rax
mov %rax, -32(%rbp)
mov -32(%rbp), %rcx
mov $2, %rax
mul %rcx
mov %rax, -24(%rbp)
mov -24(%rbp), %rax
mov $33, %rcx
sub %rcx, %rax
mov $1, %rcx
add %rcx, %rax
mov %rax, -32(%rbp)
mov -32(%rbp), %rax
mov $2, %rcx
xor %rdx, %rdx
div %rcx
mov %rax, %rdi
mov $60, %rax
syscall
pop %rbp
8 fewer instructions!
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
This patch adds variable compilation and uses a scope (a stack of maps)
to look up identifiers.
Today we use a vector + ref_entry structs to implement the scope. The
ref_entry lacks memory management; we are still not sure who will own
the pointer.
We also want to replace the scope's storage with a hashtable_t type as
soon as we have one.
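Here is a minimal sketch of a scope as a stack of maps. A fixed-size array of ref entries stands in for each map until a hashtable_t exists. All names and capacities here are assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical ref entry: an identifier and its stack offset. */
typedef struct {
  const char *name;
  int stack_offset;
} ref_entry_t;

#define MAX_FRAMES 8
#define MAX_ENTRIES 16

/* One "map" per lexical level; a linear array stands in for a hash table. */
typedef struct {
  ref_entry_t entries[MAX_ENTRIES];
  size_t count;
} scope_frame_t;

typedef struct {
  scope_frame_t frames[MAX_FRAMES];
  size_t depth; /* number of frames currently pushed */
} scope_t;

static int scope_push(scope_t *scope) {
  if (scope->depth == MAX_FRAMES)
    return -1;
  scope->frames[scope->depth++].count = 0;
  return 0;
}

static void scope_pop(scope_t *scope) {
  if (scope->depth > 0)
    scope->depth--;
}

/* Adds an identifier to the innermost frame. */
static int scope_define(scope_t *scope, const char *name, int stack_offset) {
  if (scope->depth == 0)
    return -1;
  scope_frame_t *frame = &scope->frames[scope->depth - 1];
  if (frame->count == MAX_ENTRIES)
    return -1;
  frame->entries[frame->count++] = (ref_entry_t){name, stack_offset};
  return 0;
}

/* Searches from the innermost frame outward, so inner names shadow outer
   ones. Returns NULL when the identifier is not in scope. */
static const ref_entry_t *scope_lookup(const scope_t *scope, const char *name) {
  for (size_t f = scope->depth; f-- > 0;) {
    const scope_frame_t *frame = &scope->frames[f];
    for (size_t i = frame->count; i-- > 0;)
      if (strcmp(frame->entries[i].name, name) == 0)
        return &frame->entries[i];
  }
  return NULL;
}
```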
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com>
|
|
We now parse variables/functions and check whether they are defined in
the scope. If they are not, parsing fails with a helpful error message.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com>
|
|
Prior to this change, ast_variable_declaration_t and
ast_function_declaration_t used a string_view as an identifier. However,
to support scoped identifiers, it is more appropriate to use an
ast_identifier_t as a reference.
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Co-authored-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
I decided to remove the visitor pattern due to the lack of
object-oriented programming support in C. Now, if you want to navigate
through the AST, you should do it with switch statements and recursion.
The code looks much simpler without the visitor pattern.
I have added the -Werror CFLAG, which turns compiler warnings into
errors. Together with the compiler's -Wswitch check (enabled, e.g., via
-Wall), a switch statement that does not cover every value of an enum
now fails at compile time.
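Here is a sketch of the switch-and-recursion style on a toy tree. The node kinds and eval are hypothetical. The point is that no switch has a default label, so -Wswitch (made fatal by -Werror) catches any unhandled enum value.

```c
#include <stdlib.h>

/* Hypothetical minimal AST. */
typedef enum { AST_LITERAL, AST_BINARY_OP } ast_kind_t;
typedef enum { OP_ADD, OP_SUB, OP_MUL, OP_DIV } ast_op_t;

typedef struct ast_node ast_node_t;
struct ast_node {
  ast_kind_t kind;
  union {
    long literal;
    struct {
      ast_op_t op;
      const ast_node_t *lhs, *rhs;
    } binary;
  } data;
};

/* Deliberately no `default:` labels: adding an enum value without
   handling it here makes -Wswitch fire, and -Werror fails the build. */
static long apply(ast_op_t op, long lhs, long rhs) {
  switch (op) {
  case OP_ADD: return lhs + rhs;
  case OP_SUB: return lhs - rhs;
  case OP_MUL: return lhs * rhs;
  case OP_DIV: return lhs / rhs; /* division by zero is not checked */
  }
  abort(); /* unreachable for valid enum values */
}

/* Walks the tree with a switch on the node kind and plain recursion. */
static long eval(const ast_node_t *node) {
  switch (node->kind) {
  case AST_LITERAL:
    return node->data.literal;
  case AST_BINARY_OP:
    return apply(node->data.binary.op, eval(node->data.binary.lhs),
                 eval(node->data.binary.rhs));
  }
  abort();
}
```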
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
The AST was using a string view to distinguish the operation kind. An
enum was created for this purpose, simplifying code generation.
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Reviewed-by: Johnny Richard <johnny@johnnyrichar.com>
|
|
We want to keep the code style consistent, so this first commit adds a
.clang-format file to "document" our code style.
This patch also adds a *linter* target to the Makefile, which complains
about any style issue in the test and src dirs.
I ran the following command to create the .clang-format file:
$ clang-format -style=mozilla -dump-config > .clang-format
I also adjusted the following properties in .clang-format:
PointerAlignment: Right
ColumnLimit: 120
Commands executed to fix the current styling:
$ find . -name '*.h' | xargs clang-format -i
$ find . -name '*.c' | xargs clang-format -i
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
This commit adds parser support for variables and identifiers in the
function body, which is stored as a vector.
However, at this point, identifier resolution is not fully implemented,
and we currently accept identifiers without checking if they can be
resolved. This is a known limitation that will be addressed in a future
commit once hash-tables are added to the parser.
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
We decided to use push and pop to simplify the implementation; we want
to revisit this approach later.
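Here is a minimal sketch of push/pop expression compilation on a toy tree of literals with + and -. The left result is pushed, the right side is evaluated into %rax, and the left value is popped back before the operation. Moving the right value into %rcx first keeps the operand order correct for sub. expr_t, asm_buf_t and compile_expr are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical expression tree: literals plus '+' and '-'. */
typedef struct expr expr_t;
struct expr {
  char op;    /* 0 for a literal, otherwise '+' or '-' */
  long value; /* literal value */
  const expr_t *lhs, *rhs;
};

typedef struct {
  char text[512];
  size_t len;
  int overflow; /* set once output no longer fits */
} asm_buf_t;

static void emit(asm_buf_t *buf, const char *fmt, ...) {
  if (buf->overflow)
    return;
  va_list args;
  va_start(args, fmt);
  int n = vsnprintf(buf->text + buf->len, sizeof buf->text - buf->len, fmt, args);
  va_end(args);
  if (n < 0 || (size_t)n >= sizeof buf->text - buf->len)
    buf->overflow = 1;
  else
    buf->len += (size_t)n;
}

/* Leaves the value of `e` in %rax. The left result is saved on the stack
   so evaluating the right side cannot clobber it. */
static void compile_expr(asm_buf_t *buf, const expr_t *e) {
  if (e->op == 0) {
    emit(buf, "mov $%ld, %%rax\n", e->value);
    return;
  }
  compile_expr(buf, e->lhs);
  emit(buf, "push %%rax\n");
  compile_expr(buf, e->rhs);
  emit(buf, "mov %%rax, %%rcx\n");
  emit(buf, "pop %%rax\n");
  emit(buf, "%s %%rcx, %%rax\n", e->op == '+' ? "add" : "sub"); /* rax = lhs op rhs */
}
```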
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Co-authored-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Since we want to extend our code to support multiple kinds of
expressions, it does not make sense for the return statement to always
return a number.
From now on, the return statement has an ast_node_t as its argument,
meaning that it could be anything. The literal_node_t was also
implemented to preserve the application's behavior.
Following the C calling convention, literal values are stored in %eax,
and the return statement takes the value from there.
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
Previously, the abstract syntax tree (AST) used static types, meaning
that an ast_function_t would always have an ast_return_stmt_t as
its body. However, this assumption is not always true, as we may have
void functions that do not have a return statement. Additionally, the
ast_return_stmt_t always had a number associated with it, but this too
is not always the case.
To make this possible, I had to make a few changes across the whole
project. One of the main changes is that there is no longer an
inheritance hack. That mechanism was replaced by composition, with
pointers where required for recursive type references.
It is important to mention that I decided to use a union type to
implement the composition. There are two main advantages to this
approach:
1. There is only one function to allocate memory for all kinds of nodes.
2. There is no need to cast the data.
In summary, this commit introduces changes to support dynamic typing
in the AST, by replacing the inheritance hack with composition and
using union types to simplify memory allocation and type casting.
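Here is a minimal sketch of union-based composition with a single allocator. The node kinds and the ast_node_new/ast_node_free names are assumptions for illustration, not the project's actual definitions.

```c
#include <stdlib.h>

typedef enum { AST_FUNCTION, AST_RETURN_STMT, AST_LITERAL } ast_kind_t;

typedef struct ast_node ast_node_t;

/* Composition: variants point to other nodes where recursion is needed. */
typedef struct { const char *name; ast_node_t *body; } ast_function_t;
typedef struct { ast_node_t *argument; } ast_return_stmt_t; /* may be NULL */
typedef struct { long value; } ast_literal_t;

struct ast_node {
  ast_kind_t kind;
  union {
    ast_function_t function;
    ast_return_stmt_t return_stmt;
    ast_literal_t literal;
  } data;
};

/* One allocator for every node kind: the union is sized for the largest
   variant and calloc zero-fills it. Returns NULL on allocation failure. */
static ast_node_t *ast_node_new(ast_kind_t kind) {
  ast_node_t *node = calloc(1, sizeof *node);
  if (node != NULL)
    node->kind = kind;
  return node;
}

/* Frees a node together with the children it owns. */
static void ast_node_free(ast_node_t *node) {
  if (node == NULL)
    return;
  switch (node->kind) {
  case AST_FUNCTION: ast_node_free(node->data.function.body); break;
  case AST_RETURN_STMT: ast_node_free(node->data.return_stmt.argument); break;
  case AST_LITERAL: break;
  }
  free(node);
}
```

Accessing node->data.function or node->data.literal directly is what removes the need for casts.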
Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
|
|
In the future, we want to be able to traverse the tree to pretty-print
it, generate code for other targets such as LLVM, or transpile to C.
This solution also implements gas assembly code generation for x86_64
Linux using the visitor interface.
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
|