pipac.git - Pipa programming language

Age	Commit message (Collapse)	Author
2023-05-18	util: Create hashmap data structure with FNV-1a 32-bit hashingHEAD master	Johnny Richard
	This path implements a hashmap data structure using the FNV-1a hashing function of 32 bits. However, there are a few limitations to consider: a) The table does not automatically expand when it becomes too large. b) The capacity of the hashmap must be a power of 2 to optimize performance, as we utilize bitwise shifting instead of modulus calculations. c) Currently, there are no functions available to destroy hashmap entries. Collisions are handled using a linked list, which is not the most optimal solution in terms of performance. However, it serves our current needs adequately. Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-05-17	Merge commit 'ea7f65fe1250be8f49edcaaedd3410aed1401648' of ↵	Johnny Richard
	https://git.sr.ht/~carlosmaniero/pipac-clone
2023-05-11	gas: implement recursion and late evaluation	Carlos Maniero
	Until now the below code was not valid for pipac. fn main(): i32 { return fib(13); } fn fib(n: i32): i32 { if n <= 1 { return n; } return fib(n - 1) + fib(n - 2); } Pipa's parser was adding a function to scope after they were fully parsed which means that fib's self-reference would not work. Also, functions were required to follow the be called in the order they are declared for the same scope reason so, the main function was required to be defined after fib. And how it is working now? When a TOKEN_NAME is not found in the scope, instead of returning an error, an unknown token is created as placeholder. The parser stores the node reference and the token it was trying to parse. During type checks, if the parser detects an unknown node, instead of returning an error, it stores in that node what was the expected type. After the NS is fully parsed a reevaluation is made on those unknown nodes by setting the lexer back on the node's token position and parsing the TOKEN_NAME again. Ps: There is a typo on the unknown token. It will be addressed in another commit since this issue was not introduced by this change. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-10	gas: implement function calls	Carlos Maniero
	For now function calls are following the C's calling convention, which means they are using the following registers to pass functions' arguments: rdi, rsi, rdx, rcx, r8, r9 If a function has more then 6 parameters, the compilation will fail. To enable function with more than 6 parameters we will need to save the extra arguments on stack. Naming: parameters: function parameters are the variables a function receives. arguments: Arguments are the values passed to a function when calling it. Calling mechanism: When a function is called, all the expressions passed as argument are evaluated, after the evaluation, the result is stored on the register that represents its argument position, the first argument will be stored on rdi, the second on rsi and so on. Receiving mechanism: When a function starts, the first thing it does is store all the registers onto the stack. So rdi will be stored on -8(rbp), rsi on -16(rbp) and so on. And, a ref_entry is created making the relationship parameter-stack_offset. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-10	tests: Replace parse function with parse ns for error handling	Carlos Maniero
	It was necessary to test function calls errors. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-10	gas: Generate function call	Carlos Maniero
	This is an initial commit that enables function calls. At this point only functions with no parameters is going to work. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-10	namespaces: Add a namespace structure that represents a file	Carlos Maniero
	We have been always parsing a single function. Since we want to have multiple functions in a near future, this patch introduces an namespace that represents an entire file. To ensure a function is defined inside a namespace, a helper function was created. Today our ast_node structure is highly exposed, and this is something that Johnny and I have been discussed. So then, this is a first step to try to protected the code generation from our ast tree. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-10	gas: Removes Linux entrypoint logic from function declaration	Carlos Maniero
	Before this commit, function declarations were making syscalls to interrupt the flow. This is a fair approach considering all our examples just have a main function. But won't work if the namespace has more then a single function. The return now always sets the return value on RAX and jumps to the function return label. The function return label, will restore RBP and jump back to callee's next instruction using RET instruction. Function labels are kept, which means that a function called my_fn will have the assembly label my_fn, so then, they can have INTEROP with other languages. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-10	gas: Abstract refs with helper functions	Carlos Maniero
	This commit abstract the complexity of an entry so then, the users of the ref map does not need to understand how is it implemented. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-10	gas: Compile boolean variable assignment	Carlos Maniero
	When the assignment value is a literal, it just assigns zero or one to the variable stack's location. If the value is an expression, it compiles the expression and assign zeros and ones based on expression result.
2023-05-10	gas: Implement && and \|\| for if statements	Carlos Maniero
	Now if statements are complete! The function %gas_assembly_generator_compile_condition% is generic and will be used for any other flow-control statment. The only requirement to it work is having two labels: One to jump when the condition is true, and another one when the condition is false. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-10	parser: Fixes boolean binary operation precedence	Carlos Maniero
	The comparators && and \|\| should have precedence over others comparators (> < >= <= == !=). Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-10	gas: Generate code for if statement	Carlos Maniero
	If statements are now working, the only exception is for the comparators \|\| and && that will be addressed in a further commit. Checks tested: fn main(): i32 { let n: i32 = 11; if (n == 11) { if n != 12 { if n < 12 { if n <= 11 { if n > 10 { if n >= 11 { return 42; } } } } } } return n; } To compile the && and \|\| a precedence issue must be addressed: they must have the highest precedence, witch is not working now: 1 == 2 \|\| 3 != 2 The or should be the higher level of the tree in the example above. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-09	parser: parses an if statement no code generation	Carlos Maniero
	This commit parses a if statement following the grammar bellow: if boolean_expression { n_epressions; } No else neither code generation was implemented. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-09	parser: create a block node type	Carlos Maniero
	Since the next step to pipa programming language is about having control flow statements we could benefit ourselves by having a block node to control scope. Now, functions has a block node, instead of an vector as body. As you can see through the ast-dump: FunctionDecl name='main' └─ body: └─ Block └─ ReturnStmt └─ Literal type=i32 value='69' This same node kind can be used for parsing if, for and while blocks. I could use ast_block_t as body for functions but instead, I opted to use an ast_node_t. This brings the flexibility to, in the future, having another function body kinds, such as arrow functions if we want to: fn add(a: i32, b: i32): i32 => a + b; Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-09	parser: parser boolean comparison expressions	Carlos Maniero
	After this commit, this is a valid expression: 1 \|\| 2 && 3 > 4 < 5 >= 6 <= 7 == 8 != 9 Signed-off-by: Carlos Maniero <carlos@maniero.me> Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-05-09	parser: Ensure the expression types	Carlos Maniero
	When assign a variable or returning a value it now ensures that the expression matches the expected type. To make this possible a %result_type% field was added to ast_node_t and this field is used whenever to make the comparison. Signed-off-by: Carlos Maniero <carlos@maniero.me> Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-05-09	utils: Create linked list data structure	Johnny Richard
	This is a simple implementation of a general propose single-linked list. There is only the prepend function implemented at this moment. We can alway revisit the code and implement new missing functionality on demand. Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-05-09	parser: Add the bool type	Carlos Maniero
	This commit introduces a new type for booleans. There is no code generation for this type yet. The intention of this commit is to enable flow control in the near future. Signed-off-by: Carlos Maniero <carlos@maniero.me> Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
2023-05-06	lexer: Tokenize logical and bitwise operators	Carlos Maniero
	The followed logic operators were added to lexer: TOKEN_EQUAL == TOKEN_NOT ! TOKEN_NOT_EQUAL != TOKEN_GT > TOKEN_GT_EQUAL >= TOKEN_LT < TOKEN_LT_EQUAL <= TOKEN_AND && TOKEN_OR \|\| Bitwise operators were also added TOKEN_BITWISE_AND & TOKEN_BITWISE_OR \| TOKEN_BITWISE_SHIFT_LEFT << TOKEN_BITWISE_SHIFT_RIGHT >> TOKEN_BITWISE_XOR ^ TOKEN_BITWISE_NOT ~ TOKEN_EQUAL '=' was renamed TOKEN_ASSIGN, and now TOKEN_EQUAL is used for the logical comparator '=='. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-06	cli: Fix bitwise handling on --ast-dump	Johnny Richard
	In C, literal integers default to a 32-bit size for arithmetic operations. Unfortunately, this was causing incorrect values to be assigned to our uint64_t variables, leading to unexpected behavior. To resolve this issue, we have updated our code to explicitly set the literal size using the "ULL" suffix (unsigned long long). It's important to note that this implementation has a limitation of 64 levels of indentation. Beyond this point, we may encounter a 64-bit overflow. However, at present, we don't anticipate the need to visualize trees that exceed this depth. If this requirement arises in the future, we can explore solutions like dynamically creating new numbers to accommodate larger tree sizes. Overall, this change ensures that our code is functioning correctly and improves the reliability of our codebase. Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-05-05	pretty-printer: Remove unused field	Carlos Maniero
	Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-05	cli: Add AST pretty-printing option (--ast-dump)	Johnny Richard
	Parsing can be a complex process, and it's not always easy to get a clear picture of what's happening with the AST. This commit adds a new feature to the CLI that allows us to pretty-print the AST (outputs to stdout), making it easier to visualize the tree structure and understand how the parser is working. The new --ast-dump option generates a human-readable representation of the AST, including node types, values, and child relationships. This information can be invaluable for debugging and understanding the parser's behavior. Signed-off-by: Johnny Richard <johnny@johnnyrichard.com> Reviewed-by: Carlos Maniero <carlos@maniero.me>
2023-05-04	lexer: Allows snake_case token names	Carlos Maniero
	Signed-off-by: Carlos Maniero <carlos@maniero.me> Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
2023-05-04	parser: Introduce statement keywords	Carlos Maniero
	This commit introduces a few changes in pipalang syntax. Now, both functions and variables requires keywords to be defined. before: main(): i32 { a: i32 = 2; return a; } now: fn main(): i32 { let a: i32 = 2; return a; } Signed-off-by: Carlos Maniero <carlos@maniero.me> Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
2023-05-04	munit: Show the filename as the first when error	Carlos Maniero
	Munit uses the follow format: Error: file.c:line: error message But this error format does not works well with vim make program. This commit changes the error to the follow format: file.c:line: Error: error message Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-04	lexer: Avoiding computation after find an EOF	Carlos Maniero
	When looking ahead, there was no check ensuring we reach EOF. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-03	cli: Rename src/pipac.c to src/main.c	Johnny Richard
	I found easier to understand the application entrypoint by calling it main.c. Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-05-03	parser: Fixes block parser memory leak	Carlos Maniero
	When a error occurs during a block parser the vector that stores the nodes had been destroyed but it's nodes don't. This commit fixes this by replacing the %vector_destroy% with %ast_node_destroy_vector%. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-03	ast: Replace init by allocation (new) functions	Carlos Maniero
	All parsers have been following the patterns bellow: ast_node_t *node = ast_node_new(); ast_node_init_node_kind(node, ...args); return node; Bringing a uncessessary distraction when reading. The pattern bellow was replaced by: return ast_node_new_node_kind(...args); Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-03	parser: Parser allocate memory for expressions	Carlos Maniero
	Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-03	parser: Variable assignment allocates their own node	Carlos Maniero
	This commit makes variable assignment parser to allocate memory for the node. It also moves the node initialization to the ast.c to follow our standard for node initialization. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-03	parser: Variable declaration allocates their own node	Carlos Maniero
	Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-03	parser: Split block into small functions	Carlos Maniero
	Since it is possible to look a future token without consuming it, it was possible to split the block parser into small chunks of code. There is the performance drawback, because now the parser makes multiple lookups to the same token. However IMO that it is not a big concern given the small computation required to get a token. Also it can be easily addressed by computing all token in advance. Memory Leak: During the refactor I found some extra memory leaks related to not released scopes. So then, more than just printing a message I introduced an assert on scope.c to make sure developers will get this feedback asap because our testing framework suppress messages from stderr when the test passes. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-03	parser: Use lookahead instead of consuming tokens	Carlos Maniero
	Previously, during block declaration, the parser consumed the token which caused some parsers (such as return and variable declaration) to not be self-contained and to depend on the callee to start the parser. In this commit, I've refactored the parser to only look for future tokens using lookahead, and delegate the consumption to child parser functions. This results in a more modular and self-contained parser that improves the overall maintainability and readability of the code. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-03	parser: Refactor return statement to return an ast_node	Carlos Maniero
	During the refactoring process, I identified a memory leak where the return argument was allocated but not freed in case of an error. It also introduces the concept of keyword tokens. Where return is now a keyword simplifying the parser. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-03	Parser: Make the parser function return the ast_node	Carlos Maniero
	In many situations, the parser is responsible for reserving memory for nodes, particularly during function body parsing. This commit introduces a new standard where parser functions not only allocate memory for ast_nodes, but also return them. In case of a parser error, a NULL pointer is returned. This standard will be extended to other parsers in future commits, ensuring consistency throughout the codebase. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-03	style: Improve ast node initialization	Carlos Maniero
	This also removes the identifier node since it was replaced by variable. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-01	parser: Implement variable assignment	Johnny Richard
	This commit introduces variable assignment making it possible to change a variable value. Example: myvar: i32 = 1; myvar = 2; Signed-off-by: Johnny Richard <johnny@johnnyrichard.com> Co-authored-by: Carlos Maniero <carlos@maniero.me>
2023-05-01	parser: Use peek and drop token when parsing expressions	Johnny Richard

2023-05-01	lexer: Peek next token	Johnny Richard
	The only way to get the next token was by consuming it. So then, our parser starts to become hard to understand, once sometimes we just want to take a look on the next token to understand what should be the next kind of expression. This commit introduces a new function that will help us to improve our parser implementation. Signed-off-by: Johnny Richard <johnny@johnnyrichard.com> Reviewed-by: Carlos Maniero <carlos@maniero.me>
2023-04-30	style: Invert parameters order on parser_parse_type	Johnny Richard
	Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-30	build: Add Makefile to build pipa examples	Johnny Richard
	Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-30	gas: Optimize variable reference on assembly	Johnny Richard
	We were moving the stack data for variable reference to another stack position ending up with two pointer to the same value. // a: i32 = 1; mov $1, -8(%rbp) // b: i32 = a; mov -8(%rbp), %rax mov %rax, -24(%rbp) mov -24(%rbp), %rax mov %rax, -16(%rbp) After this changes, we wont create a new temp space on stack if we don't need it. See bellow the example after the optimization: // a: i32 = 1; mov $1, -8(%rbp) // b: i32 = a; mov -8(%rbp), %rax mov %rax, -16(%rbp) Signed-off-by: Johnny Richard <johnny@johnnyrichard.com> Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com>
2023-04-30	style: Rename evaluation kinds on gas generator	Johnny Richard
	Signed-off-by: Johnny Richard <johnny@johnnyrichard.com> Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com>
2023-04-30	gas: Optimize the stack utilization	Carlos Maniero
	Until now, every computation was pushed onto stack witch creates unnecessary stack manipulation and makes the generated code hard to read and understand. Now, the latest computation is stored and could be either a literal or a value on a register. When it is a register we may need to push the value to stack to avoid data loss. Now if it is a literal, hence, we can just set the value onto a register. example/main.pipa before this commit: .global _start .text _start: push %rbp mov %rsp, %rbp mov $69, %rax ; <- There is no reason to store data in rax mov %rax, %rdi mov $60, %rax syscall pop %rbp example/main.pipa after this commit: .global _start .text _start: push %rbp mov %rsp, %rbp mov $69, %rdi ; <- Fixed! mov $60, %rax syscall pop %rbp example/variables.pipa before this commit: .global _start .text _start: push %rbp mov %rsp, %rbp mov $12, %rax mov %rax, -8(%rbp) mov $32, %rax mov %rax, -16(%rbp) mov -8(%rbp), %rax mov %rax, -32(%rbp) mov -16(%rbp), %rax mov -32(%rbp), %rcx add %rcx, %rax mov %rax, -32(%rbp) mov $2, %rax mov -32(%rbp), %rcx mul %rcx mov %rax, -24(%rbp) mov $1, %rax mov %rax, -40(%rbp) mov $33, %rax mov %rax, -48(%rbp) mov -24(%rbp), %rax mov -48(%rbp), %rcx sub %rcx, %rax mov -40(%rbp), %rcx add %rcx, %rax mov %rax, -32(%rbp) mov $2, %rax mov %rax, -40(%rbp) mov -32(%rbp), %rax mov -40(%rbp), %rcx xor %rdx, %rdx div %rcx mov %rax, %rdi mov $60, %rax syscall pop %rbp example/variables.pipa after this commit: .global _start .text _start: push %rbp mov %rsp, %rbp mov $12, -8(%rbp) mov $32, -16(%rbp) mov -8(%rbp), %rax mov %rax, -32(%rbp) mov -16(%rbp), %rax mov -32(%rbp), %rcx add %rcx, %rax mov %rax, -32(%rbp) mov -32(%rbp), %rcx mov $2, %rax mul %rcx mov %rax, -24(%rbp) mov -24(%rbp), %rax mov $33, %rcx sub %rcx, %rax mov $1, %rcx add %rcx, %rax mov %rax, -32(%rbp) mov -32(%rbp), %rax mov $2, %rcx xor %rdx, %rdx div %rcx mov %rax, %rdi mov $60, %rax syscall pop %rbp Less 8 instructions! Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com> Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-30	gas: Compile variable expression with scope support	Johnny Richard
	This patch adds the variable compilation and uses a scope (a stack of map) to lookup for identities. Today we use a vector + ref_entry structs in order to achieve the scope implementation. The ref_entry lacks memory management, we are still no sure who will be the owner of the pointer. We also want to replace the scope a hashtable_t type as soon as we get one. Signed-off-by: Johnny Richard <johnny@johnnyrichard.com> Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com>
2023-04-30	polish: Remove unnecessary token creation when dropping token	Johnny Richard
	Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-30	ast: Rename variable and variable_declaration correctly	Johnny Richard
	Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-30	make: Add linter-fix target to Makefiles	Johnny Richard
	Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>