pipac.git - Pipa programming language

Age	Commit message	Author
2023-05-03	parser: Use lookahead instead of consuming tokens	Carlos Maniero
	Previously, during block declaration, the parser consumed the token, which caused some parsers (such as return and variable declaration) to not be self-contained and to depend on the caller to start the parse. In this commit, I've refactored the parser to only inspect upcoming tokens using lookahead and to delegate consumption to the child parser functions. This results in a more modular and self-contained parser, which improves the maintainability and readability of the code. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-03	parser: Refactor return statement to return an ast_node	Carlos Maniero
	During the refactoring process, I identified a memory leak where the return argument was allocated but not freed in case of an error. This commit also introduces the concept of keyword tokens: return is now a keyword, which simplifies the parser. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-03	Parser: Make the parser function return the ast_node	Carlos Maniero
	In many situations, the parser is responsible for reserving memory for nodes, particularly during function body parsing. This commit introduces a new standard where parser functions not only allocate memory for ast_nodes, but also return them. In case of a parser error, a NULL pointer is returned. This standard will be extended to other parsers in future commits, ensuring consistency throughout the codebase. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-03	style: Improve ast node initialization	Carlos Maniero
	This also removes the identifier node since it was replaced by variable. Signed-off-by: Carlos Maniero <carlos@maniero.me>
2023-05-01	parser: Implement variable assignment	Johnny Richard
	This commit introduces variable assignment making it possible to change a variable value. Example: myvar: i32 = 1; myvar = 2; Signed-off-by: Johnny Richard <johnny@johnnyrichard.com> Co-authored-by: Carlos Maniero <carlos@maniero.me>
2023-05-01	parser: Use peek and drop token when parsing expressions	Johnny Richard

2023-05-01	lexer: Peek next token	Johnny Richard
	The only way to get the next token was by consuming it. As a result, the parser was becoming hard to understand, since sometimes we just want to look at the next token to decide what kind of expression comes next. This commit introduces a new function that will help us improve our parser implementation. Signed-off-by: Johnny Richard <johnny@johnnyrichard.com> Reviewed-by: Carlos Maniero <carlos@maniero.me>
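A minimal sketch of the peek idea described in this commit message, not the actual pipac code: all `sketch_` names and the token kinds are hypothetical, and the approach shown (run the normal scan, then restore the cursor) is only one way a peek function could be written.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, simplified types; the real pipac lexer is not shown in this log. */
typedef enum {
    SKETCH_TOKEN_NUMBER,
    SKETCH_TOKEN_PLUS,
    SKETCH_TOKEN_UNKNOWN,
    SKETCH_TOKEN_EOF
} sketch_token_kind_t;

typedef struct {
    sketch_token_kind_t kind;
    size_t start;  /* offset of the token's first character in the source */
    size_t length; /* number of characters in the token */
} sketch_token_t;

typedef struct {
    const char *src;
    size_t len;
    size_t cur; /* offset of the next unread character */
} sketch_lexer_t;

/* Scans the next token and advances the cursor past it (consuming). */
void sketch_lexer_next_token(sketch_lexer_t *lexer, sketch_token_t *token)
{
    assert(lexer != NULL && token != NULL);

    while (lexer->cur < lexer->len && isspace((unsigned char)lexer->src[lexer->cur])) {
        lexer->cur++;
    }

    token->start = lexer->cur;

    if (lexer->cur >= lexer->len) {
        token->kind = SKETCH_TOKEN_EOF;
    } else if (isdigit((unsigned char)lexer->src[lexer->cur])) {
        token->kind = SKETCH_TOKEN_NUMBER;
        while (lexer->cur < lexer->len && isdigit((unsigned char)lexer->src[lexer->cur])) {
            lexer->cur++;
        }
    } else {
        token->kind = lexer->src[lexer->cur] == '+' ? SKETCH_TOKEN_PLUS : SKETCH_TOKEN_UNKNOWN;
        lexer->cur++;
    }

    token->length = lexer->cur - token->start;
}

/* Scans the next token but restores the cursor, so nothing is consumed. */
void sketch_lexer_peek_token(sketch_lexer_t *lexer, sketch_token_t *token)
{
    size_t saved = lexer->cur;
    sketch_lexer_next_token(lexer, token);
    lexer->cur = saved;
}
```

With something like this, a parser can peek to decide which child parser to call and let that child consume the token itself.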
2023-04-30	style: Invert parameters order on parser_parse_type	Johnny Richard
	Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-30	gas: Optimize variable reference on assembly	Johnny Richard
	We were moving the stack data for a variable reference to another stack position, ending up with two pointers to the same value:

	    // a: i32 = 1;
	    mov $1, -8(%rbp)
	    // b: i32 = a;
	    mov -8(%rbp), %rax
	    mov %rax, -24(%rbp)
	    mov -24(%rbp), %rax
	    mov %rax, -16(%rbp)

	After this change, we won't create a new temporary space on the stack if we don't need it. See below the example after the optimization:

	    // a: i32 = 1;
	    mov $1, -8(%rbp)
	    // b: i32 = a;
	    mov -8(%rbp), %rax
	    mov %rax, -16(%rbp)

	Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
	Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com>
2023-04-30	style: Rename evaluation kinds on gas generator	Johnny Richard
	Signed-off-by: Johnny Richard <johnny@johnnyrichard.com> Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com>
2023-04-30	gas: Optimize the stack utilization	Carlos Maniero
	Until now, every computation was pushed onto the stack, which created unnecessary stack manipulation and made the generated code hard to read and understand. Now, the latest computation is stored and can be either a literal or a value in a register. When it is a register, we may need to push the value onto the stack to avoid data loss. When it is a literal, we can just move the value into a register.

	example/main.pipa before this commit:

	.global _start
	.text
	_start:
	    push %rbp
	    mov %rsp, %rbp
	    mov $69, %rax ; <- There is no reason to store data in rax
	    mov %rax, %rdi
	    mov $60, %rax
	    syscall
	    pop %rbp

	example/main.pipa after this commit:

	.global _start
	.text
	_start:
	    push %rbp
	    mov %rsp, %rbp
	    mov $69, %rdi ; <- Fixed!
	    mov $60, %rax
	    syscall
	    pop %rbp

	example/variables.pipa before this commit:

	.global _start
	.text
	_start:
	    push %rbp
	    mov %rsp, %rbp
	    mov $12, %rax
	    mov %rax, -8(%rbp)
	    mov $32, %rax
	    mov %rax, -16(%rbp)
	    mov -8(%rbp), %rax
	    mov %rax, -32(%rbp)
	    mov -16(%rbp), %rax
	    mov -32(%rbp), %rcx
	    add %rcx, %rax
	    mov %rax, -32(%rbp)
	    mov $2, %rax
	    mov -32(%rbp), %rcx
	    mul %rcx
	    mov %rax, -24(%rbp)
	    mov $1, %rax
	    mov %rax, -40(%rbp)
	    mov $33, %rax
	    mov %rax, -48(%rbp)
	    mov -24(%rbp), %rax
	    mov -48(%rbp), %rcx
	    sub %rcx, %rax
	    mov -40(%rbp), %rcx
	    add %rcx, %rax
	    mov %rax, -32(%rbp)
	    mov $2, %rax
	    mov %rax, -40(%rbp)
	    mov -32(%rbp), %rax
	    mov -40(%rbp), %rcx
	    xor %rdx, %rdx
	    div %rcx
	    mov %rax, %rdi
	    mov $60, %rax
	    syscall
	    pop %rbp

	example/variables.pipa after this commit:

	.global _start
	.text
	_start:
	    push %rbp
	    mov %rsp, %rbp
	    mov $12, -8(%rbp)
	    mov $32, -16(%rbp)
	    mov -8(%rbp), %rax
	    mov %rax, -32(%rbp)
	    mov -16(%rbp), %rax
	    mov -32(%rbp), %rcx
	    add %rcx, %rax
	    mov %rax, -32(%rbp)
	    mov -32(%rbp), %rcx
	    mov $2, %rax
	    mul %rcx
	    mov %rax, -24(%rbp)
	    mov -24(%rbp), %rax
	    mov $33, %rcx
	    sub %rcx, %rax
	    mov $1, %rcx
	    add %rcx, %rax
	    mov %rax, -32(%rbp)
	    mov -32(%rbp), %rax
	    mov $2, %rcx
	    xor %rdx, %rdx
	    div %rcx
	    mov %rax, %rdi
	    mov $60, %rax
	    syscall
	    pop %rbp

	8 fewer instructions!

	Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
	Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-30	gas: Compile variable expression with scope support	Johnny Richard
	This patch adds variable compilation and uses a scope (a stack of maps) to look up identifiers. Today we use a vector + ref_entry structs to implement the scope. The ref_entry lacks memory management; we are still not sure who will own the pointer. We also want to replace the scope with a hashtable_t type as soon as we get one. Signed-off-by: Johnny Richard <johnny@johnnyrichard.com> Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com>
2023-04-30	polish: Remove unnecessary token creation when dropping token	Johnny Richard
	Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-30	ast: Rename variable and variable_declaration correctly	Johnny Richard
	Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-30	parser: Registry identifiers on scope	Johnny Richard
	We are parsing variables/functions and checking if they are defined on scope. Otherwise we fail the parsing with a nice message. Signed-off-by: Johnny Richard <johnny@johnnyrichard.com> Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com>
2023-04-30	style: Add void to function without arguments	Johnny Richard
	After running `make CC=clang` we found the following problem: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-30	style: Add -Wmissing-declarations to CC CFLAGS	Johnny Richard
	This refactoring also replaces an if statement with a switch statement. Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-29	ast: Introduce ast_identifier_t for named ast nodes	Carlos Maniero
	Prior to this change, ast_variable_declaration_t and ast_function_declaration_t used a string_view as an identifier. However, to support scoped identifiers, it is more appropriate to use an ast_identifier_t as a reference. Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com> Co-authored-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-29	scope: Add a scope stack for identifier resolutions	Carlos Maniero
	Before accepting an identifier, the parser should check if that identifier will be available. With this implementation it will be possible. Take the following code example: main(): i32 { return my_exit_code; } The parser must return an error informing that my_exit_code is not defined in the example above. The ast scope is a support module for parser and ast, simplifying identifier resolution. Once a curly bracket ({) is open the scope_enter() is called and when it is closed (}) we pop the entire stack with scope_leave(). Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com>
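A minimal sketch of the scope stack described in this commit message. Only the names scope_enter() and scope_leave() come from the message; the `sketch_` prefix, the fixed-size arrays, and the helper functions are assumptions made for illustration, not the pipac implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_NAMES 64
#define SKETCH_MAX_DEPTH 16

/* Hypothetical layout: one flat list of names plus the list length
 * recorded at the moment each scope was entered. */
typedef struct {
    const char *names[SKETCH_MAX_NAMES];
    size_t names_count;
    size_t frames[SKETCH_MAX_DEPTH];
    size_t depth;
} sketch_scope_t;

void sketch_scope_init(sketch_scope_t *scope)
{
    scope->names_count = 0;
    scope->depth = 0;
}

/* Called when '{' is seen: remember where this scope's names start. */
bool sketch_scope_enter(sketch_scope_t *scope)
{
    if (scope->depth == SKETCH_MAX_DEPTH) {
        return false;
    }
    scope->frames[scope->depth++] = scope->names_count;
    return true;
}

/* Called when '}' is seen: drop every name declared in this scope. */
void sketch_scope_leave(sketch_scope_t *scope)
{
    assert(scope->depth > 0);
    scope->names_count = scope->frames[--scope->depth];
}

/* Registers a name in the innermost scope. */
bool sketch_scope_push(sketch_scope_t *scope, const char *name)
{
    if (scope->names_count == SKETCH_MAX_NAMES) {
        return false;
    }
    scope->names[scope->names_count++] = name;
    return true;
}

/* Searches from the innermost declaration outwards. */
bool sketch_scope_is_defined(const sketch_scope_t *scope, const char *name)
{
    for (size_t i = scope->names_count; i > 0; i--) {
        if (strcmp(scope->names[i - 1], name) == 0) {
            return true;
        }
    }
    return false;
}
```

In the example from the message, a lookup of my_exit_code inside main's body would find nothing, which is where a parser could report the "not defined" error.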
2023-04-29	ast: Remove ast visitor pattern to simplify the code	Johnny Richard
	I decided to remove the visitor pattern due to the lack of Object Oriented Programming support for C. Now if you want to navigate through the AST, you should do it with switch case and recursion. The code looks way simpler without visitor pattern. I have added a CFLAG -Werror which validates if the switch statement covers all branches for a given enum at compile time. Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-26	ast: Include a Binary Operation kind enum	Carlos Maniero
	The AST was using a string view to distinguish the operation kind. An enum was created for this purpose simplifying code generation. Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com> Reviewed-by: Johnny Richard <johnny@johnnyrichar.com>
2023-04-26	lexer: Remove duplicated validation	Carlos Maniero
	Since there is a guard clause checking whether the token is EOF, there is no need to check it again and again. Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com> Reviewed-by: Johnny Richard <johnny@johnnyrichar.com>
2023-04-26	lexer: Split operation tokens into their own token	Carlos Maniero
	The +, -, *, and / tokens used to be TOKEN_OP, but the TOKEN_OP has been removed and a token for each operation has been introduced. Python's token names were followed: https://docs.python.org/3/library/token.html Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com> Reviewed-by: Johnny Richard <johnny@johnnyrichar.com>
2023-04-25	style: Use clang-format as formatter and linter tool	Johnny Richard
	We want to keep the code style consistent. This first commit adds a .clang-format file in order to "document" our code style. This patch also adds a linter target to the Makefile, which will complain if there is any style issue in the test and src dirs.

	I ran the following command to create the .clang-format file:

	    $ clang-format -style=mozilla -dump-config > .clang-format

	I also made some adjustments to .clang-format, changing the following properties:

	    PointerAlignment: Right
	    ColumnLimit: 120

	Commands executed to fix the current styling:

	    $ find . -name '*.h' | xargs clang-format -i
	    $ find . -name '*.c' | xargs clang-format -i

	Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-25	parser: Add support for variables and identifiers in function body	Carlos Maniero
	This commit adds support for variables and identifiers in the function body of the parser, stored as a vector. However, at this point, identifier resolution is not fully implemented, and we currently accept identifiers without checking if they can be resolved. This is a known limitation that will be addressed in a future commit once hash-tables are added to the parser. Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com> Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-24	util: Implement dynamic vector array for storing AST children	Johnny Richard
	Previously, we lacked a dynamic array for storing children elements in our abstract syntax tree (AST). This commit introduces a new implementation that dynamically adjusts its capacity as elements are added, using a doubling strategy. I considered two approaches for managing the vector's memory allocation: allocating it on the heap, or providing a vector_init function that allocates only the items array. Ultimately, I decided to provide a vector_new function for instantiating the vector, as this aligns with the expected usage pattern when there is a destroy function. With this new implementation, we can efficiently store and manage AST children, enabling more flexible and expressive tree structures. Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
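A minimal sketch of a heap-allocated vector that doubles its capacity when full, as this commit message describes. Only vector_new is named in the message; the `sketch_` names, the initial capacity of 4, and the push/at/destroy signatures are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical layout; the real pipac vector is not shown in this log. */
typedef struct {
    void **items;
    size_t size;
    size_t capacity;
} sketch_vector_t;

/* Allocates both the vector and its items array on the heap. */
sketch_vector_t *sketch_vector_new(void)
{
    sketch_vector_t *vector = malloc(sizeof(*vector));
    if (vector == NULL) {
        return NULL;
    }
    vector->size = 0;
    vector->capacity = 4; /* assumed starting capacity */
    vector->items = malloc(vector->capacity * sizeof(*vector->items));
    if (vector->items == NULL) {
        free(vector);
        return NULL;
    }
    return vector;
}

/* Appends an item, doubling the capacity first when the array is full. */
bool sketch_vector_push_back(sketch_vector_t *vector, void *item)
{
    assert(vector != NULL);
    if (vector->size == vector->capacity) {
        size_t new_capacity = vector->capacity * 2;
        void **grown = realloc(vector->items, new_capacity * sizeof(*grown));
        if (grown == NULL) {
            return false; /* the old array is still valid */
        }
        vector->items = grown;
        vector->capacity = new_capacity;
    }
    vector->items[vector->size++] = item;
    return true;
}

void *sketch_vector_at(const sketch_vector_t *vector, size_t index)
{
    assert(vector != NULL && index < vector->size);
    return vector->items[index];
}

/* Frees the vector itself; the stored items are owned by the caller. */
void sketch_vector_destroy(sketch_vector_t *vector)
{
    if (vector == NULL) {
        return;
    }
    free(vector->items);
    free(vector);
}
```

Pairing a `new` function with a `destroy` function keeps allocation and release symmetric, which matches the usage pattern the message argues for.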
2023-04-21	gas: Generate arithmetics expressions	Carlos Maniero
	We decided to use push and pop to simplify the implementation; we want to revisit the approach later. Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com> Co-authored-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-21	ast: Create an init function for ast_binary_operation_t	Carlos Maniero
	Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com> Co-authored-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-21	parser: Parse integers arithmetic expression	Johnny Richard
	This patch implements the AST creation for arithmetic expressions. NOTE: The implementation works only for integer numbers. Signed-off-by: Johnny Richard <johnny@johnnyrichard.com> Reviewed-by: Carlos Maniero <carlosmaniero@gmail.com>
2023-04-20	gas: Remove duplicated inst when generating exit SYSCALL	Johnny Richard
	Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-20	parser: Create the literal node type	Carlos Maniero
	Since we want to extend our code to support multiple kinds of expressions, it does not make sense for the return statement to always return a number. From now on, the return statement has an ast_node_t as its argument, meaning that it could be anything. The literal_node_t was also implemented in order to keep the application behavior. Following C's calling convention, literal values are stored in %eax, and the return statement takes this value and does whatever is needed with it. Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com> Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-20	parser: Stop exiting on parser error	Carlos Maniero
	Previously, when an error occurred during parsing, the application would exit, making it difficult to test the parser and limiting the compiler's extensibility. This commit improves the parser's error handling by allowing execution to continue after an error, enabling easier testing and increased flexibility. The parser is prepared to handle multiple errors. Although the current implementation always returns a single error, this may be useful once there are multiple functions and errors can be shown by context. Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com> Reviwed-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-20	ast: Allows recursive nodes	Carlos Maniero
	Previously, the abstract syntax tree (AST) used static types, meaning that an ast_function_t would always have an ast_return_stmt_t as its body. However, this assumption is not always true, as we may have void functions that do not have a return statement. Additionally, the ast_return_stmt_t always had a number associated with it, but this too is not always the case. To make this possible, I needed to make a few changes across the whole project. One of the main changes is that the inheritance hack is gone. That mechanism was replaced by composition, with pointers where required for recursive type references. It is important to mention that I decided to use a union type to implement the composition. There are two main advantages to this approach: 1. There is only one function to allocate memory for all kinds of nodes. 2. There is no need to cast the data. In summary, this commit introduces changes to support dynamic typing in the AST, by replacing the inheritance hack with composition and using union types to simplify memory allocation and type casting. Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com> Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
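A minimal sketch of the union-based composition this commit message describes: one node struct, one allocation function for every kind, and no casts when reading the payload. All `sketch_` names and the two node kinds are hypothetical; the real ast_node_t layout is not shown in this log.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum {
    SKETCH_AST_LITERAL,
    SKETCH_AST_RETURN_STMT
} sketch_ast_kind_t;

typedef struct sketch_ast_node sketch_ast_node_t;

/* One struct for every node kind; the union holds the kind-specific data.
 * The pointer member is what allows nodes to reference other nodes. */
struct sketch_ast_node {
    sketch_ast_kind_t kind;
    union {
        struct {
            uint32_t value;
        } literal;
        struct {
            sketch_ast_node_t *argument;
        } return_stmt;
    } data;
};

/* A single allocation function serves all node kinds. */
sketch_ast_node_t *sketch_ast_node_new(sketch_ast_kind_t kind)
{
    sketch_ast_node_t *node = malloc(sizeof(*node));
    if (node == NULL) {
        return NULL;
    }
    node->kind = kind;
    switch (kind) {
        case SKETCH_AST_LITERAL:
            node->data.literal.value = 0;
            break;
        case SKETCH_AST_RETURN_STMT:
            node->data.return_stmt.argument = NULL;
            break;
    }
    return node;
}

/* Frees a node and, recursively, the children it owns. */
void sketch_ast_node_destroy(sketch_ast_node_t *node)
{
    if (node == NULL) {
        return;
    }
    switch (node->kind) {
        case SKETCH_AST_LITERAL:
            break;
        case SKETCH_AST_RETURN_STMT:
            sketch_ast_node_destroy(node->data.return_stmt.argument);
            break;
    }
    free(node);
}

/* Walks the tree with switch + recursion; no casts are needed. */
uint32_t sketch_ast_eval(const sketch_ast_node_t *node)
{
    assert(node != NULL);
    switch (node->kind) {
        case SKETCH_AST_LITERAL:
            return node->data.literal.value;
        case SKETCH_AST_RETURN_STMT:
            return sketch_ast_eval(node->data.return_stmt.argument);
    }
    return 0;
}
```

The trade-off of a union is that every node is as large as its largest member, in exchange for the uniform allocation and cast-free access listed in the message.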
2023-04-18	style: Fix identation on lexer.c	Carlos Maniero
	Co-authored-by: Johnny Richard <johnny@johnnyrichard.com> Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com> Link: https://lists.sr.ht/~johnnyrichard/pipalang-devel/%3C20230418165847.3798-1-carlosmaniero%40gmail.com%3E
2023-04-18	lexer: Add tokenizer for OP and UNKNOWN tokens	Johnny Richard
	We want to tokenize arithmetic expressions. We are handling exceptional cases with the UNKNOWN token. Co-authored-by: Carlos Maniero <carlosmaniero@gmail.com> Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-18	lexer: Extract tokenization functions	Carlos Maniero
	Make the next-token function small by extracting the functions that create tokens. Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com> Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-18	lexer: extract the lexer_drop_spaces	Carlos Maniero
	Extracted logic for skipping empty characters into a separate function. No change in lexer behavior. Signed-off-by: Carlos Maniero <carlosmaniero@gmail.com> Reviewed-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-18	ast: Create AST visitor to traverse the tree	Johnny Richard
	In the future we want to be able to traverse the tree and pretty-print it, generate code for other targets such as LLVM, or transpile to C. This solution also implements GAS x86_64 Linux assembly generation using the visitor interface. Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-16	lexer: Extract lexer_define_literal_token_props function	Johnny Richard
	This is an attempt to reduce code duplication. Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-16	Start using string_view on lexer and parser	Johnny Richard
	This change fixes the memory leak that occurred when tokens were created. Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-16	util: Create string_view tool to optimize memory usage	Johnny Richard
	We are allocating heap memory to create token values; we can minimize the number of allocations if we start using string_view. We have other problems: right now the ownership of token values is quite unclear, since the AST nodes also share the memory allocated by the token_get_next_token function. It's important to clarify that we also have memory leaks in the current implementation. Hence, we are going to start using string_view to make memory management easier. :^) Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
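A minimal sketch of why a string view avoids per-token allocations: a view is just a pointer into the source buffer plus a length, so creating one copies nothing and there is nothing to free. The struct layout and the `sketch_` function names are assumptions, not the pipac string_view API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: a non-owning window into an existing buffer. */
typedef struct {
    const char *str; /* points into the source text, not NUL-terminated */
    size_t size;     /* number of characters in the view */
} sketch_string_view_t;

/* Creates a view over `size` characters starting at `str`; no allocation. */
sketch_string_view_t sketch_string_view_new(const char *str, size_t size)
{
    assert(str != NULL);
    sketch_string_view_t view = { str, size };
    return view;
}

/* Compares two views by length and content. */
bool sketch_string_view_eq(sketch_string_view_t a, sketch_string_view_t b)
{
    return a.size == b.size && memcmp(a.str, b.str, a.size) == 0;
}

/* Compares a view against a NUL-terminated C string. */
bool sketch_string_view_is(sketch_string_view_t view, const char *cstr)
{
    size_t len = strlen(cstr);
    return view.size == len && memcmp(view.str, cstr, len) == 0;
}
```

Because the view does not own its memory, it stays valid only while the source buffer is alive. That fits a compiler that keeps the whole source text loaded and never modifies it.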
2023-04-15	parser: Generate GAS 64-bit assembly for linux	Johnny Richard
	Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-15	cli: Remove irrelevant information when loading source	Johnny Richard
	Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-15	parser: Show filepath row and col when parsing fails	Johnny Richard
	In order to find out where a parsing error occurred, this patch reports the exact location in the format 'file:row:col'. Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-15	parser: Create parser for function with return statements	Johnny Richard
	This is a very limited parser implementation which parses a single function with return type i32 and a body containing a return number statement. The parser doesn't show the 'filepath:row:col' when it fails; a future improvement would be to display it to make it easy to find where the compilation problem is located. The ast_nodes take ownership of token.value (which is a really bad design, since not every token.value has its ownership taken, causing memory leaks), but we never free them. As a future fix we could use a string_view instead, since we never change the original source code. The string_view would also improve performance considerably by avoiding unnecessary heap allocations. Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-15	build: Enable warning and debug CFLAGS	Johnny Richard
	After enabling the warning flags, the compiler was firing the following warnings:

	    warning: implicit declaration of function ‘strdup’; did you mean ‘strcmp’? [-Wimplicit-function-declaration]
	        token->value = strdup("(");
	                       ^~~~~~
	                       strcmp
	    warning: assignment to ‘char *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
	        token->value = strdup("(");
	                     ^

	In order to fix the warnings above, I have decided to replace strdup and strndup with the strcpy and strncpy functions.

	Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-14	lexer: Extract lexer.c and lexer.h from pipa.c	Johnny Richard
	Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
2023-04-14	build: Move *.c to src folder	Johnny Richard
	We want to have different folders for src and objs files. Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>