# Instructions
Instructions are variable length, ranging from 1 to 4 bytes.

## Instruction formats
Each instruction uses one of the following formats:
1. 1 byte: Opcode(u8)
2. 2 bytes: Opcode(u8) + register(AmalReg)
3. 3 bytes: Opcode(u8) + register(AmalReg) + register(AmalReg)
4. 3 bytes:\
4.1 Opcode(u8) + intermediate(u16)\
4.2 Opcode(u8) + data(u16)\
4.3 Opcode(u8) + label(i16)\
4.4 Opcode(u8) + register(AmalReg) + num_args(u8)
5. 4 bytes: Opcode(u8) + register(AmalReg) + register(AmalReg) + register(AmalReg)
6. 4 bytes:\
6.1 Opcode(u8) + register(AmalReg) + label(i16)\
6.2 Opcode(u8) + register(AmalReg) + intermediate(u16)\
6.3 Opcode(u8) + register(AmalReg) + data(u16)\
6.4 Opcode(u8) + flags(u8) + num_local_var_reg(u16)\
6.5 Opcode(u8) + index(u8) + index(u16)
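
As an illustration of the formats above, the following is a minimal sketch of decoding a 4-byte instruction in format 6.1 to 6.3 (opcode, register, 16-bit operand). The names `InsRegU16`, `read_u16_le` and `decode_ins_reg_u16` are hypothetical and not part of amalgam, and the sketch assumes the 16-bit operand is stored little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t AmalReg; /* AmalReg is an alias for u8 */

/* Hypothetical decoded form of: Opcode(u8) + register(AmalReg) + 16-bit operand */
typedef struct {
    uint8_t opcode;
    AmalReg reg;
    uint16_t operand; /* intermediate, data, or label (cast to int16_t for labels) */
} InsRegU16;

/* Assumption: 16-bit operands are stored little-endian */
static uint16_t read_u16_le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns the number of bytes consumed (4), or 0 if fewer than 4 bytes remain */
static size_t decode_ins_reg_u16(const uint8_t *code, size_t size, InsRegU16 *out) {
    assert(code != NULL && out != NULL);
    if (size < 4)
        return 0;
    out->opcode = code[0];
    out->reg = code[1];
    out->operand = read_u16_le(code + 2);
    return 4;
}
```

A decoder for the other formats would follow the same pattern, with the opcode deciding how many bytes follow and how they are interpreted.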

## Registers
Registers have a range of 128 values. Parameter registers have the most significant bit set, while local variable registers don't.
Registers are scoped to a function and reset when the instructions reach a new function (AMAL_OP_FUNC_START).

If the import index for call and calle is 0, the function resides in the same file that the function call
is made from. This means that import index 1 actually refers to index 0 in the import list.

@AmalReg is an alias for u8.
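
The parameter/local distinction described in this section can be sketched with a few bit operations. The helper names and the `REG_PARAM_BIT` macro below are hypothetical, introduced only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t AmalReg; /* AmalReg is an alias for u8 */

/* Hypothetical name for the most significant bit, which marks a parameter register */
#define REG_PARAM_BIT 0x80u

/* True if the register refers to a parameter, false if it is a local variable */
static bool reg_is_param(AmalReg reg) {
    return (reg & REG_PARAM_BIT) != 0;
}

/* The index within the 128-value range, with the parameter bit stripped */
static uint8_t reg_index(AmalReg reg) {
    return (uint8_t)(reg & 0x7Fu);
}

/* Builds a parameter register from an index in the range 0..127 */
static AmalReg make_param_reg(uint8_t index) {
    return (AmalReg)((index & 0x7Fu) | REG_PARAM_BIT);
}
```

With this encoding, `0x85` would be parameter 5 and `0x05` would be local variable 5.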

# Compiler flow
(Tokenize&parse -> Resolve AST -> Generate SSA -> Generate bytecode) -> Generate program\
Each step except the last runs on multiple threads in parallel, and the output of each step is the input
to the next step. The last step is not parallelized because it combines all bytecode and writes it to a file.
That operation is IO-bound, so it would not benefit from multithreading
and might even lose performance because of it.

# Bytecode
The layout of the full bytecode is: Header (X Intermediates X Strings X Functions X External_Functions X Exported_Functions X Imports X Instructions)*\
Where the X is a magic number to make it easier to find errors while decoding the bytecode.\
The value of the magic number is @AMAL_BYTECODE_SECTION_MAGIC_NUMBER

# Bytecode header
## Header layout
|Type|Field        |Description                                                                 |
|----|-------------|----------------------------------------------------------------------------|
|u32 |Magic number |The magic number used to identify an amalgam bytecode file.                 |
|u8  |Major version|The major version of the bytecode. Updates to this are breaking changes.    |
|u8  |Minor version|The minor version of the bytecode. Updates to this are backwards compatible.|
|u8  |Patch version|The patch version of the bytecode. Updates to this are only minor bug fixes.|
|u8  |Endian       |Endian of the program. 0 = little endian, 1 = big endian.                   |

The versions in the header only change with each release, not with every change.
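
A minimal sketch of reading the 8-byte header laid out above. `BytecodeHeader` and `parse_header` are hypothetical names. The value and byte order of the magic number are not stated here, so the sketch keeps it as raw bytes and leaves the comparison to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEADER_SIZE 8 /* u32 + 4 * u8, per the header layout table */

/* Hypothetical in-memory form of the bytecode header */
typedef struct {
    uint8_t magic[4]; /* raw bytes; the expected value is not given in this document */
    uint8_t major;
    uint8_t minor;
    uint8_t patch;
    uint8_t endian;   /* 0 = little endian, 1 = big endian */
} BytecodeHeader;

/* Returns false if the buffer is too small or the endian field is not 0 or 1 */
static bool parse_header(const uint8_t *data, size_t size, BytecodeHeader *out) {
    assert(data != NULL && out != NULL);
    if (size < HEADER_SIZE)
        return false;
    memcpy(out->magic, data, 4);
    out->major = data[4];
    out->minor = data[5];
    out->patch = data[6];
    out->endian = data[7];
    return out->endian <= 1;
}
```

Since a major version update is a breaking change, a loader would typically reject a file whose major version differs from the one it was built for.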

# Bytecode intermediates
## Intermediates layout
|Type          |Field             |Description                                                                    |
|--------------|------------------|-------------------------------------------------------------------------------|
|u32           |Intermediates size|The size of all intermediates, in bytes.                                       |
|Intermediate[]|Intermediate data |Multiple intermediates, where the total size is defined by @Intermediates size.|

## Intermediate
|Type|Field|Description                                         |
|----|-----|----------------------------------------------------|
|u8  |Type |The type of the number. 0=integer, 1=float.         |
|u64 |Value|The type of the value depends on the value of @Type.|
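
A sketch of decoding one intermediate under the layout above (1 type byte followed by an 8-byte value). The names are hypothetical. The sketch assumes little-endian storage, that integers are signed 64-bit, and that floats are IEEE 754 doubles. None of these is stated in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INTERMEDIATE_SIZE 9 /* u8 type + u64 value */

enum { INTERMEDIATE_INTEGER = 0, INTERMEDIATE_FLOAT = 1 };

/* Hypothetical decoded intermediate */
typedef struct {
    uint8_t type;
    union {
        int64_t integer;  /* assumption: integers are signed */
        double floating;  /* assumption: floats are 64-bit IEEE 754 */
    } value;
} Intermediate;

/* Assumption: values are stored little-endian */
static uint64_t read_u64_le(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | p[i];
    return v;
}

/* Returns the number of bytes consumed (9), or 0 on truncated data or unknown type */
static size_t parse_intermediate(const uint8_t *data, size_t size, Intermediate *out) {
    assert(data != NULL && out != NULL);
    if (size < INTERMEDIATE_SIZE || data[0] > INTERMEDIATE_FLOAT)
        return 0;
    uint64_t raw = read_u64_le(data + 1);
    out->type = data[0];
    /* Reinterpret the 8 raw bytes according to the type tag */
    if (out->type == INTERMEDIATE_INTEGER)
        memcpy(&out->value.integer, &raw, sizeof(raw));
    else
        memcpy(&out->value.floating, &raw, sizeof(raw));
    return INTERMEDIATE_SIZE;
}
```

Because every intermediate has the same size in this sketch, the number of intermediates would be the intermediates size divided by 9.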

# Bytecode strings
## Strings layout
|Type    |Field            |Description                                                       |
|--------|-----------------|------------------------------------------------------------------|
|u16     |Number of strings|The number of strings.                                            |
|u32     |Strings size     |The size of all strings, in bytes.                                |
|String[]|Strings data     |Multiple strings, where the total size is defined by @Strings size|

## String
|Type|Field|Description                                                                             |
|----|-----|----------------------------------------------------------------------------------------|
|u16 |Size |The size of the string, in bytes, excluding the null terminator.                        |
|u8* |Data |The data of the string, where the size is defined by @Size. Strings are null-terminated.|
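
A sketch of walking the strings data to find the string at a given index. `find_string` is a hypothetical helper. It assumes the size field is little-endian and that the null terminator is stored in the bytecode right after each string's data, so each entry occupies `2 + Size + 1` bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 16-bit fields are stored little-endian */
static uint16_t read_u16_le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Returns a pointer to the null-terminated string at @index and stores its
 * length in @out_len, or returns NULL if the index is out of range or the
 * data is malformed.
 */
static const char *find_string(const uint8_t *data, size_t data_size,
                               size_t index, uint16_t *out_len) {
    assert(data != NULL && out_len != NULL);
    size_t offset = 0;
    for (size_t i = 0;; ++i) {
        if (data_size - offset < 2)
            return NULL; /* no room for another size field */
        uint16_t len = read_u16_le(data + offset);
        offset += 2;
        if (data_size - offset < (size_t)len + 1)
            return NULL; /* string data plus terminator does not fit */
        if (data[offset + len] != '\0')
            return NULL; /* missing null terminator */
        if (i == index) {
            *out_len = len;
            return (const char *)(data + offset);
        }
        offset += (size_t)len + 1;
    }
}
```

The walk is linear in the index. A loader that looks strings up often would more likely build an offset table once after loading.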

# Bytecode functions
## Functions layout
|Type      |Field     |Description                                                                           |
|----------|----------|--------------------------------------------------------------------------------------|
|u16       |num_funcs |The number of non-extern functions.                                                   |
|u32       |funcs_size|The size of all functions, in bytes.                                                  |
|Function[]|Functions |Multiple non-extern functions, where the number of functions is defined by @num_funcs.|

## Function
|Type|Field                    |Description                                                                                                             |
|----|-------------------------|------------------------------------------------------------------------------------------------------------------------|
|u32 |func_offset              |The offset in the program code (machine code) where the function starts. Is always 0 until the program has been started.|
|u8  |num_params               |The number of parameters.                                                                                               |
|u32 |params_num_pointers      |The number of pointers in the parameters.                                                                               |
|u32 |params_fixed_size        |The size of all non-pointer type parameters, in bytes.                                                                  |
|u8  |num_return_types         |The number of return values.                                                                                            |
|u32 |return_types_num_pointers|The number of pointers in the return types.                                                                             |
|u32 |return_types_fixed_size  |The size of all non-pointer type return types, in bytes.                                                                |
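
A sketch of reading one function entry from the table above. It assumes the fields are packed in table order without padding (22 bytes in total) and stored little-endian. `FunctionEntry` and `parse_function` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: fields are packed without padding: 4 + 1 + 4 + 4 + 1 + 4 + 4 */
#define FUNCTION_ENTRY_SIZE 22

/* Hypothetical in-memory form of a function entry */
typedef struct {
    uint32_t func_offset;
    uint8_t num_params;
    uint32_t params_num_pointers;
    uint32_t params_fixed_size;
    uint8_t num_return_types;
    uint32_t return_types_num_pointers;
    uint32_t return_types_fixed_size;
} FunctionEntry;

/* Assumption: 32-bit fields are stored little-endian */
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Reads the fields one by one so that struct padding in C does not matter */
static bool parse_function(const uint8_t *data, size_t size, FunctionEntry *out) {
    assert(data != NULL && out != NULL);
    if (size < FUNCTION_ENTRY_SIZE)
        return false;
    out->func_offset = read_u32_le(data + 0);
    out->num_params = data[4];
    out->params_num_pointers = read_u32_le(data + 5);
    out->params_fixed_size = read_u32_le(data + 9);
    out->num_return_types = data[13];
    out->return_types_num_pointers = read_u32_le(data + 14);
    out->return_types_fixed_size = read_u32_le(data + 18);
    return true;
}
```

Reading field by field, rather than casting the buffer to a struct, avoids depending on the compiler's struct layout and on unaligned access.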

# Bytecode external functions
## External functions layout
|Type               |Field             |Description                                                                              |
|-------------------|------------------|-----------------------------------------------------------------------------------------|
|u16                |num_extern_func   |The number of external functions.                                                        |
|u32                |extern_funcs_size |The size of all external functions, in bytes.                                            |
|External function[]|External functions|Multiple external functions, where the number of functions is defined by @num_extern_func|

## External function
|Type|Field                    |Description                                                                                          |
|----|-------------------------|-----------------------------------------------------------------------------------------------------|
|u8  |num_params               |The number of parameters.                                                                            |
|u8  |num_return_types         |The number of return values.                                                                         |
|u8  |name_len                 |The length of the external function name, in bytes, excluding the null terminator.                   |
|u8  |flags                    |The flags for the external function. The values are defined in @amal_func_flag.                      |
|u32 |params_num_pointers      |The number of pointers in the parameters.                                                            |
|u32 |params_fixed_size        |The size of all non-pointer type parameters, in bytes.                                               |
|u32 |return_types_num_pointers|The number of pointers in the return types.                                                          |
|u32 |return_types_fixed_size  |The size of all non-pointer type return types, in bytes.                                             |
|u8[]|name                     |The name of the external function, where the size is defined by @name_len. Names are null-terminated.|
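
External function entries are variable length because of the trailing name. The sketch below reads one entry, assuming a packed 20-byte fixed part in table order, little-endian fields, and a null terminator stored after the name. `ExternFunc` and `parse_extern_func` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 * u8 + 4 * u32, packed without padding */
#define EXTERN_FUNC_FIXED_SIZE 20

/* Hypothetical in-memory form of an external function entry */
typedef struct {
    uint8_t num_params;
    uint8_t num_return_types;
    uint8_t name_len;
    uint8_t flags;
    uint32_t params_num_pointers;
    uint32_t params_fixed_size;
    uint32_t return_types_num_pointers;
    uint32_t return_types_fixed_size;
    const char *name; /* points into the bytecode buffer, not copied */
} ExternFunc;

/* Assumption: 32-bit fields are stored little-endian */
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the total entry size in bytes, or 0 if the data is truncated or malformed */
static size_t parse_extern_func(const uint8_t *data, size_t size, ExternFunc *out) {
    assert(data != NULL && out != NULL);
    if (size < EXTERN_FUNC_FIXED_SIZE)
        return 0;
    out->num_params = data[0];
    out->num_return_types = data[1];
    out->name_len = data[2];
    out->flags = data[3];
    out->params_num_pointers = read_u32_le(data + 4);
    out->params_fixed_size = read_u32_le(data + 8);
    out->return_types_num_pointers = read_u32_le(data + 12);
    out->return_types_fixed_size = read_u32_le(data + 16);
    /* name_len excludes the null terminator, so one extra byte follows the name */
    size_t total = EXTERN_FUNC_FIXED_SIZE + (size_t)out->name_len + 1;
    if (size < total || data[total - 1] != '\0')
        return 0;
    out->name = (const char *)(data + EXTERN_FUNC_FIXED_SIZE);
    return total;
}
```

The return value is the offset of the next entry relative to the current one, so a caller can loop until it has read the number of entries given by the section's count field.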

# Bytecode exported functions
## Exported functions layout
|Type               |Field             |Description                                                                              |
|-------------------|------------------|-----------------------------------------------------------------------------------------|
|u16                |num_export_func   |The number of exported functions.                                                        |
|u32                |export_funcs_size |The size of all exported functions, in bytes.                                            |
|Exported function[]|Exported functions|Multiple exported functions, where the number of functions is defined by @num_export_func|

## Exported function
|Type|Field             |Description                                                                                                               |
|----|------------------|--------------------------------------------------------------------------------------------------------------------------|
|u32 |instruction_offset|The offset in the instruction data where the exported function is defined. Is always 0 until the program has been started.|
|u8  |num_args          |The number of arguments the function has.                                                                                 |
|u8  |name_len          |The length of the exported function name, in bytes, excluding the null terminator.                                        |
|u8[]|name              |The name of the exported function, where the size is defined by @name_len. Names are null-terminated.                     |

# Bytecode imports
## Imports layout
|Type    |Field       |Description                                                              |
|--------|------------|-------------------------------------------------------------------------|
|u8      |num_imports |The number of imports.                                                   |
|u32     |imports_size|The size of all imports, in bytes.                                       |
|Import[]|Import      |Multiple imports, where the number of imports is defined by @num_imports.|

## Import
|Type|Field                |Description                                                                             |
|----|---------------------|----------------------------------------------------------------------------------------|
|u32 |function_index       |The index in the bytecode where the function header begins for the imported file.       |
|u32 |extern_function_index|The index in the bytecode where the extern function header begins for the imported file.|

# Bytecode instructions
## Instructions layout
|Type         |Field            |Description                                                                |
|-------------|-----------------|---------------------------------------------------------------------------|
|u32          |Instructions size|The size of the instructions section, in bytes.                            |
|Instruction[]|Instructions data|The instructions data. Each instruction begins with an opcode, see #Opcode |

# Execution backend
Amalgam supports multiple execution backends, and new ones can be implemented with minimal
effort. The only requirements are implementing all the functions in executor/executor.h
and adding the source file with the implementation to the build script. See executor/interpreter/executor.c
for an example.\
These functions are called by amalgam as it parses the amalgam bytecode when `amal_program_run` is called.