A Translation Layer from a High-Level Language to LLVM

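I have recently been designing a course project for a compilers class and read up on many ideas for compilers that target LLVM. Along the way I noticed that Rust's type gymnastics, a grab bag of black magic, brings a lot of fresh ideas to the community, so I am writing down some small ideas from my earlier design of the ChocoPy LLVM layer here. ShanghaiTech students who want to play with it can join the Piazza, invite code: CHOCOPY. Parts of this draw on High Level Constructs to LLVM_IR; the design of generics draws more on Rust and C.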

Primitive Type

The first thing to look at is ChocoPy's primitive types. In assembly they are defined as:

.globl $object$prototype
$object$prototype:
  .word 0                                  # Type tag for class: object
  .word 3                                  # Object size
  .word $object$dispatchTable              # Pointer to dispatch table
  .align 2

.globl $int$prototype
$int$prototype:
  .word 1                                  # Type tag for class: int
  .word 4                                  # Object size
  .word $int$dispatchTable                 # Pointer to dispatch table
  .word 0                                  # Initial value of attribute: __int__
  .align 2

.globl $bool$prototype
$bool$prototype:
  .word 2                                  # Type tag for class: bool
  .word 4                                  # Object size
  .word $bool$dispatchTable                # Pointer to dispatch table
  .word 0                                  # Initial value of attribute: __bool__
  .align 2

.globl $str$prototype
$str$prototype:
  .word 3                                  # Type tag for class: str
  .word 5                                  # Object size
  .word $str$dispatchTable                 # Pointer to dispatch table
  .word 0                                  # Initial value of attribute: __len__
  .word 0                                  # Initial value of attribute: __str__
  .align 2

.globl $object$dispatchTable
$object$dispatchTable:
  .word $object.__init__                   # Implementation for method: object.__init__

.globl $int$dispatchTable
$int$dispatchTable:
  .word $object.__init__                   # Implementation for method: int.__init__

.globl $bool$dispatchTable
$bool$dispatchTable:
  .word $object.__init__                   # Implementation for method: bool.__init__

.globl $str$dispatchTable
$str$dispatchTable:
  .word $object.__init__                   # Implementation for method: str.__init__

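Rewriting this in LLVM IR is straightforward. Because LLVM IR is strongly typed, each type has to be defined before a value of that type can be defined.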

%$int$prototype_type  = type  {
  i32,
  i32,
  %$int$dispatchTable_type*,
  i32 
}
@$int$prototype  = global %$int$prototype_type{
  i32 1,
  i32 4,
  %$int$dispatchTable_type* @$int$dispatchTable,
  i32 0
}
%$int$dispatchTable_type = type {
  %$object$dispatchTable_type(%$object$dispatchTable_type)*
}
@$int$dispatchTable = global %$int$dispatchTable_type {
  %$object$dispatchTable_type(%$object$dispatchTable_type)* @$object.__init__
}

%$bool$prototype_type  = type  {
  i32,
  i32,
  %$bool$dispatchTable_type*,
  i1 
}
@$bool$prototype  = global %$bool$prototype_type{
  i32 2,
  i32 4,
  %$bool$dispatchTable_type* @$bool$dispatchTable,
  i1 0
}
%$bool$dispatchTable_type = type {
  %$object$dispatchTable_type(%$object$dispatchTable_type)*
}
@$bool$dispatchTable = global %$bool$dispatchTable_type {
  %$object$dispatchTable_type(%$object$dispatchTable_type)* @$object.__init__
}

%$str$prototype_type  = type  {
  i32,
  i32,
  %$str$dispatchTable_type*,
  i32,
  i8*
}
@$str$prototype  = global %$str$prototype_type{
  i32 3,
  i32 5,
  %$str$dispatchTable_type* @$str$dispatchTable,
  i32 0,
  i8* inttoptr (i32 0 to i8*)
}
%$str$dispatchTable_type = type {
  %$object$dispatchTable_type(%$object$dispatchTable_type)*
}
@$str$dispatchTable = global %$str$dispatchTable_type {
  %$object$dispatchTable_type(%$object$dispatchTable_type)* @$object.__init__
}
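
The three definitions above follow one pattern: a three-word header (type tag, object size, dispatch-table pointer) followed by one word per attribute. A minimal sketch of a generator for that pattern is below. `ClassLayout`, `emit`, and `HEADER_WORDS` are names made up for this illustration, not part of the actual compiler, and the sketch assumes every attribute takes exactly one word.

```python
from dataclasses import dataclass, field

HEADER_WORDS = 3  # type tag, object size, dispatch-table pointer


@dataclass
class ClassLayout:
    """Hypothetical description of one ChocoPy class for code generation."""
    name: str
    type_tag: int
    attrs: list = field(default_factory=list)  # (llvm_type, initial_value) pairs

    @property
    def size_in_words(self) -> int:
        # Assumption: every attribute occupies one word after the header.
        return HEADER_WORDS + len(self.attrs)

    def emit(self) -> str:
        """Return the type definition followed by the prototype global."""
        table = f"%${self.name}$dispatchTable_type*"
        field_types = ["i32", "i32", table] + [t for t, _ in self.attrs]
        values = [
            f"i32 {self.type_tag}",
            f"i32 {self.size_in_words}",
            f"{table} @${self.name}$dispatchTable",
        ] + [f"{t} {v}" for t, v in self.attrs]
        type_def = (
            f"%${self.name}$prototype_type = type {{ {', '.join(field_types)} }}"
        )
        global_def = (
            f"@${self.name}$prototype = global "
            f"%${self.name}$prototype_type {{ {', '.join(values)} }}"
        )
        return type_def + "\n" + global_def


int_layout = ClassLayout("int", 1, [("i32", "0")])
str_layout = ClassLayout(
    "str", 3, [("i32", "0"), ("i8*", "inttoptr (i32 0 to i8*)")]
)
print(int_layout.emit())
print(str_layout.emit())
```

The object size is derived from the attribute list rather than written by hand, so it stays in step with the struct type when a class gains an attribute.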

Next is the list prototype. The original definition is:

.globl $.list$prototype
$.list$prototype:
  .word -1                                 # Type tag for class: .list
  .word 4                                  # Object size
  .word 0                                  # Pointer to dispatch table
  .word 0                                  # Initial value of attribute: __len__
  .align 2

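This one is a bit tricky: once conslist has been called, a VLA (variable-length array) is appended right after the last attribute, __len__. Expressed directly in LLVM, that would need a separate definition for every [len x i32/str/list], which is tedious. So I use a union.conslist* pointer instead, which is simple and clear. The original conslist has to be modified to match.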

%$.list$prototype_type  = type  {
  i32,
  i32,
  %$union.type,
  i32,
  %$union.conslist*
}
@$.list$prototype  = global %$.list$prototype_type{
  i32 -1,
  i32 5,
  %$union.type {%$int$dispatchTable_type* undef,  %$bool$dispatchTable_type* undef,  %$str$dispatchTable_type* undef,  %$object$dispatchTable_type* undef},
  i32 0,
  %$union.conslist* inttoptr (i32 0 to %$union.conslist*)
}
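
To make the difference between the two layouts concrete, here is a small sketch that models an object as a flat list of words. It assumes the inline form records its size as 4 + len, which is my reading of the original conslist rather than something shown above. `inline_layout`, `pointer_layout`, and the `heap` list are hypothetical names for this illustration.

```python
LIST_TAG = -1      # type tag of .list in the prototype above
HEADER_WORDS = 4   # tag, size, dispatch-table pointer, __len__


def inline_layout(elements):
    """Original layout: the elements follow __len__ directly (a VLA)."""
    # Assumption: the size word counts the header plus every element.
    words = [LIST_TAG, HEADER_WORDS + len(elements), 0, len(elements)]
    return words + list(elements)


def pointer_layout(elements, heap):
    """Modified layout: a fixed five-word object that points at its elements.

    `heap` is a hypothetical stand-in for separately allocated storage;
    the returned "pointer" is just an index into it.
    """
    heap.append(list(elements))
    pointer = len(heap) - 1
    return [LIST_TAG, HEADER_WORDS + 1, 0, len(elements), pointer]


heap = []
print(inline_layout([1, 2, 3]))         # length depends on the element count
print(pointer_layout([1, 2, 3], heap))  # always five words
```

With the pointer form the object has the same shape for every length, which is why a single struct type is enough on the LLVM side.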

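For a list that is neither passed as an argument nor handed to a stdlib call that strictly needs a conslist, the conslist form is not required and a plain list is used (see the conslist => list item under Optimization).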

stdlib

A number of stdlib functions are supported.

Class

type class

class A(object):
    a:int = 42
    def foo(self:"A", ignore:object) -> int:
        return self.a
    def bar(self:"A") -> int:
        print("A")
        return 0
class B(A):
    b:bool = True
    def __init__(self:"B"):
        print("B")
    def bar(self:"B") -> int:
        print("B")
        return 0
class C(B):
    c:bool = True
    def __init__(self:"C"):
        print("C")
    def foo1(self:"C") -> int:
        print("B")
        return 0
    def bar(self:"C") -> int:
        print("C")
        return 0
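def t() -> int:
    def f() -> int:
        return 0
    return 0
# ChocoPy only allows a literal as a variable initializer,
# so the input is read in a separate statement below.
d:str = ""
a:A = None
d = input()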
if d=="sb":
    a=C()
else:
    a=A()

print(a.bar())

dispatch table

.globl $A$dispatchTable
$A$dispatchTable:
  .word $object.__init__                   # Implementation for method: A.__init__
  .word $A.foo                             # Implementation for method: A.foo
  .word $A.bar                             # Implementation for method: A.bar

.globl $B$dispatchTable
$B$dispatchTable:
  .word $B.__init__                        # Implementation for method: B.__init__
  .word $A.foo                             # Implementation for method: B.foo
  .word $B.bar                             # Implementation for method: B.bar

.globl $C$dispatchTable
$C$dispatchTable:
  .word $C.__init__                        # Implementation for method: C.__init__
  .word $A.foo                             # Implementation for method: C.foo
  .word $C.bar                             # Implementation for method: C.bar
  .word $C.foo1                            # Implementation for method: C.foo1
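
The three tables above can be derived mechanically from the class definitions: a subclass starts from a copy of its parent's table, an override replaces the label in the same slot, and a new method gets a new slot at the end. The sketch below is one way to write that rule down; `build_dispatch_table` is a name made up for this illustration.

```python
def build_dispatch_table(parent_table, class_name, own_methods):
    """Return the (method name, implementation label) list for one class.

    Inherited slots keep their index; an override replaces the label in
    place and a new method is appended at the end.
    """
    table = list(parent_table)
    slots = {name: i for i, (name, _) in enumerate(table)}
    for method in own_methods:
        label = f"${class_name}.{method}"
        if method in slots:
            table[slots[method]] = (method, label)
        else:
            slots[method] = len(table)
            table.append((method, label))
    return table


OBJECT = [("__init__", "$object.__init__")]
A = build_dispatch_table(OBJECT, "A", ["foo", "bar"])
B = build_dispatch_table(A, "B", ["__init__", "bar"])
C = build_dispatch_table(B, "C", ["__init__", "foo1", "bar"])

for name, table in (("A", A), ("B", B), ("C", C)):
    print(name, [label for _, label in table])
```

Because bar sits in slot 2 of every table, the call a.bar() in the example program can load slot 2 from whatever table the object points at, without knowing whether a is an A or a C.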

Function

Frame Pointer & Stack Pointer

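Because the latest LLVM does not give me a frame pointer through attributes #0 = { "frame-pointer"="all" }, no frame is used in either the LLVM layer or the backend. Arguments are passed entirely in the eight argument registers (a0-a7), and anything beyond that goes on the stack starting at 0(sp).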
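
A minimal sketch of that argument assignment, under two assumptions of mine: the eight registers are the standard RISC-V argument registers a0 to a7, and overflow arguments occupy consecutive 4-byte slots starting at 0(sp). `assign_arguments` is a hypothetical helper, not the backend's real code.

```python
ARG_REGISTERS = [f"a{i}" for i in range(8)]  # assumption: RISC-V a0..a7
WORD_SIZE = 4                                # assumption: 32-bit words


def assign_arguments(arg_names):
    """Map each argument to a register, or to an sp-relative slot once
    the registers run out."""
    locations = {}
    for index, name in enumerate(arg_names):
        if index < len(ARG_REGISTERS):
            locations[name] = ARG_REGISTERS[index]
        else:
            offset = (index - len(ARG_REGISTERS)) * WORD_SIZE
            locations[name] = f"{offset}(sp)"
    return locations


print(assign_arguments(list("abcdefghij")))
```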

Nested Function

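Python supports nested functions, and a nested function can also reach the variables of its enclosing function through the nonlocal keyword. The quickest way to handle this is, of course, to look at how C++ lambdas do it.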

Global & Non-Local

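A variable marked global is accessed as a global variable defined at module level. Nonlocal variables have to be passed into the function through a class, and the order of the nonlocal declarations determines each variable's position in the class struct. They are captured by reference, the way C++'s [&] does it, so that they can be modified in place.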

%class.anon = type { i32* }
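
The same idea can be sketched in Python without relying on Python's own closures. The nested function is lifted to the top level and receives an environment that plays the role of %class.anon, and a one-element list stands in for the i32* so that writes land in the original variable. `capture_layout`, `COUNT_FIELD`, and the function names are made up for this illustration.

```python
def capture_layout(nonlocal_names):
    """Field index of each captured variable, in declaration order."""
    return {name: index for index, name in enumerate(nonlocal_names)}


COUNT_FIELD = capture_layout(["count"])["count"]  # field 0 of the struct


def inner(env):
    """Lifted nested function: every capture arrives through `env`."""
    cell = env[COUNT_FIELD]  # load the pointer out of the struct
    cell[0] += 1             # write through it, like [&] in C++


def outer():
    count = [0]              # a one-element list stands in for i32*
    env = [count]            # the struct stores a reference, not a copy
    inner(env)
    inner(env)
    return count[0]


print(outer())
```

Storing a copy of the value instead of a reference would leave count in outer unchanged, which is the difference between [=] and [&] in C++.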

Lambda

The captured outer variables have to be worked out by our own analysis, which is not actually hard.
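
Python's own compiler performs this kind of analysis, and the standard symtable module exposes the result, so it makes a handy reference for checking a hand-written pass. The sketch below is only that reference check, not the analysis used in this project; `captured_variables` and the sample source are made up for the illustration.

```python
import symtable

SOURCE = '''
def outer():
    x = 1
    y = 2
    def inner():
        nonlocal x
        x = x + y
        return x
    return inner()
'''


def captured_variables(source, outer_name, inner_name):
    """Names that `inner_name` uses but that live in `outer_name`."""
    module = symtable.symtable(source, "<sketch>", "exec")
    outer = next(t for t in module.get_children() if t.get_name() == outer_name)
    inner = next(t for t in outer.get_children() if t.get_name() == inner_name)
    # get_frees() lists the free variables of a function scope.
    return sorted(inner.get_frees())


print(captured_variables(SOURCE, "outer", "inner"))
```

Both x (written through nonlocal) and y (only read) show up as captures, so both need a field in the environment struct.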

Optimization

Optimizations supported so far:

Static analysis: conslist => list