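I have recently been working on the design of the course project for a compiler principles class, and I read through a lot of ideas from compilers that target LLVM. I also noticed that Rust's type gymnastics, as a grab bag of black magic, can bring the community plenty of fresh ideas. So I am putting down here a few small ideas from when I designed the LLVM layer for ChocoPy. ShanghaiTech students who want to play with it can join the Piazza, invite code: CHOCOPY. Part of this draws on High Level Constructs to LLVM_IR; the design of generics borrows more from Rust and C.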
Primitive Type
The first thing to look at is ChocoPy's primitive types. Their definitions at the assembly level are:
.globl $object$prototype
$object$prototype:
.word 0 # Type tag for class: object
.word 3 # Object size
.word $object$dispatchTable # Pointer to dispatch table
.align 2
.globl $int$prototype
$int$prototype:
.word 1 # Type tag for class: int
.word 4 # Object size
.word $int$dispatchTable # Pointer to dispatch table
.word 0 # Initial value of attribute: __int__
.align 2
.globl $bool$prototype
$bool$prototype:
.word 2 # Type tag for class: bool
.word 4 # Object size
.word $bool$dispatchTable # Pointer to dispatch table
.word 0 # Initial value of attribute: __bool__
.align 2
.globl $str$prototype
$str$prototype:
.word 3 # Type tag for class: str
.word 5 # Object size
.word $str$dispatchTable # Pointer to dispatch table
.word 0 # Initial value of attribute: __len__
.word 0 # Initial value of attribute: __str__
.align 2
.globl $object$dispatchTable
$object$dispatchTable:
.word $object.__init__ # Implementation for method: object.__init__
.globl $int$dispatchTable
$int$dispatchTable:
.word $object.__init__ # Implementation for method: int.__init__
.globl $bool$dispatchTable
$bool$dispatchTable:
.word $object.__init__ # Implementation for method: bool.__init__
.globl $str$dispatchTable
$str$dispatchTable:
.word $object.__init__ # Implementation for method: str.__init__
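Rewriting this in LLVM IR is easy. Because LLVM IR is strongly typed, every value needs a type, so a struct type is defined for each prototype and dispatch table before the global of that type is defined.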
%$int$prototype_type = type {
i32,
i32,
%$int$dispatchTable_type*,
i32
}
@$int$prototype = global %$int$prototype_type{
i32 1,
i32 4,
%$int$dispatchTable_type* @$int$dispatchTable,
i32 0
}
%$int$dispatchTable_type = type {
%$object$dispatchTable_type(%$object$dispatchTable_type)*
}
@$int$dispatchTable = global %$int$dispatchTable_type {
%$object$dispatchTable_type(%$object$dispatchTable_type)* @$object.__init__
}
%$bool$prototype_type = type {
i32,
i32,
%$bool$dispatchTable_type*,
i1
}
@$bool$prototype = global %$bool$prototype_type{
i32 2,
i32 4,
%$bool$dispatchTable_type* @$bool$dispatchTable,
i1 0
}
%$bool$dispatchTable_type = type {
%$object$dispatchTable_type(%$object$dispatchTable_type)*
}
@$bool$dispatchTable = global %$bool$dispatchTable_type {
%$object$dispatchTable_type(%$object$dispatchTable_type)* @$object.__init__
}
%$str$prototype_type = type {
i32,
i32,
%$str$dispatchTable_type*,
i32 ,
i8*
}
@$str$prototype = global %$str$prototype_type{
i32 3,
i32 5,
%$str$dispatchTable_type* @$str$dispatchTable,
i32 0,
i8* null
}
%$str$dispatchTable_type = type {
%$object$dispatchTable_type(%$object$dispatchTable_type)*
}
@$str$dispatchTable = global %$str$dispatchTable_type {
%$object$dispatchTable_type(%$object$dispatchTable_type)* @$object.__init__
}
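To make the layout easier to picture, here is a minimal C++ sketch of the int prototype and its dispatch table. The struct names, the `InitFn` signature, and the `make_int` helper are my own illustrative assumptions and not part of the generated IR; the sketch also assumes that a new object is produced by copying its prototype and then filling in the attribute.

```cpp
#include <cassert>
#include <cstdint>

struct Object;                        // generic object header (illustrative name)
using InitFn = void (*)(Object*);     // assumed signature for object.__init__

struct DispatchTable {
    InitFn init;                      // slot 0: __init__
};

struct Object {
    int32_t type_tag;                 // type tag word from the assembly
    int32_t size_words;               // object size in words
    const DispatchTable* dispatch;    // pointer to the dispatch table
};

struct IntObject {
    Object header;
    int32_t value;                    // attribute __int__
};

void object_init(Object*) {}          // stand-in for $object.__init__

const DispatchTable int_dispatch_table = { object_init };

// Counterpart of @$int$prototype: tag 1, size 4, table pointer, value 0.
const IntObject int_prototype = { { 1, 4, &int_dispatch_table }, 0 };

// Box an int by copying the prototype and setting the attribute
// (an assumption about how allocation works, for illustration only).
IntObject make_int(int32_t v) {
    IntObject obj = int_prototype;
    obj.value = v;
    assert(obj.header.type_tag == 1); // the copy keeps the int type tag
    return obj;
}
```

The bool and str prototypes follow the same shape, differing only in the tag, the size, and the trailing attribute fields.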
Next is the list prototype. The original definition is:
.globl $.list$prototype
$.list$prototype:
.word -1 # Type tag for class: .list
.word 4 # Object size
.word 0 # Pointer to dispatch table
.word 0 # Initial value of attribute: __len__
.align 2
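This part is a bit tricky: once conslist has been called, a VLA (variable-length array) is appended after the final __len__ field. Mirroring that in LLVM would mean defining a separate [len x i32/str/list] array type for every case, which is tedious, so I use a %$union.conslist* pointer instead, which is simple and clear. The original conslist has to be modified accordingly.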
%$.list$prototype_type = type {
i32,
i32,
%$union.type ,
i32 ,
%$union.conslist*
}
@$.list$prototype = global %$.list$prototype_type{
i32 -1,
i32 5,
%$union.type {%$int$dispatchTable_type* undef, %$bool$dispatchTable_type* undef, %$str$dispatchTable_type* undef, %$object$dispatchTable_type* undef},
i32 0,
%$union.conslist* null
}
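The idea of replacing the trailing variable-length array with a pointer can be sketched in C++ as follows. The `Cell` union, the field names, and the helper functions are hypothetical and only approximate `%$union.conslist`; storage is owned by the caller here because no allocator is modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical element cell: one slot that can hold any kind of value.
union Cell {
    int32_t as_int;
    bool as_bool;
    const char* as_str;
    void* as_obj;
};

struct ListObject {
    int32_t type_tag;      // -1 for .list
    int32_t size_words;    // fixed header size, independent of the length
    const void* dispatch;  // dispatch-table slot, left empty in this sketch
    int32_t len;           // attribute __len__
    Cell* items;           // out-of-line elements instead of a trailing array
};

// Build a list header over caller-owned storage.
ListObject make_list(Cell* storage, int32_t len) {
    return ListObject{ -1, 5, nullptr, len, storage };
}

// Read an int element with a bounds check.
int32_t get_int(const ListObject& list, int32_t index) {
    assert(index >= 0 && index < list.len);
    return list.items[index].as_int;
}
```

Because the header has one fixed type, a single struct definition covers lists of every length and element kind, which is the point of the pointer-based design.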
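As for lists, the plain list form is used as long as no argument passing or stdlib call strictly requires conslist. The stdlib provides a few supporting functions.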
Class
type class
class A(object):
    a:int = 42
    def foo(self:"A", ignore:object) -> int:
        return self.a
    def bar(self:"A") -> int:
        print("A")
        return 0
class B(A):
    b:bool = True
    def __init__(self:"B"):
        print("B")
    def bar(self:"B") -> int:
        print("B")
        return 0
class C(B):
    c:bool = True
    def __init__(self:"C"):
        print("C")
    def foo1(self:"C") -> int:
        print("B")
        return 0
    def bar(self:"C") -> int:
        print("C")
        return 0
def t():
    def f():
        return 0
    return 0
d:str=input()
a:A=None
if d=="sb":
    a=C()
else:
    a=A()
print(a.bar())
dispatch table
.globl $A$dispatchTable
$A$dispatchTable:
.word $object.__init__ # Implementation for method: A.__init__
.word $A.foo # Implementation for method: A.foo
.word $A.bar # Implementation for method: A.bar
.globl $B$dispatchTable
$B$dispatchTable:
.word $B.__init__ # Implementation for method: B.__init__
.word $A.foo # Implementation for method: B.foo
.word $B.bar # Implementation for method: B.bar
.globl $C$dispatchTable
$C$dispatchTable:
.word $C.__init__ # Implementation for method: C.__init__
.word $A.foo # Implementation for method: C.foo
.word $C.bar # Implementation for method: C.bar
.word $C.foo1 # Implementation for method: C.foo1
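The tables above can be modelled in C++ as arrays of function pointers in which an inherited or overridden method keeps the slot index it has in the parent. This is only a sketch: the names are mine, every slot is simplified to take just `self` (the extra `ignore` parameter of `foo` is dropped), and `log_output` stands in for `print`.

```cpp
#include <cassert>
#include <string>

struct Obj;
using Method = int (*)(Obj*);     // simplified: every slot takes only self

struct Obj {
    const Method* dispatch;       // points at the class's dispatch table
};

std::string log_output;           // records what print() would output

int object_init(Obj*) { return 0; }
int A_foo(Obj*)  { return 42; }   // self.a, fixed at its initial value here
int A_bar(Obj*)  { log_output += "A"; return 0; }
int B_init(Obj*) { log_output += "B"; return 0; }
int B_bar(Obj*)  { log_output += "B"; return 0; }
int C_init(Obj*) { log_output += "C"; return 0; }
int C_bar(Obj*)  { log_output += "C"; return 0; }
int C_foo1(Obj*) { log_output += "B"; return 0; }

// Slot indices shared by A, B and C; FOO1 exists only in C's table.
enum Slot { INIT = 0, FOO = 1, BAR = 2, FOO1 = 3 };

const Method A_table[] = { object_init, A_foo, A_bar };
const Method B_table[] = { B_init,      A_foo, B_bar };
const Method C_table[] = { C_init,      A_foo, C_bar, C_foo1 };

// Dynamic dispatch: load the table from the object, index it, call.
int call(Obj* self, Slot slot) {
    assert(self != nullptr && self->dispatch != nullptr);
    return self->dispatch[slot](self);
}
```

With this layout, `a.bar()` compiles to the same indexed call whether `a` holds an A or a C at run time; only the table the object points at differs.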
Function
Frame Pointer & Stack Pointer
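Because the latest LLVM does not give us a frame pointer through attributes #0 = { "frame-pointer"="all" }, neither the LLVM layer nor the backend uses a frame. Arguments are passed entirely in the a1-8 registers, and after that 0(sp) is used.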
Nested Function
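Python supports nested functions and also lets them reach outer variables through the nonlocal keyword, so the quickest approach is of course to look at how C++ lambda functions do it.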
Global & Non-Local
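A variable marked global is accessed as a global variable defined at module level. A nonlocal variable has to be passed into the function through a class, and the order of the nonlocal declarations determines its position in the class struct. It is accessed the way C++'s [&] capture works, by reference, so that it can be modified in its original location.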
%class.anon = type { i32* }
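For comparison, here is a small C++ sketch of the same idea written twice: once with an explicit environment struct holding a pointer (the shape of `{ i32* }`), and once with a `[&]` lambda. The names `CounterEnv`, `increment`, `outer_explicit`, and `outer_lambda` are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>

// Explicit form of a closure that captures one int by reference:
// a struct with a single pointer field, like { i32* }.
struct CounterEnv {
    int* count;
};

// The nested function lifted to top level; it receives the environment.
void increment(CounterEnv* env) {
    assert(env != nullptr && env->count != nullptr);
    *env->count += 1;             // writes through to the outer variable
}

int outer_explicit() {
    int count = 0;                // the variable the nested function modifies
    CounterEnv env{ &count };
    increment(&env);
    increment(&env);
    return count;
}

int outer_lambda() {
    int count = 0;
    auto inc = [&]() { count += 1; };   // by-reference capture
    inc();
    inc();
    return count;
}
```

Both functions return the same value, because in each case the nested code updates the outer variable in place rather than a copy.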
Lambda
The captured outer variables have to be worked out by our own analysis, which is actually not hard.
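One possible way to do this analysis is sketched below in C++. The `FuncInfo` summary and the `captured_vars` function are assumptions made for illustration, not the project's actual data structures: a name is treated as captured when the nested function uses it, does not declare it, and some enclosing function does. The result is a plain set, so the ordering that decides the field position in the class struct is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical summary of one function in the nesting chain.
struct FuncInfo {
    std::set<std::string> locals;  // parameters and locals declared here
    std::set<std::string> used;    // every name read or written in the body
};

// chain[0] is the outermost function, chain.back() is the nested one.
// Names declared in no enclosing function are assumed to be globals
// and are therefore not captured.
std::set<std::string> captured_vars(const std::vector<FuncInfo>& chain) {
    assert(!chain.empty());
    const FuncInfo& inner = chain.back();
    std::set<std::string> result;
    for (const std::string& name : inner.used) {
        if (inner.locals.count(name)) continue;       // declared locally
        // Walk outward through the enclosing functions.
        for (std::size_t i = chain.size() - 1; i-- > 0;) {
            if (chain[i].locals.count(name)) {
                result.insert(name);
                break;
            }
        }
    }
    return result;
}
```

For an outer function declaring `x` and `y` and an inner one that uses `x`, its own local `z`, and a global `g`, only `x` ends up in the capture set.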
Optimization
Optimizations currently supported:
Static analysis: conslist => list