My view of Mojo's success and the fallacy of Computer Architecture: A Quantitative Approach

David Patterson's Computer Architecture

I think the TPU chapter is wrong, the RVV material is wrong, the Google WSC price calculation is outdated, and x86 is not as bad as RISC-V, so I guess we need to revisit Computer Architecture: A Quantitative Approach. The main fallacy is that most of the newly added material is David's own work, which neither guides anything in the architecture space nor has a profound impact that endures the test of time. I think architecture should embrace codesign, but not VLIW, and should not redo things that were settled long ago. The ideology of the whole book misleads architects about which new ideas will actually thrive in this golden age. I'm saying this because I found T-Head's fallacy in RVV and many other fallacies, and programmers' views of these topics are based on such misleading books.

MOJO

Mojo is implemented in Go and does codegen to MLIR, with a standard library implemented in C++. I would say it is currently just a Python frontend that generates MLIR, with cohesion to the Python FFI and CFFI, much like what I did for ChocoPy-LLVM [6]. I think Chris's idea is to map Python semantics, especially the memory model, onto Rust or C++ so that all memory can be managed as RAII with shared pointers plus some workarounds, without a GC; a sketch of that mapping follows below. Suddenly, the transition from LLVM to MLIR feels very natural: instead of defining separate op sets for AMX, AVX-512, and NVVM, it's better to integrate them.
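As a rough illustration of that RAII mapping, here is a minimal sketch in early-2023 Mojo syntax. The Pair struct is made up; __copyinit__ and __del__ are the value-lifecycle hooks, and the point is that Python-looking code gets compiler-inserted copies and deterministic destruction instead of a garbage collector.

struct Pair:
    var first: Int
    var second: Int

    fn __init__(inout self, first: Int, second: Int):
        self.first = first
        self.second = second

    # Copies are explicit compiler-inserted calls, as in a C++ copy constructor.
    fn __copyinit__(inout self, existing: Self):
        self.first = existing.first
        self.second = existing.second

    # The last use of a value destroys it deterministically (RAII); no GC pause.
    fn __del__(owned self):
        pass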

Static Analysis

  • Classes are not implemented yet, and there is no multiple inheritance.
  • "try" is needed for mapping onto the C++ exception model.
  • To increase speed, use the syntactic sugar for calling the vector MLIR ops in [4] together with the parallel-call primitives (see the sketch after this list). They connect seamlessly and can easily be retargeted to WASM/WebGPU.
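A minimal sketch of that sugar, modeled on the early Mojo matmul notebook. The scale function is made up, and module paths such as Functional and TargetInfo come from the early standard library and may have moved since.

from DType import DType
from Functional import vectorize
from Pointer import DTypePointer
from TargetInfo import dtype_simd_width

alias nelts = dtype_simd_width[DType.f32]()  # SIMD width of the target CPU

fn scale(data: DTypePointer[DType.f32], size: Int, factor: F32):
    @parameter
    fn body[width: Int](i: Int):
        # Each simd_load/simd_store pair lowers to vector MLIR ops [4].
        data.simd_store[width](i, data.simd_load[width](i) * factor)
    vectorize[nelts, body](size)

vectorize peels the loop into width-nelts chunks plus a scalar tail; the parallel-call primitive follows the same pattern but distributes chunks across threads.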

Implementation of LLDB and MLIR

Debugging with location info

  • Basically, it is C++ with MLIR, mapping DWARF back to Mojo (a location sketch follows this list).
  • C++ ABI
  • The current mapping into the LLDB debugger is not ready.
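The mapping hangs off MLIR's source locations, which survive lowering into LLVM debug metadata and finally DWARF. A hand-written sketch of the textual IR, with a made-up file name and constant:

func.func @main() {
  // FileLineColLoc: file, line, and column of the original Mojo source.
  %c = arith.constant 42 : index loc("hello.mojo":3:9)
  return loc("hello.mojo":4:1)
}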

MLIR lowering to GPU/CPU heterogeneous code

# Materialize an i1 with the index dialect [5]; each branch calls a raw MLIR op.
var y: __mlir_type.i1
if x:
    y = __mlir_op.`index.bool.constant`[value : __mlir_attr.`true`]()
else:
    y = __mlir_op.`index.bool.constant`[value : __mlir_attr.`false`]()
  • -mcpu=sapphirerapids with AVX-512 on the CPU side
  • -mcpu=amdgpu to call from the CPU to the GPU

Currently, there is no MLIR code generated for this path, and I don't want to reverse-engineer the compiler to dump it. You can write some MLIR against the amdgpu dialect [3] yourself to force heterogeneous code, roughly like the sketch below.
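A minimal hand-written sketch using the upstream gpu dialect, which lowers toward ROCDL/amdgpu [3]; the kernel, buffer sizes, and names are all made up:

// Lives inside a module tagged with the gpu.container_module attribute.
gpu.module @kernels {
  gpu.func @vadd(%a: memref<256xf32>, %b: memref<256xf32>) kernel {
    %i = gpu.thread_id x
    %x = memref.load %a[%i] : memref<256xf32>
    %y = memref.load %b[%i] : memref<256xf32>
    %s = arith.addf %x, %y : f32
    memref.store %s, %b[%i] : memref<256xf32>
    gpu.return
  }
}

// Host side: launch 1 block of 256 threads, passing the two buffers
// (%c1 and %c256 are arith.constant index values).
gpu.launch_func @kernels::@vadd
    blocks in (%c1, %c1, %c1) threads in (%c256, %c1, %c1)
    args(%a : memref<256xf32>, %b : memref<256xf32>)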

Reference

  1. https://github.com/victoryang00/CS131-discussion/blob/main/11-discussion.tex
  2. https://github.com/modularml/mojo/issues/3
  3. https://mlir.llvm.org/docs/Dialects/AMDGPU/
  4. https://mlir.llvm.org/docs/Dialects/MathOps/
  5. https://mlir.llvm.org/docs/Dialects/IndexOps/
  6. https://github.com/Chocopy-LLVM/chocopy-llvm