Running a Program - CALL(Compiling, Assembling, Linking, and Loading)

Clarifications

Project 1 RISC-v emulator

  1. Risc-v isa does not define assembly syntax
    behave exactly like Venus
  2. All I-Type instructions (including sltiu)
    • do sign-extension
    • (in venus) input number is signed, even if hex

a good tool

Interpretation

  1. any good reason to interpret machine language in software? debbuger Venus
  2. translated/compiled code almost always more efficitent and therfore higher performance:
    • important for many applications, particularly operating systems.

Steps in compiling a C program

Psuedo-instruciton Replacement

Producing Machine Language

  • simple case
    • arithmetic, logical, shifts and so on
    • All necessary info is within the instruction already
  • branches can be optimized

parallel programing 23 parallel fft

One of the most import algorithm in the new century

Intro to the fft

The basic equation of the FFT

\(F(\omega)=F|f(t)|=\int ^{+\infty}_{-\infty}f(t)e^{-j\omega t}dt\)

Roots of unity

\[
\begin{array}{l}\text { An n'th root of unity is an } \omega \text { s.t. } \ \omega^{n}=1 \text { . } \ \text { There are n roots n'th roots of } \ \text { unity, and they have the form } \ e^{\frac{2 \pi i k}{n}}, \text { for } 0 \leq k \leq n-1 \text { . } \ \text {Write } \omega_{n}=e^{\frac{2 \pi i}{n}} \text { , so that the n'th } \ \text { roots of unity are } \omega_{n}^{0}, \omega_{n}^{1}, \ldots, \omega_{n}^{n-1} \end{array}\]

Some problems to differ DT,DFT,FFT,IFFT

They are Fourier Transform, Discrete Fourier Transform, Fast Fourier Transform and Inverse Fourier Transform.
The transform factor:
\(\text { Fact } 1 \omega_{n}^{n}=1 \text { . } \ \text { Fact } 2 \omega_{n}^{k+\frac{n}{2}}=-\omega_{n}^{k} \ \text { Fact } 3\left(\omega_{n}^{k}\right)^{2}=\omega_{n}^{2 k}=\omega_{n / 2}^{k}\)

Why we should have the DFT.
Because in theory, all the data stored in computer is Discrete. So we have to use the equation \(X(k)=\sum^{N-1}_0x(n)W^{kn}_N(k\in \mathbb{N})\)

The Transform factor is used to prove the
1) Periodicity
\(W_{N}^{m+l N}=W_{N}^{m},\)\) where \(\(: W_{N}^{-m}=W_{N}^{N-m}\)
2) Symmetry
\(W_{N}^{m+\frac{N}{2}}=-W_{N}^{m}\)
3) Contractability
\(W_{N / m}^{k}=W_{N}^{m k}\)
4) Special rotation factors
\(W_{N}^{0}=1\)

  1. Why Fourier Fast Algorithm (aka FFT)?

FFT is a DFT-based algorithm designed to accelerate the computational speed of DFT.

Given a degree \(n\) -1 polynomial
\(A(x)=a_{0}+a_{1} x+a_{2} x^{2}+\dots+a_{n-1} x^{n-1}\)
DFT computes \(A\left(\omega_{n}^{0}\right), A\left(\omega_{n}^{1}\right), \ldots, A\left(\omega_{n}^{n-1}\right)\)
since \(A(x)=a_{0}+x\left(a_{1}+x\left(a_{2}+\cdots\right) \ldots\right)\)
\(A(x)\) can be evaluated in \(\mathrm{O}(\mathrm{n})\) time and
\(\mathrm{O}(1)\) memory.

  • DFT can be computed trivially in \(\mathrm{O}\left(\mathrm{n}^{2}\right)\)
    time.

For the DFT formula computer implementation the complexity is o(N²), while the complexity by FFT calculation is reduced to: N×log2(N)

  1. What is the sequence split extraction in FFT?

The sequence splitting process of FFT is the extraction process, which is divided into: extraction by time and extraction by frequency.

1) Extraction by time (also called parity extraction)


2) Frequency, which we don’t apply here.

  1. How does FFT reduce the amount of computation?

In simple terms, mathematicians use the above mentioned properties of the rotation factor W such as periodicity, symmetry, etc. to simplify the formula. In algorithmic programming, new points are constantly counted using points that have already been counted, i.e., old points count new points.

Theoretically, FFT computes the DFT in O(n log n) time using divide and conquer.
\square Suppose n is a power of 2.
Let \(A^{[0]}=a_{0}+a_{2} x^{1}+a_{4} x^{2}+\cdots+a_{n-2} x^{\frac{n}{2}-1}\)
\(A^{[1]}=a_{1}+a_{3} x^{1}+a_{5} x^{2}+\cdots+a_{n-1} x^{\frac{n}{2}-1}\)
Then \(A(x)=A^{[0]}\left(x^{2}\right)+x A^{[1]}\left(x^{2}\right)\).
So can compute \(A\left(\omega_{n}^{0}\right), A\left(\omega_{n}^{1}\right), \ldots, A\left(\omega_{n}^{n-1}\right)\) by computing \(A^{[0]}\) and \(A^{[1]}\)
at \(\left(\omega_{n}^{0}\right)^{2},\left(\omega_{n}^{1}\right)^{2}, \ldots,\left(\omega_{n}^{n-1}\right)^{2}\), and multiplying some terms by
\(\omega_{n}^{0}, \omega_{n}^{1}, \ldots, \omega_{n}^{n-1}\).
But \(\left(\omega_{n}^{k+n / 2}\right)^{2}=\omega_{n}^{2 k+n}=\omega_{n}^{2 k}=\left(\omega_{n}^{k}\right)^{2}\) by Fact 1.
A So \(\left\{\left(\omega_{n}^{0}\right)^{2},\left(\omega_{n}^{1}\right)^{2}, \ldots,\left(\omega_{n}^{n-1}\right)^{2}\right\}=\left\{\left(\omega_{n}^{0}\right)^{2},\left(\omega_{n}^{1}\right)^{2}, \ldots,\left(\omega_{n}^{n / 2-1}\right)^{2}\right\},\) i.e. only need
to evaluate \(A^{[0]}\) and \(A^{[1]}\) at n/2 points instead of n.
Also, \(\left(\omega_{n}^{k}\right)^{2}=\omega_{n}^{2 k}=\omega_{n / 2}^{k}\)

Note: Simply splitting a multipoint sequence into a less point sequence without simplification is not a reduction in arithmetic volume!!! Splitting is only in the service of simplification, using the spin factor is the key to arithmetic reduction!!!

Time Complexity:
Thus, computing \(A(x)=A^{[0]}\left(x^{2}\right)+x A^{[1]}\left(x^{2}\right)\) for
\(x \in\left\{\omega_{n}^{0}, \omega_{n}^{1}, \ldots, \omega_{n}^{n-1}\right\}\) requires
\(\square\) Computing for \(A^{[0]}(x)\) and \(A^{[1]}(x)\) for \(x \in\)
\(\left\{\omega_{n / 2}^{0}, \omega_{n / 2}^{1}, \ldots, \omega_{n / 2}^{n / 2-1}\right\}\)

  • These are also DFT’s, so can be done recursively using two
    n/2-point FFT’s.
    \square For \(0 \leq k \leq \frac{n}{2}-1\)
    \[
    \begin{array}{l}\qquad A\left(\omega_{n}^{k}\right)=A^{[0]}\left(\omega_{n / 2}^{k}\right)+\omega_{n}^{k} A^{[1]}\left(\omega_{n / 2}^{k}\right) \ \begin{array}{l}\qquad A\left(\omega_{n}^{k+n / 2}\right)=A^{[0]}\left(\omega_{n / 2}^{k+n / 2}\right)+\omega_{n}^{k+n / 2} A^{[1]}\left(\omega_{n / 2}^{k+n / 2}\right) \ =A^{[0]}\left(\omega_{n / 2}^{k}\right)-\omega_{n}^{k} A^{[1]}\left(\omega_{n / 2}^{k}\right)\end{array}\end{array}
    \]

  1. Butterfly operation?

For a butterfly operation, you can understand it as an operation that is customizable by a graph.

The left side is the input and the right side is the output, for the letters on the horizontal line there are two cases.

1) A number on the left end line (none means C and D are 1).

2)The right end lines have the numerical representation, but if none, C & D are 0s.

The FFT takes out odd and even in accordance to time to change the original sequence. which have to sequentialize it to make the algorithm meet the required sequence. The new sequence is the reverse binary sequence of the original ones.

For example \((a_0 a_4 a_2 a_6 a_1 a_5 a_3 a_7)\) have the binary sequence \((000,100,010,110,001,101,011,111)\).
The reverse order can be simply treated as the change of 2 near binary number, which in this case is:\((a_0 a_1 a_2 a_3 a_4 a_5 a_6 a_7)\);

Which is \((000,001,010,011,100,101,110,111)—>(000,100,010,110,001,101,011,111)\).

The code for this transformation:

for(I=0;I<N;I++) // According to law 4, you need to perform inter-code inverse order for all elements of the array
{  
  /* Get the value of the inverse J of subscript I*/ 
  J=0;
  for(k=0;k<(M/2+0.5);k++) //k indicates operation
     {
        /* Reverse sequence operation*/
        m=1;//m is a binary number with a minimum of 1
        n=(unsigned int)pow(2,M-1);//n is a binary number with the Mth degree of 1.
        m <<= k; // for m move left k
        n>>= k; //shift k bits to the right of n
        F0=I & n;//I & n by the kth position of the first half of the extracted
        F1=I & m;//I with m-pressure bit corresponding to the lower part of the second half of the extracted F0
        if(F0) J=J | m; //J and m are in position or so that F0 corresponds to a low position of 1
        if(F1) J=J | n; //J and n are in the same position or so that F1 corresponds to a high level of 1
     }
   if(I<J)
    {
      Temp=A[I];
      A[I] =A[J];
      A[J]=Temp;
    }                                
}

The butter fly operation:
Now let’s imagine that if we want to program the FFT algorithm, the most basic implementation of the FFT algorithm is the butterfly operation, for any butterfly such as.

When N=8.

As can be seen from the above figure, to perform the butterfly operation, we have to solve the following problems.

  • Interval B between two input data

  • Determination of the rotation factor W, including.

    • Determination of the L-level rotation index P.
    • Determination of the type of Level L W.
    • The interval between the same W in level L.

\(\left\{\begin{array}{l}T_{R}=X_{R}(\mathrm{j}+B) \cos \frac{2 \pi}{N} p+X_{I}(j+B) \sin \frac{2 \pi}{N} p \cdots(1) \ T_{\mathrm{I}}=X_{I}(j+B) \cos \frac{2 \pi}{N} p-X_{R}(j+B) \sin \frac{2 \pi}{N} p \cdots(2) \ \mathrm{A}_{\mathrm{R}}(\mathrm{j})=X_{R}(\mathrm{j})+T_{R} \ \mathrm{A}_{\mathrm{I}}(\mathrm{j})=X_{I}(\mathrm{j})+T_{\mathrm{I}} \ \mathrm{A}_{\mathrm{R}}(\mathrm{j}+\mathrm{B})=X_{R}(\mathrm{j})-T_{R}(5) \ \mathrm{A}_{\mathrm{I}}(\mathrm{j}+\mathrm{B})=X_{I}(\mathrm{j})-T_{\mathrm{I}}(6)\end{array}\right.\)

for(L=1; L<=M;L++) //FFT butterfly level L from 1--M
{  
  /* L-level operations*/  
  //First calculate the interval B = 2^(L-1);
  B = 1;
  B = (int)pow(2,L-1);
  for(j=0; j<=B-1;j++)
  {
    /* Homogeneous butterfly operation*/  
    // First increment k=2^(M-L)
    k = (int)pow(2,M-L);
    // Calculate the rotation index p in increments of k, then p = j*k
    P=1;
    P=j*k;
    for(i=0; i<=k-1;i++)
    {
      /* Perform butterfly operations*/
      //Array calibrated to r
      r=1;
      r=j+2*B*i;
      Tr=dataR[r+B]*cos(2.0*PI*p/N) + dataI[r+B]*sin(2.0*PI*p/N);
      Ti=dataI[r+B]*cos(2.0*PI*p/N) - dataR[r+B]*sin(2.0*PI*p/N);
      dataR[r+B]=dataR[r]-Tr;
      dataI[r+B]=dataI[r]-Ti;
      dataR[r]=dataR[r]-Tr; dataI[r+B]=dataI[r]-Ti; dataI[r]-Ti; dataR[r]=dataR[r]+Tr;
      dataI[r]=dataI[r]]+Ti;
    }
  }
}

IFFT is the reverse of the above operation.

  1. What if we take it on the mesh or hypercube to make it scalable on gpu oprations?

Hpercube:

  • Consider the binary exchange algorithm on a hypercube architecture.
  • Each processor connected to d others, which differ in each digit of ID.
    • Communication only with neighbors, send n/p values each time.
    • since d = log p rounds of communication, communication time \(T_{c}=\) \(t_{s} \log p+t_{w} \frac{7}{p} \log p .\)
  • Each stage does n/p computation.
    • Total computation time \(T_{p}=\frac{t_{c} n}{p} \log n\).
  • Efficiency is \(E=\frac{T_{p}}{T_{p}+T_{c}}=1 /\left(1+\frac{t_{s} p \log p}{t_{c} n \log n}+\frac{t_{w} \log p}{t_{c} \log n}\right)\)
  • Define \(K\) such that \(E=1 /(1+1 / K) .\)
  • For isoefficiency, want last two terms in denominator to be constant.
  • \(\frac{t_{s} p \log p}{t_{c} n \log n}=\frac{1}{K}\) implies \(n \log n=W=K \frac{t_{s}}{t_{c}} p \log p .\)
  • \(\frac{t_{w} \log p}{t_{c} \log n}=\frac{1}{K}\) implies \(\log n=K_{t_{c}}^{t_{w}} \log p,\) so \(n=p^{K t_{w} / t_{c}},\) so
  • \(W=K \frac{t_{w}}{t_{c}} p^{K t_{w} / t_{c}} \log p\)

The efficiency for this case depends on \(t_c,t_s,t_w\), the wait time is the tradeoffs between two different lines. which is:

From the Efficiency law
, we have once \(\frac{Kt_w}{t_c}>1\), the increasing time is polynomial with regard to the processor count.

Mese:

2D transpose FFT: The best:

Reference

  1. Prof Fan’s PPT
  2. Implement FFT in C by GoKing
  3. Implement FFT on QPU rpi3b+ by Andrew Holme

Some invitation with ICE Zhang @Pennstate

Good to see that the emerging youngsters of Generation Z in China. When it comes to the Zhihu Question "What are 00s anxious about?" ICE Zhang points out a variety of great people. My answer was OIer, IMOer and those who have a lot of time digging one single target without any interference will temporarily take up the "bill board". But with time, people will get to realize this world need hero, a person who can see through the greatest evolution of the world and provide surging advise.

Zhang maybe that kind of person. I would say it's really hard to invite such a guy to Shanghai. But hopefully, he has a great amount of Gay Joy in Shanghai, mostly PingCap employees and his friends and his friends's friends in minority. I would say, before my internship, I never jump into the networking with strangers. But I found it funny. I dived into Persistent Memory stuff which is ideal for me. One man does not build the Rome. I need others' info and their progress to push myself. Only through others' progress can see my limit and inability and lack of determination.

I'm informed that the gay with ICE yesterday was even not enter her adulthood. The Turing Award Winner compose his poem at early 20s. I would say, I start late and I don't think I could dig as far as her. I'm a poor Math guy, even may fail at Daniel's Abstract Mathematic. I could have been captivated by Group, Ring and Field. I could have made systematic view of analysis and geometry. I could have write more elegant code to let other people endorse type theory. I can not because I've got only one body and one stream of mind.

I'm in favor of pursuing my own heart more. I find myself limited time in this place. I have to make all the stuff come true!

关于和龙哥的比赛

我和龙哥以前大概还有大概一样的兴趣,只是貌似他自从icpc退役又回归以后,就在没有管过比赛。他也是个执拗的人,我无能为力。只是我还算比较负责。

MIPS 龙芯杯

我辜负了哈老师,还是我菜,我想自己做的,先mark一下明年找学弟好好做吧。我选一下CA2和EE的课。我自己做个多核cpu。

动力:自己造机、看了mesh之后的心动。

编译器

chibin 直接放弃,龙哥不想搞,yezhe无力。我就算了。

无人机比赛

我持续到底吧。

工作以后的作为

打了一个量化交易的比赛,说实话感受到了自己的无能,虽然写代码的能力不错,可是数学比别人上学太多了。果然一个不是拿IMO的物竞大佬不是一个好程序员,可能是最近厉害的人见的太多了。

量化交易比赛

是个对Future Bund 的仿真程序,有点我自己想写的高频模拟市场的味道,可是就是没时间。一开始我们着重于做alpha,主要就是对有的几个feature做Linear regression。也许是那个写的人非常刚愎自用,其实我没有写很多代码,我就帮忙弄弄环境,跑跑back testing, 优化优化速度,还有重构代码。那几天和龙哥闹矛盾,导致我没睡好,所以写出来的代码挺多bug。

两天后交了Alpha,被牛津大佬 Michael Ng 草虐,我们的研判是速度太慢,于是就只搞Market Making。后来又被Quickie的IOC 搞了。咳咳,我们又拿了他们的代码修理了一番,最后拿了个第三。

据许return 说,Alpha is the most important。是我们太菜了。

CPU Affinity

无事发生,就是做做kernel conf 的自动化,写个算法就行。

和其他学校同学的比较。

《夜航船记》说的好,在赴京赶考的路上,总是一个慢慢发现自己牛逼的过程。我觉得我们学校以及我过去十余年最好的培养就是实践。也是jump招我的原因吧。可是我从别人身上学到了很多,至少说的话都很精确,不会产生明显的谬误,且挺实干的,可是有努力的也有天赋型的,有狂野的denghz和东方也有稳重的Polytechinique 的yiran。

我有优点也有缺点吧。只是有的时候有点锉。

有关未来的出路

如果return 我绝壁三年提前毕业。如果没的话,好好考博士。

其yu就靠共勉。

GeekPie2020谜题项目-设计稿

须知

  1. 每个Stage最多包含5个页面;你需要描述每个网页的大致内容、包含的文件、上下级页面(无则写无);5个页面可以是并行的

  2. 每个Stage需要有2~5种已知通关方案,需要给出“过剩”的线索

  3. 如果使用了头脑风暴中的idea,请在[Meeting0]作相应的记号

  4. 一定要确保谜题逻辑上连贯; 请充分考虑开发实现难度。

  5. 请务必将此链接以及有关内容保密

Stage1 [Author: 张启煊|邱龙田]

Ø 页面

序号页面名称上级页面下级页面页面内容描述/包含文件描述
1-1宣传图/加入游戏wps8fjxMy
1-2加入游戏宣传图Stage2开启按钮恭喜你通过宣传图中的线索顺利来到了活动的注册页面;与此同时,我们也需要简单检验一下您的身份。1. 请为自己取一个ID[输入框]2. 我校信息学院、生命学院、物质学院的主题色分别是?[3*7=21种颜色选3]3. 我校的校训是?[输入框][提交]return[0]: 答案错误return[1-输入苟…]: 说错话是要负责任的呀!答案错误return[2-通过]: 恭喜你注册成功!Stage2的网址就在你的答案中
1-3Stage2开启按钮加入游戏Stage21.恭喜你达到了Stage1的终点!请给自己取一个ID让大家知晓…2. 一个按钮,点击则提示“不是时候”,除非他把系统时间设定为宣传图那一分钟
1-4
1-5

Ø 破解方案1:把时钟与鸽子嘴重合得到网址前缀;正确答出四道题,其中第四道题包含“不”“无”“没”或者什么都不写;根据颜色号码找到新的网址;将系统时间设置为宣传图那一分钟;开启Stage2

Stage2 [Author: 叶者|张龙文]

Ø 页面

序号页面名称上级页面下级页面页面内容描述/包含文件描述
2-1
2-2
2-3
2-4
2-5

Ø 破解方案1:

Ø 破解方案2: geekpie{Bridges_are_sacks._Is_there_anything_wrong_with_speaking_like_this!_Now,_tell_me.}

Ø 破解方案3: geekpie{It_would_be_nice_if_someone_like_me_disappeared.}

破解方案4: geekpie{symb0ls_ar3_jus7_symbo1s}

压缩包密码:3342613097+某管理员

Ø 破解方案5: geekpie{what_a_cruel_person} geekpie{Guomie_Nasai}

以下为Stage2草稿.jpg

题目顺序:

(1) -> (2) -> (5) -> (3) -> (4)

​ ----->-----/

(1) 3个(或多个)选择题 某一个的题干包括跳过两个字 只有点击跳过可以通关

破解方法 1 xjb点 2 查看网页源代码 [是不是太简单了]是的

(2) 拼图 文件名为也为flag分段 按拼图顺序 仿照https://github.com/ustclug/hackergame2018-writeups/tree/master/official/card 用上科大校徽做背景 题解记得credit

破解方法 1 拼图后输入图片内内容 2 拼图排序后输入文件名拼接内容

(3) 找不同 多个像素点的颜色区别 答案是一系列坐标(大概3~5个)图片为bmp

(4) zip file password: geekpie某管理员的qq号+昵称 答案"{}+某管理员".format(qq号)

(5) RGB to UTF-8 2份线索 图片使用bmp防止压缩损耗:

线索1:UTF-8解密后信息 -> 跳一关

线索2:略微不同颜色的文字描边 破解方式:ps 选择颜色 如下

wps93OTEN
wpsZtzCbV

Stage3 [Author: 井皓天|杨易为]

主页面:Stage 3 - 腾讯文档

Ø 页面

序号页面名称上级页面下级页面页面内容描述/包含文件描述
3-1失踪的快递/XXXXXX YYYYYYYY 53Z2湖北省武汉市某同学家。从得到一个快递号,没来的去取,上面的身份证号已经模糊,需要猜出才能取得快递。快递编号 → 投递城市 → 身份证前六位 → X某个 QQ 群里的「管理员」→ 生日 or 百度第一次删除Google是什么时候 → Y校验码可以枚举出 Z 的几个可能,要加上性别才能唯一确定(可以暴力)之后配合某种载体(快递订单图片,其上的时间转 UNIX 时间戳之后会被用到)引导到一个域名为 身份证号.xxxxxx.onion 的网站(自备洋葱)
3-2快递的秘密失踪的快递救人的药/门牌号里面有合同,指向另一个洋葱网站有个教务系统网站(每隔5s 换一次地址,指向救人的药/门牌号)下载得到一个ppt(离散题目)文件* 解压 pptx 可以跳到 3
3-3救人的药ACTG 转化学式转药的名称(瑞德西韦),还需要输入快递订单的时间(UNIX时间戳)
3-4罪魁祸首
3-5门牌号ACTG(维吉尼亚密码)转上科大门牌号+几点钟方向,在上科大地图中找到关键位置

/------->----\

(1)-> (2) -> (5) -> (3) -> (4)

​ ------------->----------/

以下是stage3的草稿

背景:湖北省武汉市某同学家

从得到一个快递号,没来的去取,上面的身份证号已经模糊,需要猜出才能取得快递。

给出一个身份证号

XXXXXX YYYYYYYY 53Z2

快递编号 → 投递城市 → 身份证前六位 → X

某个 QQ 群里的「管理员」→ 生日 or

百度第一次认识Google是什么时候 → Y

校验码可以枚举出 Z 的几个可能,要加上性别才能唯一确定(可以暴力)

之后配合某种载体(快递订单图片,其上的时间转 UNIX 时间戳之后会被用到)

引导到一个域名为 身份证号.xxxxxx.onion 的网站(自备洋葱)

百度地图上科大街景 https://j.map.baidu.com/85/05w

可以用的ACTG训练能用的药。

Scala 逆变和协变

source:
trait List[+T]{}
当类型 B 是类型 A 的子类型时,则 List[B]也可以认为是 List[A}的子类型,即 List[B]可以泛化 为 List[A]。也就是被参数化类型的泛化方向与参数类型的方向是一致的,所以称为协变 (covariance)

逆变定义形式如:trait List[-T]{} 当类型 B 是类型 A 的子类型,

则 Queue[A]反过来可以认为是 Queue[B}的子类型。

也就是被 参数化类型的泛化方向与参数类型的方向是相反的,所以称为逆变(contravariance)

summary

scala 环境配置

export SCALA_HOME=/root/Downloads/spark-2.4.3-bin-hadoop2.7/
export HADOOP_HOME=/root/Downloads/hadoop-2.7.1
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
SPARK_MASTER_IP=Master
SPARK_LOCAL_DIRS=/root/Downloads/spark-2.4.3-bin-hadoop2.7
SPARK_DRIVER_MEMORY=512M