Program Analysis - vickieGPT’s blog

Helper Class

CHACallNode

I use a Node to record the caller callee and which invoke method the Call node is calling. The invoke method is recorded in an integer which is the offset for the following enum

enum InvokeMorphism {
    InterfaceInvokeExpr,
    VirtualInvokeExpr,
    SpecialInvokeExpr,
    StaticInvokeExpr,
    JStaticInvokeExpr,
    JSpecialInvokeExpr,
    JVirtualInvokeExpr,
    JInterfaceInvokeExpr
}

The toString method are overridden for linting the line and the <CHAInput: void main(java.lang.String[])>-><CHAInput: int addOne(int)>.

ReachableMethods

This method maintains Set<SootMethod>. The toString method to linting the line Reachable.

CHAAnalysis

This class maintains the CallerCalleeMap for the latter use by virtual calls, the cha_class contains the information of Class FastHierachy, entries will store the init entries and will put it into the worklist queue, it will add to worklist if the function is invoked and reachable from the stand view from method main. unitSootMethodMap is to store the unit method relations and will init in the first place and will be used in the call printing.

May 23, 2022May 23, 2022

[Program Analysis] Soundiness and soundness

In defense to a situation where no program analysis in the program is soundness, so given term soundness for that program claims that they can cover most cases.

Reflection in Java

First on Java reflection: Class + Method + Field Metaobject

One solution: String Constant analysis + Pointer Analysis
List and Array propagation

JNI Call

One Solution: Transcode from C to Java
scanning on the binary

April 28, 2022June 12, 2022

[Program Analysis] Static Analysis for Security

Like symbolic execution, Taint-Analysis is an important tool for analyzing code security vulnerabilities and detecting attack methods. Taint-Analysis can effectively detect a large number of security vulnerabilities in Web applications, such as cross-site scripting attacks, SQL injection, etc. The taint propagation technique is an important topic in the field of Taint-Analysis, which is combined with the static program analysis technique to obtain more efficient and accurate analysis results by analyzing the interdependencies between program variables without running the code and modifying the code.

Taint analysis tracks how tainted data flow through the program and observes if they can flow to locations of interest (call sinks).

April 27, 2022June 12, 2022

[Program Analysis] Inter-procedural Dataflow Analysis

The IFDS published at POPL'1995 is a leaf-frog jump after the classical intra-procedural analysis. Since it 1. linear time for specific individual problem 2. linear time for the locally separable problem(gen/kill bitvec) 3. Nonlinear time for other common problems.

CFL reachability

Feasible and Realizable Paths

Paths in CFG that do not correspond to actual executions is Infeasible paths. If we can prevent CFG from being contaminated by infeasible paths as much as possible and make the analysis flow graph edges more concise, we are bound to greatly improve the analysis speed. But in practice, such infeasible paths are often undecidable.

Realizable Paths

The paths in which "returns" are matched with corresponding "calls".Thus we can make the target wider. If we call one path of, the returned edge can correctly pattern matching the call-edge, we call the

B is CFL reachable from A if there's a path (a series of edges) between two nodes A and B and all the label of the edges on this path are legal words defined by a pre-specified context-free language.

Partially Balanced-Parenthesis Problem

The search for realizable paths is abstracted into the typical bracket matching problem. Partially here means that there must be an (i match for (i. In other words, for every return-edge there is a call-edge match, but every call-edge does not necessarily have a return value (e.g., realizable but not infeasiable paths). The problem is modeled as follows: all edges on the control-flow graph are given a label, and for a call statement i, the call-edge is labeled (i and its return-edge is labeled) i, while all other edges are labeled e.

A path is a realizable path if thee path' word is in the language L(realizable).

IFDS

IFDS is for interprocedural data flow analysis with distributive flow functions over finite domains. It transforms a large class of data flow analysis problems into graph reachability problems by providing polynomial-time accuracy compared to the iterative data flow framework with the following constraints:

Many general dataflow analysis problems are subset problems but interprocedural is full program analysis. The distributive flow function within the finite domain is required for IFDS.

MRP

IFDS uses meet-over-all-realizable-path

to measure its analysis accuracy.
According to the definition of $M O P_{n}=\sqcup(p \in P a t h s(s t a r t, n)) p f_{p}(\perp)$, for each node point $n, M O P_{n}$ denotes the union/intersection operation of all paths from the start point to the node point n on the CFG.
And according to the definition of MRP $M O P_{n}=\sqcup(p \in P a t h s(s t a r t, n)) p f_{p}(\perp)$ , for each node point $n, M R P_{n}$ denotes all realizable paths from the start point to the node point n on the CFG (the label of these paths constitutes the word that matches the realizable label of these paths conforms to the context-independent grammar of the realizalble language) for the union/intersection operation. The result is also more accurate than $M O P_{n}$. $M R P_{n} \leqslant M O P_{n}$.

Algorithm

IFDS Algorithm:

Input: Program and the Data Flow Analysis $O$

Reference

https://haotianmichael.github.io/2021/06/06/NJU%E9%9D%99%E6%80%81%E7%A8%8B%E5%BA%8F%E5%88%86%E6%9E%90-5-IFDS/
https://apps.dtic.mil/sti/pdfs/ADA332493.pdf
https://minds.wisconsin.edu/bitstream/1793/60188/1/TR1386.pdf

April 27, 2022June 12, 2022

[Program Analysis] Intra-procedural Dataflow Analysis

The global non-related optimization is based on Dataflow Analysis. The mainstream compiler intermediate representation like LLVM/ Gravvm is based on SSA to do the Dataflow Analysis while Soot on JVM is utilizing Lattice (The classical definition) to do the intraprocedural.

10-discussion 下载

Reference

https://haotianmichael.github.io/2021/05/04/NJU%E9%9D%99%E6%80%81%E7%A8%8B%E5%BA%8F%E5%88%86%E6%9E%90-1-Data-Flow-Analysis/

April 27, 2022June 10, 2022

[Program Analysis] Pointer Analysis

The CHA Analysis overly pursues the speed and gives some false positive cases. For must analysis like constant propagation, it's not acceptable. Pointer Analysis computes an over-approximation of the set of which memory locations (objects) a pointer can point to.

Continue reading "[Program Analysis] Pointer Analysis"

April 24, 2022April 25, 2022

[Program Analysis] LiveVariableAnalysis in Soot

Setup of the Soot

We need to first set up the Soot variables. In target command line tools, the config is passed by args, e.g. java -cp soot-2.5.0.jar soot.Main -pp -f J -cp . Hello. The prepend file is set and the jar interpreter will pass the variable $CLASSPATH. In my implementation, these are passed into Options.v(). Noticeably, I have to switch off the optimization for java ByteCode, or it will do Dead Code Elimination to remove the intermediate process.

CallGraph for Debug

I iterate the CFG before and after the Liveness analysis to see the problem.

for (Unit unit : body.getUnits()) {
       System.out.println(unit.toString() + "[entry]" +lva.getFlowBefore(unit));
       System.out.println(unit.toString() + "[exit]" +lva.getFlowBefore(unit));
}

LiveVariables

LiveVariableFlowSet

I record all the FlowSet<Local> to a enhanced AbstractBoundedFlowSet<Local> with liveVariable hash set. So that it has clone and record feature.

BackwardFlowAnalysis

The Analysis extends BackwardFlow. The merge logic is union and copy logic is copy. The flowThrough part is first calculates the kill set and makes the difference of kills and destSet; then the gens are merely the union of the iterated kills.

Input and output of file

The soot.Main only accept className, so I remove the "./" and ".class" to pass. And the directory is created in the same folder of the class file to write the output.

March 24, 2022April 28, 2022

[Program Analysis] Intraprocedural Analysis

CHA analysis is used to make too conservative assumptions to the method call in the Intraprocedural Analysis. All the results of the analysis should be. save. According to the Lattice Theory, the must and may analysis should be less precise. So the Interprocedural Analysis to see the data flow in the BB and the Call Graph to see the data flow propagation between functions and raise the precision of the analysis is very important.

Continue reading "[Program Analysis] Intraprocedural Analysis"

February 21, 2022February 23, 2022

[Program Analysis] Control Flow Graph

想了一下，这课最好的上法还是顺便学一下soot这个框架，感觉和LLVM有类似也有不同，由于生成的是jvm栈机以及可以调用jni这种ffi related stuff。是故边学边玩。

Continue reading "[Program Analysis] Control Flow Graph"

February 15, 2022February 21, 2022

[Program Analysis] Intermediate Representation

我又来学软工了😄，倒腾java的种种，今天就在乱搞soot，跑一个命令行真脑瘫。不过zhe次不像暑研要写在Phosphor上的ASM visitor，只需处理三地址码的jimple就好。

 java -cp "/Users/yiweiyang/.m2/repository/org/soot-oss/soot/4.2.1/soot-4.2.1.jar:/Users/yiweiyang/.m2/repository/org/slf4j/slf4j-api/1.7.5/slf4j-api-1.7.5.jar:/Users/yiweiyang/.m2/repository/org/slf4j/slf4j-log4j12/1.7.5/slf4j-log4j12-1.7.5.jar:/Users/yiweiyang/.m2/repository/log4j/log4j/1.2.17/log4j-1.2.17.jar:/Users/yiweiyang/.m2/repository/de/upb/cs/swt/axml/2.0.0/axml-2.0.0.jar:/Users/yiweiyang/.m2/repository/com/google/guava/guava/19.0/guava-19.0.jar:/Users/yiweiyang/.m2/repository/org/ow2/asm/asm/9.2/asm-9.2.jar:/Users/yiweiyang/.m2/repository/org/ow2/asm/asm-commons/9.2/asm-commons-9.2.jar:/Users/yiweiyang/.m2/repository/org/ow2/asm/asm-tree/9.2/asm-tree-9.2.jar:/Users/yiweiyang/.m2/repository/de/upb/cs/swt/heros/1.2.2/heros-1.2.2.jar:/Users/yiweiyang/.m2/repository/org/ow2/asm/asm-util/9.2/asm-util-9.2.jar" soot.Main --soot-class-path /Users/yiweiyang/project/test_soot/target/classes:/Users/yiweiyang/.sdkman/candidates/java/current/jre/lib/rt.jar  Bear

Continue reading "[Program Analysis] Intermediate Representation"

Tag: Program Analysis

[Program Analysis] CHA Analysis in Soot

Helper Class

CHACallNode

ReachableMethods

CHAAnalysis

[Program Analysis] Soundiness and soundness

Reflection in Java

JNI Call

[Program Analysis] Static Analysis for Security

[Program Analysis] Inter-procedural Dataflow Analysis

CFL reachability

Feasible and Realizable Paths

Realizable Paths

Partially Balanced-Parenthesis Problem

IFDS

MRP

Algorithm

Reference

[Program Analysis] Intra-procedural Dataflow Analysis

Reference

[Program Analysis] Pointer Analysis

[Program Analysis] LiveVariableAnalysis in Soot

Setup of the Soot

CallGraph for Debug

LiveVariables

LiveVariableFlowSet

BackwardFlowAnalysis

Input and output of file

[Program Analysis] Intraprocedural Analysis

[Program Analysis] Control Flow Graph

[Program Analysis] Intermediate Representation