Parameterized Program Slicing for LLVM


    This project aims to propose a novel context-sensitive slicing method, parameterized program slicing, which works as a dataflow analysis on LLVM (Low-Level Virtual Machine). It encompasses both backward and forward slicing.  Instead of re-analysing a procedure multiple times to find its slices for each callling context, the proposed method calculates a single parameterised slice which can be instantiated at call sites avoiding re-analysis. The parameterised slicing method is flexible, modular and effectively demand-driven; it is implemented in LLVM to perform slicing on its intermediate representation (IR). For most medium-sized programs, it can extract both backward and forward program slices for all single variables in moderate time, without using any data compression optimizations such as BDD (Binary Decision Diagram).

   Program slicing has two main uses: 1) program comprehension, for which slicing at the source-code level is vital and 2) program analysis and optimization as part of a compiler. We address the latter using LLVM as the vehicle, i.e. program slicing of LLVM IR. In fact, IR slice results could be easily applied to generate the slices of source code, by extracting source codes from sliced IRs with line number information in the metadata of each IR instruction.  


 ParaSlicer - Paramiterized program slicer for LLVM

ParaSlicer is the Haskell implementation for our parameterized program slicing of LLVM IR, using the Haskell library llvm-analysis for analyzing LLVM bitcode. ParaSlicer-1.0.0 ran on Linux such as (32-bit) Ubuntu with LLVM-3.2 (Clang-3.2).  All experiments in our tests were performed on an Intel Core i5 2.5GHz machine with 4GB RAM and 32-bit Ubuntu 12.04.

---------------------------------------------------------------
Usage: ParaSlicer [-o | --output FILE2/DIR] [-c | --criterion VARIABLE] [-d | --direction DIRECTION] FILE1 
Generate static slice based on paramiterized method for FILE1 (which can be bitcode, llvm assembly, or C/CPP sourcecode)

Available options:
  -h,--help                           Show this help text
  -o,--output FILE2/DIR       The destination of a file output
  -c,--criterion VARIABLE     The criterion variables (with the style of Var@Fun,e.g. x@main) for slicing.
                                           ( If null, just output the slice table for all single variables. )

  -d,--direction DIRECTION  The type of output to slicing: Fwd, Bwd, Both. Default: Bwd
  FILE1                               The input file whose file extension can be .bt, .ll, .c or .cpp

Examples: 
1) ./ParaSlicer sum3.c
 Backward Static SliceTable:
  Variable           SrcLineNumbers 
    a@add          {11,12,13,15,27,28,33,39,40}
    b@add          {}
    i@main          {12,13,15,28,33,39,40}
    sum@main    {11,12,13,15,27,28,33,39,40}
    tmp@inc        {39}
    x@A              {11,12,13,15,27,28,33,39,40}
    y@A              {12,13,15,28,33,39,40}
    z@inc            {12,13,15,28,33,39,40}

2) ./ParaSlicer -d Fwd -c n@main sum3.c
  Forward Static Slice for "n@main":
   <SourceLines> 9,13,15,16,18,19,20: 
    %n = alloca i32 , align 4
    %4 = call i32 @isoc99_scanf ( i8* i8* getelementptr ( 3 x i8* @.str1 , i32 0, i32 0 ), i32* %n )
    %8 = load i32* %n , align 4
    %9 = icmp sle i32 %7 , %8
    br i1 %9 , label %10 , label %11
    call void @A ( i32* %sum, i32* %i )
    %.pre = load i32* %i , align 4
    br label %6
    %12 = load i32* %sum , align 4
    %13 = call i32 @printf ( i8* i8* getelementptr ( 10 x i8* @.str2 , i32 0, i32 0 ), i32 %12 )
    %15 = load i32* %i , align 4
    %16 = call i32 @printf ( i8* i8* getelementptr ( 8 x i8* @.str3 , i32 0, i32 0 ), i32 %15 )
    ret i32 0 

Experiments:  

The statistic results of our ParaSlicer on some Benchmarks (sorted by number of instructions) 

Program   LOC  #Instr.  #Proc. Average backward slice Average     forward slice Total time(sec.) Total space(MB)
#Instr. percent #Instr. percent
qsortexam 119 177 2 106 59.89% 88 49.72% 0.82 3
compress 521 475 9 69 14.53% 31 6.53% 0.48 4
edn 284 634 9 46 7.26% 32 5.05% 0.47 3
time 1,395 722 6 16 2.22% 59 8.17% 1.05 11
loop50 364 819 1 548 66.91% 242 29.55% 27.73 23
statemate 1,276 1,271 8 280 22.03% 42 3.30% 1.55 20
loop100 714 1,619 1 1,089 67.26% 484 29.89% 218.67 94
181.mcf 4,820 1,944 26 120 6.17% 116 5.97% 1.76 6
loop150 1,064 2,419 1 1,631 67.42% 725 29.97% 301.53 193
loop200 1,414 3,219 1 2,172 67.47% 967 30.04% 575.04 367
loop250 1,764 4,019 1 2,713 67.50% 1,209 30.08% 766.07 578
256.bzip2 8,014 4,505 75 240 5.33% 181 4.02% 24.59 30
gitview 976 4,766 139 37 0.78% 117 2.45% 47.05 176
pkg-config 5,191 5,278 92 24 0.45% 205 3.88% 62.56 52
dc 2,686 6,169 110 1,518 24.61% 229 3.71% 85.78 477
nsichneu 4,253 6,434 1 243 3.78% 67 1.04% 78.8 1,024
164.gzip 11,121 6,614 92 175 2.65% 240 3.63% 83.22 48
barcode 30,472 7,013 59 22 0.31% 88 1.25% 23.03 43
yacc 23,906 12,465 83 1 0.01% 454 3.64% 524.34 322
diff 87,393 14,187 183 84 0.59% 506 3.57% 316.8 127
197.parser 27,371 15,864 325 458 2.89% 604 3.81% 1,293.8 1,165
cflow 32,396 16,592 217 137 0.83% 306 1.84% 253.78 183
find 12,086 18,355 360 66 0.36% 297 1.62% 448.9 262
300.twolf 49,372 41,937 191 163 0.39% 497 1.19% 1,131.16 316
Average 12,874 7,396 83 498 20.48% 324.42 11.00% 261.21 230.29

 

Last edited Sep 11, 2014 at 2:07 AM by zhangyz, version 31