Modifications from original SPAT
- Now uses Java 18
- Add run.sh for running SPAT. Usage is in the script itself.
- Add postprocessing.py for compiling transformation results
Steps to run the jar file are the same as below, except for the argument PathofJre, which is replaced by the path of lib. An example is "/usr/lib/jvm/java-18-openjdk-amd64/lib". This path can be found by running whereis java and tracing the result to the directory of the original binary (instead of a symlink). The library folder is usually a sibling directory of the directory that contains the binary.
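The lookup described above can be scripted. The following is a minimal sketch, assuming a conventional JDK layout in which the binary lives in <jdk>/bin and the libraries in <jdk>/lib. The function name find_java_lib_dir is hypothetical and is not part of SPAT.

```python
import os
import shutil
from pathlib import Path


def find_java_lib_dir(java_binary=None):
    """Resolve the real java binary and return its sibling lib directory, or None."""
    # Fall back to whatever "java" resolves to on PATH (often a symlink).
    binary = java_binary or shutil.which("java")
    if binary is None:
        return None
    # Follow symlinks to the original binary, e.g. <jdk>/bin/java.
    real_binary = Path(os.path.realpath(binary))
    # Assumption: lib is a sibling of the directory that contains the binary.
    lib_dir = real_binary.parent.parent / "lib"
    return lib_dir if lib_dir.is_dir() else None


if __name__ == "__main__":
    print(find_java_lib_dir())
```

The printed path depends on the local installation. If the layout differs from the assumption above, the function returns None and the path has to be located by hand.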
Note that it is recommended to use run.sh instead of running the jar file directly, since the former integrates with postprocessing.py.
python3 postprocessing.py -h prints brief usage information.
This script organizes a benchmark's results into a jsonl file. It takes three arguments.
postprocessing.py assumes the default directory structure for SPAT (run with ./run.sh):
Benchmark
└── <benchmark_name>
    ├── Original
    └── transformed
        ├── _<aug_id>
        │   ├── n<original_entry_id>.java
        │   ├── n<original_entry_id>.java
        │   ...
        ├── _<aug_id>
        ...
In this case, benchmark_path would be Benchmark/<benchmark_name>. The script will iterate through the transformed subdirectory of the benchmark path. For each .java file, it will record an augmented entry, noting its augmentation type via the provided <aug_id>. The Supported Transformations section specifies an aug_id for each augmentation type.
Additionally, the script will append extra data from the metadata_jsonl argument. This file will be queried by <test_id>, and the resulting data will be added to the augmented entry. In the case of CodeSearchNet, metadata_jsonl is provided by preprocess.py.
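The traversal described above can be illustrated with a short sketch. This is not the actual postprocessing.py. The function names, the output field names (aug_id, entry_id, code), and the assumption that the id in the file name doubles as the metadata lookup key are all hypothetical.

```python
import json
from pathlib import Path


def collect_augmented_entries(benchmark_path, metadata_by_id):
    """Walk <benchmark_path>/transformed and build one dict per .java file."""
    entries = []
    transformed_dir = Path(benchmark_path) / "transformed"
    for aug_dir in sorted(p for p in transformed_dir.iterdir() if p.is_dir()):
        aug_id = aug_dir.name.lstrip("_")  # "_5" -> "5"
        for java_file in sorted(aug_dir.glob("*.java")):
            # "n123.java" -> "123"; assumed here to be the metadata key as well.
            stem = java_file.stem
            entry_id = stem[1:] if stem.startswith("n") else stem
            # Start from a copy of the extra metadata, then add the core fields.
            entry = dict(metadata_by_id.get(entry_id, {}))
            entry.update({
                "aug_id": aug_id,
                "entry_id": entry_id,
                "code": java_file.read_text(encoding="utf-8"),
            })
            entries.append(entry)
    return entries


def write_jsonl(entries, output_path):
    """Write one JSON object per line."""
    with open(output_path, "w", encoding="utf-8") as handle:
        for entry in entries:
            handle.write(json.dumps(entry) + "\n")
```

Here metadata_by_id stands in for the contents of metadata_jsonl after loading it into a dictionary. How the real script keys and merges that data may differ.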
Eclipse is used to develop and build the project. Click "File > Export" and select the option "Runnable JAR file". Use the "Noargs - RuleWriter" launch configuration and keep everything else as default. Click "Finish". The resulting .jar file should be saved in the "artifacts" folder.
Semantic-and-Naturalness Preserving Auto Transformation. This tool is a source-to-source transformation tool that can handle partial code snippets (programs without dependency information). The transformed code is semantically equivalent to the original and preserves its syntactic naturalness.
We have currently verified it on Windows 10.
This project is developed in "Eclipse IDE for RCP and RAP Developers". If you want to play with the code, please use the same IDE. Starting with "src/spat/RuleSelector.java" will give you a good overview of the whole project.
We have produced a runnable jar file already in "artifacts".
To use this tool, simply type the following command:
java -jar SPAT.jar [RuleId] [RootDir] [OutputDir] [PathofJre] [PathofotherDependentJar]
[RuleId] is the transformation rule you want to adopt.
[RootDir] is the root directory path in which you put all your code snippets to be transformed. Each ".java" file is regarded as a code snippet. Each file should contain one Java class. For method-level code snippets, users need to wrap each method with a "foo" class.
[OutputDir] is the directory path where you want to store the transformed code snippets.
[PathofJre] is the path of rt.jar (usually placed in ".../jre1.x.x_xxx/lib/").
[PathofotherDependentJar] is optional, one can use it to specify additional dependent libraries.
For example,
java -jar .\artifacts\SPAT.jar 5 .\Benchmarks\9133\Original .\Benchmarks\9133\transformed\_5 "C:\Program Files\Java\jre1.8.0_221\lib\rt.jar"
This command will transform all Java files under the ".\Benchmarks\9133\Original" path by the transformation rule 5 "ConditionalExp2SingleIF" and write the results to the path ".\Benchmarks\9133\transformed\_5". The only dependency is the rt.jar (Java runtime).
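For scripted runs, wrapping method-level snippets and invoking the jar can be automated. The following is a minimal sketch under stated assumptions: the helper names wrap_method_in_class, build_spat_command, and run_spat are hypothetical and not part of SPAT, and the argument order simply follows the usage line above.

```python
import subprocess


def wrap_method_in_class(method_source, class_name="foo"):
    """Wrap a bare Java method in a minimal class so it forms one .java snippet."""
    body = "\n".join(
        "    " + line if line.strip() else ""
        for line in method_source.strip().splitlines()
    )
    return "class " + class_name + " {\n" + body + "\n}\n"


def build_spat_command(jar_path, rule_id, root_dir, output_dir, jre_path, extra_jar=None):
    """Build the argument list: RuleId, RootDir, OutputDir, PathofJre, optional jar."""
    command = ["java", "-jar", jar_path, str(rule_id), root_dir, output_dir, jre_path]
    if extra_jar is not None:
        command.append(extra_jar)
    return command


def run_spat(command):
    """Run SPAT; passing a list avoids shell quoting issues with spaces in paths."""
    return subprocess.run(command, check=True)
```

Passing the arguments as a list means a path such as one under "Program Files" is handed to Java as a single argument without manual quoting.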
Replace the local variables' identifiers with new non-repeated identifiers.
Replace the for statement with a semantically equivalent while statement.
Replace the while statement with a semantically equivalent for statement.
Switch the two code blocks in the if statement and the corresponding else statement.
Change a single if statement into a conditional expression statement.
Change a conditional expression statement into a single if statement.
Change the assignment
Change the assignment
Divide an infix expression into two expressions whose values are stored in temporary variables.
Divide an if statement with a compound condition (
Switch the places of two adjacent statements in a code block, where the former statement has no shared variable with the latter statement.
Replace the if-continue statement in a loop block with an if-else statement.
Merge the declaration statements into a single composite declaration statement.
Divide the composite declaration statement into separate declaration statements.
Switch the two expressions on both sides of the infix expression whose operator is
Switch the two expressions of the String.equals function, such as "123".equals(x) -> x.equals("123").
Divide the pre-or-post expression into two separate expressions.
Change the Switch-Case statements into If-Else statements.
The Educoder code clone dataset. In the "records.txt" file, each record is a triple (file1,file2,label). A label of -1 means that the pair is not a clone; any other label means that it is a clone.
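Reading such records could look like the sketch below. It assumes one comma-separated triple per line, optionally enclosed in parentheses, and file names that contain no commas. The exact layout of "records.txt" is not specified here, and the function names are hypothetical.

```python
def parse_record(line):
    """Parse one "(file1,file2,label)" record into (file1, file2, is_clone)."""
    fields = [field.strip() for field in line.strip().strip("()").split(",")]
    if len(fields) != 3:
        raise ValueError("expected three comma-separated fields: " + repr(line))
    file1, file2, label = fields
    # A label of -1 marks a non-clone pair; any other label marks a clone.
    return file1, file2, int(label) != -1


def load_records(path):
    """Load all non-empty lines of a records file."""
    with open(path, encoding="utf-8") as handle:
        return [parse_record(line) for line in handle if line.strip()]
```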
The 9133 benchmark is selected from the BCB benchmark. We use its 9133 instances to evaluate the syntax naturalness, applicability, and speed of each transformation rule.
This dataset is used to train the Neural Probabilistic Language Model (see below).
- The Neural Probabilistic Language Model https://github.com/chiaminchuang/A-Neural-Probabilistic-Language-Model
- Code2vec https://github.com/tech-srl/code2vec
- DeepCom and Hybrid-DeepCom https://github.com/xing-hu/EMSE-DeepCom
- The dataset of DeepCom https://github.com/xing-hu/DeepCom
- ASTNN https://github.com/zhangj111/astnn
- TBCCD https://github.com/yh1105/datasetforTBCCD
- Jobfuscate https://www.duckware.com/jobfuscate/index.html
Shiwen Yu, Ting Wang, Ji Wang, "Data Augmentation by Program Transformation." Journal of Systems and Software (JSS 2022). (under JSS open science, the preprint pdf can be checked in ".\paper")
Deze Wang, Zhouyang Jia, Shanshan Li, Yue Yu, Yun Xiong, Wei Dong, Xiangke Liao, “Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding.” 44th International Conference on Software Engineering (ICSE 2022)