<h3>Artifact Structure:</h3><p dir="ltr">This artifact consists of four archives:</p><ul><li>`TSE-Project.tar.xz`: the project archive,</li><li>`TSE-Results.tar.xz`: the results archive,</li><li>`Evosuite_and_NumericSeed_Data.tar.xz`: the results of the numeric seed sensitivity analysis and the EvoSuite comparison, and</li><li>`Evosuite_and_NumericSeed_Project.tar.xz`: the scripts and project files for the seed analysis and the EvoSuite comparison.</li></ul><p dir="ltr">To evaluate the artifact, first extract the project archive into a folder of your choice. It contains all scripts, Dockerfiles, and executables required to run the evaluation, as well as the graphs and tables generated during the evaluation (in the `project/out` directory). To build and run the Docker image, please consult the "Building and testing the Dockerfile on your local machine" section of the README.md file that is also contained in the project archive.</p><p dir="ltr">All intermediate results of our evaluation are included in the `TSE-Results.tar.xz` archive. To examine these files (including the inputs generated during our evaluation), extract this archive into the `project/workingdir` directory; when unpacked, it requires about 50 GiB of disk space.</p><p dir="ltr">To run the numeric seed analysis and the EvoSuite comparison, extract the `Evosuite_and_NumericSeed_Project.tar.xz` archive into the previously created project directory, overwriting the existing files. The Docker image must then be rebuilt so that it includes these scripts.</p><p dir="ltr">You may examine the existing results of this analysis by extracting the `Evosuite_and_NumericSeed_Data.tar.xz` archive into the previously created `project/workingdir` directory. The corresponding graphs are written to the `out` folder when the scripts are run. A scripted sketch of the full setup sequence is given at the end of this description.</p><h3>Paper Abstract:</h3><p dir="ltr"><b>Context</b>: To effectively test complex software, it is important to generate goal-specific inputs, i.e., inputs that achieve a specific testing goal. For instance, developers may target one or more testing goals during testing, e.g., generating complex inputs or triggering new or error-prone behaviors. <b>Problem</b>: However, most state-of-the-art test generators are not designed to target specific goals. Notably, grammar-based test generators, which (randomly) produce syntactically valid inputs from an input specification (i.e., a grammar), have a low probability of achieving an arbitrary testing goal. <b>Aim</b>: This work addresses this challenge by proposing an automated test generation approach (called FDLOOP) which iteratively learns relevant input properties from existing inputs to drive the generation of goal-specific inputs. <b>Method</b>: The main idea of our approach is to leverage test feedback to generate goal-specific inputs via a combination of evolutionary testing and grammar learning. FDLOOP automatically learns a mapping between input structures and a specific testing goal; such mappings allow it to generate inputs that target the goal at hand. Given a testing goal, FDLOOP iteratively selects and evolves goal-specific test inputs and learns their distribution via test feedback and a probabilistic grammar. We concretize FDLOOP for four testing goals, namely unique code coverage, input-to-code complexity, program failures (exceptions), and long execution time. We evaluate FDLOOP using three well-known input formats (JSON, CSS, and JavaScript) and 20 open-source programs. 
<b>Results</b>: FDLOOP is up to 89% more effective than baseline grammar-based test generators (i.e., random, probabilistic, and inverse-probabilistic methods), and it outperforms the closest state-of-the-art approach (EvoGfuzz) by up to 77%. In addition, we show that the main components of FDLOOP (i.e., the input mutator and the grammar mutator) contribute positively to the effectiveness of our approach. We also observed that FDLOOP is effective across varying parameter settings: the number of initial seed inputs, the number of generated inputs, and the number of input generations. <b>Implications</b>: Finally, our evaluation demonstrates that FDLOOP is effective for targeting a specific testing goal (revealing error-prone behaviors, generating complex inputs, or producing inputs with long execution times) and scales to multiple testing goals.</p>
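<p dir="ltr">As a reading aid for the <b>Method</b> part of the abstract, the following is a minimal, self-contained sketch of a feedback-driven select-and-learn loop over a toy probabilistic grammar. It is our illustration only, not the implementation shipped in the artifact: the toy grammar, the goal score (nesting depth as a proxy for input complexity), the depth cap, and all names are our assumptions, and FDLOOP's input and grammar mutators are omitted.</p>

```python
import random
from collections import defaultdict

# Toy context-free grammar over a JSON-like fragment (an assumption,
# not the grammars used in the paper). Each nonterminal maps to a list
# of alternative expansions.
GRAMMAR = {
    "<value>":  [["<number>"], ["<array>"]],
    "<array>":  [["[", "<value>", "]"],
                 ["[", "<value>", ",", "<value>", "]"]],
    "<number>": [["0"], ["7"]],
}

def is_nonterminal(symbol):
    return symbol.startswith("<")

def sample(weights, symbol="<value>", depth=0, trace=None):
    """Sample one input; record every fired rule as (nonterminal, rule index)."""
    if trace is None:
        trace = []
    if not is_nonterminal(symbol):
        return symbol, trace
    rules = GRAMMAR[symbol]
    if depth > 10:
        # Deep in the derivation, force the least recursive expansion.
        idx = min(range(len(rules)),
                  key=lambda i: sum(map(is_nonterminal, rules[i])))
    else:
        idx = random.choices(range(len(rules)), weights=weights[symbol])[0]
    trace.append((symbol, idx))
    text = "".join(sample(weights, s, depth + 1, trace)[0] for s in rules[idx])
    return text, trace

def goal_score(inp):
    # Example testing goal: input complexity, proxied here by nesting depth.
    return inp.count("[")

def feedback_loop(generations=10, population_size=50, keep=10):
    # Start from uniform rule weights, i.e., a plain probabilistic grammar.
    weights = {nt: [1.0] * len(rules) for nt, rules in GRAMMAR.items()}
    for _ in range(generations):
        population = [sample(weights) for _ in range(population_size)]
        # Select: keep the inputs whose feedback best matches the goal.
        population.sort(key=lambda pair: goal_score(pair[0]), reverse=True)
        selected = population[:keep]
        # Learn: reweight each rule by how often it fired in the
        # selected (goal-specific) inputs, with +1 smoothing.
        counts = defaultdict(lambda: defaultdict(int))
        for _, trace in selected:
            for nt, idx in trace:
                counts[nt][idx] += 1
        weights = {nt: [1.0 + counts[nt][i] for i in range(len(rules))]
                   for nt, rules in GRAMMAR.items()}
    return weights

if __name__ == "__main__":
    learned = feedback_loop()
    print(sample(learned)[0])  # now biased toward deeply nested arrays
```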
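<p dir="ltr">For convenience, the setup steps from the "Artifact Structure" section above can be scripted. The following is a minimal sketch, assuming Python 3, a local Docker installation, and all four archives in the current directory; the target folder name, the Docker image tag, and the build context are placeholders, and the README.md inside the project archive remains the authoritative build reference.</p>

```python
#!/usr/bin/env python3
"""Minimal sketch of the artifact setup sequence described above."""
import subprocess
import tarfile
from pathlib import Path

PROJECT_DIR = Path("tse-artifact")  # placeholder: any folder of your choice
WORKING_DIR = PROJECT_DIR / "project" / "workingdir"

def extract(archive: str, target: Path) -> None:
    """Unpack a .tar.xz archive into `target`."""
    with tarfile.open(archive, mode="r:xz") as tar:
        tar.extractall(target)

# 1. Project archive: scripts, Dockerfiles, executables, and `project/out`.
extract("TSE-Project.tar.xz", PROJECT_DIR)

# 2. Intermediate results (about 50 GiB unpacked) into `project/workingdir`.
extract("TSE-Results.tar.xz", WORKING_DIR)

# 3. Numeric seed analysis / EvoSuite comparison scripts, overwriting
#    the existing project files.
extract("Evosuite_and_NumericSeed_Project.tar.xz", PROJECT_DIR)

# 4. Existing seed analysis / EvoSuite results into `project/workingdir`.
extract("Evosuite_and_NumericSeed_Data.tar.xz", WORKING_DIR)

# 5. Rebuild the Docker image so the added scripts are included
#    ("fdloop-artifact" is a placeholder tag; see the README.md section
#    "Building and testing the Dockerfile on your local machine").
subprocess.run(["docker", "build", "-t", "fdloop-artifact", "."],
               cwd=PROJECT_DIR, check=True)
```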