<h3>Artifact Structure:</h3><p dir="ltr">This artifact consists of four archives:</p><ul><li>`TSE-Project.tar.xz`: the project archive,</li><li>`TSE-Results.tar.xz`: the results archive,</li><li>`Evosuite_and_NumericSeed_Data.tar.xz`: the results of the numeric seed sensitivity analysis and the EvoSuite comparison, and</li><li>`Evosuite_and_NumericSeed_Project.tar.xz`: the scripts and project files for the seed analysis and the EvoSuite comparison.</li></ul><p dir="ltr">To evaluate the artifact, first extract the project archive into a folder of your choice. It contains all scripts, Dockerfiles, and executables required to run the evaluation, as well as the graphs and tables generated during the evaluation (in the `project/out` directory). To build and run the Docker image, please consult the "Building and testing the Dockerfile on your local machine" section of the README.md file that is also contained in the project archive.</p><p dir="ltr">All intermediate results of our evaluation are included in the `TSE-Results.tar.xz` archive. To examine these files (including the inputs generated during our evaluation), extract this archive into the `project/workingdir` directory; when unpacked, it requires about 50 GiB of disk space.</p><p dir="ltr">To run the numeric seed analysis and the EvoSuite comparison, extract the `Evosuite_and_NumericSeed_Project.tar.xz` archive into the previously created project directory, overwriting the existing files. The Docker image must then be rebuilt so that it includes these scripts.</p><p dir="ltr">You may examine the existing results of this analysis by extracting the `Evosuite_and_NumericSeed_Data.tar.xz` archive into the previously created `project/workingdir` directory. The corresponding graphs are written to the `out` folder when the scripts are run. A scripted sketch of the full setup sequence is given at the end of this description.</p><h3>Paper Abstract:</h3><p dir="ltr"><b>Context</b>: To effectively test complex software, it is important to generate goal-specific inputs, i.e., inputs that achieve a specific testing goal. For instance, developers may target one or more testing goals during testing, e.g., generating complex inputs or triggering new or error-prone behaviors. <b>Problem</b>: However, most state-of-the-art test generators are not designed to target specific goals. Notably, grammar-based test generators, which (randomly) produce syntactically valid inputs from an input specification (i.e., a grammar), have a low probability of achieving an arbitrary testing goal. <b>Aim</b>: This work addresses this challenge by proposing an automated test generation approach (called FDLOOP) which iteratively learns relevant input properties from existing inputs to drive the generation of goal-specific inputs. <b>Method</b>: The main idea of our approach is to leverage test feedback to generate goal-specific inputs via a combination of evolutionary testing and grammar learning. FDLOOP automatically learns a mapping between input structures and a specific testing goal; such mappings allow it to generate inputs that target the goal at hand. Given a testing goal, FDLOOP iteratively selects and evolves goal-specific test inputs and learns their distribution via test feedback and a probabilistic grammar. We concretize FDLOOP for four testing goals, namely unique code coverage, input-to-code complexity, program failures (exceptions), and long execution time. We evaluate FDLOOP using three well-known input formats (JSON, CSS, and JavaScript) and 20 open-source programs. 
<b>Results</b>: FDLOOP is up to 89% more effective than baseline grammar-based test generators (i.e., random, probabilistic, and inverse-probabilistic methods), and it outperforms the closest state-of-the-art approach (EvoGfuzz) by up to 77%. In addition, we show that the main components of FDLOOP (i.e., the input mutator and the grammar mutator) contribute positively to the effectiveness of our approach. We also observed that FDLOOP is effective across varying parameter settings: the number of initial seed inputs, the number of generated inputs, and the number of input generations. <b>Implications</b>: Finally, our evaluation demonstrates that FDLOOP is effective for targeting a specific testing goal (revealing error-prone behaviors, generating complex inputs, or producing inputs with long execution times) and scales to multiple testing goals.</p>
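<p dir="ltr">As a reading aid for the <b>Method</b> part of the abstract, the following is a minimal, self-contained sketch of a feedback-driven select-and-learn loop over a toy probabilistic grammar. It is our illustration only, not the implementation shipped in the artifact: the toy grammar, the goal score (nesting depth as a proxy for input complexity), the depth cap, and all names are our assumptions, and FDLOOP's input and grammar mutators are omitted.</p>

```python
import random
from collections import defaultdict

# Toy context-free grammar over a JSON-like fragment (an assumption,
# not the grammars used in the paper). Each nonterminal maps to a list
# of alternative expansions.
GRAMMAR = {
    "<value>":  [["<number>"], ["<array>"]],
    "<array>":  [["[", "<value>", "]"],
                 ["[", "<value>", ",", "<value>", "]"]],
    "<number>": [["0"], ["7"]],
}

def is_nonterminal(symbol):
    return symbol.startswith("<")

def sample(weights, symbol="<value>", depth=0, trace=None):
    """Sample one input; record every fired rule as (nonterminal, rule index)."""
    if trace is None:
        trace = []
    if not is_nonterminal(symbol):
        return symbol, trace
    rules = GRAMMAR[symbol]
    if depth > 10:
        # Deep in the derivation, force the least recursive expansion.
        idx = min(range(len(rules)),
                  key=lambda i: sum(map(is_nonterminal, rules[i])))
    else:
        idx = random.choices(range(len(rules)), weights=weights[symbol])[0]
    trace.append((symbol, idx))
    text = "".join(sample(weights, s, depth + 1, trace)[0] for s in rules[idx])
    return text, trace

def goal_score(inp):
    # Example testing goal: input complexity, proxied here by nesting depth.
    return inp.count("[")

def feedback_loop(generations=10, population_size=50, keep=10):
    # Start from uniform rule weights, i.e., a plain probabilistic grammar.
    weights = {nt: [1.0] * len(rules) for nt, rules in GRAMMAR.items()}
    for _ in range(generations):
        population = [sample(weights) for _ in range(population_size)]
        # Select: keep the inputs whose feedback best matches the goal.
        population.sort(key=lambda pair: goal_score(pair[0]), reverse=True)
        selected = population[:keep]
        # Learn: reweight each rule by how often it fired in the
        # selected (goal-specific) inputs, with +1 smoothing.
        counts = defaultdict(lambda: defaultdict(int))
        for _, trace in selected:
            for nt, idx in trace:
                counts[nt][idx] += 1
        weights = {nt: [1.0 + counts[nt][i] for i in range(len(rules))]
                   for nt, rules in GRAMMAR.items()}
    return weights

if __name__ == "__main__":
    learned = feedback_loop()
    print(sample(learned)[0])  # now biased toward deeply nested arrays
```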
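<p dir="ltr">For convenience, the setup steps from the "Artifact Structure" section above can be scripted. The following is a minimal sketch, assuming Python 3, a local Docker installation, and all four archives in the current directory; the target folder name, the Docker image tag, and the build context are placeholders, and the README.md inside the project archive remains the authoritative build reference.</p>

```python
#!/usr/bin/env python3
"""Minimal sketch of the artifact setup sequence described above."""
import subprocess
import tarfile
from pathlib import Path

PROJECT_DIR = Path("tse-artifact")  # placeholder: any folder of your choice
WORKING_DIR = PROJECT_DIR / "project" / "workingdir"

def extract(archive: str, target: Path) -> None:
    """Unpack a .tar.xz archive into `target`."""
    with tarfile.open(archive, mode="r:xz") as tar:
        tar.extractall(target)

# 1. Project archive: scripts, Dockerfiles, executables, and `project/out`.
extract("TSE-Project.tar.xz", PROJECT_DIR)

# 2. Intermediate results (about 50 GiB unpacked) into `project/workingdir`.
extract("TSE-Results.tar.xz", WORKING_DIR)

# 3. Numeric seed analysis / EvoSuite comparison scripts, overwriting
#    the existing project files.
extract("Evosuite_and_NumericSeed_Project.tar.xz", PROJECT_DIR)

# 4. Existing seed analysis / EvoSuite results into `project/workingdir`.
extract("Evosuite_and_NumericSeed_Data.tar.xz", WORKING_DIR)

# 5. Rebuild the Docker image so the added scripts are included
#    ("fdloop-artifact" is a placeholder tag; see the README.md section
#    "Building and testing the Dockerfile on your local machine").
subprocess.run(["docker", "build", "-t", "fdloop-artifact", "."],
               cwd=PROJECT_DIR, check=True)
```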