Use Consistent and Intuitive Data Layout
Recipes for Repeatable Experiments

Experiments can be thought of as a series of data processing steps, at least in computer science and data science. Each step consumes some input data, processes it, and produces some output data.
From this viewpoint, an experiment involves the following artifacts.
- There is always some data that is fixed before an experiment begins and does not change during the experiment. This data may serve as input to any data processing step in the experiment. Let’s refer to such data as initial/bootstrap/input data, e.g., data used to evaluate the performance of a new sorting algorithm.
- Some actions will be performed on the data as part of an experiment. These actions may need to be performed in a specific order and they may consume initial data or data generated by other actions. A script describes such actions along with any constraints on ordering of actions and the data flow between actions, e.g., calculate the mean of field f1, calculate the mean of f2, and compare calculated means via a t-test.
- The execution of scripts generates data that is based on the initial data or on data generated by other scripts. Let’s refer to such generated data as output data.
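As a rough illustration of a single step that consumes data and produces data, the sketch below computes the mean of one named field in a CSV file. The function name `column_mean`, the CSV format with a header row, and the four-decimal output are assumptions made for this example only. The t-test from the example above is left out; it would be a further step that consumes the two computed means.

```shell
#!/bin/sh
# Hypothetical step: print the mean of one named column of a CSV file.
# Usage: column_mean FILE FIELD_NAME
# Assumes the first row of FILE is a header and fields are comma-separated.
column_mean() {
    awk -F',' -v field="$2" '
        NR == 1 {
            # Header row: find the position of the requested field.
            for (i = 1; i <= NF; i++) if ($i == field) col = i
            if (!col) { print "no such field: " field > "/dev/stderr"; exit 1 }
            next
        }
        { sum += $col; n++ }    # Data rows: accumulate the sum and the count.
        END { if (n) printf "%.4f\n", sum / n }
    ' "$1"
}
```

Calling `column_mean data.csv f1` and redirecting the result to a file would turn initial data into output data that a later step can consume.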
Given the above artifacts, lay them out in a consistent fashion.
Recipe
- Have a dedicated folder for each experiment, e.g., evaluate-tools.
- Place initial/bootstrap/input data under an input folder, e.g., evaluate-tools/input. The contents of this folder should not change during the experiment.
- Capture each data processing step in a separate script. Place the script in a scripts folder, e.g., evaluate-tools/scripts/calculate-means.sh.
- Place data generated by executing scripts under an output folder, e.g., evaluate-tools/output.
- Create a master script that executes various scripts (steps) to orchestrate/run the experiment, e.g., evaluate-tools/masterScript.sh.
- Document the experiment in a README.md file, e.g., evaluate-tools/README.md.
- Use a version control system to store the artifacts of the experiment.
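The recipe above can be sketched as a small master script. This is one possible shape, not a prescribed one: the function name `run_experiment`, the convention of running scripts in lexical order (so that numeric prefixes such as 01- and 02- encode ordering), and passing the input and output folders to each step as arguments are all assumptions made for illustration.

```shell
#!/bin/sh
# masterScript.sh: hypothetical orchestration sketch for an experiment folder
# laid out as input/, scripts/, and output/.

# run_experiment EXPERIMENT_DIR
# Runs every script in EXPERIMENT_DIR/scripts in lexical order and stops at
# the first step that fails.
run_experiment() {
    exp_dir=$1
    mkdir -p "$exp_dir/output"
    for step in "$exp_dir"/scripts/*.sh; do
        [ -e "$step" ] || continue    # No step scripts found: nothing to run.
        echo "running $(basename "$step")"
        # Assumed convention: each step reads from the input folder ($1)
        # and writes only to the output folder ($2).
        sh "$step" "$exp_dir/input" "$exp_dir/output" || {
            echo "step failed: $step" >&2
            return 1
        }
    done
}

# Example invocation (commented out so that this sketch has no side effects):
# run_experiment evaluate-tools
```

Because steps write only to the output folder, deleting that folder and rerunning the master script should regenerate all output data from the unchanged input data.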
Here are a few examples: