Self-Archiving Reproducible Python Scripts

May 04, 2022 in CS

When running many computational experiments in Python, I have found two patterns useful during prototyping and while producing experiment data. First, I use generated keys for entities such as an experiment setting, a repetition, or a model instance. With these keys I can easily persist the entities and run scripts in parallel while producing large amounts of data. Second, putting the date into those keys gives a good self-archiving mechanism, because it later helps to assess whether a result is still relevant. I therefore often write experiments or scripts that use the logic below, in which the Python script copies itself next to its output and is thus versioned together with each experimental run. Of course, I still use git on top of that, but during prototyping, or when running experiments with many different settings, I execute scripts so often that committing every variant is not meaningful. In those situations, this snippet helps:

import os
import shutil
from datetime import datetime

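# Take a single timestamp so the date and the date-time always agree
now = datetime.now()
date_now = now.strftime("%Y-%m-%d")
# %S is seconds; the 12-hour directive %I would repeat the hour and cause name collisions
datetime_now = now.strftime("%Y-%m-%d-%H%M%S")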
key_invocation = f"{date_now}-debug-test"
path_gen_base = f"gen/experiment-title/{key_invocation}"

# Copy the script itself to the base path for reproducibility
os.makedirs(path_gen_base, exist_ok=True)
# Use the absolute path so the copy also works when started from another working directory
shutil.copy(os.path.abspath(__file__), os.path.join(path_gen_base, datetime_now + "-source.py"))

# ..
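
The same idea can be wrapped in a small helper so that every invocation gets its own key, which matters once several runs start in parallel on the same day. The following is only a minimal sketch under my own assumptions: the function names `make_run_key` and `archive_script`, the random eight-character suffix, and the directory layout are hypothetical and not part of the snippet above.

```python
import shutil
import uuid
from datetime import datetime
from pathlib import Path


def make_run_key(label: str, now: datetime) -> str:
    """Build a key of the (assumed) form '<date>-<label>-<random suffix>'."""
    suffix = uuid.uuid4().hex[:8]  # random part so parallel runs do not collide
    return f"{now:%Y-%m-%d}-{label}-{suffix}"


def archive_script(script_path, base_dir, label, now=None) -> Path:
    """Copy script_path into a fresh, keyed run directory and return that directory."""
    now = now or datetime.now()
    run_dir = Path(base_dir) / make_run_key(label, now)
    # exist_ok=False: fail loudly instead of silently mixing two runs
    run_dir.mkdir(parents=True, exist_ok=False)
    shutil.copy(script_path, run_dir / f"{now:%Y-%m-%d-%H%M%S}-source.py")
    return run_dir
```

Inside an experiment script this could be called as `run_dir = archive_script(__file__, "gen/experiment-title", "debug-test")`, after which all results of that run are written below `run_dir`. The leading date keeps the directories sorted chronologically, so old runs remain easy to judge and clean up.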