# Uni Rostock PyTorch Framework

The `uros_pf` framework intends to reduce the setup time for new machine learning projects based on PyTorch. It unifies standard procedures that are used in all projects, such as configuration, input/output handling, training and logging.

## Basic principles

- Files in the `uros_pf` package are of general interest. They must not be adapted to a certain project.
- Project-specific source code goes into the `scenario` folder.
- Data, configuration files, model checkpoints etc. are stored in a separate workdir folder.
- To implement a new scenario, at least 3 files have to be coded:
  - the `input_processor` (preparing the input and targets),
  - the `model` (connecting the trainer, optimizer and model) and
  - the `module` (containing the neural network or the ML method).
- The input processor, the model and all hyperparameters shall be configured by the config file (a `yaml` file). It is usually stored in the workdir and passed to the trainer by `-cn config_name` (`config_name` without the extension `.yaml`).

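
The config file refers to classes by their dotted import path (the sample project uses strings such as `torch.nn.MSELoss`). How `uros_pf` turns these strings into objects is not described here; the following is only a minimal sketch of one common approach, based on `importlib`. The helper names `resolve_class` and `build` are hypothetical and not part of `uros_pf`, and a standard-library class stands in for a real model class so that the snippet runs without PyTorch.

```python
import importlib


def resolve_class(dotted_path: str):
    """Return the object named by a dotted path such as 'package.module.ClassName'."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


def build(dotted_path: str, **kwargs):
    """Instantiate the resolved class with keyword arguments taken from a config section."""
    return resolve_class(dotted_path)(**kwargs)


# Hypothetical usage: a standard-library class replaces a scenario class here.
fraction = build("fractions.Fraction", numerator=3, denominator=4)
```

With such a helper, exchanging the loss function or the optimizer would only require editing a string in the `yaml` file, not the Python source.
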
## Sample project

A working configuration file (named `ag_linear_config.yaml`) is

```yaml
builder:
  input: "scenario.ag_news_corpus.ag_ip.AGInputProcessor"
  model: "scenario.ag_news_corpus.ag_simple_model.SimpleModel"
trainer:
  epochs: 10
input:
  feature_size: 1000
  train_file: "data/ag_news_corpus/train.csv"
  val_file: "data/ag_news_corpus/test.csv"
  batch_size: 10
  samples_per_epoch: 12000
model:
  num_of_classes: 4
  loss_fn: "torch.nn.MSELoss"
  metric_fns: "uros_pf.metrics.accuracy.Accuracy"
  module_cls: "scenario.ag_news_corpus.ag_least_square_module.AGLeastSquareModule"
  optimizer: "torch.optim.SGD"
  lr: 0.01
  module:
    feature_size: ${input.feature_size}
    num_of_classes: ${model.num_of_classes}
```

The structure of the workdir is:

```commandline
├── data
│   ├── ag_news_corpus
│   │   ├── test.csv
│   │   ├── train.csv
├── config
│   ├── ag_linear_config.yaml
```

The data is taken from [mhjabreel's github account](https://github.com/mhjabreel/CharCnn_Keras/tree/master/data/ag_news_csv).

To run the example project, execute `python3 path/to/src/uros_pf/trainer/trainer.py -cn ag_linear_config` from the workdir.

## Install

### Ubuntu / Debian

Open a terminal and execute the following commands, substituting `path/2/virtual/environment` and `path/to/project/dir` with your own folders.

```commandline
sudo apt-get update
sudo apt install python3-venv python3-pip
ENV_HOME=path/2/virtual/environment
python3 -m venv $ENV_HOME
source $ENV_HOME/bin/activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install tensorboard transformers[torch] evaluate datasets scikit-learn tokenizers tqdm
pip install hydra-core --upgrade
```

To install the project framework and the PyCharm environment, download and install PyCharm from https://www.jetbrains.com/de-de/pycharm/download/other.html and run:

```
cd path/to/project/dir
git clone https://citlab0.math.uni-rostock.de/gitlab/shared/uros_pf
```

(tested with Ubuntu 23.04)
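
After the installation, it can be useful to confirm that the activated virtual environment really contains the packages from the `pip install` commands. The following is a small sketch using only the standard library; the function name `missing_packages` and the list `REQUIRED` are illustrative and not part of `uros_pf`.

```python
import importlib.util

# Import names of the packages installed by the pip commands.
# Note that scikit-learn is imported as `sklearn` and hydra-core as `hydra`.
REQUIRED = [
    "torch", "torchvision", "torchaudio", "tensorboard", "transformers",
    "evaluate", "datasets", "sklearn", "tokenizers", "tqdm", "hydra",
]


def missing_packages(names):
    """Return the top-level package names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("missing: " + ", ".join(missing))
    else:
        print("all packages found")
```

Run the script with the `python3` of the virtual environment; any name it reports as missing can be installed again with `pip`.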