# Uni Rostock PyTorch Framework

The `uros_pf` framework intends to reduce the setup time for new machine learning projects based on PyTorch. It unifies standard procedures that are used in all projects, such as configuration, input/output handling, training and logging.

## Basic principles

- Files in the `uros_pf` package are of general interest. They must not be adapted to a specific project.
- Project-specific source code goes into the `scenario` folder.
- Data, configuration files, model checkpoints etc. are stored in a separate workdir folder.
- To implement a new scenario, at least 3 files have to be coded: the `input_processor` (preparing the input and targets), the `model` (connecting the trainer, optimizer and model) and the `module` (containing the neural network or the ML method).
- The input processor, the model and all hyperparameters are configured by the config file (a `yaml` file). It is usually stored in the workdir and passed to the trainer by `-cn config_name` (`config_name` without the extension `.yaml`).

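Because the input processor and the model are selected in the config file, classes are typically named there as dotted import paths. The following is a minimal sketch of how such a string could be resolved into a Python class using only the standard library. The helper `load_class` is hypothetical and only illustrates the idea; it is not necessarily how `uros_pf` resolves its classes.

```python
import importlib


def load_class(dotted_path: str):
    """Resolve a string such as 'package.module.ClassName' to the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Demonstrated with a standard-library class so the snippet runs anywhere;
# in a project config the string would name an input processor or model class.
ordered_dict_cls = load_class("collections.OrderedDict")
instance = ordered_dict_cls(a=1)
```

Keeping class names as strings in the config means a scenario can be swapped without touching the shared package code.
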
## Sample project

A working configuration file (named `ag_linear_config.yaml`) is

```yaml
hydra:
  run:
    dir: ${trainer.model_path}
builder:
  input: "scenario.ag_news_corpus.ag_ip.AGInputProcessor"
  model: "scenario.ag_news_corpus.ag_simple_model.SimpleModel"
trainer:
  epochs: 20
  model_path: models/${now:%Y-%m-%d}/${now:%H-%M-%S}
input:
  feature_size: 1000
  train_file: "data/ag_news_corpus/train.csv"
  val_file: "data/ag_news_corpus/test.csv"
  batch_size: 10
  samples_per_epoch: 12000
model:
  num_of_classes: 4
  loss_fn: "torch.nn.MSELoss"
  metric_fns: "uros_pf.metrics.accuracy.Accuracy"
  module_cls: "scenario.ag_news_corpus.ag_least_square_module.AGLeastSquareModule"
  optimizer: "torch.optim.SGD"
  lr: 0.01
module:
  feature_size: ${input.feature_size}
  num_of_classes: ${model.num_of_classes}
```

The structure of the workdir is:

```commandline
├── data
│   ├── ag_news_corpus
│   │   ├── test.csv
│   │   ├── train.csv
├── config
│   ├── ag_linear_config.yaml
```

The data is taken from [mhjabreel's github account](https://github.com/mhjabreel/CharCnn_Keras/tree/master/data/ag_news_csv).

To run the example project, execute `python3 path/to/src/uros_pf/trainer/trainer.py -cn ag_linear_config` from the workdir.

## Install

### Requirements

- Operating system: Windows 7, 8, 10 or 11 (64bit) or Linux (ideally Ubuntu 23.04)
- Processor: x86 processor architecture (no ARM support)
- Hard disk: 5GB free disk space

### General

In the workshop, Python (3.11) is used as the programming language, together with numerous modules written in Python, which can be obtained via Python's own package manager **pip**. To install all of this on your system specifically for the workshop, without conflicting with Python and module versions that already exist or may be installed later, we use **Anaconda** (more precisely: the minimised variant **Miniconda**) on Windows systems.
With this software it is possible to run independent Python installations in so-called **virtual environments**. So if you want to create your own projects with other Python versions, modules or module versions after this workshop, this is possible with Anaconda without any problems.

As a code editor, we use **PyCharm**, which offers various tools for the development of Python programmes.

### Ubuntu / Debian

Open a terminal and execute the following commands, substituting `path/2/virtual/environment` and `path/to/project/dir` with your own folders.

```commandline
sudo apt-get update
sudo apt install python3-venv python3-pip
ENV_HOME=path/2/virtual/environment
python3 -m venv $ENV_HOME
source $ENV_HOME/bin/activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install tensorboard transformers[torch] evaluate datasets scikit-learn tokenizers tqdm
pip install hydra-core --upgrade
```

To install the project framework and the PyCharm environment, download and install PyCharm from https://www.jetbrains.com/de-de/pycharm/download/other.html and run:

```
cd path/to/project/dir
git clone https://citlab0.math.uni-rostock.de/gitlab/shared/uros_pf
```

(tested with Ubuntu 23.04)

### Windows

#### Virtual environment

1. Install [Miniconda](https://docs.conda.io/en/latest/miniconda.html#windows-installers): Use the installer for Windows 64bit systems. In the installation process, select the suggested default settings. If no shortcut to the Anaconda prompt has been created on the desktop after the installation, create one yourself: Create a new shortcut with the target `cmd /c start "Anaconda Prompt" "C:\path\to\Miniconda3\Scripts\activate"`, replacing `C:\path\to\` with the correct path to the Miniconda installation.
2. Expand the sources that Miniconda can access to install software in virtual environments: By default, Anaconda often only has access to older versions of software packages.
To expand the scope, we add another community-maintained source, **Conda-Forge**. To do this, open the Anaconda prompt and type: `conda config --add channels conda-forge`. Conda-Forge should now appear in the list of channels for obtaining software packages: `conda config --show channels`.
3. Create a new virtual environment: In the Anaconda prompt, you should see the prefix `(base)` in the line before the cursor, which means that you are currently in the base environment of Anaconda. To create the new environment, type: `conda create -n llm_workshop python=3.11 git`. You can replace `llm_workshop` with a name of your choice.
4. Activate the new virtual environment: `conda activate llm_workshop`. If you forget the name of the environment or want to see an overview of all created virtual environments: `conda env list`.
5. Update the packages *pip* and *setuptools* installed with Python: `python -m pip install -U pip && pip install -U setuptools`.
6. Install the Python modules needed for the workshop: `pip install torch torchvision torchaudio transformers[torch] tensorboard evaluate datasets scikit-learn notebook hydra-core --upgrade`.

#### Workshop folder and repository

7. Create a workshop folder on your system. Change to this workshop folder in the Anaconda prompt.
8. Clone the workshop repository into the workshop folder: `git clone https://citlab0.math.uni-rostock.de/gitlab/shared/uros_pf.git`.
9. Create the folder `workdir` in the workshop folder, so that the workshop folder now contains two folders: `workdir` and `uros_pf`. In `workdir` we will store data, configurations and models.

#### PyCharm

10. Install [PyCharm](https://www.jetbrains.com/de-de/pycharm/download/other.html) (community version 2023.2.1 for Windows).
11. Start PyCharm and open the repository folder `uros_pf` in the workshop folder: Click on the main menu icon in the upper left corner, click on `Open` and select the repository folder.
Answer the following security question with `Trust Project`.
12. Activate your virtual environment in PyCharm: PyCharm can now use the virtual environment we created in step 3 as a working environment, and thus also the Python 3.11 interpreter it contains. Click on the Python interpreter selection in the lower right corner. If your Miniconda environment is not yet available, select `Add New Interpreter` and then `Add Local Interpreter`. In the `Add Python Interpreter` dialog that opens, select `Conda Environment` as the type on the left. Then enter the path to `conda.exe` of your Miniconda installation on the right: `C:\path\to\Miniconda3\Scripts\conda.exe`. Then activate `Use existing environment` and select your virtual environment, which should now appear in the dropdown menu. Confirm with `OK`. Now PyCharm can use the Python modules you installed in the virtual environment in step 6.
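
To check that the selected interpreter actually sees the installed modules, a short script like the following can be run in PyCharm or from the activated environment. It is a minimal sketch: the function `missing_modules` and the list `WORKSHOP_MODULES` are illustrative names, and the list assumes the usual import names of the packages from step 6.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed for the packages installed in step 6
# (scikit-learn is imported as `sklearn`, hydra-core as `hydra`).
WORKSHOP_MODULES = ["torch", "torchvision", "torchaudio", "transformers",
                    "tensorboard", "evaluate", "datasets", "sklearn", "hydra"]

if __name__ == "__main__":
    missing = missing_modules(WORKSHOP_MODULES)
    print("All modules found." if not missing else f"Missing modules: {missing}")
```

The check only looks the modules up without importing them, so it runs quickly even for large packages such as `torch`.
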