# Uni Rostock PyTorch Framework

The `uros_pf` framework intends to reduce the setup time for new machine 
learning projects based on PyTorch. 
It unifies standard procedures that recur in every project, such as 
configuration, input/output handling, training and logging.

## Basic principles

- Files in the `uros_pf` package are of general interest. They must
not be adapted to a certain project.
- Project-specific source code goes into the `scenario` folder.
- Data / configuration files / model checkpoints etc. are stored in a 
separate workdir folder.
- To implement a new scenario, at least three files have to be written: the 
`input_processor` (preparing the inputs and targets), the `model` 
(connecting the trainer, optimizer and module) and the `module` (containing 
the neural network or the ML method).
- The input processor, the model and all hyperparameters are configured
in the config file (a `yaml` file). It is usually stored in the workdir and 
passed to the trainer with `-cn config_name` (`config_name` without the `.yaml` extension).
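
Classes are named in the config file as dotted import paths (see the `builder` and `model` entries in the sample project below). How `uros_pf` turns these strings into classes is not shown in this README. The following is a minimal sketch of one common approach using `importlib`. The helper name `load_class` is hypothetical and not part of the framework.

```python
import importlib


def load_class(dotted_path: str) -> type:
    """Resolve a string such as "torch.optim.SGD" to the class it names.

    Hypothetical helper for illustration only; not part of uros_pf.
    """
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'package.module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Standard-library example, so the sketch runs without PyTorch installed.
ordered_dict_cls = load_class("collections.OrderedDict")
print(ordered_dict_cls.__name__)
```

With PyTorch installed, `load_class("torch.optim.SGD")` would return the optimizer class in the same way.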


## Sample project

A working configuration file (named `ag_linear_config.yaml`) is:
```yaml
hydra:
    run:
        dir: ${trainer.model_path}
builder:
    input: "scenario.ag_news_corpus.ag_ip.AGInputProcessor"
    model: "scenario.ag_news_corpus.ag_simple_model.SimpleModel"
trainer:
    epochs: 20
    model_path: models/${now:%Y-%m-%d}/${now:%H-%M-%S}
input:
    feature_size: 1000
    train_file: "data/ag_news_corpus/train.csv"
    val_file: "data/ag_news_corpus/test.csv"
    batch_size: 10
    samples_per_epoch: 12000
model:
    num_of_classes: 4
    loss_fn: "torch.nn.MSELoss"
    metric_fns: "uros_pf.metrics.accuracy.Accuracy"
    module_cls: "scenario.ag_news_corpus.ag_least_square_module.AGLeastSquareModule"
    optimizer: "torch.optim.SGD"
    lr: 0.01
    module:
        feature_size: ${input.feature_size}
        num_of_classes: ${model.num_of_classes}

```
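Entries such as `${input.feature_size}` refer to other keys of the same file, so a value only has to be set once. In an actual run this resolution is performed by Hydra (via OmegaConf). The sketch below only mimics the simplest case with plain dictionaries to show the idea. The functions `lookup` and `resolve` are hypothetical, and resolver calls such as `${now:%Y-%m-%d}` are deliberately left untouched.

```python
import re

# Matches "${a.b.c}" references to other config keys. Resolver calls such as
# "${now:%Y-%m-%d}" contain a colon and therefore do not match.
_REFERENCE = re.compile(r"\$\{([A-Za-z_][\w.]*)\}")


def lookup(config: dict, dotted_key: str):
    """Follow a dotted key such as "input.feature_size" through nested dicts."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve(value, config: dict):
    """Return `value` with "${key}" references replaced by the referenced values."""
    if isinstance(value, dict):
        return {key: resolve(item, config) for key, item in value.items()}
    if not isinstance(value, str):
        return value
    whole = _REFERENCE.fullmatch(value)
    if whole:
        # A value that is exactly one reference keeps the target's type.
        return resolve(lookup(config, whole.group(1)), config)
    return _REFERENCE.sub(
        lambda m: str(resolve(lookup(config, m.group(1)), config)), value
    )


# Excerpt of the sample configuration, written as a Python dict.
config = {
    "input": {"feature_size": 1000},
    "model": {
        "num_of_classes": 4,
        "module": {
            "feature_size": "${input.feature_size}",
            "num_of_classes": "${model.num_of_classes}",
        },
    },
}
resolved = resolve(config, config)
print(resolved["model"]["module"])  # {'feature_size': 1000, 'num_of_classes': 4}
```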
The structure of the workdir is: 
```commandline
├── data
│   ├── ag_news_corpus
│   │   ├── test.csv
│   │   ├── train.csv
├── config
│   ├── ag_linear_config.yaml
```
The data is taken from 
[mhjabreel's github account](https://github.com/mhjabreel/CharCnn_Keras/tree/master/data/ag_news_csv).

To run the example project just execute 
`python3 path/to/src/uros_pf/trainer/trainer.py -cn ag_linear_config`
from the workdir.
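
Before starting a run, it can help to confirm that the workdir matches the layout shown above. The following is a minimal sketch and not part of `uros_pf`. The function name `missing_files` is hypothetical, and the file list is taken from the sample project.

```python
from pathlib import Path

# Paths relative to the workdir, taken from the sample layout above.
EXPECTED_FILES = (
    "data/ag_news_corpus/train.csv",
    "data/ag_news_corpus/test.csv",
    "config/ag_linear_config.yaml",
)


def missing_files(workdir: Path, expected=EXPECTED_FILES) -> list[str]:
    """Return the expected relative paths that do not exist under `workdir`."""
    root = Path(workdir)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run this from the workdir, just like the trainer command.
    missing = missing_files(Path.cwd())
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Workdir layout looks complete.")
```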

## Install
### Requirements
- Operating system: Windows 7, 8, 10 or 11 (64-bit) or Linux (ideally Ubuntu 23.04)
- Processor: x86 processor architecture (no ARM support)
- Hard disk: 5 GB free disk space
### General
The workshop uses Python 3.11 together with numerous Python modules, which can be 
obtained via Python's own package manager **pip**. To install all of this for the 
workshop without conflicting with Python and module versions that already exist on 
your system or that are installed later, we use **Anaconda** (more precisely, its 
minimal variant **Miniconda**) on Windows systems. This software runs independent 
Python installations in so-called **virtual environments**. If you want to create 
your own projects with other Python versions, modules or module versions after the 
workshop, you can do so in separate environments.

As a code editor, we use **PyCharm**, which offers various tools for the development 
of Python programmes.

### Ubuntu / Debian
Open a terminal and execute the following command lines while substituting 
`path/2/virtual/environment` and `path/to/project/dir` with your own folders.

```commandline
sudo apt-get update 
sudo apt install python3-venv python3-pip
ENV_HOME=path/2/virtual/environment
python3 -m venv $ENV_HOME 
source $ENV_HOME/bin/activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install tensorboard "transformers[torch]" evaluate datasets scikit-learn tokenizers tqdm
pip install hydra-core --upgrade
```
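After the `pip` commands have finished, a short Python check can confirm that the packages are visible in the active virtual environment. This is a minimal sketch. The names `REQUIRED_MODULES` and `missing_modules` are hypothetical, and the module list mirrors the `pip` commands above. Save it to a file of your choice and run it with `python3` while the environment is active.

```python
import importlib.util

# Import names differ from some pip names:
# scikit-learn is imported as sklearn, hydra-core as hydra.
REQUIRED_MODULES = (
    "torch", "torchvision", "torchaudio", "tensorboard", "transformers",
    "evaluate", "datasets", "sklearn", "tokenizers", "tqdm", "hydra",
)


def missing_modules(names=REQUIRED_MODULES) -> list[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All packages found.")
```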
To set up the project framework and the PyCharm environment, download and install PyCharm from 
https://www.jetbrains.com/de-de/pycharm/download/other.html and clone the repository:
```commandline
cd path/to/project/dir
git clone https://citlab0.math.uni-rostock.de/gitlab/shared/uros_pf
```
(tested with Ubuntu 23.04)

### Windows
#### Virtual environment
1. Install [Miniconda](https://docs.conda.io/en/latest/miniconda.html#windows-installers):
Use the installer for Windows 64bit systems. In the installation process, select the suggested default 
settings. If no shortcut to the Anaconda prompt has been created on the desktop after the 
installation, create one yourself: Create a new shortcut with the target 
`cmd /c start "Anaconda Prompt" "C:\path\to\Miniconda3\Scripts\activate"`, replacing 
`C:\path\to\` with the correct path to the Miniconda installation.
2. Expand the sources that Miniconda can access to install software in virtual environments: 
By default, Anaconda often only has access to older versions of software packages. To expand the 
scope, we add another community-maintained source, **Conda-Forge**. To do this, open the Anaconda 
prompt and type: `conda config --add channels conda-forge`. Conda-Forge should now appear in 
the list of channels, which you can check with `conda config --show channels`. 
3. Create a new virtual environment: In the Anaconda prompt, you should see the prefix `(base)` 
in the line before the cursor, which means that you are currently in the base environment of Anaconda. 
To create the new environment, type: `conda create -n llm_workshop python=3.11 git`. You can replace 
`llm_workshop` with a name of your choice. 
4. Activate the new virtual environment: `conda activate llm_workshop`. If you forget the name of the 
environment or want to see an overview of all created virtual environments, run `conda env list`.
5. Update the packages *pip* and *setuptools* installed with Python: 
`python -m pip install -U pip && pip install -U setuptools`.
6. Install the Python modules needed for the workshop: 
`pip install torch torchvision torchaudio transformers[torch] tensorboard evaluate datasets scikit-learn notebook hydra-core --upgrade`.
#### Workshop folder and repository 
7. Create a workshop folder on your system. Change to this workshop folder in the Anaconda prompt. 
8. Clone the workshop repository into the workshop folder: 
`git clone https://citlab0.math.uni-rostock.de/gitlab/shared/uros_pf.git`. 
9. Create the folder `workdir` in the workshop folder, so that the workshop folder now contains 
two folders: `workdir` and `uros_pf`. In `workdir` we will store data, 
configurations and models.
#### PyCharm
10. Install [PyCharm](https://www.jetbrains.com/de-de/pycharm/download/other.html) 
(community version 2023.2.1 for Windows).
11. Start PyCharm and open the repository folder `uros_pf` in the workshop folder: Click on the 
main menu icon in the upper left corner, click on `Open` and select the repository folder. Answer 
the following security question with `Trust Project`.
12. Activate your virtual environment in PyCharm: PyCharm can use the virtual environment 
created in step 3 as its working environment, including the Python 3.11 interpreter it contains. 
Click on the Python interpreter selection in the lower right corner. If your Miniconda 
environment is not yet listed, select `Add New Interpreter` and then `Add Local Interpreter`. 
In the `Add Python Interpreter` dialog that opens, select `Conda Environment` as the type on the left. 
Then enter the path to `conda.exe` of your Miniconda installation on the right: 
`C:\path\to\Miniconda3\Scripts\conda.exe`. Then activate `Use existing environment` and select 
your virtual environment from the dropdown menu. 
Confirm with `OK`. PyCharm can now use the Python modules you installed in the virtual environment 
in step 6.