Tobias Strauß's avatar
Tobias Strauß committed
# Uni Rostock PyTorch Framework

The `uros_pf` framework aims to reduce the setup time for new machine
learning projects based on PyTorch.
It unifies standard procedures that recur in every project, such as
configuration, input/output handling, training and logging.

## Basic principles

- Files in the `uros_pf` package are project-independent. They must
not be adapted to a specific project.
- Project-specific source code goes into the `scenario` folder.
- Data / configuration files / model checkpoints etc. are stored in a 
separate workdir folder.
- To implement a new scenario, at least three files have to be written: the
`input_processor` (preparing the inputs and targets), the `model`
(connecting the trainer, optimizer and module) and the `module` (containing
the neural network or other ML method).
- The input processor, the model and all hyperparameters are configured
in the config file (a `yaml` file). It is usually stored in the workdir and
passed to the trainer with `-cn config_name` (`config_name` without the `.yaml` extension).

## Sample project

A working configuration file (named `ag_linear_config.yaml`) looks like this:
```yaml
hydra:
    run:
        dir: ${trainer.model_path}
builder:
    input: "scenario.ag_news_corpus.ag_ip.AGInputProcessor"
    model: "scenario.ag_news_corpus.ag_simple_model.SimpleModel"
trainer:
    epochs: 20
    model_path: models/${now:%Y-%m-%d}/${now:%H-%M-%S}
input:
    feature_size: 1000
    train_file: "data/ag_news_corpus/train.csv"
    val_file: "data/ag_news_corpus/test.csv"
    batch_size: 10
    samples_per_epoch: 12000
model:
    num_of_classes: 4
    loss_fn: "torch.nn.MSELoss"
    metric_fns: "uros_pf.metrics.accuracy.Accuracy"
    module_cls: "scenario.ag_news_corpus.ag_least_square_module.AGLeastSquareModule"
    optimizer: "torch.optim.SGD"
    lr: 0.01
    module:
        feature_size: ${input.feature_size}
        num_of_classes: ${model.num_of_classes}

```
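Several entries in this file (`builder.input`, `model.loss_fn`, `model.module_cls`, `model.optimizer`) are dotted Python paths. This README does not show how `uros_pf` turns them into classes. The following is a minimal sketch of one common approach, assuming plain `importlib`; the helper name `load_class` is hypothetical and not part of the `uros_pf` API.

```python
import importlib


def load_class(dotted_path: str):
    """Resolve a string such as "package.module.ClassName" to the class object.

    Hypothetical helper for illustration; not part of the uros_pf API.
    """
    # Split "package.module.ClassName" into "package.module" and "ClassName".
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# A standard-library class is used here so the sketch runs without PyTorch.
ordered_dict_cls = load_class("collections.OrderedDict")
print(ordered_dict_cls.__name__)
```

With PyTorch installed, `load_class("torch.nn.MSELoss")` would return the loss class named in the config in the same way.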
The structure of the workdir is: 
```commandline
├── data
│   ├── ag_news_corpus
│   │   ├── test.csv
│   │   ├── train.csv
├── config
│   ├── ag_linear_config.yaml
```
The data is taken from
[mhjabreel's GitHub account](https://github.com/mhjabreel/CharCnn_Keras/tree/master/data/ag_news_csv).
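To sanity-check the downloaded files before training, a short script like the one below may help. It is only a sketch and assumes that each CSV row has no header and consists of a class index, a title and a description; the function name `read_ag_news` is hypothetical and unrelated to the `AGInputProcessor` of the framework.

```python
import csv
from collections import Counter
from pathlib import Path


def read_ag_news(csv_path):
    """Return (label, text) pairs from an AG News CSV file.

    Assumes header-less rows of the form: class index, title, description.
    """
    samples = []
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if len(row) < 3:
                continue  # skip empty or malformed rows
            label = int(row[0])
            text = f"{row[1]} {row[2]}"
            samples.append((label, text))
    return samples


if __name__ == "__main__":
    # Relative path as used in the sample config; run from the workdir.
    train_file = Path("data/ag_news_corpus/train.csv")
    if train_file.exists():
        label_counts = Counter(label for label, _ in read_ag_news(train_file))
        print(sorted(label_counts.items()))
    else:
        print(f"{train_file} not found; run this from the workdir")
```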

To run the example project, execute
`python3 path/to/src/uros_pf/trainer/trainer.py -cn ag_linear_config`
from the workdir.

## Install

### Ubuntu / Debian
Open a terminal and execute the following commands, substituting
`path/2/virtual/environment` and `path/to/project/dir` with your own folders.

```commandline
sudo apt-get update 
sudo apt install python3-venv python3-pip
ENV_HOME=path/2/virtual/environment
python3 -m venv $ENV_HOME 
source $ENV_HOME/bin/activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install tensorboard "transformers[torch]" evaluate datasets scikit-learn tokenizers tqdm
pip install hydra-core --upgrade
```
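To confirm that the virtual environment contains the packages installed above, a small check can be run with the environment activated. This is a sketch, not part of `uros_pf`; the helper `missing_modules` is hypothetical, and the list uses import names, which differ from the pip names for `scikit-learn` (`sklearn`) and `hydra-core` (`hydra`).

```python
import importlib.util

# Import names for the packages installed with pip in the commands above.
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio", "tensorboard", "transformers",
    "evaluate", "datasets", "sklearn", "tokenizers", "tqdm", "hydra",
]


def missing_modules(names):
    """Return the top-level module names not found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("all packages found" if not missing else f"missing: {missing}")
```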
To set up the project framework with PyCharm, download and install PyCharm from
https://www.jetbrains.com/de-de/pycharm/download/other.html, then clone the repository:
```commandline
cd path/to/project/dir
git clone https://citlab0.math.uni-rostock.de/gitlab/shared/uros_pf
```
(tested with Ubuntu 23.04)