# Uni Rostock PyTorch Framework
The `uros_pf` framework intends to reduce the setup time for new machine
learning projects based on PyTorch.
It unifies standard procedures that are used in all projects, such as
configuration, input/output handling, training and logging.
## Basic principles
- Files in the `uros_pf` package are of general interest. They must
not be adapted to a specific project.
- Project specific source code goes into the `scenario` folder.
- Data / configuration files / model checkpoints etc. are stored in a
separate workdir folder.
- To implement a new scenario, at least three files have to be written: the
`input_processor` (preparing the inputs and targets), the `model`
(connecting the trainer, the optimizer and the module) and the `module`
(containing the neural network or the ML method).
- The input processor, the model and all hyperparameters are configured
in a config file (a `yaml` file). It is usually stored in the workdir and
passed to the trainer with `-cn config_name` (`config_name` without the `.yaml` extension).
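Components are selected in the config file by their dotted import path (the sample configuration uses, for example, `torch.optim.SGD` and `scenario.ag_news_corpus.ag_ip.AGInputProcessor`). How `uros_pf` resolves these strings internally is not described here; a minimal sketch, assuming a plain `importlib` lookup, could look like this. `resolve_class` is a hypothetical helper, not part of `uros_pf` (Hydra itself offers `hydra.utils.get_class` for the same task).

```python
import importlib


def resolve_class(dotted_path: str) -> type:
    """Import and return the class named by a dotted path such as
    "torch.optim.SGD" (hypothetical helper, not part of uros_pf)."""
    # Split "package.module.ClassName" into the module path and the class name.
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_path)
    cls = getattr(module, class_name)  # raises AttributeError if the name is unknown
    if not isinstance(cls, type):
        raise TypeError(f"{dotted_path!r} does not name a class")
    return cls


if __name__ == "__main__":
    # A standard-library class keeps the demo runnable without PyTorch.
    print(resolve_class("collections.OrderedDict"))
```

With PyTorch installed, `resolve_class("torch.optim.SGD")` would return the optimizer class, which could then be instantiated with the module's parameters and the `lr` value from the config.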
## Sample project
A working configuration file (named `ag_linear_config.yaml`) looks like this:
```yaml
hydra:
  run:
    dir: ${trainer.model_path}
builder:
  input: "scenario.ag_news_corpus.ag_ip.AGInputProcessor"
  model: "scenario.ag_news_corpus.ag_simple_model.SimpleModel"
trainer:
  epochs: 20
  model_path: models/${now:%Y-%m-%d}/${now:%H-%M-%S}
input:
  feature_size: 1000
  train_file: "data/ag_news_corpus/train.csv"
  val_file: "data/ag_news_corpus/test.csv"
  batch_size: 10
  samples_per_epoch: 12000
model:
  num_of_classes: 4
  loss_fn: "torch.nn.MSELoss"
  metric_fns: "uros_pf.metrics.accuracy.Accuracy"
  module_cls: "scenario.ag_news_corpus.ag_least_square_module.AGLeastSquareModule"
  optimizer: "torch.optim.SGD"
  lr: 0.01
module:
  feature_size: ${input.feature_size}
  num_of_classes: ${model.num_of_classes}
```
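The `input` and `model` sections suggest how each sample is encoded: a feature vector of length `feature_size` (1000) and, since the loss is `torch.nn.MSELoss`, presumably a one-hot target of length `num_of_classes` (4). The actual `AGInputProcessor` is not shown here, so the following is only a sketch under assumptions: it uses feature hashing over lower-cased word tokens, and it assumes each CSV row holds a 1-based class index, a title and a description, as in the original AG News CSV release. `featurize`, `one_hot` and `read_rows` are hypothetical names, and plain lists are used instead of tensors so the sketch runs without PyTorch.

```python
import csv
import io
import re
import zlib

FEATURE_SIZE = 1000  # mirrors input.feature_size in the sample config
NUM_CLASSES = 4      # mirrors model.num_of_classes


def featurize(text: str, feature_size: int = FEATURE_SIZE) -> list[float]:
    """Hashed bag of words: count each lower-cased token in one of
    `feature_size` buckets (assumption, not necessarily AGInputProcessor)."""
    vec = [0.0] * feature_size
    for token in re.findall(r"[a-z0-9']+", text.lower()):
        # crc32 is stable across runs, unlike Python's salted built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % feature_size] += 1.0
    return vec


def one_hot(label: int, num_classes: int = NUM_CLASSES) -> list[float]:
    """One-hot target; the AG News class indices are 1-based (1 to 4)."""
    target = [0.0] * num_classes
    target[label - 1] = 1.0
    return target


def read_rows(stream):
    """Yield (features, target) pairs from rows of class,title,description."""
    for label, title, description in csv.reader(stream):
        yield featurize(f"{title} {description}"), one_hot(int(label))


if __name__ == "__main__":
    sample = io.StringIO('"3","Stocks rally","Shares rose on Monday."\n')
    features, target = next(read_rows(sample))
    print(sum(features), target)
```

To read the real training data, open it with `open("data/ag_news_corpus/train.csv", newline="", encoding="utf-8")` and pass the file object to `read_rows`; wrapping the lists with `torch.tensor` would then give inputs and targets of the shapes the config implies.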
The structure of the workdir is:
```commandline
├── data
│   └── ag_news_corpus
│       ├── test.csv
│       └── train.csv
└── config
    └── ag_linear_config.yaml
```
The data is taken from
[mhjabreel's GitHub account](https://github.com/mhjabreel/CharCnn_Keras/tree/master/data/ag_news_csv).
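The workdir layout above can be prepared with a few lines of `pathlib`. This is a convenience sketch with hypothetical helper names (`create_workdir`, `missing_files`), not part of `uros_pf`; the CSV files still have to be downloaded and the config file written by hand.

```python
import tempfile
from pathlib import Path

# Files the sample project expects, relative to the workdir.
EXPECTED_FILES = [
    "data/ag_news_corpus/train.csv",
    "data/ag_news_corpus/test.csv",
    "config/ag_linear_config.yaml",
]


def create_workdir(root: Path) -> list[Path]:
    """Create the sample folder skeleton below `root` (hypothetical helper)."""
    dirs = [root / "data" / "ag_news_corpus", root / "config"]
    for d in dirs:
        # parents=True creates intermediate folders; exist_ok=True makes reruns safe.
        d.mkdir(parents=True, exist_ok=True)
    return dirs


def missing_files(root: Path) -> list[str]:
    """List the expected files that are not yet present below `root`."""
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Demo in a throwaway folder; pass your real workdir path instead.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        create_workdir(root)
        print("still missing:", missing_files(root))
```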
To run the example project, execute
`python3 path/to/src/uros_pf/trainer/trainer.py -cn ag_linear_config`
from the workdir.
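Since `hydra.run.dir` is set to `${trainer.model_path}`, every run writes its output to a new timestamped folder below `models/`, relative to the directory the trainer is started from. Hydra's built-in `now` resolver formats the current time with `strftime`; the hypothetical helper below reproduces the resulting path for a given start time, which can help to locate the output of a particular run.

```python
from datetime import datetime
from pathlib import PurePosixPath


def expected_model_path(started: datetime) -> PurePosixPath:
    """Mirror `models/${now:%Y-%m-%d}/${now:%H-%M-%S}` for a given start time
    (hypothetical helper, not part of uros_pf)."""
    # Each ${now:...} interpolation is the start time formatted with strftime.
    return PurePosixPath(
        "models", started.strftime("%Y-%m-%d"), started.strftime("%H-%M-%S")
    )


if __name__ == "__main__":
    print(expected_model_path(datetime(2024, 5, 17, 9, 30, 5)))  # models/2024-05-17/09-30-05
```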
## Install
### Ubuntu / Debian
Open a terminal and execute the following commands, replacing
`path/2/virtual/environment` and `path/to/project/dir` with your own folders.
```commandline
sudo apt-get update
sudo apt install python3-venv python3-pip
ENV_HOME=path/2/virtual/environment
python3 -m venv $ENV_HOME
source $ENV_HOME/bin/activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install tensorboard transformers[torch] evaluate datasets scikit-learn tokenizers tqdm
pip install hydra-core --upgrade
```
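To check that the virtual environment is complete before starting the trainer, a small script can look up each package without importing it. This is a convenience sketch, not part of `uros_pf`; the package list mirrors the `pip install` commands above, and `scikit-learn` and `hydra-core` are imported as `sklearn` and `hydra`, respectively. `missing_packages` is a hypothetical helper.

```python
import importlib.util

# pip package name -> import name (the two differ for some packages).
REQUIRED = {
    "torch": "torch",
    "torchvision": "torchvision",
    "torchaudio": "torchaudio",
    "tensorboard": "tensorboard",
    "transformers": "transformers",
    "evaluate": "evaluate",
    "datasets": "datasets",
    "scikit-learn": "sklearn",
    "tokenizers": "tokenizers",
    "tqdm": "tqdm",
    "hydra-core": "hydra",
}


def missing_packages(required: dict[str, str] = REQUIRED) -> list[str]:
    """Return the pip names of packages whose import name cannot be found."""
    # find_spec returns None for a top-level module that is not installed.
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("all packages found" if not missing else f"missing: {', '.join(missing)}")
```

Run it with the virtual environment activated; any package it reports can be installed with `pip install <name>`.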
To install the project framework and set up the PyCharm environment, download and install PyCharm from
https://www.jetbrains.com/de-de/pycharm/download/other.html and run:
```commandline
cd path/to/project/dir
git clone https://citlab0.math.uni-rostock.de/gitlab/shared/uros_pf
```
(tested with Ubuntu 23.04)