🛠️ Trainer

The Trainer class is a customizable tool that streamlines training, evaluation, and prediction for PyTorch models.

At its core, the Trainer class bundles the components needed for model training: the model, data loaders for the training, validation, and (optionally) test datasets, the loss function, the optimizer, and an optional learning-rate scheduler. It also accepts custom metrics and callbacks, so evaluation and logging during training can be tailored to each project.

Trainer

class torchmate.trainer.Trainer(model: torch.nn.Module, train_dataloader: torch.utils.data.DataLoader, val_dataloader: torch.utils.data.DataLoader, loss_fn: Callable | torch.nn.Module, optimizer: torch.optim.Optimizer, num_epochs: int = 1, test_dataloader: torch.utils.data.DataLoader | None = None, metrics: List[Callable] | None = None, callbacks: List[Type[Callback]] | None = None, scheduler: torch.optim.lr_scheduler._LRScheduler | None = None, schedule_monitor: str = 'val_loss', mixed_precision: bool = False, use_grad_penalty: bool = False, device: str | torch.device = 'cpu')

Bases: Module

Encapsulate training essentials

Parameters:

model (torch.nn.Module, required) – The PyTorch model to be trained.
train_dataloader (torch.utils.data.DataLoader, required) – DataLoader for the training dataset.
val_dataloader (torch.utils.data.DataLoader, required) – DataLoader for the validation dataset.
loss_fn (Callable or torch.nn.Module, required) – Loss function for training.
optimizer (torch.optim.Optimizer, required) – Optimizer for updating model parameters.
num_epochs (int, optional) – Number of training epochs (default is 1).
test_dataloader (torch.utils.data.DataLoader, optional) – DataLoader for the test dataset.
metrics (List[Callable], optional) – List of metric functions for evaluation.
callbacks (List[Callback], optional) – List of callback objects invoked at various stages of training.
scheduler (torch.optim.lr_scheduler._LRScheduler, optional) – Learning rate scheduler.
schedule_monitor (str, optional) – Metric to monitor for scheduler (default is “val_loss”).
mixed_precision (bool, optional) – Whether to use mixed precision (fp16) training (default is False).
use_grad_penalty (bool, optional) – Whether to apply a gradient penalty (default is False).
device (str or torch.device, optional) – Device to use for training (default is “cpu”).
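The schedule_monitor parameter matters for schedulers that react to a metric rather than to the epoch count. Below is a minimal sketch in plain PyTorch, assuming the Trainer passes the monitored value (by default the validation loss) to torch.optim.lr_scheduler.ReduceLROnPlateau and calls step() without arguments for other schedulers. The helper step_scheduler is hypothetical and not part of torchmate.

```python
import torch


def step_scheduler(scheduler, logs, monitor="val_loss"):
    """Hypothetical helper: advance a scheduler once per epoch.

    ReduceLROnPlateau needs the monitored value; other schedulers
    (StepLR, CosineAnnealingLR, ...) are stepped without arguments.
    """
    if isinstance(scheduler, torch.optim.lr_scheduler.ReduceLROnPlateau):
        scheduler.step(logs[monitor])
    else:
        scheduler.step()


param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=1.0)
plateau = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=0)

# The validation loss does not improve between the two epochs,
# so the learning rate is halved after the second call.
for val_loss in [1.0, 1.0]:
    step_scheduler(plateau, {"val_loss": val_loss})

print(optimizer.param_groups[0]["lr"])  # 0.5
```

With patience=0, a single epoch without improvement is enough to trigger the reduction; a StepLR scheduler, as in the example below, ignores the monitored value entirely.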

Other Attributes:

history (dict): Training history containing loss, metrics, and learning rates.
early_stop (bool): Flag for early stopping.
update_params (bool): Flag for updating model parameters.
accumulation_steps (int): Number of steps for gradient accumulation during training.
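The accumulation_steps attribute refers to gradient accumulation: gradients from several small batches are summed before a single optimizer step, which emulates a larger batch. The sketch below shows the common PyTorch pattern; it is not taken from torchmate, and dividing each loss by accumulation_steps is an assumption about how the Trainer scales it.

```python
import copy

import torch

torch.manual_seed(0)
model = torch.nn.Linear(1, 1)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
accumulation_steps = 2  # one optimizer step per 2 micro-batches

x = torch.randn(4, 1)
y = 2 * x + 1

# Reference: gradient of the mean loss over the full batch of 4 samples
reference = copy.deepcopy(model)
loss_fn(reference(x), y).backward()

optimizer.zero_grad()
micro_batches = [(x[:2], y[:2]), (x[2:], y[2:])]
for step, (xb, yb) in enumerate(micro_batches, start=1):
    # Scaling keeps the summed gradient equal to the full-batch mean gradient
    loss = loss_fn(model(xb), yb) / accumulation_steps
    loss.backward()  # .grad accumulates across backward() calls
    if step % accumulation_steps == 0:
        accumulated_grad = model.weight.grad.clone()
        optimizer.step()
        optimizer.zero_grad()

print(torch.allclose(accumulated_grad, reference.weight.grad))  # True
```

Both micro-batches have the same size here; with an uneven final batch, the scaled sum no longer matches the full-batch mean exactly.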

Important Methods:

fit(): Train and validate the model for the specified number of epochs and return history.
evaluate(): Evaluate the model on the validation dataset and return evaluation history.
predict(): Make predictions using the model on the test dataset.

Example usage:

import os

import numpy as np
import pandas as pd
import torch
from sklearn.model_selection import train_test_split

from torchmate.callbacks import CSVLogger, ModelCheckpoint
from torchmate.trainer import Trainer

# Create a simple neural network model
class SimpleModel(torch.nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc1 = torch.nn.Linear(1, 1)

    def forward(self, x):
        return self.fc1(x)

# Create synthetic data
X = torch.tensor(np.random.rand(1000, 1), dtype=torch.float32)
y = 2 * X + 1 + torch.randn(1000, 1) * 0.1  # Adding some noise

# Split the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Create DataLoader objects for training and validation
train_dataset = torch.utils.data.TensorDataset(X_train, y_train)
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=8, shuffle=True)
val_dataset = torch.utils.data.TensorDataset(X_val, y_val)
val_dataloader = torch.utils.data.DataLoader(val_dataset, batch_size=8, shuffle=False)


# Create metrics

class MSE(torch.nn.Module):
    # Plain functions have a __name__; give the module one too
    __name__ = 'mse'

    def forward(self, inputs, targets):
        inputs = inputs.view(-1)
        targets = targets.view(-1)
        return torch.mean((inputs - targets) ** 2)  # mean squared error


def mae(inputs, targets):
    inputs = inputs.view(-1)
    targets = targets.view(-1)
    return torch.mean(torch.abs(inputs - targets))  # mean absolute error

model = SimpleModel()
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)

metrics = [MSE(),mae]

logdir = "logs"
csv_file = os.path.join(logdir,"logs.csv")
ckpt_dir = os.path.join(logdir,"model")

callbacks = [CSVLogger(filename=csv_file),
            ModelCheckpoint(checkpoint_dir=ckpt_dir)
            ]


device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
print(device)

# Create a Trainer instance with the callbacks
trainer = Trainer(model,
                train_dataloader,
                val_dataloader,
                loss_fn,
                optimizer,
                num_epochs=3,
                scheduler=scheduler,
                metrics=metrics,
                callbacks=callbacks,
                device=device,
                mixed_precision=True,
                use_grad_penalty=True
                )


# Train the model
history = trainer.fit()

print("_"*150)

print(pd.read_csv(csv_file))

fit()

Trains the model and returns the training history.

Returns:

A dictionary object encapsulating the training history.

Return type:

Dict
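Conceptually, fit() alternates a training pass and a validation pass each epoch and records the results. The sketch below is a simplified stand-in written in plain PyTorch, not torchmate's implementation: run_epoch and fit are hypothetical, and the history keys train_loss, val_loss, and lr are assumptions (the documented history also contains metric values).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def run_epoch(model, dataloader, loss_fn, optimizer=None, device="cpu"):
    """Hypothetical helper: return the mean batch loss, training only if an optimizer is given."""
    training = optimizer is not None
    model.train(training)
    total = 0.0
    with torch.set_grad_enabled(training):
        for xb, yb in dataloader:
            xb, yb = xb.to(device), yb.to(device)
            loss = loss_fn(model(xb), yb)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total += loss.item()
    return total / len(dataloader)


def fit(model, train_dl, val_dl, loss_fn, optimizer, num_epochs=1, device="cpu"):
    """Hypothetical minimal fit(): one training and one validation pass per epoch."""
    model.to(device)
    history = {"train_loss": [], "val_loss": [], "lr": []}
    for _ in range(num_epochs):
        history["train_loss"].append(run_epoch(model, train_dl, loss_fn, optimizer, device))
        history["val_loss"].append(run_epoch(model, val_dl, loss_fn, device=device))
        history["lr"].append(optimizer.param_groups[0]["lr"])
    return history


torch.manual_seed(0)
X = torch.rand(100, 1)
y = 2 * X + 1
train_dl = DataLoader(TensorDataset(X[:80], y[:80]), batch_size=8, shuffle=True)
val_dl = DataLoader(TensorDataset(X[80:], y[80:]), batch_size=8)

model = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
history = fit(model, train_dl, val_dl, torch.nn.MSELoss(), optimizer, num_epochs=5)
print(history["train_loss"])
```

Disabling gradients during the validation pass keeps evaluation cheap and guarantees the model is not updated on validation data.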

evaluate(dataloader=None, loss_fn=None, metrics=None, callbacks=None, device=None)

Evaluates the model on a dataset and returns the evaluation history.

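Each argument overrides the corresponding attribute set in the constructor, so you can evaluate on a different dataset, or with a different loss, metrics, callbacks, or device, without creating a new Trainer.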

Parameters:

dataloader (torch.utils.data.DataLoader, optional) – A PyTorch DataLoader containing the validation data. If not provided, the self.val_dataloader attribute will be used. Defaults to None.
loss_fn (Callable or torch.nn.Module, optional) – A custom loss function for evaluation. If not provided, the self.loss_fn attribute will be used. Defaults to None.
metrics (List[Callable], optional) – A list of custom evaluation metrics. If not provided, the self.metrics attribute will be used. Defaults to None.
callbacks (List[Callback], optional) – A list of callback objects for evaluation stages. If not provided, the self.callbacks attribute will be used. Defaults to None.
device (str or torch.device, optional) – The device to use for evaluation (e.g., “cpu” or “cuda”). If not provided, the self.device attribute will be used. Defaults to None.

Returns:

A dictionary object containing the evaluation results.

Return type:

Dict
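Every argument of evaluate() falls back to the attribute of the same name when it is None. The sketch below imitates that pattern with a hypothetical MiniTrainer class (not part of torchmate): it averages the loss and each metric over batches under torch.no_grad() and keys metrics by their __name__, which is an assumption about how results are labeled. Callbacks are left out for brevity.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


class MiniTrainer:
    """Hypothetical stand-in for the Trainer; only the evaluate() fallback logic is shown."""

    def __init__(self, model, val_dataloader, loss_fn, metrics=None, device="cpu"):
        self.model = model
        self.val_dataloader = val_dataloader
        self.loss_fn = loss_fn
        self.metrics = metrics or []
        self.device = device

    def evaluate(self, dataloader=None, loss_fn=None, metrics=None, device=None):
        # Each argument replaces the stored attribute only when it is provided
        dataloader = dataloader if dataloader is not None else self.val_dataloader
        loss_fn = loss_fn if loss_fn is not None else self.loss_fn
        metrics = metrics if metrics is not None else self.metrics
        device = device if device is not None else self.device

        self.model.to(device).eval()
        totals = {"loss": 0.0, **{m.__name__: 0.0 for m in metrics}}
        with torch.no_grad():
            for xb, yb in dataloader:
                xb, yb = xb.to(device), yb.to(device)
                preds = self.model(xb)
                totals["loss"] += loss_fn(preds, yb).item()
                for m in metrics:
                    totals[m.__name__] += float(m(preds, yb))
        # Report the mean of the per-batch values
        return {name: value / len(dataloader) for name, value in totals.items()}


def mae(inputs, targets):
    return torch.mean(torch.abs(inputs - targets))


# A model that reproduces y = 2x + 1 exactly
model = torch.nn.Linear(1, 1)
with torch.no_grad():
    model.weight.fill_(2.0)
    model.bias.fill_(1.0)

X = torch.arange(4, dtype=torch.float32).unsqueeze(1)
val_dl = DataLoader(TensorDataset(X, 2 * X + 1), batch_size=2)
shifted_dl = DataLoader(TensorDataset(X, 2 * X + 2), batch_size=2)  # every target is off by 1

trainer = MiniTrainer(model, val_dl, torch.nn.MSELoss(), metrics=[mae])
print(trainer.evaluate())                       # {'loss': 0.0, 'mae': 0.0}
print(trainer.evaluate(dataloader=shifted_dl))  # {'loss': 1.0, 'mae': 1.0}
```

Checking "is not None" rather than truthiness matters here: an explicitly passed empty metrics list still overrides the stored metrics.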

predict(test_dataloader: torch.utils.data.DataLoader | None = None, callbacks: List[Type[Callback]] | None = None, device: str | torch.device | None = None)

Perform predictions on the provided test data using the trained model.

By default, it uses the test_dataloader, callbacks, and device stored on the Trainer; any argument you supply overrides the stored value.

Parameters:

test_dataloader (DataLoader, optional) – A PyTorch DataLoader containing the test data. If not provided, the self.test_dataloader attribute will be used. Defaults to None.
callbacks (list[Callback], optional) – A list of callback objects to be executed at various stages of the prediction process. If not provided, the self.callbacks attribute will be used. Defaults to None.
device (str or torch.device, optional) – The device to run the prediction on (e.g., “cpu” or “cuda”). If not provided, the self.device attribute will be used. Defaults to None.

Returns:

A PyTorch Tensor containing the predicted outputs for the test data.

Return type:

torch.Tensor

Raises:

ValueError – If both test_dataloader and self.test_dataloader are None.
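The prediction flow can be sketched the same way. The hypothetical predict function below (not torchmate's code) resolves the DataLoader with the same fallback rule, raises ValueError when none is available, runs the model without gradient tracking, and concatenates the batch outputs into one tensor. Treating the first element of each batch as the model input is an assumption.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def predict(model, test_dataloader=None, default_dataloader=None, device="cpu"):
    """Hypothetical sketch of predict(): stack the model's outputs for every batch."""
    # Fall back to the stored DataLoader (here: default_dataloader) when none is passed
    dataloader = test_dataloader if test_dataloader is not None else default_dataloader
    if dataloader is None:
        raise ValueError("Provide a test_dataloader; no default is available.")
    model.to(device).eval()
    outputs = []
    with torch.no_grad():
        for batch in dataloader:
            # Assumes the first element of a (inputs, ...) batch is the model input
            inputs = batch[0] if isinstance(batch, (list, tuple)) else batch
            outputs.append(model(inputs.to(device)).cpu())
    return torch.cat(outputs, dim=0)


model = torch.nn.Linear(1, 1)
with torch.no_grad():
    model.weight.fill_(2.0)
    model.bias.fill_(1.0)

X = torch.arange(5, dtype=torch.float32).unsqueeze(1)
test_dl = DataLoader(TensorDataset(X), batch_size=2)  # batches of 2, 2 and 1 samples
preds = predict(model, test_dl)
print(preds.squeeze(1))  # tensor([1., 3., 5., 7., 9.])
```

Concatenating along dim=0 restores one row per test sample, even when the last batch is smaller than the others.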