AI Run Tool Documentation
Welcome to the official documentation for the Rendora AI Run Tool. This tool allows you to easily manage and execute AI workloads on Rendora's GPU infrastructure.
Table of Contents
- Introduction
- Installation
- Configuration
- Usage
- Fine-Tuning Models
- Using Different AI Frameworks
- Model Lineage Tracking
- Best Practices for Model Management
- Examples
- Troubleshooting
- API Reference
- FAQ
Introduction
The Rendora AI Run Tool is a command-line interface (CLI) that simplifies the process of running AI models and scripts on Rendora's cloud GPU resources. It provides features for:
- Automated resource allocation
- Simplified data management
- Real-time job monitoring
- Scalable deployments
- Model lineage tracking
Installation
You can install the Rendora AI Run Tool using pip:
pip install rendora-ai-run-tool
Make sure you have Python 3.7+ installed on your system.
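To confirm the installation succeeded, you can ask pip for the package's metadata (the package name matches the install command above):
pip show rendora-ai-run-tool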
Configuration
Before using the tool, you need to configure your Rendora API key. You can set it as an environment variable:
export RENDORA_API_KEY="YOUR_API_KEY"
Alternatively, you can configure it in the `config.yaml` file located in your home directory (`~/.rendora/config.yaml`).
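As a rough sketch, and assuming the configuration file stores the key under an `api_key` field (the field name is an assumption, not confirmed by this page), `~/.rendora/config.yaml` could look like this:
# ~/.rendora/config.yaml -- the api_key field name is an assumption
api_key: "YOUR_API_KEY"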
Usage
Running Python Scripts
To run a Python script, use the `run` command:
rendora run --script my_script.py --gpu rtx4090 --data data.csv
Available options:
- `--script`: Path to the Python script.
- `--gpu`: Type of GPU to use (e.g., rtx4090, h100).
- `--data`: Path to the data file.
- `--cpu`: Number of CPU cores to allocate (optional).
- `--memory`: Memory to allocate in GB (optional).
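For example, to request four CPU cores and 16 GB of memory alongside the GPU (the resource values here are purely illustrative):
rendora run --script my_script.py --gpu rtx4090 --data data.csv --cpu 4 --memory 16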
Managing Data
You can upload data to Rendora using the `upload` command:
rendora upload --file my_data.csv
Uploaded data is stored in your Rendora storage and can be accessed by your scripts.
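As a minimal sketch, assuming an uploaded file is exposed to your job under its original filename, a script could read it with the standard library:
import csv

# Assumes "my_data.csv", uploaded via `rendora upload`, is available to the
# job under its original filename -- this is an assumption of the sketch.
with open("my_data.csv", newline="") as f:
    rows = list(csv.reader(f))

print(f"Loaded {len(rows)} rows from my_data.csv")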
Monitoring Jobs
You can monitor the status of your jobs using the `status` command:
rendora status --job_id JOB_ID
This will display real-time information about your job's progress, resource utilization, and any errors that may occur.
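If you prefer a view that refreshes on its own, you can wrap the command with the standard `watch` utility (this assumes `status` prints a snapshot and exits rather than streaming continuously):
watch -n 30 rendora status --job_id JOB_ID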
Fine-Tuning Models
Fine-tuning allows you to adapt a pre-trained model to a specific dataset, improving its performance on a particular task.
To fine-tune a model, use the `finetune` command:
rendora finetune --script finetune.py --base_model pretrained_model.pth --dataset new_data.csv --learning_rate 0.0001
Available options:
- `--script`: Path to the fine-tuning script.
- `--base_model`: Path or name of the pre-trained model.
- `--dataset`: Path to the dataset used for fine-tuning.
- `--learning_rate`: Learning rate for the fine-tuning process.
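The contents of `finetune.py` are up to you. As a rough PyTorch sketch, with hard-coded illustrative values because the way Rendora forwards `--base_model`, `--dataset`, and `--learning_rate` to the script is not specified here:
import torch
import torch.nn as nn
import torch.optim as optim

# Illustrative hard-coded values; in practice these would come from however
# Rendora hands the finetune options to the script (an assumption).
BASE_MODEL_PATH = "pretrained_model.pth"
LEARNING_RATE = 1e-4

# Tiny stand-in architecture; replace with the one matching your checkpoint.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
model.load_state_dict(torch.load(BASE_MODEL_PATH))

# Freeze the earlier layers and fine-tune only the final layer.
for param in model[0].parameters():
    param.requires_grad = False

optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()),
                       lr=LEARNING_RATE)
loss_fn = nn.MSELoss()

for step in range(100):  # replace with a real DataLoader over new_data.csv
    x, y = torch.randn(32, 8), torch.randn(32, 1)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

torch.save(model.state_dict(), "finetuned_model.pth")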
Using Different AI Frameworks
The Rendora AI Run Tool supports a variety of AI frameworks, including PyTorch and TensorFlow. Ensure your scripts are compatible with the chosen framework.
PyTorch Example
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
# Minimal placeholder model, dataset, and training loop -- replace these
# with your own model, dataset, optimizer, and loss function.
class Model(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = nn.Linear(8, 1)  # single linear layer as a stand-in

    def forward(self, x):
        return self.linear(x)

class RandomDataset(Dataset):
    """Dummy dataset that yields random (input, target) pairs."""
    def __init__(self, num_samples: int = 256) -> None:
        self.num_samples = num_samples

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        return torch.randn(8), torch.randn(1)

model = Model()
train_loader = DataLoader(RandomDataset(), batch_size=32)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def training_loop(model, train_loader, optimizer, loss_fn):
    size = len(train_loader.dataset)
    for batch, (X, y) in enumerate(train_loader):
        optimizer.zero_grad()
        preds = model(X)
        loss = loss_fn(preds, y)
        loss.backward()
        optimizer.step()
        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:.4f} [{current:>5d}/{size:>5d}]")

training_loop(model, train_loader, optimizer, loss_fn)
print("PyTorch training complete!")
rendora run --script train_pytorch.py --gpu rtx4090
TensorFlow Example
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Define a simple sequential model
model = Sequential([
    Dense(128, activation='relu', input_shape=(784,)),  # Example input shape
    Dense(10, activation='softmax')  # 10 output classes for a classification task
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# Example data (replace with your actual data loading)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data() # MNIST dataset
# Preprocess the data
x_train = x_train.reshape(60000, 784).astype('float32') / 255.0
x_test = x_test.reshape(10000, 784).astype('float32') / 255.0
y_train = tf.keras.utils.to_categorical(y_train, num_classes=10)
y_test = tf.keras.utils.to_categorical(y_test, num_classes=10)
# Train the model
model.fit(x_train, y_train, epochs=2, batch_size=32) # Train for 2 epochs with a batch size of 32
# Evaluate the model
loss, accuracy = model.evaluate(x_test, y_test)
print('Test accuracy:', accuracy)
rendora run --script train_tensorflow.py --gpu rtx4090
Remember to install the required dependencies (e.g., `torch`, `tensorflow`) before running your scripts:
pip install torch tensorflow
Model Lineage Tracking
The Rendora AI Run Tool allows you to track the lineage of your models, making it easier to reproduce experiments and manage model versions.
To enable model lineage tracking, configure the following settings in `config.yaml`:
lineage_tracking:
  enabled: true
  metadata_store: "your_metadata_store_url"
Here's how to submit a model training script with custom metadata so that it can be tracked, using the tool's Python API:
import rendora_ai_run

try:
    job_id = rendora_ai_run.run(
        script="train_model.py",
        gpu_type="rtx4090",
        data="training_data.csv",
        environment={"NUM_EPOCHS": "100"},  # Example environment variable
        metadata={"model_type": "CNN", "dataset_version": "v1.2", "notes": "Experiment with dropout"}
    )
    print(f"Job submitted successfully. Job ID: {job_id}")
except Exception as e:
    print(f"Error submitting job: {e}")
Best Practices for Model Management
- Model Versioning: Use a version control system (e.g., Git) to track changes to your model code.
- Experiment Tracking: Keep detailed records of your experiments, including hyperparameters, datasets, and evaluation metrics.
- Regular Evaluation: Continuously evaluate your models to ensure they are performing as expected.
Examples
Here are some examples of how to use the Rendora AI Run Tool for common tasks.
Running a Script on a Specific GPU
rendora run --script my_script.py --gpu rtx4090
Running a Training Job with Custom Resources, Environment Variables, and Tracked Metadata
rendora run --script train_model.py --gpu rtx4090 \
--data training_data.csv \
--cpu 4 \
--memory 8 \
--environment NUM_EPOCHS=100 LEARNING_RATE=0.001 \
--metadata model_type=CNN dataset_version=v1.2 notes="Experiment with dropout"
Troubleshooting
This section provides solutions to common problems you may encounter while using the Rendora AI Run Tool.
- "API Key Not Found": Make sure you have set the `RENDORA_API_KEY` environment variable or configured it in `config.yaml`.
- "GPU Not Available": Try a different GPU type or wait for resources to become available.
API Reference
This section provides a detailed reference for the Rendora AI Run Tool API, covering each command, its available options, and the attributes used for model tracking.
FAQ
Frequently asked questions about the Rendora AI Run Tool.
- Q: What GPU types are supported? A: The tool supports a variety of NVIDIA and AMD GPUs, including RTX 4090, H100, and A100.
- Q: How do I manage my data? A: Use the `upload` command to upload data to Rendora storage, and access it from your scripts.