Sync all skills and memories 2026-04-14 07:27

2026-04-14 07:27:20 +09:00
parent 516bb44fe6
commit 1eba2bca95
386 changed files with 167655 additions and 0 deletions
--- a/skills/mlops/evaluation/weights-and-biases/references/sweeps.md
+++ b/skills/mlops/evaluation/weights-and-biases/references/sweeps.md
@@ -0,0 +1,847 @@
+# Comprehensive Hyperparameter Sweeps Guide
+
+Complete guide to hyperparameter optimization with W&B Sweeps.
+
+## Table of Contents
+- Sweep Configuration
+- Search Strategies
+- Parameter Distributions
+- Early Termination
+- Training Function
+- Parallel Execution
+- Advanced Patterns
+- Real-World Examples
+- Best Practices
+- Resources
+
+## Sweep Configuration
+
+### Basic Sweep Config
+
+```python
+import wandb
+
+sweep_config = {
+    'method': 'bayes',  # Search strategy
+    'metric': {
+        'name': 'val/accuracy',
+        'goal': 'maximize'  # or 'minimize'
+    },
+    'parameters': {
+        'learning_rate': {
+            'distribution': 'log_uniform_values',  # min/max are actual values
+            'min': 1e-5,
+            'max': 1e-1
+        },
+        'batch_size': {
+            'values': [16, 32, 64, 128]
+        }
+    }
+}
+
+# Initialize sweep
+sweep_id = wandb.sweep(sweep_config, project="my-project")
+```
+
+### Complete Config Example
+
+```python
+sweep_config = {
+    # Required: Search method
+    'method': 'bayes',
+
+    # Required: Optimization metric
+    'metric': {
+        'name': 'val/f1_score',
+        'goal': 'maximize'
+    },
+
+    # Required: Parameters to search
+    'parameters': {
+        # Continuous parameter
+        'learning_rate': {
+            'distribution': 'log_uniform_values',
+            'min': 1e-5,
+            'max': 1e-1
+        },
+
+        # Discrete values
+        'batch_size': {
+            'values': [16, 32, 64, 128]
+        },
+
+        # Categorical
+        'optimizer': {
+            'values': ['adam', 'sgd', 'rmsprop', 'adamw']
+        },
+
+        # Uniform distribution
+        'dropout': {
+            'distribution': 'uniform',
+            'min': 0.1,
+            'max': 0.5
+        },
+
+        # Integer range
+        'num_layers': {
+            'distribution': 'int_uniform',
+            'min': 2,
+            'max': 10
+        },
+
+        # Fixed value (constant across runs)
+        'epochs': {
+            'value': 50
+        }
+    },
+
+    # Optional: Early termination
+    # (set either 'min_iter', or 'max_iter' together with 's', not both)
+    'early_terminate': {
+        'type': 'hyperband',
+        's': 2,
+        'eta': 3,
+        'max_iter': 27
+    }
+}
+```
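+
+Before creating a sweep, it can help to sanity-check the config dict locally. The following is a minimal sketch, not part of the W&B API: `check_sweep_config` and its rules are hypothetical and cover only the fields used in this guide.
+
+```python
+VALID_METHODS = {"grid", "random", "bayes"}
+
+
+def check_sweep_config(config):
+    """Return a list of problems found in a sweep config dict (empty if none)."""
+    problems = []
+    method = config.get("method")
+    if method not in VALID_METHODS:
+        problems.append(f"unknown method: {method!r}")
+    # Bayesian search needs a metric to optimize.
+    if method == "bayes" and "metric" not in config:
+        problems.append("method 'bayes' requires a 'metric'")
+    parameters = config.get("parameters") or {}
+    if not parameters:
+        problems.append("no 'parameters' defined")
+    for name, spec in parameters.items():
+        if "parameters" in spec:  # nested group, not checked further here
+            continue
+        has_choice = "value" in spec or "values" in spec
+        has_range = "min" in spec and "max" in spec
+        has_normal = "mu" in spec and "sigma" in spec
+        if not (has_choice or has_range or has_normal):
+            problems.append(f"parameter {name!r} has no value, values, or range")
+        elif has_range and spec["min"] >= spec["max"]:
+            problems.append(f"parameter {name!r} has min >= max")
+    return problems
+
+
+example = {
+    "method": "bayes",
+    "parameters": {"dropout": {"distribution": "uniform", "min": 0.5, "max": 0.1}},
+}
+print(check_sweep_config(example))
+```
+
+The example config is deliberately broken in two ways (no metric for `bayes`, and an inverted range), so both problems are reported.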
+
+## Search Strategies
+
+### 1. Grid Search
+
+Exhaustively search all combinations.
+
+```python
+sweep_config = {
+    'method': 'grid',
+    'parameters': {
+        'learning_rate': {
+            'values': [0.001, 0.01, 0.1]
+        },
+        'batch_size': {
+            'values': [16, 32, 64]
+        },
+        'optimizer': {
+            'values': ['adam', 'sgd']
+        }
+    }
+}
+
+# Total runs: 3 × 3 × 2 = 18 runs
+```
+
+**Pros:**
+- Comprehensive search
+- Reproducible results
+- No randomness
+
+**Cons:**
+- Exponential growth with parameters
+- Inefficient for continuous parameters
+- Not scalable beyond 3-4 parameters
+
+**When to use:**
+- Few parameters (< 4)
+- All discrete values
+- Need complete coverage
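+
+To see how quickly a grid grows, the run count can be computed before launching anything. This is a small standard-library sketch; `grid_combinations` is an illustrative helper, not a W&B function.
+
+```python
+from itertools import product
+from math import prod
+
+
+def grid_combinations(parameters):
+    """Yield one dict per combination of the listed 'values'."""
+    names = list(parameters)
+    for combo in product(*(parameters[n]["values"] for n in names)):
+        yield dict(zip(names, combo))
+
+
+parameters = {
+    "learning_rate": {"values": [0.001, 0.01, 0.1]},
+    "batch_size": {"values": [16, 32, 64]},
+    "optimizer": {"values": ["adam", "sgd"]},
+}
+
+total = prod(len(spec["values"]) for spec in parameters.values())
+print(total)  # 18
+print(next(grid_combinations(parameters)))
+```
+
+Adding a fourth parameter with five values would already raise the total to 90 runs.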
+
+### 2. Random Search
+
+Randomly sample parameter combinations.
+
+```python
+sweep_config = {
+    'method': 'random',
+    'parameters': {
+        'learning_rate': {
+            'distribution': 'log_uniform_values',
+            'min': 1e-5,
+            'max': 1e-1
+        },
+        'batch_size': {
+            'values': [16, 32, 64, 128, 256]
+        },
+        'dropout': {
+            'distribution': 'uniform',
+            'min': 0.0,
+            'max': 0.5
+        },
+        'num_layers': {
+            'distribution': 'int_uniform',
+            'min': 2,
+            'max': 8
+        }
+    }
+}
+
+# Create the sweep, then run 100 random trials
+sweep_id = wandb.sweep(sweep_config, project="my-project")
+wandb.agent(sweep_id, function=train, count=100)
+```
+
+**Pros:**
+- Scales to many parameters
+- Can run indefinitely
+- Often finds good solutions quickly
+
+**Cons:**
+- No learning from previous runs
+- May miss optimal region
+- Results vary with random seed
+
+**When to use:**
+- Many parameters (> 4)
+- Quick exploration
+- Limited budget
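+
+Conceptually, each random trial draws every parameter independently from its own distribution. The sketch below imitates that locally with the standard library; `sample_config` is a hypothetical helper, it handles only the distributions shown in this section, and it assumes integer ranges are inclusive.
+
+```python
+import math
+import random
+
+
+def sample_config(parameters, rng):
+    """Draw one value per parameter, independently of all the others."""
+    config = {}
+    for name, spec in parameters.items():
+        dist = spec.get("distribution", "uniform")
+        if "values" in spec:
+            config[name] = rng.choice(spec["values"])
+        elif dist == "int_uniform":
+            config[name] = rng.randint(spec["min"], spec["max"])
+        elif dist == "log_uniform_values":
+            # Uniform in log space, then mapped back to the original scale.
+            log_low, log_high = math.log(spec["min"]), math.log(spec["max"])
+            config[name] = math.exp(rng.uniform(log_low, log_high))
+        else:
+            config[name] = rng.uniform(spec["min"], spec["max"])
+    return config
+
+
+parameters = {
+    "learning_rate": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-1},
+    "batch_size": {"values": [16, 32, 64, 128, 256]},
+    "dropout": {"distribution": "uniform", "min": 0.0, "max": 0.5},
+    "num_layers": {"distribution": "int_uniform", "min": 2, "max": 8},
+}
+
+rng = random.Random(0)  # fixed seed so the draws are repeatable
+trials = [sample_config(parameters, rng) for _ in range(3)]
+for trial in trials:
+    print(trial)
+```
+
+Because no trial looks at earlier results, the same code works whether you run 10 trials or 10,000.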
+
+### 3. Bayesian Optimization (Recommended)
+
+Learn from previous trials to sample promising regions.
+
+```python
+sweep_config = {
+    'method': 'bayes',
+    'metric': {
+        'name': 'val/loss',
+        'goal': 'minimize'
+    },
+    'parameters': {
+        'learning_rate': {
+            'distribution': 'log_uniform_values',
+            'min': 1e-5,
+            'max': 1e-1
+        },
+        'weight_decay': {
+            'distribution': 'log_uniform_values',
+            'min': 1e-6,
+            'max': 1e-2
+        },
+        'dropout': {
+            'distribution': 'uniform',
+            'min': 0.1,
+            'max': 0.5
+        },
+        'num_layers': {
+            'values': [2, 3, 4, 5, 6]
+        }
+    }
+}
+```
+
+**Pros:**
+- Most sample-efficient
+- Learns from past trials
+- Focuses on promising regions
+
+**Cons:**
+- Initial random exploration phase
+- May get stuck in local optima
+- Slower per iteration
+
+**When to use:**
+- Expensive training runs
+- Need best performance
+- Limited compute budget
+
+## Parameter Distributions
+
+### Continuous Distributions
+
+```python
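+# Log-uniform: Good for learning rates, regularization
+# ('log_uniform_values' takes min/max as actual values;
+#  plain 'log_uniform' expects them as natural-log exponents)
+'learning_rate': {
+    'distribution': 'log_uniform_values',
+    'min': 1e-6,
+    'max': 1e-1
+}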
+
+# Uniform: Good for dropout, momentum
+'dropout': {
+    'distribution': 'uniform',
+    'min': 0.0,
+    'max': 0.5
+}
+
+# Normal distribution
+'parameter': {
+    'distribution': 'normal',
+    'mu': 0.5,
+    'sigma': 0.1
+}
+
+# Log-normal distribution
+'parameter': {
+    'distribution': 'log_normal',
+    'mu': 0.0,
+    'sigma': 1.0
+}
+```
+
+### Discrete Distributions
+
+```python
+# Fixed values
+'batch_size': {
+    'values': [16, 32, 64, 128, 256]
+}
+
+# Integer uniform
+'num_layers': {
+    'distribution': 'int_uniform',
+    'min': 2,
+    'max': 10
+}
+
+# Quantized uniform (step size)
+'layer_size': {
+    'distribution': 'q_uniform',
+    'min': 32,
+    'max': 512,
+    'q': 32  # Step by 32: 32, 64, 96, 128...
+}
+
+# Quantized log-uniform
+'hidden_size': {
+    'distribution': 'q_log_uniform_values',
+    'min': 32,
+    'max': 1024,
+    'q': 32
+}
+```
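+
+The quantized distributions can be pictured as "sample, then snap to a multiple of `q`". The sketch below follows the `round(X / q) * q` rule as I understand it from the W&B documentation; `sample_q_uniform` is an illustrative local helper, not a W&B call.
+
+```python
+import random
+
+
+def sample_q_uniform(low, high, q, rng):
+    """Draw uniformly from [low, high], then round to the nearest multiple of q."""
+    x = rng.uniform(low, high)
+    return round(x / q) * q
+
+
+rng = random.Random(0)  # fixed seed so the draws are repeatable
+samples = [sample_q_uniform(32, 512, 32, rng) for _ in range(5)]
+print(samples)
+```
+
+Every sample is a multiple of 32 between 32 and 512. Note that under this rule the two endpoint values are drawn about half as often as interior ones, since only half a rounding interval maps to each of them.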
+
+### Categorical Parameters
+
+```python
+# Optimizers
+'optimizer': {
+    'values': ['adam', 'sgd', 'rmsprop', 'adamw']
+}
+
+# Model architectures
+'model': {
+    'values': ['resnet18', 'resnet34', 'resnet50', 'efficientnet_b0']
+}
+
+# Activation functions
+'activation': {
+    'values': ['relu', 'gelu', 'silu', 'leaky_relu']
+}
+```
+
+## Early Termination
+
+Stop underperforming runs early to save compute.
+
+### Hyperband
+
+```python
+sweep_config = {
+    'method': 'bayes',
+    'metric': {'name': 'val/accuracy', 'goal': 'maximize'},
+    'parameters': {...},
+
+    # Hyperband early termination
+    # (specify either 'min_iter', or 'max_iter' together with 's', not both)
+    'early_terminate': {
+        'type': 'hyperband',
+        's': 2,             # Number of brackets (used with max_iter)
+        'eta': 3,           # Bracket multiplier
+        'max_iter': 27      # Maximum iterations per run
+    }
+}
+```
+
+**How it works:**
+- Evaluates runs at a schedule of iteration checkpoints ("brackets")
+- Keeps roughly the top 1/eta performers at each bracket
+- Stops the remaining runs early
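+
+The bracket schedule itself is simple arithmetic. The sketch below reproduces it based on my reading of the W&B documentation (multiply by `eta` from `min_iter`, or divide `max_iter` by `eta` a total of `s` times); the function names are hypothetical and the service's exact rounding may differ.
+
+```python
+def brackets_from_min_iter(min_iter, eta, count):
+    """First `count` checkpoints when the schedule starts at min_iter."""
+    return [min_iter * eta ** k for k in range(count)]
+
+
+def brackets_from_max_iter(max_iter, eta, s):
+    """The s checkpoints obtained by repeatedly dividing max_iter by eta."""
+    return [max_iter // eta ** k for k in range(1, s + 1)]
+
+
+print(brackets_from_min_iter(3, 3, 4))   # [3, 9, 27, 81]
+print(brackets_from_max_iter(27, 3, 2))  # [9, 3]
+```
+
+Runs are compared when they reach one of these iteration counts, which is why the training loop should log the sweep metric at every iteration it wants counted.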
+
+### Custom Termination
+
+```python
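+MAX_EPOCHS = 50   # illustrative values
+PATIENCE = 5
+
+def train():
+    run = wandb.init()
+    best_acc = 0.0
+    epochs_without_improvement = 0
+
+    for epoch in range(MAX_EPOCHS):
+        train_epoch()
+        val_acc = validate()
+
+        wandb.log({'val/accuracy': val_acc, 'epoch': epoch})
+
+        # Custom early stopping: accuracy still poor after warm-up
+        if epoch > 5 and val_acc < 0.5:
+            print("Early stop: Poor performance")
+            break
+
+        # Track the best score and count epochs without a real gain
+        if val_acc > best_acc + 0.01:
+            best_acc = val_acc
+            epochs_without_improvement = 0
+        else:
+            epochs_without_improvement += 1
+
+        if epochs_without_improvement >= PATIENCE:
+            print("Early stop: No improvement")
+            break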
+```
+
+## Training Function
+
+### Basic Template
+
+```python
+def train():
+    # Initialize W&B run
+    run = wandb.init()
+
+    # Get hyperparameters
+    config = wandb.config
+
+    # Build model with config
+    model = build_model(
+        hidden_size=config.hidden_size,
+        num_layers=config.num_layers,
+        dropout=config.dropout
+    )
+
+    # Create optimizer
+    optimizer = create_optimizer(
+        model.parameters(),
+        name=config.optimizer,
+        lr=config.learning_rate,
+        weight_decay=config.weight_decay
+    )
+
+    # Training loop
+    for epoch in range(config.epochs):
+        # Train
+        train_loss, train_acc = train_epoch(
+            model, optimizer, train_loader, config.batch_size
+        )
+
+        # Validate
+        val_loss, val_acc = validate(model, val_loader)
+
+        # Log metrics
+        wandb.log({
+            'train/loss': train_loss,
+            'train/accuracy': train_acc,
+            'val/loss': val_loss,
+            'val/accuracy': val_acc,
+            'epoch': epoch
+        })
+
+    # Log final model
+    torch.save(model.state_dict(), 'model.pth')
+    wandb.save('model.pth')
+
+    # Finish run
+    wandb.finish()
+```
+
+### With PyTorch
+
+```python
+import torch
+import torch.nn as nn
+from torch.utils.data import DataLoader
+import wandb
+
+def train():
+    run = wandb.init()
+    config = wandb.config
+
+    # Data
+    train_loader = DataLoader(
+        train_dataset,
+        batch_size=config.batch_size,
+        shuffle=True
+    )
+
+    # Model
+    model = ResNet(
+        num_classes=config.num_classes,
+        dropout=config.dropout
+    ).to(device)
+
+    # Optimizer
+    if config.optimizer == 'adam':
+        optimizer = torch.optim.Adam(
+            model.parameters(),
+            lr=config.learning_rate,
+            weight_decay=config.weight_decay
+        )
+    elif config.optimizer == 'sgd':
+        optimizer = torch.optim.SGD(
+            model.parameters(),
+            lr=config.learning_rate,
+            momentum=config.momentum,
+            weight_decay=config.weight_decay
+        )
+    else:
+        raise ValueError(f"Unsupported optimizer: {config.optimizer}")
+
+    # Scheduler
+    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
+        optimizer, T_max=config.epochs
+    )
+
+    # Training
+    for epoch in range(config.epochs):
+        model.train()
+        train_loss = 0.0
+
+        for data, target in train_loader:
+            data, target = data.to(device), target.to(device)
+
+            optimizer.zero_grad()
+            output = model(data)
+            loss = nn.CrossEntropyLoss()(output, target)
+            loss.backward()
+            optimizer.step()
+
+            train_loss += loss.item()
+
+        # Validation
+        model.eval()
+        val_loss, val_acc = validate(model, val_loader)
+
+        # Step scheduler
+        scheduler.step()
+
+        # Log
+        wandb.log({
+            'train/loss': train_loss / len(train_loader),
+            'val/loss': val_loss,
+            'val/accuracy': val_acc,
+            'learning_rate': scheduler.get_last_lr()[0],
+            'epoch': epoch
+        })
+```
+
+## Parallel Execution
+
+### Multiple Agents
+
+Run sweep agents in parallel to speed up search.
+
+```python
+# Initialize sweep once
+sweep_id = wandb.sweep(sweep_config, project="my-project")
+
+# Run multiple agents in parallel
+# Agent 1 (Terminal 1)
+wandb.agent(sweep_id, function=train, count=20)
+
+# Agent 2 (Terminal 2)
+wandb.agent(sweep_id, function=train, count=20)
+
+# Agent 3 (Terminal 3)
+wandb.agent(sweep_id, function=train, count=20)
+
+# Total: 60 runs across 3 agents
+```
+
+### Multi-GPU Execution
+
+```python
+import torch
+
+def train():
+    run = wandb.init()
+    config = wandb.config
+
+    # CUDA_VISIBLE_DEVICES restricts this process to one GPU, which
+    # PyTorch then exposes as index 0, so always address it as cuda:0
+    device = torch.device('cuda:0')
+    model = build_model(config).to(device)  # build_model: your own model factory
+
+    # ... rest of training ...
+
+# Run agents on different GPUs
+# Terminal 1
+# CUDA_VISIBLE_DEVICES=0 wandb agent sweep_id
+
+# Terminal 2
+# CUDA_VISIBLE_DEVICES=1 wandb agent sweep_id
+
+# Terminal 3
+# CUDA_VISIBLE_DEVICES=2 wandb agent sweep_id
+```
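+
+Instead of opening one terminal per GPU, the same commands can be assembled in a script. This is a sketch under assumptions: `agent_launch_specs` is a hypothetical helper and the sweep path is a placeholder. It only builds the command and environment for each agent and does not start any process.
+
+```python
+import os
+
+
+def agent_launch_specs(sweep_path, gpu_ids):
+    """Build one (command, environment) pair per GPU without starting anything."""
+    specs = []
+    for gpu in gpu_ids:
+        # Copy the current environment and pin this agent to a single GPU.
+        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
+        specs.append((["wandb", "agent", sweep_path], env))
+    return specs
+
+
+# "my-entity/my-project/abc123" is a placeholder sweep path.
+for cmd, env in agent_launch_specs("my-entity/my-project/abc123", [0, 1, 2]):
+    print(env["CUDA_VISIBLE_DEVICES"], " ".join(cmd))
+    # To actually start the agent: subprocess.Popen(cmd, env=env)
+```
+
+Each agent then sees exactly one device, so the training code does not need to know which physical GPU it was given.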
+
+## Advanced Patterns
+
+### Nested Parameters
+
+```python
+sweep_config = {
+    'method': 'bayes',
+    'metric': {'name': 'val/accuracy', 'goal': 'maximize'},
+    'parameters': {
+        'model': {
+            'parameters': {
+                'type': {
+                    'values': ['resnet', 'efficientnet']
+                },
+                'size': {
+                    'values': ['small', 'medium', 'large']
+                }
+            }
+        },
+        'optimizer': {
+            'parameters': {
+                'type': {
+                    'values': ['adam', 'sgd']
+                },
+                'lr': {
+                    'distribution': 'log_uniform_values',
+                    'min': 1e-5,
+                    'max': 1e-1
+                }
+            }
+        }
+    }
+}
+
+# Access nested config (nested groups arrive as plain dicts)
+def train():
+    run = wandb.init()
+    model_type = wandb.config['model']['type']
+    model_size = wandb.config['model']['size']
+    opt_type = wandb.config['optimizer']['type']
+    lr = wandb.config['optimizer']['lr']
+```
+
+### Conditional Parameters
+
+```python
+sweep_config = {
+    'method': 'bayes',
+    'metric': {'name': 'val/accuracy', 'goal': 'maximize'},  # required for bayes
+    'parameters': {
+        'optimizer': {
+            'values': ['adam', 'sgd']
+        },
+        'learning_rate': {
+            'distribution': 'log_uniform_values',
+            'min': 1e-5,
+            'max': 1e-1
+        },
+        # Sampled for every run, but only used if optimizer == 'sgd'
+        'momentum': {
+            'distribution': 'uniform',
+            'min': 0.5,
+            'max': 0.99
+        }
+    }
+}
+
+def train():
+    run = wandb.init()
+    config = wandb.config
+
+    if config.optimizer == 'adam':
+        optimizer = torch.optim.Adam(
+            model.parameters(),
+            lr=config.learning_rate
+        )
+    elif config.optimizer == 'sgd':
+        optimizer = torch.optim.SGD(
+            model.parameters(),
+            lr=config.learning_rate,
+            momentum=config.momentum  # Conditional parameter
+        )
+```
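+
+The conditional logic can be isolated in a small function so it is easy to test without a GPU or a W&B run. A minimal sketch, assuming the config is available as a plain dict; `optimizer_kwargs` is a hypothetical helper.
+
+```python
+def optimizer_kwargs(config):
+    """Pick only the hyperparameters that the chosen optimizer uses."""
+    kwargs = {"lr": config["learning_rate"]}
+    if config["optimizer"] == "sgd":
+        kwargs["momentum"] = config["momentum"]
+    elif config["optimizer"] != "adam":
+        raise ValueError(f"Unsupported optimizer: {config['optimizer']}")
+    return kwargs
+
+
+# The sweep still samples momentum for Adam runs; it is simply ignored here.
+sampled = {"optimizer": "adam", "learning_rate": 3e-4, "momentum": 0.9}
+print(optimizer_kwargs(sampled))  # {'lr': 0.0003}
+```
+
+Keep in mind that the search strategy still treats `momentum` as a real dimension for Adam runs, which adds some noise to Bayesian optimization.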
+
+## Real-World Examples
+
+### Image Classification
+
+```python
+sweep_config = {
+    'method': 'bayes',
+    'metric': {
+        'name': 'val/top1_accuracy',
+        'goal': 'maximize'
+    },
+    'parameters': {
+        # Model
+        'architecture': {
+            'values': ['resnet50', 'resnet101', 'efficientnet_b0', 'efficientnet_b3']
+        },
+        'pretrained': {
+            'values': [True, False]
+        },
+
+        # Training
+        'learning_rate': {
+            'distribution': 'log_uniform_values',
+            'min': 1e-5,
+            'max': 1e-2
+        },
+        'batch_size': {
+            'values': [16, 32, 64, 128]
+        },
+        'optimizer': {
+            'values': ['adam', 'sgd', 'adamw']
+        },
+        'weight_decay': {
+            'distribution': 'log_uniform_values',
+            'min': 1e-6,
+            'max': 1e-2
+        },
+
+        # Regularization
+        'dropout': {
+            'distribution': 'uniform',
+            'min': 0.0,
+            'max': 0.5
+        },
+        'label_smoothing': {
+            'distribution': 'uniform',
+            'min': 0.0,
+            'max': 0.2
+        },
+
+        # Data augmentation
+        'mixup_alpha': {
+            'distribution': 'uniform',
+            'min': 0.0,
+            'max': 1.0
+        },
+        'cutmix_alpha': {
+            'distribution': 'uniform',
+            'min': 0.0,
+            'max': 1.0
+        }
+    },
+    'early_terminate': {
+        'type': 'hyperband',
+        'min_iter': 5
+    }
+}
+```
+
+### NLP Fine-Tuning
+
+```python
+sweep_config = {
+    'method': 'bayes',
+    'metric': {'name': 'eval/f1', 'goal': 'maximize'},
+    'parameters': {
+        # Model
+        'model_name': {
+            'values': ['bert-base-uncased', 'roberta-base', 'distilbert-base-uncased']
+        },
+
+        # Training
+        'learning_rate': {
+            'distribution': 'log_uniform_values',
+            'min': 1e-6,
+            'max': 1e-4
+        },
+        'per_device_train_batch_size': {
+            'values': [8, 16, 32]
+        },
+        'num_train_epochs': {
+            'values': [3, 4, 5]
+        },
+        'warmup_ratio': {
+            'distribution': 'uniform',
+            'min': 0.0,
+            'max': 0.1
+        },
+        'weight_decay': {
+            'distribution': 'log_uniform_values',
+            'min': 1e-4,
+            'max': 1e-1
+        },
+
+        # Optimizer
+        'adam_beta1': {
+            'distribution': 'uniform',
+            'min': 0.8,
+            'max': 0.95
+        },
+        'adam_beta2': {
+            'distribution': 'uniform',
+            'min': 0.95,
+            'max': 0.999
+        }
+    }
+}
+```
+
+## Best Practices
+
+### 1. Start Small
+
+```python
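+# Initial exploration: Random search, 20 runs
+sweep_config_v1 = {
+    'method': 'random',
+    'parameters': {...}
+}
+sweep_id_v1 = wandb.sweep(sweep_config_v1, project="my-project")
+wandb.agent(sweep_id_v1, train, count=20)
+
+# Refined search: Bayes, narrow ranges
+sweep_config_v2 = {
+    'method': 'bayes',
+    'metric': {'name': 'val/loss', 'goal': 'minimize'},
+    'parameters': {
+        'learning_rate': {
+            'distribution': 'log_uniform_values',
+            'min': 5e-5,  # Narrowed from the initial 1e-6 to 1e-4 range
+            'max': 1e-4
+        }
+    }
+}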
+```
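+
+One simple way to choose the narrowed range is to look at where the best exploratory runs landed. The sketch below uses made-up results and a hypothetical `narrowed_range` helper; in practice the values would come from your own sweep's run table.
+
+```python
+def narrowed_range(results, key, top_k=3):
+    """Return (min, max) of `key` over the top_k runs by score (higher is better)."""
+    best = sorted(results, key=lambda r: r["score"], reverse=True)[:top_k]
+    values = [r[key] for r in best]
+    return min(values), max(values)
+
+
+# Made-up results from a hypothetical exploratory sweep.
+results = [
+    {"learning_rate": 1e-6, "score": 0.61},
+    {"learning_rate": 5e-5, "score": 0.88},
+    {"learning_rate": 8e-5, "score": 0.90},
+    {"learning_rate": 1e-4, "score": 0.87},
+    {"learning_rate": 1e-3, "score": 0.42},
+]
+print(narrowed_range(results, "learning_rate"))  # (5e-05, 0.0001)
+```
+
+If the best runs sit at the edge of the original range, consider widening in that direction instead of narrowing.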
+
+### 2. Use Log Scales
+
+```python
+# ✅ Good: Log scale for learning rate
+'learning_rate': {
+    'distribution': 'log_uniform_values',
+    'min': 1e-6,
+    'max': 1e-2
+}
+
+# ❌ Bad: Linear scale
+'learning_rate': {
+    'distribution': 'uniform',
+    'min': 0.000001,
+    'max': 0.01
+}
+```
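+
+The difference is easy to quantify. For the range 1e-6 to 1e-2, the sketch below computes the share of samples expected to fall below 1e-4 under each scale; `fraction_below` is an illustrative helper that uses the closed-form CDF of each distribution.
+
+```python
+import math
+
+
+def fraction_below(threshold, low, high, log_scale):
+    """Probability that a sample from [low, high] falls below threshold."""
+    if log_scale:
+        return (math.log(threshold) - math.log(low)) / (math.log(high) - math.log(low))
+    return (threshold - low) / (high - low)
+
+
+low, high, threshold = 1e-6, 1e-2, 1e-4
+print(f"uniform:     {fraction_below(threshold, low, high, False):.4f}")  # 0.0099
+print(f"log-uniform: {fraction_below(threshold, low, high, True):.4f}")   # 0.5000
+```
+
+With a linear scale, about 99% of trials land in the top two decades (1e-4 to 1e-2), so the lower half of the range is barely explored.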
+
+### 3. Set Reasonable Ranges
+
+```python
+# Base ranges on prior knowledge
+'learning_rate': {'min': 1e-5, 'max': 1e-3},  # Typical for Adam
+'batch_size': {'values': [16, 32, 64]},       # GPU memory limits
+'dropout': {'min': 0.1, 'max': 0.5}           # Too high hurts training
+```
+
+### 4. Monitor Resource Usage
+
+```python
+def train():
+    run = wandb.init()
+
+    # Log system metrics
+    wandb.log({
+        'system/gpu_memory_allocated': torch.cuda.memory_allocated(),
+        'system/gpu_memory_reserved': torch.cuda.memory_reserved()
+    })
+```
+
+### 5. Save Best Models
+
+```python
+def train():
+    run = wandb.init()
+    config = wandb.config
+    best_acc = 0.0
+
+    for epoch in range(config.epochs):
+        val_acc = validate(model)
+
+        if val_acc > best_acc:
+            best_acc = val_acc
+            # Save best checkpoint
+            torch.save(model.state_dict(), 'best_model.pth')
+            wandb.save('best_model.pth')
+```
+
+## Resources
+
+- **Sweeps Documentation**: https://docs.wandb.ai/guides/sweeps
+- **Configuration Reference**: https://docs.wandb.ai/guides/sweeps/configuration
+- **Examples**: https://github.com/wandb/examples/tree/master/examples/wandb-sweeps