# Model Card User Input Analysis
## Overview
This document analyzes the interaction between the model card template (`templates/model_card.md`), the model card generator (`scripts/model_tonic/generate_model_card.py`), and the launch script (`launch.sh`) to identify which variables require user input and to improve how that input is collected.
## Template Variables Analysis
### Variables in `templates/model_card.md`
The model card template uses the following placeholder variables, which are populated from user input, the training configuration, or automatic detection (a substitution sketch follows the list):
#### Core Model Information
- `{{model_name}}` - Display name of the model
- `{{model_description}}` - Brief description of the model
- `{{repo_name}}` - Hugging Face repository name
- `{{base_model}}` - Base model used for fine-tuning
#### Training Configuration
- `{{training_config_type}}` - Type of training configuration used
- `{{trainer_type}}` - Type of trainer (SFT, DPO, etc.)
- `{{batch_size}}` - Training batch size
- `{{gradient_accumulation_steps}}` - Gradient accumulation steps
- `{{learning_rate}}` - Learning rate used
- `{{max_epochs}}` - Maximum number of epochs
- `{{max_seq_length}}` - Maximum sequence length
#### Dataset Information
- `{{dataset_name}}` - Name of the dataset used
- `{{dataset_size}}` - Size of the dataset
- `{{dataset_format}}` - Format of the dataset
- `{{dataset_sample_size}}` - Sample size (for lightweight configs)
#### Training Results
- `{{training_loss}}` - Final training loss
- `{{validation_loss}}` - Final validation loss
- `{{perplexity}}` - Model perplexity
#### Infrastructure
- `{{hardware_info}}` - Hardware used for training
- `{{experiment_name}}` - Name of the experiment
- `{{trackio_url}}` - Trackio monitoring URL
- `{{dataset_repo}}` - HF Dataset repository
#### Author Information
- `{{author_name}}` - Author name for citations and attribution
- `{{model_name_slug}}` - URL-friendly model name
#### Quantization
- `{{quantized_models}}` - Boolean indicating if quantized models exist
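For illustration, a minimal sketch of how a generator can fill these placeholders by plain string substitution. This is not the actual logic in `scripts/model_tonic/generate_model_card.py`, and the example values are assumptions:

```python
def render_template(template: str, variables: dict) -> str:
    """Replace each {{name}} placeholder with its value.

    A simplified stand-in for the rendering done by generate_model_card.py.
    """
    rendered = template
    for name, value in variables.items():
        rendered = rendered.replace("{{" + name + "}}", str(value))
    return rendered


card = render_template(
    "# {{model_name}}\n\n{{model_description}}\n\nBase model: {{base_model}}",
    {
        "model_name": "SmolLM3-3B Fine-tune",
        "model_description": "A fine-tuned version of SmolLM3-3B.",
        "base_model": "HuggingFaceTB/SmolLM3-3B",
    },
)
print(card)
```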
## User Input Requirements
### Previously Missing User Inputs
#### 1. **Author Name** (`author_name`)
- **Purpose**: Used in model card metadata and citations
- **Template Usage**: `{{#if author_name}}author: {{author_name}}{{/if}}`
- **Citation Usage**: `author={{{author_name}}}`
- **Default**: "Your Name"
- **User Input Added**: ✅ **IMPLEMENTED**
#### 2. **Model Description** (`model_description`)
- **Purpose**: Brief description of the model's capabilities
- **Template Usage**: `{{model_description}}`
- **Default**: "A fine-tuned version of SmolLM3-3B for improved text generation and conversation capabilities."
- **User Input Added**: ✅ **IMPLEMENTED**
### Variables That Don't Need User Input
Most variables are automatically populated from:
- **Training Configuration**: Batch size, learning rate, epochs, etc.
- **System Detection**: Hardware info, model size, etc.
- **Auto-Generation**: Repository names, experiment names, etc.
- **Training Results**: Loss values, perplexity, etc.
## Implementation Changes
### 1. Launch Script Updates (`launch.sh`)
#### Added User Input Prompts
```bash
# Step 8.2: Author Information for Model Card
print_step "Step 8.2: Author Information"
echo "================================="
print_info "This information will be used in the model card and citation."
get_input "Author name for model card" "$HF_USERNAME" AUTHOR_NAME
print_info "Model description will be used in the model card and repository."
get_input "Model description" "A fine-tuned version of SmolLM3-3B for improved text generation and conversation capabilities." MODEL_DESCRIPTION
```
#### Updated Configuration Summary
```bash
echo " Author: $AUTHOR_NAME"
```
#### Updated Model Push Call
```bash
python scripts/model_tonic/push_to_huggingface.py /output-checkpoint "$REPO_NAME" \
    --token "$HF_TOKEN" \
    --trackio-url "$TRACKIO_URL" \
    --experiment-name "$EXPERIMENT_NAME" \
    --dataset-repo "$TRACKIO_DATASET_REPO" \
    --author-name "$AUTHOR_NAME" \
    --model-description "$MODEL_DESCRIPTION"
```
### 2. Push Script Updates (`scripts/model_tonic/push_to_huggingface.py`)
#### Added Command Line Arguments
```python
parser.add_argument('--author-name', type=str, default=None, help='Author name for model card')
parser.add_argument('--model-description', type=str, default=None, help='Model description for model card')
```
#### Updated Class Constructor
```python
def __init__(
    self,
    model_path: str,
    repo_name: str,
    token: Optional[str] = None,
    private: bool = False,
    trackio_url: Optional[str] = None,
    experiment_name: Optional[str] = None,
    dataset_repo: Optional[str] = None,
    hf_token: Optional[str] = None,
    author_name: Optional[str] = None,
    model_description: Optional[str] = None
):
```
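The constructor body and the CLI wiring are not shown in this document, so the following is a self-contained sketch of the presumed flow. The class name `ModelPusher` and the positional argument names are placeholders, not the actual names in `push_to_huggingface.py`:

```python
import argparse
from typing import Optional


class ModelPusher:
    """Placeholder for the pusher class defined in push_to_huggingface.py."""

    def __init__(
        self,
        model_path: str,
        repo_name: str,
        author_name: Optional[str] = None,
        model_description: Optional[str] = None,
    ):
        self.model_path = model_path
        self.repo_name = repo_name
        # Stored on the instance so the model card generation step can use them.
        self.author_name = author_name
        self.model_description = model_description


parser = argparse.ArgumentParser()
parser.add_argument("model_path")
parser.add_argument("repo_name")
parser.add_argument("--author-name", type=str, default=None)
parser.add_argument("--model-description", type=str, default=None)
args = parser.parse_args()

pusher = ModelPusher(
    model_path=args.model_path,
    repo_name=args.repo_name,
    author_name=args.author_name,              # from --author-name
    model_description=args.model_description,  # from --model-description
)
```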
#### Updated Model Card Generation
```python
variables = {
    "model_name": f"{self.repo_name.split('/')[-1]} - Fine-tuned SmolLM3",
    "model_description": self.model_description or "A fine-tuned version of SmolLM3-3B for improved text generation and conversation capabilities.",
    # ... other variables
    "author_name": self.author_name or training_config.get('author_name', 'Your Name'),
}
```
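Once the variables dictionary is populated and the template rendered, the resulting card is typically written out as `README.md` and uploaded to the model repository. A minimal sketch using the `huggingface_hub` client; the actual upload path in `push_to_huggingface.py` may differ:

```python
from huggingface_hub import HfApi


def upload_model_card(card_text: str, repo_id: str, token: str) -> None:
    """Write the rendered model card locally and push it as the repo's README.md."""
    with open("README.md", "w", encoding="utf-8") as f:
        f.write(card_text)
    api = HfApi(token=token)
    api.upload_file(
        path_or_fileobj="README.md",
        path_in_repo="README.md",
        repo_id=repo_id,
        repo_type="model",
    )
```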
## User Experience Improvements
### 1. **Interactive Prompts**
- Users are now prompted for author name and model description
- Default values are provided for convenience
- Clear explanations of what each field is used for
### 2. **Configuration Summary**
- Author name is now displayed in the configuration summary
- Users can review all settings before proceeding
### 3. **Automatic Integration**
- User inputs are automatically passed to the model card generation
- No manual editing of scripts required
## Template Variable Categories
### Automatic Variables (No User Input Needed)
- `repo_name` - Auto-generated from username and date
- `base_model` - Always "HuggingFaceTB/SmolLM3-3B"
- `training_config_type` - From user selection
- `trainer_type` - From user selection
- `batch_size`, `learning_rate`, `max_epochs` - From training config
- `hardware_info` - Auto-detected
- `experiment_name` - Auto-generated with timestamp
- `trackio_url` - Auto-generated from space name
- `dataset_repo` - Auto-generated
- `training_loss`, `validation_loss`, `perplexity` - From training results
### User Input Variables (Now Implemented)
- `author_name` - ✅ **Added user prompt**
- `model_description` - ✅ **Added user prompt**
### Conditional Variables
- `quantized_models` - Set automatically based on quantization choices
- `dataset_sample_size` - Set based on training configuration type
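Conditional sections such as the quantized-models block or the optional `author` metadata line are guarded by `{{#if name}}...{{/if}}` markers in the template. A minimal sketch of how such blocks can be resolved (again, not the actual generator code, and it does not handle nested conditionals):

```python
import re


def render_conditionals(template: str, variables: dict) -> str:
    """Keep the body of a {{#if name}}...{{/if}} block when the variable is
    truthy, drop the block otherwise."""
    pattern = re.compile(r"\{\{#if (\w+)\}\}(.*?)\{\{/if\}\}", re.DOTALL)
    return pattern.sub(
        lambda m: m.group(2) if variables.get(m.group(1)) else "",
        template,
    )


print(render_conditionals(
    "{{#if quantized_models}}See the quantized checkpoints below.{{/if}}",
    {"quantized_models": True},
))
```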
## Benefits of These Changes
### 1. **Better Attribution**
- Author names are properly captured and used in citations
- Model cards include proper attribution
### 2. **Customizable Descriptions**
- Users can provide custom model descriptions
- Better model documentation and discoverability
### 3. **Improved User Experience**
- No need to manually edit scripts
- Interactive prompts with helpful defaults
- Clear feedback on what information is being collected
### 4. **Consistent Documentation**
- All model cards will have proper author information
- Standardized model descriptions
- Better integration with Hugging Face Hub
## Future Enhancements
### Potential Additional User Inputs
1. **License Selection** - Allow users to choose model license
2. **Model Tags** - Custom tags for better discoverability
3. **Usage Examples** - Custom usage examples for specific use cases
4. **Limitations Description** - Custom limitations based on training data
### Template Improvements
1. **Dynamic License** - Support for different license types
2. **Custom Tags** - User-defined model tags
3. **Usage Scenarios** - Template sections for different use cases
## Testing
The changes have been tested to ensure:
- ✅ Author name is properly passed to model card generation
- ✅ Model description is properly passed to model card generation
- ✅ Default values work correctly
- ✅ Configuration summary displays new fields
- ✅ Model push script accepts new parameters
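As a sanity check on the precedence rule (CLI value, then training config, then the hard-coded fallback), a few unit tests like the following can pin the behaviour down. This sketch re-implements the rule locally rather than importing the real script:

```python
def resolve_author(cli_value, training_config):
    """Mirror of the precedence used in the push script:
    CLI value first, then the training config, then the default."""
    return cli_value or training_config.get("author_name", "Your Name")


def test_cli_value_wins():
    assert resolve_author("CLI Author", {"author_name": "Config Author"}) == "CLI Author"


def test_config_fallback():
    assert resolve_author(None, {"author_name": "Config Author"}) == "Config Author"


def test_default_fallback():
    assert resolve_author(None, {}) == "Your Name"
```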
## Conclusion
The analysis identified that the model card template had two key variables (`author_name` and `model_description`) that would benefit from user input. These have been successfully implemented with:
1. **Interactive prompts** in the launch script
2. **Command line arguments** in the push script
3. **Proper integration** with the model card generator
4. **User-friendly defaults** and clear explanations
This improves the overall user experience and ensures that model cards have proper attribution and descriptions.