# Model Card User Input Analysis

## Overview

This document analyzes the interaction between the model card template (`templates/model_card.md`), the model card generator (`scripts/model_tonic/generate_model_card.py`), and the launch script (`launch.sh`) to identify variables that require user input and improve the user experience.

## Template Variables Analysis

### Variables in `templates/model_card.md`

The model card template references the following variables. Most are filled in automatically; only a few need user input:

#### Core Model Information
- `{{model_name}}` - Display name of the model
- `{{model_description}}` - Brief description of the model
- `{{repo_name}}` - Hugging Face repository name
- `{{base_model}}` - Base model used for fine-tuning

#### Training Configuration
- `{{training_config_type}}` - Type of training configuration used
- `{{trainer_type}}` - Type of trainer (SFT, DPO, etc.)
- `{{batch_size}}` - Training batch size
- `{{gradient_accumulation_steps}}` - Gradient accumulation steps
- `{{learning_rate}}` - Learning rate used
- `{{max_epochs}}` - Maximum number of epochs
- `{{max_seq_length}}` - Maximum sequence length

#### Dataset Information
- `{{dataset_name}}` - Name of the dataset used
- `{{dataset_size}}` - Size of the dataset
- `{{dataset_format}}` - Format of the dataset
- `{{dataset_sample_size}}` - Sample size (for lightweight configs)

#### Training Results
- `{{training_loss}}` - Final training loss
- `{{validation_loss}}` - Final validation loss
- `{{perplexity}}` - Model perplexity

#### Infrastructure
- `{{hardware_info}}` - Hardware used for training
- `{{experiment_name}}` - Name of the experiment
- `{{trackio_url}}` - Trackio monitoring URL
- `{{dataset_repo}}` - HF Dataset repository

#### Author Information
- `{{author_name}}` - Author name for citations and attribution
- `{{model_name_slug}}` - URL-friendly model name
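
The generator's exact rule for `model_name_slug` is not shown in this document. A minimal sketch, assuming a slug is the lowercased name with every run of non-alphanumeric characters collapsed into a single hyphen (`slugify` is a hypothetical helper):

```python
import re


def slugify(name: str) -> str:
    """Hypothetical slug rule: lowercase, hyphenate non-alphanumeric runs, trim edge hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


print(slugify("SmolLM3 Fine-tuned (v2)"))  # smollm3-fine-tuned-v2
```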

#### Quantization
- `{{quantized_models}}` - Boolean indicating if quantized models exist
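
The `{{name}}` placeholders and `{{#if name}}...{{/if}}` blocks follow Handlebars-style syntax. The sketch below shows how such a template can be filled from a variables dictionary; it is a simplified stand-in, not the generator's actual code, and it assumes conditionals are never nested. It treats `author={{{author_name}}}` as a literal brace pair around `{{author_name}}`, which yields a BibTeX-style `author={...}` field.

```python
import re

# {{#if name}}body{{/if}}: non-greedy body, conditionals assumed not to nest.
IF_BLOCK = re.compile(r"\{\{#if (\w+)\}\}(.*?)\{\{/if\}\}", re.DOTALL)
# {{name}} placeholders; '#' and '/' are not word characters, so block tags are skipped.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render(template: str, variables: dict) -> str:
    """Fill a Handlebars-style template (simplified, hypothetical renderer)."""
    # Keep a conditional block's body only when its variable is truthy.
    text = IF_BLOCK.sub(
        lambda m: m.group(2) if variables.get(m.group(1)) else "", template
    )
    # Substitute placeholders; missing variables render as empty strings.
    return PLACEHOLDER.sub(lambda m: str(variables.get(m.group(1), "")), text)


card = "{{#if author_name}}author: {{author_name}}\n{{/if}}# {{model_name}}\nauthor={{{author_name}}}"
print(render(card, {"author_name": "Jane Doe", "model_name": "my-model"}))
```

In this sketch, leaving `author_name` unset removes the `author:` line but leaves the citation field empty (`author={}`), which is why a sensible default author matters.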

## User Input Requirements

### Previously Missing User Inputs

#### 1. **Author Name** (`author_name`)
- **Purpose**: Used in model card metadata and citations
- **Template Usage**: `{{#if author_name}}author: {{author_name}}{{/if}}`
- **Citation Usage**: `author={{{author_name}}}`
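- **Default**: The launch prompt pre-fills `$HF_USERNAME`; if no author name reaches the push script, it uses the training config's `author_name`, then "Your Name"
- **User Input Added**: ✅ **IMPLEMENTED**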

#### 2. **Model Description** (`model_description`)
- **Purpose**: Brief description of the model's capabilities
- **Template Usage**: `{{model_description}}`
- **Default**: "A fine-tuned version of SmolLM3-3B for improved text generation and conversation capabilities."
- **User Input Added**: ✅ **IMPLEMENTED**

### Variables That Don't Need User Input

Most variables are automatically populated from:
- **Training Configuration**: Batch size, learning rate, epochs, etc.
- **System Detection**: Hardware info, model size, etc.
- **Auto-Generation**: Repository names, experiment names, etc.
- **Training Results**: Loss values, perplexity, etc.

## Implementation Changes

### 1. Launch Script Updates (`launch.sh`)

#### Added User Input Prompts
```bash
# Step 8.2: Author Information for Model Card
print_step "Step 8.2: Author Information"
echo "================================="

print_info "This information will be used in the model card and citation."
get_input "Author name for model card" "$HF_USERNAME" AUTHOR_NAME

print_info "Model description will be used in the model card and repository."
get_input "Model description" "A fine-tuned version of SmolLM3-3B for improved text generation and conversation capabilities." MODEL_DESCRIPTION
```
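
The body of the `get_input` helper is not shown above. Assuming it behaves like a typical prompt-with-default (display the default, keep it when the user just presses Enter), its logic looks like the following Python sketch; the `read` parameter is a hypothetical hook that lets the prompt be exercised without a terminal.

```python
from typing import Callable


def get_input(prompt: str, default: str, read: Callable[[str], str] = input) -> str:
    """Prompt for a value; an empty answer keeps the default (assumed behavior)."""
    answer = read(f"{prompt} [{default}]: ").strip()
    return answer or default


# Simulate a user pressing Enter, then typing a custom author name.
print(get_input("Author name for model card", "jdoe", read=lambda _: ""))            # jdoe
print(get_input("Author name for model card", "jdoe", read=lambda _: " Jane Doe "))  # Jane Doe
```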

#### Updated Configuration Summary
```bash
echo "  Author: $AUTHOR_NAME"
```

#### Updated Model Push Call
```bash
python scripts/model_tonic/push_to_huggingface.py /output-checkpoint "$REPO_NAME" \
    --token "$HF_TOKEN" \
    --trackio-url "$TRACKIO_URL" \
    --experiment-name "$EXPERIMENT_NAME" \
    --dataset-repo "$TRACKIO_DATASET_REPO" \
    --author-name "$AUTHOR_NAME" \
    --model-description "$MODEL_DESCRIPTION"
```

### 2. Push Script Updates (`scripts/model_tonic/push_to_huggingface.py`)

#### Added Command Line Arguments
```python
parser.add_argument('--author-name', type=str, default=None, help='Author name for model card')
parser.add_argument('--model-description', type=str, default=None, help='Model description for model card')
```
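
`argparse` converts dashes in option names to underscores, so `--author-name` is read back as `args.author_name`, and an omitted flag yields its `default` of `None`. A self-contained sketch; the positional argument names `model_path` and `repo_name` are assumed from the constructor signature rather than taken from the script:

```python
import argparse

parser = argparse.ArgumentParser(description="Push a trained model to the Hugging Face Hub")
# Positional names are assumptions mirroring the constructor parameters.
parser.add_argument("model_path", help="Path to the trained model checkpoint")
parser.add_argument("repo_name", help="Target repository, e.g. username/model-name")
parser.add_argument("--author-name", type=str, default=None, help="Author name for model card")
parser.add_argument("--model-description", type=str, default=None, help="Model description for model card")

# Simulate the launch.sh call with only --author-name supplied.
args = parser.parse_args(["/output-checkpoint", "username/my-model", "--author-name", "Jane Doe"])
print(args.author_name)        # Jane Doe
print(args.model_description)  # None, so the push script falls back to its default
```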

#### Updated Class Constructor
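```python
from typing import Optional


def __init__(
    self,
    model_path: str,
    repo_name: str,
    token: Optional[str] = None,
    private: bool = False,
    trackio_url: Optional[str] = None,
    experiment_name: Optional[str] = None,
    dataset_repo: Optional[str] = None,
    hf_token: Optional[str] = None,
    author_name: Optional[str] = None,
    model_description: Optional[str] = None
):
    # ... existing initialization ...
    # Store the new inputs so model card generation can read them later.
    self.author_name = author_name
    self.model_description = model_description
```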

#### Updated Model Card Generation
```python
variables = {
    "model_name": f"{self.repo_name.split('/')[-1]} - Fine-tuned SmolLM3",
    "model_description": self.model_description or "A fine-tuned version of SmolLM3-3B for improved text generation and conversation capabilities.",
    # ... other variables
    "author_name": self.author_name or training_config.get('author_name', 'Your Name'),
}
```
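
Each user-facing variable resolves in a fixed order: the CLI value when given, then (for the author only) the training config's `author_name`, then a hard-coded default. The sketch below isolates that logic as standalone functions; the function names are hypothetical, while the expressions mirror the excerpt above.

```python
from typing import Optional

DEFAULT_DESCRIPTION = (
    "A fine-tuned version of SmolLM3-3B for improved text generation "
    "and conversation capabilities."
)


def resolve_model_name(repo_name: str) -> str:
    # Use the last path segment of "username/repo" as the display name.
    return f"{repo_name.split('/')[-1]} - Fine-tuned SmolLM3"


def resolve_author(cli_value: Optional[str], training_config: dict) -> str:
    # CLI flag first, then the training config, then the placeholder.
    return cli_value or training_config.get("author_name", "Your Name")


def resolve_description(cli_value: Optional[str]) -> str:
    # CLI flag first, then the shared default description.
    return cli_value or DEFAULT_DESCRIPTION


print(resolve_model_name("username/smollm3-chat"))  # smollm3-chat - Fine-tuned SmolLM3
print(resolve_author(None, {}))                     # Your Name
print(resolve_author("", {"author_name": "Lab"}))   # Lab
```

Because the fallback uses `or`, an empty string (for example `--author-name ""`) is treated the same as an omitted flag. The `"Your Name"` placeholder applies only when the config has no `author_name` key at all.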

## User Experience Improvements

### 1. **Interactive Prompts**
- Users are now prompted for author name and model description
- Default values are provided for convenience
- Clear explanations of what each field is used for

### 2. **Configuration Summary**
- Author name is now displayed in the configuration summary
- Users can review all settings before proceeding

### 3. **Automatic Integration**
- User inputs are automatically passed to the model card generation
- No manual editing of scripts required

## Template Variable Categories

### Automatic Variables (No User Input Needed)
- `repo_name` - Auto-generated from username and date
- `base_model` - Always "HuggingFaceTB/SmolLM3-3B"
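- `training_config_type` - From the configuration chosen earlier in the launch script (no extra prompt)
- `trainer_type` - From the trainer chosen earlier in the launch script (no extra prompt)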
- `batch_size`, `learning_rate`, `max_epochs` - From training config
- `hardware_info` - Auto-detected
- `experiment_name` - Auto-generated with timestamp
- `trackio_url` - Auto-generated from space name
- `dataset_repo` - Auto-generated
- `training_loss`, `validation_loss`, `perplexity` - From training results
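
The exact naming formats for `repo_name` and `experiment_name` are not given in this document. A sketch of date- and timestamp-based generation, where both format strings are hypothetical examples; passing `now` in explicitly keeps the output reproducible:

```python
from datetime import datetime


def default_repo_name(username: str, now: datetime) -> str:
    # Hypothetical format: <username>/smollm3-finetuned-<YYYYMMDD>
    return f"{username}/smollm3-finetuned-{now:%Y%m%d}"


def default_experiment_name(now: datetime) -> str:
    # Hypothetical format: smollm3_finetune_<YYYYMMDD_HHMMSS>
    return f"smollm3_finetune_{now:%Y%m%d_%H%M%S}"


now = datetime(2025, 1, 2, 3, 4, 5)
print(default_repo_name("jdoe", now))  # jdoe/smollm3-finetuned-20250102
print(default_experiment_name(now))    # smollm3_finetune_20250102_030405
```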

### User Input Variables (Now Implemented)
- `author_name` - ✅ **Added user prompt**
- `model_description` - ✅ **Added user prompt**

### Conditional Variables
- `quantized_models` - Set automatically based on quantization choices
- `dataset_sample_size` - Set based on training configuration type
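
The conditional variables depend on earlier choices rather than on a direct prompt. A minimal sketch, assuming the chosen quantization types arrive as a list and that lightweight configurations store their sample size under a `dataset_sample_size` key (both assumptions, not taken from the generator; `"int8"` is an illustrative label):

```python
from typing import Optional


def conditional_variables(quantization_types: list, training_config: dict) -> dict:
    # quantized_models is true whenever at least one quantized variant was requested.
    variables = {"quantized_models": bool(quantization_types)}
    # Only configs that define a sample size (assumed: lightweight ones) get the key.
    sample_size: Optional[int] = training_config.get("dataset_sample_size")
    if sample_size is not None:
        variables["dataset_sample_size"] = sample_size
    return variables


print(conditional_variables([], {}))
print(conditional_variables(["int8"], {"dataset_sample_size": 1000}))
```

Omitting the key, rather than storing `None`, lets a template section conditional on `dataset_sample_size` be skipped cleanly.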

## Benefits of These Changes

### 1. **Better Attribution**
- Author names are captured and used in both the model card metadata and the citation

### 2. **Customizable Descriptions**
- Users can provide custom model descriptions
- Better model documentation and discoverability

### 3. **Improved User Experience**
- No need to manually edit scripts
- Interactive prompts with helpful defaults
- Clear feedback on what information is being collected

### 4. **Consistent Documentation**
- Every model card includes an author field, falling back to a default when none is given
- Descriptions use a shared default unless the user supplies a custom one
- Better integration with Hugging Face Hub

## Future Enhancements

### Potential Additional User Inputs
1. **License Selection** - Allow users to choose model license
2. **Model Tags** - Custom tags for better discoverability
3. **Usage Examples** - Custom usage examples for specific use cases
4. **Limitations Description** - Custom limitations based on training data

### Template Improvements
1. **Dynamic License** - Support for different license types
2. **Custom Tags** - User-defined model tags
3. **Usage Scenarios** - Template sections for different use cases

## Testing

The changes have been tested to ensure:
- ✅ Author name is properly passed to model card generation
- ✅ Model description is properly passed to model card generation
- ✅ Default values work correctly
- ✅ Configuration summary displays new fields
- ✅ Model push script accepts new parameters
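
The pass-through and default checks can be expressed as automated tests. A pytest-style sketch, where `parse` and `card_variables` are hypothetical stand-ins for the push script's argument parsing and variable assembly rather than imports from the real script:

```python
import argparse

DEFAULT_DESCRIPTION = (
    "A fine-tuned version of SmolLM3-3B for improved text generation "
    "and conversation capabilities."
)


def parse(argv: list) -> argparse.Namespace:
    # Minimal stand-in for the push script's argument parser.
    parser = argparse.ArgumentParser()
    parser.add_argument("model_path")
    parser.add_argument("repo_name")
    parser.add_argument("--author-name", default=None)
    parser.add_argument("--model-description", default=None)
    return parser.parse_args(argv)


def card_variables(args: argparse.Namespace, training_config: dict) -> dict:
    # Same fallback expressions as the push script excerpt.
    return {
        "author_name": args.author_name or training_config.get("author_name", "Your Name"),
        "model_description": args.model_description or DEFAULT_DESCRIPTION,
    }


def test_user_inputs_reach_model_card():
    args = parse(["/output-checkpoint", "username/my-model",
                  "--author-name", "Jane Doe", "--model-description", "A chat model"])
    assert card_variables(args, {}) == {
        "author_name": "Jane Doe",
        "model_description": "A chat model",
    }


def test_defaults_apply_when_flags_omitted():
    variables = card_variables(parse(["/output-checkpoint", "username/my-model"]), {})
    assert variables["author_name"] == "Your Name"
    assert variables["model_description"] == DEFAULT_DESCRIPTION
```

Running `pytest` on a file containing these functions collects both `test_` functions automatically.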

## Conclusion

The analysis identified that the model card template had two key variables (`author_name` and `model_description`) that would benefit from user input. These have been successfully implemented with:

1. **Interactive prompts** in the launch script
2. **Command line arguments** in the push script
3. **Proper integration** with the model card generator
4. **User-friendly defaults** and clear explanations

This improves the overall user experience and ensures that model cards have proper attribution and descriptions.