---
title: DataHubHub
emoji: 
colorFrom: red
colorTo: indigo
sdk: streamlit
sdk_version: 1.42.2
app_file: app.py
pinned: false
license: apache-2.0
language: en
---

# ML Dataset & Code Generation Manager

A Streamlit application for managing machine learning datasets, fine-tuning code generation models, and checking code quality, with Hugging Face integration.

## Features

- **Dataset Management**: Upload, explore, and manage machine learning datasets
- **Data Visualization**: Visualize dataset statistics and distributions
- **Code Generation**: Fine-tune models for code generation tasks
- **Code Quality Tools**: Improve code quality with integrated formatters, linters, and type checkers
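
As an illustration of the kind of per-column summary a dataset statistics view shows, here is a minimal, standard-library-only sketch. It is not the code in `components/dataset_statistics.py`; the `summarize_csv` helper and the fields it reports are hypothetical.

```python
import csv
import io
import statistics
from typing import Any, Dict


def summarize_csv(text: str) -> Dict[str, Dict[str, Any]]:
    """Compute simple per-column statistics for CSV text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    summary: Dict[str, Dict[str, Any]] = {}
    for column in reader.fieldnames or []:
        # Short rows yield None for missing fields; empty strings are blanks
        values = [row[column] for row in rows]
        present = [v for v in values if v not in ("", None)]
        numeric = []
        for v in present:
            try:
                numeric.append(float(v))
            except ValueError:
                pass
        col_summary: Dict[str, Any] = {
            "count": len(values),
            "missing": len(values) - len(present),
            "unique": len(set(present)),
        }
        # Only report a mean when every non-missing value parses as a number
        if present and len(numeric) == len(present):
            col_summary["mean"] = statistics.mean(numeric)
        summary[column] = col_summary
    return summary
```

For example, `summarize_csv("name,score\na,1\nb,\nc,3\n")["score"]` reports one missing value and a mean of 2.0. pandas' `DataFrame.describe()` offers a richer version of the same idea.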

## Technology Stack

- **Frontend**: Streamlit
- **Backend**: Python
- **Database**: SQLite (via SQLAlchemy)
- **ML Integration**: Hugging Face Transformers, Datasets
- **Visualization**: Plotly, Matplotlib

## Project Structure

```
.
├── app.py                     # Main application entry point
├── components/                # UI components
│   ├── code_quality.py        # Code quality tools
│   ├── dataset_preview.py     # Dataset preview component
│   ├── dataset_statistics.py  # Dataset statistics component
│   ├── dataset_uploader.py    # Dataset upload component
│   ├── dataset_validation.py  # Dataset validation component
│   ├── dataset_visualization.py # Dataset visualization component
│   └── fine_tuning/           # Fine-tuning components
│       ├── finetune_ui.py     # Fine-tuning UI
│       └── model_interface.py # Model interface
├── database/                  # Database configuration
│   ├── models.py              # Database models
│   └── operations.py          # Database operations
├── utils/                     # Utility functions
│   ├── dataset_utils.py       # Dataset utilities
│   ├── huggingface_integration.py # Hugging Face integration
│   └── smolagents_integration.py # smolagents integration
└── assets/                    # Static assets
```

## Deployment

This application is designed to be deployed as a Hugging Face Space.

### Hugging Face Space Deployment

1. Fork this repository
2. Create a new Hugging Face Space
3. Connect the forked repository to your Space
4. The application will be deployed automatically

### Local Development

1. Clone the repository
2. Install dependencies:
   ```bash
   pip install streamlit pandas numpy plotly matplotlib scikit-learn SQLAlchemy huggingface-hub datasets transformers torch
   ```
3. Run the application:
   ```bash
   streamlit run app.py
   ```
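
After installing, one quick way to confirm the environment is complete is to check that each package can be found. The following is a minimal sketch; the script and its `missing_modules` helper are hypothetical and not part of the repository. It only checks that each module is findable, not that versions are compatible.

```python
import importlib.util
from typing import Iterable, List

# Import names for the packages in the pip command above. Some distributions
# import under a different name: scikit-learn -> sklearn,
# huggingface-hub -> huggingface_hub, SQLAlchemy -> sqlalchemy.
REQUIRED_MODULES = [
    "streamlit", "pandas", "numpy", "plotly", "matplotlib", "sklearn",
    "sqlalchemy", "huggingface_hub", "datasets", "transformers", "torch",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found.")
```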

## Configuration

- `.streamlit/config.toml`: Streamlit configuration
- `.streamlit/secrets.toml`: Secrets and API keys
- `huggingface-spacefile`: Hugging Face Space configuration

## API Keys

To use the Hugging Face integration features, add your Hugging Face access token to `.streamlit/secrets.toml`, replacing the `HF_TOKEN` placeholder:

```toml
[huggingface]
hf_token = "HF_TOKEN"
```
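
A minimal sketch of how the app could read this value, assuming the `[huggingface]` / `hf_token` layout above. The `get_hf_token` helper is hypothetical, and the fallback to an `HF_TOKEN` environment variable is an added assumption, not something this README specifies.

```python
import os
from typing import Any, Mapping, Optional

# The placeholder value from the example secrets.toml above
PLACEHOLDER = "HF_TOKEN"


def get_hf_token(secrets: Mapping[str, Any]) -> Optional[str]:
    """Return [huggingface].hf_token from a secrets mapping, else the HF_TOKEN env var (hypothetical helper)."""
    try:
        token = secrets["huggingface"]["hf_token"]
    except (KeyError, TypeError):
        # Section or key missing, or the section is not a table
        token = None
    # Treat an empty value or the unedited placeholder as "not configured"
    if token and token != PLACEHOLDER:
        return token
    # Assumed fallback: the environment variable huggingface_hub also reads
    return os.environ.get("HF_TOKEN") or None
```

Inside the app this could be called as `get_hf_token(st.secrets)`, since `st.secrets` supports dictionary-style access to the tables in `secrets.toml`. The sketch does not handle the case where Streamlit finds no secrets file at all. The environment-variable fallback is useful on Spaces, where secrets configured in the Space settings are exposed as environment variables.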

## License

This project is licensed under the Apache-2.0 License - see the LICENSE file for details.