yasserrmd committed on
Commit bee16e6 · verified · 1 Parent(s): bb81f15

Update README.md

  1. README.md +117 -4
README.md CHANGED
@@ -4,7 +4,120 @@ tags:
  - pytorch_model_hub_mixin
  ---
 
- This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
- - Code: [More Information Needed]
- - Paper: [More Information Needed]
- - Docs: [More Information Needed]
+ # Diffusion Text Demo Model
+
+ A prototype **diffusion-based language model** implemented in PyTorch and trained on a subset of the [**TinyStories** dataset](https://huggingface.co/datasets/roneneldan/TinyStories).
+ This model demonstrates iterative denoising for text generation, conditioned on an input prompt.
+
+ ---
+
+ ## Training Details
+
+ * **Dataset:** 50,000 samples from [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)
+ * **Epochs:** 50
+ * **Batch size:** 16
+ * **Learning rate:** 1e-5
+ * **Diffusion steps (T):** 10
+ * **Tokenizer:** Naive whitespace (for demo purposes)
+
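The README does not show how the whitespace vocabulary is built, although the inference snippet relies on `vocab`, `id_to_word`, `mask_id`, and `pad_id`. The following is a minimal sketch of one way such a vocabulary could be constructed. The helper names `build_vocab` and `encode` and the special-token strings `[PAD]` and `[MASK]` are assumptions for illustration, not taken from the model's training code.

```python
from collections import Counter

# Assumed special tokens; the real model may use different strings or ids.
SPECIALS = ["[PAD]", "[MASK]"]


def build_vocab(texts, max_size=None):
    """Hypothetical helper: build a whitespace-level vocabulary from raw texts."""
    counts = Counter(tok for text in texts for tok in text.split())
    # Most frequent words first; skip words that collide with special tokens.
    words = [w for w, _ in counts.most_common(max_size) if w not in SPECIALS]
    vocab = {tok: i for i, tok in enumerate(SPECIALS + words)}
    id_to_word = {i: tok for tok, i in vocab.items()}
    return vocab, id_to_word


texts = ["the cat sat", "the dog ran"]
vocab, id_to_word = build_vocab(texts)
pad_id, mask_id = vocab["[PAD]"], vocab["[MASK]"]


def encode(text):
    """Map each whitespace token to its id; unknown words fall back to mask_id."""
    return [vocab.get(tok, mask_id) for tok in text.split()]
```

Mapping unknown words to `mask_id` mirrors the `vocab.get(tok, mask_id)` fallback used in the inference snippet. To reproduce the published model's behavior, the vocabulary would need to match the one used in training.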
+ ---
+
+ ## 📉 Training Loss
+
+ | Stage | Start Loss | End Loss |
+ | ------------ | ---------- | -------- |
+ | Epochs 1–10 | 8.38 | 6.13 |
+ | Epochs 11–20 | 6.12 | 6.04 |
+ | Epochs 21–50 | 6.04 | 5.92 |
+
+ **Final Loss (Epoch 50): 5.92**
+
+ ### Loss Curve
+
+ <img src="diffusion_textmodel_loss.png" width="800" />
+
+ ---
+
+ ## Usage
+
+ ### Install Requirements
+
+ ```bash
+ pip install torch huggingface_hub
+ ```
+
+ ### Load the Model
+
+ ```python
+ import torch
+ # Assumes modeling_diffusion.py (which defines DiffusionTextModel) is on the import path.
+ from modeling_diffusion import DiffusionTextModel
+
+ # Load directly from the Hub
+ model = DiffusionTextModel.from_pretrained("yasserrmd/diffusion-text-demo")
+ model.eval()
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model.to(device)
+ ```
+
+ ### Inference with Prompt
+
+ ```python
+ def generate_with_prompt(model, input_text, max_length, T=10):
+     # Assumes vocab, id_to_word, mask_id, pad_id (from training) and device are already defined.
+     model.eval()
+     input_tokens = input_text.split()
+     # Unknown words fall back to mask_id; prompts longer than max_length are truncated.
+     input_ids = [vocab.get(tok, mask_id) for tok in input_tokens][:max_length]
+
+     # Start from a fully masked sequence and copy the prompt into the first positions.
+     seq = torch.full((1, max_length), mask_id, dtype=torch.long, device=device)
+     seq[0, :len(input_ids)] = torch.tensor(input_ids, dtype=torch.long, device=device)
+
+     # Denoise from step T down to 1.
+     for step in range(T, 0, -1):
+         with torch.no_grad():
+             logits = model(seq, torch.tensor([step], device=device))
+             probs = torch.softmax(logits, dim=-1)
+         # Sample a token for every non-prompt position that is still masked.
+         for pos in range(len(input_ids), max_length):
+             if seq[0, pos].item() == mask_id:
+                 seq[0, pos] = torch.multinomial(probs[0, pos], 1).item()
+
+     # Cut the output at the first padding token, if any.
+     ids = seq[0].tolist()
+     if pad_id in ids:
+         ids = ids[:ids.index(pad_id)]
+     return " ".join(id_to_word[i] for i in ids)
+
+ print(generate_with_prompt(model, "the cat", max_length=50))
+ ```
+
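The per-position loop in `generate_with_prompt` can also be written with a boolean mask, which avoids a Python loop over positions. The sketch below illustrates that variant. It is not the author's code: `RandomLogits` is a hypothetical stand-in that returns random logits (and never predicts the mask token) so the example runs without the real checkpoint, and `fill_masked` is an illustrative name.

```python
import torch


class RandomLogits(torch.nn.Module):
    """Hypothetical stand-in model: random logits of shape (batch, length, vocab)."""

    def __init__(self, vocab_size, mask_id):
        super().__init__()
        self.vocab_size = vocab_size
        self.mask_id = mask_id

    def forward(self, seq, t):
        logits = torch.randn(seq.shape[0], seq.shape[1], self.vocab_size)
        # This stand-in never predicts the mask token itself.
        logits[..., self.mask_id] = float("-inf")
        return logits


def fill_masked(model, seq, prompt_len, mask_id, T=10):
    """Sample tokens for all still-masked, non-prompt positions at each step."""
    seq = seq.clone()
    row = seq[0]  # view into seq, so writes to row update seq
    for step in range(T, 0, -1):
        with torch.no_grad():
            logits = model(seq, torch.tensor([step]))
        probs = torch.softmax(logits, dim=-1)
        # One sample per position: (length, vocab) -> (length,)
        sampled = torch.multinomial(probs[0], 1).squeeze(-1)
        still_masked = row == mask_id
        still_masked[:prompt_len] = False  # never overwrite the prompt
        row[still_masked] = sampled[still_masked]
    return seq


torch.manual_seed(0)
mask_id, vocab_size, max_length = 1, 20, 12
seq = torch.full((1, max_length), mask_id, dtype=torch.long)
seq[0, :2] = torch.tensor([5, 7])  # two prompt tokens
out = fill_masked(RandomLogits(vocab_size, mask_id), seq, prompt_len=2, mask_id=mask_id)
```

As in the original loop, a position is only resampled while it still holds `mask_id`, so most positions are fixed on the first pass and later steps only revisit positions where the mask token itself was sampled.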
+ ---
+
+ ## Use in a Hugging Face Space
+
+ ```python
+ import gradio as gr
+ from modeling_diffusion import DiffusionTextModel
+
+ # Requires gradio (pip install gradio) and generate_with_prompt from the section above.
+ model = DiffusionTextModel.from_pretrained("yasserrmd/diffusion-text-demo")
+ model.eval()
+
+ def infer(prompt):
+     return generate_with_prompt(model, prompt, max_length=50)
+
+ gr.Interface(fn=infer, inputs="text", outputs="text").launch()
+ ```
+
+ ---
+
+ ## References
+
+ This model was inspired by several works on diffusion for text:
+
+ * Li et al. (2022) – [**Diffusion-LM Improves Controllable Text Generation**](https://arxiv.org/abs/2205.14217)
+ * Austin et al. (2021) – [**Structured Denoising Diffusion Models in Discrete State-Spaces (D3PM)**](https://arxiv.org/abs/2107.03006)
+ * He et al. (2023) – [**DiffusionBERT: Improving Generative Masked Language Models with Diffusion**](https://arxiv.org/abs/2211.15029)
+ * Gong et al. (2023) – [**DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models**](https://arxiv.org/abs/2210.08933)
+ * Nie et al. (2025) – [**Large Language Diffusion Models (LLaDA)**](https://arxiv.org/abs/2502.09992)
+
+ ---
+
+ ⚠️ **Disclaimer:** This is a research prototype. Generations may be incoherent because the model uses a naive whitespace tokenizer and was trained on only 50,000 TinyStories samples. For better results, train longer, switch to a subword tokenizer (e.g., GPT-2 BPE), and scale up the model.
+
+ ---