yasserrmd/diffusion-text-demo

@@ -4,7 +4,120 @@ tags:
 - pytorch_model_hub_mixin
 ---
-This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
-- Code: [More Information Needed]
-- Paper: [More Information Needed]
-- Docs: [More Information Needed]

+# Diffusion Text Demo Model
+A prototype **diffusion-based language model** implemented in PyTorch and trained on a subset of the [**TinyStories** dataset](https://huggingface.co/datasets/roneneldan/TinyStories).
+This model demonstrates iterative denoising for text generation, conditioned on an input prompt.
+---
+## Training Details
+* **Dataset:** 50,000 samples from [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)
+* **Epochs:** 50
+* **Batch size:** 16
+* **Learning rate:** 1e-5
+* **Diffusion steps (T):** 10
+* **Tokenizer:** Naive whitespace (for demo purposes)
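The card lists T = 10 diffusion steps but does not describe how text is corrupted during training. As a minimal sketch, assuming an absorbing-state (masking) process with a linear schedule, each token at step t could be replaced by the mask token with probability t / T. The function name `corrupt`, the `mask_id` value, and the schedule itself are illustrative assumptions, not the author's training code.

```python
import random

def corrupt(token_ids, t, T=10, mask_id=0, rng=None):
    """Mask each token independently with probability t / T (assumed linear schedule)."""
    if not 0 <= t <= T:
        raise ValueError("t must lie in [0, T]")
    if rng is None:
        rng = random.Random()
    p_mask = t / T
    # rng.random() is in [0, 1), so t == 0 masks nothing and t == T masks everything.
    return [mask_id if rng.random() < p_mask else tok for tok in token_ids]

ids = [5, 8, 13, 21, 34]   # hypothetical token ids
rng = random.Random(0)     # seeded so repeated runs give the same masks
for t in (0, 5, 10):
    print(t, corrupt(ids, t, rng=rng))
```

Under this assumption the denoiser sees a fully masked continuation at t = T, which matches the all-mask initialization used by `generate_with_prompt`.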
+---
+## 📉 Training Loss
+| Stage        | Start Loss | End Loss |
+| ------------ | ---------- | -------- |
+| Epochs 1–10  | 8.38       | 6.13     |
+| Epochs 11–20 | 6.12       | 6.04     |
+| Epochs 21–50 | 6.04       | 5.92     |
+**Final Loss (Epoch 50): 5.92**
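If the reported loss is a mean per-token cross-entropy in nats (an assumption; the card does not state how the loss is defined), it can be read as a perplexity via exp(loss). The short calculation below converts the table's values on that assumption.

```python
import math

# Loss values copied from the table above; treating them as nats is an assumption.
losses = {"start of epoch 1": 8.38, "end of epoch 10": 6.13, "end of epoch 50": 5.92}
perplexity = {label: math.exp(loss) for label, loss in losses.items()}

for label, loss in losses.items():
    print(f"{label}: loss {loss:.2f} -> perplexity ~{perplexity[label]:.0f}")
```

On this reading, the drop from 8.38 to 5.92 corresponds to roughly a twelvefold reduction in perplexity.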
+### Loss Curve
+<img src="diffusion_textmodel_loss.png" width="800" />
+---
+## Usage
+### Install Requirements
+```bash
+pip install torch huggingface_hub safetensors
+```
+### Load the Model
+```python
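+import torch
+# modeling_diffusion.py (which defines DiffusionTextModel) must be on the import path
+from modeling_diffusion import DiffusionTextModel
+# Load the weights directly from the Hub
+model = DiffusionTextModel.from_pretrained("yasserrmd/diffusion-text-demo")
+model.eval()
+# Use a GPU when one is available
+device = "cuda" if torch.cuda.is_available() else "cpu"
+model.to(device)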
+```
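The inference function in the next section reads `vocab`, `id_to_word`, `mask_id`, and `pad_id`, which this card does not define. They must match the vocabulary used in training. The sketch below only illustrates the expected shapes for a whitespace tokenizer; the special-token strings `<pad>` and `<mask>`, their ids, and the helper `build_vocab` are hypothetical.

```python
def build_vocab(texts, specials=("<pad>", "<mask>")):
    """Map each whitespace-separated word to an integer id (special tokens first)."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for text in texts:
        for word in text.split():
            # Assign the next free id the first time a word is seen.
            vocab.setdefault(word, len(vocab))
    return vocab

# Toy corpus standing in for the TinyStories subset.
texts = ["the cat sat", "the dog ran"]
vocab = build_vocab(texts)
id_to_word = {i: w for w, i in vocab.items()}
pad_id, mask_id = vocab["<pad>"], vocab["<mask>"]

# Unknown words fall back to mask_id, mirroring vocab.get(tok, mask_id) on this card.
encoded = [vocab.get(w, mask_id) for w in "the bird sat".split()]
print(encoded)  # [2, 1, 4]
```

A vocabulary rebuilt from different text will not line up with the trained embedding table, so in practice these mappings should be saved at training time and reloaded for inference.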
+### Inference with Prompt
+```python
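+# Assumes `vocab` (word -> id), `id_to_word` (id -> word), `mask_id`, and `pad_id`
+# come from the training vocabulary, and `device` from the loading snippet above.
+def generate_with_prompt(model, input_text, max_length, T=10):
+    model.eval()
+    # Whitespace tokenization; out-of-vocabulary words fall back to the mask token.
+    input_tokens = input_text.split()
+    input_ids = [vocab.get(tok, mask_id) for tok in input_tokens][:max_length]
+    # Start from an all-mask sequence and copy the prompt into its first positions.
+    seq = torch.full((1, max_length), mask_id, dtype=torch.long, device=device)
+    seq[0, :len(input_ids)] = torch.tensor(input_ids, dtype=torch.long, device=device)
+    # Walk the diffusion steps from T down to 1.
+    for step in range(T, 0, -1):
+        with torch.no_grad():
+            logits = model(seq, torch.tensor([step], device=device))
+            probs = torch.softmax(logits, dim=-1)
+            # Sample a token for every position after the prompt that is still masked.
+            # The first pass fills all of them, so later passes only change a position
+            # if the mask token itself was sampled there.
+            for pos in range(len(input_ids), max_length):
+                if seq[0, pos].item() == mask_id:
+                    seq[0, pos] = torch.multinomial(probs[0, pos], 1).item()
+    ids = seq[0].tolist()
+    # Cut the output at the first padding token, if any.
+    if pad_id in ids:
+        ids = ids[:ids.index(pad_id)]
+    return " ".join(id_to_word[i] for i in ids)
+print(generate_with_prompt(model, "the cat", max_length=50))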
+```
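As written, `generate_with_prompt` samples every masked position on its first pass. A common alternative in masked text diffusion is to commit only the most confident predictions at each step, so that all T steps contribute. The sketch below shows that schedule in plain Python; it is not the card's method, and `predict` is a hypothetical stand-in for a model call that returns a `(token_id, confidence)` pair per position.

```python
import random

def progressive_unmask(seq, mask_id, predict, T=10):
    """Fill masked slots over at most T steps, committing the most confident first."""
    seq = list(seq)
    for step in range(T, 0, -1):
        masked = [i for i, tok in enumerate(seq) if tok == mask_id]
        if not masked:
            break
        # Reveal an even share of the remaining masks; at step 1 this is all of them.
        n_reveal = -(-len(masked) // step)  # ceiling division
        preds = predict(seq, step)          # hypothetical: {pos: (token_id, confidence)}
        ranked = sorted(masked, key=lambda i: preds[i][1], reverse=True)
        for i in ranked[:n_reveal]:
            seq[i] = preds[i][0]
    return seq

# Toy stand-in for the model: proposes token 100 + position with a random confidence.
rng = random.Random(0)
def toy_predict(seq, step):
    return {i: (100 + i, rng.random()) for i in range(len(seq))}

MASK = 1
start = [7, 8] + [MASK] * 6   # two prompt tokens, six masked slots
print(progressive_unmask(start, MASK, toy_predict, T=3))
# [7, 8, 102, 103, 104, 105, 106, 107]
```

With a real model, `predict` would wrap the forward pass and take, for example, the maximum softmax probability at each position as the confidence. Whether this improves samples for this checkpoint depends on how it was trained.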
+---
+## Use in a Hugging Face Space
+```python
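+import gradio as gr
+import torch
+from modeling_diffusion import DiffusionTextModel
+# generate_with_prompt and the names it relies on (vocab, id_to_word, mask_id, pad_id)
+# must also be defined in the Space's app file; see "Inference with Prompt".
+device = "cuda" if torch.cuda.is_available() else "cpu"
+model = DiffusionTextModel.from_pretrained("yasserrmd/diffusion-text-demo")
+model.to(device)
+model.eval()
+def infer(prompt):
+    # Return a sequence of up to 50 whitespace tokens, prompt included.
+    return generate_with_prompt(model, prompt, max_length=50)
+gr.Interface(fn=infer, inputs="text", outputs="text").launch()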
+```
+---
+## References
+This model was inspired by several works on diffusion for text:
+* Li et al. (2022) – [**Diffusion-LM Improves Controllable Text Generation**](https://arxiv.org/abs/2205.14217)
+* Austin et al. (2021) – [**Structured Denoising Diffusion Models in Discrete State-Spaces (D3PM)**](https://arxiv.org/abs/2107.03006)
+* He et al. (2023) – [**DiffusionBERT: Improving Generative Masked Language Models with Diffusion**](https://arxiv.org/abs/2211.15029)
+* Gong et al. (2023) – [**DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models**](https://arxiv.org/abs/2210.08933)
+* Nie et al. (2025) – [**Large Language Diffusion Models (LLaDA)**](https://arxiv.org/abs/2502.09992)
+---
+⚠️ **Disclaimer:** This is a research prototype. Generations may be incoherent, because the model uses a naive whitespace tokenizer and was trained on only 50,000 TinyStories samples. For better results, train longer, switch to a subword tokenizer (e.g., GPT-2 BPE), and scale up the model.
+---