Commit de54a33 by ybian-umd · verified · 1 Parent(s): ad731b5

Update README.md

  1. README.md +53 -0
README.md CHANGED
@@ -26,6 +26,59 @@ library_name: transformers
  > - **Fair Comparisons:** In rigorously controlled experiments, SDAR achieves **on-par general task performance** with strong AR baselines, ensuring credibility and reproducibility.
  > - **Superior Learning Efficiency:** On complex scientific reasoning tasks (e.g., GPQA, ChemBench, Physics), SDAR shows **clear gains over AR models** of the same scale, approaching or even exceeding leading closed-source systems.
 
+ # Inference
+
+ ## Using the tailored inference engine [JetEngine](https://github.com/Labman42/JetEngine)
+
+ JetEngine enables more efficient inference compared to the built-in implementation.
+
+ ```bash
+ git clone https://github.com/Labman42/JetEngine.git
+ cd JetEngine
+ pip install .
+ ```
+
+ The following example shows how to quickly load a model with JetEngine and run a prompt end-to-end.
+
+ ```python
+ import os
+ from jetengine import LLM, SamplingParams
+ from transformers import AutoTokenizer
+
+ model_path = os.path.expanduser("/path/to/your/sdar-model")
+ tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
+ # Initialize the LLM
+ llm = LLM(
+     model_path,
+     enforce_eager=True,
+     tensor_parallel_size=1,
+     mask_token_id=151669,  # Optional: only needed for masked/diffusion models
+     block_length=4
+ )
+
+ # Set sampling/generation parameters
+ sampling_params = SamplingParams(
+     temperature=1.0,
+     topk=0,
+     topp=1.0,
+     max_tokens=256,
+     remasking_strategy="low_confidence_dynamic",
+     block_length=4,
+     denoising_steps=4,
+     dynamic_threshold=0.9
+ )
+
+ # Prepare a simple chat-style prompt
+ prompt = tokenizer.apply_chat_template(
+     [{"role": "user", "content": "Explain what reinforcement learning is in simple terms."}],
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ # Generate text
+ outputs = llm.generate_streaming([prompt], sampling_params)
+ ```
+
  # Performance
 
  ### SDAR v.s. Qwen