Update README.md
README.md
@@ -9,7 +9,8 @@ InfiGFusion: Graph-on-Logits Distillation via Efficient Gromov-Wasserstein for M
 
 [](https://arxiv.org/pdf/2505.13893)
 [](https://github.com/Reallm-Labs/InfiGFusion/edit/main/README.md)
-
+[](https://huggingface.co/papers/2505.13893)
+
 </h4>
 
 **InfiGFusion** is the first structure-aware fusion framework for large language models that models semantic dependencies among logits using feature-level graphs. We introduce a novel Graph-on-Logits Distillation (GLD) loss that captures cross-dimension interactions via co-activation graphs and aligns them using an efficient, provable approximation of Gromov-Wasserstein distance (reducing complexity from O(n^4) to O(nlogn)). Our released **InfiGFusion-14B** model consistently shows better performance, achieving +35.6 on Multistep Arithmetic and +37.06 on Causal Judgement over SFT, demonstrating superior multi-step and complex logic inference.