Update README.md
README.md CHANGED
@@ -3,9 +3,11 @@ library_name: transformers
 tags:
 - trl
 - sft
+datasets:
+- lamm-mit/mlabonne-orca-math-word-problems-80k
 ---
 
-# Graph-Aware Isomorphic Attention for Adaptive Dynamics in Transformers: Sparse-GIN model
+# Graph-Aware Isomorphic Attention for Adaptive Dynamics in Transformers: Sparse-GIN model for math word problems
 
 We present an approach to enhancing Transformer architectures by integrating graph-aware relational reasoning into their attention mechanisms. Building on the inherent connection between attention and graph theory, we reformulate the Transformer’s attention mechanism as a graph operation and propose Graph-Aware Isomorphic Attention. This method leverages advanced graph modeling strategies, including Graph Isomorphism Networks (GIN) and Principal Neighborhood Aggregation (PNA), to enrich the representation of relational structures. Our approach improves the model’s ability to capture complex dependencies and generalize across tasks, as evidenced by a reduced generalization gap and improved learning performance.
 
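For readers skimming the card: below is a minimal sketch of the idea described in the abstract, assuming a PyTorch setting. It treats a (sparsified) attention matrix as a weighted adjacency over tokens and applies one GIN-style update as a residual adapter. The class name, parameter names, and wiring are illustrative assumptions, not the repository's actual API.

```python
import torch
import torch.nn as nn


class SparseGINAttentionAdapter(nn.Module):
    """Hypothetical sketch: interpret the attention matrix as a weighted
    adjacency, sparsify it by thresholding, and apply one GIN-style update
    h' = MLP((1 + eps) * h + A_sparse @ h) as a residual adapter."""

    def __init__(self, hidden_dim: int, threshold: float = 0.1):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))  # learnable GIN epsilon
        self.threshold = threshold               # sparsification cutoff
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, hidden: torch.Tensor, attn: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, dim); attn: (batch, seq, seq), rows sum to 1
        adj = torch.where(attn > self.threshold, attn, torch.zeros_like(attn))
        neighbor_sum = adj @ hidden  # aggregate features over retained edges
        updated = self.mlp((1.0 + self.eps) * hidden + neighbor_sum)
        return hidden + updated      # residual connection, adapter-style


# quick shape check
adapter = SparseGINAttentionAdapter(hidden_dim=768)
h = torch.randn(2, 16, 768)
a = torch.softmax(torch.randn(2, 16, 16), dim=-1)
out = adapter(h, a)  # (2, 16, 768)
```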
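The metadata added in this commit (tags `trl`, `sft`, dataset `lamm-mit/mlabonne-orca-math-word-problems-80k`) suggests supervised fine-tuning with TRL on that dataset. A minimal sketch of such a run, assuming TRL's standard SFT workflow; the base model ID is a placeholder, as the diff does not name one:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# dataset named in the card's metadata
train_dataset = load_dataset(
    "lamm-mit/mlabonne-orca-math-word-problems-80k", split="train"
)

trainer = SFTTrainer(
    model="your-base-model",  # placeholder: the base checkpoint is not given here
    train_dataset=train_dataset,
    args=SFTConfig(output_dir="sparse-gin-math-sft"),
)
trainer.train()
```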