Update README.md
README.md CHANGED
@@ -3,9 +3,11 @@ library_name: transformers
 tags:
 - trl
 - sft
+datasets:
+- lamm-mit/mlabonne-orca-math-word-problems-80k
 ---
 
-# Graph-Aware Isomorphic Attention for Adaptive Dynamics in Transformers: Sparse-GIN model
+# Graph-Aware Isomorphic Attention for Adaptive Dynamics in Transformers: Sparse-GIN model for math word problems
 
 We present an approach to enhancing Transformer architectures by integrating graph-aware relational reasoning into their attention mechanisms. Building on the inherent connection between attention and graph theory, we reformulate the Transformer’s attention mechanism as a graph operation and propose Graph-Aware Isomorphic Attention. This method leverages advanced graph modeling strategies, including Graph Isomorphism Networks (GIN) and Principal Neighborhood Aggregation (PNA), to enrich the representation of relational structures. Our approach improves the model’s ability to capture complex dependencies and generalize across tasks, as evidenced by a reduced generalization gap and improved learning performance.
 
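For readers skimming the card: below is a minimal sketch of the idea described in the abstract, assuming a PyTorch setting. It treats a (sparsified) attention matrix as a weighted adjacency over tokens and applies one GIN-style update as a residual adapter. The class name, parameter names, and wiring are illustrative assumptions, not the repository's actual API.

```python
import torch
import torch.nn as nn


class SparseGINAttentionAdapter(nn.Module):
    """Hypothetical sketch: interpret the attention matrix as a weighted
    adjacency, sparsify it by thresholding, and apply one GIN-style update
    h' = MLP((1 + eps) * h + A_sparse @ h) as a residual adapter."""

    def __init__(self, hidden_dim: int, threshold: float = 0.1):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))  # learnable GIN epsilon
        self.threshold = threshold               # sparsification cutoff
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, hidden: torch.Tensor, attn: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, dim); attn: (batch, seq, seq), rows sum to 1
        adj = torch.where(attn > self.threshold, attn, torch.zeros_like(attn))
        neighbor_sum = adj @ hidden  # aggregate features over retained edges
        updated = self.mlp((1.0 + self.eps) * hidden + neighbor_sum)
        return hidden + updated      # residual connection, adapter-style


# quick shape check
adapter = SparseGINAttentionAdapter(hidden_dim=768)
h = torch.randn(2, 16, 768)
a = torch.softmax(torch.randn(2, 16, 16), dim=-1)
out = adapter(h, a)  # (2, 16, 768)
```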
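The metadata added in this commit (tags `trl`, `sft`, dataset `lamm-mit/mlabonne-orca-math-word-problems-80k`) suggests supervised fine-tuning with TRL on that dataset. A minimal sketch of such a run, assuming TRL's standard SFT workflow; the base model ID is a placeholder, as the diff does not name one:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# dataset named in the card's metadata
train_dataset = load_dataset(
    "lamm-mit/mlabonne-orca-math-word-problems-80k", split="train"
)

trainer = SFTTrainer(
    model="your-base-model",  # placeholder: the base checkpoint is not given here
    train_dataset=train_dataset,
    args=SFTConfig(output_dir="sparse-gin-math-sft"),
)
trainer.train()
```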