JoseRFJunior (Junior R F Junior)

updated a model 27 days ago

JoseRFJunior/EVA2

Updated 27 days ago

published a model 27 days ago

JoseRFJunior/EVA2

Updated 27 days ago

updated a model 27 days ago

JoseRFJunior/EVA

Updated 27 days ago

published a model 27 days ago

JoseRFJunior/EVA

Updated 27 days ago

published an article 28 days ago

Article

NietzscheDB: Multi-Manifold Architecture

28 days ago

published an article 28 days ago

Article

NietzscheLab — Autonomous Knowledge Evolution Engine

28 days ago

published an article 28 days ago

Article

NQL — Nietzsche Query Language

28 days ago

liked a model 3 months ago

fbononibelloepoch/malaria-detection

Object Detection • Updated Nov 23, 2024 • 19 • 1

reactedto AtAndDev's post with ❤️🤗 10 months ago

Post

3139

deepseek-ai/DeepSeek-R1-0528

This is the end

1 reply

·

updated 2 models 10 months ago

JoseRFJunior/ZetaNet

Updated May 31, 2025

JoseRFJunior/FractalBrainNet

Updated May 31, 2025

published a model 10 months ago

JoseRFJunior/FractalBrainNet

Updated May 31, 2025

reactedto their post with ❤️ over 1 year ago

Post

1737

JoseRFJunior/TransNAR
https://github.com/JoseRFJuniorLLMs/TransNAR
https://arxiv.org/html/2406.09308v1
TransNAR hybrid architecture. Similar to Alayrac et al, we interleave existing Transformer layers with gated cross-attention layers which enable information to flow from the NAR to the Transformer. We generate queries from tokens while we obtain keys and values from nodes and edges of the graph. The node and edge embeddings are obtained by running the NAR on the graph version of the reasoning task to be solved. When experimenting with pre-trained Transformers, we initially close the cross-attention gate, in order to fully preserve the language model’s internal knowledge at the beginning of training.

posted an update over 1 year ago

Post

1737

JoseRFJunior/TransNAR
https://github.com/JoseRFJuniorLLMs/TransNAR
https://arxiv.org/html/2406.09308v1
TransNAR hybrid architecture. Similar to Alayrac et al, we interleave existing Transformer layers with gated cross-attention layers which enable information to flow from the NAR to the Transformer. We generate queries from tokens while we obtain keys and values from nodes and edges of the graph. The node and edge embeddings are obtained by running the NAR on the graph version of the reasoning task to be solved. When experimenting with pre-trained Transformers, we initially close the cross-attention gate, in order to fully preserve the language model’s internal knowledge at the beginning of training.