All HF Hub posts
singhsidhukuldeep
posted an update
2 days ago
Post
3210
Exciting breakthrough in AI: @Meta's new Byte Latent Transformer (BLT) revolutionizes language models by eliminating tokenization!
The BLT architecture introduces a groundbreaking approach that processes raw bytes instead of tokens, achieving state-of-the-art performance while being more efficient and robust. Here's what makes it special:
>> Key Innovations
Dynamic Patching: BLT groups bytes into variable-sized patches based on entropy, allocating more compute power where the data is more complex. This results in up to 50% fewer FLOPs during inference compared to traditional token-based models.
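A rough sketch of the patching idea in Python (the threshold value, the entropy-model interface, and all names here are illustrative assumptions, not the paper's implementation):

```python
import math

def byte_entropy(probs):
    """Shannon entropy (in bits) of a next-byte probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropy_patches(byte_seq, entropy_model, threshold=2.0):
    """Group a byte sequence into variable-sized patches.

    A patch is closed whenever the small entropy model is 'surprised',
    i.e. its predicted next-byte entropy exceeds the threshold, so more
    patches (and hence more global-transformer compute) land on complex spans.
    `entropy_model(prefix)` is assumed to return a 256-way distribution.
    """
    patches, current = [], []
    for i, b in enumerate(byte_seq):
        current.append(b)
        if byte_entropy(entropy_model(byte_seq[: i + 1])) > threshold:
            patches.append(bytes(current))
            current = []
    if current:
        patches.append(bytes(current))
    return patches
```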
Three-Component Architecture:
• Lightweight Local Encoder that converts bytes to patch representations
• Powerful Global Latent Transformer that processes patches
• Local Decoder that converts patches back to bytes
>> Technical Advantages
• Matches performance of Llama 3 at 8B parameters while being more efficient
• Superior handling of non-English languages and rare character sequences
• Remarkable 99.9% accuracy on spelling tasks
• Better scaling properties than token-based models
>> Under the Hood
The system uses an entropy model to determine patch boundaries, cross-attention mechanisms for information flow, and hash n-gram embeddings for improved representation. The architecture allows simultaneous scaling of both patch and model size while maintaining fixed inference costs.
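As one concrete piece, the hash n-gram embedding trick can be sketched roughly as follows (table size, n-gram lengths, and the hashing scheme are assumptions for illustration, not BLT's exact setup):

```python
import torch
import torch.nn as nn

class HashNGramEmbedding(nn.Module):
    """Hashed byte n-gram embeddings, meant to be added to per-byte embeddings."""

    def __init__(self, dim=256, table_size=100_000, ngram_sizes=(3, 4, 5)):
        super().__init__()
        self.ngram_sizes = ngram_sizes
        self.table = nn.Embedding(table_size, dim)

    def forward(self, byte_ids):
        # byte_ids: 1-D LongTensor with values in 0..255
        seq = byte_ids.tolist()
        embs = []
        for i in range(len(seq)):
            vec = torch.zeros(self.table.embedding_dim)
            for n in self.ngram_sizes:
                if i + 1 >= n:
                    ngram = tuple(seq[i - n + 1 : i + 1])
                    # Hash the n-gram into a fixed-size table instead of enumerating all n-grams.
                    idx = hash(ngram) % self.table.num_embeddings
                    vec = vec + self.table.weight[idx]
            embs.append(vec)
        return torch.stack(embs)  # (seq_len, dim); combined with the byte embeddings
```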
This is a game-changer for multilingual AI and could reshape how we build future language models. Excited to see how this technology evolves!
Post
466
**15 Agentic Systems and Frameworks of 2024**
This year, we started our "AI Agents and Agentic Workflows" series (https://www.turingpost.com/t/AI-Agents) to explore everything about AI agents step by step: all the vocabulary, how they work, and how to build them.
The huge interest in this series and the large number of studies on agents showed that it was one of the most popular and important themes of the year. In 2025, agents will most likely reach new highs; we will be covering that for you. Now, let's review the agentic systems that have emerged this year.
Here is a list of 15 agentic systems and frameworks of 2024:
1. GUI Agents: A Survey (2412.13501)
2. Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level (2411.03562)
3. The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (2408.06292)
4. MALT: Improving Reasoning with Multi-Agent LLM Training (2412.01928)
5. Agent S: An Open Agentic Framework that Uses Computers Like a Human (2410.08164)
6. Automated Design of Agentic Systems (2408.08435)
7. AgentInstruct: Toward Generative Teaching with Agentic Flows (2407.03502)
8. AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant (2410.18603)
9. WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents (2410.07484)
10. Generative Agent Simulations of 1,000 People (2411.10109)
11. DynaSaur: Large Language Agents Beyond Predefined Actions (2411.01747)
12. PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking (2410.12375)
13. Generative World Explorer (2411.11844)
14. Bel Esprit: Multi-Agent Framework for Building AI Model Pipelines (2412.14684)
15. AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions (2410.20424)
Thanks for reading Turing Post!
Subscribe to receive new posts straight into your inbox -> https://www.turingpost.com/subscribe
Post
2853
Last Week in Medical AI: Top Research Papers/Models
(December 15 – December 21, 2024)
Medical LLM & Other Models
- MedMax: Mixed-Modal Biomedical Assistant
  - Advanced multimodal instruction tuning
  - Enhanced biomedical knowledge integration
  - Comprehensive assistant capabilities
- MGH Radiology Llama 70B
  - Specialized radiology focus
  - State-of-the-art performance
  - Enhanced report generation capabilities
- HC-LLM: Historical Radiology Reports
  - Context-aware report generation
  - Historical data integration
  - Improved accuracy in diagnostics
Frameworks & Methods
- ReflecTool: Reflection-Aware Clinical Agents
- Process-Supervised Clinical Notes
- Federated Learning with RAG
- Query Pipeline Optimization
Benchmarks & Evaluations
- Multi-OphthaLingua
  - Multilingual ophthalmology benchmark
  - Focus on LMICs healthcare
  - Bias assessment framework
- ACE-M3 Evaluation Framework
  - Multimodal medical model testing
  - Comprehensive capability assessment
  - Standardized evaluation metrics
LLM Applications
- Patient-Friendly Video Reports
- Medical Video QA Systems
- Gene Ontology Annotation
- Healthcare Recommendations
Special Focus: Medical Ethics & AI
- Clinical Trust Impact Study
- Mental Health AI Challenges
- Hospital Monitoring Ethics
- Radiology AI Integration
Now you can watch and listen to the latest Medical AI papers daily on our YouTube and Spotify channels as well!
- Full thread in detail: https://x.com/OpenlifesciAI/status/1870504774162063760
- YouTube: youtu.be/SbFp4fnuxjo
- Spotify: https://t.co/QPmdrXuWP9
luigi12345
posted an update
1 day ago
Post
1909
PERFECT FINAL PROMPT for Coding and Debugging.
Step 1: Generate the prompt that, if sent to you, will make you adjust the script so that it meets every criterion it needs to meet to be 100% bug-free and perfect.
Step 2: Adjust the script following the steps and instructions in the prompt created in Step 1.
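A minimal sketch of this two-step flow (the `complete` helper below is a hypothetical stand-in for whatever LLM call you use, not a specific API):

```python
def complete(prompt: str) -> str:
    """Hypothetical stand-in for a call to your LLM of choice."""
    raise NotImplementedError

def two_step_fix(script: str, criteria: str) -> str:
    # Step 1: ask the model to write the prompt that, if sent back to it,
    # would make it adjust the script until it meets every criterion.
    meta_prompt = (
        "Generate the prompt that, if sent to you, will make you adjust the "
        f"following script so it meets each of these criteria:\n{criteria}\n\n"
        f"Script:\n{script}"
    )
    generated_prompt = complete(meta_prompt)
    # Step 2: follow the generated prompt to actually adjust the script.
    return complete(f"{generated_prompt}\n\nScript:\n{script}")
```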
Post
1300
So far I have noticed that reasoning with LLMs in English tends to be more accurate than in other languages.
However, besides GoogleTrans and other open, transparent translators, I could not find an easy-to-use solution that avoids:
1. Third-party framework installation
2. Text chunking
3. Lack of support for meta-annotations such as spans / objects / etc.
To cope with the problem of IR from non-English texts, I am happy to share bulk-translate 0.25.0.
https://github.com/nicolay-r/bulk-translate
bulk-translate is a tiny, no-strings Python framework that lets you translate a series of texts with pre-annotated fixed spans that are invariant under translation.
It provides a Python API for quick data translation with (optionally) annotated objects in the texts (see figure below).
I made it as accessible as possible for RAG and/or LLM-powered downstream apps:
https://github.com/nicolay-r/bulk-translate/wiki
All you have to do is provide an iterator of texts, where each text is either:
1. A string object
2. A list of strings and nested lists that represent spans (value + any ID data).
By default I provide a wrapper over googletrans, which you can override with your own:
https://github.com/nicolay-r/bulk-translate/blob/master/models/googletrans_310a.py
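A rough illustration of the input format described above (the exact shapes are a guess from this description; see the wiki above for the actual bulk-translate API):

```python
# Each item of the iterator is either a plain string, or a list mixing plain
# strings with nested [value, id] lists that mark spans the translator must
# keep unchanged. The shapes below are assumptions based on the post above.
texts_iter = iter([
    # 1. A plain string
    "some raw text to translate",
    # 2. A list of strings and nested lists representing fixed spans
    [
        "the award went to ",
        ["Albert Einstein", "PERSON-1"],  # span: value + any ID data
        " in 1921",
    ],
])
```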
ehristoforu
posted an update
1 day ago
Post
2039
Ultraset - an all-in-one dataset for SFT training in Alpaca format.
fluently-sets/ultraset
Ultraset is a comprehensive dataset for training large language models (LLMs) with supervised fine-tuning (SFT). It consists of over 785 thousand entries in eight languages: English, Russian, French, Italian, Spanish, German, Chinese, and Korean.
Ultraset solves the problem users face when selecting an appropriate dataset for LLM training. It combines the various types of data required to enhance a model's skills in areas such as text writing and editing, mathematics, coding, biology, medicine, finance, and multilingualism.
For effective use of the dataset, it is recommended to use only the "instruction," "input," and "output" columns and to train the model for 1-3 epochs. The dataset does not include DPO or Instruct data, making it suitable for training various types of LLMs.
Ultraset is an excellent tool to improve your language model's skills in diverse knowledge areas.
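A small sketch of that recommendation using the Hugging Face `datasets` library (the split name and the exact column layout on the Hub are assumptions here, based only on the description above):

```python
from datasets import load_dataset

# Load Ultraset and keep only the recommended columns.
ds = load_dataset("fluently-sets/ultraset", split="train")  # split name assumed
ds = ds.select_columns(["instruction", "input", "output"])

# Format one example in Alpaca style for SFT.
def to_alpaca(example):
    prompt = f"### Instruction:\n{example['instruction']}\n"
    if example["input"]:
        prompt += f"### Input:\n{example['input']}\n"
    return {"text": prompt + f"### Response:\n{example['output']}"}

ds = ds.map(to_alpaca)
print(ds[0]["text"][:300])
```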
Post
1185
Are we the only providers of INT4 quantized models for Llama 3.2 VL?
OPEA/Llama-3.2-90B-Vision-Instruct-int4-sym-inc
OPEA/Llama-3.2-11B-Vision-Instruct-int4-sym-inc
fuzzy-mittenz
posted an update
1 day ago
Post
618
So a cool thing happened,
Nomic's GPT4All released a "Reasoning/Thinking" (QwQ/o1/o3-type) model that uses JavaScript functions to calculate things like the haversine distance between two places and so on; it's VERY cool to get such complex calculative/recursive AI in such a small package.
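For reference, the haversine great-circle distance mentioned above looks roughly like this (sketched in Python here, while the GPT4All tooling calls it from JavaScript):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on Earth, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km is roughly Earth's mean radius

print(haversine_km(52.52, 13.405, 48.8566, 2.3522))  # Berlin -> Paris, ~878 km
```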
I was able to adapt their methods to one of my small models, "Replicant" (2 GB), and created a new model with importance-matrix quantization, using the "THE_KEY" dataset for better inference in the coding model I pulled from WhiteRabbitNeo's Qwen2.5 model... I give you Reasoning Rabbit. Enjoy!
IntelligentEstate/o3-ReasoningRabbit_Q2.5-Cd-7B-IQ4_XS-GGUF
IntelligentEstate/Replicant_Warder-o3-Q2.5_3B-iQ5_K_S-GGUF
WhiteRabbitNeo/WhiteRabbitNeo-2.5-Qwen-2.5-Coder-7B