SSW

ssml2050

AI & ML interests

NATURAL LANGUAGE PROCESSING, GENERATIVE AI

Recent Activity

Organizations

Social Post Explorers

ssml2050's activity

reacted to fdaudens's post with 👍 12 months ago
2480
A new dataset for anyone interested in satellite imagery: 3 million @Satellogic images of unique locations (6 million images, including location revisits) from around the world under a Creative Commons CC-BY 4.0 license.

Interesting potential in journalism.

satellogic/EarthView
reacted to di-zhang-fdu's post with 👍 12 months ago
2391
ChemLLM-20B SFT and DPO are coming! 🤗
  • 1 reply
reacted to Jaward's post with 👍 12 months ago
2456
# Thoughts on Neural Scaling Laws
Take a zoomed-out view of what makes neural networks succeed and you see it all revolves around the scaling laws: empirical observations that performance improves with increased model size, dataset size, and compute resources.

The specifics of how these laws apply vary across modalities and architectures. This shows in the empirical equations used to measure them.

Yet they all rely heavily on three main factors: data, size, and computation. These factors have sub-dependencies of their own: data size and quality, model size and architecture, and number of GPUs and compute-kernel code, respectively.
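
One such empirical equation can be sketched in a few lines. This is only an illustration, assuming the parameter-count power law from the first reference below, L(N) = (N_c / N)^alpha_N; the two constants are approximate values as reported in that paper and are used here purely for illustration, and the function name is hypothetical.

```python
# Illustrative constants, approximately those reported in
# "Scaling Laws for Neural Language Models"; treat them as assumptions.
ALPHA_N = 0.076  # power-law exponent for parameter count
N_C = 8.8e13     # scale constant, in non-embedding parameters


def loss_from_params(n_params: float, n_c: float = N_C, alpha: float = ALPHA_N) -> float:
    """Predicted loss when model size is the only limiting factor."""
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return (n_c / n_params) ** alpha


if __name__ == "__main__":
    # Each 10x increase in parameters multiplies the loss by 10 ** -alpha.
    for n in (1e8, 1e9, 1e10):
        print(f"N={n:.0e}  predicted loss={loss_from_params(n):.3f}")
```

The point of the sketch is the shape of the curve: gains are steady but shrink in absolute terms, which is why data quality and architecture matter so much for small models.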

As research on these laws progresses, we begin to see new scaling laws emerge that may apply in very different ways than usual. This shows in recent local LLM work (Phi-3, Gemma 2B, LLMs in a flash), where small models trained on small, high-quality datasets beat large models.

I look forward to the singularity moment, when these laws come full circle and meet where it all began :)

References:
- Scaling Laws for Neural Language Models: https://arxiv.org/pdf/2001.08361
- Scaling Laws for Autoregressive Generative Modeling: https://arxiv.org/abs/2010.14701
- LLMs in a flash: https://arxiv.org/abs/2312.11514
- Phi-3 Technical Report: https://arxiv.org/abs/2404.14219
- Gemma 2B: https://arxiv.org/pdf/2403.08295
reacted to ayush-thakur02's post with 👍 12 months ago
2929
Enhancing Distributed Systems with Self-Healing Nodes and Adaptive Data Sharding

Paper: Self-healing Nodes with Adaptive Data-Sharding (2405.00004)

The paper introduces an innovative approach to improve distributed systems by integrating self-healing nodes with adaptive data sharding. This method leverages advanced concepts like self-replication, fractal regeneration, and predictive sharding to enhance scalability, performance, fault tolerance, and adaptability.

Key Concepts:
- Self-Replication: Nodes can create copies of themselves or their data to aid in recovery and load balancing.
- Fractal Regeneration: Nodes can reconfigure and restore their functionality after partial damage, inspired by natural fractals.
- Predictive Sharding: Nodes can anticipate future data trends and proactively adjust data distribution to optimize performance.

Methodology:
The approach consists of four main steps:
- Temporal data sharding based on data's temporal characteristics.
- Self-replicating nodes to enhance data availability and reliability.
- Fractal regeneration for robust recovery mechanisms.
- Predictive sharding using consistent hashing to anticipate and adapt to future data trends.
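
The consistent-hashing building block mentioned in the last step can be sketched as follows. This is a minimal sketch of plain consistent hashing with virtual nodes, not the paper's predictive scheme, whose details are not given here; the class name `HashRing`, its methods, and the `vnodes` parameter are all hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        # First 8 bytes of SHA-256 give a stable 64-bit ring position.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First ring position clockwise from the key, wrapping around.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

The property that makes this attractive for adaptive sharding is that adding or removing a node only moves the keys adjacent to that node's ring positions; every other key keeps its shard.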

Results and Analysis:
Experimental evaluations show that this approach outperforms existing data sharding techniques in scalability, performance, fault tolerance, and adaptability. The use of synthetic data and workload generators created realistic scenarios for testing.

Applications:
The methodology can be applied to various domains such as distributed database systems, blockchain networks, IoT, and cloud computing, offering improvements in data distribution efficiency and system resilience.
reacted to phenixrhyder's post with 👍 12 months ago
3207
Midjourney Ai
  • 3 replies
reacted to Jaward's post with 👍 about 1 year ago
4521
This is the closest I've seen to a scalable AI/LLM Operating System - it has all the major ingredients of a feasible AI OS 1 architecture:

- Extends classical OS functionalities with an LLM Kernel.
- Multi agent-centric approach.
- Optimized resource allocation system that allows for LLM-based tasks and Classical OS tasks to coexist.
- An Agent Scheduler that can apply classical OS scheduling policies (FIFO, RR).
- A Context Manager to improve alignment.
- Lazy Memory Manager for agents (ensures data is stored and accessible only while the agent is active).
- An Enhanced security module for the AI-driven environment.
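
The two scheduling policies named in the list (FIFO and round-robin) can be illustrated with a short sketch. This is a generic textbook round-robin over hypothetical agent workloads, not code from the AIOS repository; the function name and the "work units" abstraction are assumptions.

```python
from collections import deque


def round_robin(tasks: dict[str, int], quantum: int) -> list[tuple[str, int]]:
    """Return the execution trace as (agent_name, units_run) slices.

    `tasks` maps an agent name to its remaining work units, in arrival order.
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    queue = deque((name, units) for name, units in tasks.items() if units > 0)
    trace = []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        trace.append((name, run))
        if remaining > run:
            # Unfinished agents go to the back of the queue.
            queue.append((name, remaining - run))
    return trace


if __name__ == "__main__":
    print(round_robin({"planner": 3, "coder": 5, "critic": 2}, quantum=2))
```

With a quantum at least as large as the longest task, the same function degenerates to FIFO, since every agent finishes in its first slice.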

It does hit all checkpoints, doesn't it? An upscale version of @karpathy's.

Code: https://github.com/agiresearch/AIOS
reacted to DmitryRyumin's post with 👍 about 1 year ago
1758
🚀🎭🌟 New Research Alert! 🌟🎭🚀
📄 Title: AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation 🔝

📝 Description: AniPortrait is a novel framework for generating photorealistic portrait animations driven by audio and a reference image, with superior facial naturalness, pose variety, and visual quality, with potential applications in facial motion editing and facial reenactment.

👥 Authors: Huawei Wei, @ZJYang , Zhisheng Wang

🔗 Paper: AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation (2403.17694)

📁 Repository: https://github.com/Zejun-Yang/AniPortrait

🤗 Demo: ZJYang/AniPortrait_official
🔥 Model 🤖: ZJYang/AniPortrait

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

🚀 Added to the Avatars Collection: DmitryRyumin/avatars-65df37cdf81fec13d4dbac36

🔍 Keywords: #AniPortrait #Animation #AudioDriven #Photorealistic #FacialAnimation #DeepLearning #Innovation
  • 2 replies
reacted to osanseviero's post with 👍 about 1 year ago
2097
Diaries of Open Source. Part 11 🚀

🚀 Databricks releases DBRX, potentially the best open access model! A 132B Mixture of Experts with 36B active params, trained on 12 trillion tokens
Blog: https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm
Base and instruct models: databricks/dbrx-6601c0852a0cdd3c59f71962
Demo: https://hf.co/spaces/databricks/dbrx-instruct

🤏 1-bit and 2-bit quantization exploration using HQQ+
Blog post: https://mobiusml.github.io/1bit_blog/
Models: https://hf.co/collections/mobiuslabsgmbh/llama2-7b-hqq-6604257a96fc8b9c4e13e0fe
GitHub: https://github.com/mobiusml/hqq

📚 Cosmopedia: a large-scale synthetic dataset for pre-training - it includes 25 billion tokens and 30 million files
Dataset: HuggingFaceTB/cosmopedia
Blog: https://hf.co/blog/cosmopedia

⭐ Mini-Gemini: multi-modal VLMs, from 2B to 34B
Models: https://hf.co/collections/YanweiLi/mini-gemini-6603c50b9b43d044171d0854
Paper: Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models (2403.18814)
GitHub: https://github.com/dvlab-research/MiniGemini

🔥 VILA - On Pre-training for VLMs
Models: Efficient-Large-Model/vila-on-pre-training-for-visual-language-models-65d8022a3a52cd9bcd62698e
Paper: VILA: On Pre-training for Visual Language Models (2312.07533)

Misc
👀 FeatUp, a framework for image features at any resolution: mhamilton723/FeatUp. Paper: FeatUp: A Model-Agnostic Framework for Features at Any Resolution (2403.10516)
🍞 ColBERTus Maximus, a colbertialized embedding model: mixedbread-ai/mxbai-colbert-large-v1
🖌️ Semantic Palette, a new drawing paradigm: ironjr/SemanticPalette
🧑‍⚕️ HistoGPT, a vision model that generates accurate pathology reports: marr-peng-lab/histogpt https://www.medrxiv.org/content/10.1101/2024.03.15.24304211v1
reacted to BramVanroy's post with 👍 about 1 year ago
2458
🎈 LLM Benchmarks Update!

**tl;dr: do not depend on benchmark leaderboards to choose your "chatbot" model! (Especially for non-English languages.)**

First of all, I'm discontinuing the Open #Dutch #LLM Leaderboard (https://lnkd.in/eFnsaFR6). It will stay online for now, but I urge the use of the ScandEval leaderboard instead (https://scandeval.com/dutch-nlg/) by @saattrupdan . It contains more tasks, has better reproducibility and statistics (CI) and a flexible back-end library (scandeval) to run your own benchmarks with. As part of project "Leesplank" (with Michiel Buisman and Maarten Lens-FitzGerald) we recently added GPT-4-1106-preview scores to add a good "target" to the leaderboard.

An important note here is that benchmark leaderboards are not ground truth. Evaluating generative models is especially hard. You run into issues like prompt engineering (and the sensitivity of models to one prompt or another), structured output generation, and - quite simply - "how to automatically evaluate open-ended generation".

💡 Another important but under-discussed facet is the discrepancy between models' capability of understanding vs. generating *in different languages* (so the NLU part of NLG benchmarking). In other words: some of the listed models score really well on, e.g., MCQ benchmarks but are not suitable to use as DUTCH chat bots. Interestingly, some of these models seem to understand questions in Dutch and are able to pick the right answer (because they have good knowledge or reasoning skills), but generating fluent and grammatical Dutch is something else entirely! This is perhaps also true for humans: it's easier to sort-of grasp the meaning of a new language and answer with "Yes" or "No", but answering fluently in the language is much harder! Yet, your language production fluency does not necessarily say anything about your knowledge and reasoning skills.

Hopefully we can get a chat arena for Dutch some day - user feedback is the most powerful metric!
reacted to JustinLin610's post with 👍 about 1 year ago
4452
We just released a small MoE model, Qwen1.5-MoE-A2.7B, a 14B model with 2.7B activated parameters. Hype aside, I would love to share more here on HF. But if you don't know much about this, check our blog for more info: https://qwenlm.github.io/blog/qwen-moe/

At the beginning, we were just trying out the MoE stuff, making Megatron work well with MegaBlocks. As always, we worked with small models first. However, we struggled with a lot of details.

With MegaBlocks and so many tricks that make training MoE models work, it is almost impossible to fail. The real challenge is how good your model is. Then things became more complex than I had expected. Fine-grained experts actually pissed me off, but damn, they work for a model at this scale. However, they bring complexity to the model, which is partly why our code is not merged into llama.cpp at this moment, because it really causes problems. Shared experts might be good, but we need more engineering effort to really unleash their benefits in inference acceleration.

For the community, this is actually our first time releasing an MoE model. We don't know what will happen to us, but we are prepared for complaints. I just hope that we can really make things clear, and provide a good recipe to play with our MoE model just like people playing with Mixtral.
  • 1 reply
reacted to thomwolf's post with 👍 about 1 year ago
5302
A Little guide to building Large Language Models in 2024

This is a post-recording of a 75min lecture I gave two weeks ago on how to train an LLM from scratch in 2024. I tried to keep it short and comprehensive – focusing on concepts that are crucial for training a good LLM but often hidden in tech reports.

In the lecture, I introduce the students to all the important concepts/tools/techniques for training high-performance LLMs:
* finding, preparing and evaluating web scale data
* understanding model parallelism and efficient training
* fine-tuning/aligning models
* fast inference

There are of course many things and details missing that I should have added; don't hesitate to tell me your most frustrating omission and I'll add it in a future part. In particular I think I'll add more focus on how to filter topics well and extensively and maybe more practical anecdotes and details.

Now that I recorded it I've been thinking this could be part 1 of a two-part series with a 2nd fully hands-on video on how to run all these steps with some libraries and recipes we've released recently at HF around LLM training (and could be easily adapted to your other framework anyway):
* datatrove for all things web-scale data preparation: https://github.com/huggingface/datatrove
* nanotron for lightweight 4D parallelism LLM training: https://github.com/huggingface/nanotron
* lighteval for in-training fast parallel LLM evaluations: https://github.com/huggingface/lighteval

Here is the link to watch the lecture on Youtube: https://www.youtube.com/watch?v=2-SPH9hIKT8
And here is the link to the Google slides: https://docs.google.com/presentation/d/1IkzESdOwdmwvPxIELYJi8--K3EZ98_cL6c5ZcLKSyVg/edit#slide=id.p

Enjoy and happy to hear feedback on it and what to add, correct, extend in a second part.
  • 2 replies
reacted to vladbogo's post with 👍 about 1 year ago
1710
A new paper titled "Long-Form Factuality in Large Language Models" proposes a new approach to evaluate the long-form factuality of large language models using an AI agent! They introduce SAFE (Search-Augmented Factuality Evaluator) which leverages an LLM to break down responses into individual facts, query Google to verify each fact, and perform multi-step reasoning.

Key points:
* SAFE (Search-Augmented Factuality Evaluator) is an automated method using an LLM agent to evaluate factuality
* It also introduces LongFact, a 2,280 prompt set spanning 38 topics to test open-domain factual knowledge
* SAFE agrees with human annotators 72% of the time while being 20x cheaper. It also wins 76% of the disagreements in a small-scale experiment where a more thorough human procedure (researchers + full internet search) was used.
* Larger models like GPT-4, Claude Opus and Gemini Ultra tend to exhibit better long-form factuality.
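
The evaluation loop described above (split a response into facts, check each fact, aggregate) can be sketched as below. This is a rough outline under stated assumptions, not the code from the linked repository: `split_into_facts` and `is_supported` are hypothetical stand-ins for the LLM fact-splitting and search-backed verification steps, and `supported_fraction` is a simple illustrative aggregate rather than the paper's own metric.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class FactVerdict:
    fact: str
    supported: bool


def evaluate_response(
    response: str,
    split_into_facts: Callable[[str], Iterable[str]],  # hypothetical LLM step
    is_supported: Callable[[str], bool],               # hypothetical search + reasoning step
) -> list[FactVerdict]:
    """Rate every individual fact extracted from a long-form response."""
    return [FactVerdict(fact, is_supported(fact)) for fact in split_into_facts(response)]


def supported_fraction(verdicts: list[FactVerdict]) -> float:
    """Share of extracted facts judged supported (0.0 when there are none)."""
    if not verdicts:
        return 0.0
    return sum(v.supported for v in verdicts) / len(verdicts)


if __name__ == "__main__":
    # Toy stand-ins: sentence splitting and a lookup table instead of an LLM and a search engine.
    known = {"Paris is in France", "Water boils at 100 C at sea level"}
    verdicts = evaluate_response(
        "Paris is in France. The Moon is made of cheese",
        split_into_facts=lambda text: [s.strip() for s in text.split(".") if s.strip()],
        is_supported=lambda fact: fact in known,
    )
    print(verdicts, supported_fraction(verdicts))
```

Passing the two steps in as callables keeps the outline independent of any particular model or search API.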

Paper: Long-form factuality in large language models (2403.18802)
Code and data: https://github.com/google-deepmind/long-form-factuality

Congrats to the authors for their work!