Recent Activity

C4AI-Community's activity

mmhamdy 
posted an update 4 days ago
What inspired the Transformer architecture in the "Attention Is All You Need" paper? And how were various ideas combined to create this groundbreaking model?

In this lengthy article, I explore the story and the origins of some of the ideas introduced in the paper. We'll cover everything from the fundamental attention mechanism that lies at its heart to the surprisingly simple explanation for its name, Transformer.

💡 Examples of ideas explored in the article:

✅ What was the inspiration for the attention mechanism?
✅ How did we go from attention to self-attention?
✅ Did the team have any other names in mind for the model?

and more...

I aim to tell the story of Transformers as I would have wanted to read it, and hopefully, one that appeals to others interested in the details of this fascinating idea. This narrative draws from video interviews, lectures, articles, tweets/Xs, and some digging into the literature. I have done my best to be accurate, but errors are possible. If you find inaccuracies or have any additions, please do reach out, and I will gladly make the necessary updates.

Read the article: https://huggingface.co/blog/mmhamdy/pandemonium-the-transformers-story
prithivMLmods 
posted an update 4 days ago
Luna is a single-speaker text-to-speech model with a female voice and a Radio & Atcosim-style sound. It offers authentic radio podcast noise and empathetic speech generation, and is fine-tuned from Orpheus, a state-of-the-art Llama-based speech generation model. 🎙️

+ Model : prithivMLmods/Llama-3B-Mono-Luna
+ Collection : prithivMLmods/clean-radio-mono-voice-67e76fe1b3a87cc3bccef803
+ Reference ft : https://github.com/canopyai/Orpheus-TTS
+ Base Model : canopylabs/orpheus-3b-0.1-ft

I also tried some other clean-voice single-speaker models based on Orpheus. If you're interested, check out the collection.

🔉Try the Mono Luna demo here: http://colab.research.google.com/drive/1K0AAIOKDE5XE0znxXaiiUJvPSpFveteK
prithivMLmods 
posted an update 7 days ago
Dropping some new Journey Art and Realism adapters for Flux.1-Dev, including Thematic Arts, 2021 Memory Adapters, Thread of Art, Black of Art, and more. For more details, visit the model card on Stranger Zone HF 🤗

+ Black-of-Art-Flux : strangerzonehf/Black-of-Art-Flux
+ Thread-of-Art-Flux : strangerzonehf/Thread-of-Art-Flux
+ 2021-Art-Flux : strangerzonehf/2021-Art-Flux
+ 3d-Station-Toon : strangerzonehf/3d-Station-Toon
+ New-Journey-Art-Flux : strangerzonehf/New-Journey-Art-Flux
+ Casual-Pencil-Pro : strangerzonehf/Casual-Pencil-Pro
+ Realism-H6-Flux : strangerzonehf/Realism-H6-Flux

- Repository Page : strangerzonehf

Recommended dimensions and inference settings: a resolution of 1280 x 832 (roughly a 3:2 aspect ratio) gives the best quality, while 1024 x 1024 (1:1) is the default. For inference, use between 30 and 35 steps.
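As a quick sanity check on these settings, here is a minimal sketch that stores them in plain dictionaries and reduces each resolution to its exact aspect ratio. The dictionary layout and the helper names (`aspect_ratio`, `steps_in_range`) are illustrative assumptions, not part of any adapter's model card.

```python
from math import gcd

# Values quoted in the post; the dictionary layout is only illustrative.
RECOMMENDED = {"width": 1280, "height": 832, "steps": (30, 35)}
DEFAULT = {"width": 1024, "height": 1024, "steps": (30, 35)}


def aspect_ratio(width: int, height: int) -> tuple[int, int]:
    """Reduce width:height to lowest terms."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


def steps_in_range(steps: int, settings: dict) -> bool:
    """Check that a step count falls inside the suggested range."""
    low, high = settings["steps"]
    return low <= steps <= high


if __name__ == "__main__":
    for name, cfg in (("recommended", RECOMMENDED), ("default", DEFAULT)):
        print(name, aspect_ratio(cfg["width"], cfg["height"]))
```

Reduced exactly, 1280 x 832 is 20:13 (about 1.54), which is close to but not exactly 3:2.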
prithivMLmods 
posted an update 9 days ago
Dropping models for downstream tasks: newly initialized classifier parameters ([classifier.bias & classifier.weight]) support domain-specific image classification. Based on siglip2-base-patch16-224 and DomainNet (single-domain, multi-source adaptation), with Fashion-MNIST and other datasets for experimental testing. 🧤☄️

Fashion-Mnist : prithivMLmods/Fashion-Mnist-SigLIP2
Age-Classification : prithivMLmods/Age-Classification-SigLIP2
Mnist-Digits : prithivMLmods/Mnist-Digits-SigLIP2
Multisource-121 : prithivMLmods/Multisource-121-DomainNet
Painting-126 : prithivMLmods/Painting-126-DomainNet
Sketch-126 : prithivMLmods/Sketch-126-DomainNet
Clipart-126 : prithivMLmods/Clipart-126-DomainNet

Models are trained with different parameter settings for experimental purposes only, with the intent of further development. Refer to the model page below for instructions on running it with Transformers 🤗.

Collection : prithivMLmods/domainnet-0324-67e0e3c934c03cc40c6c8782

Citations : SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features https://arxiv.org/pdf/2502.14786 & Moment Matching for Multi-Source Domain Adaptation : https://arxiv.org/pdf/1812.01754

prithivMLmods 
posted an update 13 days ago
Play with Orpheus TTS, a Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been fine-tuned to deliver human-level speech synthesis 🔥🗣️

👉GitHub [ Demo ] : https://github.com/PRITHIVSAKTHIUR/Orpheus-TTS-Edge

The demo supports both text-to-speech and LLM responses to text prompts delivered as speech.

> voice: tara, dan, emma, josh
> emotion: <laugh>, <chuckle>, <sigh>, <cough>, <sniffle>, <groan>, <yawn>, <gasp>.
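To make the voice and emotion options concrete, here is a small sketch that validates a request against the names listed above before building a prompt string. The `voice: text` layout returned by the hypothetical `build_prompt` helper is an assumption for illustration; check the model page for the exact prompt format the model expects.

```python
import re

# Voices and emotion tags exactly as listed in the post.
VOICES = {"tara", "dan", "emma", "josh"}
EMOTION_TAGS = {
    "<laugh>", "<chuckle>", "<sigh>", "<cough>",
    "<sniffle>", "<groan>", "<yawn>", "<gasp>",
}


def build_prompt(voice: str, text: str) -> str:
    """Validate the voice and any <tag> markers, then join them.

    The "voice: text" layout is assumed, not confirmed by the post.
    """
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    unknown = [t for t in re.findall(r"<[^<>]+>", text) if t not in EMOTION_TAGS]
    if unknown:
        raise ValueError(f"unsupported emotion tags: {unknown}")
    return f"{voice}: {text}"


if __name__ == "__main__":
    print(build_prompt("tara", "Well <sigh> that took a while."))
```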

🥠Orpheus-3b-0.1-ft
Model Page: canopylabs/orpheus-3b-0.1-ft

🥠Orpheus-3b-0.1-ft
Colab Inference Notebook: https://colab.research.google.com/drive/1KhXT56UePPUHhqitJNUxq63k-pQomz3N?usp=sharing

🥠Finetune [ orpheus-3b-0.1-pretrained ]
Resource: https://github.com/canopyai/Orpheus-TTS/tree/main/finetune

🥠Model-releases:
https://canopylabs.ai/model-releases
prithivMLmods 
posted an update 19 days ago
Hey Guys! One Small Announcement 🤗
Stranger Zone now accepts LoRA requests!

✍️Request : strangerzonehf/Request-LoRA [ or ] strangerzonehf/Request-LoRA#1

Page : strangerzonehf

Describe the artistic properties by posting sample images or links to similar images in the request discussion. If the adapters you're asking for are truly creative and safe for work, I'll train and upload the LoRA to the Stranger Zone repo!

Thank you!
prithivMLmods 
posted an update 21 days ago
Gemma-3-4B : Image and Video Inference 🖼️🎥

🧤Space: prithivMLmods/Gemma-3-Multimodal
🥠Git : https://github.com/PRITHIVSAKTHIUR/Gemma-3-Multimodal

@gemma3 : {Tag + Space_+ 'prompt'}
@video-infer : {Tag + Space_+ 'prompt'}

+ Gemma3-4B : google/gemma-3-4b-it
+ By default, it runs : prithivMLmods/Qwen2-VL-OCR-2B-Instruct

Gemma 3 Technical Report : https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf
prithivMLmods 
posted an update 22 days ago
prithivMLmods 
posted an update 28 days ago
singhsidhukuldeep 
posted an update about 1 month ago
Exciting New Tool for Knowledge Graph Extraction from Plain Text!

I just came across a groundbreaking new tool called KGGen that's solving a major challenge in the AI world - the scarcity of high-quality knowledge graph data.

KGGen is an open-source Python package that leverages language models to extract knowledge graphs (KGs) from plain text. What makes it special is its innovative approach to clustering related entities, which significantly reduces sparsity in the extracted KGs.

The technical approach is fascinating:

1. KGGen uses a multi-stage process involving an LLM (GPT-4o in their implementation) to extract entities and relations from source text
2. It aggregates graphs across sources to reduce redundancy
3. Most importantly, it applies iterative LM-based clustering to refine the raw graph

The clustering stage is particularly innovative - it identifies which nodes and edges refer to the same underlying entities or concepts. This normalizes variations in tense, plurality, stemming, and capitalization (e.g., "labors" clustered with "labor").
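The aggregate-then-cluster idea can be sketched in a few lines of Python. This is not KGGen's code or API: the function names are hypothetical, and a crude string rule (lowercasing and stripping a trailing "s") stands in for the LM-based clustering the post describes.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (subject, relation, object)


def normalize(label: str) -> str:
    """Crude stand-in for LM-based clustering: lowercase, strip a plural 's'."""
    label = label.strip().lower()
    if label.endswith("s") and not label.endswith("ss") and len(label) > 3:
        label = label[:-1]
    return label


def aggregate(graphs: list[list[Triple]]) -> set[Triple]:
    """Merge per-source graphs, dropping exact duplicate triples."""
    return {triple for graph in graphs for triple in graph}


def cluster(triples: set[Triple]) -> tuple[set[Triple], dict[str, set[str]]]:
    """Map each node and edge label to a canonical form and record the clusters."""
    clusters: dict[str, set[str]] = defaultdict(set)
    merged: set[Triple] = set()
    for subj, rel, obj in triples:
        canon = (normalize(subj), normalize(rel), normalize(obj))
        for raw, key in zip((subj, rel, obj), canon):
            clusters[key].add(raw)
        merged.add(canon)
    return merged, dict(clusters)


if __name__ == "__main__":
    source_a = [("Labors", "require", "Effort")]
    source_b = [("labor", "requires", "effort")]
    merged, clusters = cluster(aggregate([source_a, source_b]))
    print(merged)
    print(clusters["labor"])
```

Two triples that differ only in capitalization and plurality collapse into one, which is the sparsity reduction the clustering stage is after.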

The researchers from Stanford and University of Toronto also introduced MINE (Measure of Information in Nodes and Edges), the first benchmark for evaluating KG extractors. When tested against existing methods like OpenIE and GraphRAG, KGGen outperformed them by up to 18%.

For anyone working with knowledge graphs, RAG systems, or KG embeddings, this tool addresses the fundamental challenge of data scarcity that's been holding back progress in graph-based foundation models.

The package is available via pip install kg-gen, making it accessible to everyone. This could be a game-changer for knowledge graph applications!
singhsidhukuldeep 
posted an update about 1 month ago
Exciting Research Alert: Enhancing Dense Retrieval with Deliberate Thinking

I just came across a fascinating new paper titled "Learning More Effective Representations for Dense Retrieval through Deliberate Thinking Before Search" that introduces DEBATER (Deliberate Thinking based Dense Retriever), a novel approach to improve information retrieval using large language models.

The research team from Northeastern University and Tsinghua University has developed a method that significantly outperforms existing dense retrieval systems by enabling LLMs to "think deliberately" before generating document representations.

>> Technical Details

DEBATER enhances LLM-based retrievers through two key mechanisms:

1. Chain-of-Deliberation (CoD): This approach delays the computation of document embeddings by performing several steps of reasoning. It incorporates a sequence of prompt tokens that stimulate the reasoning capability of LLMs, encouraging the model to think step-by-step before producing the final document embedding.

2. Self Distillation (SD): This mechanism distills knowledge from different thinking steps into the final document representation. It identifies the most informative thinking steps and integrates them into a unified text embedding.

The implementation uses cosine similarity to measure the similarity between queries and documents. During training, DEBATER calculates similarity scores between query representation and document representations at each thinking step, then selects the most useful thinking step from CoD.
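The per-step scoring can be illustrated with a minimal sketch. It assumes the document representation at each thinking step is already available as a plain vector, and it treats "most useful" as "highest cosine similarity to the query", which is an interpretation rather than something the post states. The function names are hypothetical, and the Self Distillation training objective is not reproduced.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_thinking_step(
    query: list[float], step_embeddings: list[list[float]]
) -> tuple[int, list[float]]:
    """Score the query against each step's embedding; return the top index and all scores."""
    scores = [cosine(query, doc) for doc in step_embeddings]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    query = [1.0, 0.0]
    # One toy document embedding per thinking step.
    steps = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
    index, scores = best_thinking_step(query, steps)
    print(index, scores)
```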

>> Performance

What's particularly impressive is that DEBATER-4B outperforms larger 7B-scale LLM-based dense retrievers while using significantly fewer parameters. In experiments on the BEIR benchmark, DEBATER achieved more than a 2% improvement over baseline retrievers.

The researchers found that an appropriate thinking depth (around 4-8 steps) effectively activates the reasoning capabilities of LLM-based retrievers.