Hugging Face
Bilge Yücel
bilgeyucel
18 followers · 6 following
bilgeyucl
bilgeyucel
bilge-yucel
AI & ML interests
NLP, Semantic Search, LLMs
Recent Activity
Reacted to anakin87's post with ❤️ 4 days ago

Haystack can now see. The latest release of the Haystack OSS LLM framework adds a long-requested feature: image support! Notebooks below.

This isn't just about passing images to an LLM. We built several features to enable practical multimodal use cases.

What's new?
- Support for multiple LLM providers: OpenAI, Amazon Bedrock, Google Gemini, Mistral, NVIDIA, OpenRouter, Ollama and more (support for Hugging Face API coming)
- Prompt template language to handle structured inputs, including images
- PDF and image converters
- Image embedders using CLIP-like models
- LLM-based extractor to pull text from images
- Components to build multimodal RAG pipelines and Agents

I had the chance of leading this effort with @sjrhuschlee (great collab).

Below you can find two notebooks to explore the new features:
- Introduction to Multimodal Text Generation: https://haystack.deepset.ai/cookbook/multimodal_intro
- Creating Vision+Text RAG Pipelines: https://haystack.deepset.ai/tutorials/46_multimodal_rag

(image by @bilgeyucel)
Reacted to anakin87's post with 🔥 4 days ago

Building Browser Agents - notebook

No API? No problem. Browser Agents can use websites like you do: click, type, wait, read.

Step-by-step notebook: https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/notebooks/browser_agents.ipynb

In the video, the Agent:
- Goes to Hugging Face Spaces
- Finds https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
- Expands a short prompt ("my holiday on Lake Como") into a detailed image generation prompt
- Waits for the image
- Returns the image URL

## What else can it do?
Great for information gathering and summarization
- Compare news websites and create a table of shared stories with links
- Find content creator social profiles from YouTube videos
- Find a product's price range on Amazon
- Gather public transportation travel options

## How is it built?
- Haystack: Agent execution logic
- Google Gemini 2.5 Flash: Good and fast LLM with a generous free tier
- Playwright MCP server: Browser automation tools (navigate, click, type, wait...)

Even without vision capabilities, this setup can get quite far.

## Next steps
- Try a local open model
- Move from notebook to real deployment
- Incorporate vision

And you? Have you built something similar? What's in your stack?
Liked a model about 1 month ago: Trendyol/Trendyol-LLM-8B-T1
Organizations
Spaces: 1
Captionate 📸 • Running • 15
Generate Instagram captions from images
Models: 0
None public yet
Datasets: 1
bilgeyucel/seven-wonders
Viewer • Updated Mar 9, 2023 • 151 • 1.42k • 5