fal

AI & ML interests

generative media platform for developers

fal's activity

benjamin-paine posted an update 2 months ago
Zonos is flying up the trending tab, and for good reason - it's the most expressive and emotive open-source TTS I've used to date. I'm happy to say it's now supported in Taproot, with added long-form synthesis support and other goodies.

Try it here: https://huggingface.co/spaces/benjamin-paine/zonos-longform

Getting started with Zonos in Taproot is easy; with a working CUDA toolkit and Python/Pip installation, all you have to do is:
apt install espeak-ng
pip install taproot
taproot install speech-synthesis:zonos-transformer
taproot invoke speech-synthesis:zonos-transformer --text "Hello, world!"

See more on GitHub at https://github.com/painebenjamin/taproot/

benjamin-paine posted an update 4 months ago
Hello HuggingFace 🤗, and happy new year! 🎆

I'm thrilled to be releasing the first iteration of a project I've been working on for quite a while now. It's called Taproot, and it's a seamlessly scalable open-source AI/ML inference engine designed to let developers build real-time experiences distributed across a small-to-mid-sized cluster, without the burden of hyperscale infrastructure.

Along with the server and task framework, there is a client library for Node and the browser. And what good is a server and client without an app to go alongside it? To that end, I'm also releasing Anachrovox, a fun, real-time, hands-free voice assistant that can run on mid-level devices in under 12GB of VRAM, with web search, weather, and other tools. It uses my real-time browser wake-word library to detect utterances of the phrases 'Hey Vox', 'Hi Vox', 'Okay Vox', 'Anachrovox', or just 'Vox' (along with some others).

Releasing this many things at once will definitely result in bugs, so please report them when sighted! Thank you all!

Taproot: https://github.com/painebenjamin/taproot
Taproot JS Client: https://github.com/painebenjamin/taproot.js
Anachrovox: https://github.com/painebenjamin/anachrovox

The Anachrovox Spaces are networked together, balancing load across them to keep all front-ends responsive. You only have to choose what color you like the most!

https://huggingface.co/spaces/benjamin-paine/anachrovox
https://huggingface.co/spaces/benjamin-paine/anachrovox-amber

gokaygokay posted an update 8 months ago
FLUX Prompt Generator Updates

- gokaygokay/FLUX-Prompt-Generator

- There are now hundreds of new selections across diverse categories, each offering a lot of choices:

Architecture, Art, Artist, Brands, Character, Cinematic, Fashion, Feelings, Geography, Human, Interaction, Keywords, Objects, People, Photography, Plots, Poses, Scene, Science, Stuff, Time, Typography, Vehicle, Video Game

- In addition to Hugging Face, I've integrated new LLM providers: Groq, OpenAI, and Claude.

- Upgraded Vision Language Models (VLMs): We now feature Qwen2-VL, JoyCaption and Florence-2-large.

- New specialized system prompts for various styles and themes, including Happy, Simple, Poster, Only Objects, No Figure, Landscape, Fantasy.

isidentical posted an update 8 months ago
Added FLUX.1 pro/dev/schnell and AuraFlow v0.2 to fal/imgsys !!! Go play with it and get us some votez

isidentical posted an update 8 months ago
fal/AuraFlow-v0.3 is now here with support for different aspect resolutions (w/h up to 1536px!) and much nicer aesthetics! Make sure to install the latest diffusers to get support for it.
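
For anyone who wants to try the new resolutions, here is a minimal sketch, assuming a recent diffusers release that ships `AuraFlowPipeline` and a CUDA GPU. The `check_size` helper and `generate` wrapper are hypothetical names used for illustration; the 1536px bound comes only from the limit mentioned above and is not something diffusers enforces.

```python
MAX_SIDE = 1536  # per-side limit quoted in the post; an assumption, not a diffusers constant


def check_size(width: int, height: int) -> tuple[int, int]:
    """Hypothetical helper: reject sizes outside the range the post describes."""
    for name, value in (("width", width), ("height", height)):
        if not 1 <= value <= MAX_SIDE:
            raise ValueError(f"{name}={value} is outside 1..{MAX_SIDE}")
    return width, height


def generate(prompt: str, width: int = 1024, height: int = 1024):
    """Load fal/AuraFlow-v0.3 and render one image (assumes a CUDA GPU)."""
    width, height = check_size(width, height)
    # Imported lazily so the size check works without torch/diffusers installed.
    import torch
    from diffusers import AuraFlowPipeline

    pipe = AuraFlowPipeline.from_pretrained(
        "fal/AuraFlow-v0.3", torch_dtype=torch.float16
    ).to("cuda")
    return pipe(prompt=prompt, width=width, height=height).images[0]


# Example (downloads the weights on first use):
# generate("a lighthouse at dusk, oil painting", width=1536, height=768).save("out.png")
```

The size check runs before the weights are loaded, so an out-of-range request fails fast instead of after a long download.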

gokaygokay posted an update 9 months ago
I've built a space for creating prompts for FLUX

gokaygokay/FLUX-Prompt-Generator

You can create long prompts from images or simple words. Enhance your short prompts with the prompt enhancer. You can configure various settings such as artform, photo type, character details, scene details, style, and artist to create tailored prompts.

And you can combine all of them with custom prompts using LLMs (Mixtral, Mistral, Llama 3, and Mistral-Nemo).

The UI is a bit complex, but it includes almost everything you need. Choosing the random option is the most fun!

And I've created some other spaces for using FLUX models with captioners and enhancers.

- gokaygokay/FLUX.1-dev-with-Captioner

isidentical posted an update 10 months ago
Announcing the second open model in our Aura series of media models at @fal: fal/AuraFlow

Try it using diffusers or ComfyUI from publicly available weights, and read more about it in our blog https://blog.fal.ai/auraflow.

gokaygokay posted an update 10 months ago
Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis

Kolors is a large-scale text-to-image generation model based on latent diffusion, developed by the Kuaishou Kolors team.

Hugging Face Spaces
- gokaygokay/Kolors

Model Page
- Kwai-Kolors/Kolors

gokaygokay posted an update 10 months ago
I've created a space for chatting with Gemma 2 using llama.cpp

- 🎛️ Choose between 27B IT and 9B IT models
- 🚀 Fast inference using llama.cpp

- gokaygokay/Gemma-2-llamacpp

gokaygokay posted an update 10 months ago
I've created a Stable Diffusion 3 (SD3) image generation space for convenience. Now you can:

1. Generate SD3 prompts from images
2. Enhance your text prompts (turn 1-2 words into full SD3 prompts)

https://huggingface.co/spaces/gokaygokay/SD3-with-VLM-and-Prompt-Enhancer

These features are based on my custom models:

- VLM captioner for prompt generation:
- gokaygokay/sd3-long-captioner

- Prompt Enhancers for SD3 Models:
- gokaygokay/Lamini-Prompt-Enchance-Long
- gokaygokay/Lamini-Prompt-Enchance

You can now simplify your SD3 workflow with these tools!

isidentical posted an update 10 months ago
It is time for some Aura.

First in our series of fully open-sourced / commercially available models by @fal-ai: AuraSR - a 600M parameter upscaler based on GigaGAN.

Blog: https://blog.fal.ai/introducing-aurasr-an-open-reproduction-of-the-gigagan-upscaler-2/

HF: https://huggingface.co/fal-ai/AuraSR

Code: https://github.com/fal-ai/aura-sr

Playground: https://fal.ai/models/fal-ai/aura-sr/playground

What other models would you like to see open-sourced and commercially available? :)

gokaygokay posted an update 10 months ago
I've fine-tuned three types of PaliGemma image captioner models for generating prompts for Text2Image models. They generate captions similar to prompts we give to the image generation models. I used google/docci and google/imageinwords datasets for fine-tuning.

This one gives you longer captions.

gokaygokay/SD3-Long-Captioner

This one gives you medium-length captions.

https://huggingface.co/spaces/gokaygokay/SD3-Long-Captioner-V2

And this one gives you shorter captions.

https://huggingface.co/spaces/gokaygokay/SDXL-Captioner

isidentical posted an update 11 months ago
One-shot evaluation is hard. That is honestly what I learnt over the last couple of weeks while trying to make the imgsys.org data more and more relevant. There is just so much diversity in these models that saying one is better than another, even in a particular domain, is impossible.

If you have any suggestions on how we can make one-shot, single-question image model testing easier, please share them in this thread so we can provide a more meaningful data point to the community!

Warlord-K posted an update 12 months ago
What are some areas that image generation models are currently lacking in?

isidentical posted an update over 1 year ago
What is the current SOTA in terms of fast personalized image generation? Most of the techniques that produce great results (which is hard to measure objectively, but with a subject similarity index close to 80-90%) either take too much time (full-on DreamBooth fine-tuning of the base model) or lose the auxiliary properties (high-rank LoRAs).

We have also been testing face embeddings, but even with multiple samples the quality is nowhere close to what we expect. Even for the techniques that work, high-quality (studio-level) pictures seem to be a must, so another avenue I'm curious about is whether people have looked into automatically filtering/segmenting the input samples in the past.