Martin Viewegger

Viewegger

AI & ML interests

None yet

Recent Activity

liked a model 25 days ago
Lightricks/LTXV-LoRAs
liked a model 27 days ago
lodestones/Chroma

Organizations

None yet

Viewegger's activity

reacted to merve's post with 🔥 about 1 month ago
New foundation model on image and video captioning just dropped by NVIDIA AI 🔥

Describe Anything Model (DAM) is a 3B vision language model to generate detailed captions with localized references 😮

The team released the models, the dataset, a new benchmark and a demo 🤩 nvidia/describe-anything-680825bb8f5e41ff0785834c

Most vision LMs focus on the image as a whole: they lack localized references in captions and do not take in visual prompts (points, boxes, drawings around objects).

DAM addresses this on two levels: a new vision backbone that takes in focal crops and the image itself, and a large-scale dataset 👀

They generate the dataset by extending existing segmentation and referring-expression generation datasets like RefCOCO: the images and classes are passed to VLMs, which generate the captions.

Lastly, they also release a new benchmark, again with self-supervision: they use an LLM to evaluate the detailed captions, focusing on localization.
New activity in mahwizzzz/orpheus-urdu-tts about 1 month ago
New activity in kadirnar/Orpheus-TTS-Starrail about 2 months ago

Speaker reference from dataset

#2 opened about 2 months ago by
Viewegger
New activity in kadirnar/Orpheus-TTS-Starrail about 2 months ago

Starrail dataset language?

#1 opened about 2 months ago by
Viewegger
reacted to alibabasglab's post with 👍 5 months ago
🎉 ClearerVoice-Studio New Feature: Speech Super-Resolution with MossFormer2! 🚀
We’re excited to announce that ClearerVoice-Studio now supports speech super-resolution, powered by our latest MossFormer2-based model!
What’s New?

🔊 Convert Low-Resolution to High-Resolution Audio:
Transform low-resolution audio (effective sampling rate ≥ 16 kHz) into crystal-clear, high-resolution audio at 48 kHz.

🤖 Cutting-Edge Technology:
Leverages the MossFormer2 model plus HiFi-GAN, optimised for generating high-quality audio with enhanced perceptual clarity.

🎧 Enhanced Listening Experience:
Perfect for speech enhancement, content restoration, and high-fidelity audio applications.

🌟 Try It Out!
Upgrade to the latest version of ClearerVoice-Studio (https://github.com/modelscope/ClearerVoice-Studio) to experience this powerful feature. Check out the updated documentation and examples in our repository.

Let us know your thoughts, feedback, or feature requests in the Issues section.