Gradio-Themes-Party (company)

AI & ML interests: None defined yet.

Recent Activity

mikonvergence 
posted an update 2 days ago
🔵 𝐂𝐎𝐏-𝐆𝐄𝐍-𝐁𝐞𝐭𝐚: 𝐔𝐧𝐢𝐟𝐢𝐞𝐝 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐌𝐨𝐝𝐞𝐥𝐥𝐢𝐧𝐠 𝐨𝐟 𝐂𝐎𝐏𝐞𝐫𝐧𝐢𝐜𝐮𝐬 𝐈𝐦𝐚𝐠𝐞𝐫𝐲 𝐓𝐡𝐮𝐦𝐛𝐧𝐚𝐢𝐥𝐬

Today we release a prototype of COP-GEN - a universal generative model for Copernicus data. 𝐂𝐎𝐏-𝐆𝐄𝐍-𝐁𝐞𝐭𝐚 is a model trained globally on the thumbnails of the Major TOM Core datasets, including Sentinel-2 L1C, Sentinel-2 L2A, Sentinel-1 RTC, and COP-DEM GLO-30.

⚖️ 𝐌𝐨𝐝𝐞𝐥 mespinosami/COP-GEN-Beta

📱 𝐃𝐞𝐦𝐨 mikonvergence/COP-GEN-Beta

How is it universal? COP-GEN learns a joint generative process over all modalities, which means it can reconstruct data from any subset of available observations. 𝐖𝐢𝐭𝐡𝐨𝐮𝐭 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐬𝐩𝐞𝐜𝐢𝐟𝐢𝐜𝐚𝐥𝐥𝐲 to perform any of these tasks, it can be used to approximate:

✅ Sentinel-1 to Sentinel-2 translation

✅ Elevation estimation from Sentinel-2 or Sentinel-1

✅ Atmospheric Correction (L1C to L2A pipeline)

✅ Atmospheric Generation (L2A to L1C)

✅ ...and any other task involving translation between the supported modalities

On its own, the model can serve as a useful prior for estimating the likelihood of Copernicus data. COP-GEN-Beta learns joint, conditional, and marginal distributions within a single unified backbone, allowing any modality to be sampled flexibly given any condition.
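To make the "any subset conditioned on any subset" idea concrete, here is a toy sketch that swaps the diffusion backbone for a small multivariate Gaussian — purely illustrative, not COP-GEN's actual method; the modality values and covariances are made up:

```python
import numpy as np

# Toy joint model over three "modalities" (think S2, S1, DEM values),
# standing in for COP-GEN's learned joint distribution. A multivariate
# Gaussian lets us condition on any observed subset in closed form.
mu = np.array([0.0, 1.0, 2.0])
cov = np.array([[1.0, 0.6, 0.3],
                [0.6, 1.0, 0.5],
                [0.3, 0.5, 1.0]])

def condition(mu, cov, obs_idx, obs_val):
    """Mean/cov of the unobserved modalities given the observed ones
    (standard Gaussian conditioning: mu_h + K (x_o - mu_o))."""
    idx = np.arange(len(mu))
    hid = np.setdiff1d(idx, obs_idx)
    S_oo = cov[np.ix_(obs_idx, obs_idx)]
    S_ho = cov[np.ix_(hid, obs_idx)]
    S_hh = cov[np.ix_(hid, hid)]
    K = S_ho @ np.linalg.inv(S_oo)
    mu_h = mu[hid] + K @ (obs_val - mu[obs_idx])
    cov_h = S_hh - K @ S_ho.T
    return hid, mu_h, cov_h

# "Sentinel-1 to Sentinel-2 translation" in miniature: observe
# modality 1, infer modalities 0 and 2 -- same model, different mask.
hid, m, c = condition(mu, cov, np.array([1]), np.array([2.0]))
```

The same `condition` call handles every task in the list above just by changing `obs_idx`, which is the essence of training one joint model instead of one translation model per task pair.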

Why is it Beta? Because thumbnails are a low-cost representation of the data that scales well, which let us develop this prototype quickly. We are currently developing the more costly full COP-GEN model that supports the original data. For now, we wanted to showcase the prototype and make it available for the community to test!

🌐 𝐖𝐞𝐛𝐬𝐢𝐭𝐞 https://miquel-espinosa.github.io/cop-gen

💻 𝐂𝐨𝐝𝐞 https://github.com/miquel-espinosa/COP-GEN-Beta

📄 𝐏𝐚𝐩𝐞𝐫 https://arxiv.org/pdf/2504.08548
mikonvergence 
posted an update 7 days ago
𝐌𝐄𝐒𝐀 🏔️ 𝐓𝐞𝐱𝐭-𝐛𝐚𝐬𝐞𝐝 𝐭𝐞𝐫𝐫𝐚𝐢𝐧 𝐠𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐨𝐧 𝐦𝐨𝐝𝐞𝐥

MESA is a novel generative model based on latent denoising diffusion capable of generating 2.5D representations (co-registered colour and depth maps) of terrains based on text prompt conditioning.

Work developed by Paul Borne–Pons (@NewtNewt) during his joint internship at Adobe & ESA, in collaboration with asterisk labs.

🏔️ 𝐏𝐫𝐨𝐣𝐞𝐜𝐭 𝐏𝐚𝐠𝐞 : https://paulbornep.github.io/mesa-terrain/

📝 𝐏𝐫𝐞𝐩𝐫𝐢𝐧𝐭 : https://arxiv.org/abs/2504.07210
🤗 𝐌𝐨𝐝𝐞𝐥 𝐖𝐞𝐢𝐠𝐡𝐭𝐬 : NewtNewt/MESA
💾 𝐃𝐚𝐭𝐚𝐬𝐞𝐭 : Major-TOM/Core-DEM
🧑🏻‍💻​𝐂𝐨𝐝𝐞 : https://github.com/PaulBorneP/MESA

𝐇𝐅 𝐒𝐩𝐚𝐜𝐞: mikonvergence/MESA
Nymbo 
posted an update 8 days ago
gen z boss and a o3-mini
chansung 
posted an update 29 days ago
A simple guide to the recipe for GRPO in Open-R1, which is built on top of TRL.

I think the FastAPI wrapper around vLLM with WeightSyncWorker is a pretty cool feature. There are also many predefined reward functions available out of the box!
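For intuition, the group-relative advantage at the heart of GRPO can be sketched in a few lines of NumPy — an illustrative toy, not TRL's implementation; the function name and reward values are made up:

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """GRPO-style advantages: normalize each completion's reward against
    the mean and std of its own group (completions sampled from the same
    prompt), so no learned value function is needed."""
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)

# Two prompts, four sampled completions each; rewards could come from
# predefined reward functions (format check, accuracy check, ...).
rewards = np.array([[1.0, 0.0, 0.0, 1.0],
                    [0.5, 0.5, 0.5, 0.5]])
adv = group_relative_advantages(rewards)
# Completions better than their group mean get positive advantage;
# a group with identical rewards gets ~zero advantage everywhere.
```

This group normalization is why GRPO wants several completions per prompt, and why fast generation (hence the vLLM integration) matters so much in practice.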
chansung 
posted an update about 1 month ago
Mistral AI Small 3.1 24B is not only free for commercial use but also arguably the best model for single-GPU deployment.

I packed all the information you need to know into a single picture. Hope this helps! :)
chansung 
posted an update about 1 month ago
Gemma 3 Release in a nutshell
(function calling seems unsupported, although the announcement said otherwise)
chansung 
posted an update 3 months ago
Simple Paper Review #5

I briefly reviewed the paper "SFT Memorizes, RL Generalizes," which compares SFT and RL in LLM/VLM post-training, from HKU, UC Berkeley, Google DeepMind, and New York University.

The conclusion suggests SFT excels at memorization, while RL is better for generalization. However, since LLMs/VLMs should benefit humans beyond generalization alone, a mix of SFT and RL is advisable. Typically, some SFT is applied first so the model understands prompt formats, followed by RL to enhance generalization through trial and error.

The study focused on one model, Llama-3.2-Vision-11B, using environments like General Points for arithmetic reasoning and V-IRL for spatial reasoning. Training data was used for both SFT and RL, with evaluations on in-distribution and out-of-distribution data to assess memorization and generalization.

I want to apply RL extensively, but it requires building a similar simulation environment. For domain-specific models, significant investment in creating a "playground" for the model is crucial, as the effort will directly influence the outcomes.

https://arxiv.org/abs/2501.17161
chansung 
posted an update 3 months ago
A brief summary of the o3-mini

The OpenAI o3-mini model is a significant improvement over the o1-mini, reaching o1 performance levels. While generally good, its performance isn't universally better than that of previous models (o1, o1-preview) or GPT-4o across all benchmarks. This means workflows should be re-evaluated with each model upgrade.

The o3-mini has "low," "medium," and "high" versions, with "low" being the base model used for benchmarking. It's speculated that the higher versions simply involve more processing. A fair comparison with other models like Gemini 2.0 Thinking or DeepSeek-R1 would likely need to use the "low" version and a similar "think more" mechanism.

The system card is recommended reading due to its comprehensive benchmark data.

https://openai.com/index/openai-o3-mini/
chansung 
posted an update 3 months ago
Simple summary on DeepSeek AI's Janus-Pro: A fresh take on multimodal AI!

It builds on its predecessor, Janus, by tweaking the training methodology rather than the model architecture. The result? Improved performance in understanding and generating multimodal data.

Janus-Pro uses a three-stage training strategy, similar to Janus, but with key modifications:
✦ Stage 1 & 2: Focus on separate training for specific objectives, rather than mixing data.
✦ Stage 3: Fine-tuning with a careful balance of multimodal data.

Benchmarks show Janus-Pro holds its own against specialized models like TokenFlow XL and MetaMorph, and other multimodal models like SD3 Medium and DALL-E 3.

The main limitation? Low image resolution (384x384). However, this seems like a strategic choice to focus on establishing a solid "recipe" for multimodal models. Future work will likely leverage this recipe and increased computing power to achieve higher resolutions.
chansung 
posted an update 3 months ago
A new look for AI-powered paper reviews of the Hugging Face Daily Papers list (managed by @akhaliq).

Bookmark the webpage, check the comprehensive reviews by Google DeepMind's Gemini 1.5, and listen to the audio podcast made with the same tech used in NotebookLM.

Link: https://deep-diver.github.io/ai-paper-reviewer/

This is not an official service by Hugging Face. It is just a service developed by an individual developer using his own money :)
chansung 
posted an update 3 months ago
Simple summarization of Evolving Deeper LLM Thinking (Google DeepMind)

The process starts by posing a question.
1) The LLM generates initial responses.
2) These generated responses are evaluated according to specific criteria (program-based checker).
3) The LLM critiques the evaluated results.
4) The LLM refines the responses based on the evaluation, critique, and original responses.

The refined response is then fed back into step 2). If it meets the criteria, the process ends. Otherwise, the algorithm generates more responses based on the refined ones (with some being discarded, some remaining, and some responses potentially being merged).

Through this process, it demonstrated excellent performance in complex scheduling problems (travel planning, meeting scheduling, etc.). It's a viable method for finding highly effective solutions in specific scenarios.
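The steps above can be sketched as a generic refinement loop; `generate`, `evaluate`, `critique`, and `refine` are hypothetical stand-ins for the LLM and program-based checker calls (a sketch of the loop's shape, not the paper's implementation):

```python
import random

def evolve(generate, evaluate, critique, refine,
           pop_size=4, max_rounds=10):
    """Evolutionary refinement loop: generate -> evaluate -> critique
    -> refine, repeated until a response meets the criteria."""
    population = [generate() for _ in range(pop_size)]      # step 1
    for _ in range(max_rounds):
        scored = sorted(((evaluate(r), r) for r in population),
                        reverse=True)                       # step 2
        best_score, best = scored[0]
        if best_score >= 1.0:           # meets the criteria: done
            return best
        # steps 3-4: refine every response using score + critique
        refined = [refine(r, s, critique(r, s)) for s, r in scored]
        # selection: keep the better half, regrow the rest from survivors
        refined.sort(key=evaluate, reverse=True)
        survivors = refined[: max(1, pop_size // 2)]
        population = survivors + [
            refine(random.choice(survivors), 0.0, "explore")
            for _ in range(pop_size - len(survivors))
        ]
    return max(population, key=evaluate)

# Toy run: responses are integers, the "checker" wants a value >= 5,
# and "refining" just increments -- enough to see the loop converge.
best = evolve(lambda: 0,
              lambda r: 1.0 if r >= 5 else r / 5,
              lambda r, s: "increase",
              lambda r, s, c: r + 1)
```

In the real method each callable would be an API call, which is exactly where the latency and cost drawbacks below come from.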

However, there are two major drawbacks:
🤔 An excessive number of API calls are required. (While the cost might not be very high, it leads to significant latency.)
🤔 The evaluator is program-based. (This limits its use as a general method. It could potentially be modified/implemented using LLM as Judge, but that would introduce additional API costs for evaluation.)

https://arxiv.org/abs/2501.09891
chansung 
posted an update 3 months ago
Simple Summarization of DeepSeek-R1 from DeepSeek AI

The RL stage is very important.
↳ However, it is difficult to create a truly helpful AI for people solely through RL.
↳ So, we applied a learning pipeline consisting of four stages: providing a good starting point, reasoning RL, SFT, and safety RL, and achieved performance comparable to o1.
↳ Simply fine-tuning other open models with the data generated by R1-Zero (distillation) resulted in performance comparable to o1-mini.

Of course, this is just a brief overview and may not be of much help. All models are accessible on Hugging Face, and the paper can be read through the GitHub repository.


Model: deepseek-ai
Paper: https://github.com/deepseek-ai/DeepSeek-R1