FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation Paper • 2403.06775 • Published Mar 11 • 3
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale Paper • 2010.11929 • Published Oct 22, 2020 • 6
Data Incubation -- Synthesizing Missing Data for Handwriting Recognition Paper • 2110.07040 • Published Oct 13, 2021 • 2
A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks Paper • 1811.00056 • Published Oct 31, 2018 • 2
Data Generation for Post-OCR correction of Cyrillic handwriting Paper • 2311.15896 • Published Nov 27, 2023 • 3
Character Queries: A Transformer-based Approach to On-Line Handwritten Character Segmentation Paper • 2309.03072 • Published Sep 6, 2023 • 2
BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models Paper • 2003.11142 • Published Mar 24, 2020 • 2
U-Net: Convolutional Networks for Biomedical Image Segmentation Paper • 1505.04597 • Published May 18, 2015 • 7
Image Segmentation using U-Net Architecture for Powder X-ray Diffraction Images Paper • 2310.16186 • Published Oct 24, 2023 • 2
RTSeg: Real-time Semantic Segmentation Comparative Study Paper • 1803.02758 • Published Mar 7, 2018 • 2
Generalizability vs. Robustness: Adversarial Examples for Medical Imaging Paper • 1804.00504 • Published Mar 23, 2018 • 2
Hierarchical multi-class segmentation of glioma images using networks with multi-level activation function Paper • 1810.09488 • Published Oct 22, 2018 • 2
IVD-Net: Intervertebral disc localization and segmentation in MRI with a multi-modal UNet Paper • 1811.08305 • Published Nov 19, 2018 • 2
A multi-path 2.5 dimensional convolutional neural network system for segmenting stroke lesions in brain MRI images Paper • 1905.10835 • Published May 26, 2019 • 2
Enforcing temporal consistency in Deep Learning segmentation of brain MR images Paper • 1906.07160 • Published Jun 13, 2019 • 3
Skip-Connected Neural Networks with Layout Graphs for Floor Plan Auto-Generation Paper • 2309.13881 • Published Sep 25, 2023 • 2
Inter-Scale Dependency Modeling for Skin Lesion Segmentation with Transformer-based Networks Paper • 2310.13727 • Published Oct 20, 2023 • 2
Latent Diffusion Model for Medical Image Standardization and Enhancement Paper • 2310.05237 • Published Oct 8, 2023 • 2
3D Medical Image Segmentation based on multi-scale MPU-Net Paper • 2307.05799 • Published Jul 11, 2023 • 2
Self-Supervised U-Net for Segmenting Flat and Sessile Polyps Paper • 2110.08776 • Published Oct 17, 2021 • 2
Enforcing Morphological Information in Fully Convolutional Networks to Improve Cell Instance Segmentation in Fluorescence Microscopy Images Paper • 2106.05843 • Published Jun 10, 2021 • 2
Saliency-Guided Deep Learning Network for Automatic Tumor Bed Volume Delineation in Post-operative Breast Irradiation Paper • 2105.02771 • Published May 6, 2021 • 2
Qutrit-inspired Fully Self-supervised Shallow Quantum Learning Network for Brain Tumor Segmentation Paper • 2009.06767 • Published Sep 14, 2020 • 2
The Effects of Image Pre- and Post-Processing, Wavelet Decomposition, and Local Binary Patterns on U-Nets for Skin Lesion Segmentation Paper • 1805.05239 • Published Apr 30, 2018 • 2
A joint 3D UNet-Graph Neural Network-based method for Airway Segmentation from chest CTs Paper • 1908.08588 • Published Aug 22, 2019 • 2
Joint Liver and Hepatic Lesion Segmentation in MRI using a Hybrid CNN with Transformer Layers Paper • 2201.10981 • Published Jan 26, 2022 • 2
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows Paper • 2107.00652 • Published Jul 1, 2021 • 2
2nd Place Solution to Google Landmark Recognition Competition 2021 Paper • 2110.02638 • Published Oct 6, 2021 • 2
Long-tailed Recognition by Routing Diverse Distribution-Aware Experts Paper • 2010.01809 • Published Oct 5, 2020 • 2
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows Paper • 2103.14030 • Published Mar 25, 2021 • 4
A Novel Transformer Based Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images Paper • 2104.12137 • Published Apr 25, 2021 • 2
Bootstrap your own latent: A new approach to self-supervised Learning Paper • 2006.07733 • Published Jun 13, 2020 • 2
Evaluating Transformer-based Semantic Segmentation Networks for Pathological Image Segmentation Paper • 2108.11993 • Published Aug 26, 2021 • 2
Using Multi-scale SwinTransformer-HTC with Data augmentation in CoNIC Challenge Paper • 2202.13588 • Published Feb 28, 2022 • 2
From Modern CNNs to Vision Transformers: Assessing the Performance, Robustness, and Classification Strategies of Deep Learning Models in Histopathology Paper • 2204.05044 • Published Apr 11, 2022 • 2
Emerging Properties in Self-Supervised Vision Transformers Paper • 2104.14294 • Published Apr 29, 2021 • 3
GasHis-Transformer: A Multi-scale Visual Transformer Approach for Gastric Histopathological Image Detection Paper • 2104.14528 • Published Apr 29, 2021 • 2
CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation Paper • 2401.12208 • Published Jan 22 • 21
DAS: A Deformable Attention to Capture Salient Information in CNNs Paper • 2311.12091 • Published Nov 20, 2023 • 2
TANKER: Distributed Architecture for Named Entity Recognition and Disambiguation Paper • 1708.09230 • Published Aug 30, 2017
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 124
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering Paper • 2403.09622 • Published Mar 14 • 16
VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding Paper • 2403.09530 • Published Mar 14 • 8
LocalMamba: Visual State Space Model with Windowed Selective Scan Paper • 2403.09338 • Published Mar 14 • 7
GiT: Towards Generalist Vision Transformer through Universal Language Interface Paper • 2403.09394 • Published Mar 14 • 25
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control Paper • 2403.09055 • Published Mar 14 • 24
Language Grounded QFormer for Efficient Vision Language Understanding Paper • 2311.07449 • Published Nov 13, 2023 • 2
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models Paper • 2112.10741 • Published Dec 20, 2021 • 3
Synthetic Shifts to Initial Seed Vector Exposes the Brittle Nature of Latent-Based Diffusion Models Paper • 2312.11473 • Published Nov 24, 2023 • 2
Lightweight Image Inpainting by Stripe Window Transformer with Joint Attention to CNN Paper • 2301.00553 • Published Jan 2, 2023 • 2
Semi-Supervised Semantic Segmentation using Redesigned Self-Training for White Blood Cells Paper • 2401.07278 • Published Jan 14 • 2
Flamingo: a Visual Language Model for Few-Shot Learning Paper • 2204.14198 • Published Apr 29, 2022 • 14
VideoAgent: Long-form Video Understanding with Large Language Model as Agent Paper • 2403.10517 • Published Mar 15 • 31
LightIt: Illumination Modeling and Control for Diffusion Models Paper • 2403.10615 • Published Mar 15 • 16
Generic 3D Diffusion Adapter Using Controlled Multi-View Editing Paper • 2403.12032 • Published Mar 18 • 14
MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data Paper • 2403.11207 • Published Mar 17 • 14
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis Paper • 2403.12963 • Published Mar 19 • 7
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images Paper • 2403.11703 • Published Mar 18 • 16
TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation Paper • 2403.12906 • Published Mar 19 • 5
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models Paper • 2403.13447 • Published Mar 20 • 18
S2LIC: Learned Image Compression with the SwinV2 Block, Adaptive Channel-wise and Global-inter Attention Context Paper • 2403.14471 • Published Mar 21 • 2
DepthFM: Fast Monocular Depth Estimation with Flow Matching Paper • 2403.13788 • Published Mar 20 • 17
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions Paper • 2403.16627 • Published Mar 25 • 20
FlashFace: Human Image Personalization with High-fidelity Identity Preservation Paper • 2403.17008 • Published Mar 25 • 19
Prompt me a Dataset: An investigation of text-image prompting for historical image dataset creation using foundation models Paper • 2309.01674 • Published Sep 4, 2023 • 2
MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting Paper • 2210.07179 • Published Oct 13, 2022 • 3
DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion Paper • 2403.17237 • Published Mar 25 • 9
One-step Diffusion with Distribution Matching Distillation Paper • 2311.18828 • Published Nov 30, 2023 • 3
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric Paper • 1801.03924 • Published Jan 11, 2018 • 2
Getting it Right: Improving Spatial Consistency in Text-to-Image Models Paper • 2404.01197 • Published Apr 1 • 30
Condition-Aware Neural Network for Controlled Image Generation Paper • 2404.01143 • Published Apr 1 • 11
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation Paper • 2404.02733 • Published Apr 3 • 20
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models Paper • 2404.02747 • Published Apr 3 • 11
Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss Paper • 2404.02731 • Published Apr 3 • 1
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching Paper • 2404.03653 • Published Apr 4 • 33
Learning Transferable Visual Models From Natural Language Supervision Paper • 2103.00020 • Published Feb 26, 2021 • 11
Prompt-to-Prompt Image Editing with Cross Attention Control Paper • 2208.01626 • Published Aug 2, 2022 • 2
DeViDe: Faceted medical knowledge for improved medical vision-language pre-training Paper • 2404.03618 • Published Apr 4 • 2
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion Paper • 2310.03502 • Published Oct 5, 2023 • 77
Toward a Better Understanding of Fourier Neural Operators: Analysis and Improvement from a Spectral Perspective Paper • 2404.07200 • Published Apr 10 • 1
ConsistencyDet: Robust Object Detector with Denoising Paradigm of Consistency Model Paper • 2404.07773 • Published Apr 11 • 1
ODA: Observation-Driven Agent for integrating LLMs and Knowledge Graphs Paper • 2404.07677 • Published Apr 11 • 1
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models Paper • 2404.07973 • Published Apr 11 • 30
Text Role Classification in Scientific Charts Using Multimodal Transformers Paper • 2402.14579 • Published Feb 8 • 1
Using Explainable AI and Transfer Learning to understand and predict the maintenance of Atlantic blocking with limited observational data Paper • 2404.08613 • Published Apr 12 • 1
HSIDMamba: Exploring Bidirectional State-Space Models for Hyperspectral Denoising Paper • 2404.09697 • Published Apr 15 • 1
Deformable MRI Sequence Registration for AI-based Prostate Cancer Diagnosis Paper • 2404.09666 • Published Apr 15 • 1
Comprehensive Survey of Model Compression and Speed up for Vision Transformers Paper • 2404.10407 • Published Apr 16 • 1
Explainable Lung Disease Classification from Chest X-Ray Images Utilizing Deep Learning and XAI Paper • 2404.11428 • Published Apr 17 • 1
MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation Paper • 2404.11565 • Published Apr 17 • 14
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval Paper • 2401.18059 • Published Jan 31 • 35
TextSquare: Scaling up Text-Centric Visual Instruction Tuning Paper • 2404.12803 • Published Apr 19 • 29
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models Paper • 2404.13013 • Published Apr 19 • 30
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Paper • 2404.13686 • Published Apr 21 • 27
Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer Paper • 2404.14351 • Published Apr 22 • 5
MultiBooth: Towards Generating All Your Concepts in an Image from Text Paper • 2404.14239 • Published Apr 22 • 8
Efficient Transformer Encoders for Mask2Former-style models Paper • 2404.15244 • Published Apr 23 • 1
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data Paper • 2404.15653 • Published Apr 24 • 26
PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning Paper • 2404.16994 • Published Apr 25 • 35
HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections Paper • 2404.16845 • Published Feb 14 • 6
Paint by Inpaint: Learning to Add Image Objects by Removing Them First Paper • 2404.18212 • Published Apr 28 • 27
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Paper • 2405.01434 • Published May 2 • 51
Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models Paper • 2405.16759 • Published May 27 • 7
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published Jun 10 • 64
VideoFACT: Detecting Video Forgeries Using Attention, Scene Context, and Forensic Traces Paper • 2211.15775 • Published Nov 28, 2022 • 1
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising Paper • 2406.06911 • Published Jun 11 • 10
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos Paper • 2406.08407 • Published Jun 12 • 24
DataComp: In search of the next generation of multimodal datasets Paper • 2304.14108 • Published Apr 27, 2023 • 2
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs Paper • 2406.18521 • Published Jun 26 • 25
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding Paper • 2406.19389 • Published Jun 27 • 51
Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space Paper • 2406.19370 • Published Jun 27 • 1
Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity Paper • 2406.17720 • Published Jun 25 • 7
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? Paper • 2407.01284 • Published Jul 1 • 75
No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models Paper • 2407.02687 • Published Jul 2 • 22
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Paper • 2407.03320 • Published Jul 3 • 92
DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents Paper • 2407.03300 • Published Jul 3 • 11
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks Paper • 2311.06242 • Published Nov 10, 2023 • 84
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model Paper • 2407.07053 • Published Jul 9 • 41
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding Paper • 2312.04461 • Published Dec 7, 2023 • 57
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Paper • 2403.03206 • Published Mar 5 • 57
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models Paper • 2408.04840 • Published Aug 9 • 31
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16 • 97
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published 12 days ago • 71
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance Paper • 2411.02327 • Published 5 days ago • 11