- Coupled Variational Autoencoder Variational auto-encoders are powerful probabilistic models in generative tasks but suffer from generating low-quality samples which are caused by the holes in the prior. We propose the Coupled Variational Auto-Encoder (C-VAE), which formulates the VAE problem as one of Optimal Transport (OT) between the prior and data distributions. The C-VAE allows greater flexibility in priors and natural resolution of the prior hole problem by enforcing coupling between the prior and the data distribution and enables flexible optimization through the primal, dual, and semi-dual formulations of entropic OT. Simulations on synthetic and real data show that the C-VAE outperforms alternatives including VAE, WAE, and InfoVAE in fidelity to the data, quality of the latent representation, and in quality of generated samples. 2 authors · Jun 4, 2023
- Improving Variational Autoencoders with Density Gap-based Regularization Variational autoencoders (VAEs) are one of the powerful unsupervised learning frameworks in NLP for latent representation learning and latent-directed generation. The classic optimization goal of VAEs is to maximize the Evidence Lower Bound (ELBo), which consists of a conditional likelihood for generation and a negative Kullback-Leibler (KL) divergence for regularization. In practice, optimizing ELBo often leads the posterior distribution of all samples converge to the same degenerated local optimum, namely posterior collapse or KL vanishing. There are effective ways proposed to prevent posterior collapse in VAEs, but we observe that they in essence make trade-offs between posterior collapse and hole problem, i.e., mismatch between the aggregated posterior distribution and the prior distribution. To this end, we introduce new training objectives to tackle both two problems through a novel regularization based on the probabilistic density gap between the aggregated posterior distribution and the prior distribution. Through experiments on language modeling, latent space visualization and interpolation, we show that our proposed method can solve both problems effectively and thus outperforms the existing methods in latent-directed generation. To the best of our knowledge, we are the first to jointly solve the hole problem and the posterior collapse. 5 authors · Nov 1, 2022
- Optimal decision making in robotic assembly and other trial-and-error tasks Uncertainty in perception, actuation, and the environment often require multiple attempts for a robotic task to be successful. We study a class of problems providing (1) low-entropy indicators of terminal success / failure, and (2) unreliable (high-entropy) data to predict the final outcome of an ongoing task. Examples include a robot trying to connect with a charging station, parallel parking, or assembling a tightly-fitting part. The ability to restart after predicting failure early, versus simply running to failure, can significantly decrease the makespan, that is, the total time to completion, with the drawback of potentially short-cutting an otherwise successful operation. Assuming task running times to be Poisson distributed, and using a Markov Jump process to capture the dynamics of the underlying Markov Decision Process, we derive a closed form solution that predicts makespan based on the confusion matrix of the failure predictor. This allows the robot to learn failure prediction in a production environment, and only adopt a preemptive policy when it actually saves time. We demonstrate this approach using a robotic peg-in-hole assembly problem using a real robotic system. Failures are predicted by a dilated convolutional network based on force-torque data, showing an average makespan reduction from 101s to 81s (N=120, p<0.05). We posit that the proposed algorithm generalizes to any robotic behavior with an unambiguous terminal reward, with wide ranging applications on how robots can learn and improve their behaviors in the wild. 2 authors · Jan 25, 2023
- 3D Cinemagraphy from a Single Image We present 3D Cinemagraphy, a new technique that marries 2D image animation with 3D photography. Given a single still image as input, our goal is to generate a video that contains both visual content animation and camera motion. We empirically find that naively combining existing 2D image animation and 3D photography methods leads to obvious artifacts or inconsistent animation. Our key insight is that representing and animating the scene in 3D space offers a natural solution to this task. To this end, we first convert the input image into feature-based layered depth images using predicted depth values, followed by unprojecting them to a feature point cloud. To animate the scene, we perform motion estimation and lift the 2D motion into the 3D scene flow. Finally, to resolve the problem of hole emergence as points move forward, we propose to bidirectionally displace the point cloud as per the scene flow and synthesize novel views by separately projecting them into target image planes and blending the results. Extensive experiments demonstrate the effectiveness of our method. A user study is also conducted to validate the compelling rendering results of our method. 6 authors · Mar 10, 2023
- MAT: Mask-Aware Transformer for Large Hole Image Inpainting Recent studies have shown the importance of modeling long-range interactions in the inpainting problem. To achieve this goal, existing approaches exploit either standalone attention techniques or transformers, but usually under a low resolution in consideration of computational cost. In this paper, we present a novel transformer-based model for large hole inpainting, which unifies the merits of transformers and convolutions to efficiently process high-resolution images. We carefully design each component of our framework to guarantee the high fidelity and diversity of recovered images. Specifically, we customize an inpainting-oriented transformer block, where the attention module aggregates non-local information only from partial valid tokens, indicated by a dynamic mask. Extensive experiments demonstrate the state-of-the-art performance of the new model on multiple benchmark datasets. Code is released at https://github.com/fenglinglwb/MAT. 6 authors · Mar 29, 2022
1 Visual Prompting via Image Inpainting How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification? Inspired by prompting in NLP, this paper investigates visual prompting: given input-output image example(s) of a new task at test time and a new input image, the goal is to automatically produce the output image, consistent with the given examples. We show that posing this problem as simple image inpainting - literally just filling in a hole in a concatenated visual prompt image - turns out to be surprisingly effective, provided that the inpainting algorithm has been trained on the right data. We train masked auto-encoders on a new dataset that we curated - 88k unlabeled figures from academic papers sources on Arxiv. We apply visual prompting to these pretrained models and demonstrate results on various downstream image-to-image tasks, including foreground segmentation, single object detection, colorization, edge detection, etc. 5 authors · Sep 1, 2022
- Guidance and Evaluation: Semantic-Aware Image Inpainting for Mixed Scenes Completing a corrupted image with correct structures and reasonable textures for a mixed scene remains an elusive challenge. Since the missing hole in a mixed scene of a corrupted image often contains various semantic information, conventional two-stage approaches utilizing structural information often lead to the problem of unreliable structural prediction and ambiguous image texture generation. In this paper, we propose a Semantic Guidance and Evaluation Network (SGE-Net) to iteratively update the structural priors and the inpainted image in an interplay framework of semantics extraction and image inpainting. It utilizes semantic segmentation map as guidance in each scale of inpainting, under which location-dependent inferences are re-evaluated, and, accordingly, poorly-inferred regions are refined in subsequent scales. Extensive experiments on real-world images of mixed scenes demonstrated the superiority of our proposed method over state-of-the-art approaches, in terms of clear boundaries and photo-realistic textures. 5 authors · Mar 15, 2020
- The Honeymoon Oberwolfach Problem: small cases The Honeymoon Oberwolfach Problem HOP(2m_1,2m_2,ldots,2m_t) asks the following question. Given n=m_1+m_2+ldots +m_t newlywed couples at a conference and t round tables of sizes 2m_1,2m_2,ldots,2m_t, is it possible to arrange the 2n participants at these tables for 2n-2 meals so that each participant sits next to their spouse at every meal, and sits next to every other participant exactly once? A solution to HOP(2m_1,2m_2,ldots,2m_t) is a decomposition of K_{2n}+(2n-3)I, the complete graph K_{2n} with 2n-3 additional copies of a fixed 1-factor I, into 2-factors, each consisting of disjoint I-alternating cycles of lengths 2m_1,2m_2,ldots,2m_t. The Honeymoon Oberwolfach Problem was introduced in a 2019 paper by Lepine and Sajna. The authors conjectured that HOP(2m_1,2m_2,ldots, 2m_t) has a solution whenever the obvious necessary conditions are satisfied, and proved the conjecture for several large cases, including the uniform cycle length case m_1=ldots=m_t, and the small cases with n le 9. In the present paper, we extend the latter result to all cases with n le 20 using a computer search. 2 authors · Jun 28, 2024
- Product representation of perfect cubes Let F_{k,d}(n) be the maximal size of a set {A}subseteq [n] such that the equation \[a_1a_2\dots a_k=x^d, \; a_1<a_2<\ldots<a_k\] has no solution with a_1,a_2,ldots,a_kA and integer x. Erdos, S\'ark\"ozy and T. S\'os studied F_{k,2}, and gave bounds when k=2,3,4,6 and also in the general case. We study the problem for d=3, and provide bounds for k=2,3,4,6 and 9, furthermore, in the general case, as well. In particular, we refute an 18 years old conjecture of Verstra\"ete. We also introduce another function f_{k,d} closely related to F_{k,d}: While the original problem requires a_1, ldots , a_k to all be distinct, we can relax this and only require that the multiset of the a_i's cannot be partitioned into d-tuples where each d-tuple consists of d copies of the same number. 5 authors · May 20, 2024
- Concavity Properties of Solutions of Elliptic Equations under Conformal Deformations We study the Dirichlet problem for the weighted Schr\"odinger operator \[-\Delta u +Vu = \lambda \rho u,\] where rho is a positive weighting function and V is a potential. Such equations appear naturally in conformal geometry and in the composite membrane problem. Our primary goal is to establish concavity estimates for the principle eigenfunction with respect to conformal connections. Doing so, we obtain new bounds on the fundamental gap problem, which is the difference between the first and second eigenvalues. In particular, we partially resolve a conjecture of Nguyen, Stancu and Wei [IMRN 2022] on the fundamental gap of horoconvex domains. In addition, we obtain a power convexity estimate for solutions to the torsion problem in spherical geometry on convex domains which are not too large. 3 authors · Mar 5, 2024