Model Details
This model is a port of the ViTMatte models, which are trained and evaluated on the Composition-1k and Distinctions-646 datasets. The port focuses on preserving the performance and accuracy of the original models.
Note: This port is provided for convenience of use and to help promote and build on this excellent open-source project.
Usage
This model performs image matting: given an input image and a trimap, it predicts an alpha matte that separates the foreground from the background.
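As a minimal usage sketch, assuming the port is loadable through the ViTMatte classes in Hugging Face transformers (the checkpoint identifier below is a placeholder; substitute the repository id of this port):

```python
# Minimal usage sketch. Assumes the port works with the ViTMatte classes from
# Hugging Face transformers; the checkpoint name below is a placeholder.
import torch
from PIL import Image
from transformers import VitMatteImageProcessor, VitMatteForImageMatting

checkpoint = "hustvl/vitmatte-small-composition-1k"  # placeholder; use this port's repo id
processor = VitMatteImageProcessor.from_pretrained(checkpoint)
model = VitMatteForImageMatting.from_pretrained(checkpoint)

image = Image.open("image.png").convert("RGB")
trimap = Image.open("trimap.png").convert("L")  # 0 = background, 128 = unknown, 255 = foreground

inputs = processor(images=image, trimaps=trimap, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

alpha = outputs.alphas  # predicted alpha matte, shape (batch, 1, H, W)
```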
Training Data
The model is trained and validated on two datasets:
- Composition-1k: 1,000 samples, used for training and testing.
- Distinctions-646: 646 samples, used for validation.
Training Procedure
The model is trained with gradient descent, and its performance is evaluated with the following four metrics (all are error measures, so lower is better); a brief sketch of how the first two are typically computed follows the list:
- SAD (Sum of Absolute Differences)
- MSE (Mean Squared Error)
- Grad (Gradient)
- Conn (Connectivity)
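As an illustrative sketch only, and not the original project's evaluation code, SAD and MSE are commonly computed over the unknown region of the trimap; the function name, array conventions, and the reporting scales are assumptions:

```python
# Illustrative sketch of SAD and MSE as commonly reported for matting benchmarks.
# Not the original project's evaluation code; names and scaling are assumptions.
import numpy as np

def matting_sad_mse(pred_alpha: np.ndarray, gt_alpha: np.ndarray, trimap: np.ndarray):
    """pred_alpha and gt_alpha are in [0, 1]; trimap uses 128 for the unknown region."""
    unknown = (trimap == 128)
    diff = pred_alpha - gt_alpha
    # SAD over the unknown region, conventionally reported divided by 1000.
    sad = np.abs(diff)[unknown].sum() / 1000.0
    # MSE over the unknown region, conventionally reported scaled by 1e3.
    mse = (diff[unknown] ** 2).mean() * 1000.0
    return sad, mse
```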
Performance
The models achieve the following results on the two datasets:
On the Composition-1k dataset:
| Model      | SAD   | MSE | Grad | Conn  |
|------------|-------|-----|------|-------|
| ViTMatte-S | 21.46 | 3.3 | 7.24 | 16.21 |
| ViTMatte-B | 20.33 | 3.0 | 6.74 | 14.78 |
On the Distinctions-646 dataset:
| Model      | SAD   | MSE | Grad | Conn  |
|------------|-------|-----|------|-------|
| ViTMatte-S | 21.22 | 2.1 | 8.78 | 17.55 |
| ViTMatte-B | 17.05 | 1.5 | 7.03 | 12.95 |
Both models perform well on these datasets, with ViTMatte-B outperforming ViTMatte-S on all four evaluation metrics.
Disclaimer
This model is a port of lufficc's ViTMatte project; all rights to the original work belong to lufficc.
Citation
If you use these models, please cite the original authors and project: https://github.com/hustvl/ViTMatte
Thank you for using these models. If you encounter any issues or have feedback, please raise them on the original project's GitHub page.