---
license: apache-2.0
---
💫 Official implementation of Charm: The Missing Piece in ViT fine-tuning for Image Aesthetic Assessment

> [**Accepted at CVPR 2025**](https://cvpr.thecvf.com/virtual/2025/poster/34423)<br>

<div align="left">
<a href="https://github.com/FBehrad/Charm">
<img src="https://github.com/FBehrad/Charm/blob/main/MainFigure_new.jpg?raw=true" alt="Overall framework" width="400"/>
</a>
</div>

We introduce **Charm**, a novel tokenization approach that preserves **C**omposition, **H**igh-resolution, **A**spect **R**atio, and **M**ulti-scale information simultaneously. By preserving this critical aesthetic information, <em>Charm</em> achieves significant performance improvements across different image aesthetic and quality assessment datasets.

### Quick Inference

* Step 1) Check our [GitHub Page](https://github.com/FBehrad/Charm/) and install the requirements.

```setup
pip install -r requirements.txt
```
___
* Step 2) Install Charm tokenizer.

```setup
pip install Charm-tokenizer
```
___
* Step 3) Tokenization + Position embedding preparation

<div align="center">
<a href="https://github.com/FBehrad/Charm">
<img src="https://github.com/FBehrad/Charm/blob/main/charm.gif?raw=true" alt="Charm tokenizer" width="700"/>
</a>
</div>

```python
from Charm_tokenizer.ImageProcessor import Charm_Tokenizer

img_path = r"img.png"

charm_tokenizer = Charm_Tokenizer(patch_selection='frequency', training_dataset='tad66k', without_pad_or_dropping=True)
tokens, pos_embed, mask_token = charm_tokenizer.preprocess(img_path)
```

Charm Tokenizer has the following input args:
* patch_selection (str): The method for selecting important patches.
  * Options: 'saliency', 'random', 'frequency', 'gradient', 'entropy', 'original'.
* training_dataset (str): Used to set the number of ViT input tokens to match a specific training dataset from the paper.
  * Aesthetic assessment datasets: 'aadb', 'tad66k', 'para', 'baid'.
  * Quality assessment datasets: 'spaq', 'koniq10k'.
* backbone (str): The ViT backbone model (default: 'facebook/dinov2-small').
* factor (float): The downscaling factor for less important patches (default: 0.5).
* scales (int): The number of scales used for multiscale processing (default: 2).
* random_crop_size (tuple): Used for the 'original' patch selection strategy (default: (224, 224)).
* downscale_shortest_edge (int): Used for the 'original' patch selection strategy (default: 256).
* without_pad_or_dropping (bool): Whether to avoid padding or dropping patches (default: True).

The output is the preprocessed tokens, their corresponding positional embeddings, and a mask token that indicates which patches are in high resolution and which are in low resolution.
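
As a small illustration of how these arguments combine (not an official example: the image path is a placeholder, and printing `shape` assumes the returned objects are array-like, which this page does not state):

```python
from Charm_tokenizer.ImageProcessor import Charm_Tokenizer

# Compare two of the documented patch-selection strategies on the same placeholder image.
for selection in ['saliency', 'frequency']:
    tokenizer = Charm_Tokenizer(patch_selection=selection,
                                training_dataset='koniq10k',
                                without_pad_or_dropping=True)
    tokens, pos_embed, mask_token = tokenizer.preprocess('img.png')
    # getattr keeps this runnable even if the outputs do not expose a shape attribute.
    print(selection, getattr(tokens, 'shape', None), getattr(mask_token, 'shape', None))
```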
___
* Step 4) Predicting aesthetic/quality score

```python
from Charm_tokenizer.Backbone import backbone

model = backbone(training_dataset='tad66k', device='cpu')
prediction = model.predict(tokens, pos_embed, mask_token)
```

**Note:**
1. While random patch selection during training helps avoid overfitting, fully deterministic patch selection approaches should be used during inference for consistent results.
2. For the training code, check our [GitHub Page](https://github.com/FBehrad/Charm/).
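
Putting the steps together, a minimal end-to-end sketch (the image path is a placeholder; every call mirrors the snippets above, and 'frequency' serves as a deterministic patch-selection strategy per Note 1):

```python
from Charm_tokenizer.ImageProcessor import Charm_Tokenizer
from Charm_tokenizer.Backbone import backbone

# Tokenize with a deterministic patch-selection strategy so repeated runs agree.
charm_tokenizer = Charm_Tokenizer(patch_selection='frequency',
                                  training_dataset='tad66k',
                                  without_pad_or_dropping=True)
tokens, pos_embed, mask_token = charm_tokenizer.preprocess('img.png')

# Load the matching backbone and predict an aesthetic score.
model = backbone(training_dataset='tad66k', device='cpu')
prediction = model.predict(tokens, pos_embed, mask_token)
print(prediction)
```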