Commit 6a38f05 by FatemehBehrad (verified) · 1 Parent(s): 8fe8252

Update readme

Files changed (1): README.md (+70 -0)
README.md CHANGED
license: apache-2.0
---
💫 Official implementation of Charm: The Missing Piece in ViT fine-tuning for Image Aesthetic Assessment

> [**Accepted at CVPR 2025**](https://cvpr.thecvf.com/virtual/2025/poster/34423)<br>

<div align="left">
<a href="https://github.com/FBehrad/Charm">
<img src="https://github.com/FBehrad/Charm/blob/main/MainFigure_new.jpg?raw=true" alt="Overall framework" width="400"/>
</a>
</div>

We introduce **Charm**, a novel tokenization approach that simultaneously preserves **C**omposition, **H**igh-resolution, **A**spect **R**atio, and **M**ulti-scale information. By preserving this critical aesthetic information, <em>Charm</em> achieves significant performance improvements across different image aesthetic and quality assessment datasets.


### Quick Inference

* Step 1) Check our [GitHub Page](https://github.com/FBehrad/Charm/) and install the requirements.

```setup
pip install -r requirements.txt
```
___
* Step 2) Install the Charm tokenizer.
```setup
pip install Charm-tokenizer
```
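
To confirm the installation, the two modules used in the steps below can be imported directly. This is only a quick sanity check; the module paths are the ones used in the examples further down.

```python
# Optional sanity check: these imports should succeed after
# `pip install Charm-tokenizer` (module paths taken from the steps below).
from Charm_tokenizer.ImageProcessor import Charm_Tokenizer
from Charm_tokenizer.Backbone import backbone

print("Charm_Tokenizer and backbone imported successfully.")
```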
___
* Step 3) Tokenization + Position embedding preparation

<div align="center">
<a href="https://github.com/FBehrad/Charm">
<img src="https://github.com/FBehrad/Charm/blob/main/charm.gif?raw=true" alt="Charm tokenizer" width="700"/>
</a>
</div>

```python
from Charm_tokenizer.ImageProcessor import Charm_Tokenizer

img_path = r"img.png"

charm_tokenizer = Charm_Tokenizer(patch_selection='frequency', training_dataset='tad66k', without_pad_or_dropping=True)
tokens, pos_embed, mask_token = charm_tokenizer.preprocess(img_path)
```
The Charm Tokenizer has the following input arguments:
* patch_selection (str): The method for selecting important patches.
  * Options: 'saliency', 'random', 'frequency', 'gradient', 'entropy', 'original'.
* training_dataset (str): Used to set the number of ViT input tokens to match a specific training dataset from the paper.
  * Aesthetic assessment datasets: 'aadb', 'tad66k', 'para', 'baid'.
  * Quality assessment datasets: 'spaq', 'koniq10k'.
* backbone (str): The ViT backbone model (default: 'facebook/dinov2-small').
* factor (float): The downscaling factor for less important patches (default: 0.5).
* scales (int): The number of scales used for multiscale processing (default: 2).
* random_crop_size (tuple): The crop size used for the 'original' patch selection strategy (default: (224, 224)).
* downscale_shortest_edge (int): The shortest-edge size used for the 'original' patch selection strategy (default: 256).
* without_pad_or_dropping (bool): Whether to avoid padding or dropping patches (default: True).

The output is the preprocessed tokens, their corresponding positional embeddings, and a mask token that indicates which patches are in high resolution and which are in low resolution.
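
As a quick sanity check after tokenization, the returned objects can be inspected before running the model. The sketch below assumes the outputs expose a `.shape` attribute (e.g., PyTorch tensors), which is an assumption rather than something stated above; it falls back to printing only the types otherwise.

```python
# Optional sketch: inspect the Step 3 outputs before prediction.
# Assumption: tokens, pos_embed, and mask_token expose a `.shape`
# attribute (e.g., PyTorch tensors); otherwise only their types are shown.
for name, value in [("tokens", tokens), ("pos_embed", pos_embed), ("mask_token", mask_token)]:
    shape = getattr(value, "shape", None)
    print(name, type(value).__name__, shape)
```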
___

* Step 4) Predicting aesthetic/quality score

```python
from Charm_tokenizer.Backbone import backbone

model = backbone(training_dataset='tad66k', device='cpu')
prediction = model.predict(tokens, pos_embed, mask_token)
```

**Note:**
1. While random patch selection during training helps avoid overfitting, fully deterministic patch selection approaches should be used during inference to obtain consistent results.
2. For the training code, check our [GitHub Page](https://github.com/FBehrad/Charm/).
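
For convenience, Steps 3 and 4 can be combined into a single script. The sketch below only reuses the calls shown above, with the deterministic 'frequency' patch selection recommended in Note 1; the image path and the printed label are placeholders.

```python
# End-to-end sketch: tokenize one image and predict its score,
# reusing the calls from Steps 3 and 4 above.
from Charm_tokenizer.ImageProcessor import Charm_Tokenizer
from Charm_tokenizer.Backbone import backbone

img_path = r"img.png"  # replace with your own image

# Deterministic patch selection ('frequency') for reproducible inference (see Note 1).
charm_tokenizer = Charm_Tokenizer(patch_selection='frequency',
                                  training_dataset='tad66k',
                                  without_pad_or_dropping=True)
tokens, pos_embed, mask_token = charm_tokenizer.preprocess(img_path)

# Backbone matching the same training dataset ('tad66k'); CPU inference as in Step 4.
model = backbone(training_dataset='tad66k', device='cpu')
prediction = model.predict(tokens, pos_embed, mask_token)
print("Predicted score:", prediction)
```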