Update README.md
README.md
CHANGED
@@ -66,26 +66,26 @@ output = cf(**{'input_text': input_text, 'image/encoded': image_encoded})
|
|
66 |
|
67 |
<table style="width:100%; border-collapse: collapse; font-family: Arial, sans-serif;">
|
68 |
<tr>
|
69 |
-
<th style="width: 30%; border: 1px solid #333; padding: 10px;
|
70 |
<td style="border: 1px solid #333; padding: 10px;">A multimodal sequence-to-sequence Transformer model with the mT5 encoder-decoder architecture. It takes text tokens and ViT dense image embeddings as inputs to an encoder and autoregressively predicts discrete text and ink tokens with a decoder.</td>
|
71 |
</tr>
|
72 |
<tr>
|
73 |
-
<th style="width: 30%; border: 1px solid #333; padding: 10px;
|
74 |
<td style="border: 1px solid #333; padding: 10px;">A pair of image and text.</td>
|
75 |
</tr>
|
76 |
<tr>
|
77 |
-
<th style="width: 30%; border: 1px solid #333; padding: 10px;
|
78 |
<td style="border: 1px solid #333; padding: 10px;">Generated digital ink and text.</td>
|
79 |
</tr>
|
80 |
<tr>
|
81 |
-
<th style="width: 30%; border: 1px solid #333; padding: 10px;
|
82 |
<td style="border: 1px solid #333; padding: 10px;">
|
83 |
<strong>Application:</strong> The model is a research prototype, and a public version has been released.<br>
|
84 |
<strong>Known Caveats:</strong> None.
|
85 |
</td>
|
86 |
</tr>
|
87 |
<tr>
|
88 |
-
<th style="width: 30%; border: 1px solid #333; padding: 10px;
|
89 |
<td style="border: 1px solid #333; padding: 10px;">
|
90 |
<strong>System Description:</strong> This is a standalone model.<br>
|
91 |
<strong>Upstream Dependencies:</strong> None.<br>
|
@@ -93,7 +93,7 @@ output = cf(**{'input_text': input_text, 'image/encoded': image_encoded})
|
|
93 |
</td>
|
94 |
</tr>
|
95 |
<tr>
|
96 |
-
<th style="width: 30%; border: 1px solid #333; padding: 10px;
|
97 |
<td style="border: 1px solid #333; padding: 10px;">
|
98 |
<strong>Hardware & Software:</strong> Hardware: TPU v5e.<br>
|
99 |
Software: T5X, JAX/Flax, Flaxformer.<br>
|
@@ -101,19 +101,19 @@ output = cf(**{'input_text': input_text, 'image/encoded': image_encoded})
|
|
101 |
</td>
|
102 |
</tr>
|
103 |
<tr>
|
104 |
-
<th style="width: 30%; border: 1px solid #333; padding: 10px;
|
105 |
<td style="border: 1px solid #333; padding: 10px;">
|
106 |
<strong>Training Datasets:</strong> The ViT encoder of Small-p is pretrained on ImageNet-21k; the mT5 encoder and decoder are initialized from scratch. The entire model is trained on the mixture of publicly available datasets described in the next section.
|
107 |
</td>
|
108 |
</tr>
|
109 |
<tr>
|
110 |
-
<th style="width: 30%; border: 1px solid #333; padding: 10px;
|
111 |
<td style="border: 1px solid #333; padding: 10px;">
|
112 |
<strong>Evaluation Methods:</strong> Human evaluation (reported in Section 4.5.1 of the paper) and automated evaluations (reported in Section 4.5.2 of the paper).
|
113 |
</td>
|
114 |
</tr>
|
115 |
<tr>
|
116 |
-
<th style="width: 30%; border: 1px solid #333; padding: 10px;
|
117 |
<td style="border: 1px solid #333; padding: 10px;">
|
118 |
<strong>Sensitive Use:</strong> The model is capable of converting images to digital ink. It should not be used for privacy-intruding use cases, e.g., forging handwriting.<br>
|
119 |
<strong>Known Limitations:</strong> Reported in Appendix I of the paper.<br>
|
@@ -122,7 +122,6 @@ output = cf(**{'input_text': input_text, 'image/encoded': image_encoded})
|
|
122 |
</tr>
|
123 |
</table>
|
124 |
|
125 |
-
|
126 |
## Citation
|
127 |
|
128 |
If you find our work useful for your research and applications, please cite using this BibTeX:
|
|
|
66 |
|
67 |
<table style="width:100%; border-collapse: collapse; font-family: Arial, sans-serif;">
|
68 |
<tr>
|
69 |
+
<th style="width: 30%; border: 1px solid #333; padding: 10px;">Model Architecture</th>
|
70 |
<td style="border: 1px solid #333; padding: 10px;">A multimodal sequence-to-sequence Transformer model with the mT5 encoder-decoder architecture. It takes text tokens and ViT dense image embeddings as inputs to an encoder and autoregressively predicts discrete text and ink tokens with a decoder.</td>
|
71 |
</tr>
|
72 |
<tr>
|
73 |
+
<th style="width: 30%; border: 1px solid #333; padding: 10px;">Input(s)</th>
|
74 |
<td style="border: 1px solid #333; padding: 10px;">A pair of image and text.</td>
|
75 |
</tr>
|
76 |
<tr>
|
77 |
+
<th style="width: 30%; border: 1px solid #333; padding: 10px;">Output(s)</th>
|
78 |
<td style="border: 1px solid #333; padding: 10px;">Generated digital ink and text.</td>
|
79 |
</tr>
|
80 |
<tr>
|
81 |
+
<th style="width: 30%; border: 1px solid #333; padding: 10px;">Usage</th>
|
82 |
<td style="border: 1px solid #333; padding: 10px;">
|
83 |
<strong>Application:</strong> The model is a research prototype, and a public version has been released.<br>
|
84 |
<strong>Known Caveats:</strong> None.
|
85 |
</td>
|
86 |
</tr>
|
87 |
<tr>
|
88 |
+
<th style="width: 30%; border: 1px solid #333; padding: 10px;">System Type</th>
|
89 |
<td style="border: 1px solid #333; padding: 10px;">
|
90 |
<strong>System Description:</strong> This is a standalone model.<br>
|
91 |
<strong>Upstream Dependencies:</strong> None.<br>
|
|
|
93 |
</td>
|
94 |
</tr>
|
95 |
<tr>
|
96 |
+
<th style="width: 30%; border: 1px solid #333; padding: 10px;">Implementation Frameworks</th>
|
97 |
<td style="border: 1px solid #333; padding: 10px;">
|
98 |
<strong>Hardware & Software:</strong> Hardware: TPU v5e.<br>
|
99 |
Software: T5X, JAX/Flax, Flaxformer.<br>
|
|
|
101 |
</td>
|
102 |
</tr>
|
103 |
<tr>
|
104 |
+
<th style="width: 30%; border: 1px solid #333; padding: 10px;">Data Overview</th>
|
105 |
<td style="border: 1px solid #333; padding: 10px;">
|
106 |
<strong>Training Datasets:</strong> The ViT encoder of Small-p is pretrained on ImageNet-21k; the mT5 encoder and decoder are initialized from scratch. The entire model is trained on the mixture of publicly available datasets described in the next section.
|
107 |
</td>
|
108 |
</tr>
|
109 |
<tr>
|
110 |
+
<th style="width: 30%; border: 1px solid #333; padding: 10px;">Evaluation Results</th>
|
111 |
<td style="border: 1px solid #333; padding: 10px;">
|
112 |
<strong>Evaluation Methods:</strong> Human evaluation (reported in Section 4.5.1 of the paper) and automated evaluations (reported in Section 4.5.2 of the paper).
|
113 |
</td>
|
114 |
</tr>
|
115 |
<tr>
|
116 |
+
<th style="width: 30%; border: 1px solid #333; padding: 10px;">Model Usage & Limitations</th>
|
117 |
<td style="border: 1px solid #333; padding: 10px;">
|
118 |
<strong>Sensitive Use:</strong> The model is capable of converting images to digital ink. It should not be used for privacy-intruding use cases, e.g., forging handwriting.<br>
|
119 |
<strong>Known Limitations:</strong> Reported in Appendix I of the paper.<br>
|
|
|
122 |
</tr>
|
123 |
</table>
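
The table lists the model input as a pair of image and text, and the call shown earlier in this README passes them under the keys `input_text` and `image/encoded`. The sketch below only illustrates how such a pair might be packed and passed. `build_inputs`, `run`, and `fake_cf` are hypothetical helpers, not part of the released code, and the real callable may expect tensor types rather than plain Python values.

```python
from typing import Callable, Dict


def build_inputs(input_text: str, image_encoded: bytes) -> Dict[str, object]:
    """Pack one (text, image) pair under the two keys used in the README call."""
    if not input_text:
        raise ValueError("input_text must be a non-empty string")
    if not image_encoded:
        raise ValueError("image_encoded must be non-empty encoded image bytes")
    return {"input_text": input_text, "image/encoded": image_encoded}


def run(cf: Callable[..., object], input_text: str, image_encoded: bytes) -> object:
    # 'image/encoded' is not a valid Python identifier, so the arguments
    # have to be passed by unpacking a dict rather than as plain keywords.
    return cf(**build_inputs(input_text, image_encoded))


def fake_cf(**kwargs):
    # Hypothetical stand-in for the real callable, so the sketch runs alone.
    # It just reports which input keys it received.
    return sorted(kwargs)
```

With the real model, `fake_cf` would be replaced by the loaded callable (`cf` in the snippet above), and the inputs may need converting to whatever types that callable expects.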
|
124 |
|
|
|
125 |
## Citation
|
126 |
|
127 |
If you find our work useful for your research and applications, please cite using this BibTeX:
|