Commit 0cc27d5 (verified) · 1 parent: 6107426 · committed by gpriday

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,243 @@
+ ---
+ library_name: transformers
+ language: en
+ license: apache-2.0
+ base_model: google/bert_uncased_L-4_H-256_A-4
+ tags:
+ - tld
+ - embeddings
+ - domains
+ - multi-task-learning
+ - bert
+ pipeline_tag: feature-extraction
+ widget:
+ - text: "com"
+ - text: "io"
+ - text: "ai"
+ - text: "co.za"
+ model-index:
+ - name: TLD Embedding Model
+   results:
+   - task:
+       type: feature-extraction
+       name: TLD Embedding
+     metrics:
+     - type: spearman_correlation
+       value: 0.8976
+       name: Average Spearman Correlation
+ ---
+
+ # TLD Embedding Model
+
+ A TLD (top-level domain) embedding model that learns 96-dimensional representations from multiple data sources through multi-task learning. The model reached a **0.8976 average Spearman correlation** across 63 features during training.
+
+ ## Model Overview
+
+ This TLD embedding model builds its representations by jointly learning four complementary prediction tasks:
+
+ 1. **Research Metrics** (18 features): Brand perception, trust scores, memorability, premium brand indices
+ 2. **Technical Metrics** (5 features): Registration statistics, domain rankings, usage patterns
+ 3. **Economic Indicators** (21 features): Country-level GDP sector breakdowns mapped to TLD registries
+ 4. **Price Predictions** (18 features): Industry-specific market value scores from domain sales data
+
+ The model uses a shared BERT encoder with task-specific prediction heads, so the embeddings capture semantic, technical, economic, and market-value aspects of each TLD.
+
+ ## Training Performance
+
+ **Final Training Results (Epoch 25/25):**
+ - **Overall Average Score**: 0.8976 average Spearman correlation
+ - **Training Loss**: 0.0034
+
+ **Task-Specific Performance:**
+ - **Research Task**: 0.80+ correlation on trust, adoption, and brand metrics
+ - **Technical Task**: 0.93-0.99 correlation on registration and ranking metrics
+ - **Economic Task**: 0.89-0.96 correlation on GDP sector predictions
+ - **Price Task**: 0.90-0.99 correlation on industry-specific price scores
+
+ **Best Individual Metrics:**
+ - `overall_score`: 0.990 Spearman correlation
+ - `global_top_1m_share`: 0.993 Spearman correlation
+ - `score_food`: 0.973 Spearman correlation
+ - `three_letter_registration_percent`: 0.969 Spearman correlation
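All scores in this section are Spearman correlations, meaning they compare the rank order of predictions with the rank order of targets. A minimal pure-Python sketch of that metric follows. The function names are illustrative, and the training script may well use a library routine instead:

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(predicted, target):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(predicted), average_ranks(target))
```

Because only ranks matter, any monotonic rescaling of the predictions leaves the score unchanged.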
+
+ ## Architecture
+
+ - **Base Model**: `google/bert_uncased_L-4_H-256_A-4` (Lightweight BERT)
+ - **Embedding Dimension**: 96 (optimized for data size)
+ - **Max Sequence Length**: 8 tokens (optimized for TLDs)
+ - **MLP Hidden Size**: 192 with 15% dropout
+ - **Task Weighting**: Research(0.25), Technical(0.20), Economic(0.15), Price(0.40)
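The task weights above suggest that the training objective is a weighted sum of per-task losses. A minimal sketch under that assumption follows. The function name and the choice to simply skip tasks that are absent from a batch are illustrative, not taken from the training script:

```python
# Weights as listed in the Architecture section.
TASK_WEIGHTS = {"research": 0.25, "technical": 0.20, "economic": 0.15, "price": 0.40}


def combine_task_losses(task_losses, weights=TASK_WEIGHTS):
    """Weighted sum of the per-task losses present in task_losses.

    Tasks missing from task_losses contribute nothing (an assumption:
    not every TLD has labels for every task).
    """
    unknown = set(task_losses) - set(weights)
    if unknown:
        raise KeyError(f"no weight for tasks: {sorted(unknown)}")
    return sum(weights[name] * loss for name, loss in task_losses.items())


# Hypothetical per-task losses for one batch.
total = combine_task_losses({"research": 0.02, "price": 0.01})
```

The weights sum to 1.0, so when all four tasks report the same loss the combined value equals that loss.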
+
+ ## Training Data Sources
+
+ ### Research Data (`tld_research_data.jsonl`)
+ - **Coverage**: 150 TLDs with research metrics
+ - **Features**: Trust scores, brand associations, memorability, adoption rates
+ - **Source**: Survey data, brand perception studies, market research
+
+ ### Technical Data (`tld_technical_data.jsonl`)
+ - **Coverage**: 716 TLDs with technical metrics
+ - **Features**: Registration patterns, domain rankings (Majestic), sales volumes
+ - **Source**: Registry statistics, web crawl data, domain marketplaces
+
+ ### Economic Data (`country_economic_data.jsonl`)
+ - **Coverage**: 126 TLDs mapped to country economies
+ - **Features**: GDP breakdowns by 21 industry sectors
+ - **Source**: World Bank, IMF economic data mapped to ccTLD registries
+
+ ### Price Data (`tld_price_scores_by_industry_2025.csv`)
+ - **Coverage**: 722 TLDs with price predictions
+ - **Features**: 17 industry-specific price scores plus an overall score (the 18 `price_features` in `config.json`)
+ - **Source**: Domain sales data processed through a pairwise neural network (`compute_tld_scores_pairwise.py`)
+ - **Industries**: Finance, healthcare, technology, automotive, food, gaming, etc.
+
+ ## Installation & Usage
+
+ ### Loading the Model
+
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+ import torch
+
+ # Load model and tokenizer
+ model_name = "humbleworth/tld-embedding"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModel.from_pretrained(model_name)
+ model.eval()
+ ```
+
+ ### Getting TLD Embeddings
+
+ ```python
+ def get_tld_embedding(tld, model, tokenizer):
+     """Get the 96-dimensional embedding for a single TLD."""
+     # Use the dedicated [TLD_x] token if it exists, otherwise fall back to ".x".
+     # get_vocab() includes added tokens for both slow and fast tokenizers.
+     tld_token = f"[TLD_{tld}]"
+     tld_text = tld_token if tld_token in tokenizer.get_vocab() else f".{tld}"
+
+     inputs = tokenizer(
+         tld_text,
+         return_tensors="pt",
+         padding="max_length",
+         truncation=True,
+         max_length=8,
+     )
+
+     with torch.no_grad():
+         outputs = model.encoder(**inputs)
+         cls_embedding = outputs.last_hidden_state[:, 0, :]  # [CLS] position
+         tld_embedding = model.projection(cls_embedding)
+
+     return tld_embedding.squeeze().numpy()
+
+ # Example
+ com_embedding = get_tld_embedding("com", model, tokenizer)
+ print(f"Embedding shape: {com_embedding.shape}")  # (96,)
+ ```
+
+ ### Batch Processing
+
+ ```python
+ def get_tld_embeddings_batch(tlds, model, tokenizer):
+     """Get embeddings for multiple TLDs in one forward pass."""
+     # Use the dedicated [TLD_x] token if it exists, otherwise fall back to ".x".
+     vocab = tokenizer.get_vocab()  # built once; includes added tokens
+     tld_texts = [f"[TLD_{tld}]" if f"[TLD_{tld}]" in vocab else f".{tld}" for tld in tlds]
+
+     inputs = tokenizer(
+         tld_texts,
+         return_tensors="pt",
+         padding="max_length",
+         truncation=True,
+         max_length=8,
+     )
+
+     with torch.no_grad():
+         outputs = model.encoder(**inputs)
+         cls_embeddings = outputs.last_hidden_state[:, 0, :]  # [CLS] position per input
+         tld_embeddings = model.projection(cls_embeddings)
+
+     return tld_embeddings.numpy()
+
+ # Process multiple TLDs
+ tlds = ["com", "io", "ai", "co.za", "tech"]
+ embeddings = get_tld_embeddings_batch(tlds, model, tokenizer)
+ print(f"Embeddings shape: {embeddings.shape}")  # (5, 96)
+ ```
+
+ ## Key Features
+
+ ### Multi-Task Learning Benefits
+ - **Robust Representations**: Joint learning across diverse tasks creates more stable embeddings
+ - **Transfer Learning**: Knowledge from technical metrics improves price prediction and vice versa
+ - **Percentile Normalization**: All features converted to percentiles for balanced learning
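Percentile normalization puts features with very different scales (registration counts, GDP figures, survey scores) on a common [0, 1] range. A minimal sketch follows, assuming rank-based scaling in which tied values share their mean rank. The exact convention used in training is not stated here, and the function name is hypothetical:

```python
import bisect


def to_percentiles(values):
    """Map each value to its rank-based position in [0, 1].

    The smallest value maps to 0.0 and the largest to 1.0; tied values
    receive the mean of the positions they occupy.
    """
    n = len(values)
    if n == 1:
        return [0.5]  # a single value has no spread; place it mid-range
    ordered = sorted(values)
    result = []
    for v in values:
        below = bisect.bisect_left(ordered, v)         # count of values < v
        at_or_below = bisect.bisect_right(ordered, v)  # count of values <= v
        mean_position = (below + at_or_below - 1) / 2  # 0-based mean index of the ties
        result.append(mean_position / (n - 1))
    return result


# Hypothetical raw counts spanning several orders of magnitude.
scaled = to_percentiles([10, 1000, 100])
```

After this transform an outlier such as a very large registration count no longer dominates the loss, since only its rank is kept.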
+
+ ### Industry-Specific Intelligence
+ - **Industry Scores**: 17 industry-specific price scores (finance, technology, healthcare, etc.) plus an overall score
+ - **Economic Mapping**: Country-level economic data enhances ccTLD understanding
+ - **Market Dynamics**: Real domain sales data captures market preferences
+
+ ### Technical Optimizations
+ - **MPS Support**: Optimized for Apple Silicon (M1/M2) training
+ - **Gradient Accumulation**: Stable training with effective batch size of 64
+ - **Early Stopping**: Prevents overfitting with patience-based stopping
+ - **Task Weighting**: Balanced learning prioritizing price prediction (40% weight)
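Patience-based early stopping halts training once the monitored score has failed to improve for a fixed number of consecutive epochs. A minimal sketch follows, assuming a higher-is-better score such as the average Spearman correlation. The class name and the `min_delta` parameter are hypothetical and not taken from the training script:

```python
class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_score = None
        self.epochs_without_improvement = 0

    def step(self, score):
        """Record one epoch's score; return True when training should stop."""
        if self.best_score is None or score > self.best_score + self.min_delta:
            self.best_score = score
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Hypothetical per-epoch scores with a patience of 2.
stopper = EarlyStopping(patience=2)
decisions = [stopper.step(s) for s in [0.50, 0.60, 0.59, 0.58]]
```

In a training loop, `step` would be called once per epoch and the loop exited on the first `True`.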
+
+ ## Use Cases
+
+ 1. **Domain Valuation**: Use embeddings as features for ML-based domain appraisal
+ 2. **TLD Recommendation**: Find similar TLDs for branding or investment decisions
+ 3. **Market Analysis**: Cluster TLDs by business characteristics or market positioning
+ 4. **Portfolio Optimization**: Analyze TLD portfolios using semantic similarity
+ 5. **Cross-Market Analysis**: Compare TLD performance across different industries
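Finding similar TLDs (use case 2) comes down to a nearest-neighbour search over the embedding vectors. A minimal sketch follows, assuming cosine similarity as the metric, which this README does not specify. The 3-dimensional vectors are invented toy values standing in for real 96-dimensional embeddings:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, embeddings, top_k=3):
    """Return the top_k (tld, similarity) pairs closest to `query`."""
    scores = [
        (tld, cosine_similarity(embeddings[query], vector))
        for tld, vector in embeddings.items()
        if tld != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy vectors for illustration only; real ones come from the model.
toy_embeddings = {
    "com": [1.0, 0.0, 0.0],
    "net": [0.9, 0.1, 0.0],
    "io": [0.0, 1.0, 0.0],
}
neighbours = most_similar("com", toy_embeddings, top_k=1)
```

With real embeddings, the dictionary could be filled from the output of `get_tld_embeddings_batch`.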
+
+ ## Training Configuration
+
+ **Optimal Hyperparameters (Based on Data Analysis):**
+ - Epochs: 25 (early stopping with patience=5)
+ - Batch Size: 16 (effective 64 with accumulation)
+ - Learning Rate: 5e-4 with warmup
+ - Warmup Steps: 200
+ - Gradient Accumulation: 4 steps
+ - Dropout: 15%
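Two of the settings above are simple arithmetic: the effective batch size and the warmup ramp. A minimal sketch follows, assuming linear warmup to the peak rate and a constant rate afterwards. The shape of the schedule and any decay after warmup are not documented here, so treat both function names and the schedule as assumptions:

```python
def effective_batch_size(batch_size, accumulation_steps):
    """Number of samples contributing to each optimizer step."""
    return batch_size * accumulation_steps


def warmup_lr(step, base_lr=5e-4, warmup_steps=200):
    """Learning rate at `step`: linear ramp from 0, then constant at base_lr."""
    if step >= warmup_steps:
        return base_lr
    return base_lr * step / warmup_steps


samples_per_update = effective_batch_size(16, 4)
halfway_lr = warmup_lr(100)
```

With a batch size of 16 and 4 accumulation steps, gradients from 64 samples are summed before each weight update, which matches the effective batch size quoted above.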
+
+ **Training Command:**
+ ```bash
+ python train_dual_task_embeddings.py \
+     --epochs 25 \
+     --batch-size 16 \
+     --learning-rate 5e-4 \
+     --warmup-steps 200 \
+     --output-dir models/tld_embedding_model
+ ```
+
+ ## Model Files
+
+ ```
+ tld_embedding_model/
+ ├── config.json # Model configuration
+ ├── pytorch_model.bin # Model weights
+ ├── tokenizer.json # Tokenizer
+ ├── tokenizer_config.json # Tokenizer config
+ ├── vocab.txt # Vocabulary
+ ├── special_tokens_map.json # Special tokens
+ ├── training_metrics.pt # Training metrics
+ ├── tld_embeddings.json # Pre-computed embeddings
+ └── README.md # This file
+ ```
+
+ ## Citation
+
+ If you use this model in your research, please cite:
+
+ ```bibtex
+ @software{tld_embedding_2025,
+   title = {TLD Embedding Model: Multi-Task Learning for Domain Extensions},
+   author = {HumbleWorth},
+   year = {2025},
+   note = {Achieved 0.8976 average Spearman correlation across 63 features},
+   url = {https://huggingface.co/humbleworth/tld-embedding}
+ }
+ ```
+
+ ## License
+
+ This model is released under the Apache 2.0 License.
added_tokens.json ADDED
@@ -0,0 +1,684 @@
+ {
+   "[TLD_ac.at]": 30523,
+   "[TLD_ac.cn]": 30524,
+   "[TLD_ac.id]": 30525,
+   "[TLD_ac.il]": 30526,
+   "[TLD_ac.in]": 30527,
+   "[TLD_ac.ir]": 30528,
+   "[TLD_ac.jp]": 30529,
+   "[TLD_ac.ke]": 30530,
+   "[TLD_ac.kr]": 30531,
+   "[TLD_ac.nz]": 30532,
+   "[TLD_ac.th]": 30533,
+   "[TLD_ac.uk]": 30534,
+   "[TLD_ac.za]": 30535,
+   "[TLD_ac]": 30522,
+   "[TLD_academy]": 30536,
+   "[TLD_accountants]": 30537,
+   "[TLD_ad]": 30538,
+   "[TLD_ae]": 30539,
+   "[TLD_aero]": 30540,
+   "[TLD_africa]": 30541,
+   "[TLD_ag]": 30542,
+   "[TLD_agency]": 30543,
+   "[TLD_ai]": 30544,
+   "[TLD_al]": 30545,
+   "[TLD_am]": 30546,
+   "[TLD_apartments]": 30547,
+   "[TLD_app]": 30548,
+   "[TLD_ar]": 30549,
+   "[TLD_archi]": 30550,
+   "[TLD_art]": 30551,
+   "[TLD_as]": 30552,
+   "[TLD_asia]": 30553,
+   "[TLD_asn.au]": 30554,
+   "[TLD_associates]": 30555,
+   "[TLD_at]": 30556,
+   "[TLD_au]": 30557,
+   "[TLD_auction]": 30558,
+   "[TLD_audio]": 30559,
+   "[TLD_autos]": 30560,
+   "[TLD_az]": 30561,
+   "[TLD_ba]": 30562,
+   "[TLD_baby]": 30563,
+   "[TLD_band]": 30564,
+   "[TLD_bank]": 30565,
+   "[TLD_bar]": 30566,
+   "[TLD_bargains]": 30567,
+   "[TLD_be]": 30568,
+   "[TLD_beauty]": 30569,
+   "[TLD_beer]": 30570,
+   "[TLD_bel.tr]": 30571,
+   "[TLD_berlin]": 30572,
+   "[TLD_best]": 30573,
+   "[TLD_bet]": 30574,
+   "[TLD_bg]": 30575,
+   "[TLD_bid]": 30576,
+   "[TLD_bike]": 30577,
+   "[TLD_bingo]": 30578,
+   "[TLD_bio]": 30579,
+   "[TLD_biz.pl]": 30581,
+   "[TLD_biz]": 30580,
+   "[TLD_black]": 30582,
+   "[TLD_blog]": 30583,
+   "[TLD_blue]": 30584,
+   "[TLD_bo]": 30585,
+   "[TLD_boats]": 30586,
+   "[TLD_bond]": 30587,
+   "[TLD_boston]": 30588,
+   "[TLD_boutique]": 30589,
+   "[TLD_br]": 30590,
+   "[TLD_brussels]": 30591,
+   "[TLD_builders]": 30592,
+   "[TLD_business]": 30593,
+   "[TLD_buzz]": 30594,
+   "[TLD_by]": 30595,
+   "[TLD_bz]": 30596,
+   "[TLD_bzh]": 30597,
+   "[TLD_ca]": 30598,
+   "[TLD_cab]": 30599,
+   "[TLD_cafe]": 30600,
+   "[TLD_cam]": 30601,
+   "[TLD_camera]": 30602,
+   "[TLD_camp]": 30603,
+   "[TLD_capital]": 30604,
+   "[TLD_cards]": 30605,
+   "[TLD_care]": 30606,
+   "[TLD_casa]": 30607,
+   "[TLD_cash]": 30608,
+   "[TLD_casino]": 30609,
+   "[TLD_cat]": 30610,
+   "[TLD_cc]": 30611,
+   "[TLD_cd]": 30612,
+   "[TLD_center]": 30613,
+   "[TLD_ceo]": 30614,
+   "[TLD_cf]": 30615,
+   "[TLD_cfd]": 30616,
+   "[TLD_ch]": 30617,
+   "[TLD_charity]": 30618,
+   "[TLD_chat]": 30619,
+   "[TLD_cheap]": 30620,
+   "[TLD_christmas]": 30621,
+   "[TLD_church]": 30622,
+   "[TLD_ci]": 30623,
+   "[TLD_city]": 30624,
+   "[TLD_cl]": 30625,
+   "[TLD_claims]": 30626,
+   "[TLD_click]": 30627,
+   "[TLD_clinic]": 30628,
+   "[TLD_clothing]": 30629,
+   "[TLD_cloud]": 30630,
+   "[TLD_club]": 30631,
+   "[TLD_cm]": 30632,
+   "[TLD_cn]": 30633,
+   "[TLD_co.at]": 30635,
+   "[TLD_co.id]": 30636,
+   "[TLD_co.il]": 30637,
+   "[TLD_co.in]": 30638,
+   "[TLD_co.jp]": 30639,
+   "[TLD_co.ke]": 30640,
+   "[TLD_co.kr]": 30641,
+   "[TLD_co.nz]": 30642,
+   "[TLD_co.th]": 30643,
+   "[TLD_co.tz]": 30644,
+   "[TLD_co.uk]": 30645,
+   "[TLD_co.za]": 30646,
+   "[TLD_co.zw]": 30647,
+   "[TLD_co]": 30634,
+   "[TLD_coach]": 30648,
+   "[TLD_codes]": 30649,
+   "[TLD_coffee]": 30650,
+   "[TLD_com.ar]": 30652,
+   "[TLD_com.au]": 30653,
+   "[TLD_com.az]": 30654,
+   "[TLD_com.bd]": 30655,
+   "[TLD_com.br]": 30656,
+   "[TLD_com.bz]": 30657,
+   "[TLD_com.cn]": 30658,
+   "[TLD_com.co]": 30659,
+   "[TLD_com.cy]": 30660,
+   "[TLD_com.do]": 30661,
+   "[TLD_com.ec]": 30662,
+   "[TLD_com.eg]": 30663,
+   "[TLD_com.es]": 30664,
+   "[TLD_com.gh]": 30665,
+   "[TLD_com.hk]": 30666,
+   "[TLD_com.in]": 30667,
+   "[TLD_com.kg]": 30668,
+   "[TLD_com.mt]": 30669,
+   "[TLD_com.mx]": 30670,
+   "[TLD_com.my]": 30671,
+   "[TLD_com.ng]": 30672,
+   "[TLD_com.np]": 30673,
+   "[TLD_com.pe]": 30674,
+   "[TLD_com.ph]": 30675,
+   "[TLD_com.pk]": 30676,
+   "[TLD_com.pl]": 30677,
+   "[TLD_com.py]": 30678,
+   "[TLD_com.sa]": 30679,
+   "[TLD_com.sg]": 30680,
+   "[TLD_com.tr]": 30681,
+   "[TLD_com.tw]": 30682,
+   "[TLD_com.ua]": 30683,
+   "[TLD_com.uy]": 30684,
+   "[TLD_com.vc]": 30685,
+   "[TLD_com.ve]": 30686,
+   "[TLD_com.vn]": 30687,
+   "[TLD_com]": 30651,
+   "[TLD_community]": 30688,
+   "[TLD_company]": 30689,
+   "[TLD_computer]": 30690,
+   "[TLD_construction]": 30691,
+   "[TLD_consulting]": 30692,
+   "[TLD_contact]": 30693,
+   "[TLD_contractors]": 30694,
+   "[TLD_cooking]": 30695,
+   "[TLD_cool]": 30696,
+   "[TLD_coop]": 30697,
+   "[TLD_country]": 30698,
+   "[TLD_coupons]": 30699,
+   "[TLD_credit]": 30700,
+   "[TLD_cruises]": 30701,
+   "[TLD_cu]": 30702,
+   "[TLD_cx]": 30703,
+   "[TLD_cyou]": 30704,
+   "[TLD_cz]": 30705,
+   "[TLD_dance]": 30706,
+   "[TLD_date]": 30707,
+   "[TLD_dating]": 30708,
+   "[TLD_de]": 30709,
+   "[TLD_deals]": 30710,
+   "[TLD_delivery]": 30711,
+   "[TLD_dental]": 30712,
+   "[TLD_desi]": 30713,
+   "[TLD_design]": 30714,
+   "[TLD_dev]": 30715,
+   "[TLD_diamonds]": 30716,
+   "[TLD_diet]": 30717,
+   "[TLD_digital]": 30718,
+   "[TLD_direct]": 30719,
+   "[TLD_directory]": 30720,
+   "[TLD_discount]": 30721,
+   "[TLD_dj]": 30722,
+   "[TLD_dk]": 30723,
+   "[TLD_do]": 30724,
+   "[TLD_doctor]": 30725,
+   "[TLD_dog]": 30726,
+   "[TLD_domains]": 30727,
+   "[TLD_download]": 30728,
+   "[TLD_dz]": 30729,
+   "[TLD_earth]": 30730,
+   "[TLD_ec]": 30731,
+   "[TLD_eco]": 30732,
+   "[TLD_ed.jp]": 30733,
+   "[TLD_edu.ar]": 30735,
+   "[TLD_edu.au]": 30736,
+   "[TLD_edu.br]": 30737,
+   "[TLD_edu.cn]": 30738,
+   "[TLD_edu.co]": 30739,
+   "[TLD_edu.ec]": 30740,
+   "[TLD_edu.eg]": 30741,
+   "[TLD_edu.hk]": 30742,
+   "[TLD_edu.in]": 30743,
+   "[TLD_edu.mx]": 30744,
+   "[TLD_edu.my]": 30745,
+   "[TLD_edu.ng]": 30746,
+   "[TLD_edu.pe]": 30747,
+   "[TLD_edu.ph]": 30748,
+   "[TLD_edu.pk]": 30749,
+   "[TLD_edu.pl]": 30750,
+   "[TLD_edu.sa]": 30751,
+   "[TLD_edu.sg]": 30752,
+   "[TLD_edu.tr]": 30753,
+   "[TLD_edu.tw]": 30754,
+   "[TLD_edu.ua]": 30755,
+   "[TLD_edu.uy]": 30756,
+   "[TLD_edu.vn]": 30757,
+   "[TLD_edu]": 30734,
+   "[TLD_education]": 30758,
+   "[TLD_ee]": 30759,
+   "[TLD_email]": 30760,
+   "[TLD_energy]": 30761,
+   "[TLD_engineering]": 30762,
+   "[TLD_enterprises]": 30763,
+   "[TLD_equipment]": 30764,
+   "[TLD_es]": 30765,
+   "[TLD_estate]": 30766,
+   "[TLD_et]": 30767,
+   "[TLD_eu]": 30768,
+   "[TLD_eus]": 30769,
+   "[TLD_events]": 30770,
+   "[TLD_exchange]": 30771,
+   "[TLD_expert]": 30772,
+   "[TLD_exposed]": 30773,
+   "[TLD_express]": 30774,
+   "[TLD_fail]": 30775,
+   "[TLD_faith]": 30776,
+   "[TLD_family]": 30777,
+   "[TLD_fan]": 30778,
+   "[TLD_fans]": 30779,
+   "[TLD_farm]": 30780,
+   "[TLD_fashion]": 30781,
+   "[TLD_fi]": 30782,
+   "[TLD_finance]": 30783,
+   "[TLD_financial]": 30784,
+   "[TLD_fish]": 30785,
+   "[TLD_fit]": 30786,
+   "[TLD_fitness]": 30787,
+   "[TLD_flowers]": 30788,
+   "[TLD_fm]": 30789,
+   "[TLD_football]": 30790,
+   "[TLD_forsale]": 30791,
+   "[TLD_foundation]": 30792,
+   "[TLD_fr]": 30793,
+   "[TLD_fun]": 30794,
+   "[TLD_fund]": 30795,
+   "[TLD_furniture]": 30796,
+   "[TLD_fyi]": 30797,
+   "[TLD_gal]": 30798,
+   "[TLD_gallery]": 30799,
+   "[TLD_game]": 30800,
+   "[TLD_games]": 30801,
+   "[TLD_garden]": 30802,
+   "[TLD_gd]": 30803,
+   "[TLD_gdn]": 30804,
+   "[TLD_ge]": 30805,
+   "[TLD_gen.tr]": 30806,
+   "[TLD_gg]": 30807,
+   "[TLD_gift]": 30808,
+   "[TLD_gifts]": 30809,
+   "[TLD_gives]": 30810,
+   "[TLD_gl]": 30811,
+   "[TLD_glass]": 30812,
+   "[TLD_global]": 30813,
+   "[TLD_go.jp]": 30814,
+   "[TLD_gold]": 30815,
+   "[TLD_golf]": 30816,
+   "[TLD_google]": 30817,
+   "[TLD_gov.ae]": 30819,
+   "[TLD_gov.ar]": 30820,
+   "[TLD_gov.au]": 30821,
+   "[TLD_gov.bd]": 30822,
+   "[TLD_gov.br]": 30823,
+   "[TLD_gov.by]": 30824,
+   "[TLD_gov.cn]": 30825,
+   "[TLD_gov.co]": 30826,
+   "[TLD_gov.eg]": 30827,
+   "[TLD_gov.gr]": 30828,
+   "[TLD_gov.hk]": 30829,
+   "[TLD_gov.il]": 30830,
+   "[TLD_gov.in]": 30831,
+   "[TLD_gov.it]": 30832,
+   "[TLD_gov.kh]": 30833,
+   "[TLD_gov.lk]": 30834,
+   "[TLD_gov.lv]": 30835,
+   "[TLD_gov.mo]": 30836,
+   "[TLD_gov.my]": 30837,
+   "[TLD_gov.ng]": 30838,
+   "[TLD_gov.np]": 30839,
+   "[TLD_gov.ph]": 30840,
+   "[TLD_gov.pk]": 30841,
+   "[TLD_gov.pl]": 30842,
+   "[TLD_gov.pt]": 30843,
+   "[TLD_gov.rs]": 30844,
+   "[TLD_gov.sa]": 30845,
+   "[TLD_gov.sg]": 30846,
+   "[TLD_gov.tr]": 30847,
+   "[TLD_gov.tw]": 30848,
+   "[TLD_gov.ua]": 30849,
+   "[TLD_gov.uk]": 30850,
+   "[TLD_gov.vn]": 30851,
+   "[TLD_gov.za]": 30852,
+   "[TLD_gov]": 30818,
+   "[TLD_gr.jp]": 30854,
+   "[TLD_gr]": 30853,
+   "[TLD_graphics]": 30855,
+   "[TLD_gratis]": 30856,
+   "[TLD_green]": 30857,
+   "[TLD_group]": 30858,
+   "[TLD_gs]": 30859,
+   "[TLD_guide]": 30860,
+   "[TLD_guru]": 30861,
+   "[TLD_gy]": 30862,
+   "[TLD_hair]": 30863,
+   "[TLD_haus]": 30864,
+   "[TLD_health]": 30865,
+   "[TLD_healthcare]": 30866,
+   "[TLD_help]": 30867,
+   "[TLD_hk]": 30868,
+   "[TLD_hn]": 30869,
+   "[TLD_hockey]": 30870,
+   "[TLD_holdings]": 30871,
+   "[TLD_holiday]": 30872,
+   "[TLD_homes]": 30873,
+   "[TLD_horse]": 30874,
+   "[TLD_host]": 30875,
+   "[TLD_hosting]": 30876,
+   "[TLD_house]": 30877,
+   "[TLD_hr]": 30878,
+   "[TLD_ht]": 30879,
+   "[TLD_hu]": 30880,
+   "[TLD_icu]": 30881,
+   "[TLD_id]": 30882,
+   "[TLD_ie]": 30883,
+   "[TLD_im]": 30884,
+   "[TLD_immobilien]": 30885,
+   "[TLD_in]": 30886,
+   "[TLD_inc]": 30887,
+   "[TLD_info.pl]": 30889,
+   "[TLD_info]": 30888,
+   "[TLD_ink]": 30890,
+   "[TLD_institute]": 30891,
+   "[TLD_insure]": 30892,
+   "[TLD_int]": 30893,
+   "[TLD_international]": 30894,
+   "[TLD_investments]": 30895,
+   "[TLD_io]": 30896,
+   "[TLD_ir]": 30897,
+   "[TLD_irish]": 30898,
+   "[TLD_is]": 30899,
+   "[TLD_it]": 30900,
+   "[TLD_je]": 30901,
+   "[TLD_jewelry]": 30902,
+   "[TLD_jobs]": 30903,
+   "[TLD_jp]": 30904,
+   "[TLD_kaufen]": 30905,
+   "[TLD_kg]": 30906,
+   "[TLD_kim]": 30907,
+   "[TLD_kitchen]": 30908,
+   "[TLD_kr]": 30909,
+   "[TLD_kz]": 30910,
+   "[TLD_la]": 30911,
+   "[TLD_land]": 30912,
+   "[TLD_lat]": 30913,
+   "[TLD_law]": 30914,
+   "[TLD_lawyer]": 30915,
+   "[TLD_lc]": 30916,
+   "[TLD_lease]": 30917,
+   "[TLD_legal]": 30918,
+   "[TLD_lg.jp]": 30919,
+   "[TLD_lgbt]": 30920,
+   "[TLD_li]": 30921,
+   "[TLD_life]": 30922,
+   "[TLD_lighting]": 30923,
+   "[TLD_limited]": 30924,
+   "[TLD_limo]": 30925,
+   "[TLD_link]": 30926,
+   "[TLD_live]": 30927,
+   "[TLD_lk]": 30928,
+   "[TLD_llc]": 30929,
+   "[TLD_loan]": 30930,
+   "[TLD_loans]": 30931,
+   "[TLD_lol]": 30932,
+   "[TLD_london]": 30933,
+   "[TLD_love]": 30934,
+   "[TLD_lt]": 30935,
+   "[TLD_ltd]": 30936,
+   "[TLD_lu]": 30937,
+   "[TLD_luxe]": 30938,
+   "[TLD_luxury]": 30939,
+   "[TLD_lv]": 30940,
+   "[TLD_ly]": 30941,
+   "[TLD_ma]": 30942,
+   "[TLD_makeup]": 30943,
+   "[TLD_management]": 30944,
+   "[TLD_market]": 30945,
+   "[TLD_marketing]": 30946,
+   "[TLD_mba]": 30947,
+   "[TLD_md]": 30948,
+   "[TLD_me.uk]": 30950,
+   "[TLD_me]": 30949,
+   "[TLD_media.pl]": 30952,
+   "[TLD_media]": 30951,
+   "[TLD_men]": 30953,
+   "[TLD_miami]": 30954,
+   "[TLD_mil]": 30955,
+   "[TLD_mk]": 30956,
+   "[TLD_ml]": 30957,
+   "[TLD_mn]": 30958,
+   "[TLD_mobi]": 30959,
+   "[TLD_moda]": 30960,
+   "[TLD_moe]": 30961,
+   "[TLD_mom]": 30962,
+   "[TLD_money]": 30963,
+   "[TLD_monster]": 30964,
+   "[TLD_mortgage]": 30965,
+   "[TLD_motorcycles]": 30966,
+   "[TLD_movie]": 30967,
+   "[TLD_ms]": 30968,
+   "[TLD_mt]": 30969,
+   "[TLD_mu]": 30970,
+   "[TLD_museum]": 30971,
+   "[TLD_mx]": 30972,
+   "[TLD_my]": 30973,
+   "[TLD_name]": 30974,
+   "[TLD_ne.jp]": 30975,
+   "[TLD_net.au]": 30977,
+   "[TLD_net.br]": 30978,
+   "[TLD_net.cn]": 30979,
+   "[TLD_net.co]": 30980,
+   "[TLD_net.nz]": 30981,
+   "[TLD_net.ph]": 30982,
+   "[TLD_net.pl]": 30983,
+   "[TLD_net.ua]": 30984,
+   "[TLD_net]": 30976,
+   "[TLD_network]": 30985,
+   "[TLD_news]": 30986,
+   "[TLD_nf]": 30987,
+   "[TLD_ng]": 30988,
+   "[TLD_ngo]": 30989,
+   "[TLD_nhs.uk]": 30990,
+   "[TLD_ninja]": 30991,
+   "[TLD_nl]": 30992,
+   "[TLD_no]": 30993,
+   "[TLD_nrw]": 30994,
+   "[TLD_nu]": 30995,
+   "[TLD_nyc]": 30996,
+   "[TLD_nz]": 30997,
+   "[TLD_olsztyn.pl]": 30998,
+   "[TLD_one]": 30999,
+   "[TLD_onl]": 31000,
+   "[TLD_online]": 31001,
+   "[TLD_ooo]": 31002,
+   "[TLD_opole.pl]": 31003,
+   "[TLD_or.at]": 31004,
+   "[TLD_or.jp]": 31005,
+   "[TLD_org.ar]": 31007,
+   "[TLD_org.au]": 31008,
+   "[TLD_org.br]": 31009,
+   "[TLD_org.cn]": 31010,
+   "[TLD_org.co]": 31011,
+   "[TLD_org.es]": 31012,
+   "[TLD_org.hk]": 31013,
+   "[TLD_org.il]": 31014,
+   "[TLD_org.in]": 31015,
+   "[TLD_org.mx]": 31016,
+   "[TLD_org.my]": 31017,
+   "[TLD_org.nz]": 31018,
+   "[TLD_org.pe]": 31019,
+   "[TLD_org.ph]": 31020,
+   "[TLD_org.pk]": 31021,
+   "[TLD_org.pl]": 31022,
+   "[TLD_org.sg]": 31023,
+   "[TLD_org.tr]": 31024,
+   "[TLD_org.tw]": 31025,
+   "[TLD_org.ua]": 31026,
+   "[TLD_org.uk]": 31027,
+   "[TLD_org.za]": 31028,
+   "[TLD_org]": 31006,
+   "[TLD_organic]": 31029,
+   "[TLD_page]": 31030,
+   "[TLD_paris]": 31031,
+   "[TLD_partners]": 31032,
+   "[TLD_parts]": 31033,
+   "[TLD_party]": 31034,
+   "[TLD_pe]": 31035,
+   "[TLD_pet]": 31036,
+   "[TLD_ph]": 31037,
+   "[TLD_photo]": 31038,
+   "[TLD_photography]": 31039,
+   "[TLD_photos]": 31040,
+   "[TLD_pics]": 31041,
+   "[TLD_pictures]": 31042,
+   "[TLD_pink]": 31043,
+   "[TLD_pizza]": 31044,
+   "[TLD_pk]": 31045,
+   "[TLD_pl]": 31046,
+   "[TLD_place]": 31047,
+   "[TLD_plus]": 31048,
+   "[TLD_pm]": 31049,
+   "[TLD_poker]": 31050,
+   "[TLD_police.uk]": 31051,
+   "[TLD_porn]": 31052,
+   "[TLD_press]": 31053,
+   "[TLD_pro]": 31054,
+   "[TLD_promo]": 31055,
+   "[TLD_properties]": 31056,
+   "[TLD_property]": 31057,
+   "[TLD_ps]": 31058,
+   "[TLD_pt]": 31059,
+   "[TLD_pub]": 31060,
+   "[TLD_pw]": 31061,
+   "[TLD_qa]": 31062,
+   "[TLD_quest]": 31063,
+   "[TLD_re]": 31064,
+   "[TLD_recipes]": 31065,
+   "[TLD_red]": 31066,
+   "[TLD_rent]": 31067,
+   "[TLD_rentals]": 31068,
+   "[TLD_repair]": 31069,
+   "[TLD_report]": 31070,
+   "[TLD_rest]": 31071,
+   "[TLD_restaurant]": 31072,
+   "[TLD_review]": 31073,
+   "[TLD_reviews]": 31074,
+   "[TLD_rip]": 31075,
+   "[TLD_ro]": 31076,
+   "[TLD_rocks]": 31077,
+   "[TLD_rs]": 31078,
+   "[TLD_ru]": 31079,
+   "[TLD_run]": 31080,
+   "[TLD_rzeszow.pl]": 31081,
+   "[TLD_sa]": 31082,
+   "[TLD_sale]": 31083,
+   "[TLD_salon]": 31084,
+   "[TLD_sbs]": 31085,
+   "[TLD_sc]": 31086,
+   "[TLD_school]": 31087,
+   "[TLD_science]": 31088,
+   "[TLD_scot]": 31089,
+   "[TLD_se]": 31090,
+   "[TLD_services]": 31091,
+   "[TLD_sex]": 31092,
+   "[TLD_sexy]": 31093,
+   "[TLD_sg]": 31094,
+   "[TLD_sh]": 31095,
+   "[TLD_shoes]": 31096,
+   "[TLD_shop]": 31097,
+   "[TLD_shopping]": 31098,
+   "[TLD_show]": 31099,
+   "[TLD_si]": 31100,
+   "[TLD_singles]": 31101,
+   "[TLD_site]": 31102,
+   "[TLD_sk]": 31103,
+   "[TLD_ski]": 31104,
+   "[TLD_skin]": 31105,
+   "[TLD_sklep.pl]": 31106,
+   "[TLD_sn]": 31107,
+   "[TLD_so]": 31108,
+   "[TLD_soccer]": 31109,
+   "[TLD_social]": 31110,
+   "[TLD_software]": 31111,
+   "[TLD_solar]": 31112,
+   "[TLD_solutions]": 31113,
+   "[TLD_space]": 31114,
+   "[TLD_st]": 31115,
+   "[TLD_store]": 31116,
+   "[TLD_stream]": 31117,
+   "[TLD_studio]": 31118,
+   "[TLD_style]": 31119,
+   "[TLD_su]": 31120,
+   "[TLD_supplies]": 31121,
+   "[TLD_supply]": 31122,
+   "[TLD_support]": 31123,
+   "[TLD_surf]": 31124,
+   "[TLD_surgery]": 31125,
+   "[TLD_swiss]": 31126,
+   "[TLD_sx]": 31127,
+   "[TLD_systems]": 31128,
+   "[TLD_tax]": 31129,
+   "[TLD_taxi]": 31130,
+   "[TLD_tc]": 31131,
+   "[TLD_team]": 31132,
+   "[TLD_tech]": 31133,
+   "[TLD_technology]": 31134,
+   "[TLD_tel]": 31135,
+   "[TLD_tips]": 31136,
+   "[TLD_tires]": 31137,
+   "[TLD_tj]": 31138,
+   "[TLD_tk]": 31139,
+   "[TLD_tl]": 31140,
+   "[TLD_tm]": 31141,
+   "[TLD_tn]": 31142,
+   "[TLD_to]": 31143,
+   "[TLD_today]": 31144,
+   "[TLD_tokyo]": 31145,
+   "[TLD_tools]": 31146,
+   "[TLD_top]": 31147,
+   "[TLD_tours]": 31148,
+   "[TLD_town]": 31149,
+   "[TLD_toys]": 31150,
+   "[TLD_trade]": 31151,
+   "[TLD_training]": 31152,
+   "[TLD_travel]": 31153,
+   "[TLD_tube]": 31154,
+   "[TLD_tv]": 31155,
+   "[TLD_tw]": 31156,
+   "[TLD_ua]": 31157,
+   "[TLD_ug]": 31158,
+   "[TLD_uk]": 31159,
+   "[TLD_university]": 31160,
+   "[TLD_uno]": 31161,
+   "[TLD_us]": 31162,
+   "[TLD_uz]": 31163,
+   "[TLD_va]": 31164,
+   "[TLD_vacations]": 31165,
+   "[TLD_vc]": 31166,
+   "[TLD_vegas]": 31167,
+   "[TLD_ventures]": 31168,
+   "[TLD_vet]": 31169,
+   "[TLD_vg]": 31170,
+   "[TLD_video]": 31171,
+   "[TLD_vin]": 31172,
+   "[TLD_vip]": 31173,
+   "[TLD_vision]": 31174,
+   "[TLD_vn]": 31175,
+   "[TLD_voyage]": 31176,
+   "[TLD_vu]": 31177,
+   "[TLD_wales]": 31178,
+   "[TLD_wang]": 31179,
+   "[TLD_warszawa.pl]": 31180,
+   "[TLD_watch]": 31181,
+   "[TLD_waw.pl]": 31182,
+   "[TLD_website]": 31183,
+   "[TLD_wedding]": 31184,
+   "[TLD_wiki]": 31185,
+   "[TLD_win]": 31186,
+   "[TLD_wine]": 31187,
+   "[TLD_work]": 31188,
+   "[TLD_works]": 31189,
+   "[TLD_world]": 31190,
+   "[TLD_wroclaw.pl]": 31191,
+   "[TLD_ws]": 31192,
+   "[TLD_wtf]": 31193,
+   "[TLD_xn--3ds443g]": 31194,
+   "[TLD_xn--90ais]": 31195,
+   "[TLD_xn--c1avg]": 31196,
+   "[TLD_xn--p1ai]": 31197,
+   "[TLD_xn--tckwe]": 31198,
+   "[TLD_xxx]": 31199,
+   "[TLD_xyz]": 31200,
+   "[TLD_yachts]": 31201,
+   "[TLD_yoga]": 31202,
+   "[TLD_zone]": 31203
+ }
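The mapping above gives each known TLD one dedicated token. The lowest ID is 30522, which is where the 30,522-entry uncased BERT vocabulary ends, and the highest is 31203. A small sketch of the lookup behind the `[TLD_x]` token format follows, using a four-entry excerpt of the mapping. The helper names are hypothetical and not part of the model's code:

```python
# Four entries copied from added_tokens.json; the full file has 682.
ADDED_TOKENS_EXCERPT = {
    "[TLD_ac]": 30522,
    "[TLD_co.za]": 30646,
    "[TLD_com]": 30651,
    "[TLD_zone]": 31203,
}


def tld_token(tld):
    """Format a TLD such as 'co.za' as its dedicated token string."""
    return f"[TLD_{tld}]"


def lookup_token_id(tld, added_tokens):
    """Return the dedicated token ID for a TLD, or None if it has none."""
    return added_tokens.get(tld_token(tld))


def implied_vocab_size(added_tokens):
    """Vocabulary size implied by the highest added token ID."""
    return max(added_tokens.values()) + 1


com_id = lookup_token_id("com", ADDED_TOKENS_EXCERPT)
missing = lookup_token_id("example", ADDED_TOKENS_EXCERPT)
```

A TLD without a dedicated token returns `None` here, which corresponds to the case where the usage code falls back to the dot-prefixed text form. The implied size of 31204 agrees with `vocab_size` in `config.json`.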
config.json ADDED
@@ -0,0 +1,116 @@
+ {
+   "architectures": [
+     "DualTaskTLDModel"
+   ],
+   "base_model_name": "google/bert_uncased_L-4_H-256_A-4",
+   "categorical_features": [],
+   "categorical_mappings": {},
+   "continuous_features": [
+     "abuse_rate_percent",
+     "avg_transfer_days",
+     "brand_association_strength",
+     "community_depth_score",
+     "dnssec_adoption_percent",
+     "four_letter_registration_percent",
+     "global_top_1m_share",
+     "hack_usage_popularity",
+     "influencer_adoption_rate",
+     "innovation_perception_score",
+     "iso_country_code",
+     "majestic_top_1m_count",
+     "market_momentum_score",
+     "media_sentiment_score",
+     "memorability_score",
+     "premium_brand_index",
+     "professional_usage_rate",
+     "registration_restrictions",
+     "registry_marketing_activity",
+     "reputation_trust_score",
+     "sales_10y_above_10_count",
+     "tech_startup_adoption_index",
+     "three_letter_registration_percent",
+     "tld_class"
+   ],
+   "economic_features": [
+     "gdp_total",
+     "healthcare_pharmaceuticals",
+     "banking_capital_markets",
+     "insurance",
+     "investment_wealth_management",
+     "education_edtech",
+     "retail_ecommerce",
+     "consumer_packaged_goods",
+     "food_beverage_restaurants",
+     "travel_tourism_hospitality",
+     "real_estate_proptech",
+     "automotive_mobility",
+     "technology_software",
+     "telecommunications_isps",
+     "energy_utilities",
+     "industrial_manufacturing_engineering",
+     "construction_infrastructure",
+     "logistics_shipping_transportation",
+     "media_entertainment_streaming",
+     "gaming_igaming",
+     "professional_legal_services"
+   ],
+   "embedding_dim": 96,
+   "feature_stats": {},
+   "mlp_dropout": 0.15,
+   "mlp_hidden_size": 192,
+   "model_type": "dual_task_tld",
+   "ordinal_features": [
+     "tld_class",
+     "registration_restrictions"
+   ],
+   "price_features": [
+     "overall_score",
+     "score_automotive",
+     "score_construction",
+     "score_education",
+     "score_energy",
+     "score_engineering",
+     "score_fashion",
+     "score_finance",
+     "score_food",
+     "score_gaming",
+     "score_healthcare",
+     "score_insurance",
+     "score_legal",
+     "score_media",
+     "score_music",
+     "score_pets",
+     "score_sports",
+     "score_technology"
+   ],
+   "research_features": [
+     "abuse_rate_percent",
+     "avg_transfer_days",
+     "brand_association_strength",
+     "community_depth_score",
+     "dnssec_adoption_percent",
+     "hack_usage_popularity",
+     "influencer_adoption_rate",
+     "innovation_perception_score",
+     "market_momentum_score",
+     "media_sentiment_score",
+     "memorability_score",
+     "premium_brand_index",
+     "professional_usage_rate",
+     "registration_restrictions",
+     "registry_marketing_activity",
+     "reputation_trust_score",
+     "tech_startup_adoption_index",
+     "tld_class"
+   ],
+   "technical_features": [
+     "four_letter_registration_percent",
+     "global_top_1m_share",
+     "majestic_top_1m_count",
+     "sales_10y_above_10_count",
+     "three_letter_registration_percent"
+   ],
+   "torch_dtype": "float32",
+   "transformers_version": "4.44.2",
+   "vocab_size": 31204
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dd185cd13924e5a365b7f37644b32fc306f3fed20a8e14912f575e8790b16f70
+ size 51066352
preprocessors.json ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
vocab.txt ADDED
The diff for this file is too large to render. See raw diff