RITRIEVE ZH Fine-Tune: Classical Poetry ↔ Modern Chinese

This is a sentence-transformers model fine-tuned from richinfoai/ritrieve_zh_v1 on a JSON dataset of (anchor, positive, negative) triplets that pair modern Chinese sentences with lines of classical poetry. It maps sentences and paragraphs to a 1792-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: richinfoai/ritrieve_zh_v1
Maximum Sequence Length: 512 tokens
Output Dimensionality: 1792 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- json
Language: zh
License: mit

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 1024, 'out_features': 1792, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
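The three modules above can be read as a shape pipeline: per-token 1024-dimensional vectors are mean-pooled over non-padded positions, then projected to 1792 dimensions by a linear layer with identity activation. The following is a minimal NumPy sketch of that data flow only. The token vectors and the projection weights are random stand-ins (assumptions for illustration), not the trained parameters, and the helper names `mean_pool` and `dense` are hypothetical.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sentence, ignoring padded positions."""
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

def dense(x, weight, bias):
    """Linear projection with identity activation: x @ W^T + b."""
    return x @ weight.T + bias

rng = np.random.default_rng(0)
batch, seq_len, hidden, out_dim = 2, 6, 1024, 1792

# Random stand-in for the BertModel token outputs (assumption).
tokens = rng.normal(size=(batch, seq_len, hidden))
# 1 marks a real token, 0 marks padding.
mask = np.array([[1, 1, 1, 1, 0, 0],
                 [1, 1, 1, 1, 1, 1]])
# Random stand-in for the Dense layer parameters (NOT the trained weights).
weight = rng.normal(size=(out_dim, hidden)) * 0.01
bias = np.zeros(out_dim)

pooled = mean_pool(tokens, mask)                 # shape (2, 1024)
sentence_vectors = dense(pooled, weight, bias)   # shape (2, 1792)
print(pooled.shape, sentence_vectors.shape)
```

Because padded positions are masked out, the first sentence's pooled vector is the average of its four real tokens only.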

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("slxhere/modern_ancientpoem_encoder")
# Run inference
sentences = [
    '微信列表翻到底，能说真心话的居然只剩快递群。',
    '代情难重论，人事好乖移。',
    '时应记得长安事，曾向文场属思劳。',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1792)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
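For semantic search, a typical pattern is to embed one modern-language query and a list of candidate poem lines, then rank the candidates by cosine similarity. The sketch below shows only the ranking step in plain Python. The 3-dimensional vectors are toy stand-ins for the 1792-dimensional embeddings the model would produce, and `rank_by_cosine` is a hypothetical helper, not part of the Sentence Transformers API.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_by_cosine(query_vec, corpus_vecs, top_k=3):
    """Return (corpus_index, score) pairs, highest similarity first."""
    scored = [(idx, cosine(query_vec, vec)) for idx, vec in enumerate(corpus_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy vectors standing in for real embeddings (assumption for illustration).
query = [1.0, 0.0, 1.0]
corpus = [
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [2.0, 0.0, 2.0],   # same direction as the query
    [1.0, 1.0, 0.0],   # partially aligned with the query
]

ranking = rank_by_cosine(query, corpus, top_k=2)
print(ranking)  # index 1 ranks first, index 2 second
```

In practice the query and corpus vectors would come from `model.encode(...)`, and `model.similarity` computes the same cosine scores for whole batches at once.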

Training Details

Training Dataset

json

Dataset: json
Size: 225,000 training samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
min tokens	14	12	12
mean tokens	26.51	15.23	15.34
max tokens	45	27	34

Samples:

anchor	positive	negative
`整个人蜷在阳光里，连毛衣都晒出一股蓬松的香味。`	`箕踞拥裘坐，半身在日旸。`	`洛阳女儿对门居，才可容颜十五馀。`
`好像所有的好事都约好了一样，今天一起找上门来。`	`临终极乐宝华迎，观音势至俱来至。`	`身没南朝宅已荒，邑人犹赏旧风光。`
`大家都觉得她太娇气，只有你一直小心照顾着她。`	`弱质人皆弃，唯君手自栽。`	`秦筑长城城已摧，汉武北上单于台。`

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
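As a rough illustration of what these two parameters do: MultipleNegativesRankingLoss treats every positive and every negative in the batch as a candidate for each anchor, scores each candidate by cosine similarity multiplied by `scale`, and applies cross-entropy with the anchor's own positive as the correct class. The plain-Python sketch below follows that description on toy 2-dimensional vectors. It is a simplified reading of the loss, not the library's implementation, and `mnrl_loss` is a hypothetical name.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mnrl_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy over scaled cosine scores with in-batch negatives."""
    # Candidates for every anchor: all positives, then all hard negatives.
    candidates = positives + negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, cand) for cand in candidates]
        # Numerically stable log-sum-exp.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(l - max_logit) for l in logits)
        )
        # The correct class for anchor i is its own positive at index i.
        total += log_sum_exp - logits[i]
    return total / len(anchors)

# Toy embeddings (assumption for illustration).
anchors = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

aligned = mnrl_loss(anchors, [[1.0, 0.0], [0.0, 1.0]], negatives)
swapped = mnrl_loss(anchors, [[0.0, 1.0], [1.0, 0.0]], negatives)
print(aligned, swapped)  # aligned is near zero, swapped is large
```

A larger `scale` sharpens the softmax, so small differences in cosine similarity translate into larger differences in loss.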

Evaluation Dataset

json

Dataset: json
Size: 25,000 evaluation samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
min tokens	12	12	12
mean tokens	26.86	15.31	15.3
max tokens	46	29	26

Samples:

anchor	positive	negative
`看着街边那些孤零零的老人，真怕自己以后也变成那样。`	`垂白乱南翁，委身希北叟。`	`熏香荀令偏怜少，傅粉何郎不解愁。`
`关了灯，屋里黑漆漆的，就听见外面秋虫和落叶在说话。`	`秋虫与秋叶，一夜隔窗闻。`	`未能穷意义，岂敢求瑕痕。`
`虽然爷爷不在了，但他教我做人的道理永远记在心里。`	`惟孝虽遥，灵规不朽。`	`巧类鸳机织，光攒麝月团。`

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 128
per_device_eval_batch_size: 128
learning_rate: 2e-05
num_train_epochs: 6
warmup_ratio: 0.1
fp16: True
batch_sampler: no_duplicates
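These settings imply a concrete step budget. The sketch below works through the arithmetic, assuming a single device and no gradient accumulation. It gives 1758 steps per epoch, which matches the epoch fractions in the training logs (step 50 falls at epoch 0.0284). The learning-rate function is a simplified rendering of linear warmup followed by linear decay, not the Transformers scheduler itself, and `linear_schedule_lr` is a hypothetical name.

```python
import math

train_samples = 225_000
batch_size = 128        # per_device_train_batch_size, one device assumed
epochs = 6              # num_train_epochs
warmup_ratio = 0.1
peak_lr = 2e-5          # learning_rate

steps_per_epoch = math.ceil(train_samples / batch_size)   # 1758
total_steps = steps_per_epoch * epochs                    # 10548
warmup_steps = math.ceil(total_steps * warmup_ratio)      # 1055

def linear_schedule_lr(step):
    """Linear warmup to peak_lr, then linear decay to zero (simplified)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)

print(steps_per_epoch, total_steps, warmup_steps)
print(round(50 / steps_per_epoch, 4))  # epoch fraction at step 50
```

Under these assumptions the learning rate peaks at roughly step 1055, a little more than halfway through the first epoch.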

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 128
per_device_eval_batch_size: 128
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 6
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
tp_size: 0
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	Validation Loss
0.0284	50	4.4241	-
0.0569	100	3.4415	-
0.0853	150	2.6725	-
0.1138	200	2.4137	2.2686
0.1422	250	2.2701	-
0.1706	300	2.1523	-
0.1991	350	2.0805	-
0.2275	400	2.0513	1.9506
0.2560	450	2.0048	-
0.2844	500	1.9552	-
0.3129	550	1.8778	-
0.3413	600	1.8549	1.7630
0.3697	650	1.822	-
0.3982	700	1.8128	-
0.4266	750	1.7742	-
0.4551	800	1.7076	1.6331
0.4835	850	1.6919	-
0.5119	900	1.64	-
0.5404	950	1.6291	-
0.5688	1000	1.5881	1.5368
0.5973	1050	1.6018	-
0.6257	1100	1.5664	-
0.6542	1150	1.5545	-
0.6826	1200	1.5292	1.4532
0.7110	1250	1.5166	-
0.7395	1300	1.517	-
0.7679	1350	1.4639	-
0.7964	1400	1.4729	1.3687
0.8248	1450	1.4501	-
0.8532	1500	1.3932	-
0.8817	1550	1.4063	-
0.9101	1600	1.3825	1.3003
0.9386	1650	1.3647	-
0.9670	1700	1.3431	-
0.9954	1750	1.3417	-
1.0239	1800	1.0839	1.2431
1.0523	1850	1.0801	-
1.0808	1900	1.0577	-
1.1092	1950	1.0159	-
1.1377	2000	1.0239	1.2132
1.1661	2050	1.0335	-
1.1945	2100	1.0117	-
1.2230	2150	1.0343	-
1.2514	2200	1.0193	1.1808
1.2799	2250	1.0235	-
1.3083	2300	0.9949	-
1.3367	2350	1.0058	-
1.3652	2400	1.0039	1.1428
1.3936	2450	1.0164	-
1.4221	2500	0.9934	-
1.4505	2550	0.9777	-
1.4790	2600	0.9753	1.1101
1.5074	2650	0.9621	-
1.5358	2700	0.9756	-
1.5643	2750	0.9725	-
1.5927	2800	0.9649	1.0813
1.6212	2850	0.9652	-
1.6496	2900	0.9861	-
1.6780	2950	0.916	-
1.7065	3000	0.9417	1.0523
1.7349	3050	0.9599	-
1.7634	3100	0.9275	-
1.7918	3150	0.9247	-
1.8203	3200	0.9417	1.0306
1.8487	3250	0.9275	-
1.8771	3300	0.9431	-
1.9056	3350	0.9147	-
1.9340	3400	0.8957	1.0051
1.9625	3450	0.9169	-
1.9909	3500	0.9079	-
2.0193	3550	0.7057	-
2.0478	3600	0.6037	0.9944
2.0762	3650	0.5888	-
2.1047	3700	0.6134	-
2.1331	3750	0.6209	-
2.1615	3800	0.6163	0.9836
2.1900	3850	0.6271	-
2.2184	3900	0.629	-
2.2469	3950	0.6041	-
2.2753	4000	0.622	0.9792
2.3038	4050	0.6175	-
2.3322	4100	0.627	-
2.3606	4150	0.6339	-
2.3891	4200	0.6325	0.9643
2.4175	4250	0.6044	-
2.4460	4300	0.6124	-
2.4744	4350	0.6326	-
2.5028	4400	0.6349	0.9462
2.5313	4450	0.6286	-
2.5597	4500	0.6325	-
2.5882	4550	0.6399	-
2.6166	4600	0.6184	0.9317
2.6451	4650	0.6292	-
2.6735	4700	0.6017	-
2.7019	4750	0.6305	-
2.7304	4800	0.6152	0.9213
2.7588	4850	0.5972	-
2.7873	4900	0.6048	-
2.8157	4950	0.6096	-
2.8441	5000	0.6156	0.9073
2.8726	5050	0.5942	-
2.9010	5100	0.592	-
2.9295	5150	0.6088	-
2.9579	5200	0.5941	0.8950
2.9863	5250	0.6161	-
3.0148	5300	0.5021	-
3.0432	5350	0.4116	-
3.0717	5400	0.3936	0.9009
3.1001	5450	0.4193	-
3.1286	5500	0.422	-
3.1570	5550	0.432	-
3.1854	5600	0.4281	0.8985
3.2139	5650	0.4091	-
3.2423	5700	0.4305	-
3.2708	5750	0.4203	-
3.2992	5800	0.4193	0.8869
3.3276	5850	0.4238	-
3.3561	5900	0.4274	-
3.3845	5950	0.4124	-
3.4130	6000	0.4241	0.8842
3.4414	6050	0.427	-
3.4699	6100	0.4275	-
3.4983	6150	0.4152	-
3.5267	6200	0.4247	0.8733
3.5552	6250	0.4111	-
3.5836	6300	0.4396	-
3.6121	6350	0.4122	-
3.6405	6400	0.4252	0.8657
3.6689	6450	0.4167	-
3.6974	6500	0.4282	-
3.7258	6550	0.411	-
3.7543	6600	0.4273	0.8540
3.7827	6650	0.4327	-
3.8111	6700	0.431	-
3.8396	6750	0.4347	-
3.8680	6800	0.4264	0.8523
3.8965	6850	0.4213	-
3.9249	6900	0.4285	-
3.9534	6950	0.4138	-
3.9818	7000	0.4051	0.8407
4.0102	7050	0.3779	-
4.0387	7100	0.2957	-
4.0671	7150	0.2939	-
4.0956	7200	0.3065	0.8590
4.1240	7250	0.3081	-
4.1524	7300	0.3043	-
4.1809	7350	0.3176	-
4.2093	7400	0.3067	0.8487
4.2378	7450	0.299	-
4.2662	7500	0.3106	-
4.2947	7550	0.3062	-
4.3231	7600	0.3153	0.8498
4.3515	7650	0.3206	-
4.3800	7700	0.3202	-
4.4084	7750	0.3167	-
4.4369	7800	0.3044	0.8426
4.4653	7850	0.3015	-
4.4937	7900	0.3157	-
4.5222	7950	0.3109	-
4.5506	8000	0.3164	0.8385
4.5791	8050	0.2996	-
4.6075	8100	0.3247	-
4.6359	8150	0.3093	-
4.6644	8200	0.3017	0.8294
4.6928	8250	0.3075	-
4.7213	8300	0.3006	-
4.7497	8350	0.3134	-
4.7782	8400	0.3111	0.8249
4.8066	8450	0.3165	-
4.8350	8500	0.3071	-
4.8635	8550	0.3017	-
4.8919	8600	0.3092	0.8225
4.9204	8650	0.3	-
4.9488	8700	0.2999	-
4.9772	8750	0.3116	-
5.0057	8800	0.3046	0.8173
5.0341	8850	0.2501	-
5.0626	8900	0.2443	-
5.0910	8950	0.2338	-
5.1195	9000	0.2382	0.8248
5.1479	9050	0.2524	-
5.1763	9100	0.2427	-
5.2048	9150	0.2512	-
5.2332	9200	0.2377	0.8218
5.2617	9250	0.2458	-
5.2901	9300	0.2515	-
5.3185	9350	0.2453	-
5.3470	9400	0.244	0.8226
5.3754	9450	0.2389	-
5.4039	9500	0.253	-
5.4323	9550	0.2509	-
5.4608	9600	0.2492	0.8198
5.4892	9650	0.2379	-
5.5176	9700	0.247	-
5.5461	9750	0.2419	-
5.5745	9800	0.244	0.8150
5.6030	9850	0.2498	-
5.6314	9900	0.2381	-
5.6598	9950	0.2425	-
5.6883	10000	0.2451	0.8148
5.7167	10050	0.2468	-
5.7452	10100	0.2404	-
5.7736	10150	0.2397	-
5.8020	10200	0.2417	0.8124
5.8305	10250	0.2446	-
5.8589	10300	0.2443	-
5.8874	10350	0.2465	-
5.9158	10400	0.2472	0.8121
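One way to read this table is to pick out the evaluation step with the lowest validation loss and compare it with the training loss at the same step. The sketch below does this for a handful of rows copied from the log above. The parsing helper `parse_log` is hypothetical, and the excerpt is only a subset of the full table.

```python
# A few rows copied from the training log (epoch, step, train loss, val loss).
LOG_EXCERPT = """
3.9818 7000 0.4051 0.8407
4.0956 7200 0.3065 0.8590
5.0057 8800 0.3046 0.8173
5.1195 9000 0.2382 0.8248
5.8874 10350 0.2465 -
5.9158 10400 0.2472 0.8121
"""

def parse_log(text):
    """Parse whitespace-separated rows; '-' means no evaluation at that step."""
    rows = []
    for line in text.strip().splitlines():
        epoch, step, train_loss, val_loss = line.split()
        rows.append({
            "epoch": float(epoch),
            "step": int(step),
            "train_loss": float(train_loss),
            "val_loss": None if val_loss == "-" else float(val_loss),
        })
    return rows

rows = parse_log(LOG_EXCERPT)
evaluated = [row for row in rows if row["val_loss"] is not None]
best = min(evaluated, key=lambda row: row["val_loss"])
gap = best["val_loss"] - best["train_loss"]

print(best["step"], best["val_loss"], round(gap, 4))
```

In this excerpt the lowest validation loss is at the final logged evaluation, while the training loss at that step is much lower, so the gap between the two is worth watching when choosing a checkpoint.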

Framework Versions

Python: 3.10.16
Sentence Transformers: 4.1.0
Transformers: 4.51.3
PyTorch: 2.7.0+cu126
Accelerate: 1.7.0
Datasets: 3.6.0
Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
