metadata
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:210
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: Snowflake/snowflake-arctic-embed-l
widget:
- source_sentence: >-
What does maintenance refer to in the context of providing for another
person?
sentences:
- >-
-M-
Maintenance: The furnishing by one person to another the means of
living, or food, clothing,
- >-
income and expenses to determine if the debtor may proceed under Chapter
7.
Chapter 7 trustee
A person appointed in a Chapter 7 case to represent the interests of the
bankruptcy estate
and the creditors. The trustee's responsibilities include reviewing the
debtor's petition and
schedules, liquidating the property of the estate, and making
distributions to creditors. The
trustee may also bring actions against creditors or the debtor to
recover property of the
bankruptcy estate.
Chapter 9
- >-
-19-
Trial De Novo: A new trial (See 22NYCRR 28.12).
-U-
Undertaking: Deposit of a sum of money or filing of a bond in court, to
secure some actual or
potential obligation.
-V-
Vacate: To set aside or undo a previous action or order.
Venire: Technically, a writ summoning persons to court to act as
jurors; popularly used as meaning
the body of names thus summoned.
Venue: (a) Geographical place where some legal matter occurs or may be
determined. (b) The
geographical area within which a court has jurisdiction. It relates only
to a place or territory within
which either party may require a case to be tried. A defect in venue may
be waived by the parties.
- source_sentence: 'What does the term "Pro Se" refer to in a legal context? '
sentences:
- >-
Process: A legal means, such as a summons, used to subject a
defendant in a lawsuit to the
personal jurisdiction of the court; broadly, refers to all writs
issued in the course of a legal
proceeding - what is served to obtain jurisdiction.
Pro Se (aka Self-Represented): Appearing on one’s own behalf without an
attorney.
Purge: To atone for or correct an offense, to submit to a court's
mandate (i.e., to purge oneself
of contempt of court).
-Q-
None.
-R-
Recuse: To disqualify oneself as a judge.
Redact: To edit, revise or block out written text.
Referee: A person to whom a claim pending in a court is referred by
the court to take testimony,
- >-
-10-
Hearing: A preliminary examination where testimony is given and
evidence presented for the
purpose of determining an issue of fact and reaching a decision on the
basis of that evidence.
Hearsay: Testimony of a witness who relates not what he/she knows
personally, but what others
have told the witness, or what the witness has heard said by others; may
be admissible or
inadmissible in court depending upon rules of evidence.
Hung Jury: A jury whose members cannot reconcile their differences of
opinion and thus cannot
reach a verdict.
-I-
Impaneling: The process by which jurors are selected and sworn to their
task.
Impleader: An addition of another party to an action by the defendant,
a “third party” claim.
- >-
-12-
Jurisdiction, Subject Matter: Whether the court has authority over the
thing or right claimed by
one party against another.
Jury: A prescribed number of persons selected according to law and
sworn to make findings of
fact.
Jury (Advisory): A body of jurors impaneled to hear a case in which the
parties have no right to
a jury trial - the judge remains solely responsible for the findings and
may accept or reject the
jury's verdict.
Jury Instructions: Directions given by the judge to the jury, at the
beginning and end of trial.
-K-
None.
-L-
Laches: The failure to diligently assert a right, which results in a
refusal to allow the right to be
asserted later.
Legal Age: Eighteen (18) years of age. See CPLR Section 1206.
- source_sentence: What is the purpose of a Chapter 11 bankruptcy filing?
sentences:
- >-
condemnation, i.e., the legal process by which real estate of a private
owner is taken for public use
without the owner's consent, but upon the award and payment of just
compensation.
Enjoin: To require a person, by writ of injunction from a court of
equity, to perform or to refrain
from or cease doing some act.
Entry: The formal filing of an order of judgment with the County Clerk.
Equitable Action (Equity Matter): An action which may be brought for
the purpose of restraining
- >-
A legal claim.
Chambers
The offices of a judge and his or her staff.
Chapter 11
A reorganization bankruptcy, usually involving a corporation or
partnership. A Chapter 11
debtor usually proposes a plan of reorganization to keep its business
alive and pay creditors
over time. Individuals or people in business can also seek relief in
Chapter 11.
Chapter 12
The chapter of the Bankruptcy Code providing for adjustment of debts of
a "family farmer"
or "family fisherman," as the terms are defined in the Bankruptcy Code.
Chapter 13
The chapter of the Bankruptcy Code providing for the adjustment of debts
of an individual
with regular income, often referred to as a "wage-earner" plan. Chapter
13 allows a debtor
- >-
Conviction
A judgment of guilt against a criminal defendant.
Counsel
Legal advice; a term also used to refer to the lawyers in a case.
Count
An allegation in an indictment or information, charging a defendant with
a crime. An
indictment or information may contain allegations that the defendant
committed more
than one crime. Each allegation is referred to as a count.
Court
Government entity authorized to resolve legal disputes. Judges sometimes
use "court" to
refer to themselves in the third person, as in "the court has read the
briefs."
Court reporter
A person who makes a word-for-word record of what is said in court,
generally by using a
stenographic machine, shorthand or audio recording, and then produces a
transcript of the
- source_sentence: >-
What types of property may a debtor be able to exempt under the homestead
exemption?
sentences:
- >-
-2-
Affidavit of Service: An affidavit intended to certify or prove that
service of a writ, notice, or other
document has been made.
Affirm: An act of declaring something to be true under the penalty of
perjury by a person who
conscientiously declines to take an oath for religious or other
pertinent reasons; also attorneys are
permitted to affirm rather than swear under oath.
Affirmation: A solemn and formal declaration under penalties of perjury
that a statement is true,
without an oath.
Affirmed: Upheld, agreed with (e.g.,The Appellate Court affirmed the
judgment of the City Court);
also means a challenge to a court decision or order was rejected.
- >-
A formal request for the protection of the federal bankruptcy laws.
(There is an official form
for bankruptcy petitions.)
Bankruptcy trustee
A private individual or corporation appointed in all Chapter 7 and
Chapter 13 cases to
represent the interests of the bankruptcy estate and the debtor's
creditors.
Bench trial
A trial without a jury, in which the judge serves as the fact-finder.
Brief
A written statement submitted in a trial or appellate proceeding that
explains one side's
legal and factual arguments.
Burden of proof
The duty to prove disputed facts. In civil cases, a plaintiff generally
has the burden of
proving his or her case. In criminal cases, the government has the
burden of proving the
defendant's guilt. (See standard of proof.)
- >-
residence (homestead exemption), or some or all "tools of the trade"
used by the debtor to
make a living (i.e., auto tools for an auto mechanic or dental tools for
a dentist). The
availability and amount of property the debtor may exempt depends on the
state the debtor
lives in.
F
Face sheet filing
A bankruptcy case filed either without schedules or with incomplete
schedules listing few
creditors and debts. (Face sheet filings are often made for the purpose
of delaying an
- source_sentence: >-
How does a fraudulent transfer relate to a debtor's intent in bankruptcy
cases?
sentences:
- >-
Glossary of Legal Terms
Find definitions of legal terms to help understand the federal
court system.
A
Acquittal
A jury verdict that a criminal defendant is not guilty, or the finding
of a judge that the
evidence is insufficient to support a conviction.
Active judge
A judge in the full-time service of the court. Compare to senior judge.
Administrative Office of the United States Courts (AO)
Enter legal term to search for definition
Search
- >-
A serious crime, usually punishable by at least one year in prison.
File
To place a paper in the official custody of the clerk of court to enter
into the files or records
of a case.
Fraudulent transfer
A transfer of a debtor's property made with intent to defraud or for
which the debtor
receives less than the transferred property's value.
Fresh start
The characterization of a debtor's status after bankruptcy, i.e., free
of most debts. (Giving
debtors a fresh start is one purpose of the Bankruptcy Code.)
G
Grand jury
A body of 16-23 citizens who listen to evidence of criminal allegations,
which is presented by
the prosecutors, and determine whether there is probable cause to
believe an individual
- >-
-3-
Argument: A reason given in proof or rebuttal to persuade a judge or
jury.
At Issue: Whenever the parties to an action come to a point in the
pleadings or argument which
is affirmed on one side and denied on the other, the points are said to
be "at issue".
Attachment: The taking of property into legal custody by an enforcement
officer (See specialty
section: Recovery of Chattel).
Attestation: The act of witnessing an instrument in writing at the
request of the party making the
instrument and signing it as a witness.
Attorney of Record: Attorney whose name appears in the court’s records
or files of a case.
Award: A decision of an Arbitrator, judge or jury.
-B-
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: Unknown
type: unknown
metrics:
- type: cosine_accuracy@1
value: 0.9318181818181818
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.9318181818181818
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.9545454545454546
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.9318181818181818
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.3106060606060606
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.1909090909090909
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09999999999999996
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.9318181818181818
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.9318181818181818
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.9545454545454546
name: Cosine Recall@5
- type: cosine_recall@10
value: 1
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.9565434941101226
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.9438131313131314
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.9438131313131314
name: Cosine Map@100
SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-l. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: Snowflake/snowflake-arctic-embed-l
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
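The Pooling and Normalize modules listed above can be illustrated with a small NumPy sketch. It is not the library's implementation: it assumes the transformer has already produced a `(batch, seq_len, hidden)` array of token embeddings, and the function name `cls_pool_and_normalize` and the random input are made up for illustration.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Map (batch, seq_len, hidden) token embeddings to (batch, hidden) unit vectors."""
    # pooling_mode_cls_token=True: keep only the first ([CLS]) token of each sequence.
    cls_vectors = token_embeddings[:, 0, :]
    # Normalize(): scale every vector to unit L2 length (guarding against zero norms).
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.clip(norms, 1e-12, None)

# Hypothetical stand-in for transformer output: 3 sequences, 8 tokens, 1024 dims.
rng = np.random.default_rng(0)
fake_token_embeddings = rng.normal(size=(3, 8, 1024))

sentence_embeddings = cls_pool_and_normalize(fake_token_embeddings)
print(sentence_embeddings.shape)
```

Because the outputs have unit length, the cosine similarity of two embeddings equals their dot product.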
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("vin00d/snowflake-arctic-legal-ft-1")
# Run inference
sentences = [
"How does a fraudulent transfer relate to a debtor's intent in bankruptcy cases?",
"A serious crime, usually punishable by at least one year in prison.\nFile\nTo place a paper in the official custody of the clerk of court to enter into the files or records\nof a case.\nFraudulent transfer\nA transfer of a debtor's property made with intent to defraud or for which the debtor\nreceives less than the transferred property's value.\nFresh start\nThe characterization of a debtor's status after bankruptcy, i.e., free of most debts. (Giving\ndebtors a fresh start is one purpose of the Bankruptcy Code.)\nG\nGrand jury\nA body of 16-23 citizens who listen to evidence of criminal allegations, which is presented by\nthe prosecutors, and determine whether there is probable cause to believe an individual",
'-3-\nArgument: A reason given in proof or rebuttal to persuade a judge or jury.\nAt Issue: Whenever the parties to an action come to a point in the pleadings or argument which\nis affirmed on one side and denied on the other, the points are said to be "at issue".\nAttachment: The taking of property into legal custody by an enforcement officer (See specialty\nsection: Recovery of Chattel).\nAttestation: The act of witnessing an instrument in writing at the request of the party making the\ninstrument and signing it as a witness.\nAttorney of Record: Attorney whose name appears in the court’s records or files of a case.\nAward: A decision of an Arbitrator, judge or jury.\n-B-',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
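In the snippet above, the first sentence is a query and the other two are candidate passages, so a typical next step is to rank the passages by cosine similarity to the query. The sketch below shows that ranking with plain NumPy on made-up 3-dimensional vectors. The helper `rank_passages` is hypothetical; with the real model you would pass `embeddings[0]` and `embeddings[1:]` instead of the toy arrays.

```python
import numpy as np

def rank_passages(query_vec: np.ndarray, passage_vecs: np.ndarray) -> list[tuple[int, float]]:
    """Return (passage_index, cosine_similarity) pairs, best match first."""
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    scores = p @ q                      # cosine similarity of each passage to the query
    order = np.argsort(-scores)         # indices sorted by descending similarity
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors standing in for real embeddings (assumption, not model output).
toy_query = np.array([1.0, 0.0, 0.0])
toy_passages = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.9, 0.1, 0.0],   # nearly parallel to the query
    [0.5, 0.5, 0.0],   # in between
])

ranking = rank_passages(toy_query, toy_passages)
for index, score in ranking:
    print(f"passage {index}: {score:.4f}")
```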
Evaluation
Metrics
Information Retrieval
- Evaluated with InformationRetrievalEvaluator
Metric | Value |
---|---|
cosine_accuracy@1 | 0.9318 |
cosine_accuracy@3 | 0.9318 |
cosine_accuracy@5 | 0.9545 |
cosine_accuracy@10 | 1.0 |
cosine_precision@1 | 0.9318 |
cosine_precision@3 | 0.3106 |
cosine_precision@5 | 0.1909 |
cosine_precision@10 | 0.1 |
cosine_recall@1 | 0.9318 |
cosine_recall@3 | 0.9318 |
cosine_recall@5 | 0.9545 |
cosine_recall@10 | 1.0 |
cosine_ndcg@10 | 0.9565 |
cosine_mrr@10 | 0.9438 |
cosine_map@100 | 0.9438 |
Training Details
Training Dataset
Unnamed Dataset
- Size: 210 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 210 samples:
Property | sentence_0 | sentence_1 |
---|---|---|
type | string | string |
details | min: 9 tokens, mean: 17.36 tokens, max: 33 tokens | min: 4 tokens, mean: 122.9 tokens, max: 192 tokens |
- Samples:
sentence_0 | sentence_1 |
---|---|
What is the purpose of the glossary of common legal terms provided in the context? | GLOSSARY ‐ COMMON LEGAL TERMS NOTE: The following definitions are not legal definitions. Rather, these definitions are intended to give you a general idea of the meanings of common legal words. For comprehensive Definitions of legal terms, you may wish to consult a legal dictionary “Black’s Law Dictionary” is one such legal dictionary which is usually available at most law libraries. This glossary of common legal terms is also available on‐line at: http://www.nycourts.gov/lawlibraries/glossary.shtml ADDITIONAL ON‐LINE RESOURCES: http://www.nolo.com/glossary.cfm Nolo’s on‐line legal dictionary. http://www.law‐dictionary.org/ Free on‐line legal dictionary search engine. http://www.law.cornell.edu/wex |
Where can one find a comprehensive legal dictionary for more detailed definitions of legal terms? | GLOSSARY ‐ COMMON LEGAL TERMS NOTE: The following definitions are not legal definitions. Rather, these definitions are intended to give you a general idea of the meanings of common legal words. For comprehensive Definitions of legal terms, you may wish to consult a legal dictionary “Black’s Law Dictionary” is one such legal dictionary which is usually available at most law libraries. This glossary of common legal terms is also available on‐line at: http://www.nycourts.gov/lawlibraries/glossary.shtml ADDITIONAL ON‐LINE RESOURCES: http://www.nolo.com/glossary.cfm Nolo’s on‐line legal dictionary. http://www.law‐dictionary.org/ Free on‐line legal dictionary search engine. http://www.law.cornell.edu/wex |
What organization maintains the legal dictionary and encyclopedia mentioned in the context? | Legal dictionary and encyclopedia maintained by the Legal Information Institute at Cornell Law School. |
- Loss: MatryoshkaLoss with these parameters:
{
  "loss": "MultipleNegativesRankingLoss",
  "matryoshka_dims": [768, 512, 256, 128, 64],
  "matryoshka_weights": [1, 1, 1, 1, 1],
  "n_dims_per_step": -1
}
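To make the loss configuration concrete, here is a minimal NumPy sketch of how a Matryoshka wrapper around an in-batch-negatives ranking loss can be computed. It is an illustration under assumptions, not the Sentence Transformers code: the function names are invented, the similarity scale of 20 is assumed, and random arrays stand in for real (sentence_0, sentence_1) embeddings.

```python
import numpy as np

def in_batch_ranking_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Cross-entropy where positives[i] is the target for anchors[i] and all
    other positives in the batch act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                            # (batch, batch) scaled cosine scores
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def matryoshka_loss(anchors, positives, dims=(768, 512, 256, 128, 64), weights=(1, 1, 1, 1, 1)) -> float:
    """Weighted sum of the base loss over embeddings truncated to each dimension."""
    return sum(
        w * in_batch_ranking_loss(anchors[:, :d], positives[:, :d])
        for d, w in zip(dims, weights)
    )

# Random stand-ins for a batch of 10 pairs of 1024-dimensional embeddings (assumption).
rng = np.random.default_rng(0)
queries = rng.normal(size=(10, 1024))
unrelated_passages = rng.normal(size=(10, 1024))

loss_aligned = matryoshka_loss(queries, queries)               # each query matches its own passage
loss_unrelated = matryoshka_loss(queries, unrelated_passages)  # no signal in any prefix
print(loss_aligned < loss_unrelated)
```

Summing the loss over truncated prefixes is what encourages the leading dimensions to carry most of the signal, so shorter prefixes of the embedding can remain useful for retrieval.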
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 10
- per_device_eval_batch_size: 10
- num_train_epochs: 10
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 10
- per_device_eval_batch_size: 10
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 10
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
Training Logs
Epoch | Step | cosine_ndcg@10 |
---|---|---|
1.0 | 21 | 0.9240 |
2.0 | 42 | 0.9628 |
2.3810 | 50 | 0.9628 |
3.0 | 63 | 0.9502 |
4.0 | 84 | 0.9569 |
4.7619 | 100 | 0.9563 |
5.0 | 105 | 0.9556 |
6.0 | 126 | 0.9569 |
7.0 | 147 | 0.9555 |
7.1429 | 150 | 0.9555 |
8.0 | 168 | 0.9565 |
9.0 | 189 | 0.9565 |
9.5238 | 200 | 0.9565 |
10.0 | 210 | 0.9565 |
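The Epoch and Step columns follow from the dataset size and batch size reported above. The short calculation below reproduces them, assuming a single training device and no gradient accumulation (the card lists gradient_accumulation_steps: 1). The variable and function names are illustrative only.

```python
import math

num_samples = 210   # training samples reported in the card
batch_size = 10     # per_device_train_batch_size
num_epochs = 10     # num_train_epochs

# One optimizer step per batch, assuming a single device.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs

def epoch_at(step: int) -> float:
    """Fractional epoch reached after a given number of optimizer steps."""
    return round(step / steps_per_epoch, 4)

print(steps_per_epoch, total_steps)
for step in (21, 50, 100, 150, 200, 210):
    print(step, epoch_at(step))
```

The rows at steps 50, 100, 150 and 200 are consistent with an evaluation every 50 steps in addition to one at the end of each epoch, although the evaluation interval itself is not listed among the hyperparameters shown here.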
Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}