SetFit documentation
Utility Functions
setfit.get_templated_dataset
(
    dataset: typing.Optional[datasets.arrow_dataset.Dataset] = None,
    candidate_labels: typing.Optional[typing.List[str]] = None,
    reference_dataset: typing.Optional[str] = None,
    template: str = 'This sentence is {}',
    sample_size: int = 2,
    text_column: str = 'text',
    label_column: str = 'label',
    multi_label: bool = False,
    label_names_column: str = 'label_text'
) → Dataset
Parameters
- dataset (Dataset, optional): A Dataset to add templated examples to.
- candidate_labels (List[str], optional): The list of candidate labels to be fed into the template to construct examples.
- reference_dataset (str, optional): A dataset to take labels from, if candidate_labels is not supplied.
- template (str, optional, defaults to "This sentence is {}"): The template used to turn each label into a synthetic training example. The template must include a {} placeholder, into which the candidate label is inserted. For example, with the default template "This sentence is {}" and the candidate label "sports", the generated example is "This sentence is sports".
- sample_size (int, optional, defaults to 2): The number of examples to make for each candidate label.
- text_column (str, optional, defaults to "text"): The name of the column containing the text of the examples.
- label_column (str, optional, defaults to "label"): The name of the column in dataset containing the labels of the examples.
- multi_label (bool, optional, defaults to False): Whether or not multiple candidate labels can be true.
- label_names_column (str, optional, defaults to "label_text"): The name of the label column in the reference_dataset, to be used in case there is no ClassLabel feature for the label column.
Returns
Dataset
A copy of the input Dataset with templated examples added.
Raises
ValueError
If the input Dataset is not empty and one or both of the provided column names (text_column, label_column) are missing.
Create templated examples for a reference dataset or reference labels.
If candidate_labels is supplied, use it for generating the templates.
Otherwise, use the labels loaded from reference_dataset.
If an input Dataset is supplied, the examples are added to it; otherwise a new Dataset is created.
The input Dataset is assumed to have a text column named text_column and a label column named label_column, which contains one-hot or multi-hot encoded label sequences.
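The templating step described above can be illustrated without the library. The following is a minimal sketch in plain Python, not SetFit's implementation: `build_templated_rows` is a hypothetical helper, and storing each label as its index in `candidate_labels` is an assumption made for illustration only.

```python
from typing import Dict, List


def build_templated_rows(
    candidate_labels: List[str],
    template: str = "This sentence is {}",
    sample_size: int = 2,
) -> Dict[str, List]:
    """Hypothetical helper: expand each label into `sample_size` templated texts."""
    if "{}" not in template:
        raise ValueError("template must contain a {} placeholder")

    rows: Dict[str, List] = {"text": [], "label": []}
    for label_id, label_name in enumerate(candidate_labels):
        for _ in range(sample_size):
            # Insert the candidate label into the template.
            rows["text"].append(template.format(label_name))
            # Assumption: labels are stored as integer ids.
            rows["label"].append(label_id)
    return rows


rows = build_templated_rows(["sports", "politics"])
print(rows["text"])
print(rows["label"])
```

With the default `sample_size` of 2, each of the two labels yields two identical templated sentences, so the sketch produces four rows. With the library installed, the corresponding call would be `get_templated_dataset(candidate_labels=["sports", "politics"])`, which returns a `Dataset` rather than a plain dictionary.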
setfit.sample_dataset
(
    dataset: Dataset,
    label_column: str = 'label',
    num_samples: int = 8,
    seed: int = 42
)
Samples a Dataset to create an equal number of samples per class (when possible).
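The idea of class-balanced sampling can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `sample_balanced` is a hypothetical helper that works on a list of dictionaries instead of a `Dataset`, and it reads "when possible" as "take every row of a class that has fewer than `num_samples` rows".

```python
import random
from collections import Counter, defaultdict
from typing import Any, Dict, List


def sample_balanced(
    rows: List[Dict[str, Any]],
    label_column: str = "label",
    num_samples: int = 8,
    seed: int = 42,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: draw up to `num_samples` rows per class."""
    rng = random.Random(seed)  # seeded so repeated calls give the same sample

    # Group rows by their label value.
    by_label: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        by_label[row[label_column]].append(row)

    sampled: List[Dict[str, Any]] = []
    for group in by_label.values():
        # Assumption: a class with too few rows contributes all of them.
        k = min(num_samples, len(group))
        sampled.extend(rng.sample(group, k))
    return sampled


rows = [{"text": f"a{i}", "label": "a"} for i in range(5)]
rows += [{"text": f"b{i}", "label": "b"} for i in range(2)]

sampled = sample_balanced(rows, num_samples=3)
counts = Counter(row["label"] for row in sampled)
print(dict(counts))
```

Here class "a" contributes 3 of its 5 rows, while class "b" has only 2 rows and contributes both, so the result is balanced only as far as the data allows.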