Evaluate documentation
Loading methods
Methods for listing and loading evaluation modules:
List
evaluate.list_evaluation_modules
( module_type = None, include_community = True, with_details = False )
Parameters
- module_type (str, optional, defaults to None) — Type of evaluation modules to list. Has to be one of 'metric', 'comparison', or 'measurement'. If None, all types are listed.
- include_community (bool, optional, defaults to True) — Include community modules in the list.
- with_details (bool, optional, defaults to False) — Return the full details on the metrics instead of only the ID.
List all evaluation modules available on the Hugging Face Hub.
Load
evaluate.load
( path: str, config_name: typing.Optional[str] = None, module_type: typing.Optional[str] = None, process_id: int = 0, num_process: int = 1, cache_dir: typing.Optional[str] = None, experiment_id: typing.Optional[str] = None, keep_in_memory: bool = False, download_config: typing.Optional[datasets.download.download_config.DownloadConfig] = None, download_mode: typing.Optional[datasets.download.download_manager.DownloadMode] = None, revision: typing.Union[str, datasets.utils.version.Version, NoneType] = None, **init_kwargs )
Parameters
- path (str) — Path to the evaluation processing script with the evaluation builder. Can be either:
  - a local path to the processing script or the directory containing the script (if the script has the same name as the directory), e.g. './metrics/rouge' or './metrics/rouge/rouge.py'
  - an evaluation module identifier on the HuggingFace evaluate repo, e.g. 'rouge' or 'bleu', located in 'metrics/', 'comparisons/', or 'measurements/' depending on the provided module_type
- config_name (str, optional) — Selects a configuration for the metric (e.g. the GLUE metric has a configuration for each subset).
- module_type (str, defaults to 'metric') — Type of evaluation module; can be one of 'metric', 'comparison', or 'measurement'.
- process_id (int, optional) — For distributed evaluation: id of the process.
- num_process (int, optional) — For distributed evaluation: total number of processes.
- cache_dir (str, optional) — Path to store the temporary predictions and references (defaults to ~/.cache/huggingface/evaluate/).
- experiment_id (str) — A specific experiment id. Used if several distributed evaluations share the same file system. This is useful to compute metrics in distributed setups (in particular non-additive metrics like F1).
- keep_in_memory (bool) — Whether to store the temporary results in memory (defaults to False).
- download_config (~evaluate.DownloadConfig, optional) — Specific download configuration parameters.
- download_mode (DownloadMode, defaults to REUSE_DATASET_IF_EXISTS) — Download/generate mode.
- revision (Union[str, evaluate.Version], optional) — If specified, the module is loaded from the datasets repository at this version. By default it is set to the local version of the library. Specifying a version different from your local version of the library might cause compatibility issues.
Load an EvaluationModule.