metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:100
  - loss:MultipleNegativesRankingLoss
base_model: microsoft/mpnet-base
widget:
  - source_sentence: >-
      How many different active substances were detected in surface water across
      all catchment areas?
    sentences:
      - >-
        metabolites were not detected in the water bodies.

        2.1.1. Antibiotics/Enzyme-Inhibitors and

        Abacavir in Surface-Water

        Fifty detections were found in all catchment areas in surface water,
        which corresponds to 15 different active substances:

        12 antibiotics, two enzyme inhibitors, and one antiviral. The number of
        detections per sampling station ranged from 0 to 7

        different active substances. The Ave river-Prazins (Santo Tirso) and
        Serzedelo I and II (Guimarães) as well as Ria

        Formosa-coastal water (Faro and Olhão), each one with two sampling
        sites, showed the most detected compounds in
      - >-
        2. Results

        2.1. Frequency of Detections:

        Antibiotics/Enzyme-Inhibitors and Abacavir

        in Surface-Groundwater

        During the screening framework beyond the antibiotics/enzyme-inhibitors,
        the antiviral abacavir was detected. Therefore,

        given the relevance of this compound, it was included in the present
        study. Although enzyme inhibitors belong to the

        antibiotic group, their specific pharmacological properties and
        detection were sorted apart. In the present study, antibiotic

        metabolites were not detected in the water bodies.

        2.1.1. Antibiotics/Enzyme-Inhibitors and

        Abacavir in Surface-Water
      - >-
        surface water. The relatively higher detection of substances downstream
        of the effluent discharge points compared with a

        low detection in upstream samples could be attributed to the low
        efficiency in urban wastewater treatment plants or

        agricultural pressure. The environmental impact is more critical due to
        active substances in drinking water or premix

        medicated feeds in the veterinary site.

        Furthermore, the detection of substances of exclusive human use
        (abacavir, tazobactam and cilastatin) prove the weak
  - source_sentence: >-
      What group of pharmaceuticals was sulfamethazine matched to when its
      quantity was missing?
    sentences:
      - >-
        ciprofloxacin

        43%

        (3/7), enrofloxacin, norfloxacin, trimethoprim, lincomycin (29% (2/7),
        abacavir and tetracycline

        14% (1/7). The enzyme inhibitors, namely clavulanic acid and cilastatin,
        were detected once in an urban region located

        well. This catchment point showed the most significant

        number of pharmaceuticals. West/Tejo and Centre were the regions with
        the most considerable number of substances in

        groundwater, accounting for 43%. All groundwater

        samples were contaminated by at least one antibiotic. Supplemental
        Tables S2 and S4 contain a detailed description of

        the
      - >-
        clarithromycin) were the only ones that demonstrated the potential to
        concentrate in living organisms (log Kow ≥ 3) [14].

        All the remaining antibiotics showed a relatively low log Kow and were
        expected to be present mainly in surface water.

        However, the soil mobility/adsorption detected The detected
        pharmaceuticals showed high to moderate water solubility

        and are small ionisable molecules (MW  900 g/mol). Regarding the
        octanol/water partitioning coefficient (log Kow) data,
      - >-
        missing quantity for sulfamethazine, the sulfonamides group has been
        matched.

        Consumption (Kg) of the detected pharmaceuticals in Portugal (2017).

        1 Amount from ESVAC Report-2017; 2 Match the sulfonamides amount; NA-not
        available.

        Amount of detected pharmaceuticals consumption per Portuguese region.
        Amount of detected pharmaceuticals

        consumption per Portuguese region.
  - source_sentence: >-
      What directive sets environmental quality standards for substances in
      surface waters?
    sentences:
      - >-
        As much as the specificities of each member state should be considered
        this issue has become one of the European

        community's main concerns [8].

        The strategies against water pollution are provided in the Water
        Framework Directive [9] and the Directive on

        Environmental Quality Standards that set environmental quality standards
        (EQS) for the substances in surface waters

        and confirm their designation as priority or priority hazardous
        substances [10]. Evidence of potential impacts and
      - >-
        seems to undertake a similar fate in the environment.

        Nevertheless, due to stronger adsorption, with higher emergence in
        sediment, its occurrence in the surface water is lower

        [71]. The use of tetracyclines, mainly as medicated premix and oral
        solution for food-producing animals [72], and the very

        low bioavailability (e.g. in pig feed) [43] contribute to increasing its
        release into the environment. Regarding macrolides,

        erythromycin and clarithromycin exhibit a remarkable frequency of
        detection in surface water samples. The most
      - >-
        low flows; otherwise, POCIS might be damage. In ground-waters was used
        one POCIS unit/well. Due to the high sorption

        capacity, POCIS was deployed approximately for 30 days, allowing the
        polar organic compounds adsorbed to be in the

        equilibrium stage with the active substances in an aqueous medium. In
        the laboratory, POCIS disks were frozen until

        extraction.

        4.2.2. Qualitative Analysis Method Used

        for the Characterisation of Antibiotics in

        Surface-Groundwater
  - source_sentence: What is the molecular weight range of the detected pharmaceuticals?
    sentences:
      - >-
        2.3. Physicochemical Properties and Key Pharmacokinetic Features of
        Detected Pharmaceuticals 2.3. Physicochemical

        Properties and Key Pharmacokinetic Features of Detected Pharmaceuticals

        The detected pharmaceuticals showed high to moderate water solubility
        and are small ionisable molecules (MW  900

        g/mol). Regarding the octanol/water partitioning coefficient (log Kow)
        data, macrolide antibiotics (azithromycin and

        clarithromycin) were the only ones that demonstrated the potential to
        concentrate in living organisms (log Kow ≥ 3) [14].
      - >-
        As much as the specificities of each member state should be considered
        this issue has become one of the European

        community's main concerns [8].

        The strategies against water pollution are provided in the Water
        Framework Directive [9] and the Directive on

        Environmental Quality Standards that set environmental quality standards
        (EQS) for the substances in surface waters

        and confirm their designation as priority or priority hazardous
        substances [10]. Evidence of potential impacts and
      - >-
        passive samplers in groundwater considered the well technical features;
        the depth and groundwater level were previously

        determined since they should be detected at the superficial levels. The
        passive sampler was placed using a water level

        meter, 2 m below the groundwater level. The sampler always remained
        immersed in water, avoiding extractions and the

        regional lowering of the water table [104]. For the sampling stations,
        sites of different environmental pressures were

        considered, specifically urban, agricultural area/animal production, and
        aquaculture. The information regarding the
  - source_sentence: >-
      What was the most frequently identified pharmaceutical in the groundwater
      samples?
    sentences:
      - >-
        Pharmacokinetic characteristics may represent key features in
        understanding antibiotics occurrence [62]. Most antibiotics

        are not completely metabolised in humans and animals; thus, a high
        percentage of the active substance (40-90%) is

        excreted in urine/faeces in the unchanged form. These molecules are
        discharged into water and soil through wastewater,

        animal manure, and sewage sludge, frequently used as fertilisers to
        agricultural lands. Also, it is expected that the

        hospital effluent will contribute partly to the pharmaceutical load in
        the wastewater treatment plant influence [63].
      - >-
        many domestic and livestock animals. Several formulations of powder for
        administration in drinking water and medicated

        premix are available for poultry and pigs. The excretion of amoxicillin
        is predominantly renal; more than 80% of the parent

        drug is recovered unchanged in the urine. While bioavailability of 75 to
        80% is reported in humans, a low value (~30%)

        was observed in pigs, calves, foals, and pigeons [26,52]. Maybe this
        last group of animals contribute more sharply to the
      - >-
        from one to five compounds. The most frequently identified
        pharmaceuticals, in decreasing order, were ciprofloxacin 43%

        (3/7), enrofloxacin, norfloxacin, trimethoprim, lincomycin (29% (2/7),
        abacavir and tetracycline 14% (1/7). The enzyme

        inhibitors, namely clavulanic acid and cilastatin, were detected once in
        an urban region located well. This catchment point

        showed the most significant number of pharmaceuticals. West/Tejo and
        Centre were the regions with the most

        considerable number of substances in groundwater, accounting for 43%.
        All groundwater samples were contaminated by
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy
model-index:
  - name: SentenceTransformer based on microsoft/mpnet-base
    results:
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: antibiotics test
          type: antibiotics_test
        metrics:
          - type: cosine_accuracy
            value: 0.75
            name: Cosine Accuracy
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: mpnet base smartbots/iter1
          type: mpnet-base-smartbots/iter1
        metrics:
          - type: cosine_accuracy
            value: 0.9333333373069763
            name: Cosine Accuracy

SentenceTransformer based on microsoft/mpnet-base

This is a sentence-transformers model finetuned from microsoft/mpnet-base on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/mpnet-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
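The Pooling module above has only `pooling_mode_mean_tokens` enabled, so the sentence vector is the average of the token vectors. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name `mean_pool` and the toy 2-dimensional vectors are illustrative assumptions (the real model works on 768-dimensional tensors).

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in total]


# Toy input: three token vectors of dimension 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

The padding vector is excluded by the mask, which is why its large values do not affect the result.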

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sahithkumar7/mpnet-base-smartbots-iter01")
# Run inference
sentences = [
    'What was the most frequently identified pharmaceutical in the groundwater samples?',
    'from one to five compounds. The most frequently identified pharmaceuticals, in decreasing order, were ciprofloxacin 43%\n(3/7), enrofloxacin, norfloxacin, trimethoprim, lincomycin (29% (2/7), abacavir and tetracycline 14% (1/7). The enzyme\ninhibitors, namely clavulanic acid and cilastatin, were detected once in an urban region located well. This catchment point\nshowed the most significant number of pharmaceuticals. West/Tejo and Centre were the regions with the most\nconsiderable number of substances in groundwater, accounting for 43%. All groundwater samples were contaminated by',
    'Pharmacokinetic characteristics may represent key features in understanding antibiotics occurrence [62]. Most antibiotics\nare not completely metabolised in humans and animals; thus, a high percentage of the active substance (40-90%) is\nexcreted in urine/faeces in the unchanged form. These molecules are discharged into water and soil through wastewater,\nanimal manure, and sewage sludge, frequently used as fertilisers to agricultural lands. Also, it is expected that the\nhospital effluent will contribute partly to the pharmaceutical load in the wastewater treatment plant influence [63].',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
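Since the model's similarity function is cosine similarity, a typical next step is to rank candidate passages against a question. The sketch below shows that ranking step in pure Python on toy 2-dimensional vectors; `cosine` and `rank_passages` are hypothetical helper names, and in practice the vectors would come from `model.encode(...)` as in the snippet above.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_passages(query_vec: Sequence[float],
                  passage_vecs: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs, most similar first."""
    scored = [(i, cosine(query_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for one query embedding and three passage embeddings.
query = [1.0, 0.0]
passages = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
ranking = rank_passages(query, passages)
print([index for index, _ in ranking])  # [1, 2, 0]
```

Cosine similarity ignores vector length, so passage 1 (`[2.0, 0.0]`) scores a perfect match with the query despite having twice its magnitude.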

Evaluation

Metrics

Triplet

  • Datasets: antibiotics_test and mpnet-base-smartbots/iter1
  • Evaluated with TripletEvaluator
    Metric            antibiotics_test   mpnet-base-smartbots/iter1
    cosine_accuracy   0.75               0.9333
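As a rough illustration of what the cosine_accuracy metric measures, the sketch below counts the share of (anchor, positive, negative) triplets in which the anchor is closer to the positive than to the negative. It is a simplified pure-Python approximation, assuming a strict "greater than" comparison with no margin; the helper names and toy vectors are made up for illustration and are unrelated to the evaluation data reported above.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def triplet_cosine_accuracy(triplets: List[Tuple[Vector, Vector, Vector]]) -> float:
    """Fraction of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = sum(
        1 for anchor, positive, negative in triplets
        if cosine(anchor, positive) > cosine(anchor, negative)
    )
    return correct / len(triplets)


# Toy triplets: the first two are ranked correctly, the last two are not.
toy_triplets = [
    ([1.0, 0.0], [1.0, 0.1], [0.0, 1.0]),
    ([0.0, 1.0], [0.1, 1.0], [1.0, 0.0]),
    ([1.0, 1.0], [-1.0, -1.0], [1.0, 0.9]),
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.2]),
]
accuracy = triplet_cosine_accuracy(toy_triplets)
print(accuracy)  # 0.5
```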

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 100 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 100 samples:
    • anchor (string): min 9 tokens, mean 16.14 tokens, max 33 tokens
    • positive (string): min 48 tokens, mean 125.65 tokens, max 218 tokens
    • negative (string): min 48 tokens, mean 122.97 tokens, max 211 tokens
  • Samples:
    • Sample 1
      anchor:
        Which two macrolide antibiotics are frequently detected in surface water samples?
      positive:
        seems to undertake a similar fate in the environment.
        Nevertheless, due to stronger adsorption, with higher emergence in sediment, its occurrence in the surface water is lower
        [71]. The use of tetracyclines, mainly as medicated premix and oral solution for food-producing animals [72], and the very
        low bioavailability (e.g. in pig feed) [43] contribute to increasing its release into the environment. Regarding macrolides,
        erythromycin and clarithromycin exhibit a remarkable frequency of detection in surface water samples. The most
      negative:
        Nonetheless, besides the sorption capacity, these antibiotics have high solubility in water. Crucial routes for these
        substances into the environment are manure from animal production and sewage sludge from wastewater treatment
        plant (WWTP) used as fertilisers. Therefore, these substances have been evidenced in topsoil samples [68]. These
        quinolones and other antibiotics, for instance, norfloxacin and tetracycline, have been identified in groundwater samples
        despite being influenced by sorption processes. They were not readily degraded; instead, the input into groundwater
    • Sample 2
      anchor:
        What antimicrobial drugs were identified in the survey besides macrolides?
      positive:
        is one of the most frequently pharmaceutical in representative rivers [74,75]. The three macrolides identified in our
        detection survey are included since 2018 in the first 'watch list' [76].
        Another group of antimicrobial drugs identified in our survey were sulfamethoxazole/trimethoprim and sulfamethazine.
        Sulfamethoxazole/trimethoprim are often used combined since the effectiveness of sulfonamides is enhanced. In the
        present study, the detection of both substances was comparable; however, trimethoprim was detected in groundwater.
      negative:
        upstream samples obtained in rural locations was demonstrated and could be attributed to a low efficiency in the urban
        wastewater treatment plants or due to agricultural pressure.
        The higher frequency of detection for most substances was observed in the Ave river and Ria Formosa, confirming that
        several effluents impact these water bodies from urban wastewater treatment plants and livestock production.
        Pharmacokinetic characteristics may represent key features in understanding antibiotics occurrence [62]. Most antibiotics
    • Sample 3
      anchor:
        How long was the observational period of the antibiotic survey in Portugal?
      positive:
        of antibiotics and their metabolites in surface- groundwater. It seeks to reflect the current demographic, spatial, drug
        consumption, and drug profile on an observational period of 3 years in Portugal. The greatest challenge of this survey
        data will be to promote the ecopharmacovigilance framework development shortly to implement measures for avoiding
        misuse/overuse of antibiotics and slow down emission and antibiotic resistance.
        2. Results
        2.1. Frequency of Detections:
        Antibiotics/Enzyme-Inhibitors and Abacavir
        in Surface-Groundwater
      negative:
        despite being influenced by sorption processes. They were not readily degraded; instead, the input into groundwater
        could be due to livestock farming pressure, namely by spreading manure in the soil or the possible sewage sludge
        application in the area. High clay and low sand content in soils can decrease the mobility of pharmaceuticals, which is
        attributed to clay intense exchange capacity. Thus, soil properties (e.g. particle composition) are a significant, influential
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
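To make the two loss parameters concrete: MultipleNegativesRankingLoss treats each anchor's own positive as the correct class among all positives (and, for triplet data, all negatives) in the batch, and applies cross-entropy to cosine similarities multiplied by `scale`. The following is a simplified pure-Python sketch of that computation, not the library code; `mnrl_loss` and the toy vectors are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mnrl_loss(anchors: List[Vector], positives: List[Vector],
              negatives: List[Vector], scale: float = 20.0) -> float:
    """Mean cross-entropy where candidate i is the correct match for anchor i."""
    candidates = list(positives) + list(negatives)  # in-batch positives, then hard negatives
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        peak = max(logits)  # subtract the maximum for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch of two triplets with 2-dimensional vectors.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.1], [0.1, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

good = mnrl_loss(anchors, positives, negatives)
bad = mnrl_loss(anchors, list(reversed(positives)), negatives)  # positives mismatched
print(good < bad)  # True
```

A larger `scale` sharpens the softmax, so small differences in cosine similarity produce larger differences in loss.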
    

Evaluation Dataset

json

  • Dataset: json
  • Size: 100 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 100 samples:
    • anchor (string): min 11 tokens, mean 16.4 tokens, max 25 tokens
    • positive (string): min 76 tokens, mean 113.65 tokens, max 148 tokens
    • negative (string): min 89 tokens, mean 118.8 tokens, max 162 tokens
  • Samples:
    • Sample 1
      anchor:
        What percentage of unchanged excretion did the most significant number of detected substances show?
      positive:
        coefficients were not available for lincomycin, clavulanic acid and cilastatin.
        Physicochemical properties of detected pharmaceuticals.
        1 Data retrieved from [16]; 2 Data retrieved from [17]; 3 Data retrieved from [18]; 4 Data retrieved from [19]; 5
        Data retrieved from [20];
        6 Data retrieved from [21]; 7 Data retrieved from [22]; 8 Data retrieved from [23]; 9 Data retrieved from [24]; 10
        Data retrieved from [25];
        NA-not available.
        The most significant number of detected substances showed a percentage of unchanged excretion higher than 40%.
      negative:
        1. Introduction
        Antibiotics are a critical component of human and veterinary modern medicine, developed to produce desirable or
        beneficial effects on infections induced by pathogens. Like most pharmaceuticals, antibiotics tend to be small organic
        polar compounds, generally ionisable, ordinarily subject to a metabolism or biotransformation process by the organism to
        be eliminated more efficiently [1,2]. The excretion of these compounds and their metabolites occurs mainly through urine,
    • Sample 2
      anchor:
        How many kilograms of abacavir were detected in Portugal in 2017?
      positive:
        Regarding the different regions, it has been concluded that North and West/Tejo were the regions with the higher
        consuming values. Both regions presented a significant value (33%) for the abacavir. For the detected antiviral abacavir,
        an amount of 1458 kg has been observed.
        Regarding antibiotics used in veterinary medicine, the regional amount was not available. Likewise, due to the reported
        missing quantity for sulfamethazine, the sulfonamides group has been matched.
        Consumption (Kg) of the detected pharmaceuticals in Portugal (2017).
      negative:
        43%
        (3/7), enrofloxacin, norfloxacin, trimethoprim, lincomycin (29% (2/7), abacavir and tetracycline
        14% (1/7). The enzyme inhibitors, namely clavulanic acid and cilastatin, were detected once in an urban region located
        well. This catchment point showed the most significant
        number of pharmaceuticals. West/Tejo and Centre were the regions with the most considerable number of substances in
        groundwater, accounting for 43%. All groundwater
        samples were contaminated by at least one antibiotic. Supplemental Tables S2 and S4 contain a detailed description of
        the
    • Sample 3
      anchor:
        What must marketing authorisation procedures for medicines include since 2006?
      positive:
        substances in passive samplers [7]. Since 2006, marketing authorisation procedures for both human and veterinary
        medicines must include an environmental risk assessment that comprises a prospective exposure assessment,
        underestimating the possible impact and the occurrence of antibiotics after years of consumption. Ultimately, the potential
        risk may not be correctly anticipated. It becomes urgent to generate new data, mainly to refine exposure assessments.
        As much as the specificities of each member state should be considered this issue has become one of the European
      negative:
        clarithromycin/erythromycin, tetracycline, sulfamethoxazole, and abacavir. In groundwater, enrofloxacin/ciprofloxacin,
        norfloxacin, trimethoprim, lincomycin, abacavir and tetracycline were recovered. Metabolites were not detected in water
        bodies. Noticeable was the detection of enzyme inhibitors, tazobactam and cilastatin, which are both for exclusive
        hospital use. The North region and Algarve (South) were the areas with the most significant frequency of substances in
        surface water. The relatively higher detection of substances downstream of the effluent discharge points compared with a
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 10
  • per_device_eval_batch_size: 10
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
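For a sense of scale, these settings imply a very short run. The sketch below estimates the number of optimizer steps and the learning rate at each step. It assumes a single device, 100 training samples with no batches dropped by the sampler, and the learning rate (5e-05), linear scheduler, and gradient accumulation of 1 listed under All Hyperparameters; the function name `linear_schedule_with_warmup` is a hypothetical stand-in that mirrors the usual linear warmup and decay rule rather than calling the training library.

```python
import math


def linear_schedule_with_warmup(step: int, total_steps: int,
                                warmup_steps: int, base_lr: float) -> float:
    """Linear ramp from 0 to base_lr during warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Values taken from this card; single-device training is an assumption.
num_samples = 100
batch_size = 10
epochs = 1
warmup_ratio = 0.1
base_lr = 5e-05

total_steps = math.ceil(num_samples / batch_size) * epochs   # 10 optimizer steps
warmup_steps = math.ceil(total_steps * warmup_ratio)         # 1 warmup step

schedule = [
    linear_schedule_with_warmup(step, total_steps, warmup_steps, base_lr)
    for step in range(total_steps)
]
for step, lr in enumerate(schedule):
    print(f"step {step}: lr = {lr:.2e}")
```

Under these assumptions the model sees only ten updates, with the learning rate peaking at the second step and decaying for the rest of the epoch.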

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 10
  • per_device_eval_batch_size: 10
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch   Step   antibiotics_test_cosine_accuracy   mpnet-base-smartbots/iter1_cosine_accuracy
-1      -1     0.75                               0.9333

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.4.1
  • Transformers: 4.51.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}