INFO: 2024-07-14 10:23:20,130: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom'] INFO: 2024-07-14 10:23:20,131: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 10:23:20,132: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-14 10:23:21,238: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu'] INFO: 2024-07-14 10:23:21,239: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 10:23:21,239: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-14 10:23:24,579: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu'] INFO: 2024-07-14 10:23:24,580: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 10:23:24,580: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-14 10:23:25,247: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu'] INFO: 2024-07-14 10:23:25,259: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 10:23:25,259: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-14 10:23:27,051: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] INFO: 2024-07-14 10:23:27,051: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 10:23:27,051: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-14 10:23:28,704: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive'] INFO: 2024-07-14 10:23:28,704: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 10:23:28,704: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-14 10:23:30,887: llmtf.base.evaluator: 
Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en'] INFO: 2024-07-14 10:23:30,887: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 10:23:30,887: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-14 10:23:34,905: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 4.02s INFO: 2024-07-14 10:23:40,350: llmtf.base.daru/treewayextractive: Loading Dataset: 11.65s INFO: 2024-07-14 10:23:41,282: llmtf.base.darumeru/MultiQ: Loading Dataset: 21.15s INFO: 2024-07-14 10:23:44,620: llmtf.base.daru/treewayabstractive: Loading Dataset: 17.57s INFO: 2024-07-14 10:24:47,423: llmtf.base.darumeru/ruMMLU: Loading Dataset: 86.18s INFO: 2024-07-14 10:26:47,793: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 202.53s INFO: 2024-07-14 10:27:33,250: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 248.67s INFO: 2024-07-14 10:29:44,105: llmtf.base.darumeru/MultiQ: Processing Dataset: 362.82s INFO: 2024-07-14 10:29:44,107: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ: INFO: 2024-07-14 10:29:44,126: llmtf.base.darumeru/MultiQ: {'f1': 0.569265566933374, 'em': 0.4655831739961759} INFO: 2024-07-14 10:29:44,137: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 10:29:44,137: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-14 10:29:47,161: llmtf.base.darumeru/PARus: Loading Dataset: 3.02s INFO: 2024-07-14 10:29:59,369: llmtf.base.darumeru/PARus: Processing Dataset: 12.21s INFO: 2024-07-14 10:29:59,371: llmtf.base.darumeru/PARus: Results for darumeru/PARus: INFO: 2024-07-14 10:29:59,397: llmtf.base.darumeru/PARus: {'acc': 0.77} INFO: 2024-07-14 10:29:59,399: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 10:29:59,399: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-14 10:30:03,489: llmtf.base.darumeru/RCB: 
Loading Dataset: 4.09s INFO: 2024-07-14 10:30:09,527: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 394.62s INFO: 2024-07-14 10:30:09,530: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru: INFO: 2024-07-14 10:30:09,551: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.854204461218866, 'len': 0.3139758726899384, 'lcs': 0.3140327089331601} INFO: 2024-07-14 10:30:09,554: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 10:30:09,554: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-14 10:30:09,713: llmtf.base.daru/treewayextractive: Processing Dataset: 389.35s INFO: 2024-07-14 10:30:09,714: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive: INFO: 2024-07-14 10:30:09,932: llmtf.base.daru/treewayextractive: {'r-prec': 0.4072662337662338} INFO: 2024-07-14 10:30:09,977: llmtf.base.evaluator: Ended eval INFO: 2024-07-14 10:30:09,982: llmtf.base.evaluator: mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/cp_sent_ru 0.502 0.407 0.517 0.770 0.314 INFO: 2024-07-14 10:30:12,945: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.39s INFO: 2024-07-14 10:30:24,638: llmtf.base.darumeru/RCB: Processing Dataset: 21.15s INFO: 2024-07-14 10:30:24,641: llmtf.base.darumeru/RCB: Results for darumeru/RCB: INFO: 2024-07-14 10:30:24,676: llmtf.base.darumeru/RCB: {'acc': 0.41363636363636364, 'f1_macro': 0.4105113251051133} INFO: 2024-07-14 10:30:24,688: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 10:30:24,688: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-14 10:30:38,213: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 13.52s INFO: 2024-07-14 10:32:43,135: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 124.91s INFO: 2024-07-14 10:32:43,137: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA: INFO: 2024-07-14 10:32:43,165: 
llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7074742268041238, 'f1_macro': 0.7072263442662465} INFO: 2024-07-14 10:32:43,181: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 10:32:43,181: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-14 10:32:50,754: llmtf.base.darumeru/ruTiE: Loading Dataset: 7.57s INFO: 2024-07-14 10:36:41,628: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 388.68s INFO: 2024-07-14 10:36:41,661: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en: INFO: 2024-07-14 10:36:41,680: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.414480033245095, 'len': 0.24456935975609756, 'lcs': 0.2492914705708347} INFO: 2024-07-14 10:36:41,682: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 10:36:41,682: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-14 10:36:45,508: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 3.83s INFO: 2024-07-14 10:37:21,211: llmtf.base.darumeru/ruTiE: Processing Dataset: 270.44s INFO: 2024-07-14 10:37:21,212: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE: INFO: 2024-07-14 10:37:21,241: llmtf.base.darumeru/ruTiE: {'acc': 0.42093023255813955} INFO: 2024-07-14 10:37:21,244: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 10:37:21,244: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-14 10:37:23,894: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.65s INFO: 2024-07-14 10:37:31,204: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 7.31s INFO: 2024-07-14 10:37:31,206: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree: INFO: 2024-07-14 10:37:31,211: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8380952380952381, 'f1_macro': 0.8343115676204449} INFO: 2024-07-14 10:37:31,213: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 
2024-07-14 10:37:31,213: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-14 10:37:34,764: llmtf.base.darumeru/RWSD: Loading Dataset: 3.55s INFO: 2024-07-14 10:37:53,226: llmtf.base.darumeru/RWSD: Processing Dataset: 18.46s INFO: 2024-07-14 10:37:53,244: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD: INFO: 2024-07-14 10:37:53,261: llmtf.base.darumeru/RWSD: {'acc': 0.49019607843137253} INFO: 2024-07-14 10:37:53,263: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 10:37:53,263: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-14 10:38:07,648: llmtf.base.darumeru/USE: Loading Dataset: 14.38s INFO: 2024-07-14 10:42:11,801: llmtf.base.darumeru/USE: Processing Dataset: 244.14s INFO: 2024-07-14 10:42:11,819: llmtf.base.darumeru/USE: Results for darumeru/USE: INFO: 2024-07-14 10:42:11,839: llmtf.base.darumeru/USE: {'grade_norm': 0.10882352941176472} INFO: 2024-07-14 10:42:11,846: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 10:42:11,846: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-14 10:42:30,996: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 19.15s INFO: 2024-07-14 10:43:13,353: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 985.56s INFO: 2024-07-14 10:43:13,356: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU: INFO: 2024-07-14 10:43:13,402: llmtf.base.nlpcoreteam/enMMLU: metric subject abstract_algebra 0.310000 anatomy 0.696296 astronomy 0.697368 business_ethics 0.650000 clinical_knowledge 0.754717 college_biology 0.770833 college_chemistry 0.470000 college_computer_science 0.470000 college_mathematics 0.340000 college_medicine 0.647399 college_physics 0.500000 computer_security 0.800000 conceptual_physics 0.595745 econometrics 0.526316 electrical_engineering 0.655172 elementary_mathematics 0.441799 formal_logic 0.492063 global_facts 0.330000 
high_school_biology 0.777419 high_school_chemistry 0.551724 high_school_computer_science 0.680000 high_school_european_history 0.769697 high_school_geography 0.808081 high_school_government_and_politics 0.891192 high_school_macroeconomics 0.653846 high_school_mathematics 0.392593 high_school_microeconomics 0.731092 high_school_physics 0.450331 high_school_psychology 0.849541 high_school_statistics 0.541667 high_school_us_history 0.857843 high_school_world_history 0.827004 human_aging 0.713004 human_sexuality 0.770992 international_law 0.851240 jurisprudence 0.759259 logical_fallacies 0.736196 machine_learning 0.517857 management 0.883495 marketing 0.888889 medical_genetics 0.790000 miscellaneous 0.831418 moral_disputes 0.719653 moral_scenarios 0.412291 nutrition 0.767974 philosophy 0.749196 prehistory 0.734568 professional_accounting 0.482270 professional_law 0.468709 professional_medicine 0.716912 professional_psychology 0.722222 public_relations 0.718182 security_studies 0.759184 sociology 0.865672 us_foreign_policy 0.870000 virology 0.572289 world_religions 0.818713 INFO: 2024-07-14 10:43:13,410: llmtf.base.nlpcoreteam/enMMLU: metric subject STEM 0.553473 humanities 0.707418 other (business, health, misc.) 
0.694619 social sciences 0.763860 INFO: 2024-07-14 10:43:13,431: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6798423546157744} INFO: 2024-07-14 10:43:13,499: llmtf.base.evaluator: Ended eval INFO: 2024-07-14 10:43:13,515: llmtf.base.evaluator: mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU 0.492 0.407 0.517 0.770 0.412 0.490 0.109 0.245 0.314 0.707 0.421 0.836 0.680 INFO: 2024-07-14 10:45:03,288: llmtf.base.darumeru/ruMMLU: Processing Dataset: 1215.85s INFO: 2024-07-14 10:45:03,291: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU: INFO: 2024-07-14 10:45:03,300: llmtf.base.darumeru/ruMMLU: {'acc': 0.5003491968472513} INFO: 2024-07-14 10:45:03,379: llmtf.base.evaluator: Ended eval INFO: 2024-07-14 10:45:03,395: llmtf.base.evaluator: mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU 0.493 0.407 0.517 0.770 0.412 0.490 0.109 0.245 0.314 0.500 0.707 0.421 0.836 0.680 INFO: 2024-07-14 10:45:17,598: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 166.60s INFO: 2024-07-14 10:45:17,602: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom: INFO: 2024-07-14 10:45:17,620: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7240760674560459, 'mcc': 0.36043904403572885} INFO: 2024-07-14 10:45:17,631: llmtf.base.evaluator: Ended eval INFO: 2024-07-14 10:45:17,732: llmtf.base.evaluator: mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom 0.497 0.407 0.517 0.770 0.412 0.490 0.109 0.245 0.314 0.500 0.707 
0.421 0.836 0.680 0.542 INFO: 2024-07-14 10:51:53,998: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 1460.75s INFO: 2024-07-14 10:51:54,015: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU: INFO: 2024-07-14 10:51:54,061: llmtf.base.nlpcoreteam/ruMMLU: metric subject abstract_algebra 0.350000 anatomy 0.444444 astronomy 0.638158 business_ethics 0.630000 clinical_knowledge 0.581132 college_biology 0.569444 college_chemistry 0.410000 college_computer_science 0.430000 college_mathematics 0.340000 college_medicine 0.549133 college_physics 0.323529 computer_security 0.700000 conceptual_physics 0.527660 econometrics 0.438596 electrical_engineering 0.537931 elementary_mathematics 0.394180 formal_logic 0.420635 global_facts 0.330000 high_school_biology 0.658065 high_school_chemistry 0.433498 high_school_computer_science 0.660000 high_school_european_history 0.727273 high_school_geography 0.691919 high_school_government_and_politics 0.683938 high_school_macroeconomics 0.548718 high_school_mathematics 0.400000 high_school_microeconomics 0.525210 high_school_physics 0.357616 high_school_psychology 0.662385 high_school_statistics 0.504630 high_school_us_history 0.705882 high_school_world_history 0.742616 human_aging 0.560538 human_sexuality 0.625954 international_law 0.743802 jurisprudence 0.666667 logical_fallacies 0.558282 machine_learning 0.526786 management 0.757282 marketing 0.709402 medical_genetics 0.620000 miscellaneous 0.629630 moral_disputes 0.598266 moral_scenarios 0.392179 nutrition 0.643791 philosophy 0.617363 prehistory 0.583333 professional_accounting 0.375887 professional_law 0.384615 professional_medicine 0.503676 professional_psychology 0.503268 public_relations 0.572727 security_studies 0.669388 sociology 0.696517 us_foreign_policy 0.780000 virology 0.500000 world_religions 0.672515 INFO: 2024-07-14 10:51:54,068: llmtf.base.nlpcoreteam/ruMMLU: metric subject STEM 0.486750 humanities 0.601033 other (business, health, misc.) 
0.559637 social sciences 0.616552
INFO: 2024-07-14 10:51:54,076: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5659927988894412}
INFO: 2024-07-14 10:51:54,155: llmtf.base.evaluator: Ended eval
INFO: 2024-07-14 10:51:54,169: llmtf.base.evaluator:
mean   daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU  russiannlp/rucola_custom
0.501  0.407                   0.517            0.770           0.412         0.490          0.109         0.245                0.314                0.500            0.707                  0.421           0.836                 0.680               0.566               0.542
INFO: 2024-07-14 10:59:26,805: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 1361.30s
INFO: 2024-07-14 10:59:26,809: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-07-14 10:59:26,814: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.8209077531855185, 'len': 0.1755224609375, 'lcs': 0.1860311194855058}
INFO: 2024-07-14 10:59:26,815: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271]
INFO: 2024-07-14 10:59:26,815: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-14 10:59:30,799: llmtf.base.darumeru/cp_para_en: Loading Dataset: 3.98s
INFO: 2024-07-14 11:06:06,415: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-07-14 11:06:06,418: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271]
INFO: 2024-07-14 11:06:06,418: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-14 11:06:06,474: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-07-14 11:06:06,475: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271]
INFO: 2024-07-14 11:06:06,475: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-14 11:06:06,800: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-07-14 11:06:06,801: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271]
INFO: 2024-07-14 11:06:06,801: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-14 11:06:06,963: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-07-14 11:06:06,963: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271]
INFO: 2024-07-14 11:06:06,963: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-14 11:06:08,918: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-07-14 11:06:08,935: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271]
INFO: 2024-07-14 11:06:08,935: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-14 11:06:10,139: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-07-14 11:06:10,140: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271]
INFO: 2024-07-14 11:06:10,140: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-14 11:06:12,863: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-07-14 11:06:12,864: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271]
INFO: 2024-07-14 11:06:12,864: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-14 11:06:17,073: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 4.21s
INFO: 2024-07-14 11:06:22,184: llmtf.base.daru/treewayextractive: Loading Dataset: 12.04s
INFO: 2024-07-14 11:06:26,015: llmtf.base.daru/treewayabstractive: Loading Dataset: 17.08s
INFO: 2024-07-14 11:06:28,367: llmtf.base.darumeru/MultiQ: Loading Dataset: 21.89s
INFO: 2024-07-14 11:07:31,487: llmtf.base.darumeru/ruMMLU: Loading Dataset: 85.07s
INFO: 2024-07-14 11:09:34,167: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 207.20s INFO: 2024-07-14 11:10:20,566: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 253.76s INFO: 2024-07-14 11:13:00,146: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom'] INFO: 2024-07-14 11:13:00,147: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:13:00,148: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:13:00,763: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu'] INFO: 2024-07-14 11:13:00,764: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:13:00,764: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:13:01,156: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu'] INFO: 2024-07-14 11:13:01,157: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:13:01,157: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:13:03,344: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu'] INFO: 2024-07-14 11:13:03,344: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:13:03,345: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:13:05,847: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] INFO: 2024-07-14 11:13:05,847: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:13:05,847: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:13:07,688: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive'] INFO: 2024-07-14 
11:13:07,689: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:13:07,690: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:13:08,542: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en'] INFO: 2024-07-14 11:13:08,556: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:13:08,556: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:13:13,247: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 4.69s INFO: 2024-07-14 11:13:19,620: llmtf.base.daru/treewayextractive: Loading Dataset: 11.93s INFO: 2024-07-14 11:13:21,465: llmtf.base.darumeru/MultiQ: Loading Dataset: 21.32s INFO: 2024-07-14 11:13:22,631: llmtf.base.daru/treewayabstractive: Loading Dataset: 16.78s INFO: 2024-07-14 11:14:26,160: llmtf.base.darumeru/ruMMLU: Loading Dataset: 85.40s INFO: 2024-07-14 11:16:29,276: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 205.93s INFO: 2024-07-14 11:17:11,778: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 250.62s INFO: 2024-07-14 11:19:17,117: llmtf.base.darumeru/MultiQ: Processing Dataset: 355.65s INFO: 2024-07-14 11:19:17,119: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ: INFO: 2024-07-14 11:19:17,142: llmtf.base.darumeru/MultiQ: {'f1': 0.5670386205042036, 'em': 0.4646271510516252} INFO: 2024-07-14 11:19:17,153: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:19:17,153: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:19:20,859: llmtf.base.darumeru/PARus: Loading Dataset: 3.70s INFO: 2024-07-14 11:19:32,984: llmtf.base.darumeru/PARus: Processing Dataset: 12.11s INFO: 2024-07-14 11:19:32,999: llmtf.base.darumeru/PARus: Results for darumeru/PARus: INFO: 2024-07-14 11:19:33,016: llmtf.base.darumeru/PARus: 
{'acc': 0.77} INFO: 2024-07-14 11:19:33,019: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:19:33,019: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:19:37,041: llmtf.base.darumeru/RCB: Loading Dataset: 4.02s INFO: 2024-07-14 11:19:40,945: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 387.69s INFO: 2024-07-14 11:19:40,949: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru: INFO: 2024-07-14 11:19:40,956: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.852724111665359, 'len': 0.3139758726899384, 'lcs': 0.3142940152947527} INFO: 2024-07-14 11:19:40,959: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:19:40,959: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:19:44,820: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.86s INFO: 2024-07-14 11:19:54,656: llmtf.base.daru/treewayextractive: Processing Dataset: 395.02s INFO: 2024-07-14 11:19:54,657: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive: INFO: 2024-07-14 11:19:54,894: llmtf.base.daru/treewayextractive: {'r-prec': 0.4072662337662338} INFO: 2024-07-14 11:19:54,939: llmtf.base.evaluator: Ended eval INFO: 2024-07-14 11:19:54,959: llmtf.base.evaluator: mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom 0.481 0.407 0.516 0.770 0.412 0.490 0.109 0.186 0.245 0.314 0.500 0.707 0.421 0.836 0.680 0.566 0.542 INFO: 2024-07-14 11:19:58,004: llmtf.base.darumeru/RCB: Processing Dataset: 20.95s INFO: 2024-07-14 11:19:58,006: llmtf.base.darumeru/RCB: Results for darumeru/RCB: INFO: 2024-07-14 11:19:58,016: llmtf.base.darumeru/RCB: {'acc': 
0.41363636363636364, 'f1_macro': 0.4105113251051133} INFO: 2024-07-14 11:19:58,018: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:19:58,019: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:20:11,977: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 13.96s INFO: 2024-07-14 11:22:15,668: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 123.69s INFO: 2024-07-14 11:22:15,686: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA: INFO: 2024-07-14 11:22:15,718: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7074742268041238, 'f1_macro': 0.7072263442662465} INFO: 2024-07-14 11:22:15,734: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:22:15,734: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:22:22,911: llmtf.base.darumeru/ruTiE: Loading Dataset: 7.18s INFO: 2024-07-14 11:31:34,412: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en'] INFO: 2024-07-14 11:31:34,413: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:31:34,413: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:31:36,398: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom'] INFO: 2024-07-14 11:31:36,399: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:31:36,399: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:31:36,412: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu'] INFO: 2024-07-14 11:31:36,413: llmtf.base.hfmodel: Updated 
generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:31:36,413: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:31:36,456: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive'] INFO: 2024-07-14 11:31:36,457: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:31:36,457: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:31:36,781: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu'] INFO: 2024-07-14 11:31:36,782: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:31:36,782: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:31:37,004: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] INFO: 2024-07-14 11:31:37,005: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:31:37,005: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:31:37,196: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu'] INFO: 2024-07-14 11:31:37,196: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:31:37,196: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:31:38,743: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 4.33s INFO: 2024-07-14 11:31:48,581: llmtf.base.daru/treewayextractive: Loading Dataset: 12.12s INFO: 2024-07-14 11:31:53,872: llmtf.base.daru/treewayabstractive: Loading Dataset: 16.87s INFO: 2024-07-14 11:31:57,994: llmtf.base.darumeru/MultiQ: Loading Dataset: 21.59s INFO: 2024-07-14 11:33:00,585: llmtf.base.darumeru/ruMMLU: Loading Dataset: 84.17s INFO: 2024-07-14 11:35:00,486: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 203.70s INFO: 2024-07-14 11:35:45,376: llmtf.base.nlpcoreteam/ruMMLU: Loading 
Dataset: 248.18s INFO: 2024-07-14 11:37:58,433: llmtf.base.darumeru/MultiQ: Processing Dataset: 360.42s INFO: 2024-07-14 11:37:58,449: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ: INFO: 2024-07-14 11:37:58,477: llmtf.base.darumeru/MultiQ: {'f1': 0.5675109413637138, 'em': 0.4655831739961759} INFO: 2024-07-14 11:37:58,487: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:37:58,488: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:38:02,069: llmtf.base.darumeru/PARus: Loading Dataset: 3.58s INFO: 2024-07-14 11:38:11,194: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 392.45s INFO: 2024-07-14 11:38:11,198: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru: INFO: 2024-07-14 11:38:11,219: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.8238989214765224, 'len': 0.9998130402818972, 'lcs': 0.9997733257303255} INFO: 2024-07-14 11:38:11,222: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:38:11,222: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:38:14,071: llmtf.base.darumeru/PARus: Processing Dataset: 12.00s INFO: 2024-07-14 11:38:14,073: llmtf.base.darumeru/PARus: Results for darumeru/PARus: INFO: 2024-07-14 11:38:14,103: llmtf.base.darumeru/PARus: {'acc': 0.77} INFO: 2024-07-14 11:38:14,105: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:38:14,105: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:38:15,216: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.99s INFO: 2024-07-14 11:38:17,615: llmtf.base.darumeru/RCB: Loading Dataset: 3.51s INFO: 2024-07-14 11:38:22,819: llmtf.base.daru/treewayextractive: Processing Dataset: 394.24s INFO: 2024-07-14 11:38:22,820: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive: INFO: 2024-07-14 
11:38:23,092: llmtf.base.daru/treewayextractive: {'r-prec': 0.4072662337662338} INFO: 2024-07-14 11:38:23,137: llmtf.base.evaluator: Ended eval INFO: 2024-07-14 11:38:23,245: llmtf.base.evaluator: mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom 0.524 0.407 0.517 0.770 0.412 0.490 0.109 0.186 0.245 1.000 0.500 0.707 0.421 0.836 0.680 0.566 0.542 INFO: 2024-07-14 11:38:38,403: llmtf.base.darumeru/RCB: Processing Dataset: 20.75s INFO: 2024-07-14 11:38:38,418: llmtf.base.darumeru/RCB: Results for darumeru/RCB: INFO: 2024-07-14 11:38:38,456: llmtf.base.darumeru/RCB: {'acc': 0.41363636363636364, 'f1_macro': 0.4105113251051133} INFO: 2024-07-14 11:38:38,458: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:38:38,458: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:38:52,408: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 13.95s INFO: 2024-07-14 11:40:55,434: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 123.01s INFO: 2024-07-14 11:40:55,437: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA: INFO: 2024-07-14 11:40:55,468: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7074742268041238, 'f1_macro': 0.7072263442662465} INFO: 2024-07-14 11:40:55,484: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:40:55,484: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:41:02,650: llmtf.base.darumeru/ruTiE: Loading Dataset: 7.16s INFO: 2024-07-14 11:44:41,336: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 386.12s INFO: 2024-07-14 11:44:41,339: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en: INFO: 
2024-07-14 11:44:41,372: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.424142837938139, 'len': 0.9984438516260162, 'lcs': 0.9974371974918181} INFO: 2024-07-14 11:44:41,375: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:44:41,375: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:44:44,557: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 3.18s INFO: 2024-07-14 11:45:27,448: llmtf.base.darumeru/ruTiE: Processing Dataset: 264.80s INFO: 2024-07-14 11:45:27,449: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE: INFO: 2024-07-14 11:45:27,510: llmtf.base.darumeru/ruTiE: {'acc': 0.42093023255813955} INFO: 2024-07-14 11:45:27,513: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:45:27,513: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:45:30,838: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 3.32s INFO: 2024-07-14 11:45:38,023: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 7.18s INFO: 2024-07-14 11:45:38,024: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree: INFO: 2024-07-14 11:45:38,045: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8380952380952381, 'f1_macro': 0.8343115676204449} INFO: 2024-07-14 11:45:38,046: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:45:38,046: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:45:41,831: llmtf.base.darumeru/RWSD: Loading Dataset: 3.78s INFO: 2024-07-14 11:45:59,953: llmtf.base.darumeru/RWSD: Processing Dataset: 18.12s INFO: 2024-07-14 11:45:59,971: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD: INFO: 2024-07-14 11:45:59,977: llmtf.base.darumeru/RWSD: {'acc': 0.49019607843137253} INFO: 2024-07-14 11:45:59,979: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 
2024-07-14 11:45:59,979: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:46:14,583: llmtf.base.darumeru/USE: Loading Dataset: 14.60s INFO: 2024-07-14 11:50:15,530: llmtf.base.darumeru/USE: Processing Dataset: 240.93s INFO: 2024-07-14 11:50:15,533: llmtf.base.darumeru/USE: Results for darumeru/USE: INFO: 2024-07-14 11:50:15,541: llmtf.base.darumeru/USE: {'grade_norm': 0.1019607843137255} INFO: 2024-07-14 11:50:15,547: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 11:50:15,547: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 11:50:34,918: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 19.37s INFO: 2024-07-14 11:51:25,782: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 985.28s INFO: 2024-07-14 11:51:25,784: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU: INFO: 2024-07-14 11:51:25,832: llmtf.base.nlpcoreteam/enMMLU: metric subject abstract_algebra 0.310000 anatomy 0.696296 astronomy 0.697368 business_ethics 0.650000 clinical_knowledge 0.754717 college_biology 0.770833 college_chemistry 0.470000 college_computer_science 0.470000 college_mathematics 0.340000 college_medicine 0.647399 college_physics 0.500000 computer_security 0.800000 conceptual_physics 0.595745 econometrics 0.526316 electrical_engineering 0.655172 elementary_mathematics 0.441799 formal_logic 0.492063 global_facts 0.330000 high_school_biology 0.777419 high_school_chemistry 0.551724 high_school_computer_science 0.680000 high_school_european_history 0.769697 high_school_geography 0.808081 high_school_government_and_politics 0.891192 high_school_macroeconomics 0.653846 high_school_mathematics 0.392593 high_school_microeconomics 0.731092 high_school_physics 0.450331 high_school_psychology 0.849541 high_school_statistics 0.541667 high_school_us_history 0.857843 high_school_world_history 0.827004 human_aging 0.713004 human_sexuality 0.770992 
international_law 0.851240 jurisprudence 0.759259 logical_fallacies 0.736196 machine_learning 0.517857 management 0.883495 marketing 0.888889 medical_genetics 0.790000 miscellaneous 0.831418 moral_disputes 0.719653 moral_scenarios 0.412291 nutrition 0.767974 philosophy 0.749196 prehistory 0.734568 professional_accounting 0.482270 professional_law 0.468709 professional_medicine 0.716912 professional_psychology 0.722222 public_relations 0.718182 security_studies 0.759184 sociology 0.865672 us_foreign_policy 0.870000 virology 0.572289 world_religions 0.818713 INFO: 2024-07-14 11:51:25,840: llmtf.base.nlpcoreteam/enMMLU: metric subject STEM 0.553473 humanities 0.707418 other (business, health, misc.) 0.694619 social sciences 0.763860 INFO: 2024-07-14 11:51:25,916: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6798423546157744} INFO: 2024-07-14 11:51:25,986: llmtf.base.evaluator: Ended eval INFO: 2024-07-14 11:51:26,157: llmtf.base.evaluator: mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom 0.571 0.407 0.517 0.770 0.412 0.490 0.102 0.186 0.998 1.000 0.500 0.707 0.421 0.836 0.680 0.566 0.542 INFO: 2024-07-14 11:53:16,466: llmtf.base.darumeru/ruMMLU: Processing Dataset: 1215.85s INFO: 2024-07-14 11:53:16,498: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU: INFO: 2024-07-14 11:53:16,524: llmtf.base.darumeru/ruMMLU: {'acc': 0.5003491968472513} INFO: 2024-07-14 11:53:16,602: llmtf.base.evaluator: Ended eval INFO: 2024-07-14 11:53:16,635: llmtf.base.evaluator: mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU 
russiannlp/rucola_custom 0.571 0.407 0.517 0.770 0.412 0.490 0.102 0.186 0.998 1.000 0.500 0.707 0.421 0.836 0.680 0.566 0.542 INFO: 2024-07-14 11:53:18,760: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 163.84s INFO: 2024-07-14 11:53:18,763: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom: INFO: 2024-07-14 11:53:18,803: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7240760674560459, 'mcc': 0.36043904403572885} INFO: 2024-07-14 11:53:18,814: llmtf.base.evaluator: Ended eval INFO: 2024-07-14 11:53:18,828: llmtf.base.evaluator: mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom 0.571 0.407 0.517 0.770 0.412 0.490 0.102 0.186 0.998 1.000 0.500 0.707 0.421 0.836 0.680 0.566 0.542 INFO: 2024-07-14 11:59:54,999: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 1449.62s INFO: 2024-07-14 11:59:55,003: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU: INFO: 2024-07-14 11:59:55,049: llmtf.base.nlpcoreteam/ruMMLU: metric subject abstract_algebra 0.350000 anatomy 0.444444 astronomy 0.638158 business_ethics 0.630000 clinical_knowledge 0.581132 college_biology 0.569444 college_chemistry 0.410000 college_computer_science 0.430000 college_mathematics 0.340000 college_medicine 0.549133 college_physics 0.323529 computer_security 0.700000 conceptual_physics 0.527660 econometrics 0.438596 electrical_engineering 0.537931 elementary_mathematics 0.394180 formal_logic 0.420635 global_facts 0.330000 high_school_biology 0.658065 high_school_chemistry 0.433498 high_school_computer_science 0.660000 high_school_european_history 0.727273 high_school_geography 0.691919 high_school_government_and_politics 0.683938 high_school_macroeconomics 0.548718 high_school_mathematics 0.400000 
high_school_microeconomics 0.525210 high_school_physics 0.357616 high_school_psychology 0.662385 high_school_statistics 0.504630 high_school_us_history 0.705882 high_school_world_history 0.742616 human_aging 0.560538 human_sexuality 0.625954 international_law 0.743802 jurisprudence 0.666667 logical_fallacies 0.558282 machine_learning 0.526786 management 0.757282 marketing 0.709402 medical_genetics 0.620000 miscellaneous 0.629630 moral_disputes 0.598266 moral_scenarios 0.392179 nutrition 0.643791 philosophy 0.617363 prehistory 0.583333 professional_accounting 0.375887 professional_law 0.384615 professional_medicine 0.503676 professional_psychology 0.503268 public_relations 0.572727 security_studies 0.669388 sociology 0.696517 us_foreign_policy 0.780000 virology 0.500000 world_religions 0.672515 INFO: 2024-07-14 11:59:55,057: llmtf.base.nlpcoreteam/ruMMLU: metric subject STEM 0.486750 humanities 0.601033 other (business, health, misc.) 0.559637 social sciences 0.616552 INFO: 2024-07-14 11:59:55,069: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5659927988894412} INFO: 2024-07-14 11:59:55,148: llmtf.base.evaluator: Ended eval INFO: 2024-07-14 11:59:55,163: llmtf.base.evaluator: mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom 0.571 0.407 0.517 0.770 0.412 0.490 0.102 0.186 0.998 1.000 0.500 0.707 0.421 0.836 0.680 0.566 0.542 INFO: 2024-07-14 12:07:14,213: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 1349.64s INFO: 2024-07-14 12:07:14,218: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru: INFO: 2024-07-14 12:07:14,240: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.965935841307524, 'len': 0.9998800850030358, 'lcs': 0.9964476909825747} INFO: 2024-07-14 12:07:14,242: llmtf.base.hfmodel: Updated 
generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 12:07:14,242: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 12:07:18,918: llmtf.base.darumeru/cp_para_en: Loading Dataset: 4.68s INFO: 2024-07-14 12:07:58,685: llmtf.base.daru/treewayabstractive: Processing Dataset: 2164.81s INFO: 2024-07-14 12:07:58,687: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive: INFO: 2024-07-14 12:07:58,692: llmtf.base.daru/treewayabstractive: {'rouge1': 0.35742299153264667, 'rouge2': 0.14485242187705508} INFO: 2024-07-14 12:07:58,696: llmtf.base.evaluator: Ended eval INFO: 2024-07-14 12:07:58,708: llmtf.base.evaluator: mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom 0.600 0.251 0.407 0.517 0.770 0.412 0.490 0.102 0.996 0.998 1.000 0.500 0.707 0.421 0.836 0.680 0.566 0.542 INFO: 2024-07-14 12:29:54,865: llmtf.base.darumeru/cp_para_en: Processing Dataset: 1355.93s INFO: 2024-07-14 12:29:54,869: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en: INFO: 2024-07-14 12:29:54,873: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.484972311760252, 'len': 0.999859659310879, 'lcs': 0.9881793213641535} INFO: 2024-07-14 12:29:54,874: llmtf.base.evaluator: Ended eval INFO: 2024-07-14 12:29:54,887: llmtf.base.evaluator: mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom 0.621 0.251 0.407 0.517 0.770 0.412 0.490 0.102 0.988 0.996 0.998 1.000 0.500 
0.707 0.421 0.836 0.680 0.566 0.542 INFO: 2024-07-14 14:12:57,969: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom'] INFO: 2024-07-14 14:12:57,971: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 14:12:57,971: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 14:12:58,529: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu'] INFO: 2024-07-14 14:12:58,529: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 14:12:58,529: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 14:12:59,265: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu'] INFO: 2024-07-14 14:12:59,266: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 14:12:59,266: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 14:13:01,346: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu'] INFO: 2024-07-14 14:13:01,347: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 14:13:01,347: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 14:13:04,501: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] INFO: 2024-07-14 14:13:04,501: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 14:13:04,501: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 14:13:05,549: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive'] INFO: 2024-07-14 14:13:05,549: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 14:13:05,549: llmtf.base.hfmodel: 
Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 14:13:07,592: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en'] INFO: 2024-07-14 14:13:07,592: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 14:13:07,592: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 14:13:12,035: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 4.44s INFO: 2024-07-14 14:13:17,777: llmtf.base.daru/treewayextractive: Loading Dataset: 12.23s INFO: 2024-07-14 14:13:18,790: llmtf.base.darumeru/MultiQ: Loading Dataset: 20.82s INFO: 2024-07-14 14:13:21,041: llmtf.base.daru/treewayabstractive: Loading Dataset: 16.54s INFO: 2024-07-14 14:14:23,369: llmtf.base.darumeru/ruMMLU: Loading Dataset: 84.84s INFO: 2024-07-14 14:16:24,369: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 203.02s INFO: 2024-07-14 14:17:09,995: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 250.73s INFO: 2024-07-14 14:19:20,099: llmtf.base.darumeru/MultiQ: Processing Dataset: 361.28s INFO: 2024-07-14 14:19:20,100: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ: INFO: 2024-07-14 14:19:20,106: llmtf.base.darumeru/MultiQ: {'f1': 0.5648542039178045, 'em': 0.4608030592734226} INFO: 2024-07-14 14:19:20,117: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 14:19:20,117: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 14:19:23,169: llmtf.base.darumeru/PARus: Loading Dataset: 3.05s INFO: 2024-07-14 14:19:35,315: llmtf.base.darumeru/PARus: Processing Dataset: 12.09s INFO: 2024-07-14 14:19:35,318: llmtf.base.darumeru/PARus: Results for darumeru/PARus: INFO: 2024-07-14 14:19:35,332: llmtf.base.darumeru/PARus: {'acc': 0.77} INFO: 2024-07-14 14:19:35,333: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 
14:19:35,334: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 14:19:39,019: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 386.98s INFO: 2024-07-14 14:19:39,021: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru: INFO: 2024-07-14 14:19:39,026: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.886186230509937, 'len': 0.9638393987832617, 'lcs': 0.9997711394078869} INFO: 2024-07-14 14:19:39,030: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 14:19:39,030: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 14:19:39,650: llmtf.base.darumeru/RCB: Loading Dataset: 4.32s INFO: 2024-07-14 14:19:43,492: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 4.46s INFO: 2024-07-14 14:19:51,430: llmtf.base.daru/treewayextractive: Processing Dataset: 393.65s INFO: 2024-07-14 14:19:51,431: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive: INFO: 2024-07-14 14:19:51,654: llmtf.base.daru/treewayextractive: {'r-prec': 0.4072662337662338} INFO: 2024-07-14 14:19:51,698: llmtf.base.evaluator: Ended eval INFO: 2024-07-14 14:19:51,742: llmtf.base.evaluator: mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom 0.619 0.251 0.407 0.513 0.770 0.412 0.490 0.102 0.988 0.996 0.998 0.964 0.500 0.707 0.421 0.836 0.680 0.566 0.542 INFO: 2024-07-14 14:20:00,600: llmtf.base.darumeru/RCB: Processing Dataset: 20.95s INFO: 2024-07-14 14:20:00,602: llmtf.base.darumeru/RCB: Results for darumeru/RCB: INFO: 2024-07-14 14:20:00,610: llmtf.base.darumeru/RCB: {'acc': 0.41363636363636364, 'f1_macro': 0.4105113251051133} INFO: 2024-07-14 14:20:00,612: 
llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 14:20:00,612: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 14:20:15,143: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 14.53s INFO: 2024-07-14 14:22:18,985: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 123.84s INFO: 2024-07-14 14:22:18,996: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA: INFO: 2024-07-14 14:22:19,089: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7074742268041238, 'f1_macro': 0.7072263442662465} INFO: 2024-07-14 14:22:19,105: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 14:22:19,106: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 14:22:26,540: llmtf.base.darumeru/ruTiE: Loading Dataset: 7.43s INFO: 2024-07-14 14:26:04,768: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 381.27s INFO: 2024-07-14 14:26:04,771: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en: INFO: 2024-07-14 14:26:04,776: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.556837515131998, 'len': 0.9592170801454492, 'lcs': 0.9978536640150768} INFO: 2024-07-14 14:26:04,779: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 14:26:04,779: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 14:26:08,974: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 4.19s INFO: 2024-07-14 14:26:54,269: llmtf.base.darumeru/ruTiE: Processing Dataset: 267.71s INFO: 2024-07-14 14:26:54,270: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE: INFO: 2024-07-14 14:26:54,301: llmtf.base.darumeru/ruTiE: {'acc': 0.42093023255813955} INFO: 2024-07-14 14:26:54,305: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 14:26:54,305: llmtf.base.hfmodel: Updated generation_config.stop_strings: 
['\n', '\n\n'] INFO: 2024-07-14 14:26:56,946: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.64s INFO: 2024-07-14 14:27:04,165: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 7.22s INFO: 2024-07-14 14:27:04,166: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree: INFO: 2024-07-14 14:27:04,173: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8380952380952381, 'f1_macro': 0.8343115676204449} INFO: 2024-07-14 14:27:04,174: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 14:27:04,174: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 14:27:07,701: llmtf.base.darumeru/RWSD: Loading Dataset: 3.53s INFO: 2024-07-14 14:27:25,947: llmtf.base.darumeru/RWSD: Processing Dataset: 18.24s INFO: 2024-07-14 14:27:25,955: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD: INFO: 2024-07-14 14:27:26,055: llmtf.base.darumeru/RWSD: {'acc': 0.49019607843137253} INFO: 2024-07-14 14:27:26,057: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 14:27:26,057: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 14:27:40,312: llmtf.base.darumeru/USE: Loading Dataset: 14.25s INFO: 2024-07-14 14:31:38,677: llmtf.base.darumeru/USE: Processing Dataset: 238.36s INFO: 2024-07-14 14:31:38,694: llmtf.base.darumeru/USE: Results for darumeru/USE: INFO: 2024-07-14 14:31:38,700: llmtf.base.darumeru/USE: {'grade_norm': 0.10588235294117647} INFO: 2024-07-14 14:31:38,707: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 14:31:38,707: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 14:31:58,236: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 19.53s INFO: 2024-07-14 14:32:44,406: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 980.03s INFO: 2024-07-14 14:32:44,409: llmtf.base.nlpcoreteam/enMMLU: Results for 
nlpcoreteam/enMMLU: INFO: 2024-07-14 14:32:44,455: llmtf.base.nlpcoreteam/enMMLU: metric subject abstract_algebra 0.310000 anatomy 0.696296 astronomy 0.697368 business_ethics 0.650000 clinical_knowledge 0.754717 college_biology 0.770833 college_chemistry 0.470000 college_computer_science 0.470000 college_mathematics 0.340000 college_medicine 0.647399 college_physics 0.500000 computer_security 0.800000 conceptual_physics 0.595745 econometrics 0.526316 electrical_engineering 0.655172 elementary_mathematics 0.441799 formal_logic 0.492063 global_facts 0.330000 high_school_biology 0.777419 high_school_chemistry 0.551724 high_school_computer_science 0.680000 high_school_european_history 0.769697 high_school_geography 0.808081 high_school_government_and_politics 0.891192 high_school_macroeconomics 0.653846 high_school_mathematics 0.392593 high_school_microeconomics 0.731092 high_school_physics 0.450331 high_school_psychology 0.849541 high_school_statistics 0.541667 high_school_us_history 0.857843 high_school_world_history 0.827004 human_aging 0.713004 human_sexuality 0.770992 international_law 0.851240 jurisprudence 0.759259 logical_fallacies 0.736196 machine_learning 0.517857 management 0.883495 marketing 0.888889 medical_genetics 0.790000 miscellaneous 0.831418 moral_disputes 0.719653 moral_scenarios 0.412291 nutrition 0.767974 philosophy 0.749196 prehistory 0.734568 professional_accounting 0.482270 professional_law 0.468709 professional_medicine 0.716912 professional_psychology 0.722222 public_relations 0.718182 security_studies 0.759184 sociology 0.865672 us_foreign_policy 0.870000 virology 0.572289 world_religions 0.818713 INFO: 2024-07-14 14:32:44,463: llmtf.base.nlpcoreteam/enMMLU: metric subject STEM 0.553473 humanities 0.707418 other (business, health, misc.) 
0.694619 social sciences 0.763860 INFO: 2024-07-14 14:32:44,473: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6798423546157744} INFO: 2024-07-14 14:32:44,541: llmtf.base.evaluator: Ended eval INFO: 2024-07-14 14:32:44,569: llmtf.base.evaluator: mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom 0.617 0.251 0.407 0.513 0.770 0.412 0.490 0.106 0.988 0.996 0.959 0.964 0.500 0.707 0.421 0.836 0.680 0.566 0.542 INFO: 2024-07-14 14:34:31,976: llmtf.base.darumeru/ruMMLU: Processing Dataset: 1208.60s INFO: 2024-07-14 14:34:31,979: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU: INFO: 2024-07-14 14:34:32,006: llmtf.base.darumeru/ruMMLU: {'acc': 0.5003491968472513} INFO: 2024-07-14 14:34:32,084: llmtf.base.evaluator: Ended eval INFO: 2024-07-14 14:34:32,098: llmtf.base.evaluator: mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom 0.617 0.251 0.407 0.513 0.770 0.412 0.490 0.106 0.988 0.996 0.959 0.964 0.500 0.707 0.421 0.836 0.680 0.566 0.542 INFO: 2024-07-14 14:34:43,299: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 165.06s INFO: 2024-07-14 14:34:43,302: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom: INFO: 2024-07-14 14:34:43,316: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7240760674560459, 'mcc': 0.36043904403572885} INFO: 2024-07-14 14:34:43,327: llmtf.base.evaluator: Ended eval INFO: 2024-07-14 14:34:43,336: llmtf.base.evaluator: mean 
daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom 0.617 0.251 0.407 0.513 0.770 0.412 0.490 0.106 0.988 0.996 0.959 0.964 0.500 0.707 0.421 0.836 0.680 0.566 0.542 INFO: 2024-07-14 14:41:14,321: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 1444.32s INFO: 2024-07-14 14:41:14,324: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU: INFO: 2024-07-14 14:41:14,373: llmtf.base.nlpcoreteam/ruMMLU: metric subject abstract_algebra 0.350000 anatomy 0.444444 astronomy 0.638158 business_ethics 0.630000 clinical_knowledge 0.581132 college_biology 0.569444 college_chemistry 0.410000 college_computer_science 0.430000 college_mathematics 0.340000 college_medicine 0.549133 college_physics 0.323529 computer_security 0.700000 conceptual_physics 0.527660 econometrics 0.438596 electrical_engineering 0.537931 elementary_mathematics 0.394180 formal_logic 0.420635 global_facts 0.330000 high_school_biology 0.658065 high_school_chemistry 0.433498 high_school_computer_science 0.660000 high_school_european_history 0.727273 high_school_geography 0.691919 high_school_government_and_politics 0.683938 high_school_macroeconomics 0.548718 high_school_mathematics 0.400000 high_school_microeconomics 0.525210 high_school_physics 0.357616 high_school_psychology 0.662385 high_school_statistics 0.504630 high_school_us_history 0.705882 high_school_world_history 0.742616 human_aging 0.560538 human_sexuality 0.625954 international_law 0.743802 jurisprudence 0.666667 logical_fallacies 0.558282 machine_learning 0.526786 management 0.757282 marketing 0.709402 medical_genetics 0.620000 miscellaneous 0.629630 moral_disputes 0.598266 moral_scenarios 0.392179 nutrition 0.643791 philosophy 0.617363 prehistory 0.583333 
professional_accounting 0.375887 professional_law 0.384615 professional_medicine 0.503676 professional_psychology 0.503268 public_relations 0.572727 security_studies 0.669388 sociology 0.696517 us_foreign_policy 0.780000 virology 0.500000 world_religions 0.672515 INFO: 2024-07-14 14:41:14,381: llmtf.base.nlpcoreteam/ruMMLU: metric subject STEM 0.486750 humanities 0.601033 other (business, health, misc.) 0.559637 social sciences 0.616552 INFO: 2024-07-14 14:41:14,391: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5659927988894412} INFO: 2024-07-14 14:41:14,470: llmtf.base.evaluator: Ended eval INFO: 2024-07-14 14:41:14,485: llmtf.base.evaluator: mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom 0.617 0.251 0.407 0.513 0.770 0.412 0.490 0.106 0.988 0.996 0.959 0.964 0.500 0.707 0.421 0.836 0.680 0.566 0.542 INFO: 2024-07-14 14:48:17,879: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 1328.90s INFO: 2024-07-14 14:48:17,896: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru: INFO: 2024-07-14 14:48:17,934: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.9865072713630245, 'len': 0.989199175688307, 'lcs': 0.9976086956521739} INFO: 2024-07-14 14:48:17,936: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271] INFO: 2024-07-14 14:48:17,936: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] INFO: 2024-07-14 14:48:22,484: llmtf.base.darumeru/cp_para_en: Loading Dataset: 4.55s INFO: 2024-07-14 14:49:28,678: llmtf.base.daru/treewayabstractive: Processing Dataset: 2167.63s INFO: 2024-07-14 14:49:28,679: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive: INFO: 2024-07-14 14:49:28,685: 
llmtf.base.daru/treewayabstractive: {'rouge1': 0.3516549334515103, 'rouge2': 0.1390946104887656}
INFO: 2024-07-14 14:49:28,690: llmtf.base.evaluator: Ended eval
INFO: 2024-07-14 14:49:28,699: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.617 0.245 0.407 0.513 0.770 0.412 0.490 0.106 0.988 0.998 0.959 0.964 0.500 0.707 0.421 0.836 0.680 0.566 0.542
INFO: 2024-07-14 15:10:39,600: llmtf.base.darumeru/cp_para_en: Processing Dataset: 1337.10s
INFO: 2024-07-14 15:10:39,604: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-07-14 15:10:39,639: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.528028725817485, 'len': 0.9872908812117563, 'lcs': 0.9883058202112522}
INFO: 2024-07-14 15:10:39,641: llmtf.base.evaluator: Ended eval
INFO: 2024-07-14 15:10:39,662: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.617 0.245 0.407 0.513 0.770 0.412 0.490 0.106 0.988 0.998 0.959 0.964 0.500 0.707 0.421 0.836 0.680 0.566 0.542
INFO: 2024-08-15 15:13:13,578: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru']
INFO: 2024-08-15 15:13:13,584: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271]
INFO: 2024-08-15 15:13:13,584: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-08-15 15:13:13,783: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_ru']
INFO: 2024-08-15 15:13:13,784: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271]
INFO: 2024-08-15 15:13:13,784: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-08-15 15:13:13,787: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_en']
INFO: 2024-08-15 15:13:13,787: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271]
INFO: 2024-08-15 15:13:13,787: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-08-15 15:13:14,027: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_en']
INFO: 2024-08-15 15:13:14,027: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 198, 271]
INFO: 2024-08-15 15:13:14,027: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-08-15 15:13:17,231: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 3.45s
INFO: 2024-08-15 15:13:17,243: llmtf.base.darumeru/cp_para_en: Loading Dataset: 3.22s
INFO: 2024-08-15 15:13:17,538: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 3.95s
INFO: 2024-08-15 15:13:17,585: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.80s
INFO: 2024-08-15 15:19:34,325: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 376.74s
INFO: 2024-08-15 15:19:34,329: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-08-15 15:19:34,352: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.55679665642621, 'len': 0.9586421484801841, 'lcs': 0.9979674796747967}
INFO: 2024-08-15 15:19:34,355: llmtf.base.evaluator: Ended eval
INFO: 2024-08-15 15:19:34,393: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.617 0.245 0.407 0.513 0.770 0.412 0.490 0.106 0.988 0.998 0.959 0.964 0.500 0.707 0.421 0.836 0.680 0.566 0.542
INFO: 2024-08-15 15:19:45,060: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 387.52s
INFO: 2024-08-15 15:19:45,063: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-08-15 15:19:45,085: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.886186230509937, 'len': 0.9638393987832617, 'lcs': 0.997946611909651}
INFO: 2024-08-15 15:19:45,087: llmtf.base.evaluator: Ended eval
INFO: 2024-08-15 15:19:45,096: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.617 0.245 0.407 0.513 0.770 0.412 0.490 0.106 0.988 0.998 0.959 0.964 0.500 0.707 0.421 0.836 0.680 0.566 0.542
INFO: 2024-08-15 15:25:05,511: llmtf.base.darumeru/cp_para_en: Processing Dataset: 708.27s
INFO: 2024-08-15 15:25:05,558: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-08-15 15:25:05,628: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.5805091480824744, 'len': 0.9931879393823224, 'lcs': 1.0}
INFO: 2024-08-15 15:25:05,629: llmtf.base.evaluator: Ended eval
INFO: 2024-08-15 15:25:05,657: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.618 0.245 0.407 0.513 0.770 0.412 0.490 0.106 1.000 0.998 0.959 0.964 0.500 0.707 0.421 0.836 0.680 0.566 0.542
INFO: 2024-08-15 15:25:55,376: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 758.14s
INFO: 2024-08-15 15:25:55,378: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-08-15 15:25:55,389: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.8969006110233777, 'len': 0.9947877222260457, 'lcs': 1.0}
INFO: 2024-08-15 15:25:55,390: llmtf.base.evaluator: Ended eval
INFO: 2024-08-15 15:25:55,399: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.618 0.245 0.407 0.513 0.770 0.412 0.490 0.106 1.000 1.000 0.959 0.964 0.500 0.707 0.421 0.836 0.680 0.566 0.542