SetFit with BAAI/bge-small-en-v1.5

This is a SetFit model for text classification. It uses BAAI/bge-small-en-v1.5 as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
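
As a rough illustration of step 1, the sketch below builds the kind of labeled text pairs that contrastive fine-tuning consumes: two texts with the same label get target 1.0, two texts with different labels get target 0.0. It enumerates every pair of a tiny toy set, whereas SetFit samples pairs instead, so this is a simplification. The function name `make_contrastive_pairs` and the toy bios are hypothetical and not part of this model or the SetFit API.

```python
import random
from itertools import combinations


def make_contrastive_pairs(texts, labels, seed=42):
    """Build (text_a, text_b, target) triples.

    target is 1.0 when both texts share a label and 0.0 otherwise.
    All pairs are enumerated here; SetFit itself samples pairs.
    """
    rng = random.Random(seed)
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        target = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((texts[i], texts[j], target))
    rng.shuffle(pairs)  # mix positives and negatives before batching
    return pairs


# Toy bios and labels, invented for illustration only.
texts = ["rust engineer", "solidity dev", "seed stage fund partner", "early stage vc"]
labels = ["DEVELOPER", "DEVELOPER", "VENTURE_CAPITALIST", "VENTURE_CAPITALIST"]

pairs = make_contrastive_pairs(texts, labels)
positives = [p for p in pairs if p[2] == 1.0]
negatives = [p for p in pairs if p[2] == 0.0]
print(len(pairs), len(positives), len(negatives))
```

Step 2 then embeds each original text with the fine-tuned Sentence Transformer and fits the classification head on those embeddings.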

Model Details

Model Description

Model Sources

Model Labels

Label Examples
UNDETERMINED
  • 'Professor Emeritus of Cognitive Sciences at the University of California Irvine Research Visual perception evolutionary psychology consciousness AI Irvine CA'
  • 'Emeritus Professor of War Studies Kings College London just published Command The Politics of Military Operations from Korea to Ukraine UK Penguin US OUP '
  • 'XML apologist Erlang enthusiast Currently JVMs Performance stuff at Netflix Previously JVMs performative stuff at Twitter Hehim San Francisco California'
NFT_ARTIST
  • 'Artist Web3 Marketing Advisor Educator Making history everyday Trapped in the blockchain'
  • 'OwnYourAssets TokenGatedFile Access For CrossPlatformInteroperableGaming C5isComing CYBΞRVΞRSΞ'
  • 'Pronounced Akossya artist Zurich'
ONCHAIN_ANALYST
  • 'I write about onchain stuff fixer AleoHQ prev rabbithole_gg and plenty of DAOs youve heard of '
  • 'cofounder 3pochLabs onchain'
  • 'onchain data farcer building mosaicdrops media CryptoSapiens_ OntologyNetwork OrangeProtocol banklessDAO s0 _buildspace s4 Mosaicverse'
BUSINESS_DEVELOPER
  • 'Prev opensea TheBlock__ amazon '
  • 'Building HxroNetwork variable'
  • 'Building something old CoFounder alongsidefi '
NFT_COLLECTOR
  • 'Building glitchmarfa Collecting brightopps prev brtmoments '
  • 'My soul is a cat My two children rpcnftclub ChainFeedsxyz Bangkok'
  • 'prev OpenSea NYC'
DEVELOPER
  • 'Architect DoraHacks DoraFactory The everlasting hacker movement Menlo Park'
  • 'Engineer at Inria scikitlearn developer supported by Python and Machine Learning Between Vannes Paris France'
  • 'Working paritytech on substrate Views are my own I working mostly with rustlang nowadays '
TRADER
  • 'Applied game theorist blog occasionally at formerly not a very serious person Scott Alexander '
  • 'Crypto Trading Bitcoin class of 2013 insilicotrading COO Banana Cabana'
  • 'token maxi '
COMMUNITY_MANAGER
  • 'chutzpah controlled chaos connoisseur arbitrum chinshilling chinchillin thoughts are my own Rio de Janeiro Brazil'
  • 'commonsstack CoFounder tecmns Founding Steward KERNEL0x KB5 trustedseed tamaralens '
  • 'Community Admin at The Arbitrum Foundation Helping to scale Ethereum at Arbitrum Feed KOL Binance WEB3'
SECURITY_AUDITOR
  • 'founder adjacentfi cofounder former auditor osec_io MEV on solana '
  • 'Security Researcher Googles Threat Analysis Group 0days all day Love all things bytes assembly and glitter sheher '
  • '採用マーケ得意仮想通貨エンジニア4社1社ホワイトハッカーとして月110万達成現在歯科衛生士の妻と事業開始 実績年商1億超えのマーケ担当 開始5ヶ月で6名見学開始2年で累計DH11名見学6名採用 ハイライト要チェック ブログに今までの有益投稿をまとめました 岩手長野福岡ドバイ沖縄'
VENTURE_CAPITALIST
  • 'Liquid Crypto Brevan Howard Prev dragonfly_xyz consensys Arena'
  • 'maverick LA'
  • 'Founder of SavvyBooks Degen dcv_capital Summoner ElasticDAO metafam Judge code4rena Contributor CantoPublic Nomadic'
INVESTOR
  • 'Crypto Investor at Tephra Digital Ex Head of Research Grayscale DCGco FMR Head of Digital Asset Strategy Fundstrat New York NY'
  • 'Capital Allocators New York NY'
  • 'Director of Research Autonomous Technology Robotics ARKinvest Automation robotics energy storage alternative energy and space Disclosure New York NY'
ANGEL_INVESTOR
  • 'larp LawliettesLab angel uvocapital '
  • 'Initiator inverternetwork I Angel Investor I ex Gitcoin '
  • 'VP Head of BD AleoHQ Mainnet Launch Soon Strategic Advisor VoxiesNFT Angel Investor rcsdao ExOP ExCoinbase Professionally CuriousOpinions My Own Manhattan NY'
EXECUTIVE
  • 'Chief Strategy Marketing Officer of Liquidity Group Im also the cofounder of Hudson Rock RockHudsonRock a cybercrime intelligence company TelAviv'
  • 'CEO Polymarket Ethereum since 14 I love music and collect art new york'
  • 'CEO StartaleHQ Founder AstarNetwork All things for Web3 for billions Japanese Sota_Web3 Earth'
MARKETER
  • 'Director General en Kayum comparador de seguros insurance PPC tech crypto f1 Mexico City Mexico'
  • 'Insights about Web3 data economy and AI by oceanprotocol Currently in Marcom oceanprotocol ocean Ocean '
  • 'f加速 ethereum China internet culture history podcast growth marketing realmasknetwork prev newsbreakapp smartnews Zuzalu human Palo Alto USA'
DATA_SCIENTIST
  • 'data uniswap prev theTIEIO go bears New York NY'
  • 'engineering data science a16zcrypto '
  • 'LangChainAI previously robusthq kensho MLOps Generative AI sports analytics '
EDUCATOR
  • ' London'
  • 'MSc Immunology student Past cofounder prof director USF Center Applied Data Ethics math PhD math_rachelmastodonsocial sheher Brisbane Australia'
  • 'Here to build shared intelligence listen learn share via community tokenengineering KERNEL0x OptimismGov publicgoods education valuesmatter CyberDyn0x tauranga teikaamaui'
INFLUENCER
  • 'the destroyer Titan'
  • 'Healthy life style healthier bags Cape Town South Africa'
  • 'Beauty Brains Bitcoin Beauty in an anonymous world'
ADVISOR
  • 'A decentralized onchain governance consultant Health Wealth RunItUp The only Alpha discord youll ever need to joingametheoryweb3 squanchland Profit Land'
  • 'Design director Startup Advisor Midjourney Sharing learnings and prompts In my free time working on offscreenai Vancouver Canada'
  • 'I help fix and grow crypto portfolios through premium research and strategies 1000 members Founder cshift_io Podcast benandbergs Join 10k Crypto Investors '
BLOGGER
  • 'NOW Editor Forbes Writer Stripe HarvardBiz Back on Twitter after ignoring it for a decade I will try my best London'
  • 'larp coindesk '
  • ' '
RESEARCHER
  • 'Roblox Chief Scientist UWaterloo McGill Prof morgan3dbsky Known for NVIDIA Unity Graphics Codex Markdeep G3D Skylanders E Ink Titan Quest Williams Ontario Canada'
  • 'Simple human Simple life I am trying to do good around me Empathy creativity inspiration ArigatōMerci For ever apprenti researcher Nulle part ailleurs Nowhere'
  • 'Research community And we have our own NFT collection Telegram'
METAVERSE_ENTHUSIAST
  • 'fluent speaker of http and color virtual world evangelist game developer painter writer cj5 driver San Diego'
  • 'Blockchain Gaming Evangelist CritTheory Gaming CoFounder Earth'
  • 'We are a peeple obsessed recruiting service collective Treating everyone like a DMs checked infrequently Metaverse'
NODE_OPERATOR
  • 'into protocools and shitposting at nodeguardians '
  • ' CoFounder of onivalidator Filmmaker People Maxi Los Angeles CA'
  • 'I attest to block 247 Hobby involves the occasional block proposal Have commercial agreements with the MEV trade association Members of Sync Committees Los Angeles'
LAWYER
  • 'Law professor at Cal BerkeleyLaw Berkeley California'
  • 'IP litigator first sale doctrine respecter schedule a disrespecter wife mom to the tiny boss likes design patents needlework yarn new hampshire'
  • 'Lawyer FINTConsulting TechPolicy E4EProject upcoming GRC CybersecurityAnalyst ex InstituteGC Tweet law tech policy GRC Cybersecurity Decentralized'
DATA_ANALYST
  • 'Llama pilot at and '
  • 'blockchain data opensea kqian on Dune my views are my own dyor nfa data only wagmi open sea'
  • 'Blockchain analyst Cat and dog dad Taylor Swift fan Army veteran Pittsburgh PA'
MINER
  • 'Blockchain bitcoin mining since 2011 analyst 35 years in IT UnixNetwork engineer fpgachip design exCIO Bitfury BitfuryGroup LNSegWit taproot California USA'
  • 'Founder and CEO of Austin TX'
  • '在币圈捡矿泉水瓶子的人 0xb38544ccf295d78b7ae7b2bae5dbebdb1f09910dcrossbell Member of 33daoweb3 Metaverse'
SHITCOINER
  • 'Degen ETH and SOL lover '
  • 'VMPX mrjacklevin Draculaborg'
  • 'gripto alt notapornfolder_ '
FINANCIAL_ANALYST
  • 'Enrolled Agent Crypto Enthusiast Tax EXPERT StackingSats Chopping Tax Since 2016 NoSatoshiLeftBehind hodlmore payless crypto taxes Longmont CO'
  • 'Politico financial services editor zwarmbrodtpoliticocom zacharywarmbrodtprotonmailcom Washington DC'
  • 'Im just lookin for clues at the scene of the crime Sedona Arizona'
BUSINESS_ANALYST
  • 'Biz Analyst by day web3crypto learner by nightweekend Optimistic about Crypto FanVajpayeeji NaMo M Andreessen E Musk C Dixon Balaji S web3SF Bay Area'

Evaluation

Metrics

Label Accuracy
all 0.5565

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("kasparas12/crypto_individual_infer_model_setfit")
# Run inference
preds = model("producer business and elsewhere  on leave  views my own la gran manzana")
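
With 28 labels and the accuracy reported above, it can be useful to look at more than the single top prediction. SetFitModel provides a `predict_proba` method that returns per-class probabilities. The helper below is a hypothetical sketch of how such a probability row could be ranked; it assumes you supply the label names in the same order as the probability columns, which this card does not document. The example values are invented.

```python
def top_k_labels(probabilities, label_names, k=3):
    """Return the k most probable (label, probability) pairs, highest first."""
    ranked = sorted(
        zip(label_names, probabilities),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:k]


# Hypothetical probabilities for three labels, invented for illustration.
example_labels = ["DEVELOPER", "TRADER", "UNDETERMINED"]
example_probs = [0.2, 0.1, 0.7]

top2 = top_k_labels(example_probs, example_labels, k=2)
print(top2)
```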

Training Details

Training Set Metrics

Training set Min Median Max
Word count 1 13.3415 65
Label Training Sample Count
DEVELOPER 2111
DATA_SCIENTIST 93
DATA_ANALYST 25
NODE_OPERATOR 71
MINER 47
SECURITY_AUDITOR 352
INVESTOR 484
ANGEL_INVESTOR 160
VENTURE_CAPITALIST 941
TRADER 270
SHITCOINER 88
BUSINESS_DEVELOPER 917
BUSINESS_ANALYST 1
COMMUNITY_MANAGER 401
MARKETER 190
FINANCIAL_ANALYST 72
ADVISOR 150
RESEARCHER 691
ONCHAIN_ANALYST 45
EXECUTIVE 741
INFLUENCER 834
LAWYER 137
BLOGGER 198
NFT_COLLECTOR 335
NFT_ARTIST 598
EDUCATOR 281
METAVERSE_ENTHUSIAST 132
UNDETERMINED 2216
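
The counts above are heavily imbalanced, which is worth keeping in mind when reading the accuracy figure. The short sketch below only restates the table as a dictionary and derives the total, the largest and smallest classes, and the share of the largest class. The card does not say how the evaluation split is distributed, so the share is a property of the training set only.

```python
# Training sample counts copied from the table above.
train_counts = {
    "DEVELOPER": 2111, "DATA_SCIENTIST": 93, "DATA_ANALYST": 25,
    "NODE_OPERATOR": 71, "MINER": 47, "SECURITY_AUDITOR": 352,
    "INVESTOR": 484, "ANGEL_INVESTOR": 160, "VENTURE_CAPITALIST": 941,
    "TRADER": 270, "SHITCOINER": 88, "BUSINESS_DEVELOPER": 917,
    "BUSINESS_ANALYST": 1, "COMMUNITY_MANAGER": 401, "MARKETER": 190,
    "FINANCIAL_ANALYST": 72, "ADVISOR": 150, "RESEARCHER": 691,
    "ONCHAIN_ANALYST": 45, "EXECUTIVE": 741, "INFLUENCER": 834,
    "LAWYER": 137, "BLOGGER": 198, "NFT_COLLECTOR": 335,
    "NFT_ARTIST": 598, "EDUCATOR": 281, "METAVERSE_ENTHUSIAST": 132,
    "UNDETERMINED": 2216,
}

total = sum(train_counts.values())
largest_label, largest = max(train_counts.items(), key=lambda kv: kv[1])
smallest_label, smallest = min(train_counts.items(), key=lambda kv: kv[1])
majority_share = largest / total  # fraction of training samples in the largest class

print(f"{len(train_counts)} labels, {total} samples")
print(f"largest: {largest_label} ({largest}), smallest: {smallest_label} ({smallest})")
print(f"largest-class share: {majority_share:.4f}")
```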

Training Hyperparameters

  • batch_size: (64, 64)
  • num_epochs: (1, 1)
  • max_steps: -1
  • sampling_strategy: oversampling
  • num_iterations: 20
  • body_learning_rate: (2e-05, 1e-05)
  • head_learning_rate: 0.01
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: False
  • use_amp: False
  • warmup_proportion: 0.1
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: False
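
The hyperparameters above can be related to the step counts in the Training Results table with some arithmetic. The sketch assumes that setting num_iterations makes each training sample contribute num_iterations positive and num_iterations negative pairs, and that the first batch_size value applies to the embedding fine-tuning phase. The card states neither explicitly, but the resulting epoch fractions agree with the logged values.

```python
import math

num_samples = 12581   # sum of the Training Sample Count column
num_iterations = 20   # from the hyperparameters above
batch_size = 64       # first element of batch_size, assumed to be the embedding phase

# Assumption: num_iterations positive + num_iterations negative pairs per sample.
num_pairs = 2 * num_iterations * num_samples
steps_per_epoch = math.ceil(num_pairs / batch_size)

# Epoch fraction reached at two of the logged steps.
epoch_at_step_50 = round(50 / steps_per_epoch, 4)
epoch_at_step_7850 = round(7850 / steps_per_epoch, 4)

print(num_pairs, steps_per_epoch, epoch_at_step_50, epoch_at_step_7850)
```

Under these assumptions one epoch is 7864 steps, which matches the logged epoch values of 0.0064 at step 50 and 0.9982 at step 7850.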

Training Results

Epoch Step Training Loss Validation Loss
0.0001 1 0.2625 -
0.0064 50 0.2677 -
0.0127 100 0.2515 -
0.0191 150 0.2413 -
0.0254 200 0.2374 -
0.0318 250 0.2383 -
0.0381 300 0.222 -
0.0445 350 0.1972 -
0.0509 400 0.2268 -
0.0572 450 0.2333 -
0.0636 500 0.199 -
0.0699 550 0.2035 -
0.0763 600 0.1676 -
0.0827 650 0.1566 -
0.0890 700 0.1909 -
0.0954 750 0.189 -
0.1017 800 0.1872 -
0.1081 850 0.1576 -
0.1144 900 0.1382 -
0.1208 950 0.1603 -
0.1272 1000 0.155 -
0.1335 1050 0.1764 -
0.1399 1100 0.1506 -
0.1462 1150 0.1439 -
0.1526 1200 0.1581 -
0.1590 1250 0.1494 -
0.1653 1300 0.1622 -
0.1717 1350 0.1503 -
0.1780 1400 0.1094 -
0.1844 1450 0.1576 -
0.1907 1500 0.1194 -
0.1971 1550 0.1515 -
0.2035 1600 0.1662 -
0.2098 1650 0.1642 -
0.2162 1700 0.0943 -
0.2225 1750 0.1472 -
0.2289 1800 0.1622 -
0.2352 1850 0.0809 -
0.2416 1900 0.1623 -
0.2480 1950 0.1444 -
0.2543 2000 0.1304 -
0.2607 2050 0.1175 -
0.2670 2100 0.078 -
0.2734 2150 0.1189 -
0.2798 2200 0.141 -
0.2861 2250 0.1233 -
0.2925 2300 0.1446 -
0.2988 2350 0.1076 -
0.3052 2400 0.1016 -
0.3115 2450 0.0818 -
0.3179 2500 0.1384 -
0.3243 2550 0.1065 -
0.3306 2600 0.1029 -
0.3370 2650 0.1227 -
0.3433 2700 0.0982 -
0.3497 2750 0.0959 -
0.3561 2800 0.0851 -
0.3624 2850 0.1028 -
0.3688 2900 0.1136 -
0.3751 2950 0.1111 -
0.3815 3000 0.115 -
0.3878 3050 0.1183 -
0.3942 3100 0.0689 -
0.4006 3150 0.1004 -
0.4069 3200 0.1079 -
0.4133 3250 0.112 -
0.4196 3300 0.0758 -
0.4260 3350 0.09 -
0.4323 3400 0.1267 -
0.4387 3450 0.1024 -
0.4451 3500 0.1352 -
0.4514 3550 0.0681 -
0.4578 3600 0.0483 -
0.4641 3650 0.0937 -
0.4705 3700 0.0744 -
0.4769 3750 0.0926 -
0.4832 3800 0.0764 -
0.4896 3850 0.0814 -
0.4959 3900 0.108 -
0.5023 3950 0.0936 -
0.5086 4000 0.0687 -
0.5150 4050 0.0607 -
0.5214 4100 0.0829 -
0.5277 4150 0.0772 -
0.5341 4200 0.0309 -
0.5404 4250 0.0797 -
0.5468 4300 0.063 -
0.5532 4350 0.071 -
0.5595 4400 0.0667 -
0.5659 4450 0.121 -
0.5722 4500 0.0565 -
0.5786 4550 0.0915 -
0.5849 4600 0.0613 -
0.5913 4650 0.0479 -
0.5977 4700 0.0622 -
0.6040 4750 0.0687 -
0.6104 4800 0.0635 -
0.6167 4850 0.1233 -
0.6231 4900 0.0351 -
0.6295 4950 0.0717 -
0.6358 5000 0.0906 -
0.6422 5050 0.0712 -
0.6485 5100 0.1133 -
0.6549 5150 0.0757 -
0.6612 5200 0.0809 -
0.6676 5250 0.112 -
0.6740 5300 0.0893 -
0.6803 5350 0.0591 -
0.6867 5400 0.0872 -
0.6930 5450 0.0937 -
0.6994 5500 0.038 -
0.7057 5550 0.0793 -
0.7121 5600 0.0569 -
0.7185 5650 0.0861 -
0.7248 5700 0.1022 -
0.7312 5750 0.0759 -
0.7375 5800 0.0451 -
0.7439 5850 0.08 -
0.7503 5900 0.058 -
0.7566 5950 0.0423 -
0.7630 6000 0.043 -
0.7693 6050 0.109 -
0.7757 6100 0.072 -
0.7820 6150 0.0342 -
0.7884 6200 0.0833 -
0.7948 6250 0.0643 -
0.8011 6300 0.1069 -
0.8075 6350 0.0713 -
0.8138 6400 0.0807 -
0.8202 6450 0.0518 -
0.8266 6500 0.0796 -
0.8329 6550 0.0954 -
0.8393 6600 0.0709 -
0.8456 6650 0.0541 -
0.8520 6700 0.0503 -
0.8583 6750 0.0737 -
0.8647 6800 0.0931 -
0.8711 6850 0.0636 -
0.8774 6900 0.0579 -
0.8838 6950 0.1168 -
0.8901 7000 0.0751 -
0.8965 7050 0.0945 -
0.9028 7100 0.0396 -
0.9092 7150 0.0623 -
0.9156 7200 0.0641 -
0.9219 7250 0.0697 -
0.9283 7300 0.0675 -
0.9346 7350 0.0544 -
0.9410 7400 0.0803 -
0.9474 7450 0.0549 -
0.9537 7500 0.0612 -
0.9601 7550 0.0721 -
0.9664 7600 0.0692 -
0.9728 7650 0.07 -
0.9791 7700 0.0476 -
0.9855 7750 0.0673 -
0.9919 7800 0.0606 -
0.9982 7850 0.1001 -
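
For reading the Training Loss column: with CosineSimilarityLoss, the loss is the mean squared error between the cosine similarity of two embeddings and the pair's target (1.0 for same label, 0.0 for different labels). The sketch below reproduces that computation in plain Python on hand-made 2-D vectors. The function names and toy vectors are hypothetical; real training operates on batched tensors from the Sentence Transformer.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(pairs):
    """Mean squared error between cosine similarity and the 0/1 target."""
    errors = [(cosine_similarity(u, v) - target) ** 2 for u, v, target in pairs]
    return sum(errors) / len(errors)


# Toy embedding pairs, invented for illustration.
toy_pairs = [
    ([1.0, 0.0], [1.0, 0.0], 1.0),  # identical vectors, same label: error 0
    ([1.0, 0.0], [0.0, 1.0], 0.0),  # orthogonal vectors, different labels: error 0
    ([1.0, 0.0], [1.0, 0.0], 0.0),  # identical vectors, different labels: error 1
]

loss = cosine_similarity_loss(toy_pairs)
print(round(loss, 4))
```

A falling value therefore means same-label bios are moving closer together in embedding space and different-label bios further apart. It says nothing directly about classification accuracy.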

Framework Versions

  • Python: 3.9.16
  • SetFit: 1.0.3
  • Sentence Transformers: 2.2.2
  • Transformers: 4.21.3
  • PyTorch: 1.12.1+cu116
  • Datasets: 2.4.0
  • Tokenizers: 0.12.1

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}