---
library_name: setfit
tags:
- setfit
- sentence-transformers
- text-classification
- generated_from_setfit_trainer
metrics:
- accuracy
widget:
- text: ' '
- text: quantitative algorithmic hustle trading dot com
- text: cryptoart since early 2020 founder of ENCODE_graphics red_heart EARTH
- text: 'Chief Legal Officer krakenfx Not your lawyer Assumptions opinions prevarications
    and predictions are mine not my employers '
- text: 'Chief of Staff at Remilia Corporation remiliacorp333 Warlord Commander at
    YAYO Corporation YayoCorp THIS IS NOT A PROMISE OF EQUITY OR OWNERSHIP IN ANYTHING '
pipeline_tag: text-classification
inference: true
base_model: BAAI/bge-small-en-v1.5
model-index:
- name: SetFit with BAAI/bge-small-en-v1.5
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Unknown
      type: unknown
      split: test
    metrics:
    - type: accuracy
      value: 0.4891640866873065
      name: Accuracy
---
# SetFit with BAAI/bge-small-en-v1.5
This is a [SetFit](https://github.com/huggingface/setfit) model for text classification. It uses [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) as the Sentence Transformer embedding model and a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance as the classification head.
The model has been trained using an efficient few-shot learning technique that involves:
1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.
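The first step relies on text pairs labelled by whether both texts share a class. The following is a minimal sketch of that pair construction, not SetFit's actual sampler: it enumerates every pair exhaustively, whereas SetFit samples pairs. The helper name `make_contrastive_pairs` is hypothetical; the example texts are taken from the label table in this card.

```python
import random
from itertools import combinations


def make_contrastive_pairs(texts, labels, seed=42):
    """Build (text_a, text_b, target) triples for contrastive fine-tuning.

    target is 1.0 when both texts carry the same label, 0.0 otherwise.
    Illustrative only: every pair is enumerated instead of being sampled.
    """
    rng = random.Random(seed)
    pairs = []
    for (i, text_a), (j, text_b) in combinations(enumerate(texts), 2):
        target = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((text_a, text_b, target))
    rng.shuffle(pairs)  # mix positive and negative pairs
    return pairs


texts = [
    "Epic Games founder and CEO ",
    "CEO compoundfinance San Francisco USA",
    "Artist etc Chicago",
]
labels = ["EXECUTIVE", "EXECUTIVE", "NFT_ARTIST"]

pairs = make_contrastive_pairs(texts, labels)
for text_a, text_b, target in pairs:
    print(f"{target:.1f}  {text_a!r}  <->  {text_b!r}")
```

In the second step, the fine-tuned embeddings of the individual texts (not the pairs) serve as features for the classification head.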
## Model Details
### Model Description
- **Model Type:** SetFit
- **Sentence Transformer body:** [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5)
- **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
- **Maximum Sequence Length:** 512 tokens
- **Number of Classes:** 27 classes
<!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->
### Model Sources
- **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
- **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
- **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
### Model Labels
| Label | Examples |
|:---------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| NFT_ARTIST | <ul><li>'Onchain music cooprecordsxyz invinmusic cooprecsmusic Los Angeles'</li><li>'Artist etc Chicago'</li><li>'Where NFTs meet DeFi on SecretNetwork Are you Legendary nfts gamefi defi SCRT LGND Secret Network'</li></ul> |
| UNDETERMINED | <ul><li>'100 readerfunded writer All works free to republish bootleg use or cp All works coauthored by Tim Foley Patreon Australia'</li><li>'Pro skater father husband videogame character CEO philanthropist public skatepark advocate Old AF and still skating San Diego world at large'</li><li>'Sandy and blue '</li></ul> |
| DEVELOPER | <ul><li>'BUIDL HODL CELR CelerNetwork BNB bnb48club BTC ETC ATOM No gods no masters only hashes Shadowy Coder trolling by day coding by night Metalhead Cosmos'</li><li>'crosspost to Farcaster Bluesky Twitter Lens Threads decentralized social in one feed iOS '</li><li>'VP of Engineering at avara space nerd my astrobin profile Opinions strictly my own '</li></ul> |
| EXECUTIVE | <ul><li>'Epic Games founder and CEO '</li><li>'CEO compoundfinance San Francisco USA'</li><li>'DeFi FounderUnited_States bornraised Bahai Los Angeles'</li></ul> |
| INFLUENCER | <ul><li>'That rug really tied the room together weekly news channel on hiatus '</li><li>'Conecto con personas de habla hispana con perfil propio dentro de bitcoin y comparto su valor preguntasbtc para respuestas en 24h lunaticoingetalbycom 2515 4D1C 6C36 C024'</li><li>'sols 学习新事物 不带偏见的去看币圈 macau'</li></ul> |
| BUSINESS_DEVELOPER | <ul><li>'Chief of Growth fuel_network Ex0xMantle Ex0xPolygon Tea connoisseur Dog Lover Web3 Degen Views are my own Metaverse'</li><li>'Experience Bitcoin like never before '</li><li>'Bitcoin Blockchain bitcoinmining Bitpay Founded the Original WomenInBitcoin etc Los Angeles '</li></ul> |
| TRADER | <ul><li>'your greater fool goblin town'</li><li>'Commander in chiefing Periodically 1 ranked trader on ByBit Zaza City'</li><li>'Ethereum Maximalist Synthetix Spartan '</li></ul> |
| ONCHAIN_ANALYST | <ul><li>'technical and onchain analyst crypto stock real estate investor global head of news beincrypto spreading alpha United States'</li><li>'Cofounder reflexivityres acq by defitechglobal Using velodata New York NY'</li><li>'Cofounder ensuser research y2z_ventures Full time onchain moron ethereum'</li></ul> |
| RESEARCHER | <ul><li>'Lets skip witty repartee discuss fundamental questions Views are mine not GMUs or Virginias Books Fairfax VA'</li><li>'research paradigm '</li><li>'Crypto Data Research 0x'</li></ul> |
| INVESTOR | <ul><li>'peer to peer electronic cash enthusiast light__nh hethey '</li><li>'GoldLover '</li><li>'I enjoy business innovation lifelong learning to ChangeTheWorld to help others Entrepreneur interim CxO investor adventurer thinker doer NO DMs Global citizen'</li></ul> |
| SECURITY_AUDITOR | <ul><li>'security researcher nascentsecurity EVM Enthusiast Gas Optimizoor Puzzle Cracker Fan of all things Static Analysis Fuzzing Symbolic Execution '</li><li>'think bad do good cofounder openpathsec los angeles'</li><li>'Head of GTM CyfrinAudits Ex Lead Dev Rel AlchemyPlatform Created cyfrinupdraft and AlchemyLearn Making web3 mainstream Ethereum'</li></ul> |
| EDUCATOR | <ul><li>'Professor of Practice at Harvard Teaches Ec 10 some tweets might be educational Also Senior Fellow PIIE Was Chair of President Obamas CEA Cambridge MA'</li><li>'Jarrête des carrières Je vulgarise et décortique à vos côtés les nouvelles tokenomics et les influvoleurs crypto de notre époque Bitcoin'</li><li>' Bretton Woods NH'</li></ul> |
| LAWYER | <ul><li>'Author of Digital Money Demystified DickinsonLaw AdvantageEvans AtTechIntersect Crypto IP Law As seen on Coindesk TV Yahoo Finance Bloomberg CNBC Nomad Team '</li><li>'UVa Vanderbilt Law Your guide to other worlds Crypto l Metaverse Web3 Not legal or financial advice I am A lawyerjust not YOUR lawyer USA'</li><li>'The Crypto Lawyer Юрист Rechtsanwältin محامية Advising Entrepreneurs Investors and Governments on Bitcoin Crypto since 2016 Contributor Forbes UAE Switzerland '</li></ul> |
| ADVISOR | <ul><li>'Director of Government Relations at BlockchainAssn Author of the Token Taxonomy Act Former WarrenDavidson and Board of Advisors JoinSeedstarter Washington DC'</li><li>'Calculated Degen 2x cancer survivor 5x rug pull survivor Paper hands diamond wrist Building WumboLabs Advisoooor arcade_xyz '</li><li>'Doggfather Analytics Founderorange_square OrdData '</li></ul> |
| COMMUNITY_MANAGER | <ul><li>'Contributor to the Optimism Collective OP'</li><li>'ecosystem growth indexcoop music NFT enjoyer wavWRLD_ not financial advice typos are my owm '</li><li>'Founder KryptoSeoul ericaplanet Ericaverse Organizer buidl_asia eth_seoul_ Seoul Chapter Lead She__Fi Alum Stanford Ewha Where is Erica'</li></ul> |
| MARKETER | <ul><li>'Positivity Pusher CoFounder PurpleHorizons Future Tech Marketing Strategist Trend Spotter Storyteller Once a DJ Always a DJ Miami FL'</li><li>'elissa emm Head of Marketing at spruceid building decentralized identity Seattle WA'</li><li>'Marketing Superfluid_HQ Safaryclub solhotgirlclub She__Fi Cohort 9 Words in banklesshq PFP miladymaker 104 NFA Views are my own Brooklyn NY'</li></ul> |
| ANGEL_INVESTOR | <ul><li>'Developer entrepreneur angel investor crypto enthusiast '</li><li>'larp LawliettesLab angel uvocapital '</li><li>'cofounder jokerace_io ecodao_ write on web3 angel thecowfund berlin'</li></ul> |
| VENTURE_CAPITALIST | <ul><li>'visionary at core playful at the surface just launched GetCohosts WalkinEvents prev fabric_vc nothingnyc London UK New York USA'</li><li>'Crypto web3 Partner ColliderVC Standing on the shoulders of giants World State'</li><li>'startup investor and builder founder w_conviction before GP greylockVC accelerating AI adoption tech podcastchains'</li></ul> |
| NFT_COLLECTOR | <ul><li>'FINE BITCOIN GOODS Get in THE BANTER Scarce City'</li><li>'Cofounder RKOTax omega based spicymargeth '</li><li>'Like a shadow following the light Time is actually another dimension Nikennftyeth Niken32lens bcard id 275 Multiverse of Madness'</li></ul> |
| BLOGGER | <ul><li>'Reporter at Bloomberg business covering crypto blockchain companies Formerly CoinDesk DM open Opinions are my own New York USA'</li><li>'viamirror '</li><li>'senior writer NFT lead BanklessHQ '</li></ul> |
| METAVERSE_ENTHUSIAST | <ul><li>'SMOL by Treasure_DAO Smolverse'</li><li>'Epic SciFi MMO strategy game from Pixelmatic ExordiumHQ Take command of a fleet of spaceships and fight for humanity NOW Sol System'</li><li>'Time to post tweets and save lives Creative Director PlayShadowWar Where dreams come true'</li></ul> |
| FINANCIAL_ANALYST | <ul><li>'Editor of FTAlphaville Norwegian despite the Harry Potteresque name Author of TRILLIONS Views mine bla bla Oslo Norway'</li><li>'markets macro business anchor of 10am ET and ETF IQ on bloombergtv haverfordedu columbiajourn alum ktkaos on InstaThreads Opinions mine Midtown East Manhattan'</li><li>'Curious on how behavioral fallacies challenge financial markets and cryptos Always learning new things getting to know new people and having a bit of fun London England'</li></ul> |
| DATA_SCIENTIST | <ul><li>' '</li><li>'Data Wizardry variantfund Chicago IL'</li><li>'NLP ML StatArb Math Bowdoincollege Team Doobro_CN Prev first intern Bybit_Official Plucking a feather from every goose but follow no one absolute New York NY'</li></ul> |
| NODE_OPERATOR | <ul><li>'Founder ClayStack_HQ Building Liquid Staking long before they were called LSDs Running validator nodes at Vibing ClayClanDAO Metaverse'</li><li>'Restake ETH Never Worry about EigenLayer Caps EigenLayer'</li><li>'HonigdachsPod cohost Making Bitcoin green today netposmon Find me on nostr npub1cear2n95zcyze86s5hry2a0pdgs7euhnc0p7ewcq2284pp845t5szt8rhr '</li></ul> |
| SHITCOINER | <ul><li>'Eternity belongs to those who live in the present I tweet once per week when Im pooping Results in occasional shitposting '</li><li>'Lets hold hands and be enemies enemieswithbenefitseth '</li><li>'16th Chair of the Central Bank of Retards When I see chaos forming on the timeline I rush in to shitpost adding fuel to the fire Hyperbolic Time Chamber'</li></ul> |
| MINER | <ul><li>' bitcoin beyonder economic futurist metagame winner rose'</li><li>'Steady lads deploying more hashrate Hashrate merchant luxortechnology btc Miami'</li><li>'SVP foundryservices I am a miner like my father before me previously greenidge_GREE Bitcoin '</li></ul> |
| DATA_ANALYST | <ul><li>'Director of Research at proof_xyz Building charts that make NFTs a bit easier to understand '</li><li>'Shadowy mediocre Analyst tangent_xyz '</li><li>'Lead Analyst CryptoSlate Previously Saidler Bitcoin London'</li></ul> |
## Evaluation
### Metrics
| Label | Accuracy |
|:--------|:---------|
| **all** | 0.4892 |
## Uses
### Direct Use for Inference
First install the SetFit library:
```bash
pip install setfit
```
Then you can load this model and run inference.
```python
from setfit import SetFitModel
# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("kasparas12/crypto_individual_infer_model_setfit")
# Run inference on a single profile description
preds = model("quantitative algorithmic hustle trading dot com")
```
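Because overall accuracy is modest, it may be more useful to inspect several candidate labels than only the top prediction. The sketch below assumes per-class probabilities are already available as a plain list (for example from the model's `predict_proba` method, with the column order matching the classification head's classes, which is an assumption to verify). The helper `top_k_labels` and the probability values are hypothetical.

```python
def top_k_labels(probabilities, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probabilities), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Made-up probabilities for illustration; real values would come from the model.
example_labels = ["DEVELOPER", "TRADER", "INVESTOR", "UNDETERMINED"]
example_probs = [0.10, 0.45, 0.30, 0.15]

top2 = top_k_labels(example_probs, example_labels, k=2)
print(top2)
```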
<!--
### Downstream Use
*List how someone could finetune this model on their own dataset.*
-->
<!--
### Out-of-Scope Use
*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->
<!--
## Bias, Risks and Limitations
*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->
<!--
### Recommendations
*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->
## Training Details
### Training Set Metrics
| Training set | Min | Median | Max |
|:-------------|:----|:--------|:----|
| Word count | 2 | 13.6494 | 65 |
| Label | Training Sample Count |
|:---------------------|:----------------------|
| DEVELOPER | 702 |
| DATA_SCIENTIST | 34 |
| DATA_ANALYST | 8 |
| NODE_OPERATOR | 18 |
| MINER | 22 |
| SECURITY_AUDITOR | 129 |
| INVESTOR | 212 |
| ANGEL_INVESTOR | 84 |
| VENTURE_CAPITALIST | 467 |
| TRADER | 168 |
| SHITCOINER | 34 |
| BUSINESS_DEVELOPER | 306 |
| BUSINESS_ANALYST | 0 |
| COMMUNITY_MANAGER | 122 |
| MARKETER | 70 |
| FINANCIAL_ANALYST | 32 |
| ADVISOR | 79 |
| RESEARCHER | 227 |
| ONCHAIN_ANALYST | 29 |
| EXECUTIVE | 393 |
| INFLUENCER | 510 |
| LAWYER | 47 |
| BLOGGER | 55 |
| NFT_COLLECTOR | 174 |
| NFT_ARTIST | 312 |
| EDUCATOR | 134 |
| METAVERSE_ENTHUSIAST | 57 |
| UNDETERMINED | 740 |
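The counts above are heavily imbalanced. As a rough reference point, the sketch below totals them and computes the share of the largest class, which is the accuracy a majority-class predictor would reach on data distributed like the training set. Whether the test split follows the same distribution is not stated in this card, so treat the comparison as indicative only.

```python
# Training sample counts copied from the table above.
train_counts = {
    "DEVELOPER": 702, "DATA_SCIENTIST": 34, "DATA_ANALYST": 8,
    "NODE_OPERATOR": 18, "MINER": 22, "SECURITY_AUDITOR": 129,
    "INVESTOR": 212, "ANGEL_INVESTOR": 84, "VENTURE_CAPITALIST": 467,
    "TRADER": 168, "SHITCOINER": 34, "BUSINESS_DEVELOPER": 306,
    "BUSINESS_ANALYST": 0, "COMMUNITY_MANAGER": 122, "MARKETER": 70,
    "FINANCIAL_ANALYST": 32, "ADVISOR": 79, "RESEARCHER": 227,
    "ONCHAIN_ANALYST": 29, "EXECUTIVE": 393, "INFLUENCER": 510,
    "LAWYER": 47, "BLOGGER": 55, "NFT_COLLECTOR": 174,
    "NFT_ARTIST": 312, "EDUCATOR": 134, "METAVERSE_ENTHUSIAST": 57,
    "UNDETERMINED": 740,
}

total = sum(train_counts.values())
majority_label, majority_count = max(train_counts.items(), key=lambda kv: kv[1])
majority_share = majority_count / total

# Labels that actually have training data (BUSINESS_ANALYST has none).
populated = {label: n for label, n in train_counts.items() if n > 0}
smallest_label, smallest_count = min(populated.items(), key=lambda kv: kv[1])

print(f"total samples: {total}")
print(f"largest class: {majority_label} ({majority_share:.4f} of training data)")
print(f"smallest populated class: {smallest_label} ({smallest_count} samples)")
```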
### Training Hyperparameters
- batch_size: (64, 64)
- num_epochs: (1, 1)
- max_steps: -1
- sampling_strategy: oversampling
- num_iterations: 20
- body_learning_rate: (2e-05, 1e-05)
- head_learning_rate: 0.01
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: False
- use_amp: False
- warmup_proportion: 0.1
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: False
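The `loss: CosineSimilarityLoss` setting means the embedding body is trained so that the cosine similarity of a text pair approaches its target (1 for the same class, 0 for different classes), measured with mean squared error. The following is a plain-Python sketch of that computation on toy 2-D vectors; the function names and vectors are illustrative and this is not the Sentence Transformers implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(pairs):
    """Mean squared error between pair similarity and its target (1.0 or 0.0)."""
    errors = [(cosine_similarity(u, v) - target) ** 2 for u, v, target in pairs]
    return sum(errors) / len(errors)


# Toy embeddings: the third pair is a same-class pair that is still orthogonal,
# so it is the only one contributing to the loss.
toy_pairs = [
    ([1.0, 0.0], [1.0, 0.0], 1.0),  # same class, identical direction
    ([1.0, 0.0], [0.0, 1.0], 0.0),  # different class, orthogonal
    ([1.0, 0.0], [0.0, 1.0], 1.0),  # same class, but orthogonal
]

loss = cosine_similarity_loss(toy_pairs)
print(f"loss = {loss:.4f}")
```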
### Training Results
| Epoch | Step | Training Loss | Validation Loss |
|:------:|:----:|:-------------:|:---------------:|
| 0.0011 | 1 | 0.2537 | - |
| 0.0562 | 50 | 0.2412 | - |
| 0.1124 | 100 | 0.2242 | - |
| 0.1685 | 150 | 0.2066 | - |
| 0.2247 | 200 | 0.1811 | - |
| 0.2809 | 250 | 0.205 | - |
| 0.3371 | 300 | 0.1789 | - |
| 0.3933 | 350 | 0.1831 | - |
| 0.4494 | 400 | 0.1829 | - |
| 0.5056 | 450 | 0.1506 | - |
| 0.5618 | 500 | 0.1474 | - |
| 0.6180 | 550 | 0.0989 | - |
| 0.6742 | 600 | 0.1094 | - |
| 0.7303 | 650 | 0.1316 | - |
| 0.7865 | 700 | 0.1207 | - |
| 0.8427 | 750 | 0.1262 | - |
| 0.8989 | 800 | 0.1229 | - |
| 0.9551 | 850 | 0.0989 | - |
| 0.0003 | 1 | 0.2061 | - |
| 0.0155 | 50 | 0.2073 | - |
| 0.0310 | 100 | 0.1844 | - |
| 0.0465 | 150 | 0.1891 | - |
| 0.0619 | 200 | 0.1975 | - |
| 0.0774 | 250 | 0.1772 | - |
| 0.0929 | 300 | 0.2304 | - |
| 0.1084 | 350 | 0.2085 | - |
| 0.1239 | 400 | 0.1851 | - |
| 0.1394 | 450 | 0.1463 | - |
| 0.1548 | 500 | 0.1216 | - |
| 0.1703 | 550 | 0.1648 | - |
| 0.1858 | 600 | 0.1359 | - |
| 0.2013 | 650 | 0.163 | - |
| 0.2168 | 700 | 0.1563 | - |
| 0.2323 | 750 | 0.2 | - |
| 0.2478 | 800 | 0.1425 | - |
| 0.2632 | 850 | 0.1614 | - |
| 0.2787 | 900 | 0.1881 | - |
| 0.2942 | 950 | 0.133 | - |
| 0.3097 | 1000 | 0.1348 | - |
| 0.3252 | 1050 | 0.1256 | - |
| 0.3407 | 1100 | 0.1065 | - |
| 0.3561 | 1150 | 0.0932 | - |
| 0.3716 | 1200 | 0.122 | - |
| 0.3871 | 1250 | 0.0969 | - |
| 0.4026 | 1300 | 0.1386 | - |
| 0.4181 | 1350 | 0.1116 | - |
| 0.4336 | 1400 | 0.0866 | - |
| 0.4491 | 1450 | 0.084 | - |
| 0.4645 | 1500 | 0.1073 | - |
| 0.4800 | 1550 | 0.1065 | - |
| 0.4955 | 1600 | 0.1063 | - |
| 0.5110 | 1650 | 0.1235 | - |
| 0.5265 | 1700 | 0.0918 | - |
| 0.5420 | 1750 | 0.078 | - |
| 0.5574 | 1800 | 0.1358 | - |
| 0.5729 | 1850 | 0.0664 | - |
| 0.5884 | 1900 | 0.1123 | - |
| 0.6039 | 1950 | 0.0996 | - |
| 0.6194 | 2000 | 0.0471 | - |
| 0.6349 | 2050 | 0.1068 | - |
| 0.6504 | 2100 | 0.0933 | - |
| 0.6658 | 2150 | 0.0836 | - |
| 0.6813 | 2200 | 0.0858 | - |
| 0.6968 | 2250 | 0.0421 | - |
| 0.7123 | 2300 | 0.08 | - |
| 0.7278 | 2350 | 0.0902 | - |
| 0.7433 | 2400 | 0.0949 | - |
| 0.7587 | 2450 | 0.116 | - |
| 0.7742 | 2500 | 0.0733 | - |
| 0.7897 | 2550 | 0.101 | - |
| 0.8052 | 2600 | 0.0709 | - |
| 0.8207 | 2650 | 0.079 | - |
| 0.8362 | 2700 | 0.0706 | - |
| 0.8517 | 2750 | 0.0338 | - |
| 0.8671 | 2800 | 0.0812 | - |
| 0.8826 | 2850 | 0.063 | - |
| 0.8981 | 2900 | 0.075 | - |
| 0.9136 | 2950 | 0.081 | - |
| 0.9291 | 3000 | 0.1264 | - |
| 0.9446 | 3050 | 0.0766 | - |
| 0.9600 | 3100 | 0.0873 | - |
| 0.9755 | 3150 | 0.0512 | - |
| 0.9910 | 3200 | 0.0816 | - |
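The Epoch and Step columns can be cross-checked against the hyperparameters. The sketch below assumes that SetFit draws `num_iterations` positive and `num_iterations` negative pairs per training sample, and that the rows where the step counter restarts at 1 correspond to a run on all 5,165 training samples listed under Training Set Metrics. Both points are assumptions rather than statements from this card; the function name is hypothetical.

```python
import math


def contrastive_steps_per_epoch(n_samples, num_iterations, batch_size):
    """Estimate optimizer steps per epoch of contrastive fine-tuning.

    Assumes num_iterations positive and num_iterations negative pairs
    are generated for every training sample.
    """
    n_pairs = 2 * num_iterations * n_samples
    return math.ceil(n_pairs / batch_size)


steps = contrastive_steps_per_epoch(n_samples=5165, num_iterations=20, batch_size=64)
epoch_at_step_50 = round(50 / steps, 4)
epoch_at_step_3200 = round(3200 / steps, 4)

print(f"steps per epoch: {steps}")
print(f"epoch at step 50: {epoch_at_step_50}")
print(f"epoch at step 3200: {epoch_at_step_3200}")
```

Under these assumptions the estimate reproduces the logged epoch fractions for steps 50 and 3200 in the second block of rows.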
### Framework Versions
- Python: 3.9.16
- SetFit: 1.0.3
- Sentence Transformers: 2.2.2
- Transformers: 4.21.3
- PyTorch: 1.12.1+cu116
- Datasets: 2.4.0
- Tokenizers: 0.12.1
## Citation
### BibTeX
```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
doi = {10.48550/ARXIV.2209.11055},
url = {https://arxiv.org/abs/2209.11055},
author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {Efficient Few-Shot Learning Without Prompts},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}
```
<!--
## Glossary
*Clearly define terms in order to be accessible across audiences.*
-->
<!--
## Model Card Authors
*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->
<!--
## Model Card Contact
*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->