|
--- |
|
language: |
|
- en |
|
tags: |
|
- roberta |
|
- marketing mix |
|
- multi-label |
|
- classification |
|
- microblog |
|
- tweets |
|
|
|
widget: |
|
- text: "Best cushioning ever!!! ๐ค๐ค๐ค my zoom vomeros are the bomb๐๐ฝโโ๏ธ๐จ!!! @nike #run #training" |
|
- text: "Why is @BestBuy always sold-out of Apple's new airpods in their online shop ๐คฏ๐ก?" |
|
- text: "Theyโre closing the @Aldo at the Lehigh Vally Mall and KOP ๐ญ" |
|
- text: "@Sonyโs XM3โs ainโt as sweet as my broโs airpod pros but got a real steal ๐ค the other day #deal #headphonez" |
|
- text: "Nike needs to sponsor more e-sports atheletes with Air Jordans! #nike #esports" |
|
- text: "Say what you want about @Abercrombie's 90s shirtless males ads, they made dang good woll sweaters back in the day. This is one of 3 I have from the late 90s." |
|
- text: "To celebrate this New Year, @Nordstrom is DOUBLING all donations up to $25,000! ๐ Your donation will help us answer 2X the calls, texts, and chats that come in, and allow us to train 2X more volunteers!" |
|
- text: "It's inspiring to see religious leaders speaking up for workers' rights and fair wages. Every voice matters in the #FightFor15! ๐ช๐ฝโ๐ผ #Solidarity #WorkersRights" |
|
--- |
|
# Model Card for: mmx_classifier_microblog_ENv02 |
|
Multi-label classifier that identifies which marketing mix variable(s) a microblog post pertains to. |
|
|
|
Version 0.2 (August 16, 2023)
|
|
|
## Model Details |
|
You can use this classifier to determine which of the 4 Ps of marketing, also known as the marketing mix variables, a microblog post (e.g., a Tweet) pertains to:
|
|
|
1. Product |
|
2. Place |
|
3. Price |
|
4. Promotion |
|
|
|
### Model Description |
|
This classifier is a fine-tuned checkpoint of [cardiffnlp/twitter-roberta-large-2022-154m](https://huggingface.co/cardiffnlp/twitter-roberta-large-2022-154m).
|
It was trained on 15K Tweets that mentioned at least one of 699 brands. The Tweets were first cleaned and then labeled using OpenAI's GPT-4.
|
|
|
Because this is a multi-label classification problem, fine-tuning uses binary cross-entropy (BCE) with logits loss, which combines a sigmoid layer and BCELoss in a single, numerically stable class.

To obtain the probability of each label (i.e., marketing mix variable), push the model's logits through a sigmoid function. The accompanying Python notebook already does this.
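
For intuition, here is a minimal, self-contained sketch (not the actual training code) of how `BCEWithLogitsLoss` scores multi-hot targets and how logits become per-label probabilities; the numbers are made up:

```python
import torch
import torch.nn as nn

# Hypothetical logits for 2 posts x 4 labels (Product, Place, Price, Promotion)
logits = torch.tensor([[2.1, -1.3, 0.4, 3.0],
                       [-0.5, 1.8, -2.0, 0.1]])
# Multi-hot targets: post 1 is Product + Promotion, post 2 is Place
targets = torch.tensor([[1., 0., 0., 1.],
                        [0., 1., 0., 0.]])

loss = nn.BCEWithLogitsLoss()(logits, targets)  # sigmoid + BCE fused for numerical stability
probs = torch.sigmoid(logits)                   # independent per-label probabilities at inference
print(loss.item(), probs)
```

Because each label gets its own sigmoid, the probabilities do not sum to one; a post can score high on several marketing mix variables at once.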
|
|
|
***IMPORTANT*** At the time of writing this description, Hugging Face's `pipeline` did not support multi-label classifiers.
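
Depending on your `transformers` version, asking the text-classification pipeline to return all labels with a sigmoid applied may work as an alternative. This is an untested sketch, not this card's official usage:

```python
from transformers import pipeline

# Sketch only: top_k=None and function_to_apply="sigmoid" are accepted by
# recent transformers releases; if yours rejects them, use the Quickstart below.
clf = pipeline(
    "text-classification",
    model="dmr76/mmx_classifier_microblog_ENv02",
    top_k=None,                   # return scores for every label
    function_to_apply="sigmoid",  # independent per-label probabilities
)
print(clf("Why is @BestBuy always sold out of Apple's new airpods?"))
```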
|
|
|
### Working Paper |
|
Download the working paper from SSRN: ["Creating Synthetic Experts with Generative AI"](https://ssrn.com/abstract=4542949)
|
|
|
### Quickstart |
|
```python
# Imports
import re
import warnings

import torch
from bs4 import BeautifulSoup
from transformers import AutoModelForSequenceClassification, AutoTokenizer

warnings.filterwarnings("ignore", category=UserWarning, module="bs4")

# Helper Functions
def clean_and_parse_tweet(tweet):
    """Mask URLs, strip HTML markup, and normalize whitespace."""
    tweet = re.sub(r"https?://\S+|www\.\S+", " URL ", tweet)  # mask URLs
    text = BeautifulSoup(tweet, "html.parser").get_text()     # strip HTML
    text = re.sub(r"\\n+|\n+", " ", text)                     # collapse (escaped) newlines
    text = re.sub(r"^[.:]+", "", text).strip()                # drop leading dots/colons
    return re.sub(r" +", " ", text)                           # squeeze repeated spaces

def predict_tweet(tweet, model, tokenizer, device, threshold=0.5):
    """Return per-label probabilities and all labels at or above the threshold."""
    inputs = tokenizer(tweet, return_tensors="pt", padding=True, truncation=True, max_length=128).to(device)
    probs = torch.sigmoid(model(**inputs).logits).detach().cpu().numpy()[0]
    labels = [id2label[i] for i, p in enumerate(probs) if p >= threshold]
    return probs, labels

# Setup
device = "mps" if torch.backends.mps.is_built() and torch.backends.mps.is_available() else "cuda" if torch.cuda.is_available() else "cpu"
synxp = "dmr76/mmx_classifier_microblog_ENv02"
model = AutoModelForSequenceClassification.from_pretrained(synxp).to(device)
tokenizer = AutoTokenizer.from_pretrained(synxp)
id2label = model.config.id2label

# ---->>> Define your Tweet <<<----
tweet = "Best cushioning ever!!! my zoom vomeros are the bomb!!! \n @nike #run #training https://randomurl.ai"

# Clean and Predict
cleaned_tweet = clean_and_parse_tweet(tweet)
probs, labels = predict_tweet(cleaned_tweet, model, tokenizer, device)

# Print Labels and Probabilities
print("Please don't forget to cite the paper: https://ssrn.com/abstract=4542949 if you use this code")
print(labels, probs)
```
|
*Conveniently predict thousands of Tweets with the* ***batch-processing Python notebook***, *available in my* [GitHub Repository](https://github.com/dringel/Synthetic-Experts)
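
If you prefer plain code over the notebook, a minimal batched variant of the Quickstart could look like this; it is my sketch, reusing the Quickstart's `model`, `tokenizer`, `device`, and `id2label` objects, not code from the repository:

```python
def predict_batch(tweets, model, tokenizer, device, threshold=0.5, batch_size=32):
    """Score a list of cleaned tweets in batches; returns one label list per tweet."""
    all_labels = []
    model.eval()
    with torch.no_grad():
        for i in range(0, len(tweets), batch_size):
            batch = tweets[i:i + batch_size]
            inputs = tokenizer(batch, return_tensors="pt", padding=True,
                               truncation=True, max_length=128).to(device)
            probs = torch.sigmoid(model(**inputs).logits).cpu().numpy()
            all_labels += [[id2label[j] for j, p in enumerate(row) if p >= threshold]
                           for row in probs]
    return all_labels
```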
|
|
|
### Citation |
|
Please cite the following reference if you use synthetic experts in your work: |
|
``` |
|
Ringel, Daniel, Creating Synthetic Experts with Generative Artificial Intelligence (July 15, 2023). Available at SSRN: https://ssrn.com/abstract=4542949 |
|
``` |
|
|
|
### Additional Resources
|
[www.synthetic-experts.ai](http://www.synthetic-experts.ai) |
|
[GitHub Repository](https://github.com/dringel/Synthetic-Experts) |
|
|
|
|
|
|
|
|