Hi Team,
This is a report from Giskard Bot Scan 🐢.
We have identified 8 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tweet_eval (subset irony, split test).
You can find the full version of the scan report here.
👉Robustness issues (5)
When the feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 21.74% of the cases. We expected the predictions not to be affected by this transformation.
| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| major 🔴 | Fail rate = 0.217 | Transform to uppercase | 170/782 tested samples (21.74%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1 | Just walked in to #Starbucks and asked for a "tall blonde" Hahahaha #irony | JUST WALKED IN TO #STARBUCKS AND ASKED FOR A "TALL BLONDE" HAHAHAHA #IRONY | irony (p = 0.65) | non_irony (p = 0.78) |
| 9 | People who tell people with anxiety to "just stop worrying about it" are my favorite kind of people #not #educateyourself | PEOPLE WHO TELL PEOPLE WITH ANXIETY TO "JUST STOP WORRYING ABOUT IT" ARE MY FAVORITE KIND OF PEOPLE #NOT #EDUCATEYOURSELF | irony (p = 0.87) | non_irony (p = 0.51) |
| 10 | Most important thing I've learned in school | MOST IMPORTANT THING I'VE LEARNED IN SCHOOL | irony (p = 0.91) | non_irony (p = 0.71) |
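The fail rate above is the share of tested samples whose predicted label changes after the transformation (170/782 ≈ 0.2174). Below is a minimal sketch of such a metamorphic invariance check. It assumes a `predict` callable that maps a string to a label. The denominators differ across sections (782 here, 717 for typos), which suggests that samples the transformation leaves unchanged are not counted, and the sketch assumes this too. `invariance_fail_rate` and the case-sensitive `toy_predict` are hypothetical and are not Giskard's API.

```python
from typing import Callable, Sequence, Tuple


def invariance_fail_rate(
    texts: Sequence[str],
    predict: Callable[[str], str],
    transform: Callable[[str], str],
) -> Tuple[int, int, float]:
    """Return (changed, tested, fail_rate) for an invariance test.

    Assumption: samples that the transformation leaves unchanged are skipped,
    so `tested` can be smaller than len(texts).
    """
    changed = tested = 0
    for text in texts:
        perturbed = transform(text)
        if perturbed == text:
            continue  # nothing was perturbed, so there is nothing to test
        tested += 1
        if predict(perturbed) != predict(text):
            changed += 1
    rate = changed / tested if tested else 0.0
    return changed, tested, rate


def toy_predict(text: str) -> str:
    # Hypothetical, deliberately case-sensitive classifier keyed on hashtags.
    return "irony" if "#irony" in text or "#not" in text else "non_irony"


texts = [
    'Just walked in to #Starbucks and asked for a "tall blonde" Hahahaha #irony',
    "Most important thing I've learned in school",
    "LOUD TWEET",  # already uppercase, so it is skipped
]
print(invariance_fail_rate(texts, toy_predict, str.upper))  # → (1, 2, 0.5)
```

Passing `str.title`, `str.lower` or any other string-to-string function as `transform` gives the same check for the other perturbations in this report.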
When the feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 14.43% of the cases. We expected the predictions not to be affected by this transformation.
| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| major 🔴 | Fail rate = 0.144 | Transform to title case | 113/783 tested samples (14.43%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1 | Just walked in to #Starbucks and asked for a "tall blonde" Hahahaha #irony | Just Walked In To #Starbucks And Asked For A "Tall Blonde" Hahahaha #Irony | irony (p = 0.65) | non_irony (p = 0.54) |
| 21 | The definition of #IRONY would be if a 77-year-old rapper went #viral and took #BITCOIN mainstream. Maybe only way #babyboomers will buy in. | The Definition Of #Irony Would Be If A 77-Year-Old Rapper Went #Viral And Took #Bitcoin Mainstream. Maybe Only Way #Babyboomers Will Buy In. | irony (p = 0.82) | non_irony (p = 0.58) |
| 22 | Pretty excited about how you gave up on me. File Under: #sarcasm | Pretty Excited About How You Gave Up On Me. File Under: #Sarcasm | irony (p = 0.51) | non_irony (p = 0.80) |
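Python's built-in `str.title()` reproduces all three perturbed texts above. That includes the changes to the hashtags the model may rely on: `#irony` and `#IRONY` both become `#Irony`, and `77-year-old` becomes `77-Year-Old`. The check below only confirms the transformation's visible behaviour on these rows. Whether Giskard implements it with `str.title()` is an assumption.

```python
# Original texts and their perturbed versions, copied from the table above.
pairs = [
    ('Just walked in to #Starbucks and asked for a "tall blonde" Hahahaha #irony',
     'Just Walked In To #Starbucks And Asked For A "Tall Blonde" Hahahaha #Irony'),
    ("The definition of #IRONY would be if a 77-year-old rapper went #viral and took "
     "#BITCOIN mainstream. Maybe only way #babyboomers will buy in.",
     "The Definition Of #Irony Would Be If A 77-Year-Old Rapper Went #Viral And Took "
     "#Bitcoin Mainstream. Maybe Only Way #Babyboomers Will Buy In."),
    ("Pretty excited about how you gave up on me. File Under: #sarcasm",
     "Pretty Excited About How You Gave Up On Me. File Under: #Sarcasm"),
]

for original, expected in pairs:
    # str.title() upper-cases the first letter after any non-letter and
    # lower-cases the rest, so "#IRONY" -> "#Irony" and "77-year" -> "77-Year".
    assert original.title() == expected
print("str.title() reproduces all three examples")
```

One caveat: `str.title()` also treats an apostrophe as a word boundary, so `I've` would become `I'Ve`. This is another surface change an irony classifier could react to.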
When the feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 13.25% of the cases. We expected the predictions not to be affected by this transformation.
| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| major 🔴 | Fail rate = 0.132 | Add typos | 95/717 tested samples (13.25%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
| | text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 3 | @user He is exactly that sort of person. Weirdo! | @user He is exatvly that sort of person. Weirdo! | non_irony (p = 0.79) | irony (p = 0.81) |
| 22 | Pretty excited about how you gave up on me. File Under: #sarcasm | Pretty xecite dabout how you gave up kn me. File Under: #sarcasm | irony (p = 0.51) | non_irony (p = 0.78) |
| 27 | How dare Charles Barkley have an intelligent conversation about race. #sarcasm #CharlesBarkley | How dare Charles Barkley have an intelilgent onvsersation about race. #wrcasm #CarlesBarkley | non_irony (p = 0.64) | irony (p = 0.69) |
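The perturbed texts above mix three kinds of typo:

- adjacent-character swaps (`excited about` → `xecite dabout`)
- keyboard-neighbour substitutions (`on` → `kn`)
- deletions (`#CharlesBarkley` → `#CarlesBarkley`)

Below is a minimal, seeded sketch of such a typo generator. The operation mix, the `rate` parameter and the partial QWERTY neighbour map are illustrative assumptions, not Giskard's implementation.

```python
import random

# Partial QWERTY neighbour map (illustrative assumption, not exhaustive).
KEYBOARD_NEIGHBOURS = {
    "a": "qwsz", "c": "xdfv", "e": "wsdr", "i": "ujko", "n": "bhjm",
    "o": "iklp", "r": "edft", "s": "awedxz", "t": "rfgy",
}


def add_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Inject random typos into `text` (a sketch, not Giskard's transformation).

    Each letter visited is hit with probability `rate`; a hit applies one of:
    swap with the next character, delete it, or replace it with a neighbouring key.
    """
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isalpha() and rng.random() < rate:
            op = rng.choice(("swap", "delete", "neighbour"))
            if op == "swap" and i + 1 < len(text):
                out.append(text[i + 1])
                out.append(ch)
                i += 2
                continue
            if op == "delete":
                i += 1
                continue
            if op == "neighbour" and ch.lower() in KEYBOARD_NEIGHBOURS:
                out.append(rng.choice(KEYBOARD_NEIGHBOURS[ch.lower()]))
                i += 1
                continue
        # No typo (or the chosen operation was not applicable): keep the character.
        out.append(ch)
        i += 1
    return "".join(out)


tweet = "How dare Charles Barkley have an intelligent conversation about race."
print(add_typos(tweet, rate=0.1, seed=42))
```

Fixing the seed makes every perturbed sample reproducible. That matters when a prediction flip has to be re-examined by hand.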
When the feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 9.31% of the cases. We expected the predictions not to be affected by this transformation.
| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| medium 🟡 | Fail rate = 0.093 | Punctuation Removal | 59/634 tested samples (9.31%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
| | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 23 | Who told the #hipsters that #irony was a thing of the Clinton years? Do they not carry history books in used bookstores in #brooklyn ? | Who told the #hipsters that #irony was a thing of the Clinton years Do they not carry history books in used bookstores in #brooklyn | non_irony (p = 0.65) | irony (p = 0.94) |
| 39 | On the train and surrounded by posh people, I'm so at home! #not #stickoutlikeasorethumb | On the train and surrounded by posh people I m so at home #not #stickoutlikeasorethumb | irony (p = 0.66) | non_irony (p = 0.81) |
| 40 | Stupid #doctors visits is gonna bury me!! Now that's #irony | Stupid #doctors visits is gonna bury me Now that s #irony | irony (p = 0.55) | non_irony (p = 0.77) |
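The examples suggest that each punctuation mark is replaced by a space and runs of whitespace are then collapsed (`I'm` → `I m`, `that's` → `that s`). `#` is kept, so hashtags survive. The sketch below reproduces the three rows above under those assumptions. Keeping `@` as well, for `@user` mentions, is a further assumption, and Giskard's own character set may differ.

```python
import re
import string

# Punctuation to strip: everything in string.punctuation except '#' and '@'
# (kept so hashtags and @user mentions survive; an assumption).
_PUNCT = "".join(ch for ch in string.punctuation if ch not in "#@")
_PUNCT_RE = re.compile("[" + re.escape(_PUNCT) + "]")


def remove_punctuation(text: str) -> str:
    # Replace each punctuation mark with a space, then collapse whitespace runs
    # (note: this also turns line breaks into single spaces).
    return " ".join(_PUNCT_RE.sub(" ", text).split())


# Rows copied from the table above: original -> perturbed.
examples = {
    "Who told the #hipsters that #irony was a thing of the Clinton years? "
    "Do they not carry history books in used bookstores in #brooklyn ?":
        "Who told the #hipsters that #irony was a thing of the Clinton years "
        "Do they not carry history books in used bookstores in #brooklyn",
    "On the train and surrounded by posh people, I'm so at home! #not #stickoutlikeasorethumb":
        "On the train and surrounded by posh people I m so at home #not #stickoutlikeasorethumb",
    "Stupid #doctors visits is gonna bury me!! Now that's #irony":
        "Stupid #doctors visits is gonna bury me Now that s #irony",
}

for original, expected in examples.items():
    assert remove_punctuation(original) == expected
print("all three rows reproduced")
```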
When the feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 5.47% of the cases. We expected the predictions not to be affected by this transformation.
| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| medium 🟡 | Fail rate = 0.055 | Transform to lowercase | 39/713 tested samples (5.47%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 19 | @user Guess they didn't get the memo reg non-nuclear Baltic sea #sarcasm | @user guess they didn't get the memo reg non-nuclear baltic sea #sarcasm | non_irony (p = 0.51) | irony (p = 0.60) |
| 22 | Pretty excited about how you gave up on me. File Under: #sarcasm | pretty excited about how you gave up on me. file under: #sarcasm | irony (p = 0.51) | non_irony (p = 0.54) |
| 30 | Nooooooooooo again it's on!!! #PickANewSong #CantStandIt | nooooooooooo again it's on!!! #pickanewsong #cantstandit | non_irony (p = 0.73) | irony (p = 0.75) |
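Each fail rate in this section is the number of changed predictions divided by the number of tested samples. The snippet below recomputes all five from the counts reported in the tables above as a quick consistency check on the report.

```python
# (changed, tested, reported fail rate) copied from the five robustness tables above.
REPORTED = {
    "Transform to uppercase": (170, 782, 0.2174),
    "Transform to title case": (113, 783, 0.1443),
    "Add typos": (95, 717, 0.1325),
    "Punctuation Removal": (59, 634, 0.0931),
    "Transform to lowercase": (39, 713, 0.0547),
}

for name, (changed, tested, reported) in REPORTED.items():
    rate = changed / tested
    # Percentages are reported to two decimals, i.e. four decimals as a fraction.
    assert round(rate, 4) == reported, name
    print(f"{name:<25} {changed:>3}/{tested} = {rate:.2%}")
```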
👉Performance issues (3)
For records in the dataset where text contains "user", the Recall is 34.98% lower than the global Recall.
| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | text contains "user" | Recall = 0.366 | -34.98% relative to global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | text | label | Predicted label |
|---|---|---|---|
| 19 | @user Guess they didn't get the memo reg non-nuclear Baltic sea #sarcasm | irony | non_irony (p = 0.51) |
| 25 | @user hmm... let me think about that #sarcasm | irony | non_irony (p = 0.91) |
| 47 | @user 180 dead on 26/11 n more than 10k our ppl killed in terror attacks till date but not 1 paki show sympathy 2 them #irony | irony | non_irony (p = 0.71) |
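The deviation appears to be a relative difference, (slice metric − global metric) / global metric. Under that assumption, the sketch below computes recall on a "contains user" slice and compares it with global recall. Treating `irony` as the positive class is also an assumption, since the report does not say which class recall refers to. The texts, labels and predictions are toy values, not tweet_eval data.

```python
from typing import Sequence


def recall(labels: Sequence[str], preds: Sequence[str], positive: str = "irony") -> float:
    """Recall for one class, TP / (TP + FN); 'irony' as positive is an assumption."""
    tp = sum(1 for y, p in zip(labels, preds) if y == positive and p == positive)
    fn = sum(1 for y, p in zip(labels, preds) if y == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


def relative_deviation(slice_metric: float, global_metric: float) -> float:
    """Relative change of a slice metric with respect to the global metric."""
    return (slice_metric - global_metric) / global_metric


# Toy data for illustration only.
texts = ["@user so great #not", "love mondays #sarcasm", "@user nice day", "best day ever"]
labels = ["irony", "irony", "irony", "irony"]
preds = ["non_irony", "irony", "irony", "irony"]

mask = ["user" in t for t in texts]
slice_labels = [y for y, m in zip(labels, mask) if m]
slice_preds = [p for p, m in zip(preds, mask) if m]

global_recall = recall(labels, preds)             # 3/4 = 0.75
slice_recall = recall(slice_labels, slice_preds)  # 1/2 = 0.5
print(f"{relative_deviation(slice_recall, global_recall):+.2%}")  # → -33.33%
```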
For records in the dataset where text contains "irony", the Accuracy is 27.73% lower than the global Accuracy.
| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | text contains "irony" | Accuracy = 0.531 | -27.73% relative to global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | text | label | Predicted label |
|---|---|---|---|
| 23 | Who told the #hipsters that #irony was a thing of the Clinton years? Do they not carry history books in used bookstores in #brooklyn ? | irony | non_irony (p = 0.65) |
| 47 | @user 180 dead on 26/11 n more than 10k our ppl killed in terror attacks till date but not 1 paki show sympathy 2 them #irony | irony | non_irony (p = 0.71) |
| 65 | #Irony RT @user If you're going to give someone a scathing, 1-Star review for poor grammar, FFS use proper grammar. | irony | non_irony (p = 0.71) |
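Row 65 contains only the capitalised `#Irony`, yet it falls in the "irony" slice. This suggests the `contains` filter ignores case. Below is a sketch of such a slice filter, assuming case-insensitive substring matching via `str.casefold()`. `contains_slice` is a hypothetical name, and the third text is a toy example.

```python
from typing import List, Sequence


def contains_slice(texts: Sequence[str], keyword: str) -> List[int]:
    """Indices of texts containing `keyword`, ignoring case (an assumption)."""
    key = keyword.casefold()
    return [i for i, t in enumerate(texts) if key in t.casefold()]


texts = [
    "Who told the #hipsters that #irony was a thing of the Clinton years?",
    "#Irony RT @user If you're going to give someone a scathing, 1-Star review "
    "for poor grammar, FFS use proper grammar.",
    "Lovely weather for a picnic today",  # toy text
]

print(contains_slice(texts, "irony"))                    # → [0, 1]
# A case-sensitive filter would miss the "#Irony" tweet:
print([i for i, t in enumerate(texts) if "irony" in t])  # → [0]
```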
For records in the dataset where text contains "sarcasm", the Accuracy is 12.15% lower than the global Accuracy.
| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | text contains "sarcasm" | Accuracy = 0.645 | -12.15% relative to global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | text | label | Predicted label |
|---|---|---|---|
| 4 | So much #sarcasm at work mate 10/10 #boring 100% #dead mate full on #shit absolutely #sleeping mate can't handle the #sarcasm | irony | non_irony (p = 0.93) |
| 6 | People complain about my backround pic and all I feel is like "hey don't blame me, Albert E might have spoken those words" #sarcasm #life | irony | non_irony (p = 0.73) |
| 19 | @user Guess they didn't get the memo reg non-nuclear Baltic sea #sarcasm | irony | non_irony (p = 0.51) |
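If the deviation is relative, each performance row also determines the global metric: global = slice / (1 + deviation). The snippet below applies this to the three reported rows. It gives a global recall of about 0.563, and both accuracy slices imply a global accuracy of about 0.734 to 0.735, which agree within the rounding of the reported values. This is arithmetic on the numbers above, not a figure taken from the full report.

```python
# (slice metric, relative deviation) copied from the three performance tables above.
REPORTED = {
    'Recall, text contains "user"': (0.366, -0.3498),
    'Accuracy, text contains "irony"': (0.531, -0.2773),
    'Accuracy, text contains "sarcasm"': (0.645, -0.1215),
}


def implied_global(slice_metric: float, deviation: float) -> float:
    # Invert deviation = (slice - global) / global  =>  global = slice / (1 + deviation).
    return slice_metric / (1 + deviation)


for name, (value, dev) in REPORTED.items():
    print(f"{name:<35} implied global = {implied_global(value, dev):.3f}")
```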
Check out the Giskard Space and Giskard Documentation to learn more about how to test your model.
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.