License Conflict: MIT vs CC BY-NC-SA 3.0
Hi, Iβd like to report a potential license conflict in Jean-Baptiste/roberta-large-financial-news-sentiment-en
. Based on the model card and documentation, this model is licensed under the permissive MIT License. However, the training dataset used Jean-Baptiste/financial_news_sentiment_mixte_with_phrasebank_75
is distributed under Creative Commons Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0).
This creates a potential license mismatch, because CC BY-NC-SA 3.0 includes restrictions on commercial use and requires derivative works to adopt a compatible ShareAlike license, both of which are incompatible with the permissive nature of the MIT License.
β οΈ Key incompatibilities between MIT and CC BY-NC-SA 3.0:
CC BY-NC-SA 3.0 β Core Clauses:
β’ Prohibits commercial use of the dataset and derivative works
β’ Requires derivatives to carry the same or compatible license (ShareAlike)
β’ Requires attribution to the original dataset creators
MIT License β Core Features:
β’ Allows unrestricted commercial use
β’ Permits sublicensing and proprietary use
β’ Does not preserve upstream licensing conditions (e.g., SA or NC)
This might confuse downstream users regarding:
β’ Whether commercial use is permitted (MIT suggests yes, data license says no)
β’ Whether attribution or share-alike terms apply
β’ Whether the model as published is in full compliance with the dataset terms
Since machine learning models can be treated as derivative works of their training data, the obligations of CC BY-NC-SA 3.0 β especially the NonCommercial and ShareAlike requirements β may extend to the model itself.
πΉ Suggestion:
To help align the licensing of the model with upstream data terms, here are a few options to consider:
1. Clarify in the README or model card that the model was trained on a CC BY-NC-SA 3.0 dataset, and that its use is therefore limited to non-commercial purposes.
2. Consider replacing the MIT License with a more restrictive or custom license that reflects the NonCommercial and ShareAlike requirements.
3. Include proper attribution for the training dataset, per the dataset license requirements.
4. If commercial redistribution or usage is intended, retraining the model on commercially licensed datasets would be necessary.
Hope this helps! Let me know if you have any questions or need more info.
Thanks for your attention!
Thanks, it's fixed now.