Potential Inconsistencies Between Model and Dataset Licenses
Hi, while reviewing the licenses for this model and the datasets it depends on, I noticed a potential inconsistency that could cause confusion or legal risk in some situations.
Your model uses the dataset mlfoundations/datacomp_pools, which is licensed under cc-by-4.0. However, your model is licensed under mit, which is less strict than cc-by-4.0. This may affect license compatibility across your repository, confusing downstream users and exposing them to possible legal and financial risk.
If possible, you could address this in one of the following ways:
1. Select a different, more suitable license for your repository.
2. Remind users that, in some cases, they should check both the model license and the dataset license, especially when redistributing or modifying the model.
I do not speak for the organization; I speak only for myself, drawing on my extensive expertise in copyright law.
The dataset may be licensed under cc-by-4.0, but it does not follow that the model is a 'derivative work' of the dataset. Model weights are non-expressive and fall on the 'idea' side of the idea/expression dichotomy. Moreover, model weights are not a 'work of authorship' by a human, so they cannot be copyrighted at all.
@yueyangchen AFAIK there is no legal precedent on this one way or the other as of yet. The majority of model weights in existence likely have similar concerns. Will leave as status quo.
Additionally, I believe the license on the datacomp pools probably refers only to their contribution on top of the raw images and captions, i.e., the curated pools and generated metadata themselves. The actual images and captions almost certainly do not fall under cc-by-4.0 at that scale.
Also, speaking for myself and not the HF organization.