Did your team consider using an EnCodec + embedding model (e.g., like Moshi)?

#27
by RonanMcGovern - opened

Thanks for releasing this model. I'm curious why you went with an encoder-based system rather than a tokenised approach (would that have been too slow?).

Also, the two-part transformer (think + talk) is quite unusual. Did you try using a single unified transformer instead?
