The concept behind xLSTM has recently been turned into the xLSTM-7B model, which performs in the same category as the similar-scale Gemma 7B, Llama 2 7B, and FalconMamba 7B, but with a higher-performing inference kernel.
Model: NX-AI/xLSTM-7b
Paper: https://arxiv.org/abs/2503.13427
Some things are simple
Just following up on something earlier: did you ever get around to decreasing the model size and adding dropout?
DeepSeek R1 on how to build conscious AGI
https://huggingface.co/blog/KnutJaegersberg/deepseek-r1-on-conscious-agi
The folks at Foursquare released a dataset of 104.5 million places of interest (foursquare/fsq-os-places), and here's all of them on a plot.