llm course @ HSE and vk llm
A collection of SmolLM-135M models fine-tuned with DPO, PPO, and Reward Modeling to enhance human-like expressiveness
Daniil Tsesarev
tsessk
AI & ML interests
transformers
Recent Activity
Updated a model about 2 months ago: tsessk/SmolLM2-FT-Summarization-Aligned
Published a model about 2 months ago: tsessk/SmolLM2-FT-Summarization-Aligned
Updated a model about 2 months ago: tsessk/SmolLM2-FT-Summarization