AI & ML interests

'an LLM is only as good as the dataset it was trained on' - Sun Tzu

Recent Activity

pszemraj updated a dataset about 1 month ago
BEE-spoke-data/govdocs1-pdf-source
pszemraj updated a dataset about 1 month ago
BEE-spoke-data/govdocs1-by-extension
amazingvince updated a dataset about 2 months ago
BEE-spoke-data/SurvivorLib-Nanonets-OCR-s

qnguyen3 posted an update about 1 year ago
qnguyen3 posted an update over 1 year ago
🎉 Introducing nanoLLaVA, a powerful multimodal AI model that packs the capabilities of a 1B parameter vision language model into just 5GB of VRAM. 🚀 This makes it an ideal choice for edge devices, bringing cutting-edge visual understanding and generation to your devices like never before. 📱💻

Model: qnguyen3/nanoLLaVA 🔍
Spaces: qnguyen3/nanoLLaVA (thanks to @merve)

Under the hood, nanoLLaVA is based on the powerful vilm/Quyen-SE-v0.1 (my Qwen1.5-0.5B finetune) and Google's impressive google/siglip-so400m-patch14-384. 🧠 The model is trained using a data-centric approach to ensure optimal performance. 📊

In the spirit of transparency and collaboration, all code and model weights are open-sourced under the Apache 2.0 license. 🤝