Trong Vu

tattrongvu

AI & ML interests

LLM, Reinforcement Learning, Robotics, Self-driving car, Computer Vision

Recent Activity

updated a Space about 1 month ago

tsystems/visual_document_retrieval

tattrongvu's activity

reacted to tomaarsen's post with 🔥 8 days ago

I just released Sentence Transformers v4.1, featuring ONNX and OpenVINO backends for rerankers that offer 2-3x speedups, and improved hard negatives mining, which helps prepare stronger training datasets. Details:

🏎️ ONNX, OpenVINO, Optimization, Quantization
- I've added ONNX and OpenVINO support with just one extra argument: "backend" when loading the CrossEncoder reranker, e.g.: `CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2", backend="onnx")`
- The `export_optimized_onnx_model`, `export_dynamic_quantized_onnx_model`, and `export_static_quantized_openvino_model` functions now work with CrossEncoder rerankers, allowing you to optimize (e.g. fusions, gelu approximations, etc.) or quantize (int8 weights) rerankers.
- I've uploaded ~340 ONNX & OpenVINO models for all existing models under the cross-encoder Hugging Face organization. You can use these without having to export when loading.

⛏ Improved Hard Negatives Mining
- Added 'absolute_margin' and 'relative_margin' arguments to `mine_hard_negatives`.
- `absolute_margin` ensures that `sim(query, negative) < sim(query, positive) - absolute_margin`, i.e. an absolute margin between the negative & positive similarities.
- `relative_margin` ensures that `sim(query, negative) < sim(query, positive) * (1 - relative_margin)`, i.e. a relative margin between the negative & positive similarities.
- Inspired by the excellent NV-Retriever paper from NVIDIA.
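
The two margin conditions above can be illustrated with a small standalone sketch. This is not the library's implementation: the helper name `keeps_negative` and the similarity scores are hypothetical, and the code only reproduces the two inequalities quoted in the post.

```python
from typing import Optional


def keeps_negative(
    positive_sim: float,
    negative_sim: float,
    absolute_margin: Optional[float] = None,
    relative_margin: Optional[float] = None,
) -> bool:
    """Hypothetical helper: True if a candidate negative passes every margin that is set."""
    # Absolute margin: sim(query, negative) < sim(query, positive) - absolute_margin
    if absolute_margin is not None and not negative_sim < positive_sim - absolute_margin:
        return False
    # Relative margin: sim(query, negative) < sim(query, positive) * (1 - relative_margin)
    if relative_margin is not None and not negative_sim < positive_sim * (1 - relative_margin):
        return False
    return True


# Made-up similarity scores for one query, for illustration only.
positive_sim = 0.80
candidates = {"neg_a": 0.78, "neg_b": 0.74, "neg_c": 0.60}

kept_abs = [name for name, sim in candidates.items()
            if keeps_negative(positive_sim, sim, absolute_margin=0.1)]
kept_rel = [name for name, sim in candidates.items()
            if keeps_negative(positive_sim, sim, relative_margin=0.05)]
```

With these scores, an absolute margin of 0.1 keeps only candidates scoring below roughly 0.70, while a relative margin of 0.05 keeps those below roughly 0.76, so the relative margin is the more permissive of the two here.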

And several other small improvements. Check out the full release notes here: https://github.com/UKPLab/sentence-transformers/releases/tag/v4.1.0

With this release, I introduce near-feature parity between the SentenceTransformer embedding & CrossEncoder reranker models, which I've wanted to do for quite some time! With rerankers very strongly supported now, it's time to look forward to other useful architectures!

liked a model 27 days ago

RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16

Text Generation • Updated Feb 12 • 9.79k • 32

updated a Space about 1 month ago

Visual Document Retrieval

📉

Demo for multimodal embedding models

updated a collection about 1 month ago

ColQwen2

Collection

Collection of multimodal embedding models based on ColBert & Qwen2-VL • 4 items • Updated Mar 10

updated a collection about 2 months ago

ColQwen2

Collection

Collection of multimodal embedding models based on ColBert & Qwen2-VL • 4 items • Updated Mar 10

updated 2 models about 2 months ago

tsystems/colqwen2.5-3b-base

Updated Mar 9 • 4.57k

tsystems/colqwen2.5-3b-multilingual-v1.0

Visual Document Retrieval • Updated Mar 9 • 11.8k • 6

updated a collection about 2 months ago

ColQwen2 Multimodal Embedding

Collection

Collection of multimodal embedding models based on QwenVL and ColBert • 6 items • Updated Mar 9

updated a model about 2 months ago

tsystems/colqwen2.5-3b-multilingual-v1.0-merged

Visual Document Retrieval • Updated Mar 9 • 49

published a model about 2 months ago

tsystems/colqwen2.5-3b-multilingual-v1.0

Visual Document Retrieval • Updated Mar 9 • 11.8k • 6

updated a collection about 2 months ago

ColQwen2 Multimodal Embedding

Collection

Collection of multimodal embedding models based on QwenVL and ColBert • 6 items • Updated Mar 9

published 2 models about 2 months ago

tsystems/colqwen2.5-3b-multilingual-v1.0-merged

Visual Document Retrieval • Updated Mar 9 • 49

tsystems/colqwen2.5-3b-base

Updated Mar 9 • 4.57k

reacted to openfree's post with 🚀 about 2 months ago

Datasets Convertor 🚀

openfree/Datasets-Convertor

Welcome to Datasets Convertor, the cutting-edge solution engineered for seamless and efficient data format conversion. Designed with both data professionals and enthusiasts in mind, our tool simplifies the transformation process between CSV, Parquet, JSONL, and XLS file formats, ensuring that your data is always in the right shape for your next analytical or development challenge. 💻✨

Why Choose Datasets Convertor?
In today’s data-driven world, managing and converting large datasets can be a daunting task. Our converter is built on top of robust technologies like Pandas and Gradio, delivering reliable performance with a modern, intuitive interface. Whether you’re a data scientist, analyst, or developer, Datasets Convertor empowers you to effortlessly switch between formats while maintaining data integrity and optimizing storage.

Key Features and Capabilities:
CSV ⇆ Parquet Conversion:
Easily transform your CSV files into the highly efficient Parquet format and vice versa. Parquet’s columnar storage not only reduces file size but also accelerates query performance—a critical advantage for big data analytics. 🔄📂

CSV to JSONL Conversion:
Convert CSV files to JSONL (newline-delimited JSON) to facilitate efficient, line-by-line data processing. This format is particularly useful for streaming data applications, logging systems, and scenarios where incremental data processing is required. Each CSV row is meticulously converted into an individual JSON record, preserving all the metadata and ensuring compatibility with modern data pipelines. 📄➡️📝
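
As a rough illustration of the row-per-record idea described above, here is a minimal sketch of a CSV to JSONL conversion. It is an assumption-laden example, not the tool's code: the post says the tool is built on Pandas, whereas this sketch uses only the Python standard library, and the function name `csv_to_jsonl` and the sample data are hypothetical.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text: str) -> str:
    """Hypothetical helper: turn CSV text into JSONL, one JSON object per CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # The header row supplies the keys; csv does no type inference,
    # so every value stays a string in the output.
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in reader)


# Made-up sample data for illustration.
sample_csv = "id,name\n1,Ada\n2,Linus\n"
jsonl = csv_to_jsonl(sample_csv)

# Each output line parses independently, which is what makes
# line-by-line (streaming) processing possible.
records = [json.loads(line) for line in jsonl.splitlines()]
```

Because the standard `csv` module reads everything as text, numeric columns come out as strings here; a Pandas-based converter would typically infer column types instead.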

Parquet to JSONL Conversion:
For those working with Parquet files, our tool offers a streamlined conversion to JSONL.

Parquet to XLS Conversion.