AskBit FAQ Retriever

A fast, interpretable FAQ retriever using bit vector encoding of SBERT sentence embeddings combined with a binary KNN classifier. This repository hosts a model artifact from the AskBit project.

📚 This model was created as part of an educational journey exploring efficient semantic FAQ matching with bitwise vector representations and KNN classification.

  • 🔢 Uses SBERT (all-MiniLM-L6-v2) to embed question-answer pairs as dense semantic vectors.
  • 🧠 Converts dense embeddings into binarized bit vectors for fast similarity search.
  • ⚡ Uses a K-Nearest Neighbors classifier with Hamming distance over bit vectors.
  • 💡 Fully open source, efficient, and suitable for lightweight semantic FAQ retrieval.
  • 🗂️ Model file: model.pkl
  • 📄 Training data file: faq.json

📁 Files in This Repository

| File | Description |
|---|---|
| model.pkl | Trained KNN classifier model over SBERT-based bit vectors. |
| faq.json | FAQ question-answer dataset used for training and evaluation. |
| requirements.txt | Python dependencies to load and use the model. |
| README.md | Model usage instructions, background, and examples. |

🧠 How It Works

Semantic Bit Vector Encoding (SbertBitEncoder)

  • Uses the Sentence-BERT model (all-MiniLM-L6-v2) to generate dense semantic embeddings of entire question-answer pairs.
  • Embeddings capture meaningful sentence-level semantics, enabling effective retrieval beyond simple word overlap.
  • Each dense embedding vector is binarized by thresholding (e.g., bits set to 1 if value > 0) to produce a compact, fixed-length bit vector.
  • Both the FAQ entries and queries are encoded this way, ensuring semantic similarity maps to bitwise proximity.
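
The encoding steps above can be sketched as follows. This is a minimal illustration, not the project's actual SbertBitEncoder: the function names `binarize` and `encode_texts` are hypothetical, the threshold of 0 follows the example given above, and `np.packbits` is shown only as one possible way to store the bits compactly.

```python
import numpy as np


def binarize(embeddings, threshold=0.0):
    """Map dense embeddings to 0/1 bit vectors: 1 where value > threshold."""
    dense = np.atleast_2d(np.asarray(embeddings, dtype=np.float32))
    return (dense > threshold).astype(np.uint8)


def encode_texts(texts, model_name="all-MiniLM-L6-v2"):
    """Embed texts with SBERT, then binarize (hypothetical helper)."""
    # Imported lazily so binarize() can be used without sentence-transformers.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    dense = model.encode(texts, convert_to_numpy=True)  # shape: (n_texts, 384)
    return binarize(dense)


# Toy dense vectors standing in for real SBERT output.
toy = np.array([[0.12, -0.40, 0.05, 0.0],
                [-0.30, 0.22, -0.01, 0.9]])
bits = binarize(toy)
print(bits.tolist())  # [[1, 0, 1, 0], [0, 1, 0, 1]]

# Optional: pack 8 bits per byte, so a 384-bit vector fits in 48 bytes.
packed = np.packbits(bits, axis=1)
```

Note that a value of exactly 0 maps to bit 0 under a strict `> 0` threshold. Whatever rule is used, it has to be identical for FAQ entries and queries.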

Binary K-Nearest Neighbors Classifier (FAQClassifier)

  • Implements a KNN classifier using Hamming distance (the number of differing bits) as the distance metric on bit vectors.
  • Learns to associate bit-encoded queries with their corresponding answers.
  • Supports retrieving the best matching answer or top-k candidates with similarity scores.
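
To make the idea concrete, here is a small sketch of nearest-neighbour lookup with Hamming distance. It is not the project's FAQClassifier: the class name `TinyHammingKNN`, its methods, and the similarity score (fraction of matching bits) are assumptions chosen for illustration.

```python
import numpy as np


class TinyHammingKNN:
    """Illustrative nearest-neighbour lookup over bit vectors (hypothetical class)."""

    def fit(self, bit_vectors, answers):
        self.bits = np.asarray(bit_vectors, dtype=np.uint8)
        self.answers = list(answers)
        return self

    def top_k(self, query_bits, k=1):
        """Return the k closest answers as (answer, similarity) pairs."""
        query = np.asarray(query_bits, dtype=np.uint8)
        # Hamming distance = number of positions where the bits differ.
        distances = np.count_nonzero(self.bits != query, axis=1)
        # Stable sort keeps the stored order when distances tie.
        order = np.argsort(distances, kind="stable")[:k]
        n_bits = self.bits.shape[1]
        # Similarity in [0, 1]: fraction of matching bits.
        return [(self.answers[i], float(1.0 - distances[i] / n_bits)) for i in order]

    def predict(self, query_bits):
        """Return only the best matching answer."""
        return self.top_k(query_bits, k=1)[0][0]


knn = TinyHammingKNN().fit(
    [[1, 0, 1, 1], [0, 0, 0, 1], [1, 1, 1, 0]],
    ["answer A", "answer B", "answer C"],
)
best = knn.predict([1, 0, 1, 0])
ranked = knn.top_k([1, 0, 1, 0], k=2)
```

For larger FAQ sets, scikit-learn's `KNeighborsClassifier` also accepts `metric="hamming"`, which may be a more practical choice than a hand-written loop like this one.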

🚀 Usage Example

import pickle
import numpy as np

# Load the trained model artifact
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

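# Bit vector input: a binarized SBERT embedding (e.g., a 384-bit vector).
# The "..." below is a placeholder, not runnable code: supply the full vector
# produced by the same encoder that was used at training time.
query_vec = np.array([1, 0, 1, 1, 0, ..., 0])

# Predict (get the best matching answer).
# If model is a scikit-learn estimator, predict expects a 2D array,
# so pass query_vec.reshape(1, -1) instead.
answer = model.predict(query_vec)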
print("Predicted answer:", answer)

⚠️ Important: Ensure you encode new queries with the same SBERT bit-vector encoder used at training for consistent results.
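
One way to catch format mismatches early is to validate the query vector before calling the model. The helper below is a hypothetical sketch, not part of this repository. It assumes 384 bits per query (the embedding size of all-MiniLM-L6-v2) and a model that accepts one query per row.

```python
import numpy as np


def check_query_bits(query_bits, n_bits=384):
    """Return the query as a (n_queries, n_bits) uint8 array, or raise ValueError."""
    arr = np.asarray(query_bits)
    if arr.ndim == 1:
        arr = arr.reshape(1, -1)  # treat a flat vector as a single query
    if arr.ndim != 2 or arr.shape[1] != n_bits:
        raise ValueError(f"expected {n_bits} bits per query, got shape {arr.shape}")
    if not np.isin(arr, (0, 1)).all():
        raise ValueError("bit vectors may only contain 0 and 1")
    return arr.astype(np.uint8)


# A well-formed 384-bit query passes and comes back as a single row.
query = check_query_bits(np.ones(384, dtype=int))
print(query.shape)  # (1, 384)
```

A check like this only verifies the shape and values. It cannot tell whether the bits came from the same encoder and threshold as the training data.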


📦 Dependencies

Install dependencies with:

pip install -r requirements.txt

Main dependencies:

  • sentence-transformers
  • scikit-learn
  • numpy
  • yake
  • spacy (for optional text preprocessing)

📚 Related Project

This model is part of the AskBit project on GitHub:

  • ✅ Full source code with CLI and training scripts
  • ✅ Debug and inspect bit vectors and retrieval results
  • ✅ Lightweight, interpretable semantic FAQ search

📜 License

MIT License: free to use and modify, and contributions are welcome.


🤝 Contributing

This model is intended for learning and experimentation. Feel free to fork, improve, or build upon it!

Model trained and shared by @Shanvit
