AskBit FAQ Retriever
A fast, interpretable FAQ retriever using bit vector encoding of SBERT sentence embeddings combined with a binary KNN classifier. This repository hosts a model artifact from the AskBit project.
This model was created as part of an educational journey exploring efficient semantic FAQ matching with bitwise vector representations and KNN classification.
- Uses SBERT (`all-MiniLM-L6-v2`) to embed question-answer pairs as dense semantic vectors.
- Converts dense embeddings into binarized bit vectors for fast similarity search.
- Uses a K-Nearest Neighbors classifier with Hamming distance over bit vectors.
- Fully open source, efficient, and suitable for lightweight semantic FAQ retrieval.
- Model file: `model.pkl`
- Training data file: `faq.json`
Files in This Repository
| File | Description |
|---|---|
| `model.pkl` | Trained KNN classifier model over SBERT-based bit vectors. |
| `faq.json` | FAQ question-answer dataset used for training and evaluation. |
| `requirements.txt` | Python dependencies to load and use the model. |
| `README.md` | Model usage instructions, background, and examples. |
How It Works
Semantic Bit Vector Encoding (`SbertBitEncoder`)
- Uses the Sentence-BERT model (`all-MiniLM-L6-v2`) to generate dense semantic embeddings of entire question-answer pairs.
- Embeddings capture meaningful sentence-level semantics, enabling effective retrieval beyond simple word overlap.
- Each dense embedding vector is binarized by thresholding (e.g., bits set to 1 if value > 0) to produce a compact, fixed-length bit vector.
- Both the FAQ entries and queries are encoded this way, so semantically similar texts tend to map to bit vectors with a small Hamming distance.
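The thresholding step described above can be sketched as follows. This is a minimal illustration, not the project's `SbertBitEncoder`: the function names `binarize` and `encode_bits` are hypothetical, and the threshold of 0 is taken from the "value > 0" example rather than confirmed against the training code.

```python
import numpy as np


def binarize(embeddings, threshold=0.0):
    """Map dense embeddings to 0/1 bit vectors: bit = 1 where value > threshold."""
    dense = np.atleast_2d(np.asarray(embeddings, dtype=np.float32))
    return (dense > threshold).astype(np.uint8)


def encode_bits(texts, model_name="all-MiniLM-L6-v2"):
    """Hypothetical helper: embed texts with SBERT, then binarize them."""
    # Imported lazily so binarize() stays usable without sentence-transformers.
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer(model_name)
    dense = encoder.encode(list(texts), convert_to_numpy=True)
    return binarize(dense)


# Toy 4-dimensional "embeddings" standing in for real 384-dimensional ones.
toy = np.array([[0.12, -0.40, 0.03, -0.01],
                [-0.20, 0.50, 0.00, 0.70]])
bits = binarize(toy)
print(bits)
```

With real data, `encode_bits(["How do I reset my password?"])` would return one 384-bit row, since `all-MiniLM-L6-v2` produces 384-dimensional embeddings.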
Binary K-Nearest Neighbors Classifier (`FAQClassifier`)
- Implements a KNN classifier using Hamming distance as the similarity metric on bit vectors.
- Learns to associate bit-encoded queries with their corresponding answers.
- Supports retrieving the best matching answer or top-k candidates with similarity scores.
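To make the Hamming-distance lookup concrete, here is a small NumPy-only sketch of top-k retrieval over bit vectors. It is not the project's `FAQClassifier`; the function name `hamming_top_k`, the toy 8-bit vectors, and the placeholder answers are assumptions made for illustration, and the similarity score is defined here as the fraction of matching bits.

```python
import numpy as np


def hamming_top_k(query_bits, faq_bits, k=3):
    """Return (index, similarity) pairs for the k stored vectors nearest to the query."""
    query_bits = np.asarray(query_bits, dtype=np.uint8)
    faq_bits = np.asarray(faq_bits, dtype=np.uint8)
    # Hamming distance = number of positions where the bits differ.
    distances = np.count_nonzero(faq_bits != query_bits, axis=1)
    # Stable sort keeps the earlier entry first when distances tie.
    order = np.argsort(distances, kind="stable")[:k]
    n_bits = faq_bits.shape[1]
    # Similarity in [0, 1]: fraction of matching bits.
    return [(int(i), float(1.0 - distances[i] / n_bits)) for i in order]


# Toy 8-bit vectors standing in for real 384-bit encodings.
faq_bits = np.array([[1, 0, 1, 1, 0, 0, 1, 0],
                     [0, 1, 0, 0, 1, 1, 0, 1],
                     [1, 0, 1, 0, 0, 0, 1, 0]])
answers = ["answer A", "answer B", "answer C"]  # placeholder answers

query = np.array([1, 0, 1, 1, 0, 0, 1, 1])
ranked = hamming_top_k(query, faq_bits, k=2)
best_index, best_score = ranked[0]
print(answers[best_index], best_score)
```

Because the comparison is a bitwise mismatch count, each candidate's score can be inspected directly, which is what makes this style of retrieval easy to debug.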
Usage Example
import pickle
import numpy as np

# Load the trained model artifact
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

# Bit vector input: a binarized SBERT embedding (e.g., a 384-bit vector).
# Placeholder values only; a real query must match the training bit vector format.
query_vec = np.random.randint(0, 2, size=384)

# Predict (get best matching answer). If model.pkl holds a scikit-learn estimator,
# it expects a 2D array of shape (n_samples, n_features), hence the reshape.
answer = model.predict(query_vec.reshape(1, -1))
print("Predicted answer:", answer)
Important: Ensure you encode new queries with the same SBERT bit-vector encoder used at training for consistent results.
Dependencies
Install dependencies with:
pip install -r requirements.txt
Main dependencies:
- `sentence-transformers`
- `scikit-learn`
- `numpy`
- `yake`
- `spacy` (for optional text preprocessing)
Related Project
This model is part of the AskBit project on GitHub:
- Full source code with CLI and training scripts
- Debug and inspect bit vectors and retrieval results
- Lightweight, interpretable semantic FAQ search
License
MIT License: free to use, modify, or contribute.
Contributing
This model is intended for learning and experimentation. Feel free to fork, improve, or build upon it!
Model trained and shared by @Shanvit