semantic-search-rDepression

This repository contains my personal implementation of a semantic search system built on FAISS. The model was trained on a Kaggle dataset of approximately 32,000 Reddit posts from r/depression, collected over a three-month period in 2019. Many of these posts have since been deleted, but the dataset was a good size for local training on my RTX 3080 (10 GB VRAM).

What does the model do?

The idea is simple: you type out what you're feeling, whether to vent or to make sense of what's going on internally, and instead of posting it publicly, you let the model find similar posts from the dataset.

Sometimes, just seeing that someone else has felt the same way can make a difference. The model returns the top 3 most semantically similar posts, along with their original Reddit URLs. If the post still exists, you can visit it and read the discussions and comments of others who've been through something similar.
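The retrieval step described above can be sketched in a few lines. This is a minimal brute-force illustration, not the repository's actual code: it ranks stored embeddings by cosine similarity to a query embedding, which is the same ranking an exact inner-product index over L2-normalized vectors would produce. The function names, the `(url, embedding)` corpus layout, the example URLs, and the toy 3-dimensional vectors are all assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_similar(query_vec, corpus, k=3):
    """Return the k (score, url) pairs most similar to query_vec.

    corpus is a list of (url, embedding) pairs (a hypothetical layout).
    """
    scored = [(cosine_similarity(query_vec, emb), url) for url, emb in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy 3-dimensional "embeddings"; a real sentence encoder produces much
# larger vectors, and these URLs are placeholders.
toy_corpus = [
    ("https://example.com/post_a", [1.0, 0.0, 0.0]),
    ("https://example.com/post_b", [0.9, 0.1, 0.0]),
    ("https://example.com/post_c", [0.0, 1.0, 0.0]),
    ("https://example.com/post_d", [0.7, 0.7, 0.0]),
]

results = top_k_similar([1.0, 0.0, 0.0], toy_corpus, k=3)
for score, url in results:
    print(f"{score:.4f}  {url}")
```

In a real pipeline the query text would first be encoded by the fine-tuned model, and an index such as FAISS would replace the linear scan once the corpus grows beyond what a loop handles comfortably.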

Example Output

Query:

i cant eat

Training Setup & Hyperparameters

Loss Function: PyTorch built-in TripletMarginLoss (margin = 1.0)
Optimizer: AdamW (learning rate = 2e-5)
Scheduler: Cosine with 5% warmup
Epochs: 3
Effective Batch Size: 32 (8 per step with 4-step accumulation)
Monitoring:
- eval_steps = 500
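To make the settings above concrete, here is a small pure-Python sketch of what they compute. It is an illustration under stated assumptions rather than the training script: the triplet loss follows the standard definition max(d(a, p) - d(a, n) + margin, 0) with Euclidean distance (PyTorch's version also adds a tiny epsilon inside the distance, ignored here), and the schedule assumes linear warmup followed by cosine decay to zero, which is one common reading of "cosine with warmup". The function names and `total_steps=1000` are hypothetical.

```python
import math

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss for one example: max(d(a, p) - d(a, n) + margin, 0)."""
    d_pos = math.dist(anchor, positive)  # Euclidean distance to the positive
    d_neg = math.dist(anchor, negative)  # Euclidean distance to the negative
    return max(d_pos - d_neg + margin, 0.0)

def cosine_lr_with_warmup(step, total_steps, base_lr=2e-5, warmup_frac=0.05):
    """Linear warmup to base_lr, then cosine decay to zero (assumed shape)."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size: 8 examples per step, gradients accumulated over 4 steps.
PER_STEP_BATCH = 8
ACCUMULATION_STEPS = 4
effective_batch = PER_STEP_BATCH * ACCUMULATION_STEPS

# An easy triplet (negative far away) contributes no loss; a hard one does.
loss_easy = triplet_margin_loss([0.0, 0.0], [0.0, 0.5], [3.0, 4.0])
loss_hard = triplet_margin_loss([0.0, 0.0], [0.0, 2.0], [0.0, 2.5])

print(effective_batch, loss_easy, loss_hard)
print([cosine_lr_with_warmup(s, total_steps=1000) for s in (0, 25, 50, 525, 1000)])
```

With a margin of 1.0, a triplet stops contributing to the loss only once the negative is at least one full unit farther from the anchor than the positive.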

Results

Final Epoch (Epoch 3):

Train Loss: 0.4466 | Train Accuracy: 95.00%
Val Loss: 0.4358 | Val Accuracy: 94.59%

Test Set Evaluation:

Test Accuracy: 99.07%
Mean Positive Similarity: 0.7044
Mean Negative Similarity: 0.4952
Mean Margin: 0.2092
Std Dev Margin: 0.0795
Correct: 956/965
Incorrect: 9
Challenging Cases (margin < 0.1): 90/965 (9.33%)
MRR: 0.9855
Recall@5: 1.0000
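For readers who want to reproduce numbers of this kind, the sketch below shows one plausible way such metrics are derived from per-triplet similarities and per-query ranks. The definitions are assumptions, not taken from the evaluation script: a triplet counts as correct when the positive similarity exceeds the negative one, the margin is their difference, and MRR and Recall@k assume a single relevant item per query. All function names and toy inputs are hypothetical.

```python
def triplet_metrics(pos_sims, neg_sims, hard_margin=0.1):
    """Accuracy, mean margin, and count of low-margin triplets."""
    margins = [p - n for p, n in zip(pos_sims, neg_sims)]
    total = len(margins)
    correct = sum(1 for m in margins if m > 0)
    return {
        "accuracy": correct / total,
        "mean_margin": sum(margins) / total,
        "challenging": sum(1 for m in margins if m < hard_margin),
    }

def mrr_and_recall_at_k(ranks, k=5):
    """ranks holds the 1-based rank of the relevant item for each query."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    recall = sum(1 for r in ranks if r <= k) / len(ranks)
    return mrr, recall

# Toy inputs, unrelated to the real test set.
stats = triplet_metrics([0.8, 0.7, 0.5, 0.6], [0.4, 0.65, 0.55, 0.2])
mrr, recall_at_5 = mrr_and_recall_at_k([1, 1, 2, 1])
print(stats, mrr, recall_at_5)

# Arithmetic check of the reported percentages from the reported counts.
reported_accuracy = round(956 / 965 * 100, 2)
reported_challenging = round(90 / 965 * 100, 2)
print(reported_accuracy, reported_challenging)
```

The last two lines only confirm that the reported counts and percentages agree with each other (956/965 and 90/965).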

Note: Accuracy is very high because positive examples are easy to label in this setup; in practice, the similarity scores are what matter. The test set is also quite small: only about 3% of the data (965 examples).

Full Project (Training, Evaluation)

GitHub Repository

Feedback, corrections, and suggestions are always welcome!

waellejmi/all-MiniLM-L6-v2-finetuned-on-rDepression

Top 3 Similar Posts:

Similar Post 1 (Similarity: 0.9772)

Similar Post 2 (Similarity: 0.9146)

Similar Post 3 (Similarity: 0.8786)
