Update README.md
---
language: en
license: mit
tags:
  - keras
  - lstm
  - spam-classification
  - text-classification
  - binary-classification
  - email
  - deep-learning
library_name: keras
pipeline_tag: text-classification
model_name: Spam Email Classifier (BiLSTM)
datasets:
  - SetFit/enron_spam
---

# 📧 Spam Email Classifier using BiLSTM

This model uses a **Bidirectional LSTM (BiLSTM)** architecture built with **Keras** to classify email messages as **Spam** or **Ham**. It was trained on the [Enron Spam Dataset](https://huggingface.co/datasets/SetFit/enron_spam) using GloVe word embeddings.

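The referenced dataset can be inspected directly with the 🤗 `datasets` library. This snippet is not part of the original card, just the standard loading recipe:

```python
# Standard recipe for loading the referenced dataset (requires `pip install datasets`).
from datasets import load_dataset

ds = load_dataset("SetFit/enron_spam")
print(ds)              # available splits and columns
print(ds["train"][0])  # one raw example
```
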
---

## 🧠 Model Architecture

- **Tokenizer**: Keras `Tokenizer` trained on the Enron dataset
- **Embedding**: Pretrained [GloVe.6B.100d](https://nlp.stanford.edu/projects/glove/)
- **Model**: `Embedding → BiLSTM → Dropout → Dense(sigmoid)` (see the sketch after this list)
- **Input**: English email/message text
- **Output**: `0 = Ham`, `1 = Spam`

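The sketch below shows how these pieces typically fit together in Keras. It is a hypothetical reconstruction, not the published training code: the 100-dimensional embedding size and the padding length of 50 come from this card, while the vocabulary cap, LSTM width, and dropout rate are illustrative assumptions.

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.initializers import Constant
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dropout, Dense

MAX_WORDS, EMBED_DIM, MAXLEN = 20000, 100, 50  # MAX_WORDS is an assumption

# 1. Tokenizer fitted on the training corpus (stand-in texts shown here)
tokenizer = Tokenizer(num_words=MAX_WORDS)
tokenizer.fit_on_texts(["free iphone offer now", "meeting rescheduled to 3pm"])

# 2. Embedding matrix filled from pretrained GloVe vectors
glove = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:  # from the GloVe site
    for line in f:
        word, *vec = line.split()
        glove[word] = np.asarray(vec, dtype="float32")

vocab_size = min(len(tokenizer.word_index) + 1, MAX_WORDS)
embedding_matrix = np.zeros((vocab_size, EMBED_DIM))
for word, i in tokenizer.word_index.items():
    if i < vocab_size and word in glove:
        embedding_matrix[i] = glove[word]

# 3. Embedding → BiLSTM → Dropout → Dense(sigmoid)
model = Sequential([
    Embedding(vocab_size, EMBED_DIM,
              embeddings_initializer=Constant(embedding_matrix)),
    Bidirectional(LSTM(64)),         # assumed width
    Dropout(0.5),                    # assumed rate
    Dense(1, activation="sigmoid"),  # outputs P(spam)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```
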
---

## 🧪 Example Usage

```python
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences
from huggingface_hub import hf_hub_download
import pickle

# Load files from HF Hub
model_path = hf_hub_download("lokas/spam-emails-classifier", "model.h5")
tokenizer_path = hf_hub_download("lokas/spam-emails-classifier", "tokenizer.pkl")

# Load model and tokenizer
model = load_model(model_path)
with open(tokenizer_path, "rb") as f:
    tokenizer = pickle.load(f)

# Prediction function
def predict_spam(text):
    seq = tokenizer.texts_to_sequences([text])
    padded = pad_sequences(seq, maxlen=50)  # must match training maxlen
    pred = model.predict(padded)[0][0]
    return "🚫 Spam" if pred > 0.5 else "✅ Not Spam"

# Example
print(predict_spam("Win a free iPhone now!"))
```

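The 0.5 cutoff in `predict_spam` is a design choice, not a calibrated threshold. For scoring many messages it is cheaper to batch them through a single `model.predict` call; `predict_spam_batch` below is a hypothetical helper built on the objects loaded above, not part of the published card:

```python
def predict_spam_batch(texts, threshold=0.5):
    """Score a list of messages in one model call (hypothetical helper)."""
    seqs = tokenizer.texts_to_sequences(texts)
    padded = pad_sequences(seqs, maxlen=50)  # same maxlen as training
    preds = model.predict(padded).ravel()
    return [(text, float(p), bool(p > threshold)) for text, p in zip(texts, preds)]

for text, score, is_spam in predict_spam_batch(
    ["Win a free iPhone now!", "Lunch at noon tomorrow?"]
):
    print(f"{score:.3f}  {'SPAM' if is_spam else 'ham'}  {text}")
```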