| **Resume Parser** | |
| # Overview: | |
| This project is a comprehensive Resume Parsing tool built using Python, | |
| integrating the Mistral-Nemo-Instruct-2407 model for primary parsing. | |
| If Mistral fails or encounters issues, | |
| the system falls back to a custom-trained spaCy model to ensure continued functionality. | |
| The tool is wrapped with a Flask API and has a user interface built using HTML and CSS. | |
| # Installation Guide: | |
| 1. Create and Activate a Virtual Environment | |
| python -m venv venv | |
| source venv/bin/activate # For Linux/Mac | |
| # or | |
| venv\Scripts\activate # For Windows | |
| # NOTE: If the virtual environment (venv) is already created, you can skip the creation step and just activate. | |
| - For Linux/Mac: | |
| source venv/bin/activate | |
| - For Windows: | |
| venv\Scripts\activate | |
| 2. Install Required Libraries | |
| pip install -r requirements.txt | |
| # Ensure the following dependencies are included: | |
| - Flask | |
| - spaCy | |
| - huggingface_hub | |
| - PyMuPDF | |
| - python-docx | |
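| - odfpy (for ODT files) | |
| - Tesseract-OCR (for image-based parsing; a standalone OCR program that is installed separately from the pip packages) | |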
| 3. Set up Hugging Face Token | |
| - Add your Hugging Face token to the .env file as: | |
| HF_TOKEN=<your_huggingface_token> | |
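As a rough illustration of how the token could be read at startup, here is a minimal sketch that uses only the standard library. The helper names `load_env_file` and `get_hf_token` are hypothetical and are not taken from the project, which may load the `.env` file differently.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE pairs from a .env-style file into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_hf_token(path=".env"):
    """Prefer a real environment variable, then fall back to the .env file."""
    return os.environ.get("HF_TOKEN") or load_env_file(path).get("HF_TOKEN")
```

Calling `get_hf_token()` from the project root returns the token string, or `None` when neither the environment nor the `.env` file defines `HF_TOKEN`.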
| # File Structure Overview: | |
| Mistral_With_Spacy/ | |
| │ | |
| ├── Spacy_Models/ | |
| │   └── ner_model_05_3 # Pretrained spaCy model directory for resume parsing | |
| │ | |
| ├── templates/ | |
| │   ├── index.html # UI for file upload | |
| │   └── result.html # Display parsed results in structured JSON | |
| │ | |
| ├── uploads/ # Directory for uploaded resume files | |
| │ | |
| ├── utils/ | |
| │   ├── mistral.py # Code for calling Mistral API and handling responses | |
| │   ├── spacy.py # spaCy fallback model for parsing resumes | |
| │   ├── error.py # Error handling utilities | |
| │   └── fileTotext.py # Functions to extract text from different file formats (PDF, DOCX, etc.) | |
| │ | |
| ├── venv/ # Virtual environment | |
| │ | |
| ├── .env # Environment variables file (contains Hugging Face token) | |
| │ | |
| ├── main.py # Flask app handling API routes for uploading and processing resumes | |
| │ | |
| └── requirements.txt # Dependencies required for the project | |
| # Program Overview: | |
| # Mistral Integration (utils/mistral.py) | |
| - Mistral API Calls: Uses Hugging Face's Mistral-Nemo-Instruct-2407 model to parse resumes. | |
| - Personal and Professional Extraction: Two functions extract personal and professional information in structured JSON format. | |
| - Fallback Mechanism: If the Mistral call fails, the spaCy NER model is used as a fallback. | |
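The fallback behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `utils/mistral.py`: the function `parse_resume` and its `primary_parser` and `fallback_parser` arguments are hypothetical stand-ins for the Mistral call and the spaCy model.

```python
import logging

logger = logging.getLogger(__name__)


def parse_resume(text, primary_parser, fallback_parser):
    """Try the primary parser; use the fallback if it raises or returns nothing.

    Both parsers are hypothetical callables that take resume text and
    return a dict of extracted fields.
    """
    try:
        result = primary_parser(text)
        if result:
            return {"source": "primary", "data": result}
        logger.warning("Primary parser returned no data; using fallback.")
    except Exception as exc:  # broad on purpose: any primary failure triggers the fallback
        logger.warning("Primary parser failed (%s); using fallback.", exc)
    return {"source": "fallback", "data": fallback_parser(text)}
```

Recording which parser produced the result (the `source` key here) is one way to make it visible when the fallback was used.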
| # SpaCy Integration (utils/spacy.py) | |
| - Custom Trained Model: Uses a spaCy model (ner_model_05_3) trained specifically for resume parsing. | |
| - Named Entity Recognition: Extracts key information like Name, Email, Contact, Location, Skills, Experience, etc., from resumes. | |
| - Validation: Includes validation for extracted emails and contacts. | |
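The README does not say how emails and contacts are validated. One common approach is a pair of regular-expression checks, sketched below. The function names, the patterns, and the 7 to 15 digit range are assumptions chosen for illustration, not the project's actual rules.

```python
import re

# Deliberately simple pattern: local part, "@", domain labels, alphabetic TLD.
EMAIL_RE = re.compile(
    r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}$"
)


def is_valid_email(value):
    """Return True if the value looks like an email address."""
    return bool(EMAIL_RE.match(value.strip()))


def is_valid_contact(value, min_digits=7, max_digits=15):
    """Return True if the value looks like a phone number.

    Allows an optional leading "+", then digits, spaces, dots, dashes,
    and parentheses; the digit-count bounds are assumed defaults.
    """
    candidate = value.strip()
    if not re.fullmatch(r"\+?[\d\s().-]+", candidate):
        return False
    digits = re.sub(r"\D", "", candidate)
    return min_digits <= len(digits) <= max_digits
```

Checks like these only filter out obviously malformed entities from the NER output; they do not confirm that an address or number actually exists.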
| # File Conversion (utils/fileTotext.py) | |
| - Text Extraction: Handles different resume formats (PDF, DOCX, ODT, RTF, and images like PNG, JPG, JPEG) and extracts text for further processing. | |
| - PDF Files: Uses PyMuPDF to extract text and, if necessary, Tesseract-OCR for image-based PDF content. | |
| - DOCX Files: Uses `python-docx` to extract structured text from Word documents. | |
| - ODT Files: Uses `odfpy` to extract text from ODT (OpenDocument) files. | |
| - RTF Files: Reads plain text from RTF files. | |
| - Images (PNG, JPG, JPEG): Uses Tesseract-OCR to extract text from image-based resumes. | |
| - Hyperlink Extraction: Extracts hyperlinks from PDF files, capturing any embedded URLs during the parsing process. | |
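A format-by-format setup like this usually comes down to dispatching on the file extension. The sketch below shows only that routing step and does not call any extraction library. The `EXTRACTORS` table and `pick_extractor` function are hypothetical names, and the `.rtf` entry assumes the RTF format listed in the program tree.

```python
from pathlib import Path

# Extension -> extraction strategy, following the formats this README lists.
# The strategy labels are illustrative, not identifiers from the project.
EXTRACTORS = {
    ".pdf": "pymupdf",
    ".docx": "python-docx",
    ".odt": "odfpy",
    ".rtf": "plain-text",
    ".png": "tesseract",
    ".jpg": "tesseract",
    ".jpeg": "tesseract",
}


def pick_extractor(filename):
    """Return the strategy label for a file, based on its extension."""
    suffix = Path(filename).suffix.lower()
    try:
        return EXTRACTORS[suffix]
    except KeyError:
        # Surface unsupported formats as a clear error for the caller to handle.
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None
```

Lower-casing the suffix means `Resume.PDF` and `resume.pdf` are routed the same way, and raising `ValueError` gives the error-handling layer a single exception to catch for unsupported formats.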
| # Error Handling (utils/error.py) | |
| - Handles API response errors and file format errors, and ensures the fallback runs without crashing the app. | |
| # Flask API (main.py) | |
| - Endpoints: /upload for uploading resumes. | |
| - Results: Displays parsed results in JSON format on the results page. | |
| - UI: Simple interface for uploading resumes and viewing the parsing results. | |
| # Tree map of the program: | |
| main.py | |
| ├── Handles API side | |
| ├── File upload/remove | |
| ├── Process resumes | |
| └── Show result | |
| utils | |
| ├── fileTotext.py | |
| │   └── Converts files to text | |
| │       ├── PDF | |
| │       ├── DOCX | |
| │       ├── RTF | |
| │       ├── ODT | |
| │       ├── PNG | |
| │       ├── JPG | |
| │       └── JPEG | |
| ├── mistral.py | |
| │   ├── Mistral API Calls | |
| │   │   └── Uses Mistral-Nemo-Instruct-2407 model | |
| │   ├── Personal and Professional Extraction | |
| │   │   ├── Extracts personal information | |
| │   │   └── Extracts professional information | |
| │   └── Fallback Mechanism | |
| │       └── Uses spaCy NER model if Mistral fails | |
| └── spacy.py | |
|     ├── Custom Trained Model | |
|     │   └── Uses spaCy model (ner_model_05_3) | |
|     ├── Named Entity Recognition | |
|     │   └── Extracts key information (Name, Email, Contact, etc.) | |
|     └── Validation | |
|         └── Validates emails and contacts | |
| # References: | |
| - [Flask Documentation](https://flask.palletsprojects.com/) | |
| - [spaCy Documentation](https://spacy.io/usage) | |
| - [Hugging Face Hub API](https://huggingface.co/docs/huggingface_hub/index) | |
| - [PyMuPDF (MuPDF) Documentation](https://pymupdf.readthedocs.io/en/latest/) | |
| - [python-docx Documentation](https://python-docx.readthedocs.io/en/latest/) | |
| - [Tesseract OCR Documentation](https://github.com/tesseract-ocr/tesseract) | |
| - [Virtual Environments in Python](https://docs.python.org/3/tutorial/venv.html) | |