---
title: AskVeracity
emoji: π
colorFrom: blue
colorTo: pink
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
license: mit
short_description: Fact-checking and misinformation detection tool.
---
# AskVeracity: Fact-Checking System

[Hugging Face Space](https://huggingface.co/spaces/ankanghosh/askveracity)
[MIT License](https://opensource.org/licenses/MIT)

A streamlined web application that analyzes claims to determine their truthfulness through evidence gathering and analysis, supporting efforts in misinformation detection.

<p align="center">
  <img src="docs/assets/app_screenshot.png" alt="Application Screenshot" width="800"/>
</p>
## Overview

AskVeracity is an agentic AI system that verifies factual claims through a combination of NLP techniques and large language models. The system gathers and analyzes evidence from multiple sources to provide transparent and explainable verdicts.

The AI agent:

1. Uses a ReAct (Reasoning + Acting) methodology to analyze claims
2. Dynamically gathers evidence from multiple sources, prioritized by claim category
3. Applies semantic analysis to determine evidence relevance
4. Classifies the truthfulness of claims with confidence scores
5. Provides transparency into its reasoning process
6. Generates clear explanations for its verdicts
## Key Features

- **Intelligent Claim Extraction:** Extracts and focuses on the primary factual claim
- **Category Detection:** Automatically identifies claim categories for optimized evidence retrieval
- **Multi-source Evidence Gathering:** Collects evidence from:
  - Wikipedia and Wikidata
  - News articles
  - Academic sources via OpenAlex
  - Fact-checking websites
  - Category-specific RSS feeds
- **Enhanced Entity Matching:** Uses improved entity and verb matching for accurate evidence relevance assessment
- **Category-Specific Fallbacks:** Ensures robust evidence retrieval with domain-appropriate fallbacks
- **Transparent Classification:** Provides clear verdicts with confidence scores
- **Safety-First Classification:** Prioritizes avoiding incorrect assertions when evidence is insufficient
- **Detailed Explanations:** Generates human-readable explanations for verdicts
- **Interactive UI:** Easy-to-use Streamlit interface with evidence exploration options
- **Claim Formatting Guidance:** Helps users format claims optimally for better results
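The entity-and-verb matching with keyword fallback described above can be sketched in a few lines. This is a toy illustration, not the project's implementation: the function name, weights, and matching rules are assumptions, and the real system extracts entities and verbs with spaCy rather than taking them as arguments.

```python
import re

def relevance_score(claim_entities, claim_verbs, claim_keywords, evidence_text,
                    entity_weight=3.0, verb_weight=2.0, keyword_weight=1.0):
    """Toy weighted relevance score: entity matches count most, then verb
    matches; plain keyword overlap serves as a fallback when neither hits."""
    text = evidence_text.lower()
    tokens = set(re.findall(r"[a-z']+", text))

    entity_hits = sum(1 for e in claim_entities if e.lower() in text)
    verb_hits = sum(1 for v in claim_verbs if v.lower() in tokens)

    score = entity_weight * entity_hits + verb_weight * verb_hits
    if score == 0:  # keyword fallback for evidence with no entity/verb overlap
        score = keyword_weight * sum(1 for k in claim_keywords if k.lower() in tokens)
    return score
```

Under this scheme, a snippet that names the claim's entities and repeats its verbs ranks above one that merely shares keywords.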
## System Architecture

AskVeracity is built with a modular architecture:

```
askveracity/
│
├── agent.py                 # LangGraph agent implementation
├── app.py                   # Main Streamlit application
├── config.py                # Configuration and API keys
├── evaluate_performance.py  # Performance evaluation script
│
├── modules/                 # Core functionality modules
│   ├── claim_extraction.py    # Claim extraction functionality
│   ├── evidence_retrieval.py  # Evidence gathering from various sources
│   ├── classification.py      # Truth classification logic
│   ├── explanation.py         # Explanation generation
│   ├── rss_feed.py            # RSS feed evidence retrieval
│   └── category_detection.py  # Claim category detection
│
├── utils/                   # Utility functions
│   ├── api_utils.py           # API rate limiting and error handling
│   ├── performance.py         # Performance tracking utilities
│   └── models.py              # Model initialization functions
│
├── results/                 # Performance evaluation results
│   ├── performance_results.json  # Evaluation metrics
│   └── *.png                     # Performance visualization charts
│
└── docs/                    # Documentation
    ├── assets/                # Images and other media
    │   └── app_screenshot.png # Application screenshot
    ├── architecture.md        # System design and component interactions
    ├── configuration.md       # Setup and environment configuration
    ├── data-handling.md       # Data processing and flow
    └── changelog.md           # Version history
```
## Claim Verification Process

1. **Claim Extraction:** The system extracts the main factual claim from user input
2. **Category Detection:** The claim is categorized (ai, science, technology, politics, business, world, sports, entertainment)
3. **Evidence Retrieval:** Evidence is gathered from multiple sources with category-specific prioritization
4. **Evidence Analysis:** Evidence relevance is assessed using entity and verb matching
5. **Classification:** A weighted evaluation determines the verdict with a confidence score
6. **Explanation Generation:** A human-readable explanation is generated
7. **Result Presentation:** Results are presented with detailed evidence exploration options
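The steps above can be sketched as a plain-Python pipeline. Everything here is illustrative: the function names, the keyword-based category detector, and the injected `retrieve`/`classify`/`explain` callables are stand-ins for the project's LangGraph- and LLM-backed modules.

```python
def detect_category(claim, ai_keywords=("model", " ai ", "neural")):
    """Toy category detector: naive keyword lookup, defaulting to 'world'."""
    padded = f" {claim.lower()} "
    return "ai" if any(k in padded for k in ai_keywords) else "world"

def verify_claim(user_input, retrieve, classify, explain):
    """Minimal sketch of the seven-step verification pipeline."""
    claim = user_input.strip()                         # 1. claim extraction (stubbed)
    category = detect_category(claim)                  # 2. category detection
    evidence = retrieve(claim, category)               # 3. evidence retrieval
    relevant = [e for e in evidence if e["score"] > 0] # 4. relevance filtering
    verdict, confidence = classify(claim, relevant)    # 5. classification
    return {
        "claim": claim,
        "category": category,
        "verdict": verdict,
        "confidence": confidence,
        "explanation": explain(verdict, relevant),     # 6. explanation
        "evidence": relevant,                          # 7. shown in the UI
    }
```

In the real system each stage is a separate module (see the architecture section), and the agent decides dynamically how much evidence to gather.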
## Setup and Installation

### Local Development

1. Clone this repository:
   ```
   git clone https://github.com/yourusername/askveracity.git
   cd askveracity
   ```
2. Install the required dependencies:
   ```
   pip install -r requirements.txt
   ```
3. Download the required spaCy model:
   ```
   python -m spacy download en_core_web_sm
   ```
4. Set up your API keys:

   **Option 1: Using Streamlit secrets (recommended for local development)**
   - Create a `.streamlit/secrets.toml` file with your API keys:
     ```toml
     OPENAI_API_KEY = "your_openai_api_key"
     NEWS_API_KEY = "your_news_api_key"
     FACTCHECK_API_KEY = "your_factcheck_api_key"
     ```

   **Option 2: Using environment variables**
   - Set environment variables directly or create a `.env` file:
     ```
     OPENAI_API_KEY=your_openai_api_key
     NEWS_API_KEY=your_news_api_key
     FACTCHECK_API_KEY=your_factcheck_api_key
     ```
5. Run the application:
   ```
   streamlit run app.py
   ```
### Deploying to Hugging Face Spaces

1. Create a new Space on Hugging Face:
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Select "Streamlit" as the SDK
   - Choose the hardware tier (recommended: 16 GB RAM)
2. Add the required API keys as secrets:
   - Go to the "Settings" tab of your Space
   - Navigate to the "Repository secrets" section
   - Add the following secrets:
     - `OPENAI_API_KEY`
     - `NEWS_API_KEY`
     - `FACTCHECK_API_KEY`
3. Push your code to the Hugging Face repository or upload files directly through the web interface
## Configuration Options

The system includes several configuration options in `config.py`:

1. **API Rate Limits:** Controls request rates to external APIs
   ```python
   RATE_LIMITS = {
       "newsapi": {"requests": 100, "period": 3600},     # 100 requests per hour
       "factcheck": {"requests": 1000, "period": 86400}, # 1000 requests per day
       # Other API limits...
   }
   ```
2. **Error Handling:** Configures retry behavior for API errors
   ```python
   ERROR_BACKOFF = {
       "max_retries": 5,
       "initial_backoff": 1,  # seconds
       "backoff_factor": 2,   # exponential backoff
   }
   ```
3. **RSS Feed Settings:** Customizes RSS feed handling
   ```python
   RSS_SETTINGS = {
       "max_feeds_per_request": 10,
       "max_age_days": 3,
       "timeout_seconds": 5,
       "max_workers": 5
   }
   ```
4. **Category-Specific RSS Feeds:** Defined in `modules/category_detection.py` for optimized evidence retrieval
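   The fallback logic (AI claims drawing on both AI and technology sources, other categories falling back to default feeds) could be modeled roughly as below. The mapping shape, function name, and placeholder feed identifiers are all assumptions; the real feed URLs live in `modules/category_detection.py`.

   ```python
   # Placeholder feed identifiers; real URLs are in modules/category_detection.py.
   CATEGORY_FEEDS = {
       "ai": ["<ai-feed-1>", "<ai-feed-2>"],
       "technology": ["<tech-feed-1>"],
       "science": ["<science-feed-1>"],
   }
   DEFAULT_FEEDS = ["<default-feed-1>", "<default-feed-2>"]

   def feeds_for(category):
       """AI claims use both AI-specific and technology feeds; categories
       without their own entry fall back to the default feed list."""
       if category == "ai":
           return CATEGORY_FEEDS["ai"] + CATEGORY_FEEDS["technology"]
       return CATEGORY_FEEDS.get(category, DEFAULT_FEEDS)
   ```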
## Performance Evaluation and Development

The system includes a performance evaluation script that tests the fact-checking capabilities using predefined claims:

```bash
python evaluate_performance.py [--limit N] [--output FILE]
```

The evaluation measures:

- **Accuracy:** How often the system correctly classifies claims
- **Safety Rate:** How often the system avoids making incorrect assertions
- **Processing Time:** Average time to process claims
- **Confidence Scores:** Average confidence in verdicts
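The accuracy and safety-rate metrics can be illustrated with a small helper. This is a sketch under one assumption: that a verdict counts as "safe" when it is either correct or explicitly "Uncertain", so the system never asserts a wrong answer. The function name and result shape are hypothetical.

```python
def evaluation_metrics(results):
    """Compute accuracy and safety rate from (predicted, expected) pairs.
    Safety counts both correct verdicts and explicit 'Uncertain' verdicts,
    since neither asserts an incorrect claim."""
    n = len(results)
    correct = sum(1 for pred, truth in results if pred == truth)
    safe = sum(1 for pred, truth in results if pred == truth or pred == "Uncertain")
    return {"accuracy": correct / n, "safety_rate": safe / n}
```

Note that under this definition the safety rate is always at least as high as the accuracy.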
Detailed results and visualizations are saved to the `results/` directory. These results are not tracked in the repository because they vary with:

- The evolving nature of available evidence
- News sources constantly updating and deprioritizing older content
- Changes in the recency and relevance of test claims

Developers should update the claims in `evaluate_performance.py` to use fresh, relevant examples and rerun the evaluation script to generate current performance metrics. This keeps performance evaluations relevant in a rapidly changing information landscape.
## Recent Improvements

- **Safety Rate Metric:** Added a metric measuring how often the system avoids making incorrect assertions
- **Refined Relevance Scoring:** Implemented weighted entity and verb matching, with a keyword fallback, for more accurate evidence relevance assessment during classification
- **Enhanced Evidence Relevance:** Improved entity and verb matching with weighted scoring prioritization, and increased evidence gathering from 5 to 10 items
- **Streamlined Architecture:** Removed source credibility and semantic analysis complexity for improved maintainability
- **Category-Specific Fallbacks:** AI claims use both AI-specific and technology sources; other categories fall back to default RSS feeds
- **OpenAlex Integration:** Replaced Semantic Scholar with OpenAlex for academic evidence
- **Improved User Experience:** Enhanced claim processing and result presentation
- **Better Robustness:** Improved handling of specialized topics and novel terms
## Limitations

AskVeracity has several limitations to be aware of:

- Performance is best for widely reported news and information published within the last 48 hours
- The system evaluates claims against current evidence; claims that were true in the past may be judged differently if circumstances have changed
- Technical or highly specialized claims may receive "Uncertain" verdicts if insufficient evidence is found
- Non-English claims have limited support
- The system is designed to indicate uncertainty when evidence is insufficient
- Results can vary based on available evidence and LLM behavior
## License

This project is licensed under the [MIT License](./LICENSE), allowing free use, modification, and distribution with proper attribution.

## Blog and Additional Resources

Read our detailed blog post about the project: [AskVeracity: An Agentic Fact-Checking System for Misinformation Detection](https://researchguy.in/askveracity-an-agentic-fact-checking-system-for-misinformation-detection/)

## Acknowledgements

- Built with [LangGraph](https://github.com/langchain-ai/langgraph) and [Streamlit](https://streamlit.io/)
- Uses OpenAI's API for language model capabilities
- Leverages open data sources including Wikipedia, Wikidata, and various RSS feeds

## Contact

For questions, feedback, or suggestions, please contact us at [email protected].