| # AskVeracity Configuration Guide | |
| This document describes how to set up and configure the AskVeracity fact-checking and misinformation detection system. | |
| ## Prerequisites | |
| Before setting up AskVeracity, ensure you have: | |
| - Python 3.8 or higher | |
| - pip (Python package installer) | |
| - Git (for cloning the repository) | |
| - API keys for external services | |
| ## Installation | |
| ### Local Development | |
| 1. Clone the repository: | |
| ```bash | |
| git clone https://github.com/yourusername/askveracity.git | |
| cd askveracity | |
| ``` | |
| 2. Install the required dependencies: | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| 3. Download the required spaCy model: | |
| ```bash | |
| python -m spacy download en_core_web_sm | |
| ``` | |
| ## API Key Configuration | |
| AskVeracity uses API keys to access external services. Only the OpenAI key is required; the others are optional but recommended. You can configure the keys in one of two ways: | |
| ### Option 1: Using Streamlit Secrets (Recommended for Local Development) | |
| 1. Create a `.streamlit` directory if it doesn't exist: | |
| ```bash | |
| mkdir -p .streamlit | |
| ``` | |
| 2. Create a `secrets.toml` file: | |
| ```bash | |
| cp .streamlit/secrets.toml.example .streamlit/secrets.toml | |
| ``` | |
| 3. Edit the `.streamlit/secrets.toml` file with your API keys: | |
| ```toml | |
| OPENAI_API_KEY = "your_openai_api_key" | |
| NEWS_API_KEY = "your_news_api_key" | |
| FACTCHECK_API_KEY = "your_factcheck_api_key" | |
| ``` | |
| ### Option 2: Using Environment Variables | |
| 1. Create a `.env` file in the root directory: | |
| ```bash | |
| touch .env | |
| ``` | |
| 2. Add your API keys to the `.env` file: | |
| ``` | |
| OPENAI_API_KEY=your_openai_api_key | |
| NEWS_API_KEY=your_news_api_key | |
| FACTCHECK_API_KEY=your_factcheck_api_key | |
| ``` | |
| 3. Load the environment variables: | |
| ```python | |
| # In Python | |
| from dotenv import load_dotenv | |
| load_dotenv() | |
| ``` | |
| Or in your terminal: | |
| ```bash | |
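| # Unix/Linux/macOS: plain `source .env` sets shell variables without exporting them, | |
| # so enable auto-export while the file is read | |
| set -a; source .env; set +a | |
| # Windows (or any platform): install the CLI with pip install "python-dotenv[cli]", then run | |
| dotenv run -- streamlit run app.py | |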
| ``` | |
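Whichever option you choose, the application has to find each key at runtime. The following is a minimal sketch of one possible lookup order, assuming a secrets mapping is checked first and environment variables are the fallback. `get_api_key` and its `secrets` parameter are hypothetical names for illustration and are not taken from `config.py`; in a Streamlit app the mapping would typically be `st.secrets`.

```python
import os
from typing import Mapping, Optional


def get_api_key(name: str, secrets: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the key called `name`, or None if it is not configured.

    Hypothetical helper: checks an optional secrets mapping first,
    then falls back to the process environment.
    """
    if secrets is not None and name in secrets:
        value = secrets[name]
        if value:
            return value
    # Treat empty strings as "not set" so blank .env entries are caught.
    return os.environ.get(name) or None
```

Calling `get_api_key("OPENAI_API_KEY")` with no mapping reads only the environment, which matches Option 2; passing a mapping lets the same helper cover Option 1.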
| ## Required API Keys | |
| AskVeracity uses the following external APIs: | |
| 1. **OpenAI API** (Required) | |
| - Used for claim extraction, classification, and explanation generation | |
| - Get an API key from [OpenAI's website](https://platform.openai.com/) | |
| 2. **News API** (Optional but recommended) | |
| - Used for retrieving news article evidence | |
| - Get an API key from [NewsAPI.org](https://newsapi.org/) | |
| 3. **Google Fact Check Tools API** (Optional but recommended) | |
| - Used for retrieving fact-checking evidence | |
| - Get an API key from [Google Fact Check Tools API](https://developers.google.com/fact-check/tools/api) | |
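Because only one of the three keys is required, it can help to check at startup which ones are present. The sketch below is an illustration under that assumption, not the project's own validation code; `check_api_keys`, `REQUIRED_KEYS`, and `OPTIONAL_KEYS` are hypothetical names, while the key names themselves come from the list above.

```python
from typing import Dict, List, Mapping

# Key names as listed in this guide; the grouping constants are hypothetical.
REQUIRED_KEYS = ["OPENAI_API_KEY"]
OPTIONAL_KEYS = ["NEWS_API_KEY", "FACTCHECK_API_KEY"]


def check_api_keys(env: Mapping[str, str]) -> Dict[str, List[str]]:
    """Report which required and optional keys are missing or empty in `env`."""
    return {
        "missing_required": [key for key in REQUIRED_KEYS if not env.get(key)],
        "missing_optional": [key for key in OPTIONAL_KEYS if not env.get(key)],
    }
```

A caller could pass `os.environ`, stop with an error when `missing_required` is non-empty, and log a warning for each entry in `missing_optional`.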
| ## Configuration Files | |
| ### config.py | |
| The main configuration file is `config.py`, which contains: | |
| - API key handling | |
| - Rate limiting configuration | |
| - Error backoff settings | |
| - RSS feed settings | |
| Important configuration sections in `config.py`: | |
| ```python | |
| # Rate limiting configuration | |
| RATE_LIMITS = { | |
| # api_name: {"requests": max_requests, "period": period_in_seconds} | |
| "newsapi": {"requests": 100, "period": 3600}, # 100 requests per hour | |
| "factcheck": {"requests": 1000, "period": 86400}, # 1000 requests per day | |
| "semantic_scholar": {"requests": 10, "period": 300}, # 10 requests per 5 minutes | |
| "wikidata": {"requests": 60, "period": 60}, # 60 requests per minute | |
| "wikipedia": {"requests": 200, "period": 60}, # 200 requests per minute | |
| "rss": {"requests": 300, "period": 3600} # 300 RSS requests per hour | |
| } | |
| # Error backoff settings | |
| ERROR_BACKOFF = { | |
| "max_retries": 5, | |
| "initial_backoff": 1, # seconds | |
| "backoff_factor": 2, # exponential backoff | |
| } | |
| # RSS feed settings | |
| RSS_SETTINGS = { | |
| "max_feeds_per_request": 10, # Maximum number of feeds to try per request | |
| "max_age_days": 3, # Maximum age of RSS items to consider | |
| "timeout_seconds": 5, # Timeout for RSS feed requests | |
| "max_workers": 5 # Number of parallel workers for fetching feeds | |
| } | |
| ``` | |
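To illustrate how a `{"requests": ..., "period": ...}` entry might be enforced, here is a minimal sliding-window sketch. It is an assumption about one reasonable approach, not the limiter AskVeracity actually ships; `SlidingWindowLimiter`, `allow`, and the injectable `clock` are hypothetical names.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict

# Subset of the limits shown above, copied so the sketch is self-contained.
RATE_LIMITS = {
    "newsapi": {"requests": 100, "period": 3600},
    "wikidata": {"requests": 60, "period": 60},
}


class SlidingWindowLimiter:
    """Hypothetical limiter: at most `requests` calls per `period` seconds per API."""

    def __init__(
        self,
        limits: Dict[str, Dict[str, int]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limits = limits
        self._clock = clock
        self._calls: Dict[str, Deque[float]] = {name: deque() for name in limits}

    def allow(self, api_name: str) -> bool:
        """Record a call and return True if `api_name` is still under its limit."""
        limit = self._limits[api_name]
        now = self._clock()
        calls = self._calls[api_name]
        # Drop timestamps that have fallen out of the current window.
        while calls and now - calls[0] >= limit["period"]:
            calls.popleft()
        if len(calls) >= limit["requests"]:
            return False
        calls.append(now)
        return True
```

With `limiter = SlidingWindowLimiter(RATE_LIMITS)`, a caller would check `limiter.allow("newsapi")` before each request and skip or delay the call when it returns `False`.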
| ### Category-Specific RSS Feeds | |
| Category-specific RSS feeds are defined in `modules/category_detection.py`. These feeds are used to prioritize sources based on the detected claim category: | |
| ```python | |
| CATEGORY_SPECIFIC_FEEDS = { | |
| "ai": [ | |
| "https://www.artificialintelligence-news.com/feed/", | |
| "https://openai.com/news/rss.xml", | |
| # Additional AI-specific feeds | |
| ], | |
| "science": [ | |
| "https://www.science.org/rss/news_current.xml", | |
| "https://www.nature.com/nature.rss", | |
| # Additional science feeds | |
| ], | |
| # Additional categories | |
| } | |
| ``` | |
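The guide does not show how prioritization works internally, so the following is only a sketch of one way it could be done: category feeds first, then a general list, with duplicates removed. `feeds_for_claim` and `general_feeds` are hypothetical names, and `max_feeds` is assumed to mirror `max_feeds_per_request` from `RSS_SETTINGS`.

```python
from typing import Dict, List, Optional

# Trimmed copy of the mapping above so the sketch runs on its own.
CATEGORY_SPECIFIC_FEEDS: Dict[str, List[str]] = {
    "ai": [
        "https://www.artificialintelligence-news.com/feed/",
        "https://openai.com/news/rss.xml",
    ],
    "science": [
        "https://www.science.org/rss/news_current.xml",
        "https://www.nature.com/nature.rss",
    ],
}


def feeds_for_claim(
    category: Optional[str], general_feeds: List[str], max_feeds: int = 10
) -> List[str]:
    """Put category-specific feeds first, then general feeds, without duplicates."""
    prioritized = CATEGORY_SPECIFIC_FEEDS.get(category or "", []) + general_feeds
    unique = list(dict.fromkeys(prioritized))  # keeps first-seen order
    return unique[:max_feeds]
```

An unknown or missing category simply falls back to the general list, so adding a new category only requires a new key in the dictionary.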
| ## Hugging Face Spaces Deployment | |
| ### Setting Up a Space | |
| 1. Create a new Space on Hugging Face: | |
| - Go to https://huggingface.co/spaces | |
| - Click "Create new Space" | |
| - Select "Streamlit" as the SDK | |
| - Choose the hardware tier (use the default 16GB RAM) | |
| 2. Upload the project files: | |
| - You can upload files directly through the Hugging Face web interface | |
| - Alternatively, use Git to push to the Hugging Face repository | |
| - Make sure to include all necessary files, including `requirements.txt` | |
| ### Setting Up Secrets | |
| 1. Add API keys as secrets: | |
| - Go to the "Settings" tab of your Space | |
| - Navigate to the "Repository secrets" section | |
| - Add your API keys: | |
| - `OPENAI_API_KEY` | |
| - `NEWS_API_KEY` | |
| - `FACTCHECK_API_KEY` | |
| ### Configuring the Space | |
| Edit the metadata in the `README.md` file: | |
| ```yaml | |
| --- | |
| title: Askveracity | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: pink | |
| sdk: streamlit | |
| sdk_version: 1.44.1 | |
| app_file: app.py | |
| pinned: false | |
| license: mit | |
| short_description: Fact-checking and misinformation detection tool. | |
| --- | |
| ``` | |
| ## Custom Configuration | |
| ### Adjusting Rate Limits | |
| You can adjust the rate limits in `config.py` based on your API subscription levels: | |
| ```python | |
| # Update for higher tier News API subscription | |
| RATE_LIMITS["newsapi"] = {"requests": 500, "period": 3600} # 500 requests per hour | |
| ``` | |
| ### Modifying RSS Feeds | |
| The list of RSS feeds can be found in `modules/rss_feed.py` and category-specific feeds in `modules/category_detection.py`. You can add or remove feeds as needed. | |
| ### Performance Evaluation | |
| The system includes a performance evaluation script `evaluate_performance.py` that: | |
| 1. Runs the fact-checking system on a predefined set of test claims | |
| 2. Calculates accuracy, safety rate, processing time, and confidence metrics | |
| 3. Generates visualization charts in the `results/` directory | |
| 4. Saves detailed results to `results/performance_results.json` | |
| To run the performance evaluation: | |
| ```bash | |
| python evaluate_performance.py [--limit N] [--output FILE] | |
| ``` | |
| - `--limit N`: Limit evaluation to first N claims (default: all) | |
| - `--output FILE`: Save results to FILE (default: performance_results.json) | |
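For readers who want to see how these two flags could be wired up, here is a hedged sketch using `argparse`. It is not the contents of `evaluate_performance.py`; `build_parser` and `select_claims` are hypothetical names, and only the flag names and defaults come from the list above.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the two documented flags."""
    parser = argparse.ArgumentParser(description="Evaluate fact-checking performance.")
    parser.add_argument("--limit", type=int, default=None, metavar="N",
                        help="evaluate only the first N claims (default: all)")
    parser.add_argument("--output", default="performance_results.json", metavar="FILE",
                        help="file to save results to")
    return parser


def select_claims(claims: List[str], limit: Optional[int]) -> List[str]:
    """Return the first `limit` claims, or all of them when no limit is given."""
    return claims if limit is None else claims[:limit]
```

Keeping `--limit` as `None` by default makes "all claims" the explicit fallback rather than relying on a magic number.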
| ## Running the Application | |
| Start the Streamlit app: | |
| ```bash | |
| streamlit run app.py | |
| ``` | |
| The application will be available at http://localhost:8501 by default. | |
| ## Troubleshooting | |
| ### API Key Issues | |
| If you encounter API key errors: | |
| 1. Verify that your API keys are set correctly | |
| 2. Check the logs for specific error messages | |
| 3. Make sure your API keys have not expired and have not exceeded their rate limits | |
| ### Model Loading Errors | |
| If the spaCy model fails to load: | |
| ```bash | |
| # Reinstall the model | |
| python -m spacy download en_core_web_sm --force-reinstall | |
| ``` | |
| ### Rate Limiting | |
| If you encounter rate limiting issues: | |
| 1. Lower the request limits in `RATE_LIMITS` in `config.py` so the application stays under each API's quota | |
| 2. Increase the backoff parameters in `ERROR_BACKOFF` | |
| 3. Subscribe to higher API tiers if available | |
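To see what changing the `ERROR_BACKOFF` parameters does, it helps to compute the retry delays they imply. The sketch below assumes the common formula `initial_backoff * backoff_factor ** attempt`; the guide only says the backoff is exponential, so the exact formula and the helper name `backoff_delays` are assumptions.

```python
from typing import Dict, List

# Default values as shown in config.py above.
ERROR_BACKOFF = {"max_retries": 5, "initial_backoff": 1, "backoff_factor": 2}


def backoff_delays(settings: Dict[str, int]) -> List[int]:
    """Delay in seconds before each retry, assuming initial * factor ** attempt."""
    return [
        settings["initial_backoff"] * settings["backoff_factor"] ** attempt
        for attempt in range(settings["max_retries"])
    ]
```

Under this assumption the defaults give delays of 1, 2, 4, 8, and 16 seconds (31 seconds in total), and doubling `initial_backoff` to 2 doubles the total wait to 62 seconds.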
| ### Memory Issues | |
| If the application crashes due to memory issues: | |
| 1. Reduce the number of parallel workers in `RSS_SETTINGS` | |
| 2. Limit the maximum number of evidence items processed | |
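Both suggestions can be pictured together in a short sketch: a bounded thread pool for fetching and a cap on how many evidence items are kept. This is an illustration, not the project's retrieval code; `fetch_evidence`, `fetch_feed`, and `max_items` are hypothetical names, while `max_workers` corresponds to the setting in `RSS_SETTINGS`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def fetch_evidence(
    feed_urls: Iterable[str],
    fetch_feed: Callable[[str], List[str]],
    max_workers: int = 2,
    max_items: int = 20,
) -> List[str]:
    """Fetch feeds with a bounded worker pool and cap the evidence that is kept."""
    evidence: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order; an exception in fetch_feed
        # is re-raised here when its result is reached.
        for items in pool.map(fetch_feed, feed_urls):
            evidence.extend(items)
            # Stop collecting once the cap is reached. Fetches that were
            # already submitted still run to completion before the pool exits.
            if len(evidence) >= max_items:
                break
    return evidence[:max_items]
```

Fewer workers means fewer feeds held in memory at the same time, at the cost of slower retrieval; the item cap bounds what is passed on to later processing.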
| ## Performance Optimization | |
| For better performance: | |
| 1. Upgrade to a higher-tier OpenAI model for improved accuracy | |
| 2. Increase the number of parallel workers for evidence retrieval (for example, `max_workers` in `RSS_SETTINGS`), memory permitting | |
| 3. Add more relevant RSS feeds to improve evidence gathering |