# Data Handling in AskVeracity

This document explains how data flows through the AskVeracity fact-checking and misinformation detection system, from user input to final verification results.

## Data Flow Overview

```
User Input → Claim Extraction → Category Detection → Evidence Retrieval → Evidence Analysis → Classification → Explanation → Result Display
```
## User Input Processing

### Input Sanitization and Extraction

1. **Input Acceptance:** The system accepts user input as free-form text through the Streamlit interface.
2. **Claim Extraction** (`modules/claim_extraction.py`):
   - For concise inputs (<30 words), the system preserves the input as-is
   - For longer texts, an LLM extracts the main factual claim
   - Validation ensures the extraction doesn't add information not present in the original
   - Entity preservation is verified using spaCy's NER
3. **Claim Shortening:**
   - For evidence retrieval, claims are shortened to preserve key entities and context
   - Preserves entity mentions, key nouns, titles, country references, and negation contexts
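
As a rough illustration of the extraction step above, the sketch below applies the 30-word threshold and the spaCy-based entity-preservation check. The `extract_with_llm` callable and the function name are placeholders, not the project's actual interface.

```python
# Minimal sketch of the extraction threshold and entity-preservation check.
# extract_with_llm is a placeholder for the real LLM call in
# modules/claim_extraction.py; names here are illustrative only.
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_claim(user_input: str, extract_with_llm) -> str:
    """Return the main factual claim from free-form user input."""
    # Concise inputs (fewer than 30 words) are preserved as-is.
    if len(user_input.split()) < 30:
        return user_input

    # Longer texts: ask the LLM for the main factual claim.
    candidate = extract_with_llm(user_input)

    # Validation: the extraction must not introduce entities that are
    # absent from the original input (checked with spaCy NER).
    original_entities = {ent.text.lower() for ent in nlp(user_input).ents}
    candidate_entities = {ent.text.lower() for ent in nlp(candidate).ents}
    if not candidate_entities.issubset(original_entities):
        return user_input  # fall back to the original text
    return candidate
```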
## Evidence Retrieval and Processing

### Multi-source Evidence Gathering

Evidence is collected from multiple sources in parallel (`modules/evidence_retrieval.py`); a sketch of this parallel retrieval pattern follows the list below.

1. **Category Detection** (`modules/category_detection.py`):
   - Detects the claim category (ai, science, technology, politics, business, world, sports, entertainment)
   - Prioritizes sources based on category
   - No category receives preferential weighting; assignment is based purely on keyword matching
2. **Wikipedia** evidence:
   - Search the Wikipedia API for relevant articles
   - Extract introductory paragraphs
   - Process up to 3 top search results in parallel
3. **Wikidata** evidence:
   - SPARQL queries for structured data
   - Entity extraction with descriptions
4. **News API** evidence:
   - Retrieval from NewsAPI.org with date filtering
   - Prioritizes recent articles
   - Extracts titles, descriptions, and content snippets
5. **RSS Feed** evidence (`modules/rss_feed.py`):
   - Parallel retrieval from multiple RSS feeds
   - Category-specific feed selection
   - Relevance and recency scoring
6. **ClaimReview** evidence:
   - Google's Fact Check Tools API integration
   - Retrieves fact-checks from fact-checking organizations
   - Includes ratings and publisher information
7. **Scholarly** evidence:
   - OpenAlex API for academic sources
   - Extracts titles, abstracts, and publication dates
8. **Category Fallback** mechanism:
   - For AI claims, uses both AI-specific and technology RSS feeds simultaneously
   - For other categories, falls back to default RSS feeds
   - Ensures robust evidence retrieval across related domains
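
The per-source retrieval logic lives in the modules listed above; the sketch below only illustrates the parallel gathering pattern. The source-function names in the usage comment are hypothetical.

```python
# Illustrative sketch of parallel evidence gathering. The retrieval callables
# passed in stand in for the real source modules; this is not the actual
# interface of modules/evidence_retrieval.py.
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List

def gather_evidence(claim: str, sources: Dict[str, Callable[[str], List[dict]]]) -> List[dict]:
    """Query every evidence source in parallel and merge the results."""
    evidence: List[dict] = []
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {pool.submit(fetch, claim): name for name, fetch in sources.items()}
        for future in as_completed(futures):
            try:
                evidence.extend(future.result())
            except Exception:
                # A failing source degrades gracefully instead of aborting
                # the whole retrieval step.
                continue
    return evidence

# Hypothetical usage; each value is a function returning a list of evidence dicts:
# evidence = gather_evidence(claim, {
#     "wikipedia": fetch_wikipedia,
#     "wikidata": fetch_wikidata,
#     "news_api": fetch_news_api,
#     "rss": fetch_rss_feeds,
# })
```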
### Evidence Preprocessing

Each evidence item is standardized to a consistent format:

```
Title: [title], Source: [source], Date: [date], URL: [url], Content: [content snippet]
```

Length limits are applied to reduce token usage:
- Content snippets are limited to ~1000 characters
- Evidence items are truncated while maintaining context
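
A minimal sketch of this standardization step, assuming each raw evidence item arrives as a dictionary; the dictionary keys are assumptions, not the project's schema.

```python
# Hedged sketch of the standardized evidence string with the ~1000-character
# content cap described above.
def format_evidence_item(item: dict, max_content_chars: int = 1000) -> str:
    content = (item.get("content") or "")[:max_content_chars]
    return (
        f"Title: {item.get('title', '')}, "
        f"Source: {item.get('source', '')}, "
        f"Date: {item.get('date', '')}, "
        f"URL: {item.get('url', '')}, "
        f"Content: {content}"
    )
```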
## Evidence Analysis and Relevance Ranking

### Relevance Assessment

Evidence is analyzed and scored for relevance:

1. **Component Extraction:**
   - Extract entities, verbs, and keywords from the claim
   - Use NLP processing to identify key claim components
2. **Entity and Verb Matching:**
   - Match entities from claim to evidence (case-sensitive and case-insensitive)
   - Match verbs from claim to evidence
   - Score based on matches (entity matches weighted higher than verb matches)
3. **Temporal Relevance:**
   - Detection of temporal indicators in claims
   - Date-based filtering for time-sensitive claims
   - Adjusts evidence retrieval window based on claim temporal context
4. **Scoring Formula:**
   ```
   final_score = (entity_matches * 3.0) + (verb_matches * 2.0)
   ```
   If no entity or verb matches, fall back to keyword matching:
   ```
   final_score = keyword_matches * 1.0
   ```
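
The scoring formula can be sketched as follows; matching is reduced to lower-cased substring checks purely for illustration, rather than the system's actual NLP matching.

```python
# Sketch of the relevance scoring described above.
def score_evidence(evidence_text: str, entities: list, verbs: list, keywords: list) -> float:
    text = evidence_text.lower()
    entity_matches = sum(1 for e in entities if e.lower() in text)
    verb_matches = sum(1 for v in verbs if v.lower() in text)

    if entity_matches or verb_matches:
        # Entity matches are weighted higher than verb matches.
        return entity_matches * 3.0 + verb_matches * 2.0

    # Fallback: plain keyword matching when nothing else matches.
    return sum(1 for k in keywords if k.lower() in text) * 1.0
```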
### Evidence Selection

The system selects the most relevant evidence:

1. **Relevance Sorting:**
   - Evidence items sorted by relevance score (descending)
   - Top 10 most relevant items selected
2. **Handling No Evidence:**
   - If no evidence is found, a placeholder is returned
   - Ensures graceful handling of edge cases
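
A compact sketch of the selection step, assuming scored items arrive as (score, text) pairs; the placeholder wording is an assumption.

```python
# Sketch of relevance-based selection: sort descending, keep the top 10,
# and return a placeholder when nothing was retrieved.
def select_evidence(scored_items: list, top_k: int = 10) -> list:
    if not scored_items:
        # Placeholder text is illustrative, not the system's exact wording.
        return ["No relevant evidence found for this claim."]
    ranked = sorted(scored_items, key=lambda pair: pair[0], reverse=True)
    return [text for _, text in ranked[:top_k]]
```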
## Truth Classification

### Evidence Classification (`modules/classification.py`)

Each evidence item is classified individually:

1. **LLM Classification:**
   - Each evidence item is analyzed by an LLM
   - Classification categories: support, contradict, insufficient
   - Confidence score (0-100) assigned to each classification
   - Structured output parsing with fallback mechanisms
2. **Tense Normalization:**
   - Normalizes verb tenses in claims to ensure consistent classification
   - Converts present simple and perfect forms to past tense equivalents
   - Preserves semantic equivalence across tense variations
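
To make the per-item classification concrete, here is a hedged sketch with defensive output parsing; the prompt text and the `llm` callable are placeholders rather than the project's actual prompt or model wrapper.

```python
# Illustrative per-item classification with structured-output parsing and
# a fallback path when the response cannot be parsed.
import json

LABELS = {"support", "contradict", "insufficient"}

def classify_evidence(claim: str, evidence: str, llm) -> tuple:
    prompt = (
        "Decide whether the evidence supports, contradicts, or is insufficient "
        'for the claim. Reply as JSON with keys "label" and "confidence" (0-100).\n'
        f"Claim: {claim}\nEvidence: {evidence}"
    )
    raw = llm(prompt)
    try:
        parsed = json.loads(raw)
        label = str(parsed.get("label", "insufficient")).lower()
        confidence = int(parsed.get("confidence", 0))
    except (ValueError, TypeError):
        # Fallback when the response is not valid JSON.
        label, confidence = "insufficient", 0
    if label not in LABELS:
        label = "insufficient"
    return label, max(0, min(100, confidence))
```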
### Verdict Aggregation

Evidence classifications are aggregated to determine the final verdict:

1. **Weighted Aggregation:**
   - 55% weight for count of support/contradict items
   - 45% weight for quality (confidence) of support/contradict items
2. **Confidence Calculation:**
   - Formula: `1.0 - (min_score / max_score)`
   - Higher confidence for consistent evidence
   - Lower confidence for mixed or insufficient evidence
3. **Final Verdict Categories:**
   - "True (Based on Evidence)"
   - "False (Based on Evidence)"
   - "Uncertain"
## Explanation Generation

### Explanation Creation (`modules/explanation.py`)

Human-readable explanations are generated based on the verdict:

1. **Template Selection:**
   - Different prompts for true, false, and uncertain verdicts
   - Special handling for claims containing negation
2. **Confidence Communication:**
   - Translation of confidence scores to descriptive language
   - Clear communication of certainty/uncertainty
3. **Very Low Confidence Handling:**
   - Special explanations for verdicts with very low confidence (<10%)
   - Strong recommendations to verify with authoritative sources
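
The confidence-to-language mapping can be sketched like this; only the <10% "very low confidence" cutoff comes from this document, while the other thresholds and the wording are assumptions.

```python
# Sketch of translating a numeric confidence into descriptive language.
def describe_confidence(confidence: float) -> str:
    if confidence < 0.10:
        # Very low confidence: strongly recommend authoritative sources.
        return ("with very low confidence; please verify this claim "
                "with authoritative sources")
    if confidence < 0.40:
        return "with low confidence"
    if confidence < 0.70:
        return "with moderate confidence"
    return "with high confidence"
```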
## Result Presentation

Results are presented in the Streamlit UI with multiple components:

1. **Verdict Display:**
   - Color-coded verdict (green for true, red for false, gray for uncertain)
   - Confidence percentage
   - Explanation text
2. **Evidence Presentation:**
   - Tabbed interface for different evidence views, with URLs if available
   - Supporting and contradicting evidence tabs
   - Source distribution summary
3. **Input Guidance:**
   - Tips for claim formatting
   - Guidance for time-sensitive claims
   - Suggestions for verb tense based on claim age
4. **Processing Insights:**
   - Processing time
   - AI reasoning steps
   - Source distribution statistics
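
A minimal Streamlit sketch of the color-coded verdict display; the real UI is tabbed and richer, and the widget choices here (success/error/info) only approximate the green/red/gray scheme.

```python
# Minimal sketch of a color-coded verdict display in Streamlit.
# st.info is used as a stand-in for the gray "uncertain" styling.
import streamlit as st

def show_verdict(verdict: str, confidence: float, explanation: str) -> None:
    summary = f"{verdict} (confidence: {confidence:.0%})"
    if verdict.startswith("True"):
        st.success(summary)
    elif verdict.startswith("False"):
        st.error(summary)
    else:
        st.info(summary)
    st.write(explanation)
```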
## Data Persistence and Privacy

AskVeracity prioritizes user privacy:

1. **No Data Storage:**
   - User claims are not stored persistently
   - Results are maintained only in session state
   - No user data is collected or retained
2. **Session Management:**
   - Streamlit session state manages the current user interaction
   - The session is cleared when a new verification starts
3. **API Interaction:**
   - External API calls are governed by the respective providers' privacy policies
   - OpenAI API usage is subject to OpenAI's data handling practices
4. **Caching:**
   - Model caching for performance
   - Resource cleanup on application termination
## Performance Tracking

The system includes a performance tracking utility (`utils/performance.py`):

1. **Metrics Tracked:**
   - Claims processed count
   - Evidence retrieval success rates
   - Processing times
   - Confidence scores
   - Source types used
   - Temporal relevance
2. **Usage:**
   - Performance metrics are logged during processing
   - Summary of select metrics available in the final result
   - Used for system optimization
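
A simple sketch of what such a tracker can look like; the class and method names are assumptions, and the actual interface of `utils/performance.py` may differ.

```python
# Illustrative performance tracker accumulating a subset of the metrics
# listed above.
import time

class PerformanceTracker:
    def __init__(self):
        self.claims_processed = 0
        self.processing_times = []
        self.confidence_scores = []
        self.source_counts = {}

    def record(self, start_time: float, confidence: float, sources: list) -> None:
        self.claims_processed += 1
        self.processing_times.append(time.time() - start_time)
        self.confidence_scores.append(confidence)
        for source in sources:
            self.source_counts[source] = self.source_counts.get(source, 0) + 1

    def summary(self) -> dict:
        n = max(self.claims_processed, 1)
        return {
            "claims_processed": self.claims_processed,
            "avg_processing_time": sum(self.processing_times) / n,
            "avg_confidence": sum(self.confidence_scores) / n,
            "source_counts": self.source_counts,
        }
```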
## Performance Evaluation

The system includes a performance evaluation script (`evaluate_performance.py`):

1. **Test Claims:**
   - Predefined set of test claims with known ground truth labels
   - Claims categorized as "True", "False", or "Uncertain"
2. **Metrics** (see the sketch after this list):
   - Overall accuracy: percentage of claims correctly classified according to ground truth
   - Safety rate: percentage of claims either correctly classified or safely categorized as "Uncertain" rather than making an incorrect assertion
   - Per-class accuracy and safety rates
   - Average processing time
   - Average confidence score
   - Classification distributions
3. **Visualization:**
   - Charts for accuracy by classification type
   - Charts for safety rate by classification type
   - Processing time by classification type
   - Confidence scores by classification type
4. **Results Storage:**
   - Detailed results saved to a JSON file
   - Visualization charts saved as PNG files
   - All results stored in the `results/` directory
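
The two headline metrics from item 2 can be computed as in the sketch below, assuming predictions and ground-truth labels use the strings "True", "False", and "Uncertain".

```python
# Sketch of overall accuracy and safety rate; a prediction is "safe" if it
# is correct or if it cautiously returns "Uncertain" instead of a wrong
# True/False assertion.
def evaluate(predictions: list, ground_truth: list) -> dict:
    total = len(ground_truth)
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    safe = sum((p == g) or (p == "Uncertain") for p, g in zip(predictions, ground_truth))
    return {
        "accuracy": correct / total if total else 0.0,
        "safety_rate": safe / total if total else 0.0,
    }
```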
## Error Handling and Resilience

The system implements robust error handling:

1. **API Error Handling** (`utils/api_utils.py`; see the sketch at the end of this section):
   - Decorator-based error handling
   - Exponential backoff for retries
   - Rate limiting that respects API constraints
2. **Safe JSON Parsing:**
   - Defensive parsing of API responses
   - Fallback mechanisms for invalid responses
3. **Graceful Degradation:**
   - Multiple fallback strategies
   - Core functionality is preserved even when some sources fail
4. **Fallback Mechanisms:**
   - Fallback for truth classification when the classifier is not called
   - Fallback for explanation generation when the explanation generator is not called
   - Ensures complete results even with partial component failures
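
As a closing illustration of the decorator-based error handling and exponential backoff mentioned in item 1, here is a hedged sketch; the decorator name, defaults, and the example function are assumptions, not the actual contents of `utils/api_utils.py`.

```python
# Hedged sketch of decorator-based retries with exponential backoff.
import functools
import time

def with_retries(max_attempts: int = 3, base_delay: float = 1.0):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise
                    # Exponential backoff: base_delay, then 2x, 4x, ...
                    time.sleep(base_delay * (2 ** attempt))
        return wrapper
    return decorator

@with_retries(max_attempts=3)
def call_external_api(url: str) -> dict:
    """Hypothetical API call wrapped with retry behavior."""
    ...
```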