New Features Added to Active Reading Demo
Category Selection Feature
What It Does
Users can now select the document category manually, overriding automatic detection:
Available Categories:
- Auto-Detect (default) - AI detects domain automatically
- Finance - Financial reports, earnings, budgets
- Legal - Contracts, agreements, policies
- Technical - API docs, manuals, specifications
- Medical - Clinical trials, research, treatments
- General - Any other document type
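A minimal sketch of how the override could behave: a manual choice always wins, and only "Auto-Detect" falls through to detection. The names `resolve_category`, `detect_category`, and `KEYWORD_HINTS` are hypothetical, and the keyword-count detector is a stand-in, since the demo's actual detection logic is not described here.

```python
CATEGORIES = ["Auto-Detect", "Finance", "Legal", "Technical", "Medical", "General"]

# Hypothetical keyword hints; a stand-in for the demo's real (undocumented) detector.
KEYWORD_HINTS = {
    "Finance": ["revenue", "profit", "fiscal", "earnings"],
    "Legal": ["agreement", "liability", "governed by", "termination"],
    "Technical": ["endpoint", "version", "http", "authentication"],
    "Medical": ["dosage", "patients", "efficacy", "clinical"],
}


def detect_category(text: str) -> str:
    """Pick the category whose hint words occur most often; fall back to General."""
    lowered = text.lower()
    scores = {
        category: sum(lowered.count(word) for word in words)
        for category, words in KEYWORD_HINTS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "General"


def resolve_category(selected: str, text: str) -> str:
    """Return the user's choice unless it is Auto-Detect, in which case detect."""
    if selected not in CATEGORIES:
        raise ValueError(f"Unknown category: {selected!r}")
    if selected == "Auto-Detect":
        return detect_category(text)
    return selected
```

With this shape, selecting "Legal" for a document full of financial terms still yields "Legal", which is the point of the manual override.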
Category-Specific Extraction Patterns
Finance Category
- Revenue: $150 million revenue, sales of $2.5B
- Profit: profit margin 25%, net profit $50M
- Growth: 15% growth, increased by 20%
- Dates: Q3 2024, fiscal year 2023
- Employees: hire 200 engineers, workforce of 5000
- Market Cap: market cap $10B
Legal Category
- Parties: between Company A and Company B
- Terms: term of 36 months, duration 3 years
- Liability: liability not to exceed $1M
- Termination: 90 days written notice
- Governing Law: governed by laws of Delaware
- Effective Date: effective January 1, 2024
Technical Category
- API Endpoints: GET /api/users, POST /auth/login
- Versions: version 2.1.0, v3.5
- Response Time: response time 150ms
- Rate Limits: 1000 requests per minute
- Authentication: OAuth 2.0, JWT tokens
- Status Codes: HTTP 200, status code 404
Medical Category
- Dosage: 50mg daily, 100ml twice daily
- Duration: treatment for 12 weeks
- Efficacy: 85% efficacy rate
- Side Effects: side effects in 12% of patients
- Patient Count: 500 patients enrolled
- P-Values: p<0.001, p=0.025
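The sample phrases above can be matched with ordinary regular expressions. The sketch below covers only a few fields per category and is illustrative: the regexes, the field names, and the `extract_by_category` helper are assumptions written to fit the listed examples, not the demo's actual patterns.

```python
import re

# Dollar amount with an optional unit, e.g. "$150 million" or "$2.5B".
MONEY = r"\$\d+(?:\.\d+)?\s?(?:million|billion|[MB])?"

# Illustrative subset of patterns; the demo's real regexes are not shown here.
CATEGORY_PATTERNS = {
    "Finance": {
        "revenue": MONEY + r"\s+revenue|(?:revenue|sales)\s+of\s+" + MONEY,
        "growth": r"\d+(?:\.\d+)?%\s+growth|increased\s+by\s+\d+(?:\.\d+)?%",
        "date": r"\bQ[1-4]\s+\d{4}\b|\bfiscal\s+year\s+\d{4}\b",
    },
    "Technical": {
        "api_endpoint": r"\b(?:GET|POST|PUT|PATCH|DELETE)\s+/[\w/.{}-]*",
        "version": r"\bv(?:ersion\s+)?\d+(?:\.\d+)+",
        "status_code": r"\bHTTP\s+\d{3}\b|\bstatus\s+code\s+\d{3}\b",
    },
    "Medical": {
        "dosage": r"\b\d+(?:\.\d+)?\s?(?:mg|ml)\s+(?:twice\s+)?daily",
        "p_value": r"\bp\s?[<=]\s?0?\.\d+",
    },
}


def extract_by_category(text: str, category: str) -> dict:
    """Return {field: [matched phrases]} for every pattern that matches."""
    results = {}
    for field, pattern in CATEGORY_PATTERNS.get(category, {}).items():
        matches = [m.group(0) for m in re.finditer(pattern, text, flags=re.IGNORECASE)]
        if matches:
            results[field] = matches
    return results


sample = "The company reported $150 million revenue in Q3 2024, with 15% growth."
print(extract_by_category(sample, "Finance"))
```

Keeping the patterns in a plain dictionary keyed by category means a new domain can be added without touching the extraction loop. Categories with no entry, such as General here, simply return an empty result.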
Custom Keys Feature
What It Does
Users can specify their own extraction terms as comma-separated values:
Example Inputs:
CEO, budget, deadline, timeline
risk assessment, compliance, audit
performance, scalability, security
treatment, dosage, clinical trial
How It Works
- Smart Extraction: Finds sentences containing the custom terms
- Context Preservation: Returns full sentences, not just keywords
- Confidence Scoring: Shows extraction confidence levels
- JSON Output: Structured data for easy integration
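The first two steps above, splitting the comma-separated input and returning whole sentences that contain each term, could look roughly like this. The function names and the naive punctuation-based sentence splitter are assumptions for illustration. Confidence scoring is left out because the demo's scoring method is not described here.

```python
import re


def parse_custom_keys(raw: str) -> list[str]:
    """Split a comma-separated string into trimmed, non-empty terms."""
    return [key.strip() for key in raw.split(",") if key.strip()]


def extract_custom(text: str, raw_keys: str) -> dict[str, list[str]]:
    """Map each custom term to the full sentences that mention it."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    results = {}
    for key in parse_custom_keys(raw_keys):
        # Whole-word, case-insensitive match; re.escape keeps multi-word terms literal.
        pattern = re.compile(r"\b" + re.escape(key) + r"\b", re.IGNORECASE)
        hits = [s for s in sentences if pattern.search(s)]
        if hits:
            results[key] = hits
    return results


doc = "The CEO announced a new budget. Hiring continues. The deadline is in March."
print(extract_custom(doc, "CEO, budget, deadline, timeline"))
```

Terms with no matching sentence (here, "timeline") are omitted from the result rather than returned as empty lists.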
New Strategy: Category-Specific Extraction
What's New
Added a specialized strategy that combines:
- Category-specific patterns for targeted extraction
- Custom key extraction for user-defined terms
- Structured output with confidence scores
- Domain expertise for each business category
Example Output
{
  "category": "Finance",
  "extracted_data": {
    "revenue": ["$150 million", "$2.5 billion sales"],
    "growth": ["15% increase", "20% growth rate"],
    "date": ["Q3 2024", "fiscal year 2023"]
  },
  "custom_extractions": {
    "CEO": ["CEO announced plans to expand", "CEO John Smith reported"],
    "investment": ["$50M investment in AI", "investment in new markets"]
  },
  "confidence_scores": {
    "revenue": 8.5,
    "custom_CEO": 6.2
  }
}
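As a sketch of downstream integration, the snippet below parses the example output and keeps only the fields whose confidence meets a threshold. It assumes, based on the example alone, that scores for custom keys are stored under a `custom_` prefix. The `fields_above` helper and the threshold values are hypothetical, and the scale of the scores is not documented here.

```python
import json

# The example output shown above, as a JSON string.
EXAMPLE_OUTPUT = """
{
  "category": "Finance",
  "extracted_data": {
    "revenue": ["$150 million", "$2.5 billion sales"],
    "growth": ["15% increase", "20% growth rate"],
    "date": ["Q3 2024", "fiscal year 2023"]
  },
  "custom_extractions": {
    "CEO": ["CEO announced plans to expand", "CEO John Smith reported"],
    "investment": ["$50M investment in AI", "investment in new markets"]
  },
  "confidence_scores": {"revenue": 8.5, "custom_CEO": 6.2}
}
"""


def fields_above(result: dict, threshold: float) -> dict:
    """Return scored fields whose confidence is at least `threshold`."""
    selected = {}
    for name, score in result.get("confidence_scores", {}).items():
        if score < threshold:
            continue
        if name.startswith("custom_"):
            # Assumed convention: "custom_<key>" refers to custom_extractions[<key>].
            values = result.get("custom_extractions", {}).get(name[len("custom_"):])
        else:
            values = result.get("extracted_data", {}).get(name)
        if values is not None:
            selected[name] = values
    return selected


result = json.loads(EXAMPLE_OUTPUT)
print(fields_above(result, 7.0))
```

Fields that have no entry in `confidence_scores`, such as `growth` in the example, are skipped by this helper because there is no score to compare.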
Enhanced UI Elements
New Input Controls
- Category Dropdown: Manual category selection
- Custom Keys Input: Text field for custom extraction terms
- Enhanced Strategy Selection: Added "Category-Specific Extraction"
New Output Tabs
- Category Analysis: Dedicated tab for category-specific results
- Enhanced JSON: Structured category extraction data
- Confidence Scores: Shows extraction reliability
Improved User Experience
- Dynamic Help Text: Context-aware guidance
- Example Suggestions: Sample custom keys for each category
- Better Visual Organization: Clearer result presentation
Usage Examples
Finance Document Analysis
Document Category: Finance
Custom Keys: CEO, quarterly results, investment
Strategy: Category-Specific Extraction
Result: Extracts revenue figures, profit margins, and growth rates, plus CEO mentions, quarterly data, and investment information.
Legal Contract Review
Document Category: Legal
Custom Keys: liability, termination, governing law
Strategy: Category-Specific Extraction
Result: Finds contract parties, terms, and dates, plus specific liability clauses, termination conditions, and jurisdiction details.
Technical Documentation
Document Category: Technical
Custom Keys: security, performance, scalability
Strategy: Category-Specific Extraction
Result: Extracts API endpoints, versions, and rate limits, plus security features, performance metrics, and scalability considerations.
Why This Makes Active Reading Better
1. Adaptive Intelligence
- AI now adapts not just to document type, but to user-specific needs
- Combines automated domain detection with custom requirements
2. Enterprise Flexibility
- Users can extract exactly what they need for their business case
- Supports diverse enterprise document analysis workflows
3. Structured Output
- Category-specific patterns ensure consistent extraction
- Custom keys add user-defined flexibility
- JSON format enables easy integration
4. Demonstrable Value
- Shows how Active Reading adapts to different business domains
- Proves the framework can handle real enterprise requirements
- Highlights the advantage over one-size-fits-all approaches
Implementation Impact
What Changed in Code
- Added: extract_category_specific_info() method
- Enhanced: process_document() function with category/custom key parameters
- New: Category-specific regex patterns for each domain
- Improved: UI with additional input controls and output tabs
Backward Compatibility
- All existing strategies still work
- Auto-detection remains the default
- Original demo functionality preserved
- Enhanced with new capabilities
Together, these changes make the Active Reading demo more interactive and show how it adapts to different domains and user-specific needs, which traditional one-size-fits-all document processing does not.