# New Features Added to Active Reading Demo

## **Category Selection Feature**

### What It Does

Users can now select a document category manually, overriding automatic detection:

**Available Categories:**

- **Auto-Detect** (default) - AI detects the domain automatically
- **Finance** - Financial reports, earnings, budgets
- **Legal** - Contracts, agreements, policies
- **Technical** - API docs, manuals, specifications
- **Medical** - Clinical trials, research, treatments
- **General** - Any other document type
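
The override behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the demo's code: `Category`, `KEYWORD_HINTS`, `detect_category`, and `resolve_category` are hypothetical names, and the keyword counter is only a stand-in for the AI-based detector.

```python
from enum import Enum


class Category(Enum):
    AUTO = "Auto-Detect"
    FINANCE = "Finance"
    LEGAL = "Legal"
    TECHNICAL = "Technical"
    MEDICAL = "Medical"
    GENERAL = "General"


# Hypothetical keyword hints; the real demo uses AI-based detection instead.
KEYWORD_HINTS = {
    Category.FINANCE: ("revenue", "earnings", "budget", "fiscal"),
    Category.LEGAL: ("contract", "agreement", "liability", "governed by"),
    Category.TECHNICAL: ("api", "endpoint", "version", "specification"),
    Category.MEDICAL: ("clinical", "patients", "dosage", "treatment"),
}


def detect_category(text: str) -> Category:
    """Stand-in detector: pick the category with the most keyword hits."""
    lowered = text.lower()
    scores = {
        category: sum(lowered.count(word) for word in words)
        for category, words in KEYWORD_HINTS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else Category.GENERAL


def resolve_category(selected: str, text: str) -> Category:
    """A manual selection wins; Auto-Detect defers to the detector."""
    choice = Category(selected)
    return detect_category(text) if choice is Category.AUTO else choice
```

With this shape, `resolve_category("Legal", text)` always returns `Category.LEGAL`, while `resolve_category("Auto-Detect", text)` falls back to whatever the detector decides.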

### Category-Specific Extraction Patterns

#### Finance Category

- **Revenue**: `$150 million revenue`, `sales of $2.5B`
- **Profit**: `profit margin 25%`, `net profit $50M`
- **Growth**: `15% growth`, `increased by 20%`
- **Dates**: `Q3 2024`, `fiscal year 2023`
- **Employees**: `hire 200 engineers`, `workforce of 5000`
- **Market Cap**: `market cap $10B`
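
As a rough sketch of how a few of the finance examples above could be matched, the snippet below uses Python's `re` module. The regexes are illustrative guesses written to fit the sample phrases; the demo's actual patterns are not shown in this document, and `MONEY`, `FINANCE_PATTERNS`, and `extract_fields` are hypothetical names.

```python
import re

# Illustrative only: a dollar amount with an optional magnitude suffix.
MONEY = r"\$\d[\d,.]*(?:\s?(?:million|billion|[MB])\b)?"

FINANCE_PATTERNS = {
    "revenue": re.compile(
        rf"{MONEY}\s+revenue|(?:revenue|sales)\s+of\s+{MONEY}", re.IGNORECASE
    ),
    "growth": re.compile(
        r"\d+(?:\.\d+)?%\s+growth|increased\s+by\s+\d+(?:\.\d+)?%", re.IGNORECASE
    ),
    "date": re.compile(r"Q[1-4]\s+\d{4}|fiscal\s+year\s+\d{4}", re.IGNORECASE),
}


def extract_fields(text, patterns):
    """Return {field: [matched phrases]} for every pattern that matches."""
    found = {}
    for name, pattern in patterns.items():
        matches = [m.group(0) for m in pattern.finditer(text)]
        if matches:
            found[name] = matches
    return found


sample = "Q3 2024 brought $150 million revenue, sales of $2.5B, and 15% growth."
result = extract_fields(sample, FINANCE_PATTERNS)
```

Fields with no match are simply left out of the result, which keeps the output compact for documents that mention only some of the metrics.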

#### Legal Category

- **Parties**: `between Company A and Company B`
- **Terms**: `term of 36 months`, `duration 3 years`
- **Liability**: `liability not to exceed $1M`
- **Termination**: `90 days written notice`
- **Governing Law**: `governed by laws of Delaware`
- **Effective Date**: `effective January 1, 2024`
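
For legal text it is often more useful to capture the pieces of a clause than the whole phrase. The sketch below shows one way to do that with named groups; the patterns and the names `PARTIES`, `TERM`, `GOVERNING_LAW`, and `extract_legal_fields` are assumptions made for illustration, not the demo's implementation.

```python
import re

# Illustrative patterns with named groups so each part of a clause is addressable.
PARTIES = re.compile(
    r"between\s+(?P<first_party>[A-Z][\w.&' ]*?)\s+and\s+"
    r"(?P<second_party>[A-Z][\w.&']*(?:\s+[A-Z][\w.&']*)*)"
)
TERM = re.compile(
    r"(?:term\s+of|duration)\s+(?P<length>\d+)\s+(?P<unit>months?|years?)",
    re.IGNORECASE,
)
GOVERNING_LAW = re.compile(
    r"governed\s+by\s+(?:the\s+)?laws\s+of\s+"
    r"(?P<jurisdiction>[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)"
)


def extract_legal_fields(text):
    """Collect the named groups of the first match of each pattern."""
    fields = {}
    for pattern in (PARTIES, TERM, GOVERNING_LAW):
        match = pattern.search(text)
        if match:
            fields.update(match.groupdict())
    return fields


clause = (
    "This Agreement is made between Company A and Company B for a term of "
    "36 months and is governed by the laws of Delaware."
)
fields = extract_legal_fields(clause)
```

Party names in real contracts vary far more than these patterns allow, so a production extractor would likely need something more robust than capitalisation heuristics.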

#### Technical Category

- **API Endpoints**: `GET /api/users`, `POST /auth/login`
- **Versions**: `version 2.1.0`, `v3.5`
- **Response Time**: `response time 150ms`
- **Rate Limits**: `1000 requests per minute`
- **Authentication**: `OAuth 2.0`, `JWT tokens`
- **Status Codes**: `HTTP 200`, `status code 404`
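
The technical examples above lend themselves to structured results rather than raw phrases. The following is a hedged sketch of that idea; the regexes and the `extract_technical` helper are hypothetical and cover only four of the listed fields.

```python
import re

# Illustrative patterns; each capture group becomes a structured value.
ENDPOINT = re.compile(r"\b(GET|POST|PUT|PATCH|DELETE)\s+(/[\w/{}-]*)")
VERSION = re.compile(r"\bv(?:ersion\s+)?(\d+(?:\.\d+){1,2})\b", re.IGNORECASE)
RATE_LIMIT = re.compile(r"(\d+)\s+requests\s+per\s+(second|minute|hour)", re.IGNORECASE)
STATUS = re.compile(r"\b(?:HTTP|status\s+code)\s+(\d{3})\b", re.IGNORECASE)


def extract_technical(text):
    """Turn matched phrases into small dictionaries and typed values."""
    return {
        "endpoints": [
            {"method": method, "path": path}
            for method, path in ENDPOINT.findall(text)
        ],
        "versions": VERSION.findall(text),
        "rate_limits": [
            {"count": int(count), "per": unit.lower()}
            for count, unit in RATE_LIMIT.findall(text)
        ],
        "status_codes": [int(code) for code in STATUS.findall(text)],
    }


doc = (
    "Call GET /api/users or POST /auth/login. Version 2.1.0 allows "
    "1000 requests per minute and returns HTTP 200."
)
info = extract_technical(doc)
```

Converting counts and status codes to integers at extraction time means downstream code can compare them numerically without re-parsing strings.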

#### Medical Category

- **Dosage**: `50mg daily`, `100ml twice daily`
- **Duration**: `treatment for 12 weeks`
- **Efficacy**: `85% efficacy rate`
- **Side Effects**: `side effects in 12% of patients`
- **Patient Count**: `500 patients enrolled`
- **P-Values**: `p<0.001`, `p=0.025`
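
Dosages and p-values carry numbers that are more useful once parsed. The sketch below, which assumes only the two dosage forms and the p-value notation shown in the examples above, converts them to floats; `DOSAGE`, `P_VALUE`, and `extract_medical` are hypothetical names.

```python
import re

# Illustrative patterns covering only the example forms listed above.
DOSAGE = re.compile(r"(\d+(?:\.\d+)?)\s?(mg|ml)\s+((?:twice\s+)?daily)", re.IGNORECASE)
P_VALUE = re.compile(r"\bp\s*([<=>])\s*(0?\.\d+)", re.IGNORECASE)


def extract_medical(text):
    """Parse dosage and p-value mentions into numeric fields."""
    return {
        "dosages": [
            {"amount": float(amount), "unit": unit.lower(), "frequency": freq.lower()}
            for amount, unit, freq in DOSAGE.findall(text)
        ],
        "p_values": [
            {"operator": operator, "value": float(number)}
            for operator, number in P_VALUE.findall(text)
        ],
    }


note = (
    "Patients received 50mg daily, later 100ml twice daily "
    "(p<0.001 versus placebo, p=0.025 for relapse)."
)
parsed = extract_medical(note)
```

Keeping the comparison operator alongside the value preserves the difference between an upper bound such as `p<0.001` and an exact figure such as `p=0.025`.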

## **Custom Keys Feature**

### What It Does

Users can specify their own extraction terms as comma-separated values:

**Example Inputs:**

```
CEO, budget, deadline, timeline
risk assessment, compliance, audit
performance, scalability, security
treatment, dosage, clinical trial
```
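
Input like the examples above has to be split into individual keys before any matching happens. A minimal sketch of that step follows; `parse_custom_keys` is a hypothetical helper, and the case-insensitive de-duplication is an assumption rather than documented behaviour.

```python
def parse_custom_keys(raw: str) -> list[str]:
    """Split a comma-separated string into trimmed, de-duplicated keys."""
    keys = []
    seen = set()
    for part in raw.split(","):
        key = part.strip()
        # Skip empty entries (e.g. from "a,,b") and case-insensitive repeats.
        if key and key.lower() not in seen:
            seen.add(key.lower())
            keys.append(key)
    return keys


keys = parse_custom_keys("risk assessment, compliance, , Audit, audit")
```

Multi-word terms such as `risk assessment` survive intact because only commas act as separators.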

### How It Works

- **Smart Extraction**: Finds sentences containing the custom terms
- **Context Preservation**: Returns full sentences, not just keywords
- **Confidence Scoring**: Shows extraction confidence levels
- **JSON Output**: Structured data for easy integration
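
The sentence-level matching described above might look roughly like this. It is a sketch under simplifying assumptions: sentences are split naively on `.`, `!`, and `?`, matching is case-insensitive on whole words, and confidence scoring is left out because this document does not say how the scores are computed. `extract_custom` is a hypothetical name.

```python
import re


def extract_custom(text, keys):
    """Return {key: [full sentences mentioning that key]}."""
    # Naive splitter: breaks after sentence-ending punctuation followed by
    # whitespace, so abbreviations such as "Dr." will be split incorrectly.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    results = {}
    for key in keys:
        pattern = re.compile(rf"\b{re.escape(key)}\b", re.IGNORECASE)
        hits = [sentence for sentence in sentences if pattern.search(sentence)]
        if hits:
            results[key] = hits
    return results


report = "The CEO announced a new budget. Hiring continues. The deadline is March."
matches = extract_custom(report, ["CEO", "deadline", "audit"])
```

Returning whole sentences rather than the bare keyword is what preserves context: a reader sees what was said about the CEO, not just that the word appeared.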

## **New Strategy: Category-Specific Extraction**

### What's New

Added a specialized strategy that combines:

1. **Category-specific patterns** for targeted extraction
2. **Custom key extraction** for user-defined terms
3. **Structured output** with confidence scores
4. **Domain expertise** for each business category

### Example Output

```json
{
  "category": "Finance",
  "extracted_data": {
    "revenue": ["$150 million", "$2.5 billion sales"],
    "growth": ["15% increase", "20% growth rate"],
    "date": ["Q3 2024", "fiscal year 2023"]
  },
  "custom_extractions": {
    "CEO": ["CEO announced plans to expand", "CEO John Smith reported"],
    "investment": ["$50M investment in AI", "investment in new markets"]
  },
  "confidence_scores": {
    "revenue": 8.5,
    "custom_CEO": 6.2
  }
}
```
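
To show how the example output above could be consumed downstream, here is a small sketch that flattens it into rows. `flatten_result` is a hypothetical helper, and the `custom_` prefix used to look up scores for custom keys is inferred from the single `custom_CEO` entry in the example, so treat it as an assumption.

```python
import json

EXAMPLE = """
{
  "category": "Finance",
  "extracted_data": {"revenue": ["$150 million", "$2.5 billion sales"]},
  "custom_extractions": {
    "CEO": ["CEO announced plans to expand", "CEO John Smith reported"],
    "investment": ["$50M investment in AI"]
  },
  "confidence_scores": {"revenue": 8.5, "custom_CEO": 6.2}
}
"""


def flatten_result(result):
    """Yield (source, field, value, score) rows; score is None when absent."""
    scores = result.get("confidence_scores", {})
    rows = []
    for field, values in result.get("extracted_data", {}).items():
        rows += [("category", field, value, scores.get(field)) for value in values]
    for key, values in result.get("custom_extractions", {}).items():
        # Assumed convention: custom keys are scored under "custom_<key>".
        score = scores.get(f"custom_{key}")
        rows += [("custom", key, value, score) for value in values]
    return rows


rows = flatten_result(json.loads(EXAMPLE))
```

A flat row format like this is convenient for loading into a spreadsheet or table view, and the `None` score makes it obvious which fields were returned without a confidence value.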

## **Enhanced UI Elements**

### New Input Controls

- **Category Dropdown**: Manual category selection
- **Custom Keys Input**: Text field for custom extraction terms
- **Enhanced Strategy Selection**: Added "Category-Specific Extraction"

### New Output Tabs

- **Category Analysis**: Dedicated tab for category-specific results
- **Enhanced JSON**: Structured category extraction data
- **Confidence Scores**: Shows extraction reliability

### Improved User Experience

- **Dynamic Help Text**: Context-aware guidance
- **Example Suggestions**: Sample custom keys for each category
- **Better Visual Organization**: Clearer result presentation

## **Usage Examples**

### Finance Document Analysis

```
Document Category: Finance
Custom Keys: CEO, quarterly results, investment
Strategy: Category-Specific Extraction
```

**Result**: Extracts revenue figures, profit margins, and growth rates, plus CEO mentions, quarterly data, and investment information.

### Legal Contract Review

```
Document Category: Legal
Custom Keys: liability, termination, governing law
Strategy: Category-Specific Extraction
```

**Result**: Finds contract parties, terms, and dates, plus specific liability clauses, termination conditions, and jurisdiction details.

### Technical Documentation

```
Document Category: Technical
Custom Keys: security, performance, scalability
Strategy: Category-Specific Extraction
```

**Result**: Extracts API endpoints, versions, and rate limits, plus security features, performance metrics, and scalability considerations.

## **Why This Makes Active Reading Better**

### 1. **Adaptive Intelligence**

- The AI now adapts not just to the document type but also to user-specific needs
- Combines automated domain detection with custom requirements

### 2. **Enterprise Flexibility**

- Users can extract exactly what they need for their business case
- Supports diverse enterprise document analysis workflows

### 3. **Structured Output**

- Category-specific patterns keep extraction consistent within each domain
- Custom keys add user-defined flexibility
- JSON format enables easy integration

### 4. **Demonstrable Value**

- Shows how Active Reading adapts to different business domains
- Demonstrates that the framework can handle category-specific and user-defined extraction requirements
- Highlights the advantage over one-size-fits-all approaches

## **Implementation Impact**

### What Changed in Code

- **Added**: `extract_category_specific_info()` method
- **Enhanced**: `process_document()` function with category/custom key parameters
- **New**: Category-specific regex patterns for each domain
- **Improved**: UI with additional input controls and output tabs

### Backward Compatibility

- All existing strategies still work
- Auto-detection remains the default
- Original demo functionality preserved
- Enhanced with new capabilities
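
One common way to keep existing callers working while adding parameters is to give the new parameters defaults. The sketch below illustrates that pattern only: the real signature of `process_document()` is not shown in this document, so the parameter names, the returned dictionary, and the `"existing-strategy"` placeholder are all assumptions.

```python
def process_document(text, strategy, category="Auto-Detect", custom_keys=""):
    """Hypothetical signature: the two new parameters are optional.

    Calls written before the change pass only text and strategy, and
    therefore keep the auto-detect behaviour with no custom keys.
    """
    keys = [key.strip() for key in custom_keys.split(",") if key.strip()]
    # Stand-in for the real processing: echo the resolved options back.
    return {
        "strategy": strategy,
        "category": category,
        "custom_keys": keys,
        "characters": len(text),
    }


# An older call site (placeholder strategy name) still works unchanged.
legacy = process_document("Some report text", "existing-strategy")

# A newer call site opts in to the added parameters.
extended = process_document(
    "Some report text",
    "Category-Specific Extraction",
    category="Finance",
    custom_keys="CEO, investment",
)
```

Because the defaults mirror the previous behaviour, the new options stay invisible to callers that do not ask for them.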

This makes your Active Reading demo much more interactive and showcases the adaptive intelligence that makes it superior to traditional document processing approaches!