# New Features Added to Active Reading Demo
## **Category Selection Feature**
### What It Does
Users can now select the document category manually or override the automatic category detection:
**Available Categories:**
- **Auto-Detect** (default) - The AI detects the domain automatically
- **Finance** - Financial reports, earnings, budgets
- **Legal** - Contracts, agreements, policies
- **Technical** - API docs, manuals, specifications
- **Medical** - Clinical trials, research, treatments
- **General** - Any other document type
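One way the dropdown value could be resolved is sketched below. It assumes a plain keyword-count heuristic as a stand-in for the AI detector, whose actual method is not described here; `DOMAIN_CUES`, `detect_category`, and `resolve_category` are hypothetical names, and the cue words are illustrative.

```python
import re

# The six options offered by the category dropdown.
CATEGORIES = ["Auto-Detect", "Finance", "Legal", "Technical", "Medical", "General"]

# Hypothetical cue words; the demo's real detector is not described and may use a model.
DOMAIN_CUES = {
    "Finance": ["revenue", "profit", "earnings", "budget", "fiscal", "market cap"],
    "Legal": ["agreement", "contract", "liability", "governed by", "termination", "party"],
    "Technical": ["api", "endpoint", "version", "latency", "authentication", "http"],
    "Medical": ["patients", "dosage", "clinical", "efficacy", "side effects", "treatment"],
}


def detect_category(text: str) -> str:
    """Return the domain whose cue words occur most often, or "General" if none occur."""
    lowered = text.lower()
    counts = {
        domain: sum(len(re.findall(r"\b" + re.escape(cue) + r"\b", lowered)) for cue in cues)
        for domain, cues in DOMAIN_CUES.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "General"


def resolve_category(selected: str, text: str) -> str:
    """A manual selection always wins; only "Auto-Detect" triggers detection."""
    if selected not in CATEGORIES:
        raise ValueError(f"unknown category: {selected!r}")
    return detect_category(text) if selected == "Auto-Detect" else selected


print(resolve_category("Auto-Detect", "Q3 revenue rose and profit margins widened"))  # Finance
print(resolve_category("Legal", "Q3 revenue rose"))  # Legal
```

Keeping the override check in `resolve_category` means detection never runs when the user has already chosen a category, and text with no cue words falls back to General.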
### Category-Specific Extraction Patterns
#### Finance Category
- **Revenue**: `$150 million revenue`, `sales of $2.5B`
- **Profit**: `profit margin 25%`, `net profit $50M`
- **Growth**: `15% growth`, `increased by 20%`
- **Dates**: `Q3 2024`, `fiscal year 2023`
- **Employees**: `hire 200 engineers`, `workforce of 5000`
- **Market Cap**: `market cap $10B`
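The Finance examples above translate naturally into regular expressions. The sketch below is illustrative only: `MONEY`, `FINANCE_PATTERNS`, and `extract_finance` are hypothetical names, and the patterns are modelled on the listed examples rather than taken from the demo's source.

```python
import re

# Hypothetical money fragment: "$150 million", "$2.5B", "$50M".
MONEY = r"\$\d+(?:\.\d+)?(?:\s?(?:million|billion|[MBK])\b)?"

# Illustrative Finance patterns modelled on the examples (not the demo's actual regexes).
FINANCE_PATTERNS = {
    "revenue": [MONEY + r"\s+revenue", r"sales of " + MONEY],
    "profit": [r"profit margin (?:of )?\d+(?:\.\d+)?%", r"net profit (?:of )?" + MONEY],
    "growth": [r"\d+(?:\.\d+)?% growth", r"increased by \d+(?:\.\d+)?%"],
    "date": [r"\bQ[1-4] \d{4}", r"fiscal year \d{4}"],
    "employees": [r"hire \d[\d,]* \w+", r"workforce of \d[\d,]*"],
    "market_cap": [r"market cap(?:italization)? (?:of )?" + MONEY],
}


def extract_finance(text: str) -> dict[str, list[str]]:
    """Return every matched span per field, omitting fields with no match."""
    results = {}
    for field, patterns in FINANCE_PATTERNS.items():
        hits = [m.group(0) for p in patterns for m in re.finditer(p, text, re.IGNORECASE)]
        if hits:
            results[field] = hits
    return results


sample = (
    "Q3 2024 brought $150 million revenue and sales of $2.5B abroad. "
    "Net profit $50M on a profit margin 25%, after 15% growth. "
    "The firm will hire 200 engineers; market cap $10B."
)
print(extract_finance(sample))
```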
#### Legal Category
- **Parties**: `between Company A and Company B`
- **Terms**: `term of 36 months`, `duration 3 years`
- **Liability**: `liability not to exceed $1M`
- **Termination**: `90 days written notice`
- **Governing Law**: `governed by laws of Delaware`
- **Effective Date**: `effective January 1, 2024`
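A similar sketch for the Legal fields, under the same assumptions: `LEGAL_PATTERNS`, `NAME`, and `extract_legal` are hypothetical, and the patterns only cover the phrasings listed above. Matching is case-sensitive here because the party-name pattern relies on capitalized words.

```python
import re

MONEY = r"\$\d+(?:\.\d+)?(?:\s?(?:million|billion|[MBK])\b)?"
MONTHS = r"(?:January|February|March|April|May|June|July|August|September|October|November|December)"
# A run of capitalized words such as "Company A" (hypothetical party-name heuristic).
NAME = r"[A-Z][\w&.]*(?: [A-Z][\w&.]*)*"

LEGAL_PATTERNS = {
    "parties": rf"between {NAME} and {NAME}",
    "term": r"(?:term of|duration(?: of)?) \d+ (?:days|weeks|months|years)",
    "liability": r"liability (?:shall )?not (?:to )?exceed " + MONEY,
    "termination": r"\d+ days'? (?:prior )?written notice",
    "governing_law": r"governed by (?:the )?laws of (?:the )?[A-Z][a-z]+(?: [A-Z][a-z]+)*",
    "effective_date": rf"effective (?:as of )?{MONTHS} \d{{1,2}}, \d{{4}}",
}


def extract_legal(text: str) -> dict[str, list[str]]:
    """Case-sensitive on purpose: NAME depends on capitalized party names."""
    results = {}
    for field, pattern in LEGAL_PATTERNS.items():
        hits = [m.group(0) for m in re.finditer(pattern, text)]
        if hits:
            results[field] = hits
    return results


sample = (
    "This agreement is made between Company A and Company B, effective January 1, 2024, "
    "for a term of 36 months. Total liability not to exceed $1M. Either party may "
    "terminate with 90 days written notice. It is governed by laws of Delaware."
)
print(extract_legal(sample))
```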
#### Technical Category
- **API Endpoints**: `GET /api/users`, `POST /auth/login`
- **Versions**: `version 2.1.0`, `v3.5`
- **Response Time**: `response time 150ms`
- **Rate Limits**: `1000 requests per minute`
- **Authentication**: `OAuth 2.0`, `JWT tokens`
- **Status Codes**: `HTTP 200`, `status code 404`
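The Technical fields can be sketched the same way. `TECH_PATTERNS` and `extract_technical` are hypothetical names; the HTTP method list, time unit, and authentication terms are assumptions chosen to cover the examples above.

```python
import re

# Illustrative Technical patterns (not the demo's actual regexes).
TECH_PATTERNS = {
    "endpoints": r"\b(?:GET|POST|PUT|PATCH|DELETE) /[\w/{}-]*",
    "versions": r"\bversion \d+(?:\.\d+)*|\bv\d+(?:\.\d+)+",
    "response_time": r"response time (?:of )?\d+(?:\.\d+)?\s?ms\b",
    "rate_limit": r"\d[\d,]* requests per (?:second|minute|hour|day)",
    "authentication": r"\bOAuth \d\.\d\b|\bJWT tokens?\b|\bAPI keys?\b",
    "status_codes": r"\b(?:HTTP|status code) [1-5]\d{2}\b",
}


def extract_technical(text: str) -> dict[str, list[str]]:
    """Case-insensitive matching, keeping the original casing of each hit."""
    results = {}
    for field, pattern in TECH_PATTERNS.items():
        hits = [m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE)]
        if hits:
            results[field] = hits
    return results


sample = (
    "Version 2.1.0 adds GET /api/users and POST /auth/login. Average response time 150ms; "
    "the limit is 1000 requests per minute. Clients use OAuth 2.0 or JWT tokens and receive "
    "HTTP 200, or status code 404 on a miss. Upgrade from v3.5 soon."
)
print(extract_technical(sample))
```

The endpoint character class deliberately excludes `.` so a sentence-ending period is not swallowed into the path.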
#### Medical Category
- **Dosage**: `50mg daily`, `100ml twice daily`
- **Duration**: `treatment for 12 weeks`
- **Efficacy**: `85% efficacy rate`
- **Side Effects**: `side effects in 12% of patients`
- **Patient Count**: `500 patients enrolled`
- **P-Values**: `p<0.001`, `p=0.025`
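And for the Medical fields, again as a hypothetical sketch (`MEDICAL_PATTERNS`, `extract_medical`) whose units and frequency words are assumptions covering only the examples listed:

```python
import re

# Illustrative Medical patterns (not the demo's actual regexes).
MEDICAL_PATTERNS = {
    "dosage": r"\b\d+(?:\.\d+)?\s?(?:mg|ml|mcg)\b(?: (?:once|twice|three times))?(?: daily| weekly)?",
    "duration": r"(?:treatment|treated) for \d+ (?:days|weeks|months)",
    "efficacy": r"\d+(?:\.\d+)?% efficacy(?: rate)?",
    "side_effects": r"side effects in \d+(?:\.\d+)?% of patients",
    "patient_count": r"\d[\d,]* patients (?:enrolled|randomized)",
    "p_values": r"\bp\s?[<=]\s?0?\.\d+",
}


def extract_medical(text: str) -> dict[str, list[str]]:
    """Collect all matches per field; fields without matches are omitted."""
    results = {}
    for field, pattern in MEDICAL_PATTERNS.items():
        hits = [m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE)]
        if hits:
            results[field] = hits
    return results


sample = (
    "Of 500 patients enrolled, each received 50mg daily or 100ml twice daily as treatment "
    "for 12 weeks. The regimen showed 85% efficacy rate (p<0.001), with side effects in "
    "12% of patients (p=0.025)."
)
print(extract_medical(sample))
```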
## **Custom Keys Feature**
### What It Does
Users can specify their own extraction terms as comma-separated values:
**Example Inputs:**
```
CEO, budget, deadline, timeline
risk assessment, compliance, audit
performance, scalability, security
treatment, dosage, clinical trial
```
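Parsing the comma-separated field could look like the following minimal sketch. `parse_custom_keys` is a hypothetical helper, and treating newlines as extra separators and dropping case-insensitive duplicates are assumptions, not documented behavior.

```python
def parse_custom_keys(raw: str) -> list[str]:
    """Split on commas (and newlines), trim whitespace, drop blanks and repeated terms."""
    keys: list[str] = []
    seen: set[str] = set()
    for line in raw.splitlines():
        for part in line.split(","):
            key = part.strip()
            # Compare case-insensitively so "Compliance" and "compliance" count once.
            if key and key.lower() not in seen:
                seen.add(key.lower())
                keys.append(key)
    return keys


print(parse_custom_keys("CEO, budget, deadline, timeline"))
```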
### How It Works
- **Smart Extraction**: Finds sentences containing the custom terms
- **Context Preservation**: Returns full sentences, not just keywords
- **Confidence Scoring**: Shows extraction confidence levels
- **JSON Output**: Structured data for easy integration
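The sentence-level behavior described above can be sketched as follows. The splitter is deliberately naive, and the scoring rule (2.5 points per matching sentence, capped at 10) is purely illustrative: the document does not say how the demo computes its confidence levels. `split_sentences` and `extract_custom_keys` are hypothetical names.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def extract_custom_keys(text: str, keys: list[str]) -> tuple[dict[str, list[str]], dict[str, float]]:
    """Return full sentences containing each key plus an illustrative confidence score."""
    sentences = split_sentences(text)
    extractions: dict[str, list[str]] = {}
    scores: dict[str, float] = {}
    for key in keys:
        # Whole-word, case-insensitive match so a key does not fire inside other words.
        pattern = re.compile(r"\b" + re.escape(key) + r"\b", re.IGNORECASE)
        hits = [s for s in sentences if pattern.search(s)]
        if hits:
            extractions[key] = hits
            # Placeholder heuristic: 2.5 points per matching sentence, capped at 10.
            scores[f"custom_{key}"] = min(10.0, 2.5 * len(hits))
    return extractions, scores


doc = (
    "The CEO announced plans to expand. Revenue rose 15%. "
    "CEO John Smith reported a $50M investment in AI."
)
extractions, scores = extract_custom_keys(doc, ["CEO", "investment", "deadline"])
print(extractions)
print(scores)
```

Returning whole sentences rather than the matched keyword is what preserves context; keys with no hits (here `deadline`) are simply left out.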
## **New Strategy: Category-Specific Extraction**
### What's New
Added a specialized strategy that combines:
1. **Category-specific patterns** for targeted extraction
2. **Custom key extraction** for user-defined terms
3. **Structured output** with confidence scores
4. **Domain expertise** for each business category
### Example Output
```json
{
  "category": "Finance",
  "extracted_data": {
    "revenue": ["$150 million", "$2.5 billion sales"],
    "growth": ["15% increase", "20% growth rate"],
    "date": ["Q3 2024", "fiscal year 2023"]
  },
  "custom_extractions": {
    "CEO": ["CEO announced plans to expand", "CEO John Smith reported"],
    "investment": ["$50M investment in AI", "investment in new markets"]
  },
  "confidence_scores": {
    "revenue": 8.5,
    "custom_CEO": 6.2
  }
}
```
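Assembling the four top-level fields into that shape might look like this sketch. `build_category_result` is a hypothetical helper; dropping empty fields and rounding scores to one decimal are assumptions, and the score values are copied from the example rather than computed.

```python
import json


def build_category_result(
    category: str,
    extracted_data: dict[str, list[str]],
    custom_extractions: dict[str, list[str]],
    confidence_scores: dict[str, float],
) -> dict:
    """Assemble the four top-level fields of the example output, dropping empty lists."""
    return {
        "category": category,
        "extracted_data": {field: hits for field, hits in extracted_data.items() if hits},
        "custom_extractions": {key: hits for key, hits in custom_extractions.items() if hits},
        "confidence_scores": {name: round(float(s), 1) for name, s in confidence_scores.items()},
    }


result = build_category_result(
    "Finance",
    {"revenue": ["$150 million", "$2.5 billion sales"], "profit": []},
    {"CEO": ["CEO announced plans to expand", "CEO John Smith reported"]},
    {"revenue": 8.5, "custom_CEO": 6.2},
)
print(json.dumps(result, indent=2))
```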
## **Enhanced UI Elements**
### New Input Controls
- **Category Dropdown**: Manual category selection
- **Custom Keys Input**: Text field for custom extraction terms
- **Enhanced Strategy Selection**: Adds a "Category-Specific Extraction" option
### New Output Tabs
- **Category Analysis**: Dedicated tab for category-specific results
- **Enhanced JSON**: Structured category extraction data
- **Confidence Scores**: Shows extraction reliability
### Improved User Experience
- **Dynamic Help Text**: Context-aware guidance
- **Example Suggestions**: Sample custom keys for each category
- **Better Visual Organization**: Clearer result presentation
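The per-category suggestions and help text could be driven by a simple lookup, sketched below with hypothetical names (`SUGGESTED_KEYS`, `custom_keys_help`). The suggestion strings reuse custom keys that appear elsewhere in this document; assigning the generic "CEO, budget, deadline, timeline" set to General and Auto-Detect is an assumption.

```python
# Hypothetical suggestions reusing custom keys that appear in this document.
SUGGESTED_KEYS = {
    "Finance": "CEO, quarterly results, investment",
    "Legal": "liability, termination, governing law",
    "Technical": "security, performance, scalability",
    "Medical": "treatment, dosage, clinical trial",
    "General": "CEO, budget, deadline, timeline",
}


def custom_keys_help(category: str) -> str:
    """Context-aware help text for the custom keys field."""
    example = SUGGESTED_KEYS.get(category, SUGGESTED_KEYS["General"])
    if category == "Auto-Detect":
        return f"The category will be detected automatically. Optionally add terms, e.g. {example}"
    return f"Comma-separated terms to extract from {category} documents, e.g. {example}"


print(custom_keys_help("Legal"))
```

A function like this could be called whenever the dropdown value changes, so the hint always matches the selected category.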
## **Usage Examples**
### Finance Document Analysis
```
Document Category: Finance
Custom Keys: CEO, quarterly results, investment
Strategy: Category-Specific Extraction
```
**Result**: Extracts revenue figures, profit margins, growth rates PLUS CEO mentions, quarterly data, and investment information.
### Legal Contract Review
```
Document Category: Legal
Custom Keys: liability, termination, governing law
Strategy: Category-Specific Extraction
```
**Result**: Finds contract parties, terms, dates PLUS specific liability clauses, termination conditions, and jurisdiction details.
### Technical Documentation
```
Document Category: Technical
Custom Keys: security, performance, scalability
Strategy: Category-Specific Extraction
```
**Result**: Extracts API endpoints, versions, rate limits PLUS security features, performance metrics, and scalability considerations.
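Each usage example above consists of the same three settings. A minimal sketch that turns such a settings block into a typed request, with hypothetical `ExtractionRequest` and `parse_settings` names and the assumption that each label is separated from its value by the first colon on the line:

```python
from dataclasses import dataclass

VALID_CATEGORIES = {"Auto-Detect", "Finance", "Legal", "Technical", "Medical", "General"}


@dataclass(frozen=True)
class ExtractionRequest:
    category: str
    custom_keys: tuple[str, ...]
    strategy: str


def parse_settings(block: str) -> ExtractionRequest:
    """Parse 'Label: value' lines like the usage examples above."""
    settings: dict[str, str] = {}
    for line in block.strip().splitlines():
        label, _, value = line.partition(":")  # split on the first colon only
        settings[label.strip()] = value.strip()
    category = settings.get("Document Category", "Auto-Detect")
    if category not in VALID_CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    keys = tuple(k.strip() for k in settings.get("Custom Keys", "").split(",") if k.strip())
    return ExtractionRequest(category, keys, settings.get("Strategy", "Category-Specific Extraction"))


request = parse_settings(
    """Document Category: Legal
Custom Keys: liability, termination, governing law
Strategy: Category-Specific Extraction"""
)
print(request)
```

Validating the category up front means a typo fails loudly instead of silently falling back to generic extraction.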
## **Why This Makes Active Reading Better**
### 1. **Adaptive Intelligence**
- The AI now adapts not only to the document type but also to user-specific needs
- Combines automated domain detection with custom requirements
### 2. **Enterprise Flexibility**
- Users can extract exactly what they need for their business case
- Supports diverse enterprise document analysis workflows
### 3. **Structured Output**
- Category-specific patterns ensure consistent extraction
- Custom keys add user-defined flexibility
- JSON format enables easy integration
### 4. **Demonstrable Value**
- Shows how Active Reading adapts to different business domains
- Demonstrates that the framework can handle real enterprise requirements
- Highlights its advantages over one-size-fits-all approaches
## **Implementation Impact**
### What Changed in Code
- **Added**: `extract_category_specific_info()` method
- **Enhanced**: `process_document()` function with category/custom key parameters
- **New**: Category-specific regex patterns for each domain
- **Improved**: UI with additional input controls and output tabs
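These changes can stay backward compatible if the new parameters are optional. The sketch below assumes signatures for `process_document()` and `extract_category_specific_info()`, which the document names but does not show; `run_existing_strategy` and the two-entry `CATEGORY_PATTERNS` table are hypothetical stand-ins.

```python
import re

# Tiny illustrative pattern table (hypothetical); the full per-domain regexes are not shown.
CATEGORY_PATTERNS = {
    "Finance": {"growth": r"\d+(?:\.\d+)?% growth"},
    "Medical": {"p_values": r"\bp\s?[<=]\s?0?\.\d+"},
}


def run_existing_strategy(text: str, strategy: str) -> dict:
    """Hypothetical stand-in for the demo's original, unchanged strategies."""
    return {"strategy": strategy, "length": len(text)}


def extract_category_specific_info(text: str, category: str = "Auto-Detect", custom_keys: str = "") -> dict:
    if category == "Auto-Detect":
        # Illustrative detection: choose the domain whose patterns match most often.
        counts = {
            name: sum(len(re.findall(p, text, re.IGNORECASE)) for p in pats.values())
            for name, pats in CATEGORY_PATTERNS.items()
        }
        best = max(counts, key=counts.get)
        category = best if counts[best] else "General"
    extracted = {}
    for field, pattern in CATEGORY_PATTERNS.get(category, {}).items():
        hits = re.findall(pattern, text, re.IGNORECASE)
        if hits:
            extracted[field] = hits
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    custom = {}
    for key in (k.strip() for k in custom_keys.split(",")):
        if key:
            hits = [s for s in sentences if key.lower() in s.lower()]
            if hits:
                custom[key] = hits
    return {"category": category, "extracted_data": extracted, "custom_extractions": custom}


def process_document(text: str, strategy: str, category: str = "Auto-Detect", custom_keys: str = "") -> dict:
    """The new parameters are optional, so existing two-argument calls keep working."""
    if strategy != "Category-Specific Extraction":
        return run_existing_strategy(text, strategy)
    return extract_category_specific_info(text, category, custom_keys)


print(process_document("Sales saw 15% growth. The CEO was pleased.",
                       "Category-Specific Extraction", custom_keys="CEO"))
```

Defaulting `category` to "Auto-Detect" mirrors the UI default, so callers that never pass it get the same detection behavior as before.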
### Backward Compatibility
- All existing strategies still work
- Auto-detection remains the default
- Original demo functionality preserved
- Enhanced with new capabilities
This makes your Active Reading demo much more interactive and showcases the adaptive intelligence that makes it superior to traditional document processing approaches!