### Session Management
Session management in Crawl4AI is a powerful feature that allows you to maintain state across multiple requests, making it particularly suitable for handling complex multi-step crawling tasks. It enables you to reuse the same browser tab (or page object) across sequential actions and crawls, which is beneficial for:
- **Performing JavaScript actions** before and after crawling.
- **Executing multiple sequential crawls faster** without needing to reopen tabs or allocate memory repeatedly.
**Note:** This feature is designed for sequential workflows and is not suitable for parallel operations.
---
#### Basic Session Usage
Use `BrowserConfig` for browser-level settings and pass the same `session_id` in each `CrawlerRunConfig` to maintain state across requests:
```python
from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import BrowserConfig, CrawlerRunConfig

async def basic_session_crawl():
    # Browser-level settings shared by every crawl in this session
    browser_config = BrowserConfig(headless=True)

    async with AsyncWebCrawler(config=browser_config) as crawler:
        session_id = "my_session"

        # Both configurations share the same session_id
        config1 = CrawlerRunConfig(url="https://example.com/page1", session_id=session_id)
        config2 = CrawlerRunConfig(url="https://example.com/page2", session_id=session_id)

        # First request opens the tab and keeps it alive
        result1 = await crawler.arun(config=config1)

        # Subsequent request reuses the same tab and its state
        result2 = await crawler.arun(config=config2)

        # Clean up when done
        await crawler.crawler_strategy.kill_session(session_id)
```
---
#### Dynamic Content with Sessions
Here's an example of crawling GitHub commits across multiple pages while preserving session state. After the first page, `js_code` clicks the "Next" pagination button and `js_only=True` tells the crawler to run that JavaScript in the already-open tab instead of navigating to the URL again:
```python
import json

from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import CrawlerRunConfig
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy
from crawl4ai.cache_context import CacheMode
async def crawl_dynamic_content():
    async with AsyncWebCrawler() as crawler:
        session_id = "github_commits_session"
        url = "https://github.com/microsoft/TypeScript/commits/main"
        all_commits = []
        # Define extraction schema
        schema = {
            "name": "Commit Extractor",
            "baseSelector": "li.Box-sc-g0xbh4-0",
            "fields": [{"name": "title", "selector": "h4.markdown-title", "type": "text"}],
        }
        extraction_strategy = JsonCssExtractionStrategy(schema)
        # JavaScript and wait configurations
        js_next_page = """document.querySelector('a[data-testid="pagination-next-button"]').click();"""
        wait_for = """() => document.querySelectorAll('li.Box-sc-g0xbh4-0').length > 0"""
        # Crawl multiple pages
        for page in range(3):
            config = CrawlerRunConfig(
                url=url,
                session_id=session_id,
                extraction_strategy=extraction_strategy,
                js_code=js_next_page if page > 0 else None,
                wait_for=wait_for if page > 0 else None,
                js_only=page > 0,
                cache_mode=CacheMode.BYPASS
            )
            result = await crawler.arun(config=config)
            if result.success:
                commits = json.loads(result.extracted_content)
                all_commits.extend(commits)
                print(f"Page {page + 1}: Found {len(commits)} commits")
        # Clean up session
        await crawler.crawler_strategy.kill_session(session_id)
        return all_commits
```
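To run the coroutine above from a script, a standard `asyncio` entry point is enough (nothing here is Crawl4AI-specific):
```python
import asyncio

if __name__ == "__main__":
    commits = asyncio.run(crawl_dynamic_content())
    print(f"Collected {len(commits)} commits in total")
```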
---
#### Session Best Practices
1. **Descriptive Session IDs**:
   Use meaningful names for session IDs to organize workflows:
   ```python
   session_id = "login_flow_session"
   session_id = "product_catalog_session"
   ```
2. **Resource Management**:
   Always ensure sessions are cleaned up to free resources:
   ```python
   try:
       # Your crawling code here
       pass
   finally:
       await crawler.crawler_strategy.kill_session(session_id)
   ```
3. **State Maintenance**:
   Reuse the session for subsequent actions within the same workflow:
   ```python
   # Step 1: Login
   login_config = CrawlerRunConfig(
       url="https://example.com/login",
       session_id=session_id,
       js_code="document.querySelector('form').submit();"
   )
   await crawler.arun(config=login_config)
   # Step 2: Verify login success
   dashboard_config = CrawlerRunConfig(
       url="https://example.com/dashboard",
       session_id=session_id,
       wait_for="css:.user-profile"  # Wait for authenticated content
   )
   result = await crawler.arun(config=dashboard_config)
   ```
---
#### Common Use Cases for Sessions
1. **Authentication Flows**: Login and interact with secured pages.
2. **Pagination Handling**: Navigate through multiple pages.
3. **Form Submissions**: Fill forms, submit, and process results.
4. **Multi-step Processes**: Complete workflows that span multiple actions.
5. **Dynamic Content Navigation**: Handle JavaScript-rendered or event-triggered content (see the sketch below).
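
For the dynamic-navigation case, here is a minimal sketch that follows the same `js_only` pattern as the GitHub example above. The URL, the `button.load-more` selector, and the `.item` selector are hypothetical placeholders you would replace with your target site's markup:
```python
from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import CrawlerRunConfig
from crawl4ai.cache_context import CacheMode

async def crawl_load_more(url: str, clicks: int = 3):
    async with AsyncWebCrawler() as crawler:
        session_id = "load_more_session"
        try:
            for step in range(clicks + 1):
                config = CrawlerRunConfig(
                    url=url,  # hypothetical target page
                    session_id=session_id,
                    # First pass loads the page; later passes only click "Load More"
                    js_code=None if step == 0 else
                        "document.querySelector('button.load-more').click();",
                    js_only=step > 0,        # stay on the already-open tab
                    wait_for="css:.item",    # hypothetical content selector
                    cache_mode=CacheMode.BYPASS,
                )
                result = await crawler.arun(config=config)
                if not result.success:
                    break
                print(f"Step {step}: page HTML is {len(result.cleaned_html or '')} chars")
        finally:
            # Always release the tab, even if a crawl fails
            await crawler.crawler_strategy.kill_session(session_id)
```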