ai: Enable API for Next-Gen!
And also coordinate all the necessary arrangements for it.

- README.md +24 -27
- app.py +1 -1
- assets/bin/ai +135 -0
- assets/bin/install.sh +33 -0
- src/core/server.py +97 -241
- src/core/transport/__init__.py +0 -0
- src/core/transport/aiohttp.py +151 -0
- src/core/transport/httpx.py +152 -0
- src/ui/interface.py +8 -3
- src/ui/reasoning.py +0 -75
- src/utils/instruction.py +55 -0
- src/utils/reasoning.py +140 -0
- src/utils/session_mapping.py +39 -28
- src/utils/time.py +49 -0
README.md
CHANGED
@@ -12,28 +12,29 @@ app_port: 7860
 pinned: true
 short_description: Just a Rather Very Intelligent System
 models:
-- hadadrjt/JARVIS
-
--
-- deepseek-ai/DeepSeek-
-- deepseek-ai/DeepSeek-R1
-- deepseek-ai/DeepSeek-R1-
-- deepseek-ai/DeepSeek-R1-Distill-
--
-- google/gemma-3-
-- google/gemma-3-
--
-- meta-llama/Llama-3.
-- meta-llama/Llama-3.
-- meta-llama/Llama-
-- meta-llama/Llama-4-
--
-- Qwen/Qwen2.5-VL-
-- Qwen/Qwen2.5-VL-
-- Qwen/
-- Qwen/
--
--
+- hadadrjt/JARVIS
+# Credits for several models previously used across multiple platforms
+- agentica-org/DeepCoder-14B-Preview
+- deepseek-ai/DeepSeek-V3-0324
+- deepseek-ai/DeepSeek-R1
+- deepseek-ai/DeepSeek-R1-0528
+- deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
+- deepseek-ai/DeepSeek-R1-Distill-Llama-70B
+- google/gemma-3-1b-it
+- google/gemma-3-4b-it
+- google/gemma-3-27b-it
+- meta-llama/Llama-3.1-8B-Instruct
+- meta-llama/Llama-3.2-3B-Instruct
+- meta-llama/Llama-3.3-70B-Instruct
+- meta-llama/Llama-4-Maverick-17B-128E-Instruct
+- meta-llama/Llama-4-Scout-17B-16E-Instruct
+- Qwen/Qwen2.5-VL-3B-Instruct
+- Qwen/Qwen2.5-VL-32B-Instruct
+- Qwen/Qwen2.5-VL-72B-Instruct
+- Qwen/QwQ-32B
+- Qwen/Qwen3-235B-A22B
+- mistralai/Devstral-Small-2505
+- google/gemma-3n-E4B-it-litert-preview
 ---
 
 ## Credits
@@ -44,8 +45,4 @@ Thanks are extended to [SearXNG](https://paulgo.io), [Baidu](https://www.baidu.c
 
 The latest version of Deep Search is entirely inspired by the [OpenWebUI](https://openwebui.com/t/cooksleep/infinite_search) tools script.
 
-Special appreciation is given to [Hugging Face](https://huggingface.co) for hosting this Space as the primary deployment platform.
-
-## API
-
-Efforts are underway to restore API and multi-platform support at the earliest opportunity.
+Special appreciation is given to [Hugging Face](https://huggingface.co) for hosting this Space as the primary deployment platform.
app.py
CHANGED
@@ -16,4 +16,4 @@ if __name__ == "__main__":
 
     # Call the 'launch' method on the 'app' object to start the user interface.
    # This typically opens the UI window or begins the event loop, making the application interactive.
-    app.queue(default_concurrency_limit=2).launch(
+    app.queue(default_concurrency_limit=2).launch(share=True, quiet=True, pwa=True)
assets/bin/ai
ADDED
@@ -0,0 +1,135 @@
#!/usr/bin/env python3
#
# SPDX-FileCopyrightText: Hadad <[email protected]>
# SPDX-License-Identifier: Apache-2.0
#

import sys # Provides access to command-line arguments and system-related functions for input handling
import re # Provides regular expression operations used here to parse and extract code blocks from text
from gradio_client import Client # Imports Client class to interact programmatically with a Gradio-hosted AI model endpoint
from rich.console import Console, Group # Imports Console for rich text output, Group to combine multiple renderables for display
from rich.markdown import Markdown # Imports Markdown renderer to format and display markdown text in the terminal
from rich.syntax import Syntax # Imports Syntax highlighter to render code blocks with language-specific coloring
from rich.live import Live # Imports Live to enable live-updating terminal output for streaming content display

console = Console() # Creates a Console instance for enhanced terminal output with colors and formatting
client = Client("https://hadadrjt-ai.hf.space/") # Initializes a Gradio client connected to the specified AI service URL

def layout(text):
    """
    Processes the input text to separate markdown content and code blocks, then formats them for terminal display.
    Code blocks are detected by triple backticks with language specifiers. The function returns a Group object
    combining Markdown and Syntax renderables for rich output.

    Args:
        text (str): The input string potentially containing markdown and fenced code blocks.

    Returns:
        Group: A rich Group object containing formatted markdown and syntax-highlighted code blocks.
    """
    if not isinstance(text, str):
        # Convert input to string if it is not already, to avoid errors during regex processing
        text = str(text)

    # Use regex to find all code blocks in the text matching the pattern:
    # Two newlines, triple backticks with an optional language identifier, a newline, code content, a newline, triple backticks, then three newlines
    # The pattern captures language and code separately for formatting
    code_blocks = list(re.finditer(r"\n\n```(\w+)?\n(.*?)\n```\n\n\n", text, re.DOTALL))

    segments = [] # List to hold markdown and syntax segments for rendering
    last_end = 0 # Tracks the end position of the last matched code block to slice text correctly

    for block in code_blocks:
        # Extract text before the current code block
        pre = text[last_end:block.start()]
        if pre.strip():
            # If pre-block text is not just whitespace, convert it to Markdown renderable
            segments.append(Markdown(pre.strip()))

        # Extract language and code content from the current code block
        lang, code = block.group(1) or "text", block.group(2).rstrip()

        # Append a Syntax renderable with the extracted code and language for syntax highlighting
        segments.append(Syntax(code, lang, theme="monokai", line_numbers=False, word_wrap=True))

        # Update last_end to the end of the current code block for next iteration
        last_end = block.end()

    # Append any remaining text after the last code block as Markdown
    tail = text[last_end:]
    if tail.strip():
        segments.append(Markdown(tail.strip()))

    # Return a Group combining all markdown and syntax segments for unified rendering
    return Group(*segments)

def main():
    """
    Main entry point of the script that handles user input, sends it to the AI model, and streams the response.
    The function supports command-line input or defaults to a greeting message. It streams the AI's response
    token-by-token, updating the terminal output live with proper markdown and code formatting.

    Workflow:
        1. Parse command-line arguments to form the user input message.
        2. Define parameters for the AI model including model precision, reasoning mode, and feature toggles.
        3. Submit the request to the remote AI service and receive a streaming response.
        4. Incrementally update the console output with streamed tokens, formatting markdown and code blocks dynamically.
    """

    # Extract command-line arguments excluding the script name
    args = sys.argv[1:]

    # Join arguments into a single string as user input; default to "Hi!" if no input provided
    user_input = " ".join(args) if args else "Hi!"

    # Define parameters for the AI model request
    params = dict(
        message=user_input, # The input message to send to the AI model
        model_label="Q8_K_XL", # Specifies the model precision or variant to use
        thinking=True, # Enables reasoning mode for more thoughtful responses
        image_gen=False, # Disables image generation as terminal cannot display images
        audio_gen=False, # Disables audio generation as terminal cannot play audio
        search_gen=True, # Enables deep search feature for enhanced response accuracy
        # Type /dp followed by the instructions to search the web
        api_name="/api" # API endpoint path to use on the Gradio client
    )

    # Submit the request to the AI model and start receiving a streaming response
    job = client.submit(**params)

    partial = "" # Stores the accumulated response text received so far

    # Use Live context manager to dynamically update the console output as new tokens arrive
    with Live(layout(partial), console=console) as live:
        for chunk in job:
            # Each chunk can be a list or a single item; extract the 'content' field if present
            if isinstance(chunk, list):
                # Iterate through list items to find a dictionary with a 'content' key
                for item in chunk:
                    if isinstance(item, dict) and 'content' in item:
                        new_response = item['content'] # Extract the new content token
                        break
                else:
                    # If no content found in the list, convert the entire chunk to a string
                    new_response = str(chunk)
            else:
                # If chunk is not a list, convert it directly to a string
                new_response = str(chunk)

            # Determine the new token by removing the already received part from the new response
            if new_response.startswith(partial):
                new_token = new_response[len(partial):]
            else:
                # If the new response does not start with partial, treat the entire response as the new token
                new_token = new_response

            # Update the accumulated partial response with the new token
            partial = new_response

            # Update the live display in the terminal with the newly formatted content
            live.update(layout(partial))

# Entry point check to ensure main() runs only if this script is executed directly
if __name__ == "__main__":
    main() # Call the main function to start the program
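For reference, the same Space endpoint this script talks to can also be called directly with gradio_client. The sketch below is illustrative only: it reuses the /api endpoint and the parameter names visible in the script above, and swaps the streaming submit() for a blocking predict() call.

# Minimal sketch of a one-shot (non-streaming) call to the same Gradio
# endpoint used by assets/bin/ai. Endpoint path and parameter names are
# copied from the script above; treat them as illustrative, not a stable
# public contract.
from gradio_client import Client

client = Client("https://hadadrjt-ai.hf.space/")
result = client.predict(
    message="Explain HTTP/2 in one paragraph.",  # User prompt
    model_label="Q8_K_XL",                       # Model precision/variant label
    thinking=True,                               # Enable reasoning mode
    image_gen=False,                             # No image generation in a plain API call
    audio_gen=False,                             # No audio generation either
    search_gen=False,                            # Deep Search off for this example
    api_name="/api"                              # Endpoint registered by the Space
)
print(result)  # The final aggregated response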
assets/bin/install.sh
ADDED
@@ -0,0 +1,33 @@
#!/bin/sh
#
# SPDX-FileCopyrightText: Hadad <[email protected]>
# SPDX-License-Identifier: Apache-2.0
#

echo "Installing required Python packages..."
pip install gradio_client rich --upgrade
echo "Installation complete."
echo ""
echo ""
echo "Downloading the J.A.R.V.I.S. script..."
wget https://huggingface.co/spaces/hadadrjt/ai/raw/main/assets/bin/ai
echo "Download complete."
echo ""
echo ""
echo "Setting executable permission..."
chmod a+x ai
echo "Permission set."
echo ""
echo ""
echo "Removing installer script..."
rm install.sh
echo "Done."
echo ""
echo ""
echo "To send a regular message:"
echo "./ai Your message here"
echo ""
echo "To use Deep Search mode:"
echo "./ai /dp Your message here"
echo ""
echo ""
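Assuming the installer is published alongside the client script at assets/bin/install.sh in this Space (an inferred path, based on the wget URL used above), it can be fetched and run with wget https://huggingface.co/spaces/hadadrjt/ai/raw/main/assets/bin/install.sh followed by sh install.sh.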
src/core/server.py
CHANGED
@@ -3,268 +3,124 @@
 # SPDX-License-Identifier: Apache-2.0
 #
 
-import json #
-import uuid #
-from typing import List, Dict
-from
-from
-from src.utils.
-from src.utils.
-
-from src.
-import
-import
-import
-
+import json # Import json module to work with JSON objects for request and response handling
+import uuid # Import uuid module to generate unique identifiers if needed for tracking or sessions
+from typing import List, Dict # Import type hinting for function parameters to improve code clarity and checking
+from config import * # Import all configuration variables such as server lists and tokens from config files
+from src.utils.session_mapping import get_host # Import helper function to map session ID to appropriate server host
+from src.utils.ip_generator import generate_ip # Import utility function to generate random IP addresses for headers
+from src.utils.helper import mark # Import function to mark servers as failed for retry or logging purposes
+import asyncio # Import asyncio module to enable asynchronous programming constructs in the function
+import httpx # Needed for the httpx.HTTPStatusError handler below (added here; may also arrive via config's star import)
+import aiohttp # Needed for the aiohttp.ClientResponseError handler below (added here; may also arrive via config's star import)
+from src.utils.time import get_time # Import function to get current date and time in required format
+from src.utils.reasoning import reasoning_tag_open, reasoning_tag_close # Import functions to wrap reasoning text with tags
+from src.utils.instruction import set_instructions # Import function to generate system instructions based on mode and time
+from src.core.transport.httpx import httpx_transport # Import primary HTTP transport method using httpx for streaming
+from src.core.transport.aiohttp import aiohttp_transport # Import fallback HTTP transport method using aiohttp
+
+# Define the main asynchronous function to communicate with AI server and stream responses
 async def jarvis(
-    session_id: str, # Unique session
-    model: str, # AI model name
-    history: List[Dict[str, str]], # List of previous conversation messages
-    user_message: str, #
-    mode: str, # Mode string
-    files=None, # Optional files
-    temperature: float = 0.6, # Sampling temperature controlling randomness
-    top_k: int = 20, #
-    min_p: float = 0, # Minimum probability threshold for token
-    top_p: float = 0.95, #
-    repetition_penalty: float = 1, #
+    session_id: str, # Unique identifier for the user session to route requests correctly
+    model: str, # AI model name or identifier to specify which model to use for generation
+    history: List[Dict[str, str]], # List of previous conversation messages to maintain context
+    user_message: str, # The latest message input from the user to send to the AI model
+    mode: str, # Mode string controlling behavior such as enabling or disabling reasoning output
+    files=None, # Optional parameter for any files attached by the user to include in the request
+    temperature: float = 0.6, # Sampling temperature controlling randomness of AI responses
+    top_k: int = 20, # Limits token selection to top-k probable tokens for response generation
+    min_p: float = 0, # Minimum probability threshold for token sampling to filter unlikely tokens
+    top_p: float = 0.95, # Cumulative probability cutoff for nucleus sampling of tokens
+    repetition_penalty: float = 1, # Parameter to penalize repeated tokens to reduce repetition in output
 ):
     """
-    and yields incremental parts of the AI-generated response as they arrive. It integrates CSS styling into
-    the reasoning output only if the mode is not '/no_think', preserving the behavior where reasoning is streamed
-    first inside a styled HTML block, followed by the main content streamed normally.
-
-    The implementation uses both httpx (with HTTP/2 support) and aiohttp to ensure compatibility and robustness
-    in streaming responses.
-
-    Args:
-        session_id (str): Identifier for the user session to maintain consistent server assignment.
-        model (str): Name of the AI model to use for generating the response.
-        history (List[Dict[str, str]]): List of previous messages in the conversation.
-        user_message (str): The current message from the user to send to the AI model.
-        mode (str): Contextual instructions to guide the AI model's response style.
-        files (optional): Additional files or attachments to include with the user message.
-        temperature (float): Controls randomness in token generation.
-        top_k (int): Limits token selection to top_k probable tokens.
-        min_p (float): Minimum probability threshold for token selection.
-        top_p (float): Nucleus sampling cumulative probability threshold.
-        repetition_penalty (float): Factor to reduce token repetition.
-
-    Reasoning is wrapped in a styled HTML details block and streamed incrementally only if mode is not '/no_think'.
-    After reasoning finishes, the main content is streamed normally.
-
-    If the server returns a specific error code indicating it is busy, it retries with another server.
-    If all servers are busy or fail, it yields a message indicating the server is busy.
-    """
-    tried = set() # Set to track servers already tried to avoid repeated retries
-
-
-    error = setup["error"] # HTTP error code integer which triggers retry
-    tried.add(server) # Mark this server as tried to prevent retrying immediately
-
-    date =
-
-    instructions =
-
-    messages = history.copy()
-
-    # Insert system
-    messages.insert(0, {"role": "system", "content": instructions})
-
-    msg = {"role": "user", "content": user_message}
-    if files:
-        msg["files"] = files
-    messages.append(msg) # Append user message to the
-
-    # Prepare HTTP headers
-    headers = {
-        "Authorization": f"Bearer {token}", # Bearer token for
-        "Content-Type": "application/json", #
-        "X-Forwarded-For": generate_ip(), #
-    }
-
-    payload = {
-        "model": model,
-        "messages": messages,
-        "stream": True, # Enable streaming response
-        "temperature": temperature,
-        "top_k": top_k,
-        "min_p": min_p,
-        "top_p": top_p,
-        "repetition_penalty": repetition_penalty,
-    }
-
-    reasoning = "" # String accumulator for reasoning text from the AI
-    reasoning_check = None # Flag to detect presence of reasoning in response; None means not checked yet
-    reasoning_done = False # Flag marking reasoning completion
-    content = "" # String accumulator for main content text from the AI
-
-    try:
-
-                    continue
-                try:
-                    # Parse JSON data after "data:" prefix which contains incremental response delta
-                    data = json.loads(chunk[5:])
-                    # Extract incremental delta message from first choice in response
-                    choice = data["choices"][0]["delta"]
-
-                    # On first delta received, detect if 'reasoning' field is present and non-empty
-                    if reasoning_check is None:
-                        # Initialize reasoning_check to empty string if reasoning exists and is non-empty, else None
-                        reasoning_check = "" if ("reasoning" in choice and choice["reasoning"]) else None
-
-                    # If reasoning is present and mode is not '/no_think' and reasoning not done yet
-                    if (
-                        reasoning_check == "" # Reasoning detected in response
-                        and mode != "/no_think" # Mode allows reasoning output
-                        and not reasoning_done # Reasoning phase not finished yet
-                        and "reasoning" in choice # Current delta includes reasoning part
-                        and choice["reasoning"] # Reasoning content is not empty
-                    ):
-                        reasoning += choice["reasoning"] # Append incremental reasoning text
-                        # Yield reasoning wrapped in styled HTML block with details expanded
-                        yield styles(reasoning=reasoning, content="", expanded=True)
-                        continue # Continue streaming reasoning increments without processing content yet
-
-                    # When reasoning ends and content starts, mark reasoning done, yield empty string, then content
-                    if (
-                        reasoning_check == "" # Reasoning was detected previously
-                        and mode != "/no_think" # Mode allows reasoning output
-                        and not reasoning_done # Reasoning phase not finished yet
-                        and "content" in choice # Current delta includes content part
-                        and choice["content"] # Content is not empty
-                    ):
-                        reasoning_done = True # Mark reasoning phase complete
-                        yield "" # Yield empty string to signal end of reasoning block to the consumer
-                        content += choice["content"] # Start accumulating content text
-                        yield content # Yield first part of content to the consumer
-                        continue # Continue streaming content increments
-
-                    # If no reasoning present or reasoning done, accumulate content and yield incrementally
-                    if (
-                        (reasoning_check is None or reasoning_done or mode == "/no_think") # No reasoning or reasoning finished or mode disables reasoning
-                        and "content" in choice # Current delta includes content part
-                        and choice["content"] # Content is not empty
-                    ):
-                        content += choice["content"] # Append incremental content text
-                        yield content # Yield updated content string to the consumer
-                except Exception:
-                    # Ignore exceptions during JSON parsing or key access and continue streaming
-                    continue
-        return # Exit function after successful streaming completion
-
-    except httpx.HTTPStatusError as e:
-        # If server returns specific error code indicating busy, retry with another server
-        if e.response.status_code == error:
-            # Continue to next iteration to try a different server
-            continue
-        else:
-            mark(server)
-    except Exception:
-        # For other exceptions (network errors, timeouts), mark server as busy/unavailable
-        mark(server)
-
-    # If httpx fails or server is busy, fallback to aiohttp for robustness and compatibility
-    try:
-        # Create aiohttp client session with no timeout for streaming
-        async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=None)) as session:
-            # Open async streaming POST request to Jarvis server endpoint with headers and JSON payload
-            async with session.post(host, headers=headers, json=payload) as resp:
-                # Raise for status to catch HTTP errors
-                resp.raise_for_status()
-                # Iterate asynchronously over each line of streaming response as it arrives
-                async for line_bytes in resp.content:
-                    # Decode bytes to string and strip whitespace
-                    line = line_bytes.decode("utf-8").strip()
-                    # Skip lines that do not start with "data:" prefix as per SSE format
-                    if not line.startswith("data:"):
-                        continue
-                    try:
-                        # Parse JSON data after "data:" prefix which contains incremental response delta
-                        data = json.loads(line[5:])
-                        # Extract incremental delta message from first choice in response
-                        choice = data["choices"][0]["delta"]
-
-                        # On first delta received, detect if 'reasoning' field is present and non-empty
-                        if reasoning_check is None:
-                            reasoning_check = "" if ("reasoning" in choice and choice["reasoning"]) else None
-
-
-                            and mode != "/no_think"
-                            and not reasoning_done
-                            and "reasoning" in choice
-                            and choice["reasoning"]
-                        ):
-                            reasoning += choice["reasoning"]
-                            yield styles(reasoning=reasoning, content="", expanded=True)
-                            continue
-
-
-                            content += choice["content"]
-                            yield content
-                            continue
-
-                        # If no reasoning present or reasoning done, accumulate content and yield incrementally
-                        if (
-                            (reasoning_check is None or reasoning_done or mode == "/no_think")
-                            and "content" in choice
-                            and choice["content"]
-                        ):
-                            content += choice["content"]
-                            yield content
-                    except Exception:
-                        # Ignore exceptions during JSON parsing or key access and continue streaming
-                        continue
-            return # Exit function after successful streaming completion with aiohttp
-
-    except aiohttp.ClientResponseError as e:
-        # If server returns specific error code indicating busy, retry with another server
-        if e.status == error:
-            continue # Try next available server
-        else:
-            mark(server) # Mark server as
-
-    # If all servers tried and
-    yield "The server is currently busy. Please wait a moment or try again later
-    return # End
+    Stream AI response from multiple configured servers using asynchronous HTTP requests.
+    Yields chunks of response that include reasoning and content parts as they arrive.
+    """
+
+    # Initialize a set to keep track of servers that have already been attempted
+    tried = set() # Prevents retrying the same server multiple times to avoid redundant requests
+
+    # Loop until a server successfully returns a response or all servers have been exhausted
+    while len(tried) < len(auth): # Continue trying servers until all configured servers are tried
+
+        # Retrieve server configuration details mapped to the current session
+        setup = get_host(session_id) # Get server host, token, and error codes for the session
+        server = setup["jarvis"] # Extract server name identifier for logging and marking
+        host = setup["endpoint"] # Extract server endpoint URL for sending requests
+        token = setup["token"] # Extract authentication token for authorized access
+        error = setup["error"] # Extract HTTP status code that indicates retryable error
+        tried.add(server) # Add current server to tried set to avoid retrying it again
+
+        # Get the current date and time for system instruction
+        date = get_time() # Retrieve current timestamp to include in system instructions
+
+        # Generate system instructions
+        instructions = set_instructions(mode, date) # Create system instructions guiding AI behavior
+
+        # Make a shallow copy of the conversation history to avoid mutating original list
+        messages = history.copy() # Duplicate previous messages to safely modify for this request
+
+        # Insert the system instruction message at the beginning of the message list
+        messages.insert(0, {"role": "system", "content": instructions}) # Add system instructions as first message
+
+        # Construct the user message dictionary with role and content
+        msg = {"role": "user", "content": user_message} # Prepare user's latest input for the request
+        if files: # Check if any files are attached to include in the message payload
+            msg["files"] = files # Attach files to the user message to send alongside text input
+        messages.append(msg) # Append the user message (with optional files) to the message history
+
+        # Prepare HTTP headers including authorization and content type for the request
+        headers = {
+            "Authorization": f"Bearer {token}", # Bearer token for authenticating with the AI server
+            "Content-Type": "application/json", # Specify that the request body is JSON formatted
+            "X-Forwarded-For": generate_ip(), # Randomly generated IP address to simulate client origin
+        }
+
+        # Build the JSON payload containing model, messages, and generation parameters
+        payload = {
+            "model": model, # Specify which AI model to use for generating responses
+            "messages": messages, # Provide the full message history including system and user inputs
+            "stream": True, # Enable streaming mode to receive partial response chunks progressively
+            "temperature": temperature, # Control randomness in token sampling for response diversity
+            "top_k": top_k, # Restrict token selection to top-k most probable tokens
+            "min_p": min_p, # Set minimum probability threshold to filter out unlikely tokens
+            "top_p": top_p, # Use nucleus sampling with cumulative probability cutoff
+            "repetition_penalty": repetition_penalty, # Penalize repeated tokens to reduce redundancy
+        }
+
+        # Attempt to stream the response using the primary HTTP transport method (httpx)
+        try:
+            async for chunk in httpx_transport(host, headers, payload, mode): # Stream response chunks asynchronously
+                yield chunk # Yield each chunk to the caller as it arrives for real-time processing
+            return # Exit the function if streaming completes successfully without errors
+
+        # Handle HTTP errors with status codes that indicate retryable failures
+        except httpx.HTTPStatusError as e: # Catch HTTP errors specific to httpx transport
+            if e.response.status_code == error: # If error code matches retryable error, try next server
+                continue # Skip current server and proceed to next iteration to retry
+            else:
+                mark(server) # Mark the current server as failed for non-retryable errors
+
+        # Handle any other unexpected exceptions during httpx transport
+        except Exception: # Catch all other exceptions to prevent crashing
+            mark(server) # Mark server as failed due to unexpected error
+
+        # If the primary transport fails, attempt to stream response using fallback transport (aiohttp)
+        try:
+            async for chunk in aiohttp_transport(host, headers, payload, mode): # Use fallback streaming method
+                yield chunk # Yield streamed chunks to caller as they arrive
+            return # Exit if fallback transport succeeds
+
+        # Handle aiohttp-specific response errors with retryable status codes
+        except aiohttp.ClientResponseError as e: # Catch HTTP response errors from aiohttp transport
+            if e.status == error: # Retry on matching error code by trying next server
+                continue # Continue to next server attempt
+            else:
+                mark(server) # Mark server as failed for non-retryable errors
+
+        # Handle any other exceptions during aiohttp transport
+        except Exception: # Catch generic exceptions to avoid crashing
+            mark(server) # Mark fallback server as failed
+
+    # If all servers have been tried and failed, yield a user-friendly error message
+    yield "The server is currently busy. Please wait a moment or try again later" # Inform user of service unavailability
+    return # End the function after exhausting all servers
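For orientation, here is a minimal sketch of how a caller might drive this generator; the session id, model name, and mode value are illustrative assumptions, not values taken from this repo's UI code.

# Hypothetical stand-alone consumer of the jarvis() async generator,
# for illustration only; the real caller is the Gradio UI layer.
import asyncio

from src.core.server import jarvis

async def demo():
    async for chunk in jarvis(
        session_id="demo-session",   # Any stable per-user identifier
        model="hadadrjt/JARVIS",     # Model name passed through to the backend
        history=[],                  # No earlier conversation turns
        user_message="Hello!",       # Current user input
        mode="",                     # Empty mode string; "/no_think" would disable reasoning
    ):
        print(chunk)  # Each yield is the accumulated reasoning or content so far

asyncio.run(demo())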
src/core/transport/__init__.py
ADDED
File without changes
src/core/transport/aiohttp.py
ADDED
@@ -0,0 +1,151 @@
#
# SPDX-FileCopyrightText: Hadad <[email protected]>
# SPDX-License-Identifier: Apache-2.0
#

import json # Import json module to parse JSON formatted strings from server response lines
import aiohttp # Import aiohttp library to perform asynchronous HTTP requests and handle streaming responses
# Import helper functions to add opening and closing reasoning tags around reasoning text
from src.utils.reasoning import reasoning_tag_open, reasoning_tag_close # Functions to wrap reasoning with tags

# Define an asynchronous function to send a POST request and stream the response from the server
async def aiohttp_transport(host: str, headers: dict, payload: dict, mode: str):
    """
    This asynchronous function establishes a streaming HTTP POST connection to the specified server endpoint
    using the aiohttp library. It sends a JSON payload containing the request parameters and headers, and
    processes the server's streamed response line by line in real time.

    The function is designed to handle responses that include two types of data chunks: reasoning text and
    content text. Reasoning text represents intermediate thought processes or explanations generated by the AI,
    while content text represents the final output or answer.

    The function maintains several internal state variables to manage the streaming process:

    - 'reasoning' accumulates the reasoning text segments as they arrive incrementally from the server.
    - 'reasoning_tag' is a boolean flag that ensures the opening reasoning tag (<think>) is inserted only once.
    - 'reasoning_check' is used to detect if the reasoning field is present in the initial streamed data chunk,
      which determines whether reasoning processing should occur.
    - 'reasoning_done' indicates when the reasoning phase has completed and the function should switch to
      accumulating content text.
    - 'content' accumulates the main content text after reasoning finishes.

    The function reads the response stream asynchronously, decoding each line from bytes to UTF-8 strings,
    and filters out any lines that do not start with the expected "data:" prefix. For valid data lines, it
    parses the JSON payload to extract incremental updates contained within the 'delta' field of the first
    choice in the response.

    Upon detecting reasoning text in the delta, and if the current mode allows reasoning output (i.e., mode is
    not "/no_think"), the function inserts an opening <think> tag once and appends subsequent reasoning chunks,
    carefully removing any duplicate tags to maintain clean formatting. It yields these reasoning segments
    progressively to the caller, enabling real-time display of the AI's intermediate thoughts.

    When the response transitions from reasoning to content (indicated by the presence of 'content' in the delta),
    the function closes the reasoning block with a closing </think> tag if it was opened, yields the final reasoning
    block, and then begins accumulating and yielding content chunks. An empty string is yielded as a separator
    between reasoning and content for clarity.

    If reasoning is absent, completed, or disabled by mode, the function directly accumulates and yields content
    chunks as they arrive.

    The function includes robust error handling to gracefully skip over any malformed JSON chunks or transient
    connection issues without interrupting the streaming process. This ensures continuous and reliable streaming
    of AI responses even in the face of occasional data irregularities.

    Overall, this function provides a comprehensive and efficient mechanism to stream, parse, and yield AI-generated
    reasoning and content in real time, supporting interactive and dynamic user experiences.
    """

    # Initialize an empty string to accumulate streamed reasoning text segments from the response
    reasoning = "" # This will hold the reasoning text as it is received incrementally

    # Boolean flag to track if the opening <think> tag has been inserted to avoid duplicates
    reasoning_tag = False # Ensures the reasoning opening tag is added only once

    # Variable to check presence of reasoning field in the first chunk of streamed data
    reasoning_check = None # Used to determine if reasoning should be processed for this response

    # Flag to indicate that reasoning section has finished and content streaming should start
    reasoning_done = False # Marks when reasoning is complete and content output begins

    # Initialize an empty string to accumulate the main content text from the response
    content = "" # Will hold the actual content output after reasoning is finished

    # Create an aiohttp client session with no timeout to allow indefinite streaming
    async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=None)) as session:
        # Send a POST request to the given host with specified headers and JSON payload
        async with session.post(host, headers=headers, json=payload) as resp:
            resp.raise_for_status() # Raise an exception if HTTP response status is not successful (2xx)

            # Iterate asynchronously over each line of bytes in the streamed response content
            async for line_bytes in resp.content:
                line = line_bytes.decode("utf-8").strip() # Decode bytes to UTF-8 string and strip whitespace

                # Skip processing for lines that do not start with the expected "data:" prefix
                if not line.startswith("data:"):
                    continue # Ignore lines without data prefix and continue to next streamed line

                try:
                    # Parse the JSON object from the line after removing the "data:" prefix
                    data = json.loads(line[5:]) # Convert JSON string to Python dictionary

                    # Extract the 'delta' dictionary which contains incremental update fields
                    choice = data["choices"][0]["delta"] # Access the partial update from the streamed response

                    # Perform a one-time check on the first chunk to detect if reasoning field exists and is non-empty
                    if reasoning_check is None: # Only check once on the initial chunk received
                        # Set reasoning_check to empty string if reasoning key exists and has content, else None
                        reasoning_check = "" if ("reasoning" in choice and choice["reasoning"]) else None

                    # If reasoning is present, mode allows thinking, reasoning not done, and reasoning text exists
                    if (
                        reasoning_check == "" # Reasoning field detected in first chunk
                        and mode != "/no_think" # Mode does not disable reasoning output
                        and not reasoning_done # Reasoning section is still in progress
                        and "reasoning" in choice # Current chunk contains reasoning text
                        and choice["reasoning"] # Reasoning text is non-empty
                    ):
                        # Insert opening reasoning tag once and append the first reasoning chunk
                        if not reasoning_tag: # Only add opening tag once at the start of reasoning
                            reasoning_tag = True # Mark that opening tag has been inserted
                            reasoning = reasoning_tag_open(reasoning) # Add opening <think> tag to reasoning string
                            reasoning += choice["reasoning"] # Append initial reasoning text chunk
                        else:
                            # Remove any duplicate opening tags and append subsequent reasoning chunks
                            reasoning_content = choice["reasoning"].replace("<think>", "") # Clean redundant tags
                            reasoning += reasoning_content # Append next reasoning segment to accumulated text

                        yield reasoning # Yield the intermediate reasoning text chunk to the caller
                        continue # Continue to next streamed line without further processing

                    # If reasoning is done and content starts arriving, finalize reasoning output
                    if (
                        reasoning_check == "" # Reasoning was detected initially
                        and mode != "/no_think" # Mode allows reasoning output
                        and not reasoning_done # Reasoning not yet marked as done
                        and "content" in choice # Current chunk contains content field
                        and choice["content"] # Content text is non-empty
                    ):
                        reasoning_done = True # Mark reasoning section as complete

                        # If reasoning tag was opened, close it properly before yielding final reasoning block
                        if reasoning_tag: # Only close tag if it was previously opened
                            reasoning = reasoning_tag_close(reasoning) # Append closing </think> tag
                            yield reasoning # Yield the complete reasoning text block

                        yield "" # Yield an empty string as a separator between reasoning and content
                        content += choice["content"] # Start accumulating content text from this chunk
                        yield content # Yield the first chunk of content text to the caller
                        continue # Proceed to next line in the stream

                    # Handle cases where reasoning is absent, finished, or mode disables reasoning, but content is present
                    if (
                        (reasoning_check is None or reasoning_done or mode == "/no_think") # No reasoning or reasoning done or disabled mode
                        and "content" in choice # Current chunk contains content field
                        and choice["content"] # Content text is non-empty
                    ):
                        content += choice["content"] # Append the content chunk to accumulated content string
                        yield content # Yield the updated content string so far

                # Catch any exceptions from JSON parsing errors or connection issues to prevent stream break
                except Exception:
                    continue # Ignore malformed chunks or transient errors and continue processing next lines
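The tag helpers imported above live in src/utils/reasoning.py (added in this commit but not shown in this listing). Judging only from their call sites here, a minimal sketch of their behavior might look like the following; this is an assumption, not the actual implementation, which likely does more (e.g., deduplication or formatting).

# Assumed behavior of the reasoning tag helpers, inferred from how the
# transport modules call them; the real src/utils/reasoning.py is not
# shown in this diff.
def reasoning_tag_open(reasoning: str) -> str:
    # Prefix the accumulated reasoning with an opening <think> tag
    return "<think>" + reasoning

def reasoning_tag_close(reasoning: str) -> str:
    # Terminate the reasoning block with a closing </think> tag
    return reasoning + "</think>"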
src/core/transport/httpx.py
ADDED
@@ -0,0 +1,152 @@
#
# SPDX-FileCopyrightText: Hadad <[email protected]>
# SPDX-License-Identifier: Apache-2.0
#

import json # Import json module to decode JSON formatted strings from server responses
import httpx # Import httpx library to perform asynchronous HTTP requests with HTTP/2 support
# Import functions to add opening and closing tags around reasoning text for proper formatting
from src.utils.reasoning import reasoning_tag_open, reasoning_tag_close # Functions to wrap reasoning with tags

# Define asynchronous function to send a POST request and stream the response from the server
async def httpx_transport(host: str, headers: dict, payload: dict, mode: str):
    """
    This asynchronous function establishes a streaming POST request to the specified server endpoint using the httpx library with HTTP/2 support.
    It is designed to handle incremental server responses that include both reasoning and content parts, which are streamed as separate chunks.
    The function processes each line of the streamed response, parsing JSON data prefixed by "data:", and yields partial outputs to the caller in real time.

    The function maintains internal state to manage the reasoning text separately from the main content. It detects whether the response includes a reasoning section
    by inspecting the first chunk containing the 'reasoning' field. If reasoning is present and the mode allows it (i.e., mode is not "/no_think"), it wraps the reasoning text
    within custom tags (<think> ... </think>) to clearly demarcate this part of the output. The opening tag is inserted once at the start of reasoning, and subsequent chunks
    append reasoning text after cleansing redundant tags.

    Once the reasoning section is complete and the content part begins, the function closes the reasoning tags properly before yielding the final reasoning block. It then yields
    an empty string as a separator, followed by the streamed content chunks. If reasoning is absent or disabled, the function directly accumulates and yields content chunks.

    The function is robust against malformed data or transient connection issues, gracefully skipping any problematic chunks without interrupting the stream. It reads each line
    as a UTF-8 decoded string, strips whitespace, and only processes lines starting with the "data:" prefix to ensure valid data handling.

    Parameters:
    - host (str): The URL of the server endpoint to which the POST request is sent.
    - headers (dict): HTTP headers to include in the request, such as authorization and content type.
    - payload (dict): The JSON payload containing the request data, including model, messages, and generation parameters.
    - mode (str): A string controlling behavior such as enabling or disabling reasoning output (e.g., "/no_think" disables reasoning).

    Yields:
    - str: Partial chunks of reasoning or content as they are received from the server, allowing real-time streaming output.

    Workflow:
    1. Initializes empty strings and flags to track reasoning text, content text, and reasoning state.
    2. Opens an asynchronous HTTP client session with HTTP/2 enabled and no timeout to allow indefinite streaming.
    3. Sends a POST request to the specified host with provided headers and JSON payload, initiating a streaming response.
    4. Iterates asynchronously over each line of the streamed response.
       - Skips any lines that do not start with the "data:" prefix to filter valid data chunks.
       - Parses the JSON content after the "data:" prefix into a Python dictionary.
       - Extracts the 'delta' field from the first choice in the response, which contains incremental updates.
    5. On the first chunk, checks if the 'reasoning' field is present and non-empty to determine if reasoning should be processed.
    6. If reasoning is present and allowed by mode, and reasoning is not yet complete:
       - Inserts the opening <think> tag once.
       - Appends reasoning text chunks, removing redundant opening tags if necessary.
       - Yields the accumulated reasoning text for real-time consumption.
    7. When reasoning ends and content begins:
       - Marks reasoning as done.
       - Closes the reasoning tag properly if it was opened.
       - Yields the finalized reasoning block.
       - Yields an empty string as a separator.
       - Starts accumulating content text and yields the first content chunk.
    8. If reasoning is absent, finished, or disabled, accumulates and yields content chunks directly.
    9. Handles any exceptions during parsing or connection by skipping malformed chunks, ensuring the stream continues uninterrupted.

    This design allows clients to receive partial reasoning and content outputs as they are generated by the server, enabling interactive and responsive user experiences.
    """

    # Initialize an empty string to accumulate streamed reasoning text from the response
    reasoning = "" # Holds reasoning text segments as they are received incrementally

    # Boolean flag to track whether the opening <think> tag has been inserted to avoid duplicates
    reasoning_tag = False # Ensures the reasoning opening tag is added only once during streaming

    # Variable to check presence of reasoning field in the first chunk of streamed data
    reasoning_check = None # Used to determine if reasoning should be processed for this response

    # Flag to indicate that reasoning section has finished and content streaming should start
    reasoning_done = False # Marks when reasoning is complete and content output begins

    # Initialize an empty string to accumulate the main content text from the response
    content = "" # Will hold the actual content output after reasoning is finished

    # Create an asynchronous HTTP client session with HTTP/2 enabled and no timeout to allow indefinite streaming
    async with httpx.AsyncClient(timeout=None, http2=True) as client: # Establish persistent HTTP/2 connection
        # Send a POST request to the given host with specified headers and JSON payload, and start streaming response
        async with client.stream("POST", host, headers=headers, json=payload) as response: # Initiate streaming POST request
            # Iterate asynchronously over each line of text in the streamed response content
            async for chunk in response.aiter_lines(): # Read response line by line as it arrives from the server
                # Skip processing for lines that do not start with the expected "data:" prefix
                if not chunk.strip().startswith("data:"): # Only process lines that contain data payloads
                    continue # Ignore non-data lines and continue to next streamed line

                try:
                    # Parse the JSON object from the line after removing the "data:" prefix
                    data = json.loads(chunk[5:]) # Convert JSON string to Python dictionary

                    # Extract the 'delta' dictionary which contains incremental update fields
                    choice = data["choices"][0]["delta"] # Access the partial update from the streamed response

                    # Perform a one-time check on the first chunk to detect if reasoning field exists and is non-empty
                    if reasoning_check is None: # Only check once on the initial chunk received
                        # Set reasoning_check to empty string if reasoning key exists and has content, else None
                        reasoning_check = "" if ("reasoning" in choice and choice["reasoning"]) else None

                    # If reasoning is present, mode allows thinking, reasoning not done, and reasoning text exists
                    if (
                        reasoning_check == "" # Reasoning field detected in first chunk
                        and mode != "/no_think" # Mode does not disable reasoning output
                        and not reasoning_done # Reasoning section is still in progress
                        and "reasoning" in choice # Current chunk contains reasoning text
                        and choice["reasoning"] # Reasoning text is non-empty
                    ):
                        # Insert opening reasoning tag once and append the first reasoning chunk
                        if not reasoning_tag: # Only add opening tag once at the start of reasoning
                            reasoning_tag = True # Mark that opening tag has been inserted
                            reasoning = reasoning_tag_open(reasoning) # Add opening <think> tag to reasoning string
                            reasoning += choice["reasoning"] # Append initial reasoning text chunk
                        else:
                            # Remove any duplicate opening tags and append subsequent reasoning chunks
                            reasoning_content = choice["reasoning"].replace("<think>", "") # Clean redundant tags
                            reasoning += reasoning_content # Append next reasoning segment to accumulated text

                        yield reasoning # Yield the intermediate reasoning text chunk to the caller
                        continue # Continue to next streamed line without further processing

                    # If reasoning is done and content starts arriving, finalize reasoning output
                    if (
                        reasoning_check == "" # Reasoning was detected initially
                        and mode != "/no_think" # Mode allows reasoning output
                        and not reasoning_done # Reasoning not yet marked as done
                        and "content" in choice # Current chunk contains content field
                        and choice["content"] # Content text is non-empty
                    ):
                        reasoning_done = True # Mark reasoning section as complete

                        # If reasoning tag was opened, close it properly before yielding final reasoning block
                        if reasoning_tag: # Only close tag if it was previously opened
                            reasoning = reasoning_tag_close(reasoning) # Append closing </think> tag
                            yield reasoning # Yield the complete reasoning text block

                        yield "" # Yield an empty string as a separator between reasoning and content
                        content += choice["content"] # Start accumulating content text from this chunk
                        yield content # Yield the first chunk of content text to the caller
                        continue # Proceed to next line in the stream

                    # Handle cases where reasoning is absent, finished, or mode disables reasoning, but content is present
                    if (
                        (reasoning_check is None or reasoning_done or mode == "/no_think") # No reasoning or reasoning done or disabled mode
                        and "content" in choice # Current chunk contains content field
                        and choice["content"] # Content text is non-empty
                    ):
                        content += choice["content"] # Append the content chunk to accumulated content string
                        yield content # Yield the updated content string so far

                # Catch any exceptions from JSON parsing errors or connection issues to prevent stream break
                except Exception: # Gracefully handle any error encountered during streaming or parsing
                    continue # Ignore malformed chunks or transient errors and continue processing next lines
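Both transports assume OpenAI-style server-sent events. A useful line in the stream looks roughly like the example below (the JSON values are illustrative), which is why both modules strip the data: prefix and then read choices[0].delta:

# Illustrative shape of one streamed SSE line as handled by both transports.
# During the thinking phase the delta carries "reasoning"; afterwards "content".
import json

line = 'data: {"choices": [{"delta": {"reasoning": "First, consider..."}}]}'
data = json.loads(line[5:])              # Drop the "data:" prefix and parse
delta = data["choices"][0]["delta"]      # Incremental update of the first choice
print(delta.get("reasoning") or delta.get("content"))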
src/ui/interface.py
CHANGED
@@ -102,7 +102,8 @@ def ui():
     reasoning.change(
         fn=update_parameters,  # Function to call on checkbox state change
         inputs=[reasoning],  # Input is the reasoning checkbox's current value
-        outputs=[temperature, top_k, min_p, top_p, repetition_penalty]  # Update these sliders with new values
+        outputs=[temperature, top_k, min_p, top_p, repetition_penalty],  # Update these sliders with new values
+        api_name=False  # Disable API
     )

     # Initialize the parameter sliders with values corresponding to the default reasoning checkbox state
@@ -164,6 +165,8 @@ def ui():
             ["/image Create a cartoon-style image of a man."],
             ["What day is it today, what's the date, and what time is it?"],
             ['/audio Say "I am J.A.R.V.I.S.".'],
+            ["How can I run you in the terminal without having to download the model?"],
+            ["Do you have an OpenAI-compatible API for your model?"],
             ["Please generate a highly complex code snippet on any topic."],
             ["Explain about quantum computers."]
         ],
@@ -171,13 +174,15 @@ def ui():
         chatbot=gr.Chatbot(
             label="J.A.R.V.I.S.",  # Title label displayed above the chat window
             show_copy_button=True,  # Show a button allowing users to copy chat messages
-            scale=1  # Scale factor for the chatbot UI size
+            scale=1,  # Scale factor for the chatbot UI size
+            allow_tags=["think"]  # Reasoning tag
         ),
         multimodal=False,  # Disable support for multimodal inputs such as images or audio files
         fill_height=True,  # Duplicate from Blocks to Chat Interface
         fill_width=True,  # Duplicate from Blocks to Chat Interface
         head=meta_tags,  # Duplicate from Blocks to Chat Interface
-        show_progress="full"  # Progress animation
+        show_progress="full",  # Progress animation
+        api_name="api"  # API endpoint
     )
     # Return the complete Gradio app object for launching or embedding
     return app
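
With api_name="api" now set on the chat component, the Space exposes a named endpoint that gradio_client can call, which is presumably what the two new example prompts hint at. A minimal sketch, assuming the Space ID "hadadrjt/ai" and that the endpoint accepts the chat message as its first argument; both details are deployment assumptions not visible in this diff:

from gradio_client import Client  # pip install gradio_client

client = Client("hadadrjt/ai")  # assumed Space ID
result = client.predict(
    "Explain about quantum computers.",  # chat message to send
    api_name="/api"  # endpoint name introduced in this commit
)
print(result)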
src/ui/reasoning.py
DELETED
@@ -1,75 +0,0 @@
-#
-# SPDX-FileCopyrightText: Hadad <[email protected]>
-# SPDX-License-Identifier: Apache-2.0
-#
-
-def styles(reasoning: str, content: str, expanded: bool = False) -> str:
-    """
-    Generate a clean, interactive HTML <details> block that displays reasoning text inside a collapsible container
-    with subtle styling and enhanced user experience, without applying any custom colors or shadows.
-
-    This function creates a collapsible section using HTML and inline CSS that focuses on simplicity and readability.
-    It avoids setting any custom text or background colors and does not include any box shadow effects,
-    allowing the block to inherit colors and styles from its surrounding environment.
-    The summary header includes a brain emoji represented by its HTML numeric character reference to symbolize reasoning.
-    The collapsible block can be rendered initially expanded or collapsed based on the 'expanded' parameter.
-    The 'content' parameter is unused but kept for compatibility with similar function signatures.
-
-    Args:
-        reasoning (str): The explanation or reasoning text to be displayed inside the collapsible block.
-            This text is wrapped in a styled <div> for clear presentation.
-        content (str): An unused parameter retained for compatibility with other functions sharing this signature.
-        expanded (bool): If True, the collapsible block is initially open; if False, it starts closed.
-
-    Returns:
-        str: A complete HTML snippet string containing a <details> element with inline CSS that styles it as
-            a simple, user-friendly collapsible container. The styling includes padding, rounded corners,
-            smooth transitions on the summary header's text color, and readable font sizing without any color overrides or shadows.
-    """
-    # Determine whether to include the 'open' attribute in the <details> tag.
-    # If 'expanded' is True, the block will be rendered open by default in the browser.
-    open_attr = "open" if expanded else ""
-
-    # Define the brain emoji using its HTML numeric character reference to ensure consistent display
-    # across different browsers and platforms, avoiding potential encoding issues.
-    emoji = "🧠"  # Unicode code point U+1F9E0 representing the 🧠 emoji
-
-    # Construct and return the full HTML string for the collapsible block.
-    # The <details> element acts as a toggleable container with padding and rounded corners for a modern look.
-    # The inline styles avoid setting explicit colors or shadows, allowing the block to inherit styles from its context.
-    # The <summary> element serves as the clickable header, featuring the brain emoji and the label "Reasoning".
-    # It includes styling for font weight, size, cursor, and smooth color transitions on hover.
-    # The hover effect is implemented using inline JavaScript event handlers that maintain the inherited color,
-    # ensuring no color changes occur but allowing for potential future customization.
-    # The reasoning text is wrapped inside a <div> with spacing and a subtle top border to visually separate it from the summary.
-    # Typography settings improve readability with increased line height and slight letter spacing.
-    # The 'content' parameter is intentionally unused but present to maintain compatibility with similar function signatures.
-    return f"""
-    <details {open_attr} style="
-        padding: 16px; /* Inner spacing for comfortable content layout */
-        border-radius: 12px; /* Rounded corners for a smooth, friendly appearance */
-        margin: 12px 0; /* Vertical spacing to separate from adjacent elements */
-        font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; /* Modern, readable font stack */
-        transition: box-shadow 0.3s ease-in-out; /* Transition effect retained though no shadow is applied */
-    ">
-        <summary style="
-            font-weight: 700; /* Bold text to highlight the summary header */
-            font-size: 14px !important; /* Slightly larger font size for emphasis */
-            cursor: pointer; /* Cursor changes to pointer to indicate interactivity */
-            user-select: none; /* Prevents text selection on click for cleaner UX */
-            transition: color 0.25s ease-in-out; /* Smooth transition for color changes on hover */
-        " onmouseover="this.style.color='inherit';" onmouseout="this.style.color='inherit';">
-            {emoji} Reasoning
-        </summary>
-        <div style="
-            margin-top: 12px; /* Space separating the summary from the content */
-            padding-top: 8px; /* Additional padding for comfortable reading */
-            border-top: 1.5px solid; /* Subtle top border to visually separate content */
-            font-size: 11px !important; /* Slightly larger font size for better readability */
-            line-height: 1.7; /* Increased line height for easy text flow */
-            letter-spacing: 0.02em; /* Slight letter spacing to enhance legibility */
-        ">
-            {reasoning}
-        </div>
-    </details>
-    """
src/utils/instruction.py
ADDED
@@ -0,0 +1,55 @@
+#
+# SPDX-FileCopyrightText: Hadad <[email protected]>
+# SPDX-License-Identifier: Apache-2.0
+#
+
+from config import restrictions  # Load predefined restriction settings for instruction building
+
+# Define a function to set system instructions
+def set_instructions(mode: str, date: str) -> str:
+    """
+    This function constructs a comprehensive system instruction string that integrates several key components needed
+    to guide the behavior of an AI model or system during interaction. It takes two inputs: 'mode' and 'date', and
+    returns a single formatted string that combines these inputs with a predefined set of restrictions loaded from
+    the configuration.
+
+    The purpose of this instruction string is to provide contextual and operational directives to the AI system.
+    The 'mode' parameter typically represents the current operational mode or state, which may influence how the AI
+    processes inputs or generates outputs. This could include modes such as normal operation, restricted mode, or
+    specialized behavior modes.
+
+    The 'restrictions' component, imported from the configuration, contains predefined rules, limitations, or guidelines
+    that the AI should adhere to during its operation. These restrictions might include content filters, ethical
+    guidelines, or other constraints necessary to ensure safe and appropriate AI behavior.
+
+    The 'date' parameter represents the current date or timestamp, providing temporal context that may be relevant
+    for time-sensitive instructions or for logging and auditing purposes.
+
+    The function formats these three components into a single string with clear separation using multiple newline
+    characters. This spacing improves readability and ensures that each section is distinctly identifiable when the
+    instruction string is parsed or displayed. The resulting string looks like this:
+
+    <mode>
+
+    <restrictions>
+
+    Today: <date>
+
+    This structured format allows downstream systems or models to easily extract and interpret each part of the
+    instruction, facilitating consistent and context-aware AI responses.
+
+    Parameters:
+    - mode (str): A string indicating the current operational mode or context for the AI system.
+    - date (str): A string representing the current date or timestamp to provide temporal context.
+
+    Returns:
+    - str: A formatted instruction string combining the mode, restrictions, and date sections with spacing.
+
+    Usage:
+    This function is typically called before sending prompts or requests to the AI model to ensure that all necessary
+    contextual information and operational constraints are included in the system instructions.
+    """
+
+    # Combine mode, restrictions, and date into a formatted instruction block
+    return f"{mode}\n\n\n{restrictions}\n\n\nToday: {date}\n\n\n"
+    # Return the composed string with spacing between sections
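
A short usage sketch tying this helper to the rest of the commit: the date argument pairs naturally with get_time() from src/utils/time.py (added below), and the "/no_think" mode string appears in the server hunk above; treat the exact call site as illustrative rather than the actual wiring:

from src.utils.instruction import set_instructions
from src.utils.time import get_time

system_instructions = set_instructions("/no_think", get_time())
# The result interleaves the mode, the configured restrictions,
# and the current date, separated by blank lines.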
src/utils/reasoning.py
ADDED
@@ -0,0 +1,140 @@
+#
+# SPDX-FileCopyrightText: Hadad <[email protected]>
+# SPDX-License-Identifier: Apache-2.0
+#
+
+# Define function to keep only the first <think> tag at the beginning of the text
+def reasoning_tag_start(text: str) -> str:
+    """
+    This function ensures that the reasoning text contains exactly one opening <think> tag at the very beginning.
+    It is common for streamed or concatenated reasoning texts to accumulate multiple <think> tags due to incremental
+    appends or repeated insertions. This function cleans the text by removing all occurrences of the <think> tag
+    throughout the entire string, then checks if the original text started with a <think> tag. If it did, it reinserts
+    a single <think> tag at the start to preserve the intended opening marker.
+
+    The purpose of this function is to normalize the reasoning text so that it has a clean, unambiguous opening tag,
+    which is critical for consistent parsing, rendering, or further processing downstream. By preventing multiple
+    opening tags, it avoids confusion and formatting errors in the final output.
+
+    Steps:
+    1. Remove all <think> tags from the entire text to eliminate duplicates.
+    2. Check if the original text began with <think>.
+    3. If yes, prepend a single <think> tag to the cleaned text.
+    4. If no, return the cleaned text without any opening tag.
+
+    Parameters:
+    - text (str): The reasoning text which may contain multiple or misplaced <think> tags.
+
+    Returns:
+    - str: The reasoning text normalized to have at most one <think> tag at the start.
+    """
+
+    # Remove all <think> tags from the text
+    reasoning_mode = text.replace("<think>", "")  # Strip all <think> tags throughout the text
+    # Check if the original text started with <think> and reinsert one if so
+    if text.startswith("<think>"):  # Reinsert a single <think> tag at the beginning
+        return "<think>" + reasoning_mode  # Return the cleaned text with one <think> tag at the start
+    else:
+        return reasoning_mode  # Return the cleaned text without any <think> tags
+
+# Define function to keep only the last </think> tag at the end of the text
+def reasoning_tag_stop(text: str) -> str:
+    """
+    This function ensures that the reasoning text contains exactly one closing </think> tag at the very end.
+    Similar to the opening tag, streamed or concatenated reasoning texts might accumulate multiple closing </think> tags,
+    which can cause parsing or display issues. This function removes all closing </think> tags from the text and then
+    checks if the original text ended with a closing tag. If it did, it appends a single closing </think> tag at the end,
+    preserving the intended closing marker.
+
+    This normalization is important to maintain a clean and consistent structure in the reasoning text, ensuring that
+    the closing tag is unambiguous and properly positioned for downstream consumers or renderers.
+
+    Steps:
+    1. Remove all </think> tags from the entire text to eliminate duplicates.
+    2. Check if the original text ended with </think>.
+    3. If yes, append a single </think> tag to the cleaned text.
+    4. If no, return the cleaned text without any closing tag.
+
+    Parameters:
+    - text (str): The reasoning text which may contain multiple or misplaced </think> tags.
+
+    Returns:
+    - str: The reasoning text normalized to have at most one </think> tag at the end.
+    """
+
+    # Remove all </think> tags from the text
+    reasoning_mode = text.replace("</think>", "")  # Strip all </think> tags throughout the text
+    # Check if the original text ended with </think> and reinsert one if so
+    if text.endswith("</think>"):  # Reinsert a single </think> tag at the end
+        return reasoning_mode + "</think>"  # Return the cleaned text with one </think> tag at the end
+    else:
+        return reasoning_mode  # Return the cleaned text without any </think> tags
+
+# Define function to ensure text starts with exactly one <think> tag
+def reasoning_tag_open(text: str) -> str:
+    """
+    This function guarantees that the reasoning text starts with exactly one opening <think> tag.
+    It first strips any leading whitespace to accurately detect whether the tag is already present.
+    If the tag is missing, it inserts a <think> tag followed by a newline at the very beginning of the text.
+    If the tag is present, it calls reasoning_tag_start to remove any duplicate tags and ensure only one opening tag remains.
+
+    This function is essential for preparing reasoning text before streaming or output, as it enforces a consistent
+    and clean opening tag structure. The newline after the tag improves readability and formatting when displayed.
+
+    Steps:
+    1. Strip leading whitespace from the text.
+    2. Check if the text starts with <think>.
+    3. If not, prepend "<think>\n" to the text.
+    4. If yes, clean duplicates using reasoning_tag_start.
+    5. Return the normalized text.
+
+    Parameters:
+    - text (str): The reasoning text to be normalized.
+
+    Returns:
+    - str: The reasoning text with exactly one <think> tag at the start.
+    """
+
+    # Remove leading whitespace for accurate tag checking
+    stripped = text.lstrip()  # Eliminate spaces or newlines from the start
+    # If tag is missing, insert it, else clean up any duplicates
+    if not stripped.startswith("<think>"):  # Check if <think> tag is absent at the beginning
+        text = "<think>\n" + text  # Add <think> tag followed by a newline at the start
+    else:
+        text = reasoning_tag_start(text)  # Remove duplicates if the tag is already present
+    return text  # Return text with one valid <think> tag at the start
+
+# Define function to ensure text ends with exactly one </think> tag
+def reasoning_tag_close(text: str) -> str:
+    """
+    This function guarantees that the reasoning text ends with exactly one closing </think> tag.
+    It first strips any trailing whitespace to accurately detect whether the tag is already present.
+    If the tag is missing, it appends a newline, the closing </think> tag, and two additional newlines to the end of the text.
+    If the tag is present, it calls reasoning_tag_stop to remove any duplicate closing tags and ensure only one remains.
+
+    This function is crucial for finalizing reasoning text before output or further processing, ensuring the closing tag
+    is properly placed and that the text formatting remains clean and readable. The added newlines after the closing tag
+    provide spacing for separation from subsequent content.
+
+    Steps:
+    1. Strip trailing whitespace from the text.
+    2. Check if the text ends with </think>.
+    3. If not, append "\n</think>\n\n" to the text.
+    4. If yes, clean duplicates using reasoning_tag_stop.
+    5. Return the normalized text.
+
+    Parameters:
+    - text (str): The reasoning text to be normalized.
+
+    Returns:
+    - str: The reasoning text with exactly one </think> tag at the end.
+    """
+
+    # Remove trailing whitespace for accurate tag checking
+    stripped = text.rstrip()  # Eliminate spaces or newlines from the end
+    # If tag is missing, append it, else clean up any duplicates
+    if not stripped.endswith("</think>"):  # Check if </think> tag is absent at the end
+        text = text.rstrip() + "\n</think>\n\n"  # Append </think> tag with spacing
+    else:
+        text = reasoning_tag_stop(text)  # Remove duplicates if the tag is already present
+    return text  # Return text with one valid </think> tag at the end
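
Since all four helpers are defined in full above, their behavior can be pinned down with a small, self-contained check; this snippet only exercises the code shown and introduces nothing new:

from src.utils.reasoning import reasoning_tag_open, reasoning_tag_close

text = "<think><think>Weighing the options..."
text = reasoning_tag_open(text)   # collapses the duplicate tags into one leading <think>
text = reasoning_tag_close(text)  # appends "\n</think>\n\n" because no closing tag exists yet
assert text.startswith("<think>")
assert text.rstrip().endswith("</think>")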
src/utils/session_mapping.py
CHANGED
@@ -9,10 +9,11 @@ from typing import Dict, List # Import type hints for dictionaries and lists (n
 from config import auth  # Import authentication configuration, likely a list of host dictionaries with credentials
 from src.utils.helper import busy, mark  # Import 'busy' dictionary and 'mark' function to track and update host busy status

-#
-mapping = {}
+# Initialize a global dictionary to map session IDs to assigned hosts
+mapping = {}  # Store session_id to host assignment mapping to maintain consistent host allocation per session

-def get_host(session_id: str):
+# Define a function to get an available host for a given session, optionally excluding certain hosts
+def get_host(session_id: str, exclude_hosts: List[str] = None) -> dict:
     """
     Retrieve or assign a host for the given session ID.

@@ -26,38 +27,48 @@ def get_host(session_id: str):
     Exception: If no available hosts are found to assign.

     Explanation:
-
-
-
-
-    marks the selected host as busy for one hour, and returns the selected host.
+    Retrieve an available host for the specified session ID, ensuring excluded hosts are not assigned.
+    This function maintains a mapping of session IDs to hosts to provide consistent host assignment.
+    It filters out busy hosts and those explicitly excluded, then randomly selects an available host.
+    If no hosts are available, it raises an exception.
     """
+
+    # If no list of hosts to exclude is provided, initialize it as an empty list
+    if exclude_hosts is None:  # Check if exclude_hosts parameter was omitted or set to None
+        exclude_hosts = []  # Initialize exclude_hosts to an empty list to avoid errors during filtering
+
     # Check if the session ID already has an assigned host in the mapping dictionary
-    if session_id in mapping:
-        #
-        return
+    if session_id in mapping:  # Verify if a host was previously assigned to this session
+        assigned_host = mapping[session_id]  # Retrieve the assigned host dictionary for this session
+        # If the assigned host is not in the list of hosts to exclude, return it immediately
+        if assigned_host["jarvis"] not in exclude_hosts:  # Ensure assigned host is allowed for this request
+            return assigned_host  # Return the cached host assignment for session consistency
+        else:
+            # If the assigned host is excluded, remove the mapping to allow reassignment
+            del mapping[session_id]  # Delete the existing session-host mapping to find a new host

-    # Get the current UTC time to compare against busy timestamps
-    now = datetime.utcnow()
+    # Get the current UTC time to compare against host busy status timestamps
+    now = datetime.utcnow()  # Capture current time to filter out hosts that are still busy

-    #
-
-        h for h in auth
-        if h["jarvis"] not in busy or busy[h["jarvis"]] <= now
+    # Create a list of hosts that are not currently busy and not in the exclude list
+    available_hosts = [
+        h for h in auth  # Iterate over all hosts defined in the authentication configuration
+        if h["jarvis"] not in busy or busy[h["jarvis"]] <= now  # Include hosts not busy or whose busy time has expired
+        if h["jarvis"] not in exclude_hosts  # Exclude hosts specified in the exclude_hosts list
     ]

-    # If no hosts are available after filtering, raise an exception to indicate
-    if not
-        raise Exception("No available hosts to assign.")
+    # If no hosts are available after filtering, raise an exception to indicate resource exhaustion
+    if not available_hosts:  # Check if the filtered list of hosts is empty
+        raise Exception("No available hosts to assign.")  # Inform caller that no hosts can be assigned currently

-    # Randomly select one host from the list of available hosts
-    selected = random.choice(
+    # Randomly select one host from the list of available hosts to distribute load evenly
+    selected = random.choice(available_hosts)  # Choose a host at random to avoid bias in host selection

-    #
-    mapping[session_id] = selected
+    # Store the selected host in the mapping dictionary for future requests with the same session ID
+    mapping[session_id] = selected  # Cache the selected host to maintain session affinity

-    # Mark the selected host as busy
-    mark(selected["jarvis"])
+    # Mark the selected host as busy using the helper function to update its busy status
+    mark(selected["jarvis"])  # Update the busy dictionary to indicate this host is now in use

-    # Return the selected host dictionary
-    return selected
+    # Return the selected host dictionary to the caller for use in processing the session
+    return selected  # Provide the caller with the assigned host details
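
The new exclude_hosts parameter enables a retry-with-failover pattern: when a request through the assigned host fails, ask for another host while excluding the one that failed. A sketch of that pattern under stated assumptions; send_request is a hypothetical stand-in for the transport layer, not a function from this repository:

from src.utils.session_mapping import get_host

def request_with_failover(session_id: str, payload: dict, attempts: int = 3):
    failed = []  # "jarvis" identifiers of hosts that have already failed
    for _ in range(attempts):
        host = get_host(session_id, exclude_hosts=failed)
        try:
            return send_request(host, payload)  # hypothetical transport call
        except Exception:
            failed.append(host["jarvis"])  # exclude this host on the next attempt
    raise Exception("All attempted hosts failed for this session.")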
src/utils/time.py
ADDED
@@ -0,0 +1,49 @@
+#
+# SPDX-FileCopyrightText: Hadad <[email protected]>
+# SPDX-License-Identifier: Apache-2.0
+#
+
+from datetime import datetime  # Import datetime module to work with date and time
+
+# Define a function to get the current date and time in a specific format
+def get_time() -> str:
+    """
+    This function retrieves the current local date and time and returns it as a human-readable formatted string.
+    It leverages Python's built-in datetime module to obtain the precise moment at which the function is called,
+    ensuring that the timestamp reflects the current system time accurately.
+
+    The formatting applied to the datetime object is designed to produce a clear and comprehensive representation
+    of the date and time, suitable for display in user interfaces, logging, or as contextual information within
+    system instructions or AI prompts.
+
+    Specifically, the format string used in strftime produces the following components in order:
+
+    - %A: Full weekday name (e.g., Monday, Tuesday) to indicate the day of the week explicitly.
+    - %B: Full month name (e.g., January, February) providing the month in a readable form.
+    - %d: Day of the month as a zero-padded decimal number (01 to 31), giving the exact calendar day.
+    - %Y: Four-digit year (e.g., 2025), specifying the calendar year.
+    - %I: Hour (12-hour clock) as a zero-padded decimal number (01 to 12), for conventional time representation.
+    - %M: Minute as a zero-padded decimal number (00 to 59), showing the exact minute.
+    - %p: Locale's AM or PM designation, clarifying morning or afternoon/evening time.
+    - %Z: Time zone name or abbreviation, providing the timezone context of the timestamp.
+
+    By combining these elements, the returned string might look like:
+    "Sunday, June 29, 2025, 08:11 PM WIB"
+
+    This detailed timestamp format is particularly useful in contexts where precise temporal information is necessary,
+    such as generating system instructions that depend on the current date and time, logging events with timestamps,
+    or displaying current time information to users in a clear and localized manner.
+
+    Returns:
+    - str: A string representing the current date and time formatted with weekday, month, day, year, 12-hour time,
+        AM/PM marker, and timezone abbreviation.
+
+    Usage:
+    This function can be called whenever the current timestamp is needed in a standardized human-readable format,
+    especially before sending instructions or prompts to AI systems that may require temporal context.
+    """
+
+    # Get the current date and time and format it using strftime
+    return datetime.now().strftime("%A, %B %d, %Y, %I:%M %p %Z")
+    # Format as full weekday name, month name, day, year,
+    # 12-hour time, AM/PM, and timezone
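
One caveat: datetime.now() returns a naive object, so the %Z directive typically renders as an empty string, and the timezone shown in the docstring example will be absent. A sketch of a timezone-aware variant, should that ever matter:

from datetime import datetime

def get_time_aware() -> str:
    # astimezone() attaches the system's local timezone to the naive "now",
    # letting %Z render an abbreviation such as "UTC" or "WIB".
    return datetime.now().astimezone().strftime("%A, %B %d, %Y, %I:%M %p %Z")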