code refactor and cleaning
- README.md +92 -9
- app.py +129 -124
- src/services/{huggingface.py → dataset_upload.py} +0 -0
- src/services/form_parser.py +147 -0
- src/services/json_generator.py +152 -204
- src/services/report_builder.py +273 -0
README.md
CHANGED
@@ -11,20 +11,103 @@ license: apache-2.0

Removed (previous quick-start notes in the old README):

- Activate it: `.\.venv\Scripts\activate`
- Install dependencies: `pipenv install -d`
- Launch the application: `pipenv run python main.py`
New README content:

short_description: Create a report in BoAmps format
---

# BoAmps Report Creation Tool 🌿

This tool is part of the initiative [BoAmps](https://github.com/Boavizta/BoAmps).
The purpose of the BoAmps project is to build a large, open database of the energy consumption of IT/AI tasks, depending on data nature, algorithms, hardware, etc., in order to improve energy-efficiency approaches based on empirical knowledge.

This space was initiated by a group of students from Sud Telecom Paris; many thanks to [Hicham FILALI](https://huggingface.co/FILALIHicham) for his work.

## 🚀 Quick Start

### Prerequisites

- **Python** >= 3.12

### Installation Steps

1. **Clone the repository**

2. **Create and activate a virtual environment (optional)**

   ```bash
   # Windows
   python -m venv .venv
   .\.venv\Scripts\activate

   # Linux/macOS
   python -m venv .venv
   source .venv/bin/activate
   ```

3. **Install dependencies**

   ```bash
   pip install pipenv
   pipenv install --dev
   ```

4. **Launch the application**

   ```bash
   python ./app.py
   ```

5. **Access the application**
   - Open your browser and go to `http://localhost:7860`
   - The Gradio interface will be available for creating BoAmps reports

## 🏗️ Architecture Overview

### Core Components

1. **`app.py`** - Main application file
   - Initializes the Gradio interface
   - Orchestrates all UI components
   - Handles application routing and main logic

2. **Services Layer (`src/services/`)**
   - **`json_generator.py`**: Generates BoAmps-compliant JSON reports
   - **`report_builder.py`**: Constructs structured report data
   - **`form_parser.py`**: Processes and validates form inputs
   - **`dataset_upload.py`**: Manages Hugging Face dataset integration
   - **`util.py`**: Common utility functions

3. **UI Layer (`src/ui/`)**
   - **`form_components.py`**: Gradio interface components for the different report sections

4. **Assets & Validation (`assets/`)**
   - **`validation.py`**: BoAmps schema validation logic
   - **`app.css`**: Application styling

### Data Flow

```
User Input (Gradio Form)
        ↓
Form Parser & Validation
        ↓
JSON Generator
        ↓
Report Builder
        ↓
BoAmps Schema Validation
        ↓
JSON Report Output
```
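
A report produced by this flow can also be re-checked against the schema outside the UI. Below is a minimal sketch; the file name is only an illustration of the `report_<taskStage>_<taskFamily>_<infraType>_<reportId>.json` pattern used by the generator, and the script assumes it is run from the repository root so the `assets` package is importable:

```python
import json

# Same validator the app imports internally (assets.utils.validation).
from assets.utils.validation import validate_boamps_schema

# Hypothetical file name following the report_<taskStage>_<taskFamily>_<infraType>_<reportId>.json pattern.
with open("report_inference_imageClassification_cloud_42.json", encoding="utf-8") as f:
    report = json.load(f)

is_valid, message = validate_boamps_schema(report)
print(is_valid, message)
```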

## 🤝 Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request

## 📄 License

This project is licensed under the Apache 2.0 License; see the license information in the repository header.

## 🙏 Acknowledgments

This space was initiated by a group of students from Sud Telecom Paris; many thanks to [Hicham FILALI](https://huggingface.co/FILALIHicham) for his work.

For more information about the BoAmps initiative, visit the [official repository](https://github.com/Boavizta/BoAmps).
app.py
CHANGED
Old version (removed lines and surrounding context):

@@ -1,7 +1,8 @@

```python
import gradio as gr
from os import path
from src.services.  # removed import line, truncated in the rendered diff
from src.services.json_generator import generate_json
from src.ui.form_components import (
    create_header_tab,
    create_task_tab,
```

@@ -19,120 +20,102 @@ init_huggingface()

```python
def handle_submit(*inputs):
    # We need to group the flattened dynamic components back into lists
    idx = 0

    # Header (11 components)
    header_params = inputs[idx:idx+11]
    idx += 11

    # Task simple (3 components)
    taskFamily, taskStage, nbRequest = inputs[idx:idx+3]
    idx += 3

    # Task algorithms (14 fields × 5 rows = 70 components)
    algorithm_flat = inputs[idx:idx+70]
    idx += 70

    # Reconstruct algorithm lists (14 fields)
    # The components are organized by field first, then row
    trainingType = algorithm_flat[0:5]           # positions 0-4
    algorithmType = algorithm_flat[5:10]         # positions 5-9
    algorithmName = algorithm_flat[10:15]        # positions 10-14
    algorithmUri = algorithm_flat[15:20]         # positions 15-19
    foundationModelName = algorithm_flat[20:25]  # positions 20-24
    foundationModelUri = algorithm_flat[25:30]   # positions 25-29
    parametersNumber = algorithm_flat[30:35]     # positions 30-34
    framework = algorithm_flat[35:40]            # positions 35-39
    frameworkVersion = algorithm_flat[40:45]     # positions 40-44
    classPath = algorithm_flat[45:50]            # positions 45-49
    layersNumber = algorithm_flat[50:55]         # positions 50-54
    epochsNumber = algorithm_flat[55:60]         # positions 55-59
    optimizer = algorithm_flat[60:65]            # positions 60-64
    quantization = algorithm_flat[65:70]         # positions 65-69

    # Task dataset (9 fields × 5 rows = 45 components)
    dataset_flat = inputs[idx:idx+45]
    idx += 45

    # Reconstruct dataset lists (9 fields)
    # The components are organized by field first, then row:
    # dataUsage[0-4], dataType[0-4], dataFormat[0-4], etc.
    dataUsage = dataset_flat[0:5]        # positions 0-4
    dataType = dataset_flat[5:10]        # positions 5-9
    dataFormat = dataset_flat[10:15]     # positions 10-14
    dataSize = dataset_flat[15:20]       # positions 15-19
    dataQuantity = dataset_flat[20:25]   # positions 20-24
    shape = dataset_flat[25:30]          # positions 25-29
    source = dataset_flat[30:35]         # positions 30-34
    sourceUri = dataset_flat[35:40]      # positions 35-39
    owner = dataset_flat[40:45]          # positions 40-44

    # Task final (3 components)
    measuredAccuracy, estimatedAccuracy, taskDescription = inputs[idx:idx+3]
    idx += 3

    # Measures dynamic section (12 fields × 5 rows = 60 components)
    measures_flat = inputs[idx:idx+60]
    idx += 60

    # Reconstruct measures lists (12 fields)
    # The components are organized by field first, then row
    measurementMethod = measures_flat[0:5]                  # positions 0-4
    manufacturer = measures_flat[5:10]                      # positions 5-9
    version = measures_flat[10:15]                          # positions 10-14
    cpuTrackingMode = measures_flat[15:20]                  # positions 15-19
    gpuTrackingMode = measures_flat[20:25]                  # positions 20-24
    averageUtilizationCpu = measures_flat[25:30]            # positions 25-29
    averageUtilizationGpu = measures_flat[30:35]            # positions 30-34
    powerCalibrationMeasurement = measures_flat[35:40]      # positions 35-39
    durationCalibrationMeasurement = measures_flat[40:45]   # positions 40-44
    powerConsumption = measures_flat[45:50]                 # positions 45-49
    measurementDuration = measures_flat[50:55]              # positions 50-54
    measurementDateTime = measures_flat[55:60]              # positions 55-59

    # System (3 components)
    osystem, distribution, distributionVersion = inputs[idx:idx+3]
    idx += 3

    # Software (2 components)
    language, version_software = inputs[idx:idx+2]
    idx += 2

    # Infrastructure simple (4 components)
    infraType, cloudProvider, cloudInstance, cloudService = inputs[idx:idx+4]
    idx += 4

    # Infrastructure components dynamic section (8 fields × 5 rows = 40 components)
    infra_flat = inputs[idx:idx+40]
    idx += 40

    # Reconstruct infrastructure component lists (8 fields)
    # The components are organized by field first, then row
    componentName = infra_flat[0:5]         # positions 0-4
    componentType = infra_flat[5:10]        # positions 5-9
    nbComponent = infra_flat[10:15]         # positions 10-14
    memorySize = infra_flat[15:20]          # positions 15-19
    manufacturer_infra = infra_flat[20:25]  # positions 20-24
    family = infra_flat[25:30]              # positions 25-29
    series = infra_flat[30:35]              # positions 30-34
    share = infra_flat[35:40]               # positions 35-39

    # Environment (7 components)
    country, latitude, longitude, location, powerSupplierType, powerSource, powerSourceCarbonIntensity = inputs[
        idx:idx+7]
    idx += 7

    # Quality (1 component)
    quality = inputs[idx]
    idx += 1

    # Continue with other sections - for now, take the remaining as they were
    remaining_params = inputs[idx:]

    # Call generate_json with reconstructed parameters
    try:
        message, file_path, json_output = generate_json(
            *header_params,
            taskFamily, taskStage, nbRequest,
```

@@ -152,6 +135,7 @@ def handle_submit(*inputs):

```python
            powerSupplierType, powerSource, powerSourceCarbonIntensity,
            quality
        )
    except Exception as e:
        return f"Error: {e}", None, "", gr.Button("Share your data to the public repository", interactive=False, elem_classes="pubbutton")
```

@@ -168,9 +152,16 @@ def handle_submit(*inputs):

```python
def handle_publi(file_path, json_output):
    ...  # old body removed; its lines are not recoverable from the rendered diff


# Create Gradio interface
```

@@ -197,21 +188,35 @@ with gr.Blocks(css_paths=css_path) as app:

```python
    publish_button = gr.Button(
        "Share your data to the public repository", interactive=False, elem_classes="pubbutton")

    # Event Handlers -
    def flatten_inputs(components):
        """..."""                  # old docstring, truncated in the rendered diff
        for ...                    # old recursive loop; its body is not recoverable
            if isinstance(item, list):
                ...
            else:
                ...

    all_inputs = flatten_inputs(header_components + task_components + measures_components +
                                system_components + software_components + infrastructure_components +
                                environment_components + quality_components)

    submit_button.click(
        handle_submit,
        inputs=all_inputs,
```
New version:

```python
import gradio as gr
from os import path
from src.services.dataset_upload import init_huggingface, update_dataset
from src.services.json_generator import generate_json
from src.services.form_parser import form_parser
from src.ui.form_components import (
    create_header_tab,
    create_task_tab,
```

```python
def handle_submit(*inputs):
    """Handle form submission with optimized parsing."""
    try:
        # Parse inputs using the structured parser
        parsed_data = form_parser.parse_inputs(inputs)

        # Extract data for the generate_json function
        header_params = list(parsed_data["header"].values())

        # Task data
        task_simple = parsed_data["task_simple"]
        taskFamily, taskStage, nbRequest = (
            task_simple["taskFamily"], task_simple["taskStage"], task_simple["nbRequest"])

        # Dynamic sections - algorithm data
        algorithms = parsed_data["algorithms"]
        trainingType = algorithms["trainingType"]
        algorithmType = algorithms["algorithmType"]
        algorithmName = algorithms["algorithmName"]
        algorithmUri = algorithms["algorithmUri"]
        foundationModelName = algorithms["foundationModelName"]
        foundationModelUri = algorithms["foundationModelUri"]
        parametersNumber = algorithms["parametersNumber"]
        framework = algorithms["framework"]
        frameworkVersion = algorithms["frameworkVersion"]
        classPath = algorithms["classPath"]
        layersNumber = algorithms["layersNumber"]
        epochsNumber = algorithms["epochsNumber"]
        optimizer = algorithms["optimizer"]
        quantization = algorithms["quantization"]

        # Dynamic sections - dataset data
        dataset = parsed_data["dataset"]
        dataUsage = dataset["dataUsage"]
        dataType = dataset["dataType"]
        dataFormat = dataset["dataFormat"]
        dataSize = dataset["dataSize"]
        dataQuantity = dataset["dataQuantity"]
        shape = dataset["shape"]
        source = dataset["source"]
        sourceUri = dataset["sourceUri"]
        owner = dataset["owner"]

        # Task final data
        task_final = parsed_data["task_final"]
        measuredAccuracy, estimatedAccuracy, taskDescription = (
            task_final["measuredAccuracy"], task_final["estimatedAccuracy"], task_final["taskDescription"])

        # Measures data
        measures = parsed_data["measures"]
        measurementMethod = measures["measurementMethod"]
        manufacturer = measures["manufacturer"]
        version = measures["version"]
        cpuTrackingMode = measures["cpuTrackingMode"]
        gpuTrackingMode = measures["gpuTrackingMode"]
        averageUtilizationCpu = measures["averageUtilizationCpu"]
        averageUtilizationGpu = measures["averageUtilizationGpu"]
        powerCalibrationMeasurement = measures["powerCalibrationMeasurement"]
        durationCalibrationMeasurement = measures["durationCalibrationMeasurement"]
        powerConsumption = measures["powerConsumption"]
        measurementDuration = measures["measurementDuration"]
        measurementDateTime = measures["measurementDateTime"]

        # System data
        system = parsed_data["system"]
        osystem, distribution, distributionVersion = (
            system["osystem"], system["distribution"], system["distributionVersion"])

        # Software data
        software = parsed_data["software"]
        language, version_software = software["language"], software["version_software"]

        # Infrastructure data
        infra_simple = parsed_data["infrastructure_simple"]
        infraType, cloudProvider, cloudInstance, cloudService = (
            infra_simple["infraType"], infra_simple["cloudProvider"],
            infra_simple["cloudInstance"], infra_simple["cloudService"])

        # Infrastructure components
        infra_components = parsed_data["infrastructure_components"]
        componentName = infra_components["componentName"]
        componentType = infra_components["componentType"]
        nbComponent = infra_components["nbComponent"]
        memorySize = infra_components["memorySize"]
        manufacturer_infra = infra_components["manufacturer_infra"]
        family = infra_components["family"]
        series = infra_components["series"]
        share = infra_components["share"]

        # Environment data
        environment = parsed_data["environment"]
        country, latitude, longitude, location, powerSupplierType, powerSource, powerSourceCarbonIntensity = (
            environment["country"], environment["latitude"], environment["longitude"],
            environment["location"], environment["powerSupplierType"],
            environment["powerSource"], environment["powerSourceCarbonIntensity"])

        # Quality data
        quality = parsed_data["quality"]["quality"]

        # Call generate_json with structured parameters
        message, file_path, json_output = generate_json(
            *header_params,
            taskFamily, taskStage, nbRequest,
```

(the unchanged argument lines of the `generate_json` call are not shown in the diff)

```python
            powerSupplierType, powerSource, powerSourceCarbonIntensity,
            quality
        )

    except Exception as e:
        return f"Error: {e}", None, "", gr.Button("Share your data to the public repository", interactive=False, elem_classes="pubbutton")
```

```python
def handle_publi(file_path, json_output):
    """Handle publication to the Hugging Face dataset with improved error handling."""
    try:
        if not file_path or not json_output:
            return "Error: No file or data to publish."

        # If validation passed, proceed to update_dataset
        update_output = update_dataset(file_path, json_output)
        return update_output
    except Exception as e:
        return f"Error during publication: {str(e)}"


# Create Gradio interface
```

```python
    publish_button = gr.Button(
        "Share your data to the public repository", interactive=False, elem_classes="pubbutton")

    # Event Handlers - optimized input flattening
    def flatten_inputs(components):
        """
        Iteratively flatten nested lists of components.
        A stack replaces recursion, which keeps memory use predictable on deep nesting.
        """
        result = []
        stack = list(reversed(components))  # Use a stack to avoid recursion

        while stack:
            item = stack.pop()
            if isinstance(item, list):
                # Add items in reverse order to maintain the original sequence
                stack.extend(reversed(item))
            else:
                result.append(item)

        return result

    all_inputs = flatten_inputs(header_components + task_components + measures_components +
                                system_components + software_components + infrastructure_components +
                                environment_components + quality_components)

    # Validate that the input count matches the expected structure
    expected_count = form_parser.get_total_input_count()
    if len(all_inputs) != expected_count:
        print(
            f"Warning: Input count mismatch. Expected {expected_count}, got {len(all_inputs)}")

    submit_button.click(
        handle_submit,
        inputs=all_inputs,
```
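For reference, the iterative flatten can be exercised on its own. The snippet below is a standalone copy of the helper, with plain placeholder values in place of Gradio components, just to show the intended behaviour:

```python
def flatten_inputs(components):
    """Iteratively flatten arbitrarily nested lists while preserving order."""
    result = []
    stack = list(reversed(components))
    while stack:
        item = stack.pop()
        if isinstance(item, list):
            stack.extend(reversed(item))
        else:
            result.append(item)
    return result


# Placeholder values standing in for Gradio components:
print(flatten_inputs(["header", ["task", ["algo1", "algo2"]], "quality"]))
# -> ['header', 'task', 'algo1', 'algo2', 'quality']
```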
src/services/{huggingface.py → dataset_upload.py}
RENAMED
File without changes
src/services/form_parser.py
ADDED
@@ -0,0 +1,147 @@

```python
"""
Form parser configuration and utilities for handling Gradio form inputs.
This module provides a centralized way to manage form structure and parsing.
"""

from dataclasses import dataclass
from typing import List, Any, Tuple


@dataclass
class FormSection:
    """Represents a section of the form with its field count."""
    name: str
    field_count: int
    fields: List[str] = None


@dataclass
class DynamicSection:
    """Represents a dynamic section with multiple rows and fields."""
    name: str
    fields: List[str]
    max_rows: int = 5

    @property
    def total_components(self) -> int:
        return len(self.fields) * self.max_rows


# Form structure configuration
FORM_STRUCTURE = [
    FormSection("header", 11, [
        "licensing", "formatVersion", "formatVersionSpecificationUri", "reportId",
        "reportDatetime", "reportStatus", "publisher_name", "publisher_division",
        "publisher_projectName", "publisher_confidentialityLevel", "publisher_publicKey"
    ]),

    FormSection("task_simple", 3, [
        "taskFamily", "taskStage", "nbRequest"
    ]),

    DynamicSection("algorithms", [
        "trainingType", "algorithmType", "algorithmName", "algorithmUri",
        "foundationModelName", "foundationModelUri", "parametersNumber", "framework",
        "frameworkVersion", "classPath", "layersNumber", "epochsNumber", "optimizer", "quantization"
    ]),

    DynamicSection("dataset", [
        "dataUsage", "dataType", "dataFormat", "dataSize", "dataQuantity",
        "shape", "source", "sourceUri", "owner"
    ]),

    FormSection("task_final", 3, [
        "measuredAccuracy", "estimatedAccuracy", "taskDescription"
    ]),

    DynamicSection("measures", [
        "measurementMethod", "manufacturer", "version", "cpuTrackingMode", "gpuTrackingMode",
        "averageUtilizationCpu", "averageUtilizationGpu", "powerCalibrationMeasurement",
        "durationCalibrationMeasurement", "powerConsumption", "measurementDuration", "measurementDateTime"
    ]),

    FormSection("system", 3, [
        "osystem", "distribution", "distributionVersion"
    ]),

    FormSection("software", 2, [
        "language", "version_software"
    ]),

    FormSection("infrastructure_simple", 4, [
        "infraType", "cloudProvider", "cloudInstance", "cloudService"
    ]),

    DynamicSection("infrastructure_components", [
        "componentName", "componentType", "nbComponent", "memorySize",
        "manufacturer_infra", "family", "series", "share"
    ]),

    FormSection("environment", 7, [
        "country", "latitude", "longitude", "location",
        "powerSupplierType", "powerSource", "powerSourceCarbonIntensity"
    ]),

    FormSection("quality", 1, ["quality"])
]


class FormParser:
    """Utility class for parsing form inputs based on the form structure."""

    def __init__(self):
        self.structure = FORM_STRUCTURE

    def parse_inputs(self, inputs: Tuple[Any, ...]) -> dict:
        """
        Parse form inputs into a structured dictionary.

        Args:
            inputs: Tuple of all form input values

        Returns:
            dict: Parsed form data organized by sections
        """
        parsed_data = {}
        idx = 0

        for section in self.structure:
            if isinstance(section, FormSection):
                # Simple section - extract values directly
                section_data = inputs[idx:idx + section.field_count]
                if section.fields:
                    parsed_data[section.name] = dict(
                        zip(section.fields, section_data))
                else:
                    parsed_data[section.name] = section_data
                idx += section.field_count

            elif isinstance(section, DynamicSection):
                # Dynamic section - extract and reshape data
                flat_data = inputs[idx:idx + section.total_components]
                idx += section.total_components

                # Reshape flat data into field-organized lists
                section_data = {}
                for field_idx, field_name in enumerate(section.fields):
                    start_pos = field_idx * section.max_rows
                    end_pos = start_pos + section.max_rows
                    section_data[field_name] = flat_data[start_pos:end_pos]

                parsed_data[section.name] = section_data

        return parsed_data

    def get_total_input_count(self) -> int:
        """Get the total number of expected inputs."""
        total = 0
        for section in self.structure:
            if isinstance(section, FormSection):
                total += section.field_count
            elif isinstance(section, DynamicSection):
                total += section.total_components
        return total


# Global parser instance
form_parser = FormParser()
```
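A small usage sketch of the parser, assuming the module is importable from the project root; the input values are placeholders standing in for the flattened Gradio components:

```python
from src.services.form_parser import form_parser

# Placeholder values for the flattened form inputs, in the order defined by FORM_STRUCTURE.
inputs = tuple(f"value_{i}" for i in range(form_parser.get_total_input_count()))

parsed = form_parser.parse_inputs(inputs)
print(parsed["task_simple"])              # {'taskFamily': ..., 'taskStage': ..., 'nbRequest': ...}
print(parsed["algorithms"]["framework"])  # the 5 row values of the 'framework' column
print(parsed["quality"]["quality"])       # the single quality value
```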
src/services/json_generator.py
CHANGED
Old version (removed lines and surrounding context):

@@ -1,9 +1,8 @@

```python
import json
import tempfile
from datetime import datetime
import uuid
from assets.utils.validation import validate_boamps_schema
import   # removed import line, truncated in the rendered diff
import os
```

@@ -94,205 +93,154 @@ def generate_json(

```python
        # Quality
        quality
):
    """Generate JSON data from form inputs."""
    # ... (most of the old body is not recoverable from the rendered diff) ...

    if cloudInstance:
        infrastructure["cloudInstance"] = cloudInstance
    if cloudService:
        infrastructure["cloudService"] = cloudService
    if components_list:
        infrastructure["components"] = components_list
    report["infrastructure"] = infrastructure

    # proceed environment
    environment = {}
    if country:
        environment["country"] = country
    if latitude:
        environment["latitude"] = latitude
    if longitude:
        environment["longitude"] = longitude
    if location:
        environment["location"] = location
    if powerSupplierType:
        environment["powerSupplierType"] = powerSupplierType
    if powerSource:
        environment["powerSource"] = powerSource
    if powerSourceCarbonIntensity:
        environment["powerSourceCarbonIntensity"] = powerSourceCarbonIntensity
    if environment:
        report["environment"] = environment

    # proceed quality
    if quality:
        report["quality"] = quality

    # Validate that the schema follows the BoAmps format and so that the required fields have been completed
    is_valid, message = validate_boamps_schema(report)
    if not is_valid:
        return message, None, ""

    # Create and save the JSON file
    filename = f"report_{taskStage}_{taskFamily}_{infraType}_{reportId}.json"
    filename = filename.replace(" ", "-")

    # Create the JSON string
    json_str = json.dumps(report, indent=4, ensure_ascii=False)

    # Write JSON to a temporary file with the desired filename (not permanent)
    temp_dir = tempfile.gettempdir()
    temp_path = os.path.join(temp_dir, filename)
    with open(temp_path, "w", encoding="utf-8") as tmp:
        tmp.write(json_str)

    # Return logical filename, JSON string, and temp file path for upload
    return message, temp_path, json_str
```
New version:

```python
import json
import tempfile
from datetime import datetime
from assets.utils.validation import validate_boamps_schema
from src.services.report_builder import ReportBuilder
import os
```

```python
        # Quality
        quality
):
    """Generate JSON data from form inputs using optimized ReportBuilder."""

    try:
        # Use ReportBuilder for cleaner, more maintainable code
        builder = ReportBuilder()

        # Build header section
        header_data = {
            "licensing": licensing,
            "formatVersion": formatVersion,
            "formatVersionSpecificationUri": formatVersionSpecificationUri,
            "reportId": reportId,
            "reportDatetime": reportDatetime or datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            "reportStatus": reportStatus,
            "publisher_name": publisher_name,
            "publisher_division": publisher_division,
            "publisher_projectName": publisher_projectName,
            "publisher_confidentialityLevel": publisher_confidentialityLevel,
            "publisher_publicKey": publisher_publicKey
        }
        builder.add_header(header_data)

        # Build task section
        task_data = {
            "taskStage": taskStage,
            "taskFamily": taskFamily,
            "nbRequest": nbRequest,
            "measuredAccuracy": measuredAccuracy,
            "estimatedAccuracy": estimatedAccuracy,
            "taskDescription": taskDescription,
            "algorithms": {
                "trainingType": trainingType,
                "algorithmType": algorithmType,
                "algorithmName": algorithmName,
                "algorithmUri": algorithmUri,
                "foundationModelName": foundationModelName,
                "foundationModelUri": foundationModelUri,
                "parametersNumber": parametersNumber,
                "framework": framework,
                "frameworkVersion": frameworkVersion,
                "classPath": classPath,
                "layersNumber": layersNumber,
                "epochsNumber": epochsNumber,
                "optimizer": optimizer,
                "quantization": quantization
            },
            "dataset": {
                "dataUsage": dataUsage,
                "dataType": dataType,
                "dataFormat": dataFormat,
                "dataSize": dataSize,
                "dataQuantity": dataQuantity,
                "shape": shape,
                "source": source,
                "sourceUri": sourceUri,
                "owner": owner
            }
        }
        builder.add_task(task_data)

        # Build measures section
        measures_data = {
            "measurementMethod": measurementMethod,
            "manufacturer": manufacturer,
            "version": version,
            "cpuTrackingMode": cpuTrackingMode,
            "gpuTrackingMode": gpuTrackingMode,
            "averageUtilizationCpu": averageUtilizationCpu,
            "averageUtilizationGpu": averageUtilizationGpu,
            "powerCalibrationMeasurement": powerCalibrationMeasurement,
            "durationCalibrationMeasurement": durationCalibrationMeasurement,
            "powerConsumption": powerConsumption,
            "measurementDuration": measurementDuration,
            "measurementDateTime": measurementDateTime
        }
        builder.add_measures(measures_data)

        # Build system section
        system_data = {
            "osystem": osystem,
            "distribution": distribution,
            "distributionVersion": distributionVersion
        }
        builder.add_system(system_data)

        # Build software section
        software_data = {
            "language": language,
            "version_software": version_software
        }
        builder.add_software(software_data)

        # Build infrastructure section
        infrastructure_data = {
            "infraType": infraType,
            "cloudProvider": cloudProvider,
            "cloudInstance": cloudInstance,
            "cloudService": cloudService,
            "components": {
                "componentName": componentName,
                "componentType": componentType,
                "nbComponent": nbComponent,
                "memorySize": memorySize,
                "manufacturer": manufacturer_infra,
                "family": family,
                "series": series,
                "share": share
            }
        }
        builder.add_infrastructure(infrastructure_data)

        # Build environment section
        environment_data = {
            "country": country,
            "latitude": latitude,
            "longitude": longitude,
            "location": location,
            "powerSupplierType": powerSupplierType,
            "powerSource": powerSource,
            "powerSourceCarbonIntensity": powerSourceCarbonIntensity
        }
        builder.add_environment(environment_data)

        # Add quality
        builder.add_quality(quality)

        # Build the final report
        report = builder.build()

        # Validate that the schema follows the BoAmps format
        is_valid, message = validate_boamps_schema(report)
        if not is_valid:
            return message, None, ""

        # Create and save the JSON file
        filename = f"report_{taskStage}_{taskFamily}_{infraType}_{reportId}.json"
        filename = filename.replace(" ", "-")

        # Create the JSON string
        json_str = json.dumps(report, indent=4, ensure_ascii=False)

        # Write JSON to a temporary file with the desired filename
        temp_dir = tempfile.gettempdir()
        temp_path = os.path.join(temp_dir, filename)
        with open(temp_path, "w", encoding="utf-8") as tmp:
            tmp.write(json_str)

        return message, temp_path, json_str

    except Exception as e:
        return f"Error generating JSON: {str(e)}", None, ""
```
src/services/report_builder.py
ADDED
@@ -0,0 +1,273 @@

```python
"""
JSON processing utilities for BoAmps report generation.
Provides optimized functions for data transformation and organization.
"""

from typing import Dict, List, Any, Optional


def create_section_dict(data: Dict[str, Any], required_fields: List[str] = None) -> Dict[str, Any]:
    """
    Create a section dictionary, including only non-empty values.

    Args:
        data: Dictionary of field values
        required_fields: List of fields that should always be included if provided

    Returns:
        Dictionary with non-empty values only, or empty dict if no meaningful values
    """
    section = {}
    required_fields = required_fields or []

    for key, value in data.items():
        # Include only if it's a required field with meaningful value, or if it's meaningful
        if key in required_fields and is_meaningful_value(value):
            section[key] = value
        elif key not in required_fields and is_meaningful_value(value):
            section[key] = value

    return section


def is_meaningful_value(value: Any) -> bool:
    """
    Check if a value is meaningful (not empty, not just whitespace).

    Args:
        value: Value to check

    Returns:
        True if the value is meaningful, False otherwise
    """
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, (int, float)):
        return True
    if isinstance(value, (list, dict)):
        return len(value) > 0
    return bool(value)


def process_dynamic_component_list(field_data: Dict[str, List[Any]], max_rows: int = 5) -> List[Dict[str, Any]]:
    """
    Process dynamic component data into a list of component dictionaries.
    Optimized version of the original process_component_list function.

    Args:
        field_data: Dictionary where keys are field names and values are lists of row values
        max_rows: Maximum number of rows to process

    Returns:
        List of component dictionaries
    """
    components = []

    # Find the actual number of rows with data
    actual_rows = 0
    for field_values in field_data.values():
        if field_values:
            # Count non-empty values from the end
            for i in range(len(field_values) - 1, -1, -1):
                if is_meaningful_value(field_values[i]):
                    actual_rows = max(actual_rows, i + 1)
                    break

    # Create components for rows that have data
    for row_idx in range(min(actual_rows, max_rows)):
        component = {}

        # Add fields that have meaningful values for this row
        for field_name, field_values in field_data.items():
            if row_idx < len(field_values) and is_meaningful_value(field_values[row_idx]):
                component[field_name] = field_values[row_idx]

        # Only add component if it has at least one field
        if component:
            components.append(component)

    return components


def create_publisher_section(data: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """
    Create publisher section with proper validation.

    Args:
        data: Dictionary containing all header data

    Returns:
        Publisher dictionary or None if no publisher data
    """
    publisher_fields = {
        "name": data.get("publisher_name"),
        "division": data.get("publisher_division"),
        "projectName": data.get("publisher_projectName"),
        "confidentialityLevel": data.get("publisher_confidentialityLevel"),
        "publicKey": data.get("publisher_publicKey")
    }

    publisher = create_section_dict(
        publisher_fields, required_fields=["confidentialityLevel"])
    return publisher if publisher else None


class ReportBuilder:
    """
    Builder class for creating BoAmps reports with optimized data processing.
    """

    def __init__(self):
        self.report = {}

    def add_header(self, header_data: Dict[str, Any]) -> 'ReportBuilder':
        """Add header section to the report."""
        header_fields = {
            "licensing": header_data.get("licensing"),
            "formatVersion": header_data.get("formatVersion"),
            "formatVersionSpecificationUri": header_data.get("formatVersionSpecificationUri"),
            "reportId": header_data.get("reportId"),
            "reportDatetime": header_data.get("reportDatetime"),
            "reportStatus": header_data.get("reportStatus")
        }

        header = create_section_dict(header_fields, required_fields=[
            "reportId", "reportDatetime"])

        # Add publisher if available
        publisher = create_publisher_section(header_data)
        if publisher:
            header["publisher"] = publisher

        if header:
            self.report["header"] = header

        return self

    def add_task(self, task_data: Dict[str, Any]) -> 'ReportBuilder':
        """Add task section to the report."""
        task = {}

        # Simple task fields
        simple_fields = {
            "taskStage": task_data.get("taskStage"),
            "taskFamily": task_data.get("taskFamily"),
            "nbRequest": task_data.get("nbRequest"),
            "measuredAccuracy": task_data.get("measuredAccuracy"),
            "estimatedAccuracy": task_data.get("estimatedAccuracy"),
            "taskDescription": task_data.get("taskDescription")
        }

        task.update(create_section_dict(simple_fields,
                                        required_fields=["taskStage", "taskFamily"]))

        # Process algorithms
        if "algorithms" in task_data:
            algorithms = process_dynamic_component_list(
                task_data["algorithms"])
            if algorithms:
                task["algorithms"] = algorithms

        # Process dataset
        if "dataset" in task_data:
            dataset = process_dynamic_component_list(task_data["dataset"])
            if dataset:
                task["dataset"] = dataset

        self.report["task"] = task
        return self

    def add_measures(self, measures_data: Dict[str, List[Any]]) -> 'ReportBuilder':
        """Add measures section to the report."""
        measures = process_dynamic_component_list(measures_data)
        if measures:
            self.report["measures"] = measures
        return self

    def add_system(self, system_data: Dict[str, Any]) -> 'ReportBuilder':
        """Add system section to the report."""
        system_fields = {
            "os": system_data.get("osystem"),
            "distribution": system_data.get("distribution"),
            "distributionVersion": system_data.get("distributionVersion")
        }

        system = create_section_dict(system_fields, required_fields=["os"])
        # Only add system section if it has meaningful values
        if system:
            self.report["system"] = system
        return self

    def add_software(self, software_data: Dict[str, Any]) -> 'ReportBuilder':
        """Add software section to the report."""
        software_fields = {
            "language": software_data.get("language"),
            "version": software_data.get("version_software")
        }

        software = create_section_dict(
            software_fields, required_fields=["language"])
        # Only add software section if it has meaningful values
        if software:
            self.report["software"] = software
        return self

    def add_infrastructure(self, infra_data: Dict[str, Any]) -> 'ReportBuilder':
        """Add infrastructure section to the report."""
        infrastructure = {}

        # Simple infrastructure fields
        simple_fields = {
            "infraType": infra_data.get("infraType"),
            "cloudProvider": infra_data.get("cloudProvider"),
            "cloudInstance": infra_data.get("cloudInstance"),
            "cloudService": infra_data.get("cloudService")
        }

        # Add simple fields only if they have meaningful values
        simple_infra = create_section_dict(
            simple_fields, required_fields=["infraType"])
        infrastructure.update(simple_infra)

        # Process components
        if "components" in infra_data:
            components = process_dynamic_component_list(
                infra_data["components"])
            if components:
                infrastructure["components"] = components

        # Only add infrastructure section if it has meaningful content
        if infrastructure:
            self.report["infrastructure"] = infrastructure
        return self

    def add_environment(self, env_data: Dict[str, Any]) -> 'ReportBuilder':
        """Add environment section to the report."""
        env_fields = {
            "country": env_data.get("country"),
            "latitude": env_data.get("latitude"),
            "longitude": env_data.get("longitude"),
            "location": env_data.get("location"),
            "powerSupplierType": env_data.get("powerSupplierType"),
            "powerSource": env_data.get("powerSource"),
            "powerSourceCarbonIntensity": env_data.get("powerSourceCarbonIntensity")
        }

        environment = create_section_dict(
            env_fields, required_fields=["country"])
        # Only add environment section if it has meaningful values
        if environment:
            self.report["environment"] = environment
        return self

    def add_quality(self, quality_value: Any) -> 'ReportBuilder':
        """Add quality field to the report."""
        if is_meaningful_value(quality_value):
            self.report["quality"] = quality_value
        return self

    def build(self) -> Dict[str, Any]:
        """Build and return the final report."""
        return self.report
```
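A small usage sketch of the builder, assuming it is run from the project root; the field values are placeholders, and empty strings are dropped because `is_meaningful_value` filters them out:

```python
from src.services.report_builder import ReportBuilder

# Placeholder values; distributionVersion is empty and is therefore omitted from the report.
report = (
    ReportBuilder()
    .add_system({"osystem": "Linux", "distribution": "Ubuntu", "distributionVersion": ""})
    .add_software({"language": "python", "version_software": "3.12"})
    .add_quality("high")
    .build()
)
print(report)
# {'system': {'os': 'Linux', 'distribution': 'Ubuntu'},
#  'software': {'language': 'python', 'version': '3.12'},
#  'quality': 'high'}
```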