soury committed on
Commit 2e85345 · verified · 1 Parent(s): 58f078c

push_data_pr (#2)

- push JSON file to the dataset using a PR (43a2b78f1895422e99a5687323d47b142387c92f)
- create JSON file with a proper name (8bd8fa8db0eb8e86ee7889f3f0e49e9951629202)
- create JSON file with a proper name (e794a4b709696e06fe07770145a211e4ff5ba05a)
- fix problem with locally generated files (beee3d386da0701718608a71ef31563c6f50aed8)
- better handling of dynamic sections to keep already-filled fields in memory (0662adb28de0eddcf7afcd7ece6722e21d58ea07)
- handle problems with dynamic sections and implement BoAmps format validator (e2adf94cd8d9918a6463e2187ebb7e34bcc7248d)
- code refactoring and cleanup (488a9f6127c0ee4a5ba79aaeb5264812fabffd80)
README.md CHANGED
@@ -11,20 +11,103 @@ license: apache-2.0
  short_description: Create a report in BoAmps format
  ---

+ # BoAmps Report Creation Tool 🌿
+
  This tool is part of the initiative [BoAmps](https://github.com/Boavizta/BoAmps).
  The purpose of the BoAmps project is to build a large, open, database of energy consumption of IT / AI tasks depending on data nature, algorithms, hardware, etc., in order to improve energy efficiency approaches based on empiric knowledge.

  This space was initiated by a group of students from Sud Telecom Paris, many thanks to [Hicham FILALI](https://huggingface.co/FILALIHicham) for his work.

- ### Development
-
- Install prerequisites :
- Python >= 3.12
- Pip & Pipenv
-
- Clone and open the project
-
- Create a virtual environment: >python -m venv .venv
- Activate it: >.\.venv\Scripts\activate
- Install dependencies: >pipenv install -d
- Launch the application: pipenv run python main.py
+ ## 🚀 Quick Start
+
+ ### Prerequisites
+ - **Python** >= 3.12
+
+ ### Installation Steps
+
+ 1. **Clone the repository**
+
+ 2. **Create and activate a virtual environment (not mandatory)**
+    ```bash
+    # Windows
+    python -m venv .venv
+    .\.venv\Scripts\activate
+
+    # Linux/MacOS
+    python -m venv .venv
+    source .venv/bin/activate
+    ```
+
+ 3. **Install dependencies**
+    ```bash
+    pip install pipenv
+    pipenv install --dev
+    ```
+
+ 4. **Launch the application**
+    ```bash
+    python ./app.py
+    ```
+
+ 5. **Access the application**
+    - Open your browser and go to `http://localhost:7860`
+    - The Gradio interface will be available for creating BoAmps reports
+
+ ## 🏗️ Architecture Overview
+
+ ### Core Components
+
+ 1. **`app.py`** - Main application file
+    - Initializes the Gradio interface
+    - Orchestrates all UI components
+    - Handles application routing and main logic
+
+ 2. **Services Layer (`src/services/`)**
+    - **`json_generator.py`**: Generates BoAmps-compliant JSON reports
+    - **`report_builder.py`**: Constructs structured report data
+    - **`form_parser.py`**: Processes and validates form inputs
+    - **`dataset_upload.py`**: Manages Hugging Face dataset integration
+    - **`util.py`**: Common utility functions
+
+ 3. **UI Layer (`src/ui/`)**
+    - **`form_components.py`**: Gradio interface components for the different report sections
+
+ 4. **Assets & Validation (`assets/`)**
+    - **`validation.py`**: BoAmps schema validation logic
+    - **`app.css`**: Application styling
+
+ ### Data Flow
+
+ ```
+ User Input (Gradio Form)
+   ↓
+ Form Parser & Validation
+   ↓
+ JSON Generator
+   ↓
+ Report Builder
+   ↓
+ BoAmps Schema Validation
+   ↓
+ JSON Report Output
+ ```
+
+ ## 🤝 Contributing
+
+ Contributions are welcome! Please:
+
+ 1. Fork the repository
+ 2. Create a feature branch
+ 3. Make your changes
+ 4. Submit a pull request
+
+ ## 📄 License
+
+ This project is licensed under the Apache 2.0 License - see the license information in the repository header.
+
+ ## 🙏 Acknowledgments
+
+ This space was initiated by a group of students from Sud Telecom Paris, many thanks to [Hicham FILALI](https://huggingface.co/FILALIHicham) for his work.
+
+ For more information about the BoAmps initiative, visit the [official repository](https://github.com/Boavizta/BoAmps).
app.py CHANGED
@@ -1,7 +1,8 @@
  import gradio as gr
  from os import path
- from src.services.huggingface import init_huggingface, update_dataset
+ from src.services.dataset_upload import init_huggingface, update_dataset
  from src.services.json_generator import generate_json
+ from src.services.form_parser import form_parser
  from src.ui.form_components import (
      create_header_tab,
      create_task_tab,
@@ -19,22 +20,148 @@ init_huggingface()


  def handle_submit(*inputs):
-     message, file_output, json_output = generate_json(*inputs)
+     """Handle form submission with optimized parsing."""
+     try:
+         # Parse inputs using the structured parser
+         parsed_data = form_parser.parse_inputs(inputs)
+
+         # Extract data for the generate_json function
+         header_params = list(parsed_data["header"].values())
+
+         # Task data
+         task_simple = parsed_data["task_simple"]
+         taskFamily, taskStage, nbRequest = task_simple["taskFamily"], task_simple["taskStage"], task_simple["nbRequest"]
+
+         # Dynamic sections - algorithm data
+         algorithms = parsed_data["algorithms"]
+         trainingType = algorithms["trainingType"]
+         algorithmType = algorithms["algorithmType"]
+         algorithmName = algorithms["algorithmName"]
+         algorithmUri = algorithms["algorithmUri"]
+         foundationModelName = algorithms["foundationModelName"]
+         foundationModelUri = algorithms["foundationModelUri"]
+         parametersNumber = algorithms["parametersNumber"]
+         framework = algorithms["framework"]
+         frameworkVersion = algorithms["frameworkVersion"]
+         classPath = algorithms["classPath"]
+         layersNumber = algorithms["layersNumber"]
+         epochsNumber = algorithms["epochsNumber"]
+         optimizer = algorithms["optimizer"]
+         quantization = algorithms["quantization"]
+
+         # Dynamic sections - dataset data
+         dataset = parsed_data["dataset"]
+         dataUsage = dataset["dataUsage"]
+         dataType = dataset["dataType"]
+         dataFormat = dataset["dataFormat"]
+         dataSize = dataset["dataSize"]
+         dataQuantity = dataset["dataQuantity"]
+         shape = dataset["shape"]
+         source = dataset["source"]
+         sourceUri = dataset["sourceUri"]
+         owner = dataset["owner"]
+
+         # Task final data
+         task_final = parsed_data["task_final"]
+         measuredAccuracy, estimatedAccuracy, taskDescription = task_final["measuredAccuracy"], task_final["estimatedAccuracy"], task_final["taskDescription"]
+
+         # Measures data
+         measures = parsed_data["measures"]
+         measurementMethod = measures["measurementMethod"]
+         manufacturer = measures["manufacturer"]
+         version = measures["version"]
+         cpuTrackingMode = measures["cpuTrackingMode"]
+         gpuTrackingMode = measures["gpuTrackingMode"]
+         averageUtilizationCpu = measures["averageUtilizationCpu"]
+         averageUtilizationGpu = measures["averageUtilizationGpu"]
+         powerCalibrationMeasurement = measures["powerCalibrationMeasurement"]
+         durationCalibrationMeasurement = measures["durationCalibrationMeasurement"]
+         powerConsumption = measures["powerConsumption"]
+         measurementDuration = measures["measurementDuration"]
+         measurementDateTime = measures["measurementDateTime"]
+
+         # System data
+         system = parsed_data["system"]
+         osystem, distribution, distributionVersion = system["osystem"], system["distribution"], system["distributionVersion"]
+
+         # Software data
+         software = parsed_data["software"]
+         language, version_software = software["language"], software["version_software"]
+
+         # Infrastructure data
+         infra_simple = parsed_data["infrastructure_simple"]
+         infraType, cloudProvider, cloudInstance, cloudService = infra_simple["infraType"], infra_simple["cloudProvider"], infra_simple["cloudInstance"], infra_simple["cloudService"]
+
+         # Infrastructure components
+         infra_components = parsed_data["infrastructure_components"]
+         componentName = infra_components["componentName"]
+         componentType = infra_components["componentType"]
+         nbComponent = infra_components["nbComponent"]
+         memorySize = infra_components["memorySize"]
+         manufacturer_infra = infra_components["manufacturer_infra"]
+         family = infra_components["family"]
+         series = infra_components["series"]
+         share = infra_components["share"]
+
+         # Environment data
+         environment = parsed_data["environment"]
+         country, latitude, longitude, location, powerSupplierType, powerSource, powerSourceCarbonIntensity = environment["country"], environment["latitude"], environment["longitude"], environment["location"], environment["powerSupplierType"], environment["powerSource"], environment["powerSourceCarbonIntensity"]
+
+         # Quality data
+         quality = parsed_data["quality"]["quality"]
+
+         # Call generate_json with structured parameters
+         message, file_path, json_output = generate_json(
+             *header_params,
+             taskFamily, taskStage, nbRequest,
+             trainingType, algorithmType, algorithmName, algorithmUri, foundationModelName, foundationModelUri, parametersNumber, framework, frameworkVersion, classPath, layersNumber, epochsNumber, optimizer, quantization,
+             dataUsage, dataType, dataFormat, dataSize, dataQuantity, shape, source, sourceUri, owner,
+             measuredAccuracy, estimatedAccuracy, taskDescription,
+             measurementMethod, manufacturer, version, cpuTrackingMode, gpuTrackingMode,
+             averageUtilizationCpu, averageUtilizationGpu, powerCalibrationMeasurement,
+             durationCalibrationMeasurement, powerConsumption,
+             measurementDuration, measurementDateTime,
+             osystem, distribution, distributionVersion,
+             language, version_software,
+             infraType, cloudProvider, cloudInstance, cloudService, componentName, componentType,
+             nbComponent, memorySize, manufacturer_infra, family,
+             series, share,
+             country, latitude, longitude, location,
+             powerSupplierType, powerSource, powerSourceCarbonIntensity,
+             quality
+         )
+
+     except Exception as e:
+         return f"Error: {e}", None, "", gr.Button("Share your data to the public repository", interactive=False, elem_classes="pubbutton")

      # Check if the message indicates validation failure
-     if message.startswith("The following fields are required"):
-         return message, file_output, json_output
+     if message.startswith("The json file does not correspond"):
+         publish_button = gr.Button(
+             "Share your data to the public repository", interactive=False, elem_classes="pubbutton")
+         return message, file_path, json_output, publish_button

      publish_button = gr.Button(
          "Share your data to the public repository", interactive=True, elem_classes="pubbutton")

-     return "Report sucessefully created", file_output, json_output, publish_button
+     return "Report successfully created", file_path, json_output, publish_button


- def handle_publi(json_output):
-     # If validation passed, proceed to update_dataset
-     update_output = update_dataset(json_output)
-     return update_output
+ def handle_publi(file_path, json_output):
+     """Handle publication to the Hugging Face dataset with improved error handling."""
+     try:
+         if not file_path or not json_output:
+             return "Error: No file or data to publish."
+
+         # If validation passed, proceed to update_dataset
+         update_output = update_dataset(file_path, json_output)
+         return update_output
+     except Exception as e:
+         return f"Error during publication: {str(e)}"


  # Create Gradio interface
@@ -57,30 +184,49 @@ with gr.Blocks(css_paths=css_path) as app:
      submit_button = gr.Button("Submit", elem_classes="subbutton")
      output = gr.Textbox(label="Output", lines=1)
      json_output = gr.Textbox(visible=False)
-     file_output = gr.File(label="Downloadable JSON")
+     json_file = gr.File(label="Downloadable JSON")
      publish_button = gr.Button(
          "Share your data to the public repository", interactive=False, elem_classes="pubbutton")

-     # Event Handlers
+     # Event Handlers - Optimized input flattening
+     def flatten_inputs(components):
+         """
+         Flatten nested lists of components with improved performance.
+         Uses an iterative, stack-based approach for better memory efficiency.
+         """
+         result = []
+         stack = list(reversed(components))  # Use a stack to avoid recursion
+
+         while stack:
+             item = stack.pop()
+             if isinstance(item, list):
+                 # Add items in reverse order to maintain the original sequence
+                 stack.extend(reversed(item))
+             else:
+                 result.append(item)
+
+         return result
+
+     all_inputs = flatten_inputs(header_components + task_components + measures_components +
+                                 system_components + software_components + infrastructure_components +
+                                 environment_components + quality_components)
+
+     # Validate that the input count matches the expected structure
+     expected_count = form_parser.get_total_input_count()
+     if len(all_inputs) != expected_count:
+         print(
+             f"Warning: Input count mismatch. Expected {expected_count}, got {len(all_inputs)}")
+
      submit_button.click(
          handle_submit,
-         inputs=[
-             *header_components,
-             *task_components,
-             *measures_components,
-             *system_components,
-             *software_components,
-             *infrastructure_components,
-             *environment_components,
-             *quality_components,
-         ],
-         outputs=[output, file_output, json_output, publish_button]
+         inputs=all_inputs,
+         outputs=[output, json_file, json_output, publish_button]
      )
      # Event Handlers
      publish_button.click(
          handle_publi,
          inputs=[
-             json_output
+             json_file, json_output
          ],
          outputs=[output]
      )
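
The `flatten_inputs` helper above normalizes the nested component lists returned by the `create_*_tab` builders into the flat list that Gradio's `click()` expects. A minimal standalone sketch of its behavior, with plain strings standing in for Gradio components:

```python
# Standalone copy of the flatten_inputs logic from the diff above,
# exercised with plain values standing in for Gradio components.
def flatten_inputs(components):
    result = []
    stack = list(reversed(components))  # stack-based traversal, no recursion
    while stack:
        item = stack.pop()
        if isinstance(item, list):
            stack.extend(reversed(item))  # re-reverse to preserve ordering
        else:
            result.append(item)
    return result


print(flatten_inputs(["a", ["b", ["c", "d"]], "e"]))
# -> ['a', 'b', 'c', 'd', 'e']
```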
assets/utils/validation.py CHANGED
@@ -1,33 +1,74 @@
- from src.services.util import OBLIGATORY_FIELDS
-
-
- def validate_obligatory_fields(data):
-     """Validate that all required fields are present in the data."""
-     def find_field(d, field):
-         if field in d:
-             return d[field]
-         for k, v in d.items():
-             if isinstance(v, dict):
-                 result = find_field(v, field)
-                 if result is not None:
-                     return result
-             elif isinstance(v, list):
-                 for item in v:
-                     if isinstance(item, dict):
-                         result = find_field(item, field)
-                         if result is not None:
-                             return result
-         return None
-
-     missing_fields = []
-
-     for field in OBLIGATORY_FIELDS:
-         # if the field is mandatory, check if it is inside a mandatory section
-         value = find_field(data, field)
-         if not value and value != 0:  # Allow 0 as a valid value
-             missing_fields.append(field)
-
-     if missing_fields:
-         return False, f"The following fields are required: {', '.join(missing_fields)}"
-     return True, "All required fields are filled."
+ import json
+ from referencing import Registry, Resource
+ from jsonschema import Draft202012Validator
+ import requests
+
+
+ def fetch_json_from_url(url: str):
+     """Fetch JSON content from a GitHub raw URL."""
+     try:
+         response = requests.get(url, timeout=10)
+         response.raise_for_status()
+         return response.json()
+     except (requests.exceptions.RequestException, json.JSONDecodeError) as e:
+         print(f"Error fetching/parsing {url}: {e}")
+         return None
+
+
+ # GitHub URLs for the schemas
+ SCHEMA_URLS = {
+     "algorithm": "https://raw.githubusercontent.com/Boavizta/BoAmps/main/model/algorithm_schema.json",
+     "dataset": "https://raw.githubusercontent.com/Boavizta/BoAmps/main/model/dataset_schema.json",
+     "measure": "https://raw.githubusercontent.com/Boavizta/BoAmps/main/model/measure_schema.json",
+     "hardware": "https://raw.githubusercontent.com/Boavizta/BoAmps/main/model/hardware_schema.json",
+     "report": "https://raw.githubusercontent.com/Boavizta/BoAmps/main/model/report_schema.json"
+ }
+
+
+ def load_schemas():
+     """Load all schemas from GitHub URLs."""
+     schemas = {}
+     for name, url in SCHEMA_URLS.items():
+         schemas[name] = fetch_json_from_url(url)
+     return schemas
+
+
+ def create_registry(schemas):
+     """Create a registry with all sub-schemas."""
+     sub_schema_names = ["algorithm", "dataset", "measure", "hardware"]
+     resources = [
+         (SCHEMA_URLS[name], Resource.from_contents(schemas[name]))
+         for name in sub_schema_names
+     ]
+     return Registry().with_resources(resources)
+
+
+ # Load schemas once at module import
+ _schemas = load_schemas()
+ _registry = create_registry(_schemas)
+
+
+ def validate_boamps_schema(instance):
+     """Validate an instance against the BoAmps report schema."""
+     # Create a validator using the pre-loaded schemas and registry
+     validator = Draft202012Validator(_schemas["report"], registry=_registry)
+
+     # Validate
+     if validator.is_valid(instance):
+         return True, "All required fields are filled & your report has the right format!"
+
+     # Build the error message
+     errors = list(validator.iter_errors(instance))
+     error_lines = [
+         f"The json file does not correspond to the schema, there are {len(errors)} errors:\n",
+         "-" * 50
+     ]
+
+     for err in errors:
+         error_lines.extend([
+             f"Error on data: {err.json_path}",
+             f"  --> {err.message}",
+             "-" * 50
+         ])
+
+     return False, "\n".join(error_lines)
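A usage sketch for the new validator; the report dict below is hypothetical and deliberately incomplete, so validation should fail and return the per-error breakdown built above (this assumes the schemas could be fetched at import time):

```python
from assets.utils.validation import validate_boamps_schema

# Hypothetical, deliberately minimal report; a real one carries many more fields.
report = {
    "header": {"reportId": "demo-001", "reportStatus": "draft"},
    "task": {"taskFamily": "textGeneration", "taskStage": "inference"},
}

is_valid, message = validate_boamps_schema(report)
print(is_valid)   # False when required sections are missing
print(message)    # success string, or one "Error on data: ..." block per violation
```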
src/services/dataset_upload.py ADDED
@@ -0,0 +1,37 @@
+ from huggingface_hub import HfApi, login
+ from src.services.util import HF_TOKEN, DATASET_NAME
+ import os
+
+
+ def init_huggingface():
+     """Initialize Hugging Face authentication."""
+     if HF_TOKEN is None:
+         raise ValueError(
+             "Hugging Face token not found in environment variables.")
+     login(token=HF_TOKEN)
+
+
+ def update_dataset(file_path, json_data):
+     """Update the Hugging Face dataset with new data."""
+
+     if json_data is None or json_data.startswith("The following fields are required"):
+         return json_data or "No data to submit. Please fill in all required fields."
+     try:
+         # Initialize Hugging Face authentication
+         init_huggingface()
+         api = HfApi()
+
+         short_filename = os.path.basename(file_path)
+
+         api.upload_file(
+             path_or_fileobj=file_path,
+             repo_id=DATASET_NAME,
+             path_in_repo=f"data/{short_filename}",
+             repo_type="dataset",
+             commit_message=f"Add new BoAmps report data: {short_filename}",
+             create_pr=True,
+         )
+
+     except Exception as e:
+         return f"Error updating dataset: {str(e)}"
+     return "Data submitted successfully and dataset updated! Consult the data here: https://huggingface.co/datasets/boavizta/open_data_boamps"
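
A sketch of how this module is meant to be called; the file path and JSON string below are hypothetical, and HF_TOKEN and DATASET_NAME must be configured via src/services/util.py. Note that `create_pr=True` above means the upload opens a pull request on the dataset repo rather than committing directly to main:

```python
from src.services.dataset_upload import update_dataset

# Hypothetical outputs of an earlier generate_json call
file_path = "/tmp/report_inference_textGeneration_publicCloud_demo-001.json"
json_output = '{"header": {"reportId": "demo-001"}}'

status = update_dataset(file_path, json_output)
print(status)  # success message with the dataset URL, or an error string
```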
src/services/form_parser.py ADDED
@@ -0,0 +1,147 @@
+ """
+ Form parser configuration and utilities for handling Gradio form inputs.
+ This module provides a centralized way to manage form structure and parsing.
+ """
+
+ from dataclasses import dataclass
+ from typing import List, Any, Tuple
+
+
+ @dataclass
+ class FormSection:
+     """Represents a section of the form with its field count."""
+     name: str
+     field_count: int
+     fields: List[str] = None
+
+
+ @dataclass
+ class DynamicSection:
+     """Represents a dynamic section with multiple rows and fields."""
+     name: str
+     fields: List[str]
+     max_rows: int = 5
+
+     @property
+     def total_components(self) -> int:
+         return len(self.fields) * self.max_rows
+
+
+ # Form structure configuration
+ FORM_STRUCTURE = [
+     FormSection("header", 11, [
+         "licensing", "formatVersion", "formatVersionSpecificationUri", "reportId",
+         "reportDatetime", "reportStatus", "publisher_name", "publisher_division",
+         "publisher_projectName", "publisher_confidentialityLevel", "publisher_publicKey"
+     ]),
+
+     FormSection("task_simple", 3, [
+         "taskFamily", "taskStage", "nbRequest"
+     ]),
+
+     DynamicSection("algorithms", [
+         "trainingType", "algorithmType", "algorithmName", "algorithmUri",
+         "foundationModelName", "foundationModelUri", "parametersNumber", "framework",
+         "frameworkVersion", "classPath", "layersNumber", "epochsNumber", "optimizer", "quantization"
+     ]),
+
+     DynamicSection("dataset", [
+         "dataUsage", "dataType", "dataFormat", "dataSize", "dataQuantity",
+         "shape", "source", "sourceUri", "owner"
+     ]),
+
+     FormSection("task_final", 3, [
+         "measuredAccuracy", "estimatedAccuracy", "taskDescription"
+     ]),
+
+     DynamicSection("measures", [
+         "measurementMethod", "manufacturer", "version", "cpuTrackingMode", "gpuTrackingMode",
+         "averageUtilizationCpu", "averageUtilizationGpu", "powerCalibrationMeasurement",
+         "durationCalibrationMeasurement", "powerConsumption", "measurementDuration", "measurementDateTime"
+     ]),
+
+     FormSection("system", 3, [
+         "osystem", "distribution", "distributionVersion"
+     ]),
+
+     FormSection("software", 2, [
+         "language", "version_software"
+     ]),
+
+     FormSection("infrastructure_simple", 4, [
+         "infraType", "cloudProvider", "cloudInstance", "cloudService"
+     ]),
+
+     DynamicSection("infrastructure_components", [
+         "componentName", "componentType", "nbComponent", "memorySize",
+         "manufacturer_infra", "family", "series", "share"
+     ]),
+
+     FormSection("environment", 7, [
+         "country", "latitude", "longitude", "location",
+         "powerSupplierType", "powerSource", "powerSourceCarbonIntensity"
+     ]),
+
+     FormSection("quality", 1, ["quality"])
+ ]
+
+
+ class FormParser:
+     """Utility class for parsing form inputs based on the form structure."""
+
+     def __init__(self):
+         self.structure = FORM_STRUCTURE
+
+     def parse_inputs(self, inputs: Tuple[Any, ...]) -> dict:
+         """
+         Parse form inputs into a structured dictionary.
+
+         Args:
+             inputs: Tuple of all form input values
+
+         Returns:
+             dict: Parsed form data organized by sections
+         """
+         parsed_data = {}
+         idx = 0
+
+         for section in self.structure:
+             if isinstance(section, FormSection):
+                 # Simple section - extract values directly
+                 section_data = inputs[idx:idx + section.field_count]
+                 if section.fields:
+                     parsed_data[section.name] = dict(
+                         zip(section.fields, section_data))
+                 else:
+                     parsed_data[section.name] = section_data
+                 idx += section.field_count
+
+             elif isinstance(section, DynamicSection):
+                 # Dynamic section - extract and reshape data
+                 flat_data = inputs[idx:idx + section.total_components]
+                 idx += section.total_components
+
+                 # Reshape flat data into field-organized lists
+                 section_data = {}
+                 for field_idx, field_name in enumerate(section.fields):
+                     start_pos = field_idx * section.max_rows
+                     end_pos = start_pos + section.max_rows
+                     section_data[field_name] = flat_data[start_pos:end_pos]
+
+                 parsed_data[section.name] = section_data
+
+         return parsed_data
+
+     def get_total_input_count(self) -> int:
+         """Get the total number of expected inputs."""
+         total = 0
+         for section in self.structure:
+             if isinstance(section, FormSection):
+                 total += section.field_count
+             elif isinstance(section, DynamicSection):
+                 total += section.total_components
+         return total
+
+
+ # Global parser instance
+ form_parser = FormParser()
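
Given FORM_STRUCTURE above, the parser expects 11 + 3 + 14×5 + 9×5 + 3 + 12×5 + 3 + 2 + 4 + 8×5 + 7 + 1 = 249 flat inputs. A sketch with a dummy tuple of that size standing in for real form values:

```python
from src.services.form_parser import form_parser

expected = form_parser.get_total_input_count()
print(expected)  # 249, given the FORM_STRUCTURE defined above

# Dummy flat tuple standing in for the values Gradio passes to handle_submit
inputs = tuple(f"v{i}" for i in range(expected))
parsed = form_parser.parse_inputs(inputs)

print(parsed["header"]["licensing"])           # 'v0' (first header field)
print(len(parsed["algorithms"]["framework"]))  # 5 (one slot per dynamic row)
```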
src/services/huggingface.py DELETED
@@ -1,244 +0,0 @@
- from huggingface_hub import login
- from datasets import load_dataset, Dataset, concatenate_datasets
- import json
- from src.services.util import HF_TOKEN, DATASET_NAME
-
-
- def init_huggingface():
-     """Initialize Hugging Face authentication."""
-     if HF_TOKEN is None:
-         raise ValueError(
-             "Hugging Face token not found in environment variables.")
-     login(token=HF_TOKEN)
-
-
- def update_dataset(json_data):
-     """Update the Hugging Face dataset with new data."""
-     if json_data is None or json_data.startswith("The following fields are required"):
-         return json_data or "No data to submit. Please fill in all required fields."
-
-     try:
-         data = json.loads(json_data)
-     except json.JSONDecodeError:
-         return "Invalid JSON data. Please ensure all required fields are filled correctly."
-
-     try:
-         dataset = load_dataset(DATASET_NAME, split="train")
-         print(dataset)
-     except:
-         dataset = Dataset.from_dict({})
-
-     new_data = create_flattened_data(data)
-     new_dataset = Dataset.from_dict(new_data)
-
-     if len(dataset) > 0:
-         print("dataset intitial")
-         print(dataset)
-         print("data to add ")
-         print(new_dataset)
-         updated_dataset = concatenate_datasets([dataset, new_dataset])
-     else:
-         updated_dataset = new_dataset
-
-     updated_dataset.push_to_hub(DATASET_NAME)
-     return "Data submitted successfully and dataset updated! Consult the data [here](https://huggingface.co/datasets/boavizta/BoAmps_data)"
-
-
- def create_flattened_data(data):
-     """Create a flattened data structure for the algorithms."""
-     # Handle algorithms
-     algorithms = data.get("task", {}).get("algorithms", [])
-     fields = ["trainingType", "algorithmType", "algorithmName", "algorithmUri", "foundationModelName", "foundationModelUri",
-               "parametersNumber", "framework", "frameworkVersion", "classPath", "layersNumber", "epochsNumber", "optimizer", "quantization"]
-     algorithms_data = {field: "| ".join(str(algo.get(
-         field)) for algo in algorithms if algo.get(field)) or "" for field in fields}
-     trainingType_str = algorithms_data["trainingType"]
-     algorithmType_str = algorithms_data["algorithmType"]
-     algorithmName_str = algorithms_data["algorithmName"]
-     algorithmUri_str = algorithms_data["algorithmUri"]
-     foundationModelName_str = algorithms_data["foundationModelName"]
-     foundationModelUri_str = algorithms_data["foundationModelUri"]
-     parametersNumber_str = algorithms_data["parametersNumber"]
-     framework_str = algorithms_data["framework"]
-     frameworkVersion_str = algorithms_data["frameworkVersion"]
-     classPath_str = algorithms_data["classPath"]
-     layersNumber_str = algorithms_data["layersNumber"]
-     epochsNumber_str = algorithms_data["epochsNumber"]
-     optimizer_str = algorithms_data["optimizer"]
-     quantization_str = algorithms_data["quantization"]
-
-     # Handle dataset
-     dataset = data.get("task", {}).get("dataset", [])
-     fields = ["dataUsage", "dataType", "dataFormat", "dataSize",
-               "dataQuantity", "shape", "source", "sourceUri", "owner"]
-     dataset_data = {field: "| ".join(
-         str(d.get(field)) for d in dataset if d.get(field)) or "" for field in fields}
-     dataUsage_str = dataset_data["dataUsage"]
-     dataType_str = dataset_data["dataType"]
-     dataFormat_str = dataset_data["dataFormat"]
-     dataSize_str = dataset_data["dataSize"]
-     dataQuantity_str = dataset_data["dataQuantity"]
-     shape_str = dataset_data["shape"]
-     source_str = dataset_data["source"]
-     sourceUri_str = dataset_data["sourceUri"]
-     owner_str = dataset_data["owner"]
-
-     # Handle measures
-     measures = data.get("measures", [])
-     fields = ["measurementMethod", "manufacturer", "version", "cpuTrackingMode", "gpuTrackingMode", "averageUtilizationCpu", "averageUtilizationGpu",
-               "powerCalibrationMeasurement", "durationCalibrationMeasurement", "powerConsumption", "measurementDuration", "measurementDateTime"]
-     measures_data = {field: "| ".join(str(measure.get(
-         field)) for measure in measures if measure.get(field)) or "" for field in fields}
-     measurementMethod_str = measures_data["measurementMethod"]
-     manufacturer_str = measures_data["manufacturer"]
-     version_str = measures_data["version"]
-     cpuTrackingMode_str = measures_data["cpuTrackingMode"]
-     gpuTrackingMode_str = measures_data["gpuTrackingMode"]
-     averageUtilizationCpu_str = measures_data["averageUtilizationCpu"]
-     averageUtilizationGpu_str = measures_data["averageUtilizationGpu"]
-     powerCalibrationMeasurement_str = measures_data["powerCalibrationMeasurement"]
-     durationCalibrationMeasurement_str = measures_data["durationCalibrationMeasurement"]
-     powerConsumption_str = measures_data["powerConsumption"]
-     measurementDuration_str = measures_data["measurementDuration"]
-     measurementDateTime_str = measures_data["measurementDateTime"]
-
-     # Handle components
-     components = data.get("infrastructure", {}).get("components", [])
-     fields = ["componentName", "componentType", "nbComponent", "memorySize",
-               "manufacturer", "family", "series", "share"]
-
-     # Generate concatenated strings for each field
-     component_data = {field: "| ".join(str(comp.get(
-         field)) for comp in components if comp.get(field)) or "" for field in fields}
-
-     componentName_str = component_data["componentName"]
-     componentType_str = component_data["componentType"]
-     nbComponent_str = component_data["nbComponent"]
-     memorySize_str = component_data["memorySize"]
-     manufacturer_infra_str = component_data["manufacturer"]
-     family_str = component_data["family"]
-     series_str = component_data["series"]
-     share_str = component_data["share"]
-
-     return {
-         # Header
-         "licensing": [data.get("header", {}).get("licensing", "")],
-         "formatVersion": [data.get("header", {}).get("formatVersion", "")],
-         "formatVersionSpecificationUri": [data.get("header", {}).get("formatVersionSpecificationUri", "")],
-         "reportId": [data.get("header", {}).get("reportId", "")],
-         "reportDatetime": [data.get("header", {}).get("reportDatetime", "")],
-         "reportStatus": [data.get("header", {}).get("reportStatus", "")],
-         "publisher_name": [data.get("header", {}).get("publisher", {}).get("name", "")],
-         "publisher_division": [data.get("header", {}).get("publisher", {}).get("division", "")],
-         "publisher_projectName": [data.get("header", {}).get("publisher", {}).get("projectName", "")],
-         "publisher_confidentialityLevel": [data.get("header", {}).get("publisher", {}).get("confidentialityLevel", "")],
-         "publisher_publicKey": [data.get("header", {}).get("publisher", {}).get("publicKey", "")],
-
-         # Task
-         "taskStage": [data.get("task", {}).get("taskStage", "")],
-         "taskFamily": [data.get("task", {}).get("taskFamily", "")],
-         "nbRequest": [data.get("task", {}).get("nbRequest", "")],
-         # Algorithms
-         "trainingType": [trainingType_str],
-         "algorithmType": [algorithmType_str],
-         "algorithmName": [algorithmName_str],
-         "algorithmUri": [algorithmUri_str],
-         "foundationModelName": [foundationModelName_str],
-         "foundationModelUri": [foundationModelUri_str],
-         "parametersNumber": [parametersNumber_str],
-         "framework": [framework_str],
-         "frameworkVersion": [frameworkVersion_str],
-         "classPath": [classPath_str],
-         "layersNumber": [layersNumber_str],
-         "epochsNumber": [epochsNumber_str],
-         "optimizer": [optimizer_str],
-         "quantization": [quantization_str],
-         # Dataset
-         "dataUsage": [dataUsage_str],
-         "dataType": [dataType_str],
-         "dataFormat": [dataFormat_str],
-         "dataSize": [dataSize_str],
-         "dataQuantity": [dataQuantity_str],
-         "shape": [shape_str],
-         "source": [source_str],
-         "sourceUri": [sourceUri_str],
-         "owner": [owner_str],
-         "measuredAccuracy": [data.get("task", {}).get("measuredAccuracy", "")],
-         "estimatedAccuracy": [data.get("task", {}).get("estimatedAccuracy", "")],
-         "taskDescription": [data.get("task", {}).get("taskDescription", "")],
-
-         # Measures
-         "measurementMethod": [measurementMethod_str],
-         "manufacturer": [manufacturer_str],
-         "version": [version_str],
-         "cpuTrackingMode": [cpuTrackingMode_str],
-         "gpuTrackingMode": [gpuTrackingMode_str],
-         "averageUtilizationCpu": [averageUtilizationCpu_str],
-         "averageUtilizationGpu": [averageUtilizationGpu_str],
-         "powerCalibrationMeasurement": [powerCalibrationMeasurement_str],
-         "durationCalibrationMeasurement": [durationCalibrationMeasurement_str],
-         "powerConsumption": [powerConsumption_str],
-         "measurementDuration": [measurementDuration_str],
-         "measurementDateTime": [measurementDateTime_str],
-
-         # System
-         "os": [data.get("system", {}).get("os", "")],
-         "distribution": [data.get("system", {}).get("distribution", "")],
-         "distributionVersion": [data.get("system", {}).get("distributionVersion", "")],
-
-         # Software
-         "language": [data.get("software", {}).get("language", "")],
-         "version_software": [data.get("software", {}).get("version_software", "")],
-
-         # Infrastructure
-         "infraType": [data.get("infrastructure", {}).get("infra_type", "")],
-         "cloudProvider": [data.get("infrastructure", {}).get("cloudProvider", "")],
-         "cloudInstance": [data.get("infrastructure", {}).get("cloudInstance", "")],
-         "cloudService": [data.get("infrastructure", {}).get("cloudService", "")],
-         "componentName": [componentName_str],
-         "componentType": [componentType_str],
-         "nbComponent": [nbComponent_str],
-         "memorySize": [memorySize_str],
-         "manufacturer_infra": [manufacturer_infra_str],
-         "family": [family_str],
-         "series": [series_str],
-         "share": [share_str],
-
-         # Environment
-         "country": [data.get("environment", {}).get("country", "")],
-         "latitude": [data.get("environment", {}).get("latitude", "")],
-         "longitude": [data.get("environment", {}).get("longitude", "")],
-         "location": [data.get("environment", {}).get("location", "")],
-         "powerSupplierType": [data.get("environment", {}).get("powerSupplierType", "")],
-         "powerSource": [data.get("environment", {}).get("powerSource", "")],
-         "powerSourceCarbonIntensity": [data.get("environment", {}).get("powerSourceCarbonIntensity", "")],
-
-         # Quality
-         "quality": [data.get("quality", "")],
-     }
-
-
- """
- def create_flattened_data(data):
-     out = {}
-
-     def flatten(x, name=''):
-         if type(x) is dict:
-             for a in x:
-                 flatten(x[a], name + a + '_')
-         elif type(x) is list:
-             i = 0
-             for a in x:
-                 flatten(a, name + str(i) + '_')
-                 i += 1
-         else:
-             out[name[:-1]] = x
-
-     flatten(data)
-     return out
- """
@@ -1,7 +1,67 @@
1
  import json
2
  import tempfile
3
  from datetime import datetime
4
- from assets.utils.validation import validate_obligatory_fields
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
5
 
6
 
7
  def generate_json(
@@ -20,7 +80,7 @@ def generate_json(
20
  durationCalibrationMeasurement, powerConsumption,
21
  measurementDuration, measurementDateTime,
22
  # System
23
- os, distribution, distributionVersion,
24
  # Software
25
  language, version_software,
26
  # Infrastructure
@@ -33,192 +93,154 @@ def generate_json(
33
  # Quality
34
  quality
35
  ):
36
- """Generate JSON data from form inputs."""
37
- # Process algorithms
38
- algorithms_list = []
39
- algorithm_fields = {"trainingType": trainingType, "algorithmType": algorithmType, "algorithmName": algorithmName, "algorithmUri": algorithmUri, "foundationModelName": foundationModelName, "foundationModelUri": foundationModelUri,
40
- "parametersNumber": parametersNumber, "framework": framework, "frameworkVersion": frameworkVersion, "classPath": classPath, "layersNumber": layersNumber, "epochsNumber": epochsNumber, "optimizer": optimizer, "quantization": quantization}
41
- nb_algo = 0
42
- # ça ça marche pas
43
- for f in algorithm_fields:
44
- nb_algo = max(nb_algo, len(algorithm_fields[f]))
45
- for i in range(nb_algo):
46
- algortithm = {}
47
- for f in algorithm_fields:
48
- if i < len(algorithm_fields[f]) and algorithm_fields[f][i]:
49
- algortithm[f] = algorithm_fields[f][i]
50
- algorithms_list.append(algortithm)
51
-
52
- # Process dataset
53
- dataset_list = []
54
- dataset_fields = {"dataUsage": dataUsage, "dataType": dataType, "dataFormat": dataFormat, "dataSize": dataSize,
55
- "dataQuantity": dataQuantity, "shape": shape, "source": source, "sourceUri": sourceUri, "owner": owner}
56
- nb_data = 0
57
- for f in dataset_fields:
58
- nb_data = max(nb_data, len(dataset_fields[f]))
59
- for i in range(nb_data):
60
- data = {}
61
- for f in dataset_fields:
62
- if i < len(dataset_fields[f]) and dataset_fields[f][i]:
63
- data[f] = dataset_fields[f][i]
64
- dataset_list.append(data)
65
-
66
- # Process measures
67
- measures_list = []
68
- measure_fields = {"measurementMethod": measurementMethod, "manufacturer": manufacturer, "version": version, "cpuTrackingMode": cpuTrackingMode,
69
- "gpuTrackingMode": gpuTrackingMode, "averageUtilizationCpu": averageUtilizationCpu, "averageUtilizationGpu": averageUtilizationGpu,
70
- "powerCalibrationMeasurement": powerCalibrationMeasurement, "durationCalibrationMeasurement": durationCalibrationMeasurement,
71
- "powerConsumption": powerConsumption, "measurementDuration": measurementDuration, "measurementDateTime": measurementDateTime}
72
- nb_measures = 0
73
- for f in measure_fields:
74
- nb_measures = max(nb_measures, len(measure_fields[f]))
75
- for i in range(nb_measures):
76
- measure = {}
77
- for f in measure_fields:
78
- if i < len(measure_fields[f]) and measure_fields[f][i]:
79
- measure[f] = measure_fields[f][i]
80
- measures_list.append(measure)
81
-
82
- # Process components
83
- components_list = []
84
- component_fields = {"componentName": componentName, "componentType": componentType, "nbComponent": nbComponent,
85
- "memorySize": memorySize, "manufacturer_infra": manufacturer_infra, "family": family,
86
- "series": series, "share": share}
87
- nb_components = 0
88
- for f in component_fields:
89
- nb_components = max(nb_components, len(component_fields[f]))
90
- for i in range(nb_components):
91
- component = {}
92
- for f in component_fields:
93
- if i < len(component_fields[f]) and component_fields[f][i]:
94
- component[f] = component_fields[f][i]
95
- components_list.append(component)
96
-
97
- # process report
98
- report = {}
99
-
100
- # Process header
101
- header = {}
102
- if licensing:
103
- header["licensing"] = licensing
104
- if formatVersion:
105
- header["formatVersion"] = formatVersion
106
- if formatVersionSpecificationUri:
107
- header["formatVersionSpecificationUri"] = formatVersionSpecificationUri
108
- if reportId:
109
- header["reportId"] = reportId
110
- if reportDatetime:
111
- header["reportDatetime"] = reportDatetime or datetime.now().isoformat()
112
- if reportStatus:
113
- header["reportStatus"] = reportStatus
114
-
115
- publisher = {}
116
- if publisher_name:
117
- publisher["name"] = publisher_name
118
- if publisher_division:
119
- publisher["division"] = publisher_division
120
- if publisher_projectName:
121
- publisher["projectName"] = publisher_projectName
122
- if publisher_confidentialityLevel:
123
- publisher["confidentialityLevel"] = publisher_confidentialityLevel
124
- if publisher_publicKey:
125
- publisher["publicKey"] = publisher_publicKey
126
-
127
- if publisher:
128
- header["publisher"] = publisher
129
-
130
- if header:
131
- report["header"] = header
132
-
133
- # proceed task
134
- task = {}
135
- if taskStage:
136
- task["taskStage"] = taskStage
137
- if taskFamily:
138
- task["taskFamily"] = taskFamily
139
- if nbRequest:
140
- task["nbRequest"] = nbRequest
141
- if algorithms_list:
142
- task["algorithms"] = algorithms_list
143
- if dataset_list:
144
- task["dataset"] = dataset_list
145
- if measuredAccuracy:
146
- task["measuredAccuracy"] = measuredAccuracy
147
- if estimatedAccuracy:
148
- task["estimatedAccuracy"] = estimatedAccuracy
149
- if taskDescription:
150
- task["taskDescription"] = taskDescription
151
- report["task"] = task
152
-
153
- # proceed measures
154
- if measures_list:
155
- report["measures"] = measures_list
156
-
157
- # proceed system
158
- system = {}
159
- if os:
160
- system["os"] = os
161
- if distribution:
162
- system["distribution"] = distribution
163
- if distributionVersion:
164
- system["distributionVersion"] = distributionVersion
165
- if system:
166
- report["system"] = system
167
-
168
- # proceed software
169
- software = {}
170
- if language:
171
- software["language"] = language
172
- if version_software:
173
- software["version"] = version_software
174
- if software:
175
- report["software"] = software
176
-
177
- # proceed infrastructure
178
- infrastructure = {}
179
- if infraType:
180
- infrastructure["infraType"] = infraType
181
- if cloudProvider:
182
- infrastructure["cloudProvider"] = cloudProvider
183
- if cloudInstance:
184
- infrastructure["cloudInstance"] = cloudInstance
185
- if cloudService:
186
- infrastructure["cloudService"] = cloudService
187
- if components_list:
188
- infrastructure["components"] = components_list
189
- report["infrastructure"] = infrastructure
190
-
191
- # proceed environment
192
- environment = {}
193
- if country:
194
- environment["country"] = country
195
- if latitude:
196
- environment["latitude"] = latitude
197
- if longitude:
198
- environment["longitude"] = longitude
199
- if location:
200
- environment["location"] = location
201
- if powerSupplierType:
202
- environment["powerSupplierType"] = powerSupplierType
203
- if powerSource:
204
- environment["powerSource"] = powerSource
205
- if powerSourceCarbonIntensity:
206
- environment["powerSourceCarbonIntensity"] = powerSourceCarbonIntensity
207
- if environment:
208
- report["environment"] = environment
209
-
210
- # proceed quality
211
- if quality:
212
- report["quality"] = quality
213
-
214
- # Validate obligatory fields
215
- is_valid, message = validate_obligatory_fields(report)
216
- if not is_valid:
217
- return message, None, ""
218
  # Create the JSON string
219
- json_str = json.dumps(report)
220
- print(json_str)
221
- # Create and save the JSON file
222
- with tempfile.NamedTemporaryFile(mode='w', prefix="report", delete=False, suffix='.json') as file:
223
- json.dump(report, file, indent=4)
224
- return message, file.name, json_str
 
 
 
 
 
 
 
1
  import json
2
  import tempfile
3
  from datetime import datetime
4
+ from assets.utils.validation import validate_boamps_schema
5
+ from src.services.report_builder import ReportBuilder
6
+ import os
7
+
8
+
9
+ def process_component_list(fields_dict):
10
+ """
11
+ Fonction générique pour traiter une liste de composants à partir d'un dictionnaire de champs.
12
+
13
+ Args:
14
+ fields_dict (dict): Dictionnaire où les clés sont les noms des champs
15
+ et les valeurs sont des listes de composants Gradio ou des objets gr.State.
16
+
17
+ Returns:
18
+ list: Liste de dictionnaires représentant les composants.
19
+ """
20
+ component_list = []
21
+
22
+ # Extract values from different input types
23
+ processed_fields = {}
24
+ for field_name, field_values in fields_dict.items():
25
+ if hasattr(field_values, 'value'): # It's a gr.State object
26
+ processed_fields[field_name] = field_values.value if field_values.value else [
27
+ ]
28
+ elif isinstance(field_values, list) and len(field_values) > 0:
29
+ # It's a list of Gradio components or values
30
+ values = []
31
+ for item in field_values:
32
+ if hasattr(item, '__class__') and 'gradio' in str(item.__class__):
33
+ # It's a Gradio component, the value was passed as input to this function
34
+ # We need to handle this in the calling function by passing the values directly
35
+ values.append(item if item is not None else "")
36
+ else:
37
+ # It's already a value
38
+ values.append(item if item is not None else "")
39
+ processed_fields[field_name] = values
40
+ else:
41
+ processed_fields[field_name] = field_values if field_values else []
42
+
43
+ # Trouver le nombre maximum d'éléments parmi tous les champs
44
+ max_items = 0
45
+ for field_values in processed_fields.values():
46
+ if field_values:
47
+ max_items = max(max_items, len(field_values))
48
+
49
+ # Créer les composants
50
+ for i in range(max_items):
51
+ component = {}
52
+
53
+ for field_name, field_values in processed_fields.items():
54
+ if i < len(field_values):
55
+ value = field_values[i]
56
+ # Only add the field if it has a meaningful value (not empty, not just whitespace)
57
+ if value is not None and str(value).strip() != "":
58
+ component[field_name] = value
59
+
60
+ # Add component if it has any field (as requested by user)
61
+ if component:
62
+ component_list.append(component)
63
+
64
+ return component_list
65
 
66
 
67
  def generate_json(
 
80
  durationCalibrationMeasurement, powerConsumption,
81
  measurementDuration, measurementDateTime,
82
  # System
83
+ osystem, distribution, distributionVersion,
84
  # Software
85
  language, version_software,
86
  # Infrastructure
 
93
  # Quality
94
  quality
95
  ):
96
+ """Generate JSON data from form inputs using optimized ReportBuilder."""
97
+
98
+ try:
99
+ # Use ReportBuilder for cleaner, more maintainable code
100
+ builder = ReportBuilder()
101
+
102
+ # Build header section
103
+ header_data = {
104
+ "licensing": licensing,
105
+ "formatVersion": formatVersion,
106
+ "formatVersionSpecificationUri": formatVersionSpecificationUri,
107
+ "reportId": reportId,
108
+ "reportDatetime": reportDatetime or datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
109
+ "reportStatus": reportStatus,
110
+ "publisher_name": publisher_name,
111
+ "publisher_division": publisher_division,
112
+ "publisher_projectName": publisher_projectName,
113
+ "publisher_confidentialityLevel": publisher_confidentialityLevel,
114
+ "publisher_publicKey": publisher_publicKey
115
+ }
116
+ builder.add_header(header_data)
117
+
118
+ # Build task section
119
+ task_data = {
120
+ "taskStage": taskStage,
121
+ "taskFamily": taskFamily,
122
+ "nbRequest": nbRequest,
123
+ "measuredAccuracy": measuredAccuracy,
124
+ "estimatedAccuracy": estimatedAccuracy,
125
+ "taskDescription": taskDescription,
126
+ "algorithms": {
127
+ "trainingType": trainingType,
128
+ "algorithmType": algorithmType,
129
+ "algorithmName": algorithmName,
130
+ "algorithmUri": algorithmUri,
131
+ "foundationModelName": foundationModelName,
132
+ "foundationModelUri": foundationModelUri,
133
+ "parametersNumber": parametersNumber,
134
+ "framework": framework,
135
+ "frameworkVersion": frameworkVersion,
136
+ "classPath": classPath,
137
+ "layersNumber": layersNumber,
138
+ "epochsNumber": epochsNumber,
139
+ "optimizer": optimizer,
140
+ "quantization": quantization
141
+ },
142
+ "dataset": {
143
+ "dataUsage": dataUsage,
144
+ "dataType": dataType,
145
+ "dataFormat": dataFormat,
146
+ "dataSize": dataSize,
147
+ "dataQuantity": dataQuantity,
148
+ "shape": shape,
149
+ "source": source,
150
+ "sourceUri": sourceUri,
151
+ "owner": owner
152
+ }
153
+ }
154
+ builder.add_task(task_data)
155
+
156
+ # Build measures section
157
+ measures_data = {
158
+ "measurementMethod": measurementMethod,
159
+ "manufacturer": manufacturer,
160
+ "version": version,
161
+ "cpuTrackingMode": cpuTrackingMode,
162
+ "gpuTrackingMode": gpuTrackingMode,
163
+ "averageUtilizationCpu": averageUtilizationCpu,
164
+ "averageUtilizationGpu": averageUtilizationGpu,
165
+ "powerCalibrationMeasurement": powerCalibrationMeasurement,
166
+ "durationCalibrationMeasurement": durationCalibrationMeasurement,
167
+ "powerConsumption": powerConsumption,
168
+ "measurementDuration": measurementDuration,
169
+ "measurementDateTime": measurementDateTime
170
+ }
171
+ builder.add_measures(measures_data)
172
+
173
+ # Build system section
174
+ system_data = {
175
+ "osystem": osystem,
176
+ "distribution": distribution,
177
+ "distributionVersion": distributionVersion
178
+ }
179
+ builder.add_system(system_data)
180
+
181
+ # Build software section
182
+ software_data = {
183
+ "language": language,
184
+ "version_software": version_software
185
+ }
186
+ builder.add_software(software_data)
187
+
188
+ # Build infrastructure section
189
+ infrastructure_data = {
190
+ "infraType": infraType,
191
+ "cloudProvider": cloudProvider,
192
+ "cloudInstance": cloudInstance,
193
+ "cloudService": cloudService,
194
+ "components": {
195
+ "componentName": componentName,
196
+ "componentType": componentType,
197
+ "nbComponent": nbComponent,
198
+ "memorySize": memorySize,
199
+ "manufacturer": manufacturer_infra,
200
+ "family": family,
201
+ "series": series,
202
+ "share": share
203
+ }
204
+ }
205
+ builder.add_infrastructure(infrastructure_data)
206
+
207
+ # Build environment section
208
+ environment_data = {
209
+ "country": country,
210
+ "latitude": latitude,
211
+ "longitude": longitude,
212
+ "location": location,
213
+ "powerSupplierType": powerSupplierType,
214
+ "powerSource": powerSource,
215
+ "powerSourceCarbonIntensity": powerSourceCarbonIntensity
216
+ }
217
+ builder.add_environment(environment_data)
218
+
219
+ # Add quality
220
+ builder.add_quality(quality)
221
+
222
+ # Build the final report
223
+ report = builder.build()
224
+
225
+ # Validate that the schema follows the BoAmps format
226
+ is_valid, message = validate_boamps_schema(report)
227
+ if not is_valid:
228
+ return message, None, ""
229
+
230
+ # Create and save the JSON file
231
+ filename = f"report_{taskStage}_{taskFamily}_{infraType}_{reportId}.json"
232
+ filename = filename.replace(" ", "-")
233
+
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
234
  # Create the JSON string
235
+ json_str = json.dumps(report, indent=4, ensure_ascii=False)
236
+
237
+ # Write JSON to a temporary file with the desired filename
238
+ temp_dir = tempfile.gettempdir()
239
+ temp_path = os.path.join(temp_dir, filename)
240
+ with open(temp_path, "w", encoding="utf-8") as tmp:
241
+ tmp.write(json_str)
242
+
243
+ return message, temp_path, json_str
244
+
245
+ except Exception as e:
246
+ return f"Error generating JSON: {str(e)}", None, ""
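
To make the row-merging behavior of process_component_list concrete, a small sketch with hypothetical field rows (empty strings are dropped; each index across the lists becomes one component dict):

```python
from src.services.json_generator import process_component_list

# Hypothetical rows from the dynamic algorithm section
fields = {
    "algorithmName": ["bert-base", "resnet50", ""],
    "framework":     ["pytorch",   "",         ""],
    "epochsNumber":  ["3",         "10",       ""],
}

print(process_component_list(fields))
# [{'algorithmName': 'bert-base', 'framework': 'pytorch', 'epochsNumber': '3'},
#  {'algorithmName': 'resnet50', 'epochsNumber': '10'}]
```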
src/services/report_builder.py ADDED
@@ -0,0 +1,273 @@
+"""
+JSON processing utilities for BoAmps report generation.
+Provides optimized functions for data transformation and organization.
+"""
+
+from typing import Dict, List, Any, Optional
+
+
+def create_section_dict(data: Dict[str, Any], required_fields: List[str] = None) -> Dict[str, Any]:
+    """
+    Create a section dictionary, including only non-empty values.
+
+    Args:
+        data: Dictionary of field values
+        required_fields: List of fields that should always be included if provided
+
+    Returns:
+        Dictionary with non-empty values only, or empty dict if no meaningful values
+    """
+    section = {}
+    required_fields = required_fields or []
+
+    for key, value in data.items():
+        # Include only if it's a required field with meaningful value, or if it's meaningful
+        if key in required_fields and is_meaningful_value(value):
+            section[key] = value
+        elif key not in required_fields and is_meaningful_value(value):
+            section[key] = value
+
+    return section
+
+
+def is_meaningful_value(value: Any) -> bool:
+    """
+    Check if a value is meaningful (not empty, not just whitespace).
+
+    Args:
+        value: Value to check
+
+    Returns:
+        True if the value is meaningful, False otherwise
+    """
+    if value is None:
+        return False
+    if isinstance(value, str):
+        return value.strip() != ""
+    if isinstance(value, (int, float)):
+        return True
+    if isinstance(value, (list, dict)):
+        return len(value) > 0
+    return bool(value)
+
+
+def process_dynamic_component_list(field_data: Dict[str, List[Any]], max_rows: int = 5) -> List[Dict[str, Any]]:
+    """
+    Process dynamic component data into a list of component dictionaries.
+    Optimized version of the original process_component_list function.
+
+    Args:
+        field_data: Dictionary where keys are field names and values are lists of row values
+        max_rows: Maximum number of rows to process
+
+    Returns:
+        List of component dictionaries
+    """
+    components = []
+
+    # Find the actual number of rows with data
+    actual_rows = 0
+    for field_values in field_data.values():
+        if field_values:
+            # Count non-empty values from the end
+            for i in range(len(field_values) - 1, -1, -1):
+                if is_meaningful_value(field_values[i]):
+                    actual_rows = max(actual_rows, i + 1)
+                    break
+
+    # Create components for rows that have data
+    for row_idx in range(min(actual_rows, max_rows)):
+        component = {}
+
+        # Add fields that have meaningful values for this row
+        for field_name, field_values in field_data.items():
+            if row_idx < len(field_values) and is_meaningful_value(field_values[row_idx]):
+                component[field_name] = field_values[row_idx]
+
+        # Only add component if it has at least one field
+        if component:
+            components.append(component)
+
+    return components
+
+
+def create_publisher_section(data: Dict[str, Any]) -> Optional[Dict[str, Any]]:
+    """
+    Create publisher section with proper validation.
+
+    Args:
+        data: Dictionary containing all header data
+
+    Returns:
+        Publisher dictionary or None if no publisher data
+    """
+    publisher_fields = {
+        "name": data.get("publisher_name"),
+        "division": data.get("publisher_division"),
+        "projectName": data.get("publisher_projectName"),
+        "confidentialityLevel": data.get("publisher_confidentialityLevel"),
+        "publicKey": data.get("publisher_publicKey")
+    }
+
+    publisher = create_section_dict(
+        publisher_fields, required_fields=["confidentialityLevel"])
+    return publisher if publisher else None
+
+
+class ReportBuilder:
+    """
+    Builder class for creating BoAmps reports with optimized data processing.
+    """
+
+    def __init__(self):
+        self.report = {}
+
+    def add_header(self, header_data: Dict[str, Any]) -> 'ReportBuilder':
+        """Add header section to the report."""
+        header_fields = {
+            "licensing": header_data.get("licensing"),
+            "formatVersion": header_data.get("formatVersion"),
+            "formatVersionSpecificationUri": header_data.get("formatVersionSpecificationUri"),
+            "reportId": header_data.get("reportId"),
+            "reportDatetime": header_data.get("reportDatetime"),
+            "reportStatus": header_data.get("reportStatus")
+        }
+
+        header = create_section_dict(header_fields, required_fields=[
+            "reportId", "reportDatetime"])
+
+        # Add publisher if available
+        publisher = create_publisher_section(header_data)
+        if publisher:
+            header["publisher"] = publisher
+
+        if header:
+            self.report["header"] = header
+
+        return self
+
+    def add_task(self, task_data: Dict[str, Any]) -> 'ReportBuilder':
+        """Add task section to the report."""
+        task = {}
+
+        # Simple task fields
+        simple_fields = {
+            "taskStage": task_data.get("taskStage"),
+            "taskFamily": task_data.get("taskFamily"),
+            "nbRequest": task_data.get("nbRequest"),
+            "measuredAccuracy": task_data.get("measuredAccuracy"),
+            "estimatedAccuracy": task_data.get("estimatedAccuracy"),
+            "taskDescription": task_data.get("taskDescription")
+        }
+
+        task.update(create_section_dict(simple_fields,
+                    required_fields=["taskStage", "taskFamily"]))
+
+        # Process algorithms
+        if "algorithms" in task_data:
+            algorithms = process_dynamic_component_list(
+                task_data["algorithms"])
+            if algorithms:
+                task["algorithms"] = algorithms
+
+        # Process dataset
+        if "dataset" in task_data:
+            dataset = process_dynamic_component_list(task_data["dataset"])
+            if dataset:
+                task["dataset"] = dataset
+
+        self.report["task"] = task
+        return self
+
+    def add_measures(self, measures_data: Dict[str, List[Any]]) -> 'ReportBuilder':
+        """Add measures section to the report."""
+        measures = process_dynamic_component_list(measures_data)
+        if measures:
+            self.report["measures"] = measures
+        return self
+
+    def add_system(self, system_data: Dict[str, Any]) -> 'ReportBuilder':
+        """Add system section to the report."""
+        system_fields = {
+            "os": system_data.get("osystem"),
+            "distribution": system_data.get("distribution"),
+            "distributionVersion": system_data.get("distributionVersion")
+        }
+
+        system = create_section_dict(system_fields, required_fields=["os"])
+        # Only add system section if it has meaningful values
+        if system:
+            self.report["system"] = system
+        return self
+
+    def add_software(self, software_data: Dict[str, Any]) -> 'ReportBuilder':
+        """Add software section to the report."""
+        software_fields = {
+            "language": software_data.get("language"),
+            "version": software_data.get("version_software")
+        }
+
+        software = create_section_dict(
+            software_fields, required_fields=["language"])
+        # Only add software section if it has meaningful values
+        if software:
+            self.report["software"] = software
+        return self
+
+    def add_infrastructure(self, infra_data: Dict[str, Any]) -> 'ReportBuilder':
+        """Add infrastructure section to the report."""
+        infrastructure = {}
+
+        # Simple infrastructure fields
+        simple_fields = {
+            "infraType": infra_data.get("infraType"),
+            "cloudProvider": infra_data.get("cloudProvider"),
+            "cloudInstance": infra_data.get("cloudInstance"),
+            "cloudService": infra_data.get("cloudService")
+        }
+
+        # Add simple fields only if they have meaningful values
+        simple_infra = create_section_dict(
+            simple_fields, required_fields=["infraType"])
+        infrastructure.update(simple_infra)
+
+        # Process components
+        if "components" in infra_data:
+            components = process_dynamic_component_list(
+                infra_data["components"])
+            if components:
+                infrastructure["components"] = components
+
+        # Only add infrastructure section if it has meaningful content
+        if infrastructure:
+            self.report["infrastructure"] = infrastructure
+        return self
+
+    def add_environment(self, env_data: Dict[str, Any]) -> 'ReportBuilder':
+        """Add environment section to the report."""
+        env_fields = {
+            "country": env_data.get("country"),
+            "latitude": env_data.get("latitude"),
+            "longitude": env_data.get("longitude"),
+            "location": env_data.get("location"),
+            "powerSupplierType": env_data.get("powerSupplierType"),
+            "powerSource": env_data.get("powerSource"),
+            "powerSourceCarbonIntensity": env_data.get("powerSourceCarbonIntensity")
+        }
+
+        environment = create_section_dict(
+            env_fields, required_fields=["country"])
+        # Only add environment section if it has meaningful values
+        if environment:
+            self.report["environment"] = environment
+        return self
+
+    def add_quality(self, quality_value: Any) -> 'ReportBuilder':
+        """Add quality field to the report."""
+        if is_meaningful_value(quality_value):
+            self.report["quality"] = quality_value
+        return self
+
+    def build(self) -> Dict[str, Any]:
+        """Build and return the final report."""
+        return self.report
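
For readers skimming the new module: `ReportBuilder` is meant to be chained, and each `add_*` method silently drops sections with no meaningful values. A usage sketch with invented field values (only the keys come from the code above):

```python
from src.services.report_builder import ReportBuilder

report = (
    ReportBuilder()
    .add_header({
        "reportId": "example-report-id",          # normally a uuid4 string
        "reportDatetime": "2025-01-01 12:00:00",
        "publisher_confidentialityLevel": "public",
    })
    .add_task({"taskStage": "inference", "taskFamily": "textGeneration"})
    .add_system({"osystem": "linux", "distribution": "ubuntu"})
    .build()
)
# Sections that were never added (measures, software, ...) are simply
# absent from the resulting dict, as are empty fields.
```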
src/services/util.py CHANGED
@@ -2,16 +2,7 @@ import os
 
 # Hugging Face Configuration
 HF_TOKEN = os.environ.get("HF_TOKEN")
-DATASET_NAME = "boavizta/BoAmps_data"
-
-# Form Field Configurations
-# not used and verified for now
-MANDATORY_SECTIONS = ["task", "measures", "infrastructure"]
-OBLIGATORY_FIELDS = [
-    "taskStage", "taskFamily", "dataUsage", "dataType",
-    "measurementMethod", "powerConsumption", "infraType", "componentType",
-    "nbComponent"
-]
+DATASET_NAME = "boavizta/open_data_boamps"
 
 # Dropdown Options
 REPORT_STATUS_OPTIONS = ["draft", "final", "corrective", "other"]
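
The `DATASET_NAME` change above points the app at a different dataset repository; together with `HF_TOKEN` it is the whole configuration for the upload path. A hedged sketch of how such constants are typically consumed with `huggingface_hub` (the actual call site is not part of this hunk, and the file paths are illustrative):

```python
from huggingface_hub import HfApi

from src.services.util import DATASET_NAME, HF_TOKEN

api = HfApi(token=HF_TOKEN)
# create_pr=True opens a pull request on the dataset repo instead of
# committing straight to the main branch.
api.upload_file(
    path_or_fileobj="report.json",       # illustrative local file
    path_in_repo="reports/report.json",  # illustrative destination path
    repo_id=DATASET_NAME,
    repo_type="dataset",
    create_pr=True,
)
```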
src/ui/form_components.py CHANGED
@@ -1,4 +1,6 @@
+import uuid
 import gradio as gr
+import datetime
 from src.services.util import (
     REPORT_STATUS_OPTIONS, CONFIDENTIALITY_LEVELS, DATA_USAGE_OPTIONS, DATA_FORMAT,
     DATA_TYPES, DATA_SOURCE,
@@ -9,126 +11,150 @@ from src.services.util import (
 
 def create_dynamic_section(section_name, fields_config, initial_count=1, layout="row"):
     """
-    Creates a dynamic section in a Gradio interface where users can add or remove rows of input fields.
+    Creates a simplified dynamic section with a fixed number of pre-created rows.
+    This approach prioritizes data preservation over true dynamic functionality.
 
     Args:
         section_name (str): The name of the section (e.g., "Algorithms", "Components").
-        fields_config (list): A list of dictionaries defining the configuration for each field in the section.
-            Each dictionary should include:
-            - "type": The Gradio component type (e.g., gr.Textbox, gr.Number).
-            - "label": The label for the field.
-            - "info": Additional information or tooltip for the field.
-            - "value" (optional): The default value for the field.
-            - "kwargs" (optional): Additional keyword arguments for the component.
-            - "elem_classes" (optional): CSS classes for styling the field.
-        initial_count (int): The initial number of rows to render in the section.
+        fields_config (list): A list of dictionaries defining the configuration for each field.
+        initial_count (int): The initial number of rows to show (up to MAX_ROWS).
         layout (str): The layout of the fields in each row ("row" or "column").
 
     Returns:
-        tuple: A tuple containing:
-            - count_state: A Gradio state object tracking the number of rows.
-            - field_states: A list of Gradio state objects, one for each field, to store the values of the fields.
-            - add_btn: The "Add" button component for adding new rows.
+        tuple: A tuple containing states and the add button, compatible with existing code.
     """
-    # State management
-    # Tracks the number of rows in the section.
-    count_state = gr.State(value=initial_count+1)
-    # Stores the values for each field across all rows.
+    # Fixed number of rows - simple but reliable approach
+    MAX_ROWS = 5
+
+    # Create field states for compatibility with existing code
     field_states = [gr.State([]) for _ in fields_config]
-    # A list to store all dynamically generated components.
+
+    # Initialize field states with empty values for all possible rows
+    for field_state in field_states:
+        field_state.value = [""] * MAX_ROWS
+
+    # Create all rows upfront (some hidden initially)
    all_components = []
+    all_field_components = []  # Store all field components for event binding
 
-    def update_fields(*states_and_values):
-        """
-        Updates the state of the fields when a value changes.
-
-        Args:
-            *states_and_values: A combination of the current states and the new values for the fields.
-
-        Returns:
-            tuple: Updated states for all fields.
-        """
-        # Split states and current values
-        # Extract the current states for each field.
-        states = list(states_and_values[:len(fields_config)])
-        # Extract the new values for the fields.
-        current_values = states_and_values[len(fields_config):-1]
-        index = states_and_values[-1]  # The index of the row being updated.
-
-        # Update each field's state
-        for field_idx, (state, value) in enumerate(zip(states, current_values)):
-            # Ensure the state list is long enough to accommodate the current index.
-            while len(state) <= index:
-                state.append("")
-            # Update the value at the specified index.
-            state[index] = value if value is not None else ""
-
-        return tuple(states)
-
-    @gr.render(inputs=count_state)
-    def render_dynamic_section(count):
-        """
-        Renders the dynamic section with the current number of rows and their states.
-
-        Args:
-            count (int): The number of rows to render.
-
-        Returns:
-            list: A list of dynamically generated components for the section.
-        """
-        nonlocal all_components
-        all_components = []  # Reset the list of components for re-rendering.
-
-        for i in range(count):
-            # Create a row or column layout for the current row of fields.
+    for row_idx in range(MAX_ROWS):
+        # Use accordion instead of Group for better visibility control
+        # Show only initial_count rows at the beginning
+        is_visible = row_idx < initial_count
+
+        # Use accordion that's open for visible rows
+        with gr.Accordion(f"{section_name} {row_idx + 1}", open=is_visible, visible=is_visible) as group:
             with (gr.Row() if layout == "row" else gr.Column()):
-                row_components = []  # Components for the current row.
-                field_refs = []  # References to the current row's components.
+                row_components = []
 
                 for field_idx, config in enumerate(fields_config):
-                    # Create a component for the field using its configuration.
+                    # Create component
                     component = config["type"](
-                        label=f"{config['label']} ({section_name}{i + 1})",
+                        label=f"{config['label']} ({section_name} {row_idx + 1})",
                         info=config.get("info", ""),
                         value=config.get("value", ""),
                         **config.get("kwargs", {}),
                         elem_classes=config.get("elem_classes", "")
                     )
                     row_components.append(component)
-                    field_refs.append(component)
 
-                    # Create a change event to update the field states when the value changes.
-                    component.change(
-                        fn=update_fields,
-                        inputs=[*field_states, *field_refs, gr.State(i)],
-                        outputs=field_states
-                    )
+                    # Store component and indices for later event binding
+                    all_field_components.append(
+                        (component, field_idx, row_idx))
 
-                # Add a "Remove" button to delete the current row.
-                remove_btn = gr.Button("❌", variant="secondary")
-                remove_btn.click(
-                    lambda x, idx=i, fs=field_states: (
-                        max(0, x - 1),  # Decrease the count of rows.
-                        # Remove the row's values.
-                        *[fs[i].value[:idx] + fs[i].value[idx + 1:] for i in range(len(fs))]
-                    ),
-                    inputs=count_state,
-                    outputs=[count_state, *field_states]
-                )
+                # Add remove button for this row
+                remove_btn = gr.Button(
+                    "❌ Remove", variant="secondary", size="sm", visible=True)
                 row_components.append(remove_btn)
 
-            # Add the row's components to the list of all components.
-            all_components.extend(row_components)
-        return all_components
+        all_components.append((group, row_components))
 
-    # Initialize the section with the initial count of rows.
-    render_dynamic_section(count=initial_count)
+    # Visibility state
+    visible_count = gr.State(initial_count)
 
-    # Create an "Add" button to add new rows to the section.
+    # Add button
     add_btn = gr.Button(f"Add {section_name}")
-    add_btn.click(lambda x: x + 1, count_state, count_state)
 
-    return (count_state, *field_states, add_btn)
+    def handle_add(current_count):
+        """Show one more row if available"""
+        new_count = min(current_count + 1, MAX_ROWS)
+
+        # Update visibility for all groups
+        visibility_updates = []
+        for i in range(MAX_ROWS):
+            # For accordion, we need to update both visible and open states
+            visibility_updates.append(
+                gr.update(visible=(i < new_count), open=(i < new_count)))
+
+        return new_count, *visibility_updates
+
+    def handle_remove(current_count):
+        """Hide the last visible row"""
+        new_count = max(current_count - 1,
+                        1)  # Always keep at least 1 row visible
+
+        # Update visibility for all groups
+        visibility_updates = []
+        for i in range(MAX_ROWS):
+            # For accordion, we need to update both visible and open states
+            visibility_updates.append(
+                gr.update(visible=(i < new_count), open=(i < new_count)))
+
+        return new_count, *visibility_updates
+
+    # Connect add button
+    group_outputs = [group for group, _ in all_components]
+    add_btn.click(
+        fn=handle_add,
+        inputs=[visible_count],
+        outputs=[visible_count] + group_outputs
+    )
+
+    # Connect remove buttons for each row
+    for row_idx, (group, row_components) in enumerate(all_components):
+        remove_btn = row_components[-1]  # Remove button is the last component
+        remove_btn.click(
+            fn=handle_remove,
+            inputs=[visible_count],
+            outputs=[visible_count] + group_outputs
+        )
+
+    # Force initial visibility on interface load
+    def force_initial_visibility():
+        """Force initial visibility when the interface loads"""
+        visibility_updates = []
+        for i in range(MAX_ROWS):
+            visibility_updates.append(
+                gr.update(visible=(i < initial_count), open=(i < initial_count)))
+        return visibility_updates
+
+    # Create a simple info display
+    info_display = gr.Markdown(f"**{section_name}** (Max {MAX_ROWS} items)")
+
+    # Dummy count state for compatibility
+    count_state = gr.State(initial_count)
+
+    # Apply initial visibility immediately after component creation
+    if initial_count > 0:
+        # Use app load event to ensure visibility
+        for i, (group, _) in enumerate(all_components):
+            if i < initial_count:
+                group.visible = True
+                group.open = True
+
+    # Store the actual components to return instead of gr.State
+    components_to_return = []
+    for field_idx in range(len(fields_config)):
+        field_components = []
+        for row_idx in range(MAX_ROWS):
+            # Find the component for this field and row
+            for component, f_idx, r_idx in all_field_components:
+                if f_idx == field_idx and r_idx == row_idx:
+                    field_components.append(component)
+                    break
+        components_to_return.append(field_components)
+
+    return (count_state, *components_to_return, add_btn)
 
 
 def create_header_tab():
@@ -141,9 +167,13 @@ def create_header_tab():
         formatVersionSpecificationUri = gr.Textbox(
             label="Format Version Specification URI", info="(the URI of the present specification of this set of schemas)")
         reportId = gr.Textbox(
-            label="Report ID", info="(the unique identifier of this report, preferably as a uuid4 string)")
+            label="Report ID", info="(the unique identifier of this report, preferably as a uuid4 string)", value=str(uuid.uuid4()))
         reportDatetime = gr.Textbox(
-            label="Report Datetime", info="Required field<br>(the publishing date of this report in format YYYY-MM-DD HH:MM:SS)", elem_classes="mandatory_field")
+            label="Report Datetime",
+            info="Required field<br>(the publishing date of this report in format YYYY-MM-DD HH:MM:SS)",
+            elem_classes="mandatory_field",
+            value=datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
+        )
         reportStatus = gr.Dropdown(value=None,
                                    label="Report Status",
                                    choices=REPORT_STATUS_OPTIONS,
@@ -259,7 +289,7 @@ def create_task_tab():
                 "info": "(the type of quantization used : fp32, fp16, b16, int8 ...)",
             }
         ],
-        initial_count=0,
+        initial_count=1,
         layout="column"
     )
 
@@ -323,7 +353,7 @@ def create_task_tab():
                 "info": "(the owner of the dataset if available)",
             }
         ],
-        initial_count=0,
+        initial_count=1,
         layout="column"
    )
 
@@ -421,7 +451,7 @@ def create_measures_tab():
                 "info": "(the date when the measurement began, in format YYYY-MM-DD HH:MM:SS)",
             }
         ],
-        initial_count=0,
+        initial_count=1,
         layout="column"
    )
 
@@ -520,7 +550,7 @@ def create_infrastructure_tab():
                 "info": "(the percentage of the physical equipment used by the task, this sharing property should be set to 1 by default (if no share) and otherwise to the correct percentage, e.g. 0.5 if you share half-time.)",
             }
         ],
-        initial_count=0,
+        initial_count=1,
         layout="column"
    )
 
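
Taken together, the two halves of this refactor meet in the middle: `create_dynamic_section` now exposes one list of components per field across the fixed `MAX_ROWS` rows, and `process_dynamic_component_list` in `report_builder.py` collapses the collected row values back into a list of dicts, trimming empty trailing rows. A self-contained sketch of that collapsing step (field names and values invented for illustration):

```python
from src.services.report_builder import process_dynamic_component_list

# One list of row values per field, as gathered from the five fixed rows.
field_data = {
    "algorithmName": ["randomForest", "xgboost", "", "", ""],
    "framework": ["scikit-learn", "", "", "", ""],
}

print(process_dynamic_component_list(field_data))
# [{'algorithmName': 'randomForest', 'framework': 'scikit-learn'},
#  {'algorithmName': 'xgboost'}]
```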