Upload
Browse files- .gitignore +1 -0
- .well-known/mcp.yaml +3 -0
- README.md +183 -4
- app.py +144 -0
- ask_agent.py +62 -0
- doc_generator.py +141 -0
- index.md +10 -0
- readme_generator.py +111 -0
- requirements.txt +11 -0
.gitignore
ADDED
@@ -0,0 +1 @@
|
|
|
|
|
1 |
+
.env
|
.well-known/mcp.yaml
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
schema_version: 1
|
2 |
+
type: mcp_server
|
3 |
+
mcp_version: 0.1
|
README.md
CHANGED
@@ -1,14 +1,193 @@
|
|
1 |
---
|
2 |
-
title:
|
3 |
-
emoji:
|
4 |
-
colorFrom:
|
5 |
-
colorTo:
|
6 |
sdk: gradio
|
7 |
sdk_version: 5.33.1
|
8 |
app_file: app.py
|
9 |
pinned: false
|
10 |
license: mit
|
11 |
short_description: Automatic documentation generator for GitHub or zipped repos
|
|
|
|
|
|
|
|
|
12 |
---
|
13 |
|
14 |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
---
|
2 |
+
title: Agents MCP Hackathon
|
3 |
+
emoji: π
|
4 |
+
colorFrom: indigo
|
5 |
+
colorTo: red
|
6 |
sdk: gradio
|
7 |
sdk_version: 5.33.1
|
8 |
app_file: app.py
|
9 |
pinned: false
|
10 |
license: mit
|
11 |
short_description: Automatic documentation generator for GitHub or zipped repos
|
12 |
+
tags:
|
13 |
+
- mcp-server-track
|
14 |
+
- gradio-app
|
15 |
+
- hackathon
|
16 |
---
|
17 |
|
18 |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
19 |
+
|
20 |
+
# π€ AutoDocs β MCP Server for Automatic Code Documentation
|
21 |
+
|
22 |
+
Automatic documentation generator for GitHub or zipped repositories.
|
23 |
+
|
24 |
+
**AutoDocs** is a Gradio-based application that serves as an **MCP Server (Track 1)** for the Agents & MCP Hackathon.
|
25 |
+
It automatically generates documentation, README files, and requirements.txt for any Python code repository provided via GitHub URL or ZIP file.
|
26 |
+
|
27 |
+
---
|
28 |
+
|
29 |
+
## π Features
|
30 |
+
|
31 |
+
- π¦ Upload and process any ZIP file of a code repo.
|
32 |
+
- π Clone and process any GitHub repository.
|
33 |
+
- π Auto-generate:
|
34 |
+
- Docstrings (Google style)
|
35 |
+
- Typings
|
36 |
+
- Inline code comments
|
37 |
+
- `requirements.txt`
|
38 |
+
- `README.md` + `index.md`
|
39 |
+
- π§ Integrated AI agent to ask questions about the code.
|
40 |
+
|
41 |
+
---
|
42 |
+
|
43 |
+
## π οΈ MCP Server Information
|
44 |
+
|
45 |
+
β
This Space is an **MCP Server (Track 1)** compliant with the MCP protocol.
|
46 |
+
|
47 |
+
MCP Metadata (`.well-known/mcp.yaml`)
|
48 |
+
|
49 |
+
## π» Usage (as MCP Client)
|
50 |
+
|
51 |
+
This server can be queried via Claude Desktop, Cursor, or Tiny Agents MCP clients.
|
52 |
+
|
53 |
+
Example with Tiny Agents:
|
54 |
+
|
55 |
+
```bash
|
56 |
+
tiny-agents call --url https://huggingface.co/spaces/your-space-name
|
57 |
+
```
|
58 |
+
|
59 |
+
## Project Description
|
60 |
+
|
61 |
+
AutoDocs is a tool designed to automatically generate documentation, requirements files, and README files for Python projects. It leverages generative AI to add helpful comments and type annotations to your code, making it easier to understand and maintain. It can process a local repository, a GitHub repository via URL, or a zipped source code directory.
|
62 |
+
|
63 |
+
## Installation
|
64 |
+
|
65 |
+
1. **Clone the repository:**
|
66 |
+
|
67 |
+
```bash
|
68 |
+
git clone <repository_url>
|
69 |
+
cd <repository_name>
|
70 |
+
```
|
71 |
+
|
72 |
+
2. **Create a virtual environment (recommended):**
|
73 |
+
|
74 |
+
```bash
|
75 |
+
python3 -m venv venv
|
76 |
+
source venv/bin/activate # On Linux/macOS
|
77 |
+
venv\Scripts\activate # On Windows
|
78 |
+
```
|
79 |
+
|
80 |
+
3. **Install the dependencies:**
|
81 |
+
|
82 |
+
```bash
|
83 |
+
pip install -r requirements.txt
|
84 |
+
```
|
85 |
+
|
86 |
+
4. **Set up the environment variables:**
|
87 |
+
|
88 |
+
* Create a `.env` file in the root directory of the project.
|
89 |
+
* Add your Google Gemini API key to the `.env` file:
|
90 |
+
|
91 |
+
```
|
92 |
+
GOOGLE_API_KEY=<your_google_api_key>
|
93 |
+
```
|
94 |
+
|
95 |
+
**Note:** You will need a Google Gemini API key to use the documentation generation features. You can obtain one from the Google AI Studio.
|
96 |
+
|
97 |
+
## Usage
|
98 |
+
|
99 |
+
### Using the `app.py` module:
|
100 |
+
|
101 |
+
The `app.py` module contains the core logic for processing a repository and generating documentation.
|
102 |
+
|
103 |
+
```python
|
104 |
+
import gradio as gr
|
105 |
+
import os
|
106 |
+
import shutil
|
107 |
+
import tempfile
|
108 |
+
import zipfile
|
109 |
+
import subprocess
|
110 |
+
import uuid
|
111 |
+
|
112 |
+
from doc_generator import generate_documented_code, generate_requirements_txt
|
113 |
+
from readme_generator import generate_readme_from_zip
|
114 |
+
|
115 |
+
|
116 |
+
def process_repo(repo_path: str, zip_output_name: str = "AutoDocs") -> str:
|
117 |
+
"""Processes a repository to generate documentation, requirements, and a README.
|
118 |
+
|
119 |
+
Args:
|
120 |
+
repo_path: The path to the repository.
|
121 |
+
zip_output_name: The name of the output zip file (default: "AutoDocs").
|
122 |
+
|
123 |
+
Returns:
|
124 |
+
The path to the generated zip file.
|
125 |
+
"""
|
126 |
+
with tempfile.TemporaryDirectory() as temp_output_dir:
|
127 |
+
# Iterate through all Python files in the repository and generate documented code.
|
128 |
+
for root, _, files in os.walk(repo_path):
|
129 |
+
for file in files:
|
130 |
+
if file.endswith(".py"):
|
131 |
+
file_path = os.path.join(root, file)
|
132 |
+
generate_documented_code(file_path, file_path)
|
133 |
+
|
134 |
+
|
135 |
+
# Example Usage (not executable directly from this file, intended for integration):
|
136 |
+
# repo_path = "/path/to/your/repository"
|
137 |
+
# output_zip = process_repo(repo_path)
|
138 |
+
# print(f"Generated documentation zip file: {output_zip}")
|
139 |
+
```
|
140 |
+
|
141 |
+
### Using the FastAPI server (`mcp_server.py`):
|
142 |
+
|
143 |
+
The `mcp_server.py` module provides a FastAPI server with endpoints for generating documentation from a GitHub URL or a zip file upload.
|
144 |
+
|
145 |
+
1. **Run the FastAPI server:**
|
146 |
+
|
147 |
+
```bash
|
148 |
+
uvicorn mcp_server:app --reload
|
149 |
+
```
|
150 |
+
|
151 |
+
2. **Access the endpoints:**
|
152 |
+
|
153 |
+
* **Generate documentation from a GitHub URL:**
|
154 |
+
|
155 |
+
```
|
156 |
+
POST /generate_docs
|
157 |
+
Content-Type: multipart/form-data
|
158 |
+
|
159 |
+
github_url=<your_github_url>
|
160 |
+
```
|
161 |
+
|
162 |
+
* **Generate documentation from a zip file upload:**
|
163 |
+
|
164 |
+
```
|
165 |
+
POST /generate_docs
|
166 |
+
Content-Type: multipart/form-data
|
167 |
+
|
168 |
+
zip_file=@<path_to_your_zip_file>
|
169 |
+
```
|
170 |
+
|
171 |
+
* **MCP Manifest (/.well-known/mcp.yaml):**
|
172 |
+
|
173 |
+
```
|
174 |
+
GET /.well-known/mcp.yaml
|
175 |
+
```
|
176 |
+
This endpoint serves the MCP manifest file.
|
177 |
+
|
178 |
+
## Features
|
179 |
+
|
180 |
+
* **Automated Documentation Generation:** Uses generative AI to add comments and type annotations to Python code.
|
181 |
+
* **Requirements File Generation:** Automatically creates a `requirements.txt` file listing the project dependencies.
|
182 |
+
* **README Generation:** Generates a basic README file based on the project structure and code content.
|
183 |
+
* **GitHub URL Processing:** Can process repositories directly from GitHub URLs.
|
184 |
+
* **Zip File Upload:** Supports uploading zip files of source code for documentation generation.
|
185 |
+
* **MCP Manifest Serving:** Includes an endpoint to serve an MCP (Meta Control Protocol) manifest.
|
186 |
+
|
187 |
+
## Authors
|
188 |
+
|
189 |
+
Aguet Theau, Azdad Bilal.
|
190 |
+
|
191 |
+
## License
|
192 |
+
|
193 |
+
MIT License.
|
app.py
ADDED
@@ -0,0 +1,144 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
import gradio as gr
|
2 |
+
import os
|
3 |
+
import shutil
|
4 |
+
import tempfile
|
5 |
+
import zipfile
|
6 |
+
import subprocess
|
7 |
+
import uuid
|
8 |
+
|
9 |
+
from ask_agent import ask_agent
|
10 |
+
from doc_generator import generate_documented_code, generate_requirements_txt
|
11 |
+
from readme_generator import generate_readme_from_zip
|
12 |
+
|
13 |
+
last_processed_repo_path = ""
|
14 |
+
|
15 |
+
def process_repo(repo_path, zip_output_name="AutoDocs"):
    """Document a repository and package the result as a zip archive.

    Runs the AI documenter over every ``.py`` file in-place, generates a
    ``requirements.txt``, asks the README generator for ``README.md`` and
    ``index.md``, then bundles everything into a single zip.

    Args:
        repo_path: Root directory of the repository to process.
        zip_output_name: Base name (without extension) of the output zip.

    Returns:
        str: Path to the generated zip file in the system temp directory.
    """
    # Declared up front (the original buried it at the end of the function).
    global last_processed_repo_path

    with tempfile.TemporaryDirectory() as temp_output_dir:
        # Document every Python file in place.
        for root, _, files in os.walk(repo_path):
            for file in files:
                if file.endswith(".py"):
                    file_path = os.path.join(root, file)
                    generate_documented_code(file_path, file_path)

        # Generate requirements.txt inside the repo so it ships in the zip.
        requirements_path = os.path.join(repo_path, "requirements.txt")
        generate_requirements_txt(repo_path, requirements_path)

        # Snapshot the repo into a throwaway zip for the README generator.
        with tempfile.NamedTemporaryFile(suffix=".zip", delete=False) as tmp_zip:
            zip_path = tmp_zip.name
        try:
            with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zipf:
                for root, _, files in os.walk(repo_path):
                    for file in files:
                        full_path = os.path.join(root, file)
                        zipf.write(full_path, os.path.relpath(full_path, repo_path))

            # README.md + index.md generated from the snapshot.
            readme_path, index_path = generate_readme_from_zip(zip_path, temp_output_dir)
        finally:
            # BUGFIX: the temporary snapshot zip (delete=False) used to be
            # leaked on disk after every run.
            try:
                os.remove(zip_path)
            except OSError:
                pass

        # Copy the processed repo next to the generated docs.
        for item in os.listdir(repo_path):
            src = os.path.join(repo_path, item)
            dst = os.path.join(temp_output_dir, item)
            if os.path.isdir(src):
                shutil.copytree(src, dst, dirs_exist_ok=True)
            else:
                shutil.copy2(src, dst)

        dest_readme = os.path.join(temp_output_dir, "README.md")
        dest_index = os.path.join(temp_output_dir, "index.md")
        # Avoid SameFileError when the generator already wrote into place.
        if os.path.abspath(readme_path) != os.path.abspath(dest_readme):
            shutil.copy2(readme_path, dest_readme)
        if os.path.abspath(index_path) != os.path.abspath(dest_index):
            shutil.copy2(index_path, dest_index)

        # Final deliverable: a consistently named zip in the temp directory.
        output_zip_path = os.path.join(
            tempfile.gettempdir(), f"{zip_output_name}.zip"
        )
        with zipfile.ZipFile(output_zip_path, 'w', zipfile.ZIP_DEFLATED) as zipf:
            for root, _, files in os.walk(temp_output_dir):
                for file in files:
                    full_path = os.path.join(root, file)
                    zipf.write(full_path, os.path.relpath(full_path, temp_output_dir))

    last_processed_repo_path = output_zip_path
    return output_zip_path
|
72 |
+
def process_zip_upload(uploaded_zip_file):
    """Extract an uploaded repository zip and run the documentation pipeline.

    The output archive inherits the upload's base name
    (``my_project.zip`` -> documented ``my_project.zip``).
    """
    zip_path = uploaded_zip_file.name
    # Name the output after the upload, without its .zip extension.
    zip_name = os.path.splitext(os.path.basename(zip_path))[0]

    with tempfile.TemporaryDirectory() as temp_input_dir:
        local_copy = os.path.join(temp_input_dir, "input_repo.zip")
        shutil.copy(zip_path, local_copy)
        with zipfile.ZipFile(local_copy, "r") as archive:
            archive.extractall(temp_input_dir)

        # If the zip wraps everything in a single top-level folder, descend
        # into it; otherwise treat the extraction dir itself as the repo root.
        subdirs = [
            entry for entry in os.listdir(temp_input_dir)
            if os.path.isdir(os.path.join(temp_input_dir, entry))
        ]
        repo_root = os.path.join(temp_input_dir, subdirs[0]) if subdirs else temp_input_dir

        return process_repo(repo_root, zip_name)
|
87 |
+
def process_github_clone(github_url):
    """Clone a GitHub repository and run the documentation pipeline on it.

    Returns the path to the generated zip, or an error-message string when
    the clone fails.
    """
    with tempfile.TemporaryDirectory() as clone_dir:
        try:
            subprocess.check_call(["git", "clone", github_url, clone_dir])
        except subprocess.CalledProcessError:
            # NOTE(review): a plain string is returned here and ends up in a
            # gr.File output via the wrapper — confirm the intended UX.
            return "β Error cloning the GitHub repository. Please check the URL."
        return process_repo(clone_dir)
|
95 |
+
def process_zip_and_update_state(uploaded_zip_file):
    """Run the zip pipeline; return the archive path twice.

    The first value feeds the gr.File download, the second the gr.State
    consumed later by the Q&A agent.
    """
    result = process_zip_upload(uploaded_zip_file)
    return result, result
100 |
+
def process_git_and_update_state(github_url):
    """Run the GitHub pipeline; return the archive path twice.

    The first value feeds the gr.File download, the second the gr.State
    consumed later by the Q&A agent.
    """
    result = process_github_clone(github_url)
    return result, result
105 |
+
# Gradio user interface: three tabs (zip upload, GitHub URL, Q&A agent).
with gr.Blocks() as demo:
    gr.Markdown("# π€ AutoDocs β Smart Documentation Generator")
    # Per-session path of the last generated archive; wired into ask_agent.
    last_processed_repo_path_state = gr.State(value="")
    with gr.Tab("π¦ Upload .zip"):
        zip_file_input = gr.File(label="Drop your repo .zip file here", file_types=['.zip'])
        generate_btn_zip = gr.Button("π Generate from ZIP")
        output_zip_zip = gr.File(label="β¬οΈ Download your documented repo")

    with gr.Tab("π GitHub URL"):
        github_url_input = gr.Text(label="Link to GitHub repository", placeholder="https://github.com/user/repo.git")
        generate_btn_git = gr.Button("π Generate from GitHub")
        output_zip_git = gr.File(label="β¬οΈ Download your documented repo")

    with gr.Tab("π§ Ask the agent about the repo"):
        chatbot = gr.Chatbot()
        user_input = gr.Textbox(placeholder="Ask your question here...")
        send_btn = gr.Button("Send")

    # The agent receives the chat history, the question, and the state path;
    # it returns the updated history and a value for the textbox.
    send_btn.click(
        fn=ask_agent,
        inputs=[chatbot, user_input, last_processed_repo_path_state],
        outputs=[chatbot, user_input]
    )

    # Both generators update the download widget AND the shared state path.
    generate_btn_zip.click(
        fn=process_zip_and_update_state,
        inputs=[zip_file_input],
        outputs=[output_zip_zip, last_processed_repo_path_state]
    )

    generate_btn_git.click(
        fn=process_git_and_update_state,
        inputs=[github_url_input],
        outputs=[output_zip_git, last_processed_repo_path_state]
    )

# Queueing enabled so long generations don't block concurrent users.
if __name__ == "__main__":
    demo.queue()
    demo.launch()
|
ask_agent.py
ADDED
@@ -0,0 +1,62 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
import os
|
2 |
+
import tempfile
|
3 |
+
import zipfile
|
4 |
+
import google.generativeai as genai
|
5 |
+
from dotenv import load_dotenv
|
6 |
+
load_dotenv()
|
7 |
+
|
8 |
+
API_KEY = os.getenv("GOOGLE_API_KEY")
|
9 |
+
genai.configure(api_key=API_KEY)
|
10 |
+
model = genai.GenerativeModel("models/gemini-2.0-flash")
|
11 |
+
chat_session = model.start_chat(history=[])
|
12 |
+
|
13 |
+
def ask_agent(history, message, last_processed_repo_path):
    """Answer a question about the most recently processed repository.

    Unzips the generated archive, concatenates its documentation and code
    files, and sends them together with the user's question to the Gemini
    chat session. Returns the updated chat history and the new textbox value.
    """
    # Nothing to talk about until a repo has been processed.
    if not last_processed_repo_path or not os.path.exists(last_processed_repo_path):
        return history, "π No repository has been processed yet. Please generate documentation first."

    doc_exts = {".md", ".txt"}
    code_exts = {".py", ".js", ".java", ".ts", ".cpp", ".c", ".cs", ".go", ".rb", ".swift", ".php"}

    with tempfile.TemporaryDirectory() as tmpdir:
        with zipfile.ZipFile(last_processed_repo_path, 'r') as zip_ref:
            zip_ref.extractall(tmpdir)

        # Collect every doc/code file shipped in the archive.
        relevant_files = [
            os.path.join(root, name)
            for root, _, names in os.walk(tmpdir)
            for name in names
            if os.path.splitext(name)[1].lower() in doc_exts or
               os.path.splitext(name)[1].lower() in code_exts
        ]

        if not relevant_files:
            return history, "π No documentation or code files found in the generated zip."

        # Concatenate file contents, each preceded by a path header.
        pieces = []
        for path in relevant_files:
            try:
                with open(path, "r", encoding="utf-8") as handle:
                    body = handle.read()
                rel = os.path.relpath(path, tmpdir)
                pieces.append(f"\n\n===== File: {rel} =====\n\n")
                pieces.append(body)
            except Exception as exc:
                pieces.append(f"\n\n===== Error reading file {path}: {str(exc)} =====\n\n")
        docs_and_code_content = "".join(pieces)

    prompt = (
        f"Here is the content of the project (documentation and code):\n\n{docs_and_code_content}\n\n"
        f"Question: {message}\n\nPlease respond clearly and precisely."
    )

    try:
        answer = chat_session.send_message(prompt).text
    except Exception as exc:
        answer = f"β Error when calling Gemini: {str(exc)}"

    history = history or []
    history.append((message, answer))

    return history, ""
doc_generator.py
ADDED
@@ -0,0 +1,141 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
import google.generativeai as genai
|
2 |
+
import re
|
3 |
+
import os
|
4 |
+
import ast
|
5 |
+
from dotenv import load_dotenv
|
6 |
+
import sys
|
7 |
+
import importlib.util
|
8 |
+
|
9 |
+
load_dotenv()
|
10 |
+
|
11 |
+
API_KEY = os.getenv("GOOGLE_API_KEY")
|
12 |
+
if API_KEY is None:
|
13 |
+
raise ValueError("β οΈ The API key MY_API_KEY is missing! Check the Secrets in Hugging Face.")
|
14 |
+
genai.configure(api_key=API_KEY)
|
15 |
+
model = genai.GenerativeModel("models/gemini-2.0-flash")
|
16 |
+
|
17 |
+
PROMPT = """You are an expert programming assistant.
|
18 |
+
For the following code, perform the following actions:
|
19 |
+
- The code must remain exactly the same
|
20 |
+
- Add clear comments for each important step.
|
21 |
+
- Rename variables if it makes the code easier to understand.
|
22 |
+
- Add type annotations if the language supports it.
|
23 |
+
- For each function, add a Google-style docstring (or equivalent format depending on the language).
|
24 |
+
|
25 |
+
Respond only with the updated code, no explanation.
|
26 |
+
Here is the code:
|
27 |
+
|
28 |
+
{code}
|
29 |
+
"""
|
30 |
+
|
31 |
+
def generate_documented_code(input_path: str, output_path: str) -> str:
    """Generate a documented version of a code file via the Gemini model.

    Args:
        input_path (str): Path to the original code file.
        output_path (str): Path where the documented code will be saved.

    Returns:
        str: The updated and documented code.
    """
    with open(input_path, "r", encoding="utf-8") as f:
        original_code = f.read()

    prompt = PROMPT.format(code=original_code)
    response = model.generate_content(prompt)
    updated_code = _strip_markdown_fences(response.text.strip())

    with open(output_path, "w", encoding="utf-8") as output_file:
        output_file.write(updated_code)

    return updated_code


def _strip_markdown_fences(text: str) -> str:
    """Remove a surrounding ``` code fence from *text*, if one is present.

    BUGFIX: the previous logic unconditionally deleted the first and last
    lines of the model reply — destroying real code when the model answered
    without fences, and wiping replies of two lines or fewer entirely.
    """
    lines = text.splitlines()
    if len(lines) >= 2 and lines[0].startswith("```") and lines[-1].startswith("```"):
        lines = lines[1:-1]
    return "\n".join(lines)
|
63 |
+
|
64 |
+
def extract_imports_from_file(file_path):
    """Extract top-level module names imported by a Python file.

    Args:
        file_path (str): Path to the Python file.

    Returns:
        set: Top-level names of absolutely-imported modules. Relative
        imports (``from . import x`` / ``from .pkg import x``) are skipped;
        unreadable or unparseable files yield an empty set.
    """
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            tree = ast.parse(f.read())
    except (SyntaxError, ValueError, OSError):
        # Unparseable/unreadable files contribute no requirements.
        # (UnicodeDecodeError is a ValueError subclass.)
        return set()

    imports = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imports.add(alias.name.split('.')[0])
        elif isinstance(node, ast.ImportFrom):
            # BUGFIX: relative imports carry node.level > 0, and node.module
            # never starts with "." in the AST — the old startswith(".")
            # check was dead code, so local packages imported relatively
            # leaked into requirements.txt.
            if node.level == 0 and node.module:
                imports.add(node.module.split('.')[0])
    return imports
|
90 |
+
|
91 |
+
def is_std_lib(module_name):
    """Check if a module is part of the Python standard library.

    Args:
        module_name (str): Top-level module name.

    Returns:
        bool: True for builtin/stdlib modules; False for third-party,
        local, or unresolvable names.
    """
    if module_name in sys.builtin_module_names:
        return True
    # Authoritative stdlib listing, available on Python 3.10+.
    stdlib_names = getattr(sys, "stdlib_module_names", None)
    if stdlib_names is not None and module_name in stdlib_names:
        return True
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # BUGFIX: find_spec raises for some names (empty string, names whose
        # parent import fails); treat those as non-stdlib instead of letting
        # requirements generation crash.
        return False
    if spec is None:
        return False
    origin = spec.origin or ""
    # Installed third-party packages live in site-packages (or Debian's
    # dist-packages); anything else found on the path is treated as stdlib.
    return "site-packages" not in origin and "dist-packages" not in origin
|
106 |
+
|
107 |
+
def generate_requirements_txt(base_path, output_path):
    """Generate a requirements.txt from external imports in Python files.

    Args:
        base_path (str): Root directory of the codebase.
        output_path (str): Path to save the generated requirements.txt file.
    """
    all_imports = set()
    local_modules = set()

    # Single pass: collect local module/package names and imports together.
    for root, dirs, files in os.walk(base_path):
        for d in dirs:
            # BUGFIX: local *packages* (directories with __init__.py) were
            # not excluded before — only .py file basenames were — so they
            # leaked into requirements.txt.
            if os.path.isfile(os.path.join(root, d, "__init__.py")):
                local_modules.add(d)
        for file in files:
            if file.endswith(".py"):
                local_modules.add(os.path.splitext(file)[0])
                all_imports.update(extract_imports_from_file(os.path.join(root, file)))

    # Keep only names that are neither local nor standard library.
    external_imports = sorted(
        imp for imp in all_imports
        if imp not in local_modules and not is_std_lib(imp)
    )

    # One package name per line, as pip expects.
    with open(output_path, "w", encoding="utf-8") as f:
        f.writelines(f"{package}\n" for package in external_imports)
index.md
ADDED
@@ -0,0 +1,10 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
π repo/
|
2 |
+
βββ .well-known/
|
3 |
+
β βββ mcp.yaml
|
4 |
+
βββ app.py β Gradio + MCP server
|
5 |
+
βββ doc_generator.py
|
6 |
+
βββ mcp_server.py
|
7 |
+
βββ readme_generator.py
|
8 |
+
βββ requirements.txt
|
9 |
+
βββ README.md
|
10 |
+
βββ index.md
|
readme_generator.py
ADDED
@@ -0,0 +1,111 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
import os
|
2 |
+
import zipfile
|
3 |
+
import tempfile
|
4 |
+
import google.generativeai as genai
|
5 |
+
from dotenv import load_dotenv
|
6 |
+
from doc_generator import generate_requirements_txt
|
7 |
+
|
8 |
+
load_dotenv()
|
9 |
+
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
|
10 |
+
model = genai.GenerativeModel("models/gemini-2.0-flash")
|
11 |
+
|
12 |
+
ANNOTATIONS = {
|
13 |
+
"app.py": "β Gradio + MCP server",
|
14 |
+
"README.md": "β With demo + tag \"mcp-server-track\"",
|
15 |
+
"demo_video.mp4": "β Link embedded in README"
|
16 |
+
}
|
17 |
+
|
18 |
+
PROMPT = """You are an expert software project documentation assistant.
|
19 |
+
|
20 |
+
You will write a clear, complete, and well-structured `README.md` file for a source code repository with the following files and content excerpts:
|
21 |
+
|
22 |
+
{file_summaries}
|
23 |
+
|
24 |
+
The README must contain:
|
25 |
+
1. A title
|
26 |
+
2. A short project description
|
27 |
+
3. An "Installation" section
|
28 |
+
4. A "Usage" section
|
29 |
+
5. A "Features" section
|
30 |
+
6. An "Authors" section (write "To be completed" if not detected)
|
31 |
+
7. A "License" section (write "To be completed" if not detected)
|
32 |
+
|
33 |
+
Respond only with the README.md content, without markdown ``` tags.
|
34 |
+
"""
|
35 |
+
|
36 |
+
def summarize_files(dir_path, max_files=20, max_chars=5000):
    """Build short excerpts of source/doc files under *dir_path*.

    Walks the tree and collects up to ``max_files`` summaries, stopping as
    soon as the combined size exceeds ``max_chars``. Each summary shows the
    file's relative path and its first 1000 characters in a code fence.

    Args:
        dir_path: Root directory to scan.
        max_files: Maximum number of files to include.
        max_chars: Soft cap on the total size of all summaries.

    Returns:
        str: The summaries joined by blank lines.
    """
    summaries = []
    total_chars = 0  # running size; avoids O(n^2) re-joining per file
    for root, _, files in os.walk(dir_path):
        for file in files:
            if not file.endswith((".py", ".js", ".ts", ".java", ".md", ".json", ".txt")):
                continue
            full_path = os.path.join(root, file)
            try:
                with open(full_path, "r", encoding="utf-8") as f:
                    content = f.read()
            except Exception:
                # Unreadable files (binary, bad encoding) are skipped.
                continue
            rel_path = os.path.relpath(full_path, dir_path)
            summary = f"### {rel_path}\n```\n{content[:1000]}\n```"
            summaries.append(summary)
            total_chars += len(summary)
            # BUGFIX: the original 'break' only left the inner file loop, so
            # the character cap was ignored for every subsequent directory.
            if total_chars > max_chars or len(summaries) >= max_files:
                return "\n\n".join(summaries)
    return "\n\n".join(summaries)
|
54 |
+
def generate_readme_from_zip(zip_file_path: str, output_dir: str) -> tuple:
    """Generate README.md and index.md for a zipped repository.

    Extracts the zip to a temp dir, summarizes its files, asks the model
    for README content, and writes ``README.md`` plus a tree-style
    ``index.md`` into *output_dir*.

    Args:
        zip_file_path: Path to the zip archive of the repository.
        output_dir: Directory where README.md and index.md are written.

    Returns:
        tuple: ``(readme_path, index_path)``.
    """
    with tempfile.TemporaryDirectory() as tempdir:
        with zipfile.ZipFile(zip_file_path, "r") as zip_ref:
            zip_ref.extractall(tempdir)

        file_summaries = summarize_files(tempdir)
        prompt = PROMPT.format(file_summaries=file_summaries)
        response = model.generate_content(prompt)
        readme_content = response.text.strip()

        # BUGFIX: only strip a surrounding ``` fence when it is actually
        # present; the old code always deleted the first and last lines,
        # destroying real content for fence-less replies and wiping
        # replies of two lines or fewer entirely.
        lines = readme_content.splitlines()
        if len(lines) >= 2 and lines[0].startswith("```") and lines[-1].startswith("```"):
            lines = lines[1:-1]
        readme_content = "\n".join(lines)

        os.makedirs(output_dir, exist_ok=True)
        readme_path = os.path.join(output_dir, "README.md")
        index_path = os.path.join(output_dir, "index.md")

        with open(readme_path, "w", encoding="utf-8") as f:
            f.write(readme_content)

        # Generate the index from tempdir (actual location of extracted files).
        write_index_file(tempdir, index_path)

    return readme_path, index_path
|
84 |
+
def generate_tree_structure(path: str, prefix: str = "") -> str:
    """Render a directory tree as text, annotating entries via ANNOTATIONS.

    Args:
        path: Directory to render.
        prefix: Indentation prefix used by recursive calls. External
            callers pass the default, which also marks the top level.

    Returns:
        str: The rendered tree. README.md and index.md entries (generated
        afterwards by the caller) are appended once at the top level only.
    """
    entries = sorted(os.listdir(path))
    lines = ["π repo/"]

    for entry in entries:
        full_path = os.path.join(path, entry)
        comment = f" {ANNOTATIONS.get(entry, '')}".rstrip()
        display = entry + "/" if os.path.isdir(full_path) else entry
        lines.append(prefix + "βββ " + display + comment)

        if os.path.isdir(full_path):
            subtree = generate_tree_structure(full_path, prefix + "β   ")
            lines.extend(subtree.splitlines()[1:])  # skip repeated dir header

    # BUGFIX: these entries used to be appended on *every* recursive call,
    # duplicating README.md / index.md under each subdirectory.
    if prefix == "":
        lines.extend(["βββ README.md",
                      "βββ index.md"])

    return "\n".join(lines)
|
107 |
+
|
108 |
+
def write_index_file(project_path: str, output_path: str):
    """Write the rendered directory tree of *project_path* to *output_path*."""
    tree_text = generate_tree_structure(project_path)
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(tree_text)
|
requirements.txt
ADDED
@@ -0,0 +1,11 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
gradio>=4.0.0
|
2 |
+
openai
|
3 |
+
tqdm
|
4 |
+
requests
|
5 |
+
gitpython
|
6 |
+
python-dotenv
|
7 |
+
google-generativeai
|
8 |
+
# dotenv (removed: superseded by python-dotenv above, which provides the dotenv module)
|
9 |
+
jinja2
|
10 |
+
python-multipart
|
11 |
+
stdlib-list
|