title: Transformers Modular Refactor
emoji: 😻
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 5.39.0
app_file: app.py
pinned: false
license: mit
short_description: Interactive analyzer for modular models in Transformers lib

🔍 Transformers Modular Refactor Analyzer

This interactive tool helps analyze modular refactoring opportunities in the Hugging Face Transformers library. It visualizes model relationships, similarity patterns, and the impact of modularization on code maintainability.

📊 Features Overview

🕒 Tab 1: Chronological Timeline

Interactive timeline showing the evolution of transformer models with modular dependencies positioned by their creation dates.

Key Features:

Models positioned chronologically by git history
Modular dependency connections between models
Similarity scores between candidate models (red dashed edges)
Timeline axis with year/month markers
Modular Logic Milestone: A marker at May 31, 2024, the date modular logic was introduced
Search functionality to highlight specific models and their connections
Zoom and pan to explore the full timeline
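
As a rough illustration of how a model could be placed on the timeline from git history, the sketch below asks git for the commits that added a file and keeps the earliest date. The function names and the choice of the author date are assumptions made for this example, not necessarily what the app does.

```python
import subprocess
from datetime import datetime
from typing import Optional


def parse_creation_date(git_output: str) -> Optional[datetime]:
    """Return the earliest ISO 8601 date found in `git log` output, or None."""
    dates = []
    for line in git_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Some git versions print "Z" for UTC; older Pythons only accept "+00:00".
        if line.endswith("Z"):
            line = line[:-1] + "+00:00"
        dates.append(datetime.fromisoformat(line))
    return min(dates) if dates else None


def creation_date(repo_dir: str, path: str) -> Optional[datetime]:
    """Hypothetical helper: date of the commit that first added `path`."""
    result = subprocess.run(
        ["git", "log", "--diff-filter=A", "--follow", "--format=%aI", "--", path],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_creation_date(result.stdout)


# Made-up output for illustration: two "added" commits, e.g. after a rename.
sample_output = "2021-06-02T10:00:00+00:00\n2021-05-30T09:30:00+02:00\n"
first_seen = parse_creation_date(sample_output)
```

Taking the minimum rather than the last line keeps the result independent of the order in which git lists the commits.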

Visual Legend:

🟡 Base models: Foundation models that others depend on
🔵 Modular models: Models with existing modular_*.py implementations
🔴 Candidate models: Models without modular implementations (refactoring opportunities)
Blue edges: Import dependencies between modular implementations
Red dashed edges: High similarity scores indicating refactoring potential
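
The three node categories in the legend can be sketched as a small classification step. The precedence used here (a model with a modular file counts as modular even if others depend on it) is an assumption for illustration; the function and model names are hypothetical.

```python
from typing import Dict, Iterable, Set


def classify_models(
    all_models: Iterable[str],
    modular_deps: Dict[str, Set[str]],
) -> Dict[str, str]:
    """Label each model as "modular", "base", or "candidate".

    `modular_deps` maps every model that has a modular file to the set of
    models it imports from.
    """
    # Every model that at least one modular file imports from.
    depended_on = set().union(*modular_deps.values())
    kinds = {}
    for name in all_models:
        if name in modular_deps:
            kinds[name] = "modular"      # has a modular_*.py implementation
        elif name in depended_on:
            kinds[name] = "base"         # others build on it
        else:
            kinds[name] = "candidate"    # no modular file, nobody depends on it
    return kinds


# Made-up model names for illustration.
kinds = classify_models(
    ["alpha", "beta", "gamma", "delta"],
    {"beta": {"alpha"}, "gamma": {"beta"}},
)
```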

📈 Tab 2: LOC Growth

Chart visualizing how modular refactoring impacts Lines of Code (LOC) over time in the transformers repository.

Metrics Tracked:

Effective LOC: Total code that has to be maintained (modeling LOC for models without a modular version, plus modular LOC)
Modular LOC: Lines of code in modular_*.py files
Modeling LOC (all): Total lines in all modeling_*.py files
Modeling LOC (included): Lines in modeling_*.py files for models without modular versions
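
The relationship between the four metrics can be written out as a short computation. This is a minimal sketch under the definitions above; the `ModelLoc` record, the function name, and the line counts are made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ModelLoc:
    name: str
    modeling_loc: int             # lines in modeling_<name>.py
    modular_loc: Optional[int]    # lines in modular_<name>.py, None if absent


def loc_metrics(models: List[ModelLoc]) -> Dict[str, int]:
    """Compute the four LOC metrics for one snapshot of the repository."""
    modular = sum(m.modular_loc for m in models if m.modular_loc is not None)
    modeling_all = sum(m.modeling_loc for m in models)
    # Only models without a modular version count toward "included".
    modeling_included = sum(m.modeling_loc for m in models if m.modular_loc is None)
    return {
        "modular_loc": modular,
        "modeling_loc_all": modeling_all,
        "modeling_loc_included": modeling_included,
        "effective_loc": modeling_included + modular,
    }


# Hypothetical snapshot: one non-modular model and two modular ones.
snapshot = [
    ModelLoc("alpha", 1200, None),
    ModelLoc("beta", 1100, 150),
    ModelLoc("gamma", 900, 80),
]
metrics = loc_metrics(snapshot)
```

In this made-up snapshot the effective LOC is 1430 against 3200 lines across all modeling files, because the modeling files of modular models are counted through their shorter modular sources.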

Key Insights:

Shows the trajectory toward reduced code duplication
Demonstrates how modular refactoring can reduce the total amount of code that has to be maintained
May 31, 2024 annotation marks the introduction of modular logic
Interactive chart with time-series data from git history

🌐 Tab 3: Dependency Graph

Network visualization of model relationships and similarity patterns, laid out without chronological constraints.

Features:

Force-directed graph layout optimized for relationship visibility
Toggle to show/hide candidate models and similarity edges
Node sizes reflect connection degree (more connected = larger)
Interactive drag-and-drop for graph exploration
Zoom and pan capabilities
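
Sizing nodes by connection degree can be sketched as follows. The square-root scaling and the `base` and `scale` constants are assumptions chosen for the example; the app may map degree to size differently.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def node_degrees(edges: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many edges touch each node in an undirected edge list."""
    degrees = Counter()
    for source, target in edges:
        degrees[source] += 1
        degrees[target] += 1
    return degrees


def node_radius(degree: int, base: float = 6.0, scale: float = 3.0) -> float:
    """Map a degree to a radius; sqrt keeps hub nodes from dominating the view."""
    return base + scale * math.sqrt(degree)


# Made-up edges: "alpha" is connected to three other models.
edges = [("alpha", "beta"), ("alpha", "gamma"), ("alpha", "delta")]
degrees = node_degrees(edges)
radii = {name: node_radius(d) for name, d in degrees.items()}
```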

Analysis Capabilities:

Identify clusters of highly similar models (refactoring targets)
Understand modular dependency patterns
Spot potential consolidation opportunities
Explore the current modular architecture

🛠️ Technical Details

Similarity Methods

Jaccard Similarity: Token-based similarity using identifier overlap in source code
Embedding Similarity: CodeBERT-based semantic similarity (when available)
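
A minimal sketch of Jaccard similarity over identifiers, assuming identifiers are the non-keyword name tokens produced by Python's standard `tokenize` module. The app's actual tokenization may differ, and the helper names are hypothetical.

```python
import io
import keyword
import tokenize
from typing import Set


def identifiers(source: str) -> Set[str]:
    """Collect the distinct non-keyword names used in a piece of Python source."""
    names = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            names.add(tok.string)
    return names


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Two toy snippets standing in for modeling files.
src_a = "def forward(x, mask):\n    return x * mask\n"
src_b = "def forward(x, bias):\n    return x + bias\n"
score = jaccard(identifiers(src_a), identifiers(src_b))
```

Here the snippets share `forward` and `x` out of four distinct identifiers, so the score is 0.5.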

Data Sources

Git History: Model creation dates from transformers repository commits
Source Analysis: AST parsing of modeling_*.py and modular_*.py files
Dependency Tracking: Import analysis to build modular dependency graphs
Cached Embeddings: Pre-computed similarity matrices for performance
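
The dependency tracking step can be illustrated with Python's `ast` module. This sketch assumes a modular file pulls in other models through imports such as `from ..bar.modeling_bar import BarAttention`; the function name and the `foo`, `bar`, `baz` model names are hypothetical.

```python
import ast
from typing import Set


def modular_dependencies(source: str, own_name: str) -> Set[str]:
    """Return the model names a modular file imports modeling or modular code from."""
    deps = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ImportFrom) or not node.module:
            continue
        parts = node.module.split(".")
        # Expect "<model>.modeling_<model>" or "<model>.modular_<model>" at the end.
        if len(parts) >= 2 and parts[-1].startswith(("modeling_", "modular_")):
            if parts[-2] != own_name:
                deps.add(parts[-2])
    return deps


# Toy modular file for a hypothetical model "foo".
sample = (
    "import torch\n"
    "from ..bar.modeling_bar import BarAttention, BarMLP\n"
    "from ..baz.modeling_baz import BazModel\n"
    "from .configuration_foo import FooConfig\n"
)
deps = modular_dependencies(sample, own_name="foo")
```

Because `ast.parse` only reads the syntax, the imported modules do not need to be installed for the analysis to run.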

Filtering Options

Similarity Threshold: Adjustable cutoff for showing similarity edges (0.5 to 0.95)
Multimodal Filter: Focus on models with multimodal capabilities (models whose source mentions "pixel_values")
Show/Hide Candidates: Toggle visibility of non-modular models and their similarities
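
The threshold and multimodal filters can be sketched as a single pass over pairwise scores. Whether the cutoff is inclusive, and whether both endpoints of an edge must be multimodal, are assumptions made for this example; the function names and scores are made up.

```python
from typing import Dict, List, Optional, Tuple


def is_multimodal(source: str) -> bool:
    """Heuristic from the filter description: the source mentions "pixel_values"."""
    return "pixel_values" in source


def visible_similarity_edges(
    scores: Dict[Tuple[str, str], float],
    threshold: float,
    sources: Optional[Dict[str, str]] = None,
    multimodal_only: bool = False,
) -> List[Tuple[str, str, float]]:
    """Keep edges at or above the threshold, optionally only between multimodal models."""
    edges = []
    for (a, b), score in scores.items():
        if score < threshold:
            continue
        if multimodal_only and sources is not None:
            if not (is_multimodal(sources[a]) and is_multimodal(sources[b])):
                continue
        edges.append((a, b, score))
    # Strongest similarities first.
    return sorted(edges, key=lambda edge: edge[2], reverse=True)


# Made-up scores and source snippets for three hypothetical models.
scores = {("alpha", "beta"): 0.9, ("alpha", "gamma"): 0.6, ("beta", "gamma"): 0.45}
sources = {
    "alpha": "def forward(self, pixel_values): ...",
    "beta": "def forward(self, input_ids, pixel_values=None): ...",
    "gamma": "def forward(self, input_ids): ...",
}
all_edges = visible_similarity_edges(scores, threshold=0.5)
mm_edges = visible_similarity_edges(scores, 0.5, sources, multimodal_only=True)
```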

🎯 Use Cases

Refactoring Planning: Identify which models would benefit most from modularization
Architecture Analysis: Understand current modular dependencies and patterns
Code Reduction: Quantify the impact of modular refactoring on maintainability
Timeline Analysis: See how the transformers library evolved toward a modular architecture

📚 How to Use

Chronological Timeline: Use the search box to find specific models, zoom to explore different time periods, and click nodes to highlight their connections
LOC Growth: Hover over data points to see exact metrics and observe the trend toward code reduction
Dependency Graph: Drag nodes to reorganize the layout, toggle candidates on or off, and zoom in for detailed exploration

🔬 Research Context

This tool supports analysis of modular refactoring in large-scale ML libraries, helping identify code duplication patterns and measure the effectiveness of architectural improvements in reducing maintenance burden.

Built with Gradio, D3.js, and ApexCharts for interactive data visualization.