title: Transformers Modular Refactor
emoji: π»
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 5.39.0
app_file: app.py
pinned: false
license: mit
short_description: Interactive analyzer for modular models in Transformers lib
Transformers Modular Refactor Analyzer
This interactive tool analyzes modular refactoring opportunities in the Hugging Face Transformers library by visualizing model relationships, similarity patterns, and the impact of modularization on code maintainability.
Features Overview
Tab 1: Chronological Timeline
An interactive timeline of Transformers models, each positioned by its creation date, with the modular dependencies between them drawn as edges.
Key Features:
- Models positioned chronologically by git history
- Modular dependency connections between models
- Similarity scores between candidate models (red dashed edges)
- Timeline axis with year/month markers
- Modular Logic Milestone: a marker at May 31, 2024, the date modular logic was introduced
- Search functionality to highlight specific models and their connections
- Zoom and pan to explore the full timeline
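Placing models chronologically amounts to a linear time scale from dates to horizontal pixels, the kind of mapping `d3.scaleTime` provides in the browser. The Python sketch below only illustrates that mapping; `time_to_x` and its parameters are hypothetical names, not the app's API. Creation dates themselves could be obtained, for example, with `git log --diff-filter=A --format=%aI -- <path>`, which lists the commit(s) that added a file.

```python
from datetime import date

# Milestone date taken from the description above.
MODULAR_MILESTONE = date(2024, 5, 31)


def time_to_x(d: date, start: date, end: date, width: float) -> float:
    """Map a date linearly onto [0, width] given the axis start and end dates.

    Hypothetical helper for illustration; the real timeline may scale differently.
    """
    total_days = (end - start).days or 1  # guard against a zero-length range
    return (d - start).days / total_days * width


# Toy axis: where would the milestone fall on a 1000 px axis spanning 2018-2025?
axis_start, axis_end = date(2018, 1, 1), date(2025, 1, 1)
print(round(time_to_x(MODULAR_MILESTONE, axis_start, axis_end, 1000.0), 1))
```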
Visual Legend:
- 🟡 Base models: Foundation models that others depend on
- 🔵 Modular models: Models with existing `modular_*.py` implementations
- 🔴 Candidate models: Models without modular implementations (refactoring opportunities)
- Blue edges: Import dependencies between modular implementations
- Red dashed edges: High similarity scores indicating refactoring potential
Tab 2: LOC Growth
Chart visualizing how modular refactoring impacts Lines of Code (LOC) over time in the transformers repository.
Metrics Tracked:
- Effective LOC: Total maintainable code (modeling LOC of models without a modular version, plus modular LOC)
- Modular LOC: Lines of code in `modular_*.py` files
- Modeling LOC (all): Total lines in all `modeling_*.py` files
- Modeling LOC (included): Lines in `modeling_*.py` files for models without modular versions
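The four series are related by a simple identity: Effective LOC = Modeling LOC (included) + Modular LOC. The sketch below computes them for one repository snapshot, assuming a hypothetical per-model record (`ModelLoc` and `loc_metrics` are illustrative names, not the app's code). Modeling files of modular models are left out of the maintained total because in Transformers those `modeling_*.py` files are generated from the corresponding `modular_*.py` files.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelLoc:
    """Line counts for one model (hypothetical record, for illustration only)."""
    name: str
    modeling_loc: int                  # lines in modeling_<name>.py
    modular_loc: Optional[int] = None  # lines in modular_<name>.py, None if absent


def loc_metrics(models: list[ModelLoc]) -> dict[str, int]:
    """Compute the four LOC series for one snapshot of the repository."""
    modular = sum(m.modular_loc for m in models if m.modular_loc is not None)
    modeling_all = sum(m.modeling_loc for m in models)
    # Generated modeling files of modular models are excluded from the maintained total.
    modeling_included = sum(m.modeling_loc for m in models if m.modular_loc is None)
    return {
        "effective": modeling_included + modular,
        "modular": modular,
        "modeling_all": modeling_all,
        "modeling_included": modeling_included,
    }


# Toy snapshot with made-up line counts: one non-modular model, one modular model.
snapshot = [ModelLoc("model_a", 1200), ModelLoc("model_b", 1100, modular_loc=300)]
print(loc_metrics(snapshot))
# {'effective': 1500, 'modular': 300, 'modeling_all': 2300, 'modeling_included': 1200}
```

The gap between "Modeling LOC (all)" and "Effective LOC" is the amount of code that no longer has to be edited by hand.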
Key Insights:
- Shows the trajectory toward reduced code duplication
- Demonstrates how modular refactoring can reduce total maintainable code
- May 31, 2024 annotation marks the introduction of modular logic
- Interactive chart with time-series data from git history
Tab 3: Dependency Graph
Static network visualization focusing on model relationships and similarity patterns without chronological constraints.
Features:
- Force-directed graph layout optimized for relationship visibility
- Toggle to show/hide candidate models and similarity edges
- Node sizes reflect connection degree (more connected = larger)
- Interactive drag-and-drop for graph exploration
- Zoom and pan capabilities
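Sizing nodes by connection degree needs only the edge list. This sketch counts each node's degree and maps it linearly onto a radius range; the function name, the radius bounds, and the linear mapping are assumptions for illustration (the app may use another scale, such as square-root). Nodes without any edges simply do not appear in the result.

```python
from collections import Counter


def node_radii(edges, min_r=4.0, max_r=20.0):
    """Return {node: radius}, scaling degree linearly into [min_r, max_r].

    Hypothetical helper; the linear scale is an assumption, not the app's rule.
    """
    degree = Counter()
    for a, b in edges:  # undirected: each edge counts once for both endpoints
        degree[a] += 1
        degree[b] += 1
    if not degree:
        return {}
    lo, hi = min(degree.values()), max(degree.values())
    span = (hi - lo) or 1  # if all degrees are equal, every node gets min_r
    return {n: min_r + (d - lo) / span * (max_r - min_r) for n, d in degree.items()}


# Toy graph: "a" has three neighbours, the others one each.
print(node_radii([("a", "b"), ("a", "c"), ("a", "d")]))
# {'a': 20.0, 'b': 4.0, 'c': 4.0, 'd': 4.0}
```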
Analysis Capabilities:
- Identify clusters of highly similar models (refactoring targets)
- Understand modular dependency patterns
- Spot potential consolidation opportunities
- Explore the current modular architecture
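Clusters of highly similar models can be read off the graph as connected components once weak similarity edges are dropped. A minimal union-find sketch, assuming pairwise scores in a dict keyed by model pairs; `similarity_clusters`, that input shape, and the 0.8 default are hypothetical choices, not the tool's implementation.

```python
def similarity_clusters(scores, threshold=0.8):
    """Group models into connected components over edges with score >= threshold.

    scores: {(model_a, model_b): similarity} (assumed input shape).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), s in scores.items():
        if s >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb  # merge the two components

    clusters = {}
    for x in parent:
        clusters.setdefault(find(x), set()).add(x)
    return [c for c in clusters.values() if len(c) > 1]


pairs = {("a", "b"): 0.9, ("b", "c"): 0.85, ("c", "d"): 0.5, ("e", "f"): 0.95}
print(sorted(sorted(c) for c in similarity_clusters(pairs)))  # [['a', 'b', 'c'], ['e', 'f']]
```

Raising the threshold splits clusters apart, which mirrors what moving the similarity slider does to the red dashed edges.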
Technical Details
Similarity Methods
- Jaccard Similarity: Token-based similarity using identifier overlap in source code
- Embedding Similarity: CodeBERT-based semantic similarity (when available)
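Both measures reduce to short formulas. For identifier sets A and B, Jaccard similarity is |A ∩ B| / |A ∪ B|; embedding similarity is commonly computed as the cosine of the angle between two vectors, and that choice is assumed here. In the sketch below, identifiers are extracted with Python's `ast` module (an assumption about the tokenization; the tool may tokenize differently), the cosine function takes plain float lists standing in for CodeBERT embeddings, and all function names are illustrative.

```python
import ast
import math


def identifiers(source: str) -> set[str]:
    """Collect identifier tokens (names, attributes, defs, arguments) from Python source.

    Assumed tokenization for illustration; not necessarily the tool's.
    """
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            names.add(node.id)
        elif isinstance(node, ast.Attribute):
            names.add(node.attr)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
    return names


def jaccard(a: set[str], b: set[str]) -> float:
    """|A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


a = identifiers("class FooAttention:\n    def forward(self, x):\n        return self.proj(x)\n")
b = identifiers("class BarAttention:\n    def forward(self, x):\n        return self.out(x)\n")
print(jaccard(a, b))  # 3 shared identifiers out of 7 distinct ones, i.e. 3/7
```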
Data Sources
- Git History: Model creation dates from transformers repository commits
- Source Analysis: AST parsing of `modeling_*.py` and `modular_*.py` files
- Dependency Tracking: Import analysis to build modular dependency graphs
- Cached Embeddings: Pre-computed similarity matrices for performance
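Import analysis for the dependency graph can reuse the same `ast` module. This sketch assumes modular files pull in parent components either relatively (`from ..llama.modeling_llama import ...`) or absolutely (`from transformers.models.llama.modeling_llama import ...`), and it only counts imports from `modeling_*` modules; `modular_dependencies` is a hypothetical helper, not the tool's code.

```python
import ast


def modular_dependencies(source: str) -> set[str]:
    """Return names of models whose modeling code a modular file imports.

    Assumes the two import forms shown below; other forms are ignored.
    """
    deps = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ImportFrom) or node.module is None:
            continue
        parts = node.module.split(".")
        # Relative form: from ..llama.modeling_llama import LlamaAttention
        if node.level == 2 and len(parts) == 2 and parts[1].startswith("modeling_"):
            deps.add(parts[0])
        # Absolute form: from transformers.models.llama.modeling_llama import ...
        elif (node.level == 0 and len(parts) == 4
              and parts[:2] == ["transformers", "models"]
              and parts[3].startswith("modeling_")):
            deps.add(parts[2])
    return deps


example = """
from ..llama.modeling_llama import LlamaAttention
from transformers.models.mistral.modeling_mistral import MistralMLP
from ...utils import logging
import torch
"""
print(sorted(modular_dependencies(example)))  # ['llama', 'mistral']
```

Each returned name becomes a blue edge from the modular model to its parent; `ast.parse` accepts relative imports without resolving them, so files can be analyzed outside the package.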
Filtering Options
- Similarity Threshold: Adjustable cutoff for showing similarity edges (0.5-0.95)
- Multimodal Filter: Focus on models with multimodal capabilities (detected as models whose code mentions `pixel_values`)
- Show/Hide Candidates: Toggle visibility of non-modular models and their similarities
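The three filters compose naturally on a dict of pairwise scores. In this sketch the threshold is clamped to the documented 0.5-0.95 range, the multimodal check is a plain substring test for `pixel_values`, and hiding candidates removes all similarity edges. The function names, the input shape, and the rule that both endpoints must be multimodal are assumptions for illustration.

```python
def is_multimodal(source: str) -> bool:
    """Substring heuristic: code that mentions pixel_values is treated as multimodal."""
    return "pixel_values" in source


def visible_similarity_edges(scores, threshold, show_candidates=True, multimodal=None):
    """Return (a, b, score) tuples to draw, strongest first.

    scores: {(model_a, model_b): similarity} (assumed input shape)
    multimodal: optional set of model names; if given, both endpoints must be in it
    (an assumed rule, not confirmed by the tool).
    """
    if not show_candidates:
        return []  # similarity edges belong to candidates, so they hide together
    threshold = min(max(threshold, 0.5), 0.95)  # clamp to the slider's range
    edges = [
        (a, b, s)
        for (a, b), s in scores.items()
        if s >= threshold and (multimodal is None or {a, b} <= multimodal)
    ]
    return sorted(edges, key=lambda e: e[2], reverse=True)


scores = {("a", "b"): 0.9, ("a", "c"): 0.6, ("b", "c"): 0.4}
print(visible_similarity_edges(scores, 0.2))  # threshold is clamped up to 0.5
print(visible_similarity_edges(scores, 0.55, multimodal={"a", "c"}))
```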
Use Cases
- Refactoring Planning: Identify which models would benefit most from modularization
- Architecture Analysis: Understand current modular dependencies and patterns
- Code Reduction: Quantify the impact of modular refactoring on maintainability
- Timeline Analysis: See how the transformers library evolved toward modular architecture
How to Use
- Chronological Timeline: Use the search box to find specific models, zoom to explore different time periods, click nodes to highlight connections
- LOC Growth: Hover over data points to see exact metrics, observe the trend toward code reduction
- Dependency Graph: Drag nodes to reorganize the layout, toggle candidates on/off, use zoom for detailed exploration
Research Context
This tool supports analysis of modular refactoring in large-scale ML libraries, helping identify code duplication patterns and measure the effectiveness of architectural improvements in reducing maintenance burden.
Built with Gradio, D3.js, and ApexCharts for interactive data visualization