singarajusaiteja committed on Aug 12

Commit a31d3a4 (verified) · 1 Parent(s): 84e0286

update

README.md +221 -0

@@ -12,3 +12,224 @@ short_description: AI-powered platform for preserving Indian cultural heritage
 ---
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# 🇮🇳 Corpus Collection Engine
+**AI-powered platform for preserving Indian cultural heritage through interactive data collection**
+## Team Information
+- **Team Name**: Heritage Collectors
+- **Team Members**:
+  - Member 1: Singaraju Saiteja (Role: Streamlit app development)
+  - Member 2: Muthyapu Sudeepthi (Role: AI Integration)
+  - Member 3: Rithika Sadhu (Role: Documentation)
+  - Member 4: Golla Bharath Kumar (Role: Development strategy)
+  - Member 5: K. Vamshi Kumar (Role: App design and user experience)
+## 📋 Setup & Installation
+### Prerequisites
+- Python 3.8 or higher
+- pip package manager
+- Git (for cloning the repository)
+### Quick Start
+1. **Clone the Repository**
+   ```bash
+   git clone [repository-url]
+   cd corpus-collection-engine
+   ```
+2. **Create Virtual Environment**
+   ```bash
+   python -m venv venv
+   # On Windows
+   venv\Scripts\activate
+   # On macOS/Linux
+   source venv/bin/activate
+   ```
+3. **Install Dependencies**
+   ```bash
+   pip install -r requirements.txt
+   ```
+4. **Run the Application**
+   ```bash
+   streamlit run corpus_collection_engine/main.py
+   ```
+5. **Access the App**
+   Open your browser and navigate to `http://localhost:8501`
+### Alternative Installation Methods
+#### Using Docker
+```bash
+docker build -t corpus-collection-engine .
+docker run -p 8501:8501 corpus-collection-engine
+```
+#### Using the Smart Installer
+```bash
+python install_dependencies.py
+python start_app.py
+```
+## 🌟 What is this?
+The Corpus Collection Engine is a Streamlit application for collecting and preserving data about Indian languages, history, and culture. Through its interactive activities, users contribute material for building culturally-aware AI systems while helping preserve India's heritage.
+## 🎯 Features
+### 🎭 Interactive Cultural Activities
+- **Meme Creator**: Generate culturally relevant memes in Indian languages
+- **Recipe Collector**: Share traditional recipes with cultural context
+- **Folklore Archive**: Preserve stories, legends, and oral traditions
+- **Landmark Identifier**: Document historical and cultural landmarks
+### 🌍 Multi-language Support
+- Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia, Assamese
+- Native script support and cultural context preservation
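One way to check that a contribution is written in the native script is to compare each letter against the Unicode block of the expected script. The sketch below is illustrative only: the `LANGUAGES` table and the `script_ratio` helper are hypothetical names, not code from this repository. The ISO 639-1 codes and Unicode block ranges are the standard ones.

```python
# Hypothetical lookup table: language -> (ISO 639-1 code, Unicode block range).
LANGUAGES = {
    "Hindi": ("hi", (0x0900, 0x097F)),      # Devanagari
    "Bengali": ("bn", (0x0980, 0x09FF)),    # Bengali
    "Tamil": ("ta", (0x0B80, 0x0BFF)),      # Tamil
    "Telugu": ("te", (0x0C00, 0x0C7F)),     # Telugu
    "Marathi": ("mr", (0x0900, 0x097F)),    # Devanagari
    "Gujarati": ("gu", (0x0A80, 0x0AFF)),   # Gujarati
    "Kannada": ("kn", (0x0C80, 0x0CFF)),    # Kannada
    "Malayalam": ("ml", (0x0D00, 0x0D7F)),  # Malayalam
    "Punjabi": ("pa", (0x0A00, 0x0A7F)),    # Gurmukhi
    "Odia": ("or", (0x0B00, 0x0B7F)),       # Oriya block
    "Assamese": ("as", (0x0980, 0x09FF)),   # shares the Bengali block
}


def script_ratio(text: str, language: str) -> float:
    """Return the fraction of alphabetic characters in the language's script block."""
    _, (low, high) = LANGUAGES[language]
    # str.isalpha() skips digits, spaces, punctuation, and combining vowel signs.
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    native = sum(1 for ch in letters if low <= ord(ch) <= high)
    return native / len(letters)
```

A ratio near 1.0 suggests native-script input, while a low ratio suggests transliterated or mixed text. Hindi and Marathi share Devanagari, so this check cannot tell those two languages apart.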
+### 📊 Real-time Analytics
+- Contribution tracking and cultural impact metrics
+- Language diversity and regional distribution analysis
+- User engagement and platform growth insights
+### 🔒 Privacy-First Design
+- No authentication required - start contributing immediately
+- Minimal data collection with full transparency
+- User-controlled privacy settings
+## 🚀 How to Use
+1. **Choose an Activity**: Select from meme creation, recipe sharing, folklore collection, or landmark documentation
+2. **Select Your Language**: Pick from 11 supported Indian languages
+3. **Contribute Content**: Share your cultural knowledge and creativity
+4. **Add Context**: Provide cultural significance and regional information
+5. **Submit**: Your contribution helps build culturally-aware AI!
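The steps above can be pictured as filling in one record per contribution. This is a minimal sketch under stated assumptions: the `Contribution` class, its field names, and the activity and language identifiers are hypothetical and are not taken from the repository's `models/` package.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

# Assumed identifiers for the four activities and eleven languages listed above.
ACTIVITIES = {"meme", "recipe", "folklore", "landmark"}
LANGUAGE_CODES = {"hi", "bn", "ta", "te", "mr", "gu", "kn", "ml", "pa", "or", "as"}


@dataclass
class Contribution:
    activity: str                 # step 1: chosen activity
    language: str                 # step 2: ISO 639-1 language code
    content: str                  # step 3: the contributed text
    cultural_context: str = ""    # step 4: optional significance notes
    region: str = ""              # step 4: optional regional information
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the record is acceptable."""
        problems = []
        if self.activity not in ACTIVITIES:
            problems.append(f"unknown activity: {self.activity!r}")
        if self.language not in LANGUAGE_CODES:
            problems.append(f"unsupported language: {self.language!r}")
        if not self.content.strip():
            problems.append("content is empty")
        return problems
```

Returning a list of problems instead of raising on the first one lets a form show every issue to the contributor at once before step 5.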
+## 🎨 Activities Overview
+### 🎭 Meme Creator
+Create humorous content that reflects Indian culture, festivals, traditions, and daily life. Perfect for capturing contemporary cultural expressions.
+### 🍛 Recipe Collector
+Share traditional family recipes, regional specialties, and festival foods. Include cultural significance, occasions, and regional variations.
+### 📚 Folklore Archive
+Preserve oral traditions, folk tales, legends, and cultural stories. Help maintain the rich narrative heritage of India.
+### 🏛️ Landmark Identifier
+Document historical sites, cultural landmarks, and places of significance. Share the stories and cultural importance of each location.
+## 🛠️ Technical Architecture
+### Built With
+- **Frontend**: Streamlit with custom components
+- **Backend**: Python with modular service architecture
+- **AI Integration**: Fallback text generation for public deployment
+- **Storage**: SQLite for local development, extensible for production
+- **Analytics**: Real-time metrics and reporting
+- **PWA**: Progressive Web App features for offline access
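For the SQLite storage layer mentioned above, a minimal sketch using Python's standard `sqlite3` module might look like the following. The table name, column names, and helper functions are assumptions made for illustration; the repository's actual schema is not shown in this README.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS contributions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            activity TEXT NOT NULL,
            language TEXT NOT NULL,
            content TEXT NOT NULL,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    return conn


def add_contribution(conn, activity, language, content) -> int:
    """Insert one row using bound parameters and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO contributions (activity, language, content) VALUES (?, ?, ?)",
            (activity, language, content),
        )
    return cur.lastrowid


def count_by_language(conn) -> dict:
    """Return {language: number of contributions}."""
    rows = conn.execute(
        "SELECT language, COUNT(*) FROM contributions GROUP BY language"
    )
    return dict(rows.fetchall())
```

The default `":memory:"` path keeps the example self-contained; passing a file path instead would persist data between runs during local development.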
+### Project Structure
+```
+corpus_collection_engine/
+├── main.py                 # Application entry point
+├── config.py              # Configuration settings
+├── activities/            # Activity implementations
+│   ├── meme_creator.py
+│   ├── recipe_collector.py
+│   ├── folklore_collector.py
+│   └── landmark_identifier.py
+├── services/              # Core services
+│   ├── ai_service.py
+│   ├── analytics_service.py
+│   ├── engagement_service.py
+│   └── privacy_service.py
+├── models/                # Data models
+├── utils/                 # Utility functions
+└── pwa/                   # Progressive Web App files
+```
+## 🧪 Testing
+Run the test suite:
+```bash
+python -m pytest tests/
+```
+Run the app startup test script on its own:
+```bash
+python test_app_startup.py
+```
+## 🚀 Deployment
+### Hugging Face Spaces
+1. Upload files to your Hugging Face Space
+2. Use `app.py` as the entry point
+3. Ensure `requirements.txt` and `.streamlit/config.toml` are included
+### Local Production
+```bash
+streamlit run corpus_collection_engine/main.py --server.port 8501
+```
+## 🤝 Contributing
+We welcome contributions! Please see CONTRIBUTING.md for guidelines.
+## 📝 License
+This project is licensed under the MIT License - see the LICENSE file for details.
+## 🌟 Why Contribute?
+- **Preserve Culture**: Help maintain India's diverse cultural heritage for future generations
+- **Build Better AI**: Contribute to creating more culturally-aware and inclusive AI systems
+- **Share Knowledge**: Connect with others who value cultural preservation
+- **Make Impact**: See real-time analytics of your cultural preservation impact
+## 📈 Platform Impact
+Track the collective impact of cultural preservation efforts:
+- Total contributions across all languages
+- Geographic distribution of cultural content
+- Language diversity metrics
+- Cultural significance scoring
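The README does not say how the language diversity metric is computed. One common choice, shown here purely as an assumption, is normalized Shannon entropy over the language of each contribution. The function name `language_diversity` is hypothetical.

```python
import math
from collections import Counter


def language_diversity(languages):
    """Return normalized Shannon entropy in [0, 1] for a list of language codes.

    0 means every contribution is in one language (or the list is empty);
    1 means contributions are spread evenly across the languages that appear.
    """
    counts = Counter(languages)
    total = sum(counts.values())
    if len(counts) < 2:
        return 0.0
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    # Divide by the maximum possible entropy for this many observed languages.
    return entropy / math.log(len(counts))
```

Note that this version normalizes by the number of languages actually observed. Dividing by `math.log(11)` instead would measure coverage of all eleven supported languages.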
+## 🔧 Development
+### Environment Setup
+```bash
+# Install development dependencies
+pip install -r requirements-dev.txt
+# Run linting
+flake8 corpus_collection_engine/
+# Run type checking
+mypy corpus_collection_engine/
+```
+### Configuration
+- Copy `.env.example` to `.env` and configure your settings
+- Modify `corpus_collection_engine/config.py` for application settings
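How `config.py` reads the `.env` file is not described here. As a rough sketch, assuming the file holds plain `KEY=VALUE` lines, it could be loaded with the standard library alone. The `load_env` helper is hypothetical, and dedicated packages such as `python-dotenv` handle more syntax than this does.

```python
import os
from pathlib import Path


def load_env(path=".env"):
    """Parse simple KEY=VALUE lines and add them to os.environ.

    Existing environment variables are not overwritten.
    Returns the parsed pairs as a dict.
    """
    parsed = {}
    env_file = Path(path)
    if not env_file.exists():
        return parsed
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop surrounding quotes
        parsed[key] = value
        os.environ.setdefault(key, value)
    return parsed
```

Using `os.environ.setdefault` means values already set in the shell or by the hosting platform take precedence over the file.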
+## 📞 Support
+- **Issues**: Report bugs and request features via GitHub Issues
+- **Documentation**: Check our comprehensive guides in the docs folder
+- **Community**: Join our discussions via GitHub Discussions
+---
+**Start preserving Indian culture today! 🇮🇳✨**
+*Every contribution matters in building a more culturally-aware digital future.*