import streamlit as st


def run():
    st.title("Data Understanding")

    st.write("## Overview")
    st.write("""
    Data Understanding is the second phase of the CRISP-DM process. It involves collecting initial data, describing the data, exploring the data, and verifying data quality.
    """)

    st.write("## Key Concepts & Explanations")
    st.markdown("""
    - **Data Collection**: Gathering data from various sources.
    - **Data Description**: Summarizing the main characteristics of the data.
    - **Data Exploration**: Using statistical and visualization techniques to understand the data.
    - **Data Quality Verification**: Ensuring the data is accurate, complete, and reliable.
    """)

    st.write("## Introduction")
    st.write("""
    The Data Understanding phase is crucial for identifying potential issues with the data and gaining insights that will inform the subsequent phases of the CRISP-DM process.
    """)

    st.header("Objectives")
    st.write("""
    - **Collect Initial Data**: Gather data from various sources to get a comprehensive dataset.
    - **Describe the Data**: Summarize the main characteristics of the data, including its structure and content.
    - **Explore the Data**: Use statistical and visualization techniques to identify patterns, trends, and anomalies.
    - **Verify Data Quality**: Assess the quality of the data to ensure it is suitable for analysis.
    """)

    st.header("Key Activities")
    st.write("""
    - **Data Collection**: Gather data from internal and external sources.
    - **Data Description**: Generate summary statistics and visualizations to describe the data.
    - **Data Exploration**: Perform exploratory data analysis (EDA) to uncover patterns and relationships.
    - **Data Quality Verification**: Check for missing values, outliers, and inconsistencies in the data.
    """)

    st.write("## Detailed Steps")
    st.write("""
    1. **Collect Initial Data**:
       - Identify relevant data sources.
       - Extract data from various sources and consolidate it into a single dataset.
    2. **Describe the Data**:
       - Generate summary statistics (e.g., mean, median, standard deviation).
       - Create visualizations (e.g., histograms, box plots) to describe the data distribution.
    3. **Explore the Data**:
       - Perform exploratory data analysis (EDA) to identify patterns, trends, and anomalies.
       - Use visualization tools (e.g., scatter plots, heatmaps) to explore relationships between variables.
    4. **Verify Data Quality**:
       - Check for missing values and handle them appropriately.
       - Identify and address outliers and inconsistencies in the data.
       - Assess the overall quality of the data to ensure it is suitable for analysis.
    """)

    st.write("## Quiz: Conceptual Questions")
    q1 = st.radio(
        "What is the main purpose of the Data Understanding phase?",
        ["Collect data", "Describe data", "Explore data", "All of the above"],
    )
    if q1 == "All of the above":
        st.success("✅ Correct!")
    else:
        st.error("❌ Incorrect. The main purpose is to collect, describe, and explore data.")

    st.write("## Learning Resources")
    st.markdown("""
    - 📘 [CRISP-DM Guide](https://www.sv-europe.com/crisp-dm-methodology/)
    - 🎓 [Data Understanding in Data Science](https://towardsdatascience.com/data-understanding-in-data-science-1a1d5e8b1c3d)
    - 🔬 [Exploratory Data Analysis (EDA)](https://www.analyticsvidhya.com/blog/2021/06/exploratory-data-analysis-eda-a-step-by-step-guide/)
    """)
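

# Illustrative sketch, not part of the original page and not called by run().
# The "Describe the Data" and "Verify Data Quality" steps described in the
# page text can be sketched for a single numeric column using only the
# standard library.
# Assumptions: `profile_numeric_column` is a hypothetical helper name, missing
# values are represented as None, and outliers are flagged with the common
# 1.5 * IQR rule, which is one convention among several and is not something
# CRISP-DM itself prescribes.
import statistics  # imported here only to keep the sketch self-contained


def profile_numeric_column(values):
    """Return summary statistics and basic quality checks for one column."""
    # Separate observed values from missing ones (None).
    present = [v for v in values if v is not None]
    if len(present) < 2:
        raise ValueError("need at least two non-missing values")

    # Quartiles for the IQR rule; "inclusive" treats the sample's min and max
    # as the ends of the distribution.
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),  # sample standard deviation
        "outliers": [v for v in present if v < low or v > high],
    }


# Usage with made-up data: one missing value and one value far from the rest.
# profile_numeric_column([10, 12, None, 11, 13, 95])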