import streamlit as st

def run():
    st.title("Data Understanding")
    
    st.write("## Overview")
    st.write("""
    Data Understanding is the second phase of the CRISP-DM process. It involves collecting initial data, describing the data, exploring the data, and verifying data quality.
    """)

    st.write("## Key Concepts & Explanations")
    st.markdown("""
    - **Data Collection**: Gathering data from various sources.
    - **Data Description**: Summarizing the main characteristics of the data.
    - **Data Exploration**: Using statistical and visualization techniques to understand the data.
    - **Data Quality Verification**: Ensuring the data is accurate, complete, and reliable.
    """)

    st.write("## Introduction")
    st.write("""
    The Data Understanding phase is crucial for identifying potential issues with the data and gaining insights that will inform the subsequent phases of the CRISP-DM process.
    """)

    st.header("Objectives")
    st.write("""
    - **Collect Initial Data**: Gather data from various sources to get a comprehensive dataset.
    - **Describe the Data**: Summarize the main characteristics of the data, including its structure and content.
    - **Explore the Data**: Use statistical and visualization techniques to identify patterns, trends, and anomalies.
    - **Verify Data Quality**: Assess the quality of the data to ensure it is suitable for analysis.
    """)

    st.header("Key Activities")
    st.write("""
    - **Data Collection**: Gather data from internal and external sources.
    - **Data Description**: Generate summary statistics and visualizations to describe the data.
    - **Data Exploration**: Perform exploratory data analysis (EDA) to uncover patterns and relationships.
    - **Data Quality Verification**: Check for missing values, outliers, and inconsistencies in the data.
    """)

    st.write("## Detailed Steps")
    st.write("""
    1. **Collect Initial Data**:
        - Identify relevant data sources.
        - Extract data from various sources and consolidate it into a single dataset.
    2. **Describe the Data**:
        - Generate summary statistics (e.g., mean, median, standard deviation).
        - Create visualizations (e.g., histograms, box plots) to describe the data distribution.
    3. **Explore the Data**:
        - Perform exploratory data analysis (EDA) to identify patterns, trends, and anomalies.
        - Use visualization tools (e.g., scatter plots, heatmaps) to explore relationships between variables.
    4. **Verify Data Quality**:
        - Check for missing values and handle them appropriately.
        - Identify and address outliers and inconsistencies in the data.
        - Assess the overall quality of the data to ensure it is suitable for analysis.
    """)

    st.write("## Quiz: Conceptual Questions")
    q1 = st.radio("What is the main purpose of the Data Understanding phase?", ["Collect data", "Describe data", "Explore data", "All of the above"])
    if q1 == "All of the above":
        st.success("✅ Correct!")
    else:
        st.error("❌ Incorrect. The main purpose is to collect, describe, and explore data.")

    st.write("## Learning Resources")
    st.markdown("""
    - 📘 [CRISP-DM Guide](https://www.sv-europe.com/crisp-dm-methodology/)
    - 🎓 [Data Understanding in Data Science](https://towardsdatascience.com/data-understanding-in-data-science-1a1d5e8b1c3d)
    - 🔬 [Exploratory Data Analysis (EDA)](https://www.analyticsvidhya.com/blog/2021/06/exploratory-data-analysis-eda-a-step-by-step-guide/)
    """)