Spaces:
Sleeping
Sleeping
\documentclass{article}% | |
\usepackage[T1]{fontenc}% | |
\usepackage[utf8]{inputenc}% | |
\usepackage{lmodern}% | |
\usepackage{textcomp}% | |
\usepackage{lastpage}% | |
\usepackage{graphicx}% | |
% | |
\title{EV Dataset Hypothesis-Driven Report}% | |
\author{AutoStatAgent}% | |
\date{\today}% | |
% | |
\begin{document}% | |
\normalsize% | |
\maketitle% | |
\section{Summary Statistics}% | |
\label{sec:SummaryStatistics}% | |
\begin{tabular}{lllrrlrrrrrrlrlrllrrrll} | |
\toprule | |
& brand & model & top_speed_kmh & battery_capacity_kWh & battery_type & number_of_cells & torque_nm & efficiency_wh_per_km & range_km & acceleration_0_100_s & fast_charging_power_kw_dc & fast_charge_port & towing_capacity_kg & cargo_volume_l & seats & drivetrain & segment & length_mm & width_mm & height_mm & car_body_type & source_url \\ | |
\midrule | |
count & 478 & 477 & 478.000000 & 478.000000 & 478 & 276.000000 & 471.000000 & 478.000000 & 478.000000 & 478.000000 & 477.000000 & 477 & 452.000000 & 477 & 478.000000 & 478 & 478 & 478.000000 & 478.000000 & 478.000000 & 478 & 478 \\ | |
unique & 59 & 477 & NaN & NaN & 1 & NaN & NaN & NaN & NaN & NaN & NaN & 2 & NaN & 140 & NaN & 3 & 15 & NaN & NaN & NaN & 8 & 478 \\ | |
top & Mercedes-Benz & X Privilege AWD (MY25) & NaN & NaN & Lithium-ion & NaN & NaN & NaN & NaN & NaN & NaN & CCS & NaN & 520 & NaN & AWD & JC - Medium & NaN & NaN & NaN & SUV & https://ev-database.org/car/3178/firefly-firefly \\ | |
freq & 42 & 1 & NaN & NaN & 478 & NaN & NaN & NaN & NaN & NaN & NaN & 476 & NaN & 17 & NaN & 191 & 91 & NaN & NaN & NaN & 244 & 1 \\ | |
mean & NaN & NaN & 185.487448 & 74.043724 & NaN & 485.293478 & 498.012739 & 162.903766 & 393.179916 & 6.882636 & 125.008386 & NaN & 1052.261062 & NaN & 5.263598 & NaN & NaN & 4678.506276 & 1887.359833 & 1601.125523 & NaN & NaN \\ | |
std & NaN & NaN & 34.252773 & 20.331058 & NaN & 1210.819733 & 241.461128 & 34.317532 & 103.287335 & 2.730696 & 58.205012 & NaN & 737.851774 & NaN & 1.003961 & NaN & NaN & 369.210573 & 73.656807 & 130.754851 & NaN & NaN \\ | |
min & NaN & NaN & 125.000000 & 21.300000 & NaN & 72.000000 & 113.000000 & 109.000000 & 135.000000 & 2.200000 & 29.000000 & NaN & 0.000000 & NaN & 2.000000 & NaN & NaN & 3620.000000 & 1610.000000 & 1329.000000 & NaN & NaN \\ | |
25% & NaN & NaN & 160.000000 & 60.000000 & NaN & 150.000000 & 305.000000 & 143.000000 & 320.000000 & 4.800000 & 80.000000 & NaN & 500.000000 & NaN & 5.000000 & NaN & NaN & 4440.000000 & 1849.000000 & 1514.000000 & NaN & NaN \\ | |
50% & NaN & NaN & 180.000000 & 76.150000 & NaN & 216.000000 & 430.000000 & 155.000000 & 397.500000 & 6.600000 & 113.000000 & NaN & 1000.000000 & NaN & 5.000000 & NaN & NaN & 4720.000000 & 1890.000000 & 1596.000000 & NaN & NaN \\ | |
75% & NaN & NaN & 201.000000 & 90.600000 & NaN & 324.000000 & 679.000000 & 177.750000 & 470.000000 & 8.200000 & 150.000000 & NaN & 1600.000000 & NaN & 5.000000 & NaN & NaN & 4961.000000 & 1939.000000 & 1665.000000 & NaN & NaN \\ | |
max & NaN & NaN & 325.000000 & 118.000000 & NaN & 7920.000000 & 1350.000000 & 370.000000 & 685.000000 & 19.100000 & 281.000000 & NaN & 2500.000000 & NaN & 9.000000 & NaN & NaN & 5908.000000 & 2080.000000 & 1986.000000 & NaN & NaN \\ | |
\bottomrule | |
\end{tabular} | |
% | |
\section{Generated Hypotheses}% | |
\label{sec:GeneratedHypotheses}% | |
{-} \#\#\# 5 Intelligent and Statistically Meaningful Questions for Data Analysis:\newline% | |
% | |
{-} 1. **Performance Efficiency Analysis**: \newline% | |
% | |
{-} *What is the relationship between energy efficiency (efficiency\_wh\_per\_km) and range (range\_km), and how does this interaction correlate with acceleration performance (acceleration\_0\_100\_s) and top speed (top\_speed\_kmh)?* \newline% | |
% | |
{-} *Insight:* Investigate trade{-}offs between speed, acceleration, range, and energy loss (via slope efficiency vs. range). Use Pearson or Spearman correlation, linear regression, or grouped box plots to explore this link.\newline% | |
% | |
{-} 2. **Battery Utilization Across Brands**: \newline% | |
% | |
{-} *Does battery capacity (battery\_capacity\_kWh) and the type of battery (battery\_type) correlate with usable range (range\_km) and efficiency (efficiency\_wh\_per\_km)? How do brands vary in battery utilization to achieve similar ranges?* \newline% | |
% | |
{-} *Insight:* Perform ANOVA or t{-}tests, paired with regression analysis to assess battery capacity vs. range. Stratify by battery type and brand for nuanced comparisons.\newline% | |
% | |
{-} 3. **Towing Capacity and Electromechanical Features**: \newline% | |
% | |
{-} *How does torque (torque\_nm) and battery capacity (battery\_capacity\_kWh) relate to towing capacity (towing\_capacity\_kg)? Are high{-}torque EVs prioritizing battery size for towing, and does this phase lag across segments (e.g., economy vs. performance)?* \newline% | |
% | |
{-} *Insight:* Use regression to model torque and battery capacity as predictors of towing. Map results onto drivetrain segments to spot inconsistencies.\newline% | |
% | |
{-} 4. **Segment Variations in EV Design**: \newline% | |
% | |
{-} *What statistical variations exist in range (range\_km), efficiency (efficiency\_wh\_per\_km), and body dimensions (length\_mm, width\_mm, height\_mm) across car segments? Do larger or more capable segments sacrifice efficiency for size/capability?* \newline% | |
% | |
{-} *Insight:* Apply ANOVA or Kruskal{-}Wallis tests comparing segments. Cluster analysis could uncover design trade{-}offs between efficiency, size, and utility.\newline% | |
% | |
{-} 5. **Chartering Infrastructure vs. Practicality Features**: \newline% | |
% | |
{-} *Does fast{-}charging availability (fast\_charging\_power\_kw\_dc) correlate with practical features like seats, cargo volume, and fuel economy? How does this relationship differ between battery types and brands?* \newline% | |
% | |
{-} *Insight:* Regression analysis testing charging power as a covariate with utility features. Stratify by battery type and brand to validate "charging{-}friendly" design trends.\newline% | |
% | |
{-} {-}{-}{-}\newline% | |
% | |
{-} \#\#\# Tools and Methods:\newline% | |
% | |
{-} {-} **EDA:** Scatter plots, box plots, density plots, correlation matrices, and dimensionality reduction (PCA for clustering technical features).\newline% | |
% | |
{-} {-} **Statistical Tests:** *t*{-}tests for brand comparisons, ANOVA for segments, Pearson/Spearman for correlations, and linear regressions to model feature interactions.\newline% | |
% | |
{-} {-} **Advanced Techniques:** Clustering (K{-}means) to group EVs by performance/utility profiles, factor analysis to decompose battery{-}related metrics, or time{-}series analysis (if time data exists) to track norm changes. \newline% | |
% | |
{-} These questions surface key trade{-}offs, inconsistencies, and innovations within EV design and market segmentation.\newline% | |
% | |
\section{Hypothesis Tests \& Visualizations}% | |
\label{sec:HypothesisTestsVisualizations}% | |
\subsection{Insufficient columns for hypothesis}% | |
\label{subsec:Insufficientcolumnsforhypothesis}% | |
Insufficient columns for hypothesis: \#\#\# 5 Intelligent and Statistically Meaningful Questions for Data Analysis: | |
% | |
\subsection{Insufficient columns for hypothesis}% | |
\label{subsec:Insufficientcolumnsforhypothesis}% | |
Insufficient columns for hypothesis: 1. **Performance Efficiency Analysis**: | |
% | |
\subsection{Correlation between top\_speed\_kmh and efficiency\_wh\_per\_km}% | |
\label{subsec:Correlationbetweentopspeedkmhandefficiencywhperkm}% | |
Correlation between top\_speed\_kmh and efficiency\_wh\_per\_km: r = 0.17, p = 0.0002% | |
\begin{figure}[h!]% | |
\centering% | |
\includegraphics[width=0.8\linewidth]{outputs/plots/top_speed_kmh_vs_efficiency_wh_per_km_scatter.png}% | |
\end{figure} | |
% | |
\subsection{Insufficient columns for hypothesis}% | |
\label{subsec:Insufficientcolumnsforhypothesis}% | |
Insufficient columns for hypothesis: *Insight:* Investigate trade{-}offs between speed, acceleration, range, and energy loss (via slope efficiency vs. range). Use Pearson or Spearman correlation, linear regression, or grouped box plots to explore this link. | |
% | |
\subsection{Insufficient columns for hypothesis}% | |
\label{subsec:Insufficientcolumnsforhypothesis}% | |
Insufficient columns for hypothesis: 2. **Battery Utilization Across Brands**: | |
% | |
\subsection{ANOVA result for brand vs battery\_capacity\_kWh}% | |
\label{subsec:ANOVAresultforbrandvsbatterycapacitykWh}% | |
ANOVA result for brand vs battery\_capacity\_kWh: p = 0.0000% | |
\begin{figure}[h!]% | |
\centering% | |
\includegraphics[width=0.8\linewidth]{outputs/plots/brand_vs_battery_capacity_kWh_boxplot.png}% | |
\end{figure} | |
% | |
\subsection{Insufficient columns for hypothesis}% | |
\label{subsec:Insufficientcolumnsforhypothesis}% | |
Insufficient columns for hypothesis: *Insight:* Perform ANOVA or t{-}tests, paired with regression analysis to assess battery capacity vs. range. Stratify by battery type and brand for nuanced comparisons. | |
% | |
\subsection{Insufficient columns for hypothesis}% | |
\label{subsec:Insufficientcolumnsforhypothesis}% | |
Insufficient columns for hypothesis: 3. **Towing Capacity and Electromechanical Features**: | |
% | |
\subsection{Correlation between battery\_capacity\_kWh and torque\_nm}% | |
\label{subsec:CorrelationbetweenbatterycapacitykWhandtorquenm}% | |
Correlation between battery\_capacity\_kWh and torque\_nm: r = nan, p = nan% | |
\begin{figure}[h!]% | |
\centering% | |
\includegraphics[width=0.8\linewidth]{outputs/plots/battery_capacity_kWh_vs_torque_nm_scatter.png}% | |
\end{figure} | |
% | |
\subsection{Chi{-}square test between model and drivetrain}% | |
\label{subsec:Chi{-}squaretestbetweenmodelanddrivetrain}% | |
Chi{-}square test between model and drivetrain: χ² = 954.00, p = 0.4756% | |
\begin{figure}[h!]% | |
\centering% | |
\includegraphics[width=0.8\linewidth]{outputs/plots/model_vs_drivetrain_heatmap.png}% | |
\end{figure} | |
% | |
\subsection{Insufficient columns for hypothesis}% | |
\label{subsec:Insufficientcolumnsforhypothesis}% | |
Insufficient columns for hypothesis: 4. **Segment Variations in EV Design**: | |
% | |
\subsection{Correlation between efficiency\_wh\_per\_km and range\_km}% | |
\label{subsec:Correlationbetweenefficiencywhperkmandrangekm}% | |
Correlation between efficiency\_wh\_per\_km and range\_km: r = 0.02, p = 0.6168% | |
\begin{figure}[h!]% | |
\centering% | |
\includegraphics[width=0.8\linewidth]{outputs/plots/efficiency_wh_per_km_vs_range_km_scatter.png}% | |
\end{figure} | |
% | |
\subsection{Insufficient columns for hypothesis}% | |
\label{subsec:Insufficientcolumnsforhypothesis}% | |
Insufficient columns for hypothesis: *Insight:* Apply ANOVA or Kruskal{-}Wallis tests comparing segments. Cluster analysis could uncover design trade{-}offs between efficiency, size, and utility. | |
% | |
\subsection{Insufficient columns for hypothesis}% | |
\label{subsec:Insufficientcolumnsforhypothesis}% | |
Insufficient columns for hypothesis: 5. **Chartering Infrastructure vs. Practicality Features**: | |
% | |
\subsection{ANOVA result for brand vs fast\_charging\_power\_kw\_dc}% | |
\label{subsec:ANOVAresultforbrandvsfastchargingpowerkwdc}% | |
ANOVA result for brand vs fast\_charging\_power\_kw\_dc: p = 0.0000% | |
\begin{figure}[h!]% | |
\centering% | |
\includegraphics[width=0.8\linewidth]{outputs/plots/brand_vs_fast_charging_power_kw_dc_boxplot.png}% | |
\end{figure} | |
% | |
\subsection{Insufficient columns for hypothesis}% | |
\label{subsec:Insufficientcolumnsforhypothesis}% | |
Insufficient columns for hypothesis: *Insight:* Regression analysis testing charging power as a covariate with utility features. Stratify by battery type and brand to validate "charging{-}friendly" design trends. | |
% | |
\subsection{Insufficient columns for hypothesis}% | |
\label{subsec:Insufficientcolumnsforhypothesis}% | |
Insufficient columns for hypothesis: {-}{-}{-} | |
% | |
\subsection{Insufficient columns for hypothesis}% | |
\label{subsec:Insufficientcolumnsforhypothesis}% | |
Insufficient columns for hypothesis: \#\#\# Tools and Methods: | |
% | |
\subsection{Insufficient columns for hypothesis}% | |
\label{subsec:Insufficientcolumnsforhypothesis}% | |
Insufficient columns for hypothesis: {-} **EDA:** Scatter plots, box plots, density plots, correlation matrices, and dimensionality reduction (PCA for clustering technical features). | |
% | |
\subsection{Chi{-}square test between brand and model}% | |
\label{subsec:Chi{-}squaretestbetweenbrandandmodel}% | |
Chi{-}square test between brand and model: χ² = 27189.00, p = 0.4023% | |
\begin{figure}[h!]% | |
\centering% | |
\includegraphics[width=0.8\linewidth]{outputs/plots/brand_vs_model_heatmap.png}% | |
\end{figure} | |
% | |
\subsection{Insufficient columns for hypothesis}% | |
\label{subsec:Insufficientcolumnsforhypothesis}% | |
Insufficient columns for hypothesis: {-} **Advanced Techniques:** Clustering (K{-}means) to group EVs by performance/utility profiles, factor analysis to decompose battery{-}related metrics, or time{-}series analysis (if time data exists) to track norm changes. | |
% | |
\subsection{Insufficient columns for hypothesis}% | |
\label{subsec:Insufficientcolumnsforhypothesis}% | |
Insufficient columns for hypothesis: These questions surface key trade{-}offs, inconsistencies, and innovations within EV design and market segmentation. | |
% | |
\end{document} |