metadata

license: apache-2.0
metrics:
  - accuracy
pipeline_tag: tabular-classification
library_name: ml-agents

EMIT Model - Environmental Monitoring and Intelligence Tool

Overview

The EMIT Model (Environmental Monitoring and Intelligence Tool) is an XGBoost classifier that predicts potential mining areas from environmental data. It was developed as part of the EMiTAL (Environmental Monitoring and Intelligence Tool Algorithm) framework, which combines remote sensing with RayCasting and Polygon Gridding (RGP) to locate candidate mining zones on a geographic grid.

Goal

The goal of this model is to support decision-making in mining by identifying areas with high mining potential from their environmental characteristics. It is intended for regulatory bodies, mining companies, and environmental agencies that need to balance resource extraction with ecological sustainability.

Framework: EMiTAL

The EMiTAL framework serves as the foundation for this model. It includes a unique combination of algorithms and data collection methods:

Remote Sensing: Utilized to gather extensive environmental data on a regional scale, including soil, vegetation, and atmospheric readings.
RayCasting and Polygon Gridding (RGP): This technique segments geographic regions into grids, allowing precise localization of areas under study.
Environmental Indicators: The model leverages data such as:
- NDVI (Normalized Difference Vegetation Index): Captures vegetation health.
- NDWI (Normalized Difference Water Index): Measures surface moisture and water presence.
- NDTI (Normalized Difference Tillage Index): Identifies soil disturbance and human activity impact.
- Air Quality Metrics: NO2, PM10, and CO concentrations to assess environmental impact factors.
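
The three spectral indices listed above are all normalized differences of two reflectance bands. The sketch below shows that shared form in plain Python. The band pairings are assumptions taken from common remote sensing usage (NDVI from NIR and red, the McFeeters form of NDWI from green and NIR, NDTI from two shortwave-infrared bands); the model card does not state which variants or sensors EMiTAL uses, and the reflectance values are hypothetical.

```python
def normalized_difference(band_a, band_b):
    """Return (band_a - band_b) / (band_a + band_b), or 0.0 when the sum is zero."""
    total = band_a + band_b
    if total == 0:
        return 0.0
    return (band_a - band_b) / total


def ndvi(nir, red):
    # NDVI = (NIR - Red) / (NIR + Red)
    return normalized_difference(nir, red)


def ndwi(green, nir):
    # Assumed variant (McFeeters): NDWI = (Green - NIR) / (Green + NIR)
    return normalized_difference(green, nir)


def ndti(swir1, swir2):
    # Assumed band pairing: NDTI = (SWIR1 - SWIR2) / (SWIR1 + SWIR2)
    return normalized_difference(swir1, swir2)


# Hypothetical surface reflectances for a single grid cell
cell = {"green": 0.08, "red": 0.06, "nir": 0.42, "swir1": 0.25, "swir2": 0.15}

indices = {
    "NDVI": ndvi(cell["nir"], cell["red"]),
    "NDWI": ndwi(cell["green"], cell["nir"]),
    "NDTI": ndti(cell["swir1"], cell["swir2"]),
}
print(indices)
```

Each index falls in the range -1 to 1, so the three can be fed to a tree-based classifier without rescaling.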

Model Pipeline

The pipeline includes preprocessing and feature engineering stages to optimize environmental data for classification. The EMiTAL framework ensures precise location detection using the RGP algorithm, and remote sensing data covers all latitudes and longitudes under consideration.
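
The RGP step is not specified in detail in this card. As a rough illustration of the general idea, the sketch below lays a regular grid over a polygon's bounding box and keeps the cell centres that pass a standard ray-casting point-in-polygon test. The function names, the square cell size, and the planar treatment of coordinates are all assumptions made for illustration, not the EMiTAL implementation.

```python
import math


def point_in_polygon(x, y, polygon):
    """Ray casting: toggle 'inside' each time a ray cast toward +x crosses an edge."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y level can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def grid_cells_in_polygon(polygon, cell_size):
    """Return centres of bounding-box grid cells whose centre lies inside the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    min_x, min_y = min(xs), min(ys)
    n_cols = math.ceil((max(xs) - min_x) / cell_size)
    n_rows = math.ceil((max(ys) - min_y) / cell_size)
    centres = []
    for row in range(n_rows):
        for col in range(n_cols):
            cx = min_x + (col + 0.5) * cell_size
            cy = min_y + (row + 0.5) * cell_size
            if point_in_polygon(cx, cy, polygon):
                centres.append((cx, cy))
    return centres


# Hypothetical triangular study area in planar units
concession = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
centres = grid_cells_in_polygon(concession, cell_size=1.0)
print(len(centres), "cells selected")
```

Each selected centre would then be paired with the remote sensing readings for that cell to form one input row. For real longitude and latitude data, a projected coordinate system would likely be needed so that cells have a consistent ground size.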

Model Specifications

Model Type: XGBoost Classifier
Objective: Binary classification to determine if a region is suitable for mining (True for viable, False for non-viable).
Training Metrics:
- Accuracy Score: Measures the proportion of correct predictions.
- Mean Absolute Error (MAE): Average absolute difference between predicted and actual labels.
- R-Squared: Assesses how well features explain label variability.
- High Confidence Accuracy: Accuracy for predictions with a confidence level above 90%.
Data Split: The dataset was divided as follows:
- Training: 70%
- Validation: 20%
- Testing: 10%
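
A 70/20/10 split like the one above can be sketched with a seeded shuffle of row indices. The helper name `split_indices`, the seed, and the dataset size of 160 are assumptions for illustration; the card does not say how the split was drawn or whether it was stratified by class.

```python
import random


def split_indices(n_samples, train=0.7, val=0.2, seed=42):
    """Shuffle row indices and cut them into train, validation, and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(round(n_samples * train))
    n_val = int(round(n_samples * val))
    train_idx = indices[:n_train]
    val_idx = indices[n_train:n_train + n_val]
    test_idx = indices[n_train + n_val:]  # remainder, about 10%
    return train_idx, val_idx, test_idx


# Hypothetical dataset size
train_idx, val_idx, test_idx = split_indices(160)
print(len(train_idx), len(val_idx), len(test_idx))
```

With a small dataset, a stratified split that preserves the True/False ratio in each part may be preferable to a plain shuffle.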

Input Data Features

Latitude and Longitude coordinates for precise geolocation.
Vegetation Index: Indicates vegetation density and type.
NDVI, NDWI, NDTI: Capture soil and surface characteristics crucial for identifying mining areas.
Land Elevation: Provides terrain insights.
NO2, PM10, CO: Environmental pollution metrics critical for impact analysis.

Usage Instructions

To use this model, prepare your dataset with the required environmental features. Ensure the feature names match those in the training dataset for optimal results. Predictions can be generated using the pre-trained model with the following script:

import joblib
import pandas as pd

# Load the pre-trained model
model = joblib.load("emit_model.joblib")

# Load your data; column names must match the training features
data = pd.read_csv("path/to/your/data.csv")

# Predict mining viability for each row
predictions = model.predict(data)
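
Because predictions depend on the feature names matching the training data, it can help to check the columns before calling the model. The sketch below is one way to do that in plain Python. The names in `EXPECTED_FEATURES` are hypothetical placeholders based on the feature list in this card; the actual column names used in training are not documented here and must be substituted.

```python
# Hypothetical column names; replace with the names used in training
EXPECTED_FEATURES = [
    "latitude", "longitude", "vegetation_index",
    "ndvi", "ndwi", "ndti", "elevation", "no2", "pm10", "co",
]


def check_feature_columns(available, expected):
    """Return (missing, extra) column names relative to the expected features."""
    missing = [name for name in expected if name not in available]
    extra = [name for name in available if name not in expected]
    return missing, extra


# Example with a hypothetical CSV header
header = ["latitude", "longitude", "ndvi", "ndwi", "ndti", "station_id"]
missing, extra = check_feature_columns(header, EXPECTED_FEATURES)
print("missing:", missing)
print("extra:", extra)
```

With pandas, the same check could be run on `list(data.columns)`, followed by `data = data[EXPECTED_FEATURES]` to drop extra columns and fix the column order.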

Model Performance Metrics

The EMIT Model was evaluated on a held-out test set of 16 samples (7 False, 9 True). The primary metrics observed during testing are listed below. With a test set this small, each misclassified sample shifts accuracy by 6.25 percentage points, so the figures should be read as indicative.

Accuracy: 0.8125
The accuracy score represents the proportion of correct predictions out of the total predictions, with the model correctly predicting the target class 81.25% of the time.
Mean Absolute Error (MAE): 0.1875
MAE is the average absolute difference between predicted and actual labels. Because the labels are binary, it equals the misclassification rate (1 - accuracy), here 18.75%.
R-Squared: 0.238
The R-Squared metric, or coefficient of determination, suggests that around 23.8% of the variance in the target variable can be explained by the model’s features, providing some insight into feature influence on predictions.

Classification Report

The classification report provides a detailed look at the precision, recall, and F1-score across both classes (True and False):

Class	Precision	Recall	F1-score	Support
False	1.00	0.57	0.73	7
True	0.75	1.00	0.86	9

Precision is the proportion of predictions for a class that are correct.
Recall is the proportion of actual instances of a class that the model identified.
F1-Score is the harmonic mean of precision and recall, balancing the two for each class.

Overall Classification Report:

Accuracy: 0.81 (or 81%)
Macro Average: Precision = 0.88, Recall = 0.79, F1-score = 0.79
Weighted Average: Precision = 0.86, Recall = 0.81, F1-score = 0.80

Confusion Matrix

The confusion matrix below shows the true positive, true negative, false positive, and false negative counts:

	Predicted False	Predicted True
Actual False	4	3
Actual True	0	9

This matrix indicates:

True Positives (TP): 9
True Negatives (TN): 4
False Positives (FP): 3
False Negatives (FN): 0

The model achieves perfect recall for the True class, correctly identifying all 9 actual True instances (recall of 1.00). Recall for the False class is lower (0.57): 3 of the 7 actual False instances were predicted as True, which also reduces precision for the True class to 0.75.
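
As a worked check, the per-class precision, recall, and F1 values, together with the macro and weighted averages, can be derived directly from the four confusion matrix counts. The sketch below uses only the standard definitions; the helper name `class_scores` is introduced here for illustration.

```python
def class_scores(tp, fp, fn):
    """Return (precision, recall, f1) for one class treated as the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts from the confusion matrix, with True as the positive class
tn, fp, fn, tp = 4, 3, 0, 9

true_scores = class_scores(tp=tp, fp=fp, fn=fn)
# For the False class the roles swap: its hits are TN, its false alarms are FN
false_scores = class_scores(tp=tn, fp=fn, fn=fp)

support_false, support_true = tn + fp, tp + fn
total = support_false + support_true

# Macro average: unweighted mean over classes
macro = [(f + t) / 2 for f, t in zip(false_scores, true_scores)]
# Weighted average: mean over classes weighted by support
weighted = [
    (f * support_false + t * support_true) / total
    for f, t in zip(false_scores, true_scores)
]

for name, scores in [("False", false_scores), ("True", true_scores),
                     ("macro", macro), ("weighted", weighted)]:
    print(name, [round(s, 2) for s in scores])
```

Swapping the roles of the counts for the False class makes explicit why its precision is 1.00 (no True instance was predicted False) while its recall is only 4 out of 7.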

Authors and Team

Developed by Team Explorers, a collaborative effort aimed at advancing predictive environmental analysis:

Joseph Ackon
Felix Kudjo Mlagada
Aristotle Mbroh
Prince Mawuko Dzorkpe
Manford Ehuntem

Acknowledgments

Our work would not have been possible without the support and resources provided by:

Takoradi Technical University
Data Hackathon Ghana Statistical Service, 2024

The dataset was created using the EMiTAL architecture, complemented by insights from StatsBank and Common Data Resources.

Model Repository and Future Development

This model is hosted on Hugging Face, enabling others to leverage it for environmental mining predictions. Future updates may include refined data collection techniques and enhancements to incorporate additional environmental variables.

How to Contribute

If you would like to contribute to this project, please reach out to the team for collaboration opportunities. We welcome insights, data contributions, and suggestions for model improvements.

