license: apache-2.0
metrics:
- accuracy
pipeline_tag: tabular-classification
library_name: ml-agents
EMIT Model - Environmental Monitoring and Intelligence Tool
Overview
The EMIT Model (Environmental Monitoring and Intelligence Tool) is an XGBoost classifier designed to predict potential mining areas from environmental data. Developed as part of the EMiTAL (Environmental Monitoring and Intelligence Tool Algorithm) framework, it combines Remote Sensing with RayCasting and Polygon Gridding to localize viable mining zones.
Goal
The goal of this model is to support decision-making in mining by providing a predictive tool that identifies areas with high mining potential based on environmental characteristics. This application is highly valuable for regulatory bodies, mining companies, and environmental agencies looking to balance resource extraction with ecological sustainability.
Framework: EMiTAL
The EMiTAL framework serves as the foundation for this model. It includes a unique combination of algorithms and data collection methods:
- Remote Sensing: Utilized to gather extensive environmental data on a regional scale, including soil, vegetation, and atmospheric readings.
- RayCasting and Polygon Gridding (RGP): This technique segments geographic regions into grids, allowing precise localization of areas under study.
- Environmental Indicators: The model leverages data such as:
  - NDVI (Normalized Difference Vegetation Index): Captures vegetation health.
  - NDWI (Normalized Difference Water Index): Measures surface moisture and water presence.
  - NDTI (Normalized Difference Tillage Index): Identifies soil disturbance and human activity impact.
  - Air Quality Metrics: NO2, PM10, and CO concentrations to assess environmental impact factors.
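The three spectral indices listed above share one normalized-difference form. The sketch below uses common band definitions (NDVI from near-infrared and red, NDWI in its green/near-infrared form, NDTI from two shortwave-infrared bands). The card does not state which band combinations EMiTAL uses, so both the formulas chosen here and the reflectance values are assumptions for illustration.

```python
def normalized_difference(band_a, band_b):
    """Return (band_a - band_b) / (band_a + band_b), or 0.0 if the sum is zero."""
    denominator = band_a + band_b
    if denominator == 0:
        return 0.0
    return (band_a - band_b) / denominator


# Hypothetical surface reflectances for a single pixel (not from the EMIT dataset).
nir, red, green = 0.5, 0.1, 0.1
swir1, swir2 = 0.3, 0.2

ndvi = normalized_difference(nir, red)      # vegetation: NIR vs. red
ndwi = normalized_difference(green, nir)    # water: green vs. NIR (assumed variant)
ndti = normalized_difference(swir1, swir2)  # tillage: SWIR1 vs. SWIR2 (assumed variant)

print(f"NDVI={ndvi:.3f} NDWI={ndwi:.3f} NDTI={ndti:.3f}")
```

All three values fall in the range -1 to 1, which keeps the indices comparable across pixels with different overall brightness.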
Model Pipeline
The pipeline includes preprocessing and feature engineering stages to optimize environmental data for classification. The EMiTAL framework ensures precise location detection using the RGP algorithm, and remote sensing data covers all latitudes and longitudes under consideration.
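As a rough illustration of how ray casting and polygon gridding can work together, the sketch below lays a regular grid over a polygon's bounding box and keeps the cells whose centers pass the standard even-odd ray-casting test. This is a generic version of the technique, not the EMiTAL implementation; the function names, the square study area, and the cell size are hypothetical.

```python
import math


def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: count edge crossings of a ray going in the +x direction."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The edge is crossed only if its endpoints lie on opposite sides of the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def grid_cell_centers(polygon, cell_size):
    """Return centers of bounding-box grid cells whose center lies inside the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    min_x, min_y = min(xs), min(ys)
    n_cols = math.ceil((max(xs) - min_x) / cell_size)
    n_rows = math.ceil((max(ys) - min_y) / cell_size)
    centers = []
    for row in range(n_rows):
        for col in range(n_cols):
            cx = min_x + (col + 0.5) * cell_size
            cy = min_y + (row + 0.5) * cell_size
            if point_in_polygon(cx, cy, polygon):
                centers.append((cx, cy))
    return centers


# Hypothetical 4 x 4 study area in arbitrary planar units.
study_area = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
cells = grid_cell_centers(study_area, cell_size=1.0)
print(len(cells), "grid cells inside the study area")
```

Each retained cell center could then be paired with the remote sensing readings for that location to form one row of model input.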
Model Specifications
Model Type: XGBoost Classifier
Objective: Binary classification to determine if a region is suitable for mining (True for viable, False for non-viable).
Training Metrics:
- Accuracy Score: Measures the proportion of correct predictions.
- Mean Absolute Error (MAE): Average error in prediction.
- R-Squared: Assesses how well features explain label variability.
- High Confidence Accuracy: Accuracy for predictions with a confidence level above 90%.
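The card does not spell out how High Confidence Accuracy is computed. One plausible reading, sketched below, treats the confidence of a binary prediction as the larger of the two class probabilities and scores only the predictions whose confidence exceeds 0.9. The function name, the 0.5 decision cutoff, and the sample values are assumptions.

```python
def high_confidence_accuracy(y_true, proba_true, threshold=0.9):
    """Accuracy over predictions whose confidence exceeds the threshold.

    proba_true holds the predicted probability of the True class per sample.
    Returns None when no prediction is confident enough.
    """
    kept = 0
    correct = 0
    for label, p in zip(y_true, proba_true):
        confidence = max(p, 1.0 - p)
        if confidence > threshold:
            kept += 1
            predicted = p >= 0.5
            if predicted == bool(label):
                correct += 1
    return correct / kept if kept else None


# Hypothetical labels and probabilities (not from the EMIT test set).
labels = [True, False, True, False]
probabilities = [0.95, 0.05, 0.60, 0.97]
score = high_confidence_accuracy(labels, probabilities)
print(score)
```

In this example the third sample is skipped because its confidence is only 0.60, and the fourth is a confident but wrong prediction.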
Data Split: The dataset was divided as follows:
- Training: 70%
- Validation: 20%
- Testing: 10%
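A 70/20/10 split like the one above can be produced by shuffling row indices once and slicing. The sketch below uses only the standard library; the card does not describe the authors' actual procedure (for example, whether the split was stratified by class), so the function name, the seed, and the row count are assumptions.

```python
import random


def split_70_20_10(n_rows, seed=0):
    """Shuffle row indices and return (train, validation, test) index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Integer arithmetic avoids floating-point rounding in the cut points.
    n_train = n_rows * 7 // 10
    n_val = n_rows * 2 // 10
    train = indices[:n_train]
    validation = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, validation, test


# Hypothetical dataset size, chosen only to make the proportions easy to read.
train_idx, val_idx, test_idx = split_70_20_10(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

The three index lists are disjoint, so no row is used for both fitting and evaluation.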
Input Data Features
- Latitude and Longitude coordinates for precise geolocation.
- Vegetation Index: Indicates vegetation density and type.
- NDVI, NDWI, NDTI: Capture soil and surface characteristics crucial for identifying mining areas.
- Land Elevation: Provides terrain insights.
- NO2, PM10, CO: Environmental pollution metrics critical for impact analysis.
Usage Instructions
To use this model, prepare your dataset with the required environmental features. Ensure the feature names match those in the training dataset for optimal results. Predictions can be generated using the pre-trained model with the following script:
import joblib
import pandas as pd
# Load the pre-trained model
model = joblib.load("emit_model.joblib")
# Load your data; column names must match the training features
data = pd.read_csv("path/to/your/data.csv")
# Predict mining viability (True or False) for each row
predictions = model.predict(data)
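Since the feature names must match the training dataset, it can help to check the columns before calling predict. The sketch below is a small helper for that; the card does not publish the exact training column names, so the names in EXPECTED_FEATURES are placeholders that need to be replaced with the real ones.

```python
# Placeholder names only; substitute the actual training column names.
EXPECTED_FEATURES = [
    "latitude", "longitude", "vegetation_index", "ndvi", "ndwi", "ndti",
    "elevation", "no2", "pm10", "co",
]


def check_feature_columns(columns, expected=EXPECTED_FEATURES):
    """Return (missing, unexpected) column names relative to the expected features."""
    present = set(columns)
    wanted = set(expected)
    missing = [name for name in expected if name not in present]
    unexpected = [name for name in columns if name not in wanted]
    return missing, unexpected


# Hypothetical CSV header with one feature absent and one extra column.
csv_columns = ["latitude", "longitude", "vegetation_index", "ndvi", "ndwi",
               "ndti", "elevation", "no2", "pm10", "site_name"]
missing, unexpected = check_feature_columns(csv_columns)
print("missing:", missing, "unexpected:", unexpected)
```

With pandas, the same check can be run on `list(data.columns)`, and `data[EXPECTED_FEATURES]` then selects the columns in a fixed order. If the saved object is a scikit-learn-style estimator that was fitted on a DataFrame, it may expose the training column names as `feature_names_in_`, which would be a better source than a hand-written list.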
Model Performance Metrics
The EMIT Model's performance was evaluated on a test set of 16 samples (7 False and 9 True, per the support counts in the classification report). The primary metrics observed during testing were:
Accuracy: 0.8125
The accuracy score is the proportion of correct predictions out of all predictions: the model predicted the target class correctly 81.25% of the time.
Mean Absolute Error (MAE): 0.1875
MAE is the average absolute difference between predicted and actual values. For binary labels it equals the misclassification rate, here 18.75% (1 - 0.8125).
R-Squared: 0.238
The R-Squared metric, or coefficient of determination, suggests that around 23.8% of the variance in the target variable is explained by the model's predictions.
Classification Report
The classification report provides a detailed look at the precision, recall, and F1-score across both classes (True and False):
Class | Precision | Recall | F1-score | Support |
---|---|---|---|---|
False | 1.00 | 0.57 | 0.73 | 7 |
True | 0.75 | 1.00 | 0.86 | 9 |
- Precision measures the accuracy of the positive predictions.
- Recall represents the proportion of actual positives that were correctly identified by the model.
- F1-Score is the harmonic mean of precision and recall, providing a balanced view of accuracy for each class.
Overall Classification Report:
- Accuracy: 0.81 (or 81%)
- Macro Average: Precision = 0.88, Recall = 0.79, F1-score = 0.79
- Weighted Average: Precision = 0.86, Recall = 0.81, F1-score = 0.80
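The per-class and averaged figures can be rederived from the confusion-matrix counts reported in this card. The sketch below does that arithmetic in plain Python; the variable and function names are illustrative.

```python
# Counts from the card's confusion matrix, per class:
# (true positives, false positives, false negatives).
counts = {
    "False": (4, 0, 3),
    "True": (9, 3, 0),
}


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


per_class = {name: precision_recall_f1(*c) for name, c in counts.items()}
support = {name: tp + fn for name, (tp, fp, fn) in counts.items()}
total = sum(support.values())

# Macro average: unweighted mean over classes.
macro = [sum(per_class[c][i] for c in per_class) / len(per_class) for i in range(3)]
# Weighted average: mean over classes weighted by support.
weighted = [sum(per_class[c][i] * support[c] for c in per_class) / total
            for i in range(3)]

print("macro    P/R/F1:", [round(v, 4) for v in macro])
print("weighted P/R/F1:", [round(v, 4) for v in weighted])
```

Rounded to two decimals, these values agree with the macro and weighted averages listed above.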
Confusion Matrix
The confusion matrix below shows the true positive, true negative, false positive, and false negative counts:
Actual class | Predicted False | Predicted True |
---|---|---|
Actual False | 4 | 3 |
Actual True | 0 | 9 |
This matrix indicates:
- True Positives (TP): 9
- True Negatives (TN): 4
- False Positives (FP): 3
- False Negatives (FN): 0
The model demonstrates high recall for the True class, correctly identifying all 9 actual True instances (recall of 1.00), but has lower recall for the False class (0.57): 3 of the 7 actual False instances were misclassified as True, which in turn lowers precision for the True class to 0.75.
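The headline accuracy, MAE, and R-Squared values can be checked against the confusion matrix by rebuilding label vectors with the same counts and treating True as 1 and False as 0. The sketch below is a verification of the arithmetic, not the authors' evaluation script.

```python
# Rebuild labels from the confusion matrix: 9 TP, 3 FP, 4 TN, 0 FN.
y_true = [1] * 9 + [0] * 3 + [0] * 4
y_pred = [1] * 9 + [1] * 3 + [0] * 4
n = len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

# R-squared = 1 - (residual sum of squares) / (total sum of squares).
mean_true = sum(y_true) / n
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r_squared = 1 - ss_res / ss_tot

print(accuracy, mae, round(r_squared, 3))
```

With 0/1 labels every error contributes exactly 1 to both the absolute and the squared error, which is why MAE equals 1 minus accuracy here.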
Authors and Team
Developed by Team Explorers, a collaborative effort aimed at advancing predictive environmental analysis:
- Joseph Ackon
- Felix Kudjo Mlagada
- Aristotle Mbroh
- Prince Mawuko Dzorkpe
- Manford Ehuntem
Acknowledgments
Our work would not have been possible without the support and resources provided by:
- Takoradi Technical University
- Data Hackathon Ghana Statistical Service, 2024
The dataset was created using the EMiTAL architecture, complemented by insights from StatsBank and Common Data Resources.
Model Repository and Future Development
This model is hosted on Hugging Face, enabling others to leverage it for environmental mining predictions. Future updates may include refined data collection techniques and enhancements to incorporate additional environmental variables.
How to Contribute
If you would like to contribute to this project, please reach out to the team for collaboration opportunities. We welcome insights, data contributions, and suggestions for model improvements.