Machine learning models for water quality analysis and their contribution to agriculture

Delgado Pinedo, Daniel Francisco; Trujillo Vasquez, Sergio Alejandro

Date

2025

Abstract

This study applies machine learning models to analyze the parameters that influence water quality in the Chancay-Lambayeque river basin, with a particular focus on their impact on agriculture. The CRISP-DM methodology was followed, using the Orange platform to build and evaluate several machine learning models: multiple linear regression (MLR), support vector machine (SVM), decision tree (DT), random forest (RF), artificial neural network (ANN), and extreme gradient boosting (XGBoost). The models were evaluated with the coefficient of determination (R²), mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE). Random Forest, Decision Tree, and ANN performed well for dissolved oxygen (DO), with R² values of 0.741, 0.714, and 0.785, respectively, and for biochemical oxygen demand (BOD), with R² values of 0.856, 0.901, and 0.871, respectively. XGBoost, by contrast, performed well only for BOD, with an R² of 0.815. According to the attribute importance analysis (RReliefF and the resulting scores), the most relevant parameters for DO and BOD were cadmium and total phosphorus, with maximum scores of 0.684 and 0.287, respectively. Boron and lithium (for DO) and total nitrogen and total chromium (for BOD) also obtained high scores. Monitoring of these elements should be prioritized, and their inclusion in the Water Environmental Quality Standards (EQS) for Category 3 (irrigation and livestock watering) should be reviewed. The results demonstrate the potential of machine learning as a tool for improving water management in agricultural contexts.
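The abstract compares models by R², MAE, MSE, and RMSE. As an illustration only, the following minimal sketch computes these four metrics from their standard textbook definitions in plain Python. It is not the thesis's code and not Orange's implementation; the function name `regression_metrics` and the sample values are hypothetical, not data from the study.

```python
import math
from typing import Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """Hypothetical helper: standard regression metrics for paired observations/predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    mse = ss_res / n                                 # mean squared error
    mae = sum(abs(r) for r in residuals) / n         # mean absolute error
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total sum of squares
    # R² = 1 - SS_res / SS_tot; undefined when the observations are constant
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}


if __name__ == "__main__":
    # Hypothetical observed vs. predicted values (e.g., BOD in mg/L), not thesis data
    observed = [2.0, 4.0, 6.0, 8.0]
    predicted = [2.5, 3.5, 6.0, 8.5]
    print(regression_metrics(observed, predicted))
```

RMSE is reported in the same units as the target variable, which makes it easier to interpret than MSE. R² is unitless, which is what allows the DO and BOD results above to be compared on a common scale.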

URI

https://hdl.handle.net/20.500.12724/23146

Publisher

Universidad de Lima

Subjects

Pending

Collection(s)

Theses

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess