Model Comparison for the Classification of Comments Containing Suicidal Traits from Reddit via NLP and Supervised Learning

Mantilla Saavedra, Camila Stefany; Gutiérrez Cárdenas, Juan Manuel

dc.contributor.author	Mantilla Saavedra, Camila Stefany
dc.contributor.author	Gutiérrez Cárdenas, Juan Manuel
dc.contributor.other	Gutiérrez Cárdenas, Juan Manuel
dc.date.accessioned	2023-02-07T15:53:20Z
dc.date.available	2023-02-07T15:53:20Z
dc.date.issued	2022
dc.identifier.citation	Mantilla-Saavedra, C. & Gutiérrez-Cárdenas, J. (2022). Model Comparison for the Classification of Comments Containing Suicidal Traits from Reddit via NLP and Supervised Learning. In J. A. Lossio-Ventura, J. Valverde-Rebaza, E. Díaz, D. Muñante, C. Gavidia-Calderon, A. D. B. Valejo & H. Alatrista-Salas (Eds.), Information Management and Big Data: Eighth Annual International Conference, SIMBig 2021, Proceedings, Communications in Computer and Information Science (vol. 1577, pp. 253-263). Springer. https://doi.org/10.1007/978-3-031-04447-2_17	es_PE
dc.identifier.issn	1865-0929
dc.identifier.uri	https://hdl.handle.net/20.500.12724/17555
dc.description	Indexed in Scopus	es_PE
dc.description.abstract	In recent years, suicide has become one of the most critical public health issues among teenagers and adults. At the same time, the growth and widespread use of social networks and mobile devices have made it possible to compile relevant information that helps us understand the thoughts, feelings, and emotions expressed on these platforms. The detection of suicidal traits on social media has become a relevant research topic, permitting the identification of probable suicidal traits among users by examining their posts on well-known social networks such as Reddit. For that reason, the purpose of the present research is to compare different supervised classification models such as Logistic Regression, Support Vector Machines, Random Forest, AdaBoost, Gradient Boosting, and XGBoost, together with feature extraction techniques such as TF-IDF and GloVe. The results of our experiments show that the best model is SVM with TF-IDF, which obtains 91.50% Accuracy, 92.40% Precision, 90.30% Recall, and an F1-score of 91.50%. This study also shows that TF-IDF outperforms GloVe for feature extraction when applied to the different models tested.	es_PE
dc.format	application/pdf	es_PE
dc.language.iso	eng	es_PE
dc.publisher	Springer	es_PE
dc.relation.ispartof	urn:issn:18650929
dc.relation.ispartof	urn:isbn:978-303104446-5
dc.rights	info:eu-repo/semantics/restrictedAccess
dc.source	Repositorio Institucional - Ulima	es_PE
dc.source	Universidad de Lima	es_PE
dc.subject	Suicidio	es_PE
dc.subject	Redes sociales	es_PE
dc.subject	Programación neurolingüística	es_PE
dc.subject	Suicide	es_PE
dc.subject	Social networks	es_PE
dc.subject	Neurolinguistic programming	es_PE
dc.title	Model Comparison for the Classification of Comments Containing Suicidal Traits from Reddit via NLP and Supervised Learning	es_PE
dc.type	info:eu-repo/semantics/conferenceObject
dc.type.other	Conference article in Scopus	es_PE
dc.identifier.journal	Communications in Computer and Information Science	es_PE
dc.publisher.country	CH	es_PE
dc.description.peer-review	Peer reviewed	es_PE
dc.subject.ocde	https://purl.org/pe-repo/ocde/ford#2.02.04
dc.identifier.doi	https://doi.org/10.1007/978-3-031-04447-2_17
dc.type.version	info:eu-repo/semantics/publishedVersion
dc.contributor.student	Mantilla Saavedra, Camila Stefany (Systems Engineering)	es_PE
ulima.cat	009
ulima.autor.afiliacion	Universidad de Lima (Scopus)	es_PE
ulima.autor.carrera	Systems Engineering	es_PE
dc.identifier.scopusid	2-s2.0-85128982461

Files in this item

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following collection(s)

Systems Engineering [86]