Dr. José Guevara Coto

Is a student:
No
Program of study:

Projects

Publications

Evaluating hyper-parameter tuning using random search in support vector machines for software effort estimation

Description:

Studies in software effort estimation (SEE) have explored the use of hyper-parameter tuning for machine learning algorithms (MLA) to improve the accuracy of effort estimates. In other contexts, random search (RS) has shown results similar to grid search (GS) while being less computationally expensive. In this paper, we investigate to what extent the random search hyper-parameter tuning approach affects the accuracy and stability of support vector regression (SVR) in SEE. Results were compared to those obtained from ridge regression models and grid search-tuned models. A case study with four data sets extracted from the ISBSG 2018 repository shows that random search exhibits similar performance to grid search, rendering it an attractive alternative technique for hyper-parameter tuning. RS-tuned SVR achieved an increase of 0.227 standardized accuracy (SA) with respect to default hyper-parameters. In addition, random search improved prediction stability of SVR models to a minimum ratio of 0.840. The analysis showed that RS-tuned SVR attained performance equivalent to GS-tuned SVR. Future work includes extending this research to cover other hyper-parameter tuning approaches and machine learning algorithms, as well as using additional data sets.
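
The contrast between the two tuning strategies named in this abstract can be illustrated with a minimal sketch. It is not the paper's code: the function `validation_error` is a hypothetical stand-in for the cross-validated error of an SVR model, the parameter names `c` and `epsilon`, the search ranges, and the budget of 12 samples are all assumptions chosen for illustration. Grid search evaluates every combination of the listed values, while random search evaluates a fixed number of randomly sampled settings.

```python
import itertools
import math
import random


def validation_error(c, epsilon):
    """Hypothetical stand-in for the cross-validated error of an SVR model.

    A real study would train and score a model here; this toy surface
    simply has its minimum at c = 10, epsilon = 0.1.
    """
    return (math.log10(c) - 1.0) ** 2 + (epsilon - 0.1) ** 2


def grid_search(c_values, epsilon_values):
    """Evaluate every grid combination; return (best_params, n_evaluations)."""
    candidates = list(itertools.product(c_values, epsilon_values))
    best = min(candidates, key=lambda p: validation_error(*p))
    return best, len(candidates)


def random_search(n_iter, seed=0):
    """Evaluate n_iter randomly sampled settings; return (best_params, n_evaluations)."""
    rng = random.Random(seed)
    candidates = [
        # c is sampled log-uniformly in [0.01, 1000], epsilon uniformly in [0, 1]
        (10 ** rng.uniform(-2, 3), rng.uniform(0.0, 1.0))
        for _ in range(n_iter)
    ]
    best = min(candidates, key=lambda p: validation_error(*p))
    return best, len(candidates)


if __name__ == "__main__":
    grid_best, grid_evals = grid_search(
        c_values=[0.01, 0.1, 1, 10, 100, 1000],
        epsilon_values=[0.0, 0.1, 0.25, 0.5, 0.75, 1.0],
    )
    rs_best, rs_evals = random_search(n_iter=12)
    print("grid search:  ", grid_evals, "evaluations, error", validation_error(*grid_best))
    print("random search:", rs_evals, "evaluations, error", validation_error(*rs_best))
```

In this sketch the grid costs 36 evaluations against 12 for random search. Whether the cheaper search reaches comparable accuracy on real effort data is the empirical question the paper addresses, and the toy surface above says nothing about that.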

Publication type: Conference Paper

Published in: Proceedings of the 16th ACM International Conference on Predictive Models and Data Analytics in Software Engineering

Classifying and Understanding Tor Traffic Using Tree-Based Models

Description:

Over the past years, the use of anonymization services has gained significant relevance as more users are interested in protecting their data and privacy on the internet. One of the most popular ways to achieve this is Tor. The anonymity and untraceability that Tor provides, however, can also be exploited by ill-intentioned users who try to bypass security controls and policies. The Cybersecurity and Infrastructure Security Agency (CISA) mentions two methods of recognizing Tor traffic in the enterprise: indicator-based and behavior-based analysis. The former uses log analysis and lists of Tor exit nodes to identify suspicious activity, while the latter inspects patterns in TCP and UDP ports, DNS queries, and packet payloads. In this paper, we propose a different approach using white-box machine learning models such as decision trees and Random Forest. On the one hand, our classifier achieves accuracy levels above 95%. On the other hand, our approach is the first one to allow understanding the importance of each traffic feature in the classification. Our results demonstrate that the TCP window size, the frame size, and time-related traffic features can be used to identify Tor traffic. In this paper, we describe a machine learning methodology for identifying Tor network traffic using C5.0 decision trees and Random Forest. We followed a white-box approach and achieved prediction accuracy of over 95% with both models. We also present an analysis of the importance of the top predictor variables.

Publication type: Conference Paper

Published in: 2020 IEEE Latin-American Conference on Communications (LATINCOM)

Tor Traffic Classification using Decision Trees

Description:

The number of users interested in protecting their data and privacy on the Internet has increased lately. This has increased the popularity of anonymization services such as Tor. However, the anonymity and resistance to tracking that Tor provides have also been used for ill-intended purposes, such as evading security policies and controls. In this work, we implemented and evaluated an offline Tor traffic detector using white-box machine learning algorithms such as decision trees and random forests. On the one hand, our classifier achieves precision levels above 99%. On the other hand, our approach is the first one to allow understanding and interpreting the classifier, thus revealing which variables play a significant role in the classification. We show that TCP window size, packet size, and some time-related features can be used to identify Tor traffic.

Publication type: Conference Paper

Published in: 2023 XLIX Latin American Computer Conference (CLEI)