Student Seminar 2020-01-20


Laboratoire de Recherche et Développement de l’EPITA
Student Researchers' Seminar
20 January 2020
13h00-16h30, Amphi 1
14-16 rue Voltaire
94276 Le Kremlin-Bicêtre


13h00 Detecting Botnets Behaviors over Network Flows using Hidden Markov Models, by Antoine Sainson

Botnets are among the most common and powerful cyber-attack tools, used for anything from DDoS attacks to cryptocurrency mining. Because botnet types and interactions are extremely diverse, it is very difficult to detect their influence using payload data alone. In this context, the goal is to build a botnet detection system using metadata from network flows. To do so, we propose a new system based on probabilistic machine learning techniques, using Hidden Markov Models to model interactions inside suspicious networks. Our work is based on a dataset from the Stratosphere project released in 2014.
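As a minimal sketch of how a Hidden Markov Model can score a sequence of flow observations, the forward algorithm below computes the likelihood of a sequence under a toy two-state model. The state names, the "short"/"long" flow labels, and all probabilities are hypothetical values chosen for illustration; they are not taken from the talk or from the Stratosphere dataset.

```python
def forward_likelihood(observations, states, start_p, trans_p, emit_p):
    """Return P(observations | model) computed with the forward algorithm."""
    # alpha[s]: probability of the observations seen so far, ending in state s
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs]
            * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical model: a host is either benign or infected (hidden state),
# and each observed flow is labelled "short" or "long" (assumed features).
STATES = ("benign", "infected")
START_P = {"benign": 0.9, "infected": 0.1}
TRANS_P = {
    "benign": {"benign": 0.95, "infected": 0.05},
    "infected": {"benign": 0.10, "infected": 0.90},
}
EMIT_P = {
    "benign": {"short": 0.3, "long": 0.7},
    "infected": {"short": 0.8, "long": 0.2},
}

flows = ["short", "short", "long"]
likelihood = forward_likelihood(flows, STATES, START_P, TRANS_P, EMIT_P)
print(f"P(flows | model) = {likelihood:.6f}")
```

In a detection setting, one plausible use is to compare the likelihood of the same flow sequence under a model fitted on benign traffic and a model fitted on botnet traffic.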

13h30 Identifying Botnets in the Network using Gaussian Mixture Models, by Hugo Linsenmaier

Botnets are the primary way of attacking computer networks. By compromising devices connected to the internet, they are used to steal information, spy on organizations or send spam. More recently, botnets have also been used for financial gain, for example to mine bitcoins at a large scale. They are a major threat that must be identified in order to defend the interests of users and of any type of organization. Unfortunately, public research has often been one step behind attackers, who adapt quickly to detection systems. Our work consists of applying unsupervised machine learning techniques, not previously used for such tasks, to detect botnets in different scenarios.

14h00 Learning Morphological Operations, by Alexandre Kirszenberg

Image Segmentation is the process of identifying the outline of objects in the composition of an image. In recent years, the use of Deep Convolutional Neural Networks for the purpose of Image Segmentation has spiked, with results outperforming more classical approaches. We will explore the implementation and potential applications of integrating filters from the theory of Mathematical Morphology within the structure of a Deep Convolutional Neural Network.
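For readers unfamiliar with Mathematical Morphology, the sketch below shows the two basic filters, erosion and dilation, on a small binary image in plain Python. It only illustrates the classical, fixed operations; the cross-shaped structuring element and the zero padding are assumptions made for this example, and it does not show how such filters are integrated into or learned by a network.

```python
# Cross-shaped structuring element, given as (row, column) offsets.
# It is symmetric, so no reflection is needed for dilation.
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def _pixel(image, r, c):
    """Return the pixel at (r, c), treating out-of-bounds pixels as background."""
    if 0 <= r < len(image) and 0 <= c < len(image[0]):
        return image[r][c]
    return 0


def erode(image, offsets=CROSS):
    """Keep a pixel only if every pixel under the structuring element is set."""
    return [
        [int(all(_pixel(image, r + dr, c + dc) for dr, dc in offsets))
         for c in range(len(image[0]))]
        for r in range(len(image))
    ]


def dilate(image, offsets=CROSS):
    """Set a pixel if any pixel under the structuring element is set."""
    return [
        [int(any(_pixel(image, r + dr, c + dc) for dr, dc in offsets))
         for c in range(len(image[0]))]
        for r in range(len(image))
    ]


square = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

eroded = erode(square)    # only the centre pixel survives
opened = dilate(eroded)   # opening = erosion followed by dilation

for row in opened:
    print("".join("#" if v else "." for v in row))
```

Here the opening turns the 3x3 square into a cross: the four corners, which cannot contain the structuring element, are removed.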

14h30 Estimation of the Noise Level Function in Multivariate Images using the Tree of Shapes and non-parametric statistics, by Baptiste Esteban

Many image processing applications need to know the noise level of an image, either to take it into account during processing or to remove it. To this end, we developed a method that estimates the noise level, modeled by the noise level function, first for grayscale images and then for multivariate images, under simplifying hypotheses. This semester, we introduced new tools to improve this method and to remove the simplifying hypotheses made last semester.

15h00 Detecting danger in marine environment: Part 1 - Making the dataset, by Charles Ginane

Generating a variety of videos is important for training an artificial intelligence. Finding a way to generate realistic marine images by computer helps ensure this variety. More specifically, we want to generate videos together with their associated metadata. The metadata helps us check and correct our artificial intelligence. To generate those videos, we use the software Maya, which allows us to build marine environments and extract images and videos from them. Moreover, Maya supports Python scripts, so we can automate the generation of videos with Python.


15h30 Implementing Baker's SUBTYPEP decision procedure, by Leo Valais

The Common Lisp language provides a predicate function, SUBTYPEP, for introspecting sub-type relationships. In some situations, given the type system of this language, knowing whether a type is a sub-type of another would require enumerating all the elements of that type, possibly an infinite number of them. Because of that, SUBTYPEP is allowed to return the two values (NIL NIL), indicating that it could not answer the question. Common Lisp implementations frequently fail to answer, even in situations where they could. Such abusive behavior prevents potential optimizations, or even leads to violations of the standard. In his paper entitled “A Decision Procedure for Common Lisp's SUBTYPEP Predicate”, Henry Baker proposes an algorithm that he claims to be both more accurate and more efficient than the average SUBTYPEP implementation. We present here the current state of our implementation and discuss one potential improvement of Baker's algorithm based on R-trees.
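The two-value convention described above can be illustrated with a toy model in Python. In this sketch, a type is a finite set of sample elements encoded as an integer bitmask, an encoding chosen here purely for illustration; the names `REPRESENTATIVES`, `make_type` and `UNKNOWN` are hypothetical. It mimics the shape of SUBTYPEP's answers only and does not reproduce Baker's procedure or the implementation presented in the talk.

```python
# Hypothetical finite universe of sample elements; bit i stands for element i.
REPRESENTATIVES = (-1, 0, 1, "text")


def make_type(predicate):
    """Encode a type as a bitmask of the sample elements satisfying predicate."""
    mask = 0
    for bit, element in enumerate(REPRESENTATIVES):
        if predicate(element):
            mask |= 1 << bit
    return mask


INTEGER = make_type(lambda x: isinstance(x, int))
NON_NEGATIVE_INTEGER = make_type(lambda x: isinstance(x, int) and x >= 0)
STRING = make_type(lambda x: isinstance(x, str))
UNKNOWN = None  # stands for a type whose members cannot be enumerated


def subtypep(a, b):
    """Return (is_subtype, is_certain), mirroring SUBTYPEP's two values."""
    if a is None or b is None:
        return (False, False)  # analogue of (NIL NIL): no answer
    # a is a subtype of b when no element of a lies outside b
    return ((a & ~b) == 0, True)


print(subtypep(NON_NEGATIVE_INTEGER, INTEGER))
print(subtypep(INTEGER, NON_NEGATIVE_INTEGER))
print(subtypep(UNKNOWN, INTEGER))
```

The second returned value is what distinguishes "definitely not a sub-type" from "could not decide", which is the distinction the abstract says implementations give up on too often.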


16h00 Model classification in model checking using random forest, by Thomas De Carvalho

Model checking aims to verify that a system meets a given specification by exploring all its possible states. Several techniques exist to achieve that. The Multi-Core Nested Depth-First Search (CNDFS) has a low memory requirement and works well with the simplest Büchi automata. The Multi-Core On-The-Fly SCC Decomposition (UFSCC) has a greater memory requirement and works well with generalized Büchi automata. The Symbolic method has a lower memory requirement but depends heavily on the order of the variables. The performance of these algorithms can vary widely, and choosing the best one for a specific model without testing all of them is not easy. Here, we try to use machine learning to predict the best method for a given model. For that purpose, we train a random forest, an ensemble learning method that uses a multitude of decision trees, using only the first states of the state space.