
metadata

datasets:
  - harpomaxx/dga-detection
library_name: tf-keras
license: bsd

Model description

A Domain Generation Algorithm (DGA) lexicographical detector based on a 1D convolutional neural network (1D-CNN), as described in the article Deep Convolutional Neural Networks for DGA Detection (Catania et al., 2018).

The rest of the source code is available on GitHub.
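A detector of this kind works on the characters of the domain name itself, so each name has to be turned into a fixed-length sequence of integers before it reaches a 1D convolution. The sketch below shows one common way to do that. The vocabulary, the maximum length of 45, and the name `encode_domain` are illustrative assumptions, not values taken from the published model.

```python
import string

# Assumed vocabulary: lowercase letters, digits, hyphen, dot, underscore.
# Index 0 is reserved for padding and for characters outside the vocabulary.
VOCAB = string.ascii_lowercase + string.digits + "-._"
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(VOCAB)}
MAX_LEN = 45  # hypothetical maximum sequence length


def encode_domain(domain: str, max_len: int = MAX_LEN) -> list:
    """Map a domain name to a fixed-length list of character indices."""
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in domain.lower()]
    indices = indices[:max_len]                      # truncate long names
    return indices + [0] * (max_len - len(indices))  # right-pad with zeros


encoded = encode_domain("example.com")
print(len(encoded), encoded[:11])
```

The resulting integer sequences would typically feed an embedding layer followed by the convolutional layer; check the linked source code for the preprocessing the model actually expects.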

Intended uses & limitations

Use it wisely...

Training and evaluation data

The DGA detection method was trained and evaluated on a dataset containing both DGA and normal domain names.

The normal domain names were taken from the Alexa top one million domains. An additional 3,161 normal domains, provided by the Bambenek Consulting feed, were included in the dataset. This latter group is particularly interesting since it consists of suspicious domain names that were not generated by a DGA. The total number of normal domains in the dataset is therefore 1,003,161. DGA domains were obtained from the DGA domain repositories of Andrey Abakumov and John Bambenek. The total number of DGA domains is 1,915,335, and they correspond to 51 different malware families.
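The counts above imply an imbalanced dataset, with DGA domains outnumbering normal ones. A quick check of the arithmetic, assuming the Alexa list contributes exactly 1,000,000 names (which is consistent with the stated total of 1,003,161):

```python
# Counts taken from the dataset description; the Alexa figure is assumed exact.
normal_alexa = 1_000_000
normal_bambenek = 3_161
dga = 1_915_335

normal = normal_alexa + normal_bambenek
total = normal + dga
dga_share = dga / total

print(f"normal={normal:,} total={total:,} dga_share={dga_share:.1%}")
# prints: normal=1,003,161 total=2,918,496 dga_share=65.6%
```

Roughly two out of three samples are DGA domains, which is worth keeping in mind when reading accuracy figures or choosing a decision threshold.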

Training procedure

A traditional grid search was conducted over a specified subset of the hyperparameter space, using the training set. For a robust estimate, each parameter combination was evaluated using k-fold cross-validation with k=10 folds. The 1D-CNN layer was trained with the backpropagation algorithm using the Adaptive Moment Estimation (Adam) optimizer. Training ran for 10 epochs; the number of epochs was selected to avoid overfitting.
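The procedure described above (every parameter combination scored by 10-fold cross-validation) can be sketched in plain Python as follows. The grid contents (`filters`, `kernel_size`) and the `evaluate` placeholder are hypothetical; the actual search space is not listed in this card, and a real run would train and score the 1D-CNN inside `evaluate`.

```python
from itertools import product


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, val_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


# Hypothetical search space, for illustration only.
PARAM_GRID = {"filters": [64, 256], "kernel_size": [2, 4]}


def evaluate(params, train_idx, val_idx):
    """Placeholder: train on train_idx with params, return a score on val_idx."""
    return 0.0


def grid_search(n_samples, param_grid, k=10):
    """Return (params, mean cross-validation score) for every combination."""
    names = list(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [evaluate(params, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        results.append((params, sum(scores) / len(scores)))
    return results


results = grid_search(n_samples=100, param_grid=PARAM_GRID)
print(len(results))  # one entry per parameter combination
```

This sketch splits the data in order; in practice the samples would be shuffled (and ideally stratified by class) before being divided into folds.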

Training hyperparameters

The following hyperparameters were used during training:

Hyperparameter	Value
name	Adam
learning_rate	0.001
decay	0.0
beta_1	0.9
beta_2	0.999
epsilon	1e-07
amsgrad	False
training_precision	float32
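To make the role of these values concrete, here is a minimal sketch of a single Adam update on one scalar parameter, using the textbook formulation with the learning rate, beta_1, beta_2, and epsilon rounded from the table. It is not the Keras implementation (which differs in small details such as where epsilon is applied), and the function name `adam_step` is an illustrative choice. With `decay` at 0.0 and `amsgrad` set to False, neither learning-rate decay nor the AMSGrad variant is modeled.

```python
import math

# Values rounded from the hyperparameter table.
LEARNING_RATE = 0.001
BETA_1 = 0.9
BETA_2 = 0.999
EPSILON = 1e-07


def adam_step(theta, grad, m, v, t):
    """Apply one textbook Adam update; t is the 1-based step count."""
    m = BETA_1 * m + (1 - BETA_1) * grad         # first-moment (mean) estimate
    v = BETA_2 * v + (1 - BETA_2) * grad * grad  # second-moment estimate
    m_hat = m / (1 - BETA_1 ** t)                # bias correction
    v_hat = v / (1 - BETA_2 ** t)
    theta = theta - LEARNING_RATE * m_hat / (math.sqrt(v_hat) + EPSILON)
    return theta, m, v


theta, m, v = adam_step(theta=1.0, grad=2.0, m=0.0, v=0.0, t=1)
print(theta)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by approximately the learning rate regardless of the gradient's magnitude.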