datasets:
- harpomaxx/dga-detection
library_name: tf-keras
license: bsd
Model description
A lexicographical detector for domain names produced by Domain Generation Algorithms (DGAs), built on a one-dimensional convolutional neural network (1D-CNN), as described in the article Deep Convolutional Neural Networks for DGA Detection (Catania et al., 2018).
The rest of the source code is available on GitHub.
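As a rough illustration of what a lexicographical (character-level) input to a 1D-CNN can look like, the sketch below maps a domain name to a fixed-length sequence of integer ids. The vocabulary, the `MAX_LEN` value, the right-padding scheme, and the `encode_domain` helper are all assumptions made for this example; they are not taken from the published model and may differ from the preprocessing in the GitHub repository.

```python
import string
from typing import List

# Hypothetical vocabulary: lowercase letters, digits, hyphen, dot, underscore.
# Id 0 is reserved for padding and for characters outside the vocabulary.
VOCAB = string.ascii_lowercase + string.digits + "-._"
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(VOCAB)}
MAX_LEN = 45  # assumed fixed input length, not taken from the model card


def encode_domain(domain: str, max_len: int = MAX_LEN) -> List[int]:
    """Map a domain name to a fixed-length list of integer character ids."""
    ids = [CHAR_TO_ID.get(ch, 0) for ch in domain.lower()[:max_len]]
    # Right-pad with zeros so every sequence has the same length.
    return ids + [0] * (max_len - len(ids))


encoded = encode_domain("example.com")
print(encoded[:11])
```

A sequence like this would typically feed an embedding layer followed by a 1D convolution, but check the original code for the exact tokenization before reusing the published weights.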
Intended uses & limitations
Use it wisely...
Training and evaluation data
The DGA detection method was trained and evaluated on a dataset containing both DGA and normal domain names.
The normal domain names were taken from the Alexa top one million domains. An additional 3,161 normal domains, provided by the Bambenek Consulting feed, were included in the dataset. This latter group is particularly interesting since it consists of suspicious domain names that were not generated by a DGA. The total number of normal domains in the dataset is therefore 1,003,161. DGA domains were obtained from the DGA domain repositories of Andrey Abakumov and John Bambenek. The total number of DGA domains is 1,915,335, and they correspond to 51 different malware families.
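The figures above can be checked with a few lines of arithmetic, which also makes the class balance explicit. This sketch assumes the Alexa list contributes exactly 1,000,000 domains, which is consistent with the stated total of normal domains; the variable names are illustrative.

```python
ALEXA_DOMAINS = 1_000_000   # Alexa top one million (assumed exact)
BAMBENEK_NORMAL = 3_161     # additional normal domains from the Bambenek feed
DGA_DOMAINS = 1_915_335     # DGA domains across 51 malware families

normal_total = ALEXA_DOMAINS + BAMBENEK_NORMAL
dataset_total = normal_total + DGA_DOMAINS
dga_share = DGA_DOMAINS / dataset_total

print(f"normal domains: {normal_total:,}")    # 1,003,161
print(f"all domains:    {dataset_total:,}")   # 2,918,496
print(f"DGA share:      {dga_share:.1%}")     # 65.6%
```

Roughly two thirds of the samples are DGA domains, so accuracy alone can be misleading; per-class metrics are worth reporting when evaluating on this data.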
Training procedure
A traditional grid search over a specified subset of the hyperparameter space was conducted on the training set. For a robust estimate, each parameter combination was evaluated using k-fold cross-validation with k=10 folds. The 1D-CNN was trained with the backpropagation algorithm using the Adaptive Moment Estimation (Adam) optimizer. Training ran for 10 epochs, a number selected to avoid overfitting.
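The procedure described above (grid search with 10-fold cross-validation) can be sketched in plain Python as follows. The search space in `PARAM_GRID` is hypothetical, since the actual grid is not listed here, and `train_and_score` is a placeholder that returns a dummy score instead of training the 1D-CNN for 10 epochs.

```python
import itertools
import random
from statistics import mean

K = 10  # number of folds, as stated in the training procedure

# Hypothetical search space; the real grid is not given in this card.
PARAM_GRID = {
    "filters": [64, 256],
    "kernel_size": [3, 4],
}


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k roughly equal, disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


def train_and_score(params, train_idx, val_idx):
    """Placeholder: a real version would fit the model and return a metric."""
    # Deterministic dummy score so the loop runs end to end.
    return 1.0 / (1 + abs(params["kernel_size"] - 4))


def grid_search(n_samples):
    """Evaluate every combination with K-fold CV and return the best one."""
    names = list(PARAM_GRID)
    results = {}
    for values in itertools.product(*(PARAM_GRID[name] for name in names)):
        params = dict(zip(names, values))
        scores = [train_and_score(params, tr, va)
                  for tr, va in k_fold_indices(n_samples, K)]
        results[values] = mean(scores)  # average score across the K folds
    best = max(results, key=results.get)
    return dict(zip(names, best)), results


best_params, all_results = grid_search(n_samples=100)
print(best_params)
```

Averaging over the folds is what gives the more robust estimate: every combination is scored on ten different validation splits rather than a single one.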
Training hyperparameters
The following hyperparameters were used during training:
Hyperparameters | Value |
---|---|
name | Adam |
learning_rate | 0.0010000000474974513 |
decay | 0.0 |
beta_1 | 0.8999999761581421 |
beta_2 | 0.9990000128746033 |
epsilon | 1e-07 |
amsgrad | False |
training_precision | float32 |
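The long decimals in the table appear to be the float32 representations of the usual Adam defaults (0.001, 0.9, 0.999), which is consistent with the `float32` training precision. The sketch below reproduces that rounding with the standard library and then applies a single Adam update to a scalar parameter. The update follows the textbook formulation of Adam; the exact placement of `epsilon` in the Keras implementation may differ slightly, and `to_float32` and `adam_step` are illustrative helpers, not part of any library.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float to the nearest float32, returned as a float."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Nominal values that the table entries appear to correspond to.
LEARNING_RATE, BETA_1, BETA_2, EPSILON = 0.001, 0.9, 0.999, 1e-07

print(to_float32(LEARNING_RATE))  # 0.0010000000474974513
print(to_float32(BETA_1))         # 0.8999999761581421
print(to_float32(BETA_2))         # 0.9990000128746033


def adam_step(theta, grad, m, v, t):
    """One textbook Adam update for a scalar parameter at step t (t >= 1)."""
    m = BETA_1 * m + (1 - BETA_1) * grad          # first-moment estimate
    v = BETA_2 * v + (1 - BETA_2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - BETA_1 ** t)                 # bias correction
    v_hat = v / (1 - BETA_2 ** t)
    theta = theta - LEARNING_RATE * m_hat / (math.sqrt(v_hat) + EPSILON)
    return theta, m, v


theta, m, v = adam_step(theta=1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(theta)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly one learning rate regardless of the gradient's magnitude.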