---
language:
- vi
- en
base_model:
- microsoft/phi-4
pipeline_tag: text-generation
tags:
- cybersecurity
- text-generation-inference
- transformers
license: mit
---
## Model Overview
| | |
|-------------------------|-------------------------------------------------------------------------------|
| **Developers** | Microsoft |
| **Architecture** | 14B parameters, dense decoder-only Transformer model |
| **Inputs** | Text, best suited for prompts in the chat format |
| **Context length** | 16K tokens |
| **Outputs** | Generated text in response to input |
| **License** | MIT |
## Training Datasets
Our training data extends that used for the base model, `microsoft/phi-4`, and draws on a wide variety of sources:
1. Publicly available blogs and papers, collected via https://github.com/PEASEC/cybersecurity_dataset.
2. Newly created synthetic, "textbook-like" data written to teach cybersecurity concepts, generated with GPT-4o (see the sketch below).
3. Acquired academic books and Q&A datasets.
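
The exact pipeline behind the synthetic data is not published; the following is a minimal sketch of how such textbook-style samples could be produced with GPT-4o through the OpenAI API. The system prompt and topic list are illustrative placeholders, not the actual generation setup.

```python
# Illustrative sketch only -- the real generation pipeline is not published.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical topic list; the actual curriculum is not disclosed.
topics = ["SQL injection", "lateral movement", "DNS tunneling"]

for topic in topics:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You write clear, textbook-style cybersecurity lessons."},
            {"role": "user", "content": f"Write a short textbook section explaining {topic}."},
        ],
    )
    print(response.choices[0].message.content)
```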
## Usage
### Input Formats
Given the nature of the training data, `cyber-llm-14b` is best suited for prompts using the chat format as follows:
```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
Hello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Hey there! How are you?<|eot_id|><|start_header_id|>user<|end_header_id|>
I'm great thanks!<|eot_id|>
```
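
Rather than concatenating these special tokens by hand, you can let the tokenizer's chat template render the prompt from a list of messages. A minimal sketch, assuming the model repository ships a chat template matching the format above:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("viettelsecurity-ai/cyber-llm-14b")

messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hey there! How are you?"},
    {"role": "user", "content": "I'm great thanks!"},
]

# Render the conversation as a single prompt string, special tokens included.
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)
```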
### With `transformers`
```python
import transformers

# Load the model with automatic dtype selection and device placement.
pipeline = transformers.pipeline(
    "text-generation",
    model="viettelsecurity-ai/cyber-llm-14b",
    model_kwargs={"torch_dtype": "auto"},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a SOC Tier-3 analyst."},
    {"role": "user", "content": "What is URL phishing?"},
]

outputs = pipeline(messages, max_new_tokens=2048)
# The pipeline returns the full conversation; print the assistant's reply.
print(outputs[0]["generated_text"][-1])
```
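
For finer control over decoding than the pipeline wrapper offers, the model can also be loaded directly with `AutoModelForCausalLM`. A minimal sketch; the sampling parameters below are illustrative choices, not a recommendation from the model authors:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "viettelsecurity-ai/cyber-llm-14b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a SOC Tier-3 analyst."},
    {"role": "user", "content": "What is URL phishing?"},
]

# Build the prompt from the chat template and place it on the model's device.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Illustrative sampling settings, not the authors' recommendation.
outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```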