---
language:
- vi
- en
base_model:
- microsoft/phi-4
pipeline_tag: text-generation
tags:
- cybersecurity
- text-generation-inference
- transformers
---

15
## Model Overview
| | |
|-------------------------|-------------------------------------------------------------------------------|
| **Developers** | Viettel Security AI |
| **Architecture** | 14B parameters, dense decoder-only Transformer model |
| **Inputs** | Text, best suited for prompts in the chat format |
| **Context length** | 16K tokens |
| **Outputs** | Generated text in response to input |
| **License** | MIT |

25
## Training Datasets
Our training data is an extension of the data used for `cyber-llm-14b` and includes a wide variety of sources:

1. Publicly available blogs, papers, and references from https://github.com/PEASEC/cybersecurity_dataset.

2. Newly created synthetic, "textbook-like" data generated with GPT-4o for the purpose of teaching cybersecurity.

3. Acquired academic books and Q&A datasets.

35
## Usage

### Input Formats

Given the nature of the training data, `cyber-llm-14b` is best suited for prompts using the following chat format:

```text
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
Hello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Hey there! How are you?<|eot_id|><|start_header_id|>user<|end_header_id|>
I'm great thanks!<|eot_id|>
```
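To make the template concrete, here is a minimal sketch of how such a prompt string could be assembled by hand. `build_prompt` is a hypothetical helper, not part of this repository; in practice, `tokenizer.apply_chat_template` (or the `transformers` pipeline shown below) applies the model's template for you.

```python
def build_prompt(messages):
    """Assemble a raw prompt string in the chat format shown above.

    `messages` is a list of {"role": ..., "content": ...} dicts.
    """
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        # Each turn opens with a role header and closes with <|eot_id|>.
        parts.append(f"<|start_header_id|>{msg['role']}<|end_header_id|>\n")
        parts.append(f"{msg['content']}<|eot_id|>")
    # Leave an open assistant header so the model generates the next turn.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n")
    return "".join(parts)

prompt = build_prompt([{"role": "user", "content": "Hello!"}])
print(prompt)
```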
47

### With `transformers`

```python
import transformers

# Load the model with an automatically chosen dtype and device placement.
pipeline = transformers.pipeline(
    "text-generation",
    model="viettelsecurity-ai/cyber-llm-14b",
    model_kwargs={"torch_dtype": "auto"},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a SOC tier-3 analyst."},
    {"role": "user", "content": "What is URL phishing?"},
]

outputs = pipeline(messages, max_new_tokens=2048)
print(outputs[0]["generated_text"][-1])
```
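When a list of chat messages is passed to the pipeline, `generated_text` contains the full conversation with the model's reply appended, so the last element is the assistant turn. A sketch of that output shape, using a mocked result (the assistant content here is a placeholder, not actual model output):

```python
# Mocked pipeline output: a real run returns this same shape, with the
# assistant turn appended after the input messages.
outputs = [{"generated_text": [
    {"role": "system", "content": "You are a SOC tier-3 analyst."},
    {"role": "user", "content": "What is URL phishing?"},
    {"role": "assistant", "content": "URL phishing is ..."},  # placeholder
]}]

# The last message is the model's reply.
reply = outputs[0]["generated_text"][-1]
assert reply["role"] == "assistant"
print(reply["content"])
```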