eunJ committed · verified
Commit 1452df5 · 1 Parent(s): 5531463

Upload vulnerability detection model

Files changed (4)
  1. README.md +45 -25
  2. config.json +25 -4
  3. model.safetensors +2 -2
  4. tokenizer_config.json +0 -1
README.md CHANGED
@@ -1,34 +1,54 @@
  ---
  library_name: transformers
  tags:
- - Code
- - Vulnerability
- - Detection
- datasets:
- - DetectVul/bigvul
- language:
- - en
- base_model:
- - microsoft/codebert-base
  license: mit
- metrics:
- - accuracy
- - precision
- - f1
- - recall
  ---

- ## CodeBERT for Code Vulnerability Detection

- ## Model Summary
- This model is a fine-tuned version of **microsoft/codebert-base**, optimized for detecting vulnerabilities in code. It is trained on the **bigvul** dataset. The model takes in a code snippet and classifies it as either **benign (0)** or **vulnerable (1)**.

- ## Model Details

- - **Developed by:** Eun Jung
- - **Finetuned from:** `microsoft/codebert-base`
- - **Language(s):** English (for code comments & metadata), C/C++
- - **License:** MIT
- - **Task:** Code vulnerability detection
- - **Dataset Used:** `bigvul`
- - **Architecture:** Transformer-based sequence classification

  ---
+ language:
+ - code
  library_name: transformers
+ pipeline_tag: text-classification
  tags:
+ - code-analysis
+ - vulnerability-detection
+ - security
+ - cwe
  license: mit
+ base_model: microsoft/codebert-base
  ---

+ # CodeBERT Vulnerability Detector (Multi-class)

+ A multi-class classification model for detecting vulnerabilities in C/C++ code.

+ ## Model Information
+ - **Base model**: microsoft/codebert-base
+ - **Classes**: 4 (CWE-79, CWE-89, CWE-119, Other)
+ - **Input**: C/C++ source code text

+ ## Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ # Load the model and tokenizer
+ model_name = "eunJ/codebert_vulnerability_detector_multi"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ # Analyze a code snippet
+ code = '''
+ char buffer[100];
+ gets(buffer);
+ '''
+
+ inputs = tokenizer(code, return_tensors="pt", max_length=512, truncation=True)
+ with torch.no_grad():
+     outputs = model(**inputs)
+ predictions = torch.softmax(outputs.logits, dim=-1)
+ predicted_class = torch.argmax(predictions)
+
+ print(f"Predicted class: {predicted_class.item()}")
+ ```
+
+ ## Class Labels
+ - 0: CWE-79 (Cross-site Scripting)
+ - 1: CWE-89 (SQL Injection)
+ - 2: CWE-119 (Buffer Overflow)
+ - 3: CWE-Other
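
Because config.json (below) ships only the generic id2label names LABEL_0 to LABEL_3, mapping a predicted index back to a CWE name has to happen in user code. A minimal sketch, assuming the index order listed in the README above:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Assumed mapping, following the label order documented in the README above.
CWE_NAMES = {0: "CWE-79", 1: "CWE-89", 2: "CWE-119", 3: "CWE-Other"}

model_name = "eunJ/codebert_vulnerability_detector_multi"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

code = "strcpy(dest, src);"
inputs = tokenizer(code, return_tensors="pt", max_length=512, truncation=True)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)[0]

idx = int(torch.argmax(probs))
print(f"{CWE_NAMES[idx]} (confidence: {probs[idx].item():.2f})")
```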
config.json CHANGED
@@ -1,19 +1,40 @@
  {
- "model_type": "roberta",
  "architectures": [
  "RobertaForSequenceClassification"
  ],
- "num_labels": 4,
  "id2label": {
  "0": "LABEL_0",
  "1": "LABEL_1",
  "2": "LABEL_2",
  "3": "LABEL_3"
  },
  "label2id": {
  "LABEL_0": 0,
  "LABEL_1": 1,
  "LABEL_2": 2,
  "LABEL_3": 3
- }
- }

  {
+ "_name_or_path": "microsoft/codebert-base",
  "architectures": [
  "RobertaForSequenceClassification"
  ],
+ "attention_probs_dropout_prob": 0.1,
+ "bos_token_id": 0,
+ "classifier_dropout": null,
+ "eos_token_id": 2,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
  "id2label": {
  "0": "LABEL_0",
  "1": "LABEL_1",
  "2": "LABEL_2",
  "3": "LABEL_3"
  },
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
  "label2id": {
  "LABEL_0": 0,
  "LABEL_1": 1,
  "LABEL_2": 2,
  "LABEL_3": 3
+ },
+ "layer_norm_eps": 1e-05,
+ "max_position_embeddings": 514,
+ "model_type": "roberta",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "output_past": true,
+ "pad_token_id": 1,
+ "position_embedding_type": "absolute",
+ "torch_dtype": "float32",
+ "transformers_version": "4.49.0",
+ "type_vocab_size": 1,
+ "use_cache": true,
+ "vocab_size": 50265
+ }
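
The expanded config now pins the full RoBERTa architecture (12 layers, hidden size 768, 514 position embeddings) instead of only the label maps. A quick sanity check, sketched under the assumption that the repo id from the README is the one to load:

```python
from transformers import AutoConfig

# Pull the published config from the Hub and inspect the fields added in this commit.
config = AutoConfig.from_pretrained("eunJ/codebert_vulnerability_detector_multi")

print(config.model_type)               # roberta
print(config.num_labels)               # 4, derived from id2label
print(config.max_position_embeddings)  # 514
print(config.hidden_size, config.num_hidden_layers, config.num_attention_heads)  # 768 12 12
```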
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c8037175a0f7980967910a796119cf0250a6c0200f7568225f2e9aaeb43b9b68
- size 498633008

  version https://git-lfs.github.com/spec/v1
+ oid sha256:c601158bf8733adb819d956d6e3c418480f72fa7216b30463b1f5aa291ce2756
+ size 498618976
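
model.safetensors is stored through Git LFS, so the diff only touches the pointer file: the sha256 oid and byte size of the new weights. A local download can be checked against the pointer, a sketch assuming the file sits in the working directory:

```python
import hashlib
import os

# Hypothetical local path; point this at the downloaded model.safetensors.
path = "model.safetensors"

digest = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

print(digest.hexdigest())      # should equal the oid recorded in the LFS pointer
print(os.path.getsize(path))   # should equal the recorded size, 498618976 bytes
```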
tokenizer_config.json CHANGED
@@ -45,7 +45,6 @@
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "cls_token": "<s>",
- "do_lower_case": false,
  "eos_token": "</s>",
  "errors": "replace",
  "extra_special_tokens": {},

  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "cls_token": "<s>",
  "eos_token": "</s>",
  "errors": "replace",
  "extra_special_tokens": {},