Model Details

Model Description


train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

Instruction:

μ•„λž˜ λ‰΄μŠ€λ₯Ό 읽고 '경제', '금리', 'μ™Έν™˜' 쀑 ν•˜λ‚˜λ‘œ λΆ„λ₯˜ν•˜μ„Έμš”.

Question:

{}

Response:

{} {}""" -------------------------------------------------

Inference Code


import os
import pandas as pd
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from tqdm.auto import tqdm
import time

1) Path settings

base_dir = ...      # set your base directory
test_excel = ...    # set the path of the test set Excel file
output_excel = ...  # set the path of the output Excel file

2) Hugging Face Hub repo ID

model_id = ...      # set the repo ID, e.g. "junghan/News_category_segmentation"

3) λͺ¨λΈ & ν† ν¬λ‚˜μ΄μ € λ‘œλ“œ

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    use_fast=True,
    trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map={"": "cuda"},     # place all parameters on the GPU only
    # low_cpu_mem_usage=True,    # (optional) load option that reduces memory usage
)
model.config.use_cache = True

4) Define the inference prompt style

inference_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

Instruction:

μ•„λž˜ λ‰΄μŠ€λ₯Ό 읽고 '경제', '금리', 'μ™Έν™˜' 쀑 ν•˜λ‚˜λ‘œ λΆ„λ₯˜ν•˜μ„Έμš”.

Question:

{}

Response:

{} {}"""

5) Load the test set

df = pd.read_excel(test_excel, engine='openpyxl')
print(f"Loaded {len(df)} examples from {test_excel}")

6) Modified inference function: take only THEME_HIST as input and generate a summary

def predict_label(text: str) -> str:
    # Put only THEME_HIST (the news body) into the question
    question = text.strip()
    # Fill the first {} of inference_prompt_style with the question; the remaining two slots get empty strings ("")
    prompt = inference_prompt_style.format(question, "", "") + tokenizer.eos_token

    inputs = tokenizer(
        prompt,
        return_tensors='pt',
        truncation=True,
        max_length=2048
    ).to('cuda')
    outputs = model.generate(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_new_tokens=100,
        eos_token_id=tokenizer.eos_token_id,
        use_cache=True,
    )
    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
    # Take the text after "### Response:" as the summary
    summary = decoded.split("### Response:")[-1].strip()
    return summary
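
Applied serially, this function can fill a summary column directly; a minimal sketch of that serial baseline (assuming the news body is in the THEME_HIST column, as in section 7-1), which is what section 7 below then parallelizes:

# Serial baseline: one generate() call per row (simple, but slow)
df['summary'] = [
    predict_label(str(row['THEME_HIST']))
    for _, row in tqdm(df.iterrows(), total=len(df))
]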


7) Parallel inference with a ThreadPoolExecutor instead of a serial loop

import torch
from concurrent.futures import ThreadPoolExecutor, as_completed
from langsmith import trace  # LangSmith tracing context manager used in infer_one below

7-1) Build prompts from every THEME_HIST in advance

prompts = [
    inference_prompt_style.format(row['THEME_HIST'].strip(), "", "") + tokenizer.eos_token
    for _, row in df.iterrows()
]

7-2) Run inference one item at a time per thread

def infer_one(prompt: str) -> str:
    # Create a LangSmith run of type "llm"
    with trace(name="Qwen3-8B Summarization", run_type="llm", inputs={"prompt": prompt}) as run:
        start = time.time()

        # Tokenize
        inputs_tok = tokenizer(
            prompt,
            return_tensors="pt",
            truncation=True,
            max_length=2048
        ).to("cuda")
        input_tokens = inputs_tok.input_ids.numel()

        # Model generation
        outputs = model.generate(
            input_ids=inputs_tok.input_ids,
            attention_mask=inputs_tok.attention_mask,
            max_new_tokens=100,
            eos_token_id=tokenizer.eos_token_id,
            use_cache=True,
        )
        output_tokens = outputs.shape[1]  # total sequence length (generate returns a plain tensor here)

        # Decode and extract the summary
        decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
        summary = decoded.split("### Response:")[-1].strip()
        latency_ms = int((time.time() - start) * 1000)

        # Record metadata
        run.metadata["input_tokens"] = int(input_tokens)
        run.metadata["output_tokens"] = int(output_tokens)
        run.metadata["latency_ms"] = latency_ms

        # Store the result and end the run
        run.end(outputs={"summary": summary})

    return summary
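
The dispatch step (7-3 through 7-5) is not included in this card. A minimal sketch of collecting infer_one results back into df['summary'] in the original row order, which is what the post-processing in 7-6 expects; max_workers is an assumed value, and since all threads share one model on a single GPU the pool mainly overlaps tokenization and decoding rather than scaling linearly:

# Sketch: run infer_one over all prompts and keep results in row order
results = [None] * len(prompts)
with ThreadPoolExecutor(max_workers=4) as executor:   # max_workers=4 is an assumption
    futures = {executor.submit(infer_one, p): i for i, p in enumerate(prompts)}
    for future in tqdm(as_completed(futures), total=len(futures)):
        results[futures[future]] = future.result()
df['summary'] = results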

7-6) Post-processing: keep only the text after the tag

df['summary'] = (
    df['summary']
    .astype(str)
    .str.split(r'', n=1)   # split on the tag and keep only the trailing part
    .str[-1]
    .str.strip()
)

8) Excel-saving code to be written separately (see the sketch below)
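
A minimal sketch of that save step, assuming the predictions simply go to the output_excel path defined in step 1:

# Write the dataframe with the new 'summary' column to the output path
df.to_excel(output_excel, index=False, engine='openpyxl')
print(f"Saved {len(df)} rows to {output_excel}")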

Framework versions

  • PEFT 0.15.2
Model tree for junghan/News_category_segmentation

  • Base model: Qwen/Qwen3-8B-Base
  • Finetuned from: Qwen/Qwen3-8B
  • Adapter: this model