# KoalaSeg 🐨🛣️
KOrean lAyered assistive Segmentation
A Universal Segmentation model specialized for Korean road and pedestrian environments. Starting from a OneFormer teacher model based on `shi-labs/oneformer_cityscapes_swin_large`, the following layers were stacked, in order:
- Hand-annotated XML polygons
- A Korean-adapted model trained on AIHUB road/pedestrian-environment Surface Mask (5k) + Polygon (500) data
- Cityscapes masks
These were layer-ensembled in that order to generate GT, which was then used to distill an Edge-ViT 20 M student model (the ensembling idea is sketched below).
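The exact pipeline is not published; the following is a minimal sketch of the priority-stacking idea, where the helper name, the `ignore_id` convention, and the exact priority order are all assumptions:

```python
import numpy as np

def layer_ensemble(polygon_mask, kroad_mask, cityscapes_mask, ignore_id=255):
    """Stack three HxW class-id maps so higher-priority layers overwrite lower ones.

    Assumed priority (per the list above): XML polygons > K-Road model > Cityscapes.
    `ignore_id` marks pixels a layer leaves unlabeled (an assumed convention).
    """
    gt = cityscapes_mask.copy()               # lowest-priority base layer
    for layer in (kroad_mask, polygon_mask):  # overwrite with higher-priority labels
        labeled = layer != ignore_id
        gt[labeled] = layer[labeled]
    return gt
```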
## Model Details
- Developed by: Team RoadSight
- Base model: `shi-labs/oneformer_cityscapes_swin_large`
- Model type: Edge-ViT 20 M + OneFormer head (semantic task)
- Framework: 🤗 Transformers & PyTorch
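The card does not ship training code. As a rough illustration, distilling the student against the ensembled GT can be as simple as per-pixel cross-entropy on pseudo-labels; everything in this sketch (the plain-logits `student` interface, the loss form, `ignore_id`) is an assumption:

```python
import torch
import torch.nn.functional as F

def distill_step(student, images, gt_masks, optimizer, ignore_id=255):
    """One pseudo-label distillation step: fit the student with per-pixel
    cross-entropy against the layered-ensemble GT (assumed loss form)."""
    logits = student(images)  # (B, C, H, W) class logits; assumed interface
    loss = F.cross_entropy(logits, gt_masks, ignore_index=ignore_id)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```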
## Training Data
AIHUB sidewalk/pedestrian-environment dataset (https://aihub.or.kr/aihubdata/data/view.do?dataSetSn=189):
- Bounding Box: 350,000 images (box annotations for 29 obstacle classes)
- Polygon: 100,000 images (polygon annotations for 29 obstacle classes) → 500 used
- Surface Masking: 50,000 images (road-surface masks) → 5,000 used
- Depth Prediction: 170,000 images (stereo depth)
A total of 18,369 images (AIHUB 5.5k + self-captured 9k + Street View 3.7k) went through the layer ensemble, then morphological open/close and a 17 px median blur to produce the final GT (see the sketch below).
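A minimal OpenCV sketch of that post-processing step; the morphology kernel shape and size are assumptions (the card only fixes the 17 px median filter):

```python
import cv2
import numpy as np

def clean_gt(mask: np.ndarray, kernel_size: int = 5) -> np.ndarray:
    """Post-process an ensembled class-id mask: morphological open/close to
    drop speckles and fill small holes, then a 17 px median blur on edges."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # remove small specks
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # fill small holes
    return cv2.medianBlur(mask.astype(np.uint8), 17)        # 17 px median smoothing
```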
## Speeds & Sizes (512×512, batch=1)
| Device | Baseline Cityscapes | Ensemble (3-layer) | Custom (K-Road) | KoalaSeg |
|---|---|---|---|---|
| A100 | 3.58 s ≈ 0.28 FPS | 3.74 s ≈ 0.27 FPS | 0.15 s ≈ 6.67 FPS | 0.14 s ≈ 7.25 FPS |
| T4 | 5.61 s ≈ 0.18 FPS | 6.01 s ≈ 0.17 FPS | 0.39 s ≈ 2.60 FPS | 0.31 s ≈ 3.27 FPS |
| CPU (i9-12900K) | 124 s ≈ 0.008 FPS | 150 s ≈ 0.007 FPS | 26.6 s ≈ 0.038 FPS | 18.4 s ≈ 0.054 FPS |
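Numbers like these can be reproduced with a simple warm-up-then-time loop; the sketch below assumes `model` and `inputs` are prepared as in the Quick Start:

```python
import time
import torch

@torch.no_grad()
def benchmark(model, inputs, warmup=5, runs=20):
    """Report mean single-image latency and FPS over repeated forward passes."""
    for _ in range(warmup):          # warm up kernels and caches
        model(**inputs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()     # wait for queued GPU work before timing
    start = time.perf_counter()
    for _ in range(runs):
        model(**inputs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    latency = (time.perf_counter() - start) / runs
    print(f"{latency:.3f} s ≈ {1 / latency:.2f} FPS")
```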
## Quick Start
```python
from transformers import AutoProcessor, AutoModelForUniversalSegmentation
import torch, requests, matplotlib.pyplot as plt, numpy as np
from PIL import Image
from io import BytesIO

# 0. Load model & processor -----------------------------------
model_id = "gj5520/KoalaSeg"
proc = AutoProcessor.from_pretrained(model_id)
model = AutoModelForUniversalSegmentation.from_pretrained(model_id).to("cuda").eval()

# 1. Download image -------------------------------------------
url = "https://pds.joongang.co.kr/news/component/htmlphoto_mmdata/202205/21/1200738c-61c0-4a51-83c4-331f53d4dcdc.jpg"
resp = requests.get(url, stream=True)
img = Image.open(BytesIO(resp.content)).convert("RGB")

# 2. Pre-process & inference ----------------------------------
inputs = proc(images=img, task_inputs=["semantic"], return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model(**inputs)

# 3-A. Get class-id map ---------------------------------------
idmap = proc.post_process_semantic_segmentation(
    out, target_sizes=[img.size[::-1]]  # (height, width)
)[0].cpu().numpy()

# 3-B. Convert idmap → RGB mask + overlay ---------------------
cmap = plt.get_cmap("tab20", max(20, len(np.unique(idmap))))
mask_rgb = np.zeros((*idmap.shape, 3), dtype=np.uint8)
for idx, cid in enumerate(np.unique(idmap)):
    if cid == 0:  # keep background black
        continue
    mask_rgb[idmap == cid] = (np.array(cmap(idx)[:3]) * 255).astype(np.uint8)
mask_img = Image.fromarray(mask_rgb)
overlay = Image.blend(img, mask_img, alpha=0.6)  # alpha=0.6 emphasizes the mask

# 4. Show overlay ---------------------------------------------
plt.figure(figsize=(8, 8))
plt.imshow(overlay)
plt.axis("off")
plt.show()
```
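To see which classes were predicted, the id map can be decoded through `model.config.id2label` (standard 🤗 Transformers config metadata; whether this repo populates it is an assumption, hence the fallback). A small follow-up to the Quick Start:

```python
# Continues from the Quick Start: list predicted classes and their pixel share.
total = idmap.size
for cid in np.unique(idmap):
    label = model.config.id2label.get(int(cid), f"class_{cid}")
    print(f"{label:>20}: {100 * (idmap == cid).sum() / total:.1f}% of pixels")
```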
## Intended Uses
- Road segmentation for visually impaired pedestrians
- Support for Korean HD maps and road maintenance
- A Koreanized dataset benchmark for academic and research use
## Out-of-Scope
- Non-road domains such as medical, satellite, or indoor imagery
- Sensitive tasks such as personal identification or surveillance
## Limitations & Risks
- Korean roads only: performance degrades overseas and in extreme low light or heavy rain
- Detection of partially occluded people is unstable → use only as an assistive aid
## Citation
```bibtex
@misc{KoalaSeg2025,
  title  = {KoalaSeg: Layered Distillation for Korean Road Universal Segmentation},
  author = {RoadSight Team},
  year   = {2025},
  url    = {https://huggingface.co/gj5520/KoalaSeg}
}
```