# KoalaSeg 🐨🛣️
KOrean lAyered assistive Segmentation
A Universal Segmentation model specialized for Korean road and pedestrian environments. Starting from a OneFormer teacher model based on `shi-labs/oneformer_cityscapes_swin_large`, the following layers were stacked, in order:
- Hand-annotated XML polygons
- A Korean-adapted model trained on AIHUB road/pedestrian-environment Surface Mask (5k) + Polygon (500) data
- Cityscapes masks
These were layer-ensembled in that order to generate GT, which was then used to distill an Edge-ViT 20 M student model (the ensembling idea is sketched below).
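The exact pipeline is not published; the following is a minimal sketch of the priority-stacking idea, where the helper name, the `ignore_id` convention, and the exact priority order are all assumptions:

```python
import numpy as np

def layer_ensemble(polygon_mask, kroad_mask, cityscapes_mask, ignore_id=255):
    """Stack three HxW class-id maps so higher-priority layers overwrite lower ones.

    Assumed priority (per the list above): XML polygons > K-Road model > Cityscapes.
    `ignore_id` marks pixels a layer leaves unlabeled (an assumed convention).
    """
    gt = cityscapes_mask.copy()               # lowest-priority base layer
    for layer in (kroad_mask, polygon_mask):  # overwrite with higher-priority labels
        labeled = layer != ignore_id
        gt[labeled] = layer[labeled]
    return gt
```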
## Model Details
- Developed by: Team RoadSight
- Base model: `shi-labs/oneformer_cityscapes_swin_large`
- Model type: Edge-ViT 20 M + OneFormer head (semantic task)
- Framework: 🤗 Transformers & PyTorch
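The card does not ship training code. As a rough illustration, distilling the student against the ensembled GT can be as simple as per-pixel cross-entropy on pseudo-labels; everything in this sketch (the plain-logits `student` interface, the loss form, `ignore_id`) is an assumption:

```python
import torch
import torch.nn.functional as F

def distill_step(student, images, gt_masks, optimizer, ignore_id=255):
    """One pseudo-label distillation step: fit the student with per-pixel
    cross-entropy against the layered-ensemble GT (assumed loss form)."""
    logits = student(images)  # (B, C, H, W) class logits; assumed interface
    loss = F.cross_entropy(logits, gt_masks, ignore_index=ignore_id)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```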
## Training Data
AIHUB sidewalk/pedestrian-environment dataset (https://aihub.or.kr/aihubdata/data/view.do?dataSetSn=189):
- Bounding Box: 350,000 images (box annotations for 29 obstacle classes)
- Polygon: 100,000 images (polygon annotations for 29 obstacle classes) → 500 used
- Surface Masking: 50,000 images (road-surface masks) → 5,000 used
- Depth Prediction: 170,000 images (stereo depth)
A total of 18,369 images (AIHUB 5.5k + self-captured 9k + Street View 3.7k) went through the layer ensemble, then morphological open/close and a 17 px median blur to produce the final GT (see the sketch below).
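A minimal OpenCV sketch of that post-processing step; the morphology kernel shape and size are assumptions (the card only fixes the 17 px median filter):

```python
import cv2
import numpy as np

def clean_gt(mask: np.ndarray, kernel_size: int = 5) -> np.ndarray:
    """Post-process an ensembled class-id mask: morphological open/close to
    drop speckles and fill small holes, then a 17 px median blur on edges."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # remove small specks
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # fill small holes
    return cv2.medianBlur(mask.astype(np.uint8), 17)        # 17 px median smoothing
```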
## Speeds & Sizes (512×512, batch=1)
| Device | Baseline Cityscapes | Ensemble (3-layer) | Custom (K-Road) | KoalaSeg |
|---|---|---|---|---|
| A100 | 3.58 s ≈ 0.28 FPS | 3.74 s ≈ 0.27 FPS | 0.15 s ≈ 6.67 FPS | 0.14 s ≈ 7.25 FPS |
| T4 | 5.61 s ≈ 0.18 FPS | 6.01 s ≈ 0.17 FPS | 0.39 s ≈ 2.60 FPS | 0.31 s ≈ 3.27 FPS |
| CPU (i9-12900K) | 124 s ≈ 0.008 FPS | 150 s ≈ 0.007 FPS | 26.6 s ≈ 0.038 FPS | 18.4 s ≈ 0.054 FPS |
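Numbers like these can be reproduced with a simple warm-up-then-time loop; the sketch below assumes `model` and `inputs` are prepared as in the Quick Start:

```python
import time
import torch

@torch.no_grad()
def benchmark(model, inputs, warmup=5, runs=20):
    """Report mean single-image latency and FPS over repeated forward passes."""
    for _ in range(warmup):          # warm up kernels and caches
        model(**inputs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()     # wait for queued GPU work before timing
    start = time.perf_counter()
    for _ in range(runs):
        model(**inputs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    latency = (time.perf_counter() - start) / runs
    print(f"{latency:.3f} s ≈ {1 / latency:.2f} FPS")
```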
## Quick Start
```python
from transformers import AutoProcessor, AutoModelForUniversalSegmentation
import torch, requests, matplotlib.pyplot as plt, numpy as np
from PIL import Image
from io import BytesIO

# 0. Load model & processor -----------------------------------
model_id = "gj5520/KoalaSeg"
proc = AutoProcessor.from_pretrained(model_id)
model = AutoModelForUniversalSegmentation.from_pretrained(model_id).to("cuda").eval()

# 1. Download image -------------------------------------------
url = "https://pds.joongang.co.kr/news/component/htmlphoto_mmdata/202205/21/1200738c-61c0-4a51-83c4-331f53d4dcdc.jpg"
resp = requests.get(url, stream=True)
img = Image.open(BytesIO(resp.content)).convert("RGB")

# 2. Pre-process & inference ----------------------------------
inputs = proc(images=img, task_inputs=["semantic"], return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model(**inputs)

# 3-A. Get class-id map ---------------------------------------
idmap = proc.post_process_semantic_segmentation(
    out, target_sizes=[img.size[::-1]]  # (height, width)
)[0].cpu().numpy()

# 3-B. Convert idmap → RGB mask + overlay ---------------------
cmap = plt.get_cmap("tab20", max(20, len(np.unique(idmap))))
mask_rgb = np.zeros((*idmap.shape, 3), dtype=np.uint8)
for idx, cid in enumerate(np.unique(idmap)):
    if cid == 0:  # keep background black
        continue
    mask_rgb[idmap == cid] = (np.array(cmap(idx)[:3]) * 255).astype(np.uint8)
mask_img = Image.fromarray(mask_rgb)
overlay = Image.blend(img, mask_img, alpha=0.6)  # alpha=0.6 emphasizes the mask

# 4. Show overlay ---------------------------------------------
plt.figure(figsize=(8, 8))
plt.imshow(overlay)
plt.axis("off")
plt.show()
```
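To see which classes were predicted, the id map can be decoded through `model.config.id2label` (standard 🤗 Transformers config metadata; whether this repo populates it is an assumption, hence the fallback). A small follow-up to the Quick Start:

```python
# Continues from the Quick Start: list predicted classes and their pixel share.
total = idmap.size
for cid in np.unique(idmap):
    label = model.config.id2label.get(int(cid), f"class_{cid}")
    print(f"{label:>20}: {100 * (idmap == cid).sum() / total:.1f}% of pixels")
```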
## Intended Uses
- Road segmentation for visually impaired pedestrians
- Support for Korean HD maps and road maintenance
- A Koreanized dataset benchmark for academic and research use
## Out-of-Scope
- Non-road domains such as medical, satellite, or indoor imagery
- Sensitive tasks such as personal identification or surveillance
## Limitations & Risks
- Korean roads only: performance degrades overseas and in extreme low light or heavy rain
- Detection of partially occluded people is unstable → use only as an assistive aid
## Citation
```bibtex
@misc{KoalaSeg2025,
  title  = {KoalaSeg: Layered Distillation for Korean Road Universal Segmentation},
  author = {RoadSight Team},
  year   = {2025},
  url    = {https://huggingface.co/gj5520/KoalaSeg}
}
```