TableScope Structure Extractor 8B

Korean table structure extraction LoRA adapter for Qwen3-VL-8B, trained on 15k synthetic Korean tables with row-strip chunking.

A QLoRA fine-tuned adapter that automatically extracts **table structure (TableSchema JSON)** from Korean table images.

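✨ Key Features

Built on Qwen3-VL-8B with QLoRA (4-bit NF4, r=64, alpha=128)
Specialized for Korean tables: trained on 15,000 synthetic Korean tables
Row-strip Chunking: large tables that exceed 4096 tokens are split and processed in row-wise strips
Anti-Forgetting 2-Stage training: learns to handle large tables while preserving existing performance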
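
The row-strip idea can be sketched as a greedy planner that packs consecutive rows into strips under a token budget. This is only an illustration, not the adapter's actual chunking code: `plan_row_strips`, the per-row token estimates, and the choice to count the header tokens again in every strip are assumptions made for the example. Only the 4096-token limit comes from this card.

```python
from typing import List, Tuple


def plan_row_strips(
    row_tokens: List[int], header_tokens: int, budget: int = 4096
) -> List[Tuple[int, int]]:
    """Greedily group consecutive rows into strips that fit the budget.

    Returns (start, end) row-index pairs with end exclusive. Every strip is
    assumed to repeat the header, so header_tokens is charged once per strip.
    """
    strips: List[Tuple[int, int]] = []
    start = 0
    used = header_tokens
    for i, cost in enumerate(row_tokens):
        if header_tokens + cost > budget:
            raise ValueError(f"row {i} alone exceeds the token budget")
        # Close the current strip when adding this row would overflow it.
        if used + cost > budget and i > start:
            strips.append((start, i))
            start = i
            used = header_tokens
        used += cost
    if start < len(row_tokens):
        strips.append((start, len(row_tokens)))
    return strips


# Hypothetical per-row token estimates for a five-row table.
strips = plan_row_strips([1500, 1500, 1500, 1500, 500], header_tokens=200)
print(strips)
```

Each strip would then be rendered and sent to the model separately, and the per-strip results combined afterwards. How the real pipeline crops the image and merges outputs is not described here.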

📊 Performance

Chunked inference mode (full test set, 1,500 samples)

Complexity	Count	TEDS	TEDS-S	CellAcc	ValidRate
Simple	492	0.561	0.832	0.399	93.5%
Medium	505	0.620	0.830	0.437	89.7%
Complex	301	0.370	0.512	0.201	62.8%
Extreme	202	0.108	0.147	0.079	36.6%
Overall	1500	0.481	0.675	0.329	78.4%
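
The overall row is consistent with a count-weighted mean of the four complexity buckets. The card does not state how the overall figures were aggregated, so treat the weighting below as an assumption. The numbers are copied from the table.

```python
# (count, TEDS, TEDS-S, CellAcc, ValidRate %) per complexity bucket,
# copied from the table above.
rows = {
    "Simple": (492, 0.561, 0.832, 0.399, 93.5),
    "Medium": (505, 0.620, 0.830, 0.437, 89.7),
    "Complex": (301, 0.370, 0.512, 0.201, 62.8),
    "Extreme": (202, 0.108, 0.147, 0.079, 36.6),
}
metric_names = ["TEDS", "TEDS-S", "CellAcc", "ValidRate"]


def weighted_mean(column: int) -> float:
    """Average one metric column, weighting each bucket by its sample count."""
    total = sum(r[0] for r in rows.values())
    return sum(r[0] * r[column] for r in rows.values()) / total


overall = {name: weighted_mean(i) for i, name in enumerate(metric_names, start=1)}
for name, value in overall.items():
    print(f"{name}: {value:.3f}")
```

Because the per-bucket inputs are already rounded to three decimals, the recomputed values can differ from the reported row in the last digit.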

Standard vs. Chunked comparison (TEDS)

Complexity	Standard	Chunked	Improvement
Simple	0.561	0.561	±0
Medium	0.611	0.620	+1.4%
Complex	0.282	0.370	+31%
Extreme	0.048	0.108	+125%
Overall	0.453	0.481	+6.2%
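
The improvement column reads as a relative gain over the Standard score, that is, (Chunked - Standard) / Standard. A short sketch that recomputes it from the rounded TEDS values in the table follows. `relative_gain` is a helper name introduced for this example.

```python
# (Standard TEDS, Chunked TEDS) per complexity bucket, copied from the table.
teds = {
    "Simple": (0.561, 0.561),
    "Medium": (0.611, 0.620),
    "Complex": (0.282, 0.370),
    "Extreme": (0.048, 0.108),
    "Overall": (0.453, 0.481),
}


def relative_gain(standard: float, chunked: float) -> float:
    """Relative improvement of chunked over standard, in percent."""
    return (chunked - standard) / standard * 100


gains = {name: relative_gain(s, c) for name, (s, c) in teds.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}%")
```

Recomputing from the rounded table values gives roughly 1.5% for Medium rather than the reported +1.4%. The reported figure was presumably derived from unrounded scores.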

🚀 Usage

Installation

pip install transformers peft bitsandbytes qwen-vl-utils

Inference

from peft import PeftModel
from transformers import AutoProcessor, BitsAndBytesConfig, Qwen3VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

# This repo contains only the LoRA adapter weights.
# The base model (Qwen/Qwen3-VL-8B-Instruct) is downloaded automatically by the code below.

# Load the base model with 4-bit quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
)

# Qwen3-VL checkpoints use the Qwen3-VL model class, not the Qwen2.5-VL one.
# This requires a transformers release that includes Qwen3-VL support.
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# Load the LoRA adapter on top of the quantized base model.
model = PeftModel.from_pretrained(model, "cywellai/tablescope-structure-extractor-8b")
processor = AutoProcessor.from_pretrained("cywellai/tablescope-structure-extractor-8b")

# Inference. The Korean prompts below are model input and are left untranslated.
messages = [
    {"role": "system", "content": "당신은 테이블 이미지에서 구조를 추출하는 전문가입니다. 주어진 테이블 이미지를 분석하여 TableSchema JSON을 생성하세요."},
    {"role": "user", "content": [
        {"type": "image", "image": "path/to/table.png"},
        {"type": "text", "text": "이 테이블 이미지의 구조를 JSON으로 추출하세요."},
    ]},
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=4096)
output = processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
print(output)  # TableSchema JSON
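
The ValidRate column in the performance table suggests that not every generation is usable as JSON, so it is worth parsing the output defensively. The helper below is a minimal sketch and not part of this repo: `parse_table_schema` is a hypothetical name, and taking the outermost `{...}` span is an assumption about how stray text around the JSON might look.

```python
import json
from typing import Optional


def parse_table_schema(raw: str) -> Optional[dict]:
    """Parse the outermost {...} span of raw, or return None if that fails."""
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(raw[start : end + 1])
    except json.JSONDecodeError:
        # Typical for generations cut off by max_new_tokens.
        return None
    return parsed if isinstance(parsed, dict) else None


ok = parse_table_schema('Result: {"col_headers": [], "data": []} done')
truncated = parse_table_schema('{"data": [[{"value": "a"}')
print(ok, truncated)
```

A `None` result can be treated as an invalid output, for example by retrying with a larger `max_new_tokens`.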

🏗️ Output Format (TableSchema JSON)

{
  "col_headers": [
    {"labels": ["이름", "나이", "직급"], "spans": {}}
  ],
  "row_headers": [],
  "data": [
    [{"value": "김철수"}, {"value": "35"}, {"value": "대리"}],
    [{"value": "이영희"}, {"value": "42"}, {"value": "과장"}]
  ],
  "merged_regions": []
}
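
To show how the fields fit together, the sketch below flattens a TableSchema dict into a plain grid of strings. It relies only on the `col_headers[].labels` and `data[][].value` fields visible in the example above. `schema_to_grid` and `is_rectangular` are hypothetical helpers, and `spans`, `row_headers`, and `merged_regions` are ignored because their formats are not documented here.

```python
from typing import List


def schema_to_grid(schema: dict) -> List[List[str]]:
    """Return header label rows followed by data rows, all as strings."""
    grid = [list(h.get("labels", [])) for h in schema.get("col_headers", [])]
    for row in schema.get("data", []):
        grid.append([str(cell.get("value", "")) for cell in row])
    return grid


def is_rectangular(grid: List[List[str]]) -> bool:
    """True when every row has the same number of cells."""
    return len({len(row) for row in grid}) <= 1


# The example schema from this section.
schema = {
    "col_headers": [{"labels": ["이름", "나이", "직급"], "spans": {}}],
    "row_headers": [],
    "data": [
        [{"value": "김철수"}, {"value": "35"}, {"value": "대리"}],
        [{"value": "이영희"}, {"value": "42"}, {"value": "과장"}],
    ],
    "merged_regions": [],
}
grid = schema_to_grid(schema)
for row in grid:
    print("\t".join(row))
```

A rectangularity check like `is_rectangular(grid)` is a cheap sanity test before exporting to CSV or a DataFrame.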

🔧 Training Details

Anti-Forgetting 2-Stage Training

Item	Stage 1	Stage 2
Data	12,000 full samples	70% full + 30% chunked (17,142 samples)
Starting point	Continued training from the v0.0.1 adapter	Best adapter from Stage 1
LR	5e-6	3e-6
Epochs	1	2
Best eval_loss	1.0154	1.0070
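
The Stage 2 sample count is consistent with keeping all 12,000 full-table samples and adding chunked samples until the full samples make up 70% of the mix. The card does not spell this out, so the sketch below is an assumption about how the 17,142 figure arises. `stage2_mix` is a name introduced for the example.

```python
from typing import Tuple


def stage2_mix(n_full: int, full_ratio: float = 0.7) -> Tuple[int, int]:
    """Return (total, n_chunked) when n_full samples form full_ratio of the mix."""
    total = int(n_full / full_ratio)  # round down to a whole sample
    return total, total - n_full


total, n_chunked = stage2_mix(12_000)
print(total, n_chunked)
```

Under that assumption the mix holds 5,142 chunked samples, and replaying the full-table data alongside them is what the anti-forgetting label refers to.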

QLoRA Configuration

Item	Value
Quantization	NF4 4-bit
LoRA rank	64
LoRA alpha	128
LoRA dropout	0.05
Trainable params	174M / 8.9B (1.95%)
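
As a quick check on these settings: standard LoRA scales the low-rank update by alpha / r, and each adapted linear layer adds r * (d_in + d_out) parameters. The sketch below works through that arithmetic. The 4096 x 4096 layer shape is purely illustrative, since the card does not list which modules were adapted.

```python
def lora_scaling(alpha: int, r: int) -> float:
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / r


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


scaling = lora_scaling(alpha=128, r=64)
per_layer = lora_params(d_in=4096, d_out=4096, r=64)  # hypothetical layer shape
trainable_fraction = 174e6 / 8.9e9  # figures from the table above

print(f"scaling={scaling}, per_layer={per_layer}, trainable={trainable_fraction:.2%}")
```

With r=64 and alpha=128 the update is scaled by 2, and the trainable share comes out just under 2%, in line with the table.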

Training Infrastructure

GPU: NVIDIA H200 (143GB VRAM)
Framework: transformers + peft + trl (SFTTrainer)
Total training time: Stage 1 (1h 43m) + Stage 2 (5h 13m) = ~7 hours

📦 Related Resources

Base model: Qwen/Qwen3-VL-8B-Instruct
Project: TableScope (Korean Table Vision Agent)

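⚠️ Limitations

Trained only on synthetic Korean table data, so performance may degrade on photographed or scanned tables
Complex and Extreme tables still leave room for improvement
Row-strip Chunking splits by rows, so it is of limited help for tables with very many columns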
