# TableScope Structure Extractor 8B
Korean table structure extraction LoRA adapter for Qwen3-VL-8B, trained on 15k synthetic Korean tables with row-strip chunking.
A QLoRA fine-tuned adapter that automatically extracts **table structure (TableSchema JSON)** from Korean table images.
## ✨ Key Features
- Qwen3-VL-8B base with QLoRA (4-bit NF4, r=64, alpha=128)
- Specialized for Korean tables: trained on 15,000 synthetic Korean table samples
- Row-strip chunking: tables exceeding 4,096 tokens are split and processed in row-wise strips
- Anti-forgetting 2-stage training: maximizes large-table capability while preserving existing performance
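The merge step behind row-strip chunking can be sketched as follows. `merge_chunks` is a hypothetical helper (not part of this repo) showing how per-strip TableSchema outputs could be recombined, assuming every strip shares the column headers of the first strip:

```python
def merge_chunks(chunks: list[dict]) -> dict:
    """Merge TableSchema dicts produced for consecutive row strips.

    Keeps the first strip's column headers and concatenates data rows.
    Illustrative only; span/merged-region handling is omitted.
    """
    merged = {
        "col_headers": chunks[0]["col_headers"],
        "row_headers": [],
        "data": [],
        "merged_regions": [],
    }
    for chunk in chunks:
        merged["data"].extend(chunk["data"])
        merged["row_headers"].extend(chunk.get("row_headers", []))
    return merged

# Two strips of the same (hypothetical) two-column table:
chunk_a = {"col_headers": [{"labels": ["A", "B"], "spans": {}}],
           "row_headers": [], "data": [[{"value": "1"}, {"value": "2"}]],
           "merged_regions": []}
chunk_b = {"col_headers": [{"labels": ["A", "B"], "spans": {}}],
           "row_headers": [], "data": [[{"value": "3"}, {"value": "4"}]],
           "merged_regions": []}
print(len(merge_chunks([chunk_a, chunk_b])["data"]))  # 2 data rows total
```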
## 📊 Performance

### Chunked inference mode (full 1,500-sample test set)

| Complexity | Samples | TEDS | TEDS-S | CellAcc | ValidRate |
|---|---|---|---|---|---|
| Simple | 492 | 0.561 | 0.832 | 0.399 | 93.5% |
| Medium | 505 | 0.620 | 0.830 | 0.437 | 89.7% |
| Complex | 301 | 0.370 | 0.512 | 0.201 | 62.8% |
| Extreme | 202 | 0.108 | 0.147 | 0.079 | 36.6% |
| Overall | 1,500 | 0.481 | 0.675 | 0.329 | 78.4% |
### Standard vs. Chunked (TEDS)

| Complexity | Standard | Chunked | Change |
|---|---|---|---|
| Simple | 0.561 | 0.561 | ±0 |
| Medium | 0.611 | 0.620 | +1.4% |
| Complex | 0.282 | 0.370 | +31% |
| Extreme | 0.048 | 0.108 | +125% |
| Overall | 0.453 | 0.481 | +6.2% |
## 🚀 Usage

### Installation

```bash
pip install transformers peft bitsandbytes qwen-vl-utils
```
### Inference

```python
import torch
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor, BitsAndBytesConfig
from qwen_vl_utils import process_vision_info

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Qwen3-VL cannot be loaded through the Qwen2.5-VL class;
# AutoModelForImageTextToText resolves the correct architecture.
model = AutoModelForImageTextToText.from_pretrained(
    "Qwen/Qwen3-VL-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "cywellai/tablescope-structure-extractor-8b")
processor = AutoProcessor.from_pretrained("cywellai/tablescope-structure-extractor-8b")

messages = [
    # The Korean prompts match the training setup. System prompt: "You are an
    # expert at extracting structure from table images. Analyze the given table
    # image and generate TableSchema JSON."
    {"role": "system", "content": "당신은 테이블 이미지에서 구조를 추출하는 전문가입니다. 주어진 테이블 이미지를 분석하여 TableSchema JSON을 생성하세요."},
    {"role": "user", "content": [
        {"type": "image", "image": "path/to/table.png"},
        # "Extract the structure of this table image as JSON."
        {"type": "text", "text": "이 테이블 이미지의 구조를 JSON으로 추출하세요."},
    ]},
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=4096)
output = processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
print(output)
```
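The generated text should be a JSON object, but models sometimes wrap JSON in a markdown code fence. A small defensive parser (illustrative only, not shipped with the adapter) that extracts and decodes the object from the raw output:

```python
import json
import re

def parse_schema(raw: str) -> dict:
    """Extract the TableSchema JSON object from raw model output.

    Tolerates a markdown code fence or surrounding prose by taking the
    outermost {...} span. Hypothetical helper for illustration.
    """
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

# Works whether or not the output is fenced:
fenced = '```json\n{"col_headers": [], "row_headers": [], "data": [], "merged_regions": []}\n```'
schema = parse_schema(fenced)
print(sorted(schema))
```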
## 🗂️ Output Format (TableSchema JSON)

```json
{
  "col_headers": [
    {"labels": ["이름", "나이", "직급"], "spans": {}}
  ],
  "row_headers": [],
  "data": [
    [{"value": "김철수"}, {"value": "35"}, {"value": "대리"}],
    [{"value": "이영희"}, {"value": "42"}, {"value": "과장"}]
  ],
  "merged_regions": []
}
```

(The example headers are Name / Age / Title, with two employee rows.)
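To consume the schema downstream, it can be flattened into plain rows. `schema_to_rows` is a hypothetical helper (not part of this repo), assuming no merged regions and ignoring spans:

```python
def schema_to_rows(schema: dict) -> list[list[str]]:
    """Flatten a TableSchema dict into header rows plus data rows of strings.

    Illustrative only: span and merged-region handling is omitted.
    """
    rows = [list(h["labels"]) for h in schema["col_headers"]]
    for data_row in schema["data"]:
        rows.append([cell["value"] for cell in data_row])
    return rows

# The example schema with Korean Name/Age/Title headers:
example = {
    "col_headers": [{"labels": ["이름", "나이", "직급"], "spans": {}}],
    "row_headers": [],
    "data": [
        [{"value": "김철수"}, {"value": "35"}, {"value": "대리"}],
        [{"value": "이영희"}, {"value": "42"}, {"value": "과장"}],
    ],
    "merged_regions": [],
}
print(schema_to_rows(example))  # header row followed by two data rows
```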
## 🔧 Training Details

### Anti-forgetting 2-stage training

| Item | Stage 1 | Stage 2 |
|---|---|---|
| Data | 12,000 full samples | 70% full + 30% chunked (17,142 samples) |
| Starting point | continued from the v0.0.1 adapter | Stage 1 best adapter |
| LR | 5e-6 | 3e-6 |
| Epochs | 1 | 2 |
| Best eval_loss | 1.0154 | 1.0070 |
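As a quick sanity check on the Stage 2 mix (assuming the 12,000 full samples are carried over in full and the remainder are chunked):

```python
total = 17_142          # Stage 2 dataset size
full = 12_000           # full samples carried over from Stage 1
chunked = total - full  # remainder are chunked samples: 5,142
print(chunked, round(chunked / total, 3))  # ~30% of the mix
```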
### QLoRA settings

| Item | Value |
|---|---|
| Quantization | NF4 4-bit |
| LoRA rank | 64 |
| LoRA alpha | 128 |
| LoRA dropout | 0.05 |
| Trainable params | 174M / 8.9B (1.95%) |
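The settings above correspond to a peft/bitsandbytes configuration along these lines. This is a sketch only; in particular, `target_modules` is an assumption, since the card does not list which modules were adapted:

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# NF4 4-bit quantization, as listed in the table above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    # Assumed attention projections; the card does not specify target modules.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```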
### Training infrastructure

- GPU: NVIDIA H200 (143 GB VRAM)
- Framework: transformers + peft + trl (SFTTrainer)
- Total training time: Stage 1 (1h 43m) + Stage 2 (5h 13m) ≈ 7 hours
## 📦 Related Resources
## ⚠️ Limitations

- Trained only on synthetic Korean table data, so performance may degrade on real-world photos or scanned tables
- Still room for improvement at Complex/Extreme complexity levels
- Row-strip chunking splits by rows, so tables with very many columns remain a limitation