TableScope Structure Extractor 8B
Korean table structure extraction LoRA adapter for Qwen3-VL-8B, trained on 15k synthetic Korean tables with row-strip chunking.
A QLoRA fine-tuned adapter that automatically extracts **table structure (TableSchema JSON)** from images of Korean tables.
Key features
- QLoRA on top of Qwen3-VL-8B (4-bit NF4, r=64, alpha=128)
- Specialized for Korean tables: trained on 15,000 synthetic Korean table samples
- Row-strip Chunking: large tables that exceed 4096 tokens are split by rows and processed strip by strip
- Anti-Forgetting 2-Stage training: learns to handle large tables while retaining existing performance
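As a rough illustration of row-based splitting, the sketch below groups table rows into fixed-size strips and returns one crop box per strip. The card does not say how strips are cut or how the 4096-token limit maps to image size, so `row_strip_boxes`, the `row_edges` input (row boundary y-coordinates from some hypothetical row detector), and `rows_per_strip` are all assumptions made for illustration, not the project's actual implementation.

```python
def row_strip_boxes(row_edges, width, rows_per_strip):
    """Group consecutive table rows into strips and return crop boxes.

    row_edges: sorted y-coordinates of row boundaries (n_rows + 1 values).
               Hypothetical input; the card does not describe row detection.
    width: image width in pixels.
    rows_per_strip: assumed maximum number of rows per strip.
    Returns (left, upper, right, lower) boxes, the order PIL's Image.crop uses.
    """
    if rows_per_strip < 1:
        raise ValueError("rows_per_strip must be at least 1")
    n_rows = len(row_edges) - 1
    boxes = []
    for start in range(0, n_rows, rows_per_strip):
        end = min(start + rows_per_strip, n_rows)
        # Cut only on row boundaries so no row is split across strips.
        boxes.append((0, row_edges[start], width, row_edges[end]))
    return boxes


# Five rows of 40 px each, two rows per strip: the last strip holds one row.
example_boxes = row_strip_boxes([0, 40, 80, 120, 160, 200], width=800, rows_per_strip=2)
```

Cutting on row boundaries keeps every row intact but does nothing for very wide tables, since each strip still spans the full width.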
Performance
Chunked inference mode (full test set, 1,500 samples)
| Complexity | Count | TEDS | TEDS-S | CellAcc | ValidRate |
|---|---|---|---|---|---|
| Simple | 492 | 0.561 | 0.832 | 0.399 | 93.5% |
| Medium | 505 | 0.620 | 0.830 | 0.437 | 89.7% |
| Complex | 301 | 0.370 | 0.512 | 0.201 | 62.8% |
| Extreme | 202 | 0.108 | 0.147 | 0.079 | 36.6% |
| Overall | 1500 | 0.481 | 0.675 | 0.329 | 78.4% |
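The overall row is consistent with a count-weighted average of the per-complexity rows. The sketch below recomputes it from the rounded values in the table; it assumes (the card does not state this) that the overall score is a per-sample mean, which is equivalent to weighting each bucket by its sample count. Because the inputs are already rounded, the results agree with the table only to about the third decimal place.

```python
# Per-complexity rows from the table: count, TEDS, TEDS-S, CellAcc, ValidRate (%).
rows = [
    (492, 0.561, 0.832, 0.399, 93.5),  # Simple
    (505, 0.620, 0.830, 0.437, 89.7),  # Medium
    (301, 0.370, 0.512, 0.201, 62.8),  # Complex
    (202, 0.108, 0.147, 0.079, 36.6),  # Extreme
]


def weighted_mean(values, weights):
    """Mean of values where each value counts weights[i] times."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


counts = [r[0] for r in rows]
overall = {
    name: weighted_mean([r[i] for r in rows], counts)
    for i, name in enumerate(["TEDS", "TEDS-S", "CellAcc", "ValidRate"], start=1)
}

for name, value in overall.items():
    print(f"{name}: {value:.4f}")
```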
Standard vs. Chunked comparison (TEDS)
| Complexity | Standard | Chunked | Improvement |
|---|---|---|---|
| Simple | 0.561 | 0.561 | ±0 |
| Medium | 0.611 | 0.620 | +1.4% |
| Complex | 0.282 | 0.370 | +31% |
| Extreme | 0.048 | 0.108 | +125% |
| Overall | 0.453 | 0.481 | +6.2% |
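The improvement column reads as a relative gain, that is, (Chunked - Standard) / Standard. A minimal sketch of that arithmetic, using the rounded scores from the table, is shown below; small differences from the listed percentages (for example in the Medium row) are to be expected because the inputs are rounded to three decimals.

```python
# TEDS scores from the comparison table: (Standard, Chunked).
scores = {
    "Simple": (0.561, 0.561),
    "Medium": (0.611, 0.620),
    "Complex": (0.282, 0.370),
    "Extreme": (0.048, 0.108),
    "Overall": (0.453, 0.481),
}


def relative_gain_percent(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


gains = {name: relative_gain_percent(s, c) for name, (s, c) in scores.items()}

for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}%")
```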
Usage
Installation
```bash
pip install transformers peft bitsandbytes qwen-vl-utils
```
Inference
```python
from peft import PeftModel
from qwen_vl_utils import process_vision_info
from transformers import AutoProcessor, BitsAndBytesConfig, Qwen3VLForConditionalGeneration

# This repo contains only the LoRA adapter weights.
# The base model (Qwen/Qwen3-VL-8B-Instruct) is downloaded automatically by the code below.

# Load the base model with 4-bit quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
)
# A Qwen3-VL checkpoint needs the Qwen3-VL model class (not the Qwen2.5-VL one),
# which is only available in transformers releases that include Qwen3-VL support.
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# Load the LoRA adapter on top of the quantized base model.
model = PeftModel.from_pretrained(model, "cywellai/tablescope-structure-extractor-8b")
processor = AutoProcessor.from_pretrained("cywellai/tablescope-structure-extractor-8b")

# Inference. The prompts are kept in Korean, as in the original card.
# System: "You are an expert at extracting structure from table images.
#          Analyze the given table image and generate TableSchema JSON."
# User:   "Extract the structure of this table image as JSON."
messages = [
    {"role": "system", "content": "당신은 테이블 이미지에서 구조를 추출하는 전문가입니다. 주어진 테이블 이미지를 분석하여 TableSchema JSON을 생성하세요."},
    {"role": "user", "content": [
        {"type": "image", "image": "path/to/table.png"},
        {"type": "text", "text": "이 테이블 이미지의 구조를 JSON으로 추출하세요."},
    ]},
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs, return_tensors="pt"
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=4096)
# Drop the prompt tokens so only the newly generated text is decoded.
output = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(output)  # TableSchema JSON
```
Output format (TableSchema JSON)
In this example the column labels are 이름 (name), 나이 (age), and 직급 (job rank).
```json
{
  "col_headers": [
    {"labels": ["이름", "나이", "직급"], "spans": {}}
  ],
  "row_headers": [],
  "data": [
    [{"value": "김철수"}, {"value": "35"}, {"value": "대리"}],
    [{"value": "이영희"}, {"value": "42"}, {"value": "과장"}]
  ],
  "merged_regions": []
}
```
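Since generated text is not guaranteed to be well-formed, it can help to check the output before using it. The sketch below parses the model output and checks the four top-level keys seen in the example above. The helper name `parse_table_schema` and the specific checks are assumptions for illustration; the card does not define the full TableSchema or how ValidRate is computed, so this is not necessarily the project's validity criterion.

```python
import json

# Top-level keys taken from the example output above.
REQUIRED_KEYS = ("col_headers", "row_headers", "data", "merged_regions")


def parse_table_schema(text):
    """Return the parsed dict if text looks like a TableSchema, else None."""
    try:
        schema = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(schema, dict):
        return None
    if any(key not in schema for key in REQUIRED_KEYS):
        return None
    rows = schema["data"]
    # Assumed shape: a list of rows, each row a list of {"value": ...} cells.
    if not isinstance(rows, list):
        return None
    for row in rows:
        if not isinstance(row, list):
            return None
        if not all(isinstance(cell, dict) and "value" in cell for cell in row):
            return None
    return schema


sample = '{"col_headers": [], "row_headers": [], "data": [[{"value": "35"}]], "merged_regions": []}'
parsed = parse_table_schema(sample)
```

Returning `None` instead of raising keeps batch evaluation loops simple: invalid outputs can be counted and skipped.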
Training details
Anti-Forgetting 2-Stage training
| Item | Stage 1 | Stage 2 |
|---|---|---|
| Data | 12,000 full samples | 70% full + 30% chunked (17,142 samples) |
| Starting point | continued training from the v0.0.1 adapter | Stage 1 best adapter |
| LR | 5e-6 | 3e-6 |
| Epochs | 1 | 2 |
| Best eval_loss | 1.0154 | 1.0070 |
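The Stage 2 sample count is consistent with keeping the 12,000 full-table samples as the 70% share and adding chunked samples to fill the remaining 30%. The sketch below works through that arithmetic; the assumption that the "full" portion of Stage 2 is exactly the Stage 1 set is an inference from the numbers, not something the card states.

```python
FULL_SAMPLES = 12_000      # Stage 1 data size from the table
FULL_SHARE_PERCENT = 70    # share of full-table samples in Stage 2

# Total size such that the full samples make up 70% of it (integer division).
stage2_total = FULL_SAMPLES * 100 // FULL_SHARE_PERCENT
chunked_samples = stage2_total - FULL_SAMPLES

print(stage2_total, chunked_samples)
```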
QLoRA configuration
| Item | Value |
|---|---|
| Quantization | NF4 4-bit |
| LoRA rank | 64 |
| LoRA alpha | 128 |
| LoRA dropout | 0.05 |
| Trainable params | 174M / 8.9B (1.95%) |
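The trainable-parameter figure can be sanity-checked with the usual LoRA count: an adapted linear layer with input size d_in and output size d_out adds r * (d_in + d_out) parameters. The sketch below applies this to a decoder with dimensions that are assumptions on my part (hidden size 4096, MLP size 12288, 36 layers, 1024-wide key/value projections) and assumes all seven attention and MLP projections in the language model are adapted. None of this is stated in the card; the result landing near 174M only suggests that these assumptions are plausible.

```python
RANK = 64  # LoRA rank from the table

# Assumed text-decoder dimensions (not given in the card).
HIDDEN = 4096
INTERMEDIATE = 12288
KV_DIM = 1024      # assumed: 8 key/value heads x head dimension 128
NUM_LAYERS = 36


def lora_params(d_in, d_out, rank):
    """Parameters added by LoRA matrices A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Assumed target modules: every attention and MLP projection in each layer.
per_layer = (
    lora_params(HIDDEN, HIDDEN, RANK)          # q_proj
    + lora_params(HIDDEN, KV_DIM, RANK)        # k_proj
    + lora_params(HIDDEN, KV_DIM, RANK)        # v_proj
    + lora_params(HIDDEN, HIDDEN, RANK)        # o_proj
    + lora_params(HIDDEN, INTERMEDIATE, RANK)  # gate_proj
    + lora_params(HIDDEN, INTERMEDIATE, RANK)  # up_proj
    + lora_params(INTERMEDIATE, HIDDEN, RANK)  # down_proj
)
total_trainable = per_layer * NUM_LAYERS

print(f"{total_trainable:,} trainable parameters")
```

With alpha=128 and r=64, the LoRA update is scaled by alpha / r = 2.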
Training infrastructure
- GPU: NVIDIA H200 (143GB VRAM)
- Framework: transformers + peft + trl (SFTTrainer)
- Total training time: Stage 1 (1h 43m) + Stage 2 (5h 13m) = ~7 hours
Related resources
- Base model: Qwen/Qwen3-VL-8B-Instruct
- Project: TableScope: Korean Table Vision Agent
Limitations
- Trained only on synthetic Korean table data, so performance may degrade on photographed or scanned tables
- There is still room for improvement at the Complex and Extreme complexity levels
- Row-strip Chunking splits tables by rows, so it is of limited help when a table has very many columns