Model Card for qwen2.5-3b-memory-summary-v1

This model is an early experimental LoRA fine-tuned version of Qwen2.5-3B-Instruct for session memory summarization.
The v1 checkpoint was trained as an initial baseline, before the training pipeline was fixed. Its outputs were unstable and frequently degenerated into repetitive, template-like text instead of clean memory summaries.

Model Details

Model Description

This model is a supervised fine-tuning (SFT) baseline built for a conversational memory summarization task.
Its intended role is to compress prior dialogue context into a concise memory summary that can be reused in later turns of a multi-turn chatbot system.

However, v1 was trained before key training pipeline fixes were applied. As a result, the model often failed to generate usable summaries and instead produced repetitive or malformed outputs, such as repeated prompt-template fragments.

Developed by: 김예슬
Shared by: 김예슬
Model type: Causal Language Model with LoRA fine-tuning
Language(s): English
License: Apache-2.0
Finetuned from model: Qwen/Qwen2.5-3B-Instruct

Model Sources

Repository: yeseul0-0/qwen2.5-3b-memory-summary-v1 (Hugging Face model repository)
Base model: Qwen/Qwen2.5-3B-Instruct
Paper: Not applicable
Demo: Not available

Uses

Direct Use

This model was intended for:

session memory summarization
multi-turn chatbot memory compression
summarizing recent dialogue into a short reusable narrative

A typical use case is:

input: recent conversation context, optional previous memory, optional active document context
output: a concise memory summary for future turns
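
The input/output contract above can be sketched as a small prompt-building helper. This is a minimal illustration, not the layout used during training: the function name `build_summary_messages`, the section labels ("Previous memory", "Active document context", "Recent conversation"), and the ordering of sections are all assumptions. Only the system and instruction strings are taken from the quick-start example in this card.

```python
from typing import Optional


def build_summary_messages(
    recent_turns: list[dict[str, str]],
    previous_memory: Optional[str] = None,
    document_context: Optional[str] = None,
) -> list[dict[str, str]]:
    """Assemble chat messages that ask for a memory summary (hypothetical layout)."""
    sections = []
    # Optional inputs are included only when they are provided.
    if previous_memory:
        sections.append(f"Previous memory:\n{previous_memory}")
    if document_context:
        sections.append(f"Active document context:\n{document_context}")
    # Flatten the recent turns into "role: content" lines.
    dialogue = "\n".join(f"{t['role']}: {t['content']}" for t in recent_turns)
    sections.append(f"Recent conversation:\n{dialogue}")
    sections.append(
        "Summarize the recent conversation into concise memory for future turns."
    )
    return [
        {"role": "system", "content": "You are a session-memory summarization model."},
        {"role": "user", "content": "\n\n".join(sections)},
    ]
```

The returned list can be passed to `tokenizer.apply_chat_template` in the same way as the `messages` list in the quick-start code.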

Downstream Use

Potential downstream use includes:

memory module in a multi-turn chatbot
summarization component in a RAG-based assistant
lightweight experimental memory updater for conversational AI pipelines

Out-of-Scope Use

This v1 model should not be used for:

production deployment
factual summarization requiring high reliability
safety-critical domains
automated user profiling
any setting where repetitive or malformed generation is unacceptable

Bias, Risks, and Limitations

This model has major technical limitations in its v1 state.

Known issues observed in the pre-fix version include:

repetitive degenerate generation
prompt-template leakage into outputs
unstable summary quality
poor alignment between training format and inference format
weak reliability for structured memory summarization

Because the model was trained before several important fixes, its output quality should be treated as baseline-only and not production-ready.

Recommendations

Recommended usage for this v1 model:

use only for debugging or baseline comparison
compare against later improved versions
do not rely on its summaries without manual inspection
use it primarily to document failure cases and training pipeline issues

How to Get Started with the Model

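from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yeseul0-0/qwen2.5-3b-memory-summary-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Chat-style input. In practice the user message should also contain
# the conversation that is to be summarized.
messages = [
    {"role": "system", "content": "You are a session-memory summarization model."},
    {"role": "user", "content": "Summarize the recent conversation into concise memory for future turns."}
]

# Render the messages with the chat template and append the assistant header.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Keep the inputs on the same device as the model.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens so the prompt is not echoed back.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))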

Training Details

Training Data

The model was trained on a mixed summarization-style dataset composed of public dialogue and meeting summarization sources, along with a small amount of synthetic memory-style data.

Training data sources included:

DialogSum
SAMSum
QMSum
small synthetic session-memory examples

The goal was to adapt a general instruct model into a memory summarizer. However, the synthetic memory-specific portion was too limited compared with the general summarization data, which likely contributed to weak task alignment.

Training Procedure

This model was trained using LoRA-based supervised fine-tuning.

Preprocessing

Training examples were formatted into chat-style conversations using a system / user / assistant structure.

In the v1 pipeline, the full chat text was converted into a single training sequence. Before later fixes, this created several issues:

the model learned from full prompt text rather than focusing only on the target completion
system and user prompt tokens likely contributed to overfitting on template patterns
training format and actual inference format were not well aligned
long inputs may not have been handled as intended because the effective training max length was not clearly enforced in the SFT configuration
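
The first and last issues above can be illustrated with a minimal sketch of completion-only label masking under an explicit length limit. It does not reproduce the v1 pipeline or any later fix. The helper name `mask_prompt_labels`, the choice to truncate the prompt from the left, and the toy token IDs are assumptions made for illustration. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch cross-entropy by default


def mask_prompt_labels(
    prompt_ids: list[int], completion_ids: list[int], max_length: int
) -> tuple[list[int], list[int]]:
    """Return (input_ids, labels) in which only completion tokens carry loss."""
    # Assumption: when the sequence is too long, drop the oldest prompt tokens
    # so that the target completion is preserved.
    budget = max(max_length - len(completion_ids), 0)
    kept_prompt = prompt_ids[-budget:] if budget > 0 else []
    kept_completion = completion_ids[:max_length]
    input_ids = kept_prompt + kept_completion
    # Prompt positions are masked out; completion positions keep their token IDs.
    labels = [IGNORE_INDEX] * len(kept_prompt) + list(kept_completion)
    return input_ids, labels


input_ids, labels = mask_prompt_labels([1, 2, 3, 4], [9, 8], max_length=4)
print(input_ids)  # [3, 4, 9, 8]
print(labels)     # [-100, -100, 9, 8]
```

With labels built this way, system and user tokens still condition the model but contribute nothing to the loss, which is the separation that v1 lacked.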

Training Hyperparameters

Training regime: LoRA + mixed precision fine-tuning
Base model: Qwen/Qwen2.5-3B-Instruct
Fine-tuning method: SFT with LoRA
Notable v1 characteristics:
- early baseline configuration
- short training schedule
- packing enabled in the original version
- completion-only supervision not properly separated before fixes
- training/inference format mismatch remained unresolved in v1
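
As a rough illustration of why packing can interact badly with the other items in this list, the sketch below packs tokenized examples into fixed-size blocks in the simplest possible way. It is not TRL's packing implementation, and `pack_sequences` and the toy token IDs are hypothetical. The point is only that a packed block can mix the end of one example with the start of another, which makes prompt/completion boundaries harder to respect.

```python
def pack_sequences(examples: list[list[int]], block_size: int) -> list[list[int]]:
    """Naively pack tokenized examples into fixed-size blocks (illustrative only)."""
    # Concatenate all examples into one token stream, ignoring example boundaries.
    stream = [tok for ex in examples for tok in ex]
    # Cut the stream into consecutive blocks of at most block_size tokens.
    return [stream[i:i + block_size] for i in range(0, len(stream), block_size)]


examples = [[1, 2, 3], [4, 5, 6, 7], [8, 9]]
blocks = pack_sequences(examples, block_size=4)
print(blocks)  # [[1, 2, 3, 4], [5, 6, 7, 8], [9]]
```

In this toy case the second example is split across two blocks, and each full block contains tokens from two different examples.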

Speeds, Sizes, Times

This model card currently describes the qualitative state of the v1 baseline. Exact runtime, throughput, and wall-clock training duration were not recorded here.

Evaluation

Testing Data, Factors & Metrics

Testing Data

Evaluation was performed mainly through small-scale qualitative generation checks and baseline summary inspection.

Factors

The main factors affecting evaluation in v1 were:

long-context handling
prompt formatting consistency
memory-style summarization alignment
repetition and degeneration during generation

Metrics

The v1 checkpoint was mainly assessed through:

qualitative generation inspection
failure-case analysis
summary usability
repetition / degeneration checking
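
The v1 checks were qualitative, but a repetition check of this kind can also be approximated with a simple score. The sketch below is one possible heuristic, not the procedure used for this model. The function name `repeated_ngram_ratio`, whitespace tokenization, and the default n-gram size are assumptions.

```python
from collections import Counter


def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that repeat an earlier n-gram (0.0 means none)."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(ngrams)
```

A clean summary should score near 0.0, while a degenerate output such as a looping phrase scores much higher. Any threshold for flagging an output would have to be chosen empirically.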

Results

The pre-fix v1 model did not produce stable or reliable memory summaries.

Observed behavior included:

repetitive outputs
template-like text leakage
malformed summary generation
failure to consistently return concise, usable memory summaries

In particular, the model sometimes generated obviously broken repetitive strings instead of meaningful summaries, indicating that the initial training setup was not yet appropriate for the target task.
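
Template leakage of the kind described above can be flagged with a simple substring check. This is a minimal sketch under assumptions: the helper `find_template_leakage` is hypothetical, and the marker list covers only the `<|im_start|>` and `<|im_end|>` tokens of the Qwen2.5 chat format, plus any prompt snippets the caller supplies. Because these markers are special tokens, the check is meaningful only on text decoded with `skip_special_tokens=False`.

```python
TEMPLATE_MARKERS = ("<|im_start|>", "<|im_end|>")


def find_template_leakage(
    output: str, prompt_snippets: tuple[str, ...] = ()
) -> list[str]:
    """Return the markers or prompt snippets that appear verbatim in an output."""
    candidates = TEMPLATE_MARKERS + tuple(prompt_snippets)
    # Empty snippets are skipped because they would match every string.
    return [c for c in candidates if c and c in output]
```

For example, passing the system prompt as a snippet makes it possible to detect outputs that copy the instruction text instead of summarizing the conversation.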

Summary

This v1 checkpoint should be regarded as a failed or incomplete baseline rather than a successful task model. Its main value is diagnostic:

it shows what went wrong in the early training pipeline
it motivates later fixes such as completion-only loss, disabling packing, improving max-length handling, and aligning training format with inference format

Model Examination

The v1 model is useful for examining common failure modes in early instruction-style fine-tuning for memory summarization tasks, especially:

prompt-copying behavior
repetition collapse
template-token overlearning
weak task adaptation from general summarization data

Environmental Impact

Carbon emissions were not measured for this experiment.

Hardware Type: Not fully documented
Hours used: Not fully documented
Cloud Provider: Colab / GPU notebook environment assumed during experimentation
Compute Region: Not documented
Carbon Emitted: Not measured

Technical Specifications

Model Architecture and Objective

Architecture: Qwen2.5-3B-Instruct (decoder-only Transformer)
Objective: causal language modeling with supervised fine-tuning
Adaptation method: LoRA
Target task: session memory summarization for multi-turn conversation systems

Compute Infrastructure

The model was trained in a notebook-based GPU environment.

Hardware

GPU-based notebook environment. Exact device specification for this v1 model card is not finalized.

Software

Python
Transformers
TRL
PEFT
PyTorch
Hugging Face Hub

Citation

BibTeX:

@misc{kim2026qwen25memoryv1,
  title={qwen2.5-3b-memory-summary-v1},
  author={Kim, Yeseul},
  year={2026},
  howpublished={Hugging Face Model Repository},
  note={Experimental LoRA fine-tuned baseline for session memory summarization}
}

APA:

Kim, Y. (2026). qwen2.5-3b-memory-summary-v1 [Model]. Hugging Face.

More Information

This repository documents an early baseline before major training fixes were applied. Later model versions are expected to address:

completion-only supervision
prompt/completion separation
max-length control
packing removal
memory-format data alignment

Model Card Authors

김예슬

Model Card Contact

김예슬
