Model Card for qwen2.5-3b-memory-summary-v1

This model is an early experimental LoRA fine-tuned version of Qwen2.5-3B-Instruct for session memory summarization.
The v1 checkpoint was trained as an initial baseline before fixes to the training pipeline were applied. Its outputs were unstable and frequently degenerated into repetitive, template-like text instead of clean memory summaries.

Model Details

Model Description

This model is a supervised fine-tuning (SFT) baseline built for a conversational memory summarization task.
Its intended role is to compress prior dialogue context into a concise memory summary that can be reused in later turns of a multi-turn chatbot system.

However, this v1 version was trained before key training pipeline fixes were applied. As a result, the model often failed to generate usable summaries and instead produced repetitive or malformed outputs such as repeated prompt-template fragments.

  • Developed by: Kim Yeseul (김예슬)
  • Shared by: Kim Yeseul (김예슬)
  • Model type: Causal Language Model with LoRA fine-tuning
  • Language(s): English
  • License: Apache-2.0
  • Finetuned from model: Qwen/Qwen2.5-3B-Instruct

Model Sources

  • Repository: Hugging Face model repository for qwen2.5-3b-memory-summary-v1
  • Base model: Qwen/Qwen2.5-3B-Instruct
  • Paper: Not applicable
  • Demo: Not available

Uses

Direct Use

This model was intended for:

  • session memory summarization
  • multi-turn chatbot memory compression
  • summarizing recent dialogue into a short reusable narrative

A typical use case is:

  • input: recent conversation context, optional previous memory, optional active document context
  • output: a concise memory summary for future turns
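The card does not document the exact prompt layout used in training, so the following is only a sketch of how the three inputs above might be assembled into chat messages. The function name `build_memory_messages` and the section labels (`Previous memory:`, `Active document:`, `Recent conversation:`) are hypothetical; the system and instruction strings are taken from the Getting Started example below.

```python
from typing import Optional

# System and instruction strings taken from the card's Getting Started example
SYSTEM_PROMPT = "You are a session-memory summarization model."
INSTRUCTION = "Summarize the recent conversation into concise memory for future turns."


def build_memory_messages(
    recent_turns: list[dict],
    previous_memory: Optional[str] = None,
    document_context: Optional[str] = None,
) -> list[dict]:
    """Assemble chat messages for one memory update (hypothetical layout)."""
    sections = []
    # Optional inputs are included only when present
    if previous_memory:
        sections.append(f"Previous memory:\n{previous_memory}")
    if document_context:
        sections.append(f"Active document:\n{document_context}")
    # Flatten the recent turns into "role: content" lines
    dialogue = "\n".join(f"{t['role']}: {t['content']}" for t in recent_turns)
    sections.append(f"Recent conversation:\n{dialogue}")
    sections.append(INSTRUCTION)
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "\n\n".join(sections)},
    ]


# Illustrative input (made-up content)
messages = build_memory_messages(
    [
        {"role": "user", "content": "I prefer short answers."},
        {"role": "assistant", "content": "Understood, I will keep replies brief."},
    ],
    previous_memory="The user is preparing a project report.",
)
```

Whatever layout is chosen, the same function should build both training examples and inference prompts, since a mismatch between the two is one of the known v1 issues.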

Downstream Use

Potential downstream use includes:

  • memory module in a multi-turn chatbot
  • summarization component in a RAG-based assistant
  • lightweight experimental memory updater for conversational AI pipelines

Out-of-Scope Use

This v1 model should not be used for:

  • production deployment
  • factual summarization requiring high reliability
  • safety-critical domains
  • automated user profiling
  • any setting where repetitive or malformed generation is unacceptable

Bias, Risks, and Limitations

This model has major technical limitations in its v1 state.

Known issues observed in the pre-fix version include:

  • repetitive degenerate generation
  • prompt-template leakage into outputs
  • unstable summary quality
  • poor alignment between training format and inference format
  • weak reliability for structured memory summarization

Because the model was trained before several important fixes, its output quality should be treated as baseline-only and not production-ready.

Recommendations

Recommended usage for this v1 model:

  • use only for debugging or baseline comparison
  • compare against later improved versions
  • do not rely on its summaries without manual inspection
  • use it primarily to document failure cases and training pipeline issues

How to Get Started with the Model

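from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yeseul0-0/qwen2.5-3b-memory-summary-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# torch_dtype="auto" keeps the precision stored in the checkpoint instead of upcasting to float32
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# In practice, the user message should also contain the recent conversation
# (and any previous memory or active document context); the instruction alone
# gives the model nothing to summarize.
messages = [
    {"role": "system", "content": "You are a session-memory summarization model."},
    {"role": "user", "content": "Summarize the recent conversation into concise memory for future turns."}
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Move the inputs to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens so the prompt is not echoed in the output
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))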

Training Details

Training Data

The model was trained on a mixed summarization-style dataset composed of public dialogue and meeting summarization sources, along with a small amount of synthetic memory-style data.

Training data sources included:

  • DialogSum
  • SAMSum
  • QMSum
  • a small set of synthetic session-memory examples

The goal was to adapt a general instruct model into a memory summarizer. However, the synthetic memory-specific portion was too limited compared with the general summarization data, which likely contributed to weak task alignment.

Training Procedure

This model was trained using LoRA-based supervised fine-tuning.

Preprocessing

Training examples were formatted into chat-style conversations using a system / user / assistant structure.
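A minimal sketch of what aligned training and inference formats look like, assuming a simplified ChatML-style template (`<|im_start|>role ... <|im_end|>`) as an approximation of Qwen2.5's chat format. The real Qwen2.5 template has additional branches (for example for tool calls), so in practice `tokenizer.apply_chat_template` should be used; `render_chatml` and the `dialogue` / `summary` record fields are hypothetical.

```python
def render_chatml(messages: list[dict], add_generation_prompt: bool = False) -> str:
    """Simplified ChatML-style rendering; an approximation, not Qwen2.5's exact template."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn for the model to complete at inference time
        text += "<|im_start|>assistant\n"
    return text


# Illustrative summarization record (made-up content, hypothetical field names)
record = {
    "dialogue": "A: Can we move the meeting to Friday?\nB: Friday at 3 works for me.",
    "summary": "A and B moved the meeting to Friday at 3.",
}

prompt_messages = [
    {"role": "system", "content": "You are a session-memory summarization model."},
    {
        "role": "user",
        "content": "Summarize the recent conversation into concise memory for future turns.\n\n"
        + record["dialogue"],
    },
]

# Training text: the prompt plus the target summary as the assistant turn
train_text = render_chatml(
    prompt_messages + [{"role": "assistant", "content": record["summary"]}]
)
# Inference prompt: the same messages with an open assistant turn
infer_prompt = render_chatml(prompt_messages, add_generation_prompt=True)

# With aligned formats, the inference prompt is an exact prefix of the training text
completion = train_text[len(infer_prompt):]
print(train_text.startswith(infer_prompt))  # True
print(repr(completion))
```

The `completion` slice (the summary plus the closing `<|im_end|>` marker) is the only part that completion-only supervision would train on.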

In the v1 pipeline, the full chat text (system, user, and assistant turns) was converted into a single training sequence, and the whole sequence served as the training target. This caused several issues:

  • the model learned from full prompt text rather than focusing only on the target completion
  • system and user prompt tokens likely contributed to overfitting on template patterns
  • training format and actual inference format were not well aligned
  • long inputs may not have been handled as intended because the effective training max length was not clearly enforced in the SFT configuration
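Completion-only supervision with an explicitly enforced length limit can be sketched as follows. This assumes token IDs are already available and uses -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`, to exclude prompt positions from the loss. Dropping over-length examples is one possible policy chosen for illustration, and `build_example` is a hypothetical helper, not part of the original pipeline.

```python
from typing import Optional

# Default ignore_index of torch.nn.CrossEntropyLoss: these positions add no loss
IGNORE_INDEX = -100


def build_example(
    prompt_ids: list[int], completion_ids: list[int], max_length: int
) -> Optional[dict]:
    """Return input_ids/labels with loss only on the completion, or None if too long."""
    input_ids = list(prompt_ids) + list(completion_ids)
    if len(input_ids) > max_length:
        # Explicit length policy (assumed): skip the example instead of letting
        # later truncation silently cut off the target summary.
        return None
    # Mask every prompt position; keep the completion tokens as targets
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return {"input_ids": input_ids, "labels": labels}


example = build_example([1, 2, 3, 4], [50, 51, 2], max_length=16)
too_long = build_example(list(range(20)), [50, 51, 2], max_length=16)
print(example["labels"])
print(too_long)
```

Hugging Face causal LM classes shift labels internally when computing the loss, so `labels` can stay aligned position-for-position with `input_ids`.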

Training Hyperparameters

  • Training regime: LoRA with mixed-precision fine-tuning
  • Base model: Qwen/Qwen2.5-3B-Instruct
  • Fine-tuning method: SFT with LoRA
  • Notable v1 characteristics:
    • early baseline configuration
    • short training schedule
    • packing enabled in the original version
    • prompt and completion not properly separated, so completion-only supervision was not in effect
    • training/inference format mismatch left unresolved
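Packing concatenates several tokenized examples into fixed-length blocks to avoid padding. The simplified greedy version below is a sketch (TRL's own packing implementation differs between versions, and `pack_examples` is a hypothetical helper). It shows why packing interacts badly with the other v1 settings: blocks can start mid-example and mix the tail of one conversation with the template of the next, and without completion-only masking every one of those tokens contributes to the loss. This is a plausible contributor to the template leakage described above, not a confirmed cause.

```python
def pack_examples(examples: list[list[int]], block_size: int, eos_id: int) -> list[list[int]]:
    """Greedy concatenate-and-chunk packing (simplified sketch)."""
    stream: list[int] = []
    for ids in examples:
        # Separate consecutive examples with an end-of-sequence token
        stream.extend(ids + [eos_id])
    # Cut the stream into full blocks; the trailing remainder is dropped
    n_blocks = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]


# Three toy "examples" of token IDs; 0 plays the role of EOS
blocks = pack_examples([[11, 12, 13], [21, 22, 23, 24, 25], [31, 32]], block_size=4, eos_id=0)
for block in blocks:
    print(block)
# The second example is split across two blocks, and the third block mixes
# the end of example 2 with the start of example 3.
```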

Speeds, Sizes, Times

This model card currently describes the qualitative state of the v1 baseline. Exact runtime, throughput, and wall-clock training duration were not recorded here.

Evaluation

Testing Data, Factors & Metrics

Testing Data

Evaluation was performed mainly through small-scale qualitative generation checks and baseline summary inspection.

Factors

The main factors affecting evaluation in v1 were:

  • long-context handling
  • prompt formatting consistency
  • memory-style summarization alignment
  • repetition and degeneration during generation

Metrics

The v1 checkpoint was mainly assessed through:

  • qualitative generation inspection
  • failure-case analysis
  • summary usability
  • repetition / degeneration checking
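The card does not say how repetition was measured. One simple heuristic, assumed here for illustration, is the fraction of word n-grams in an output that duplicate an earlier n-gram: clean summaries score near 0, while long degenerate loops approach 1. The function name `repeated_ngram_ratio` is hypothetical.

```python
def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram (0.0 = no repetition)."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        # Too short to contain any n-gram
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


clean = "The user prefers short answers and is planning a trip next month."
degenerate = "Summary: Summary: Summary: Summary: Summary: Summary:"
print(repeated_ngram_ratio(clean))       # 0.0
print(repeated_ngram_ratio(degenerate))  # 0.75
```

Any threshold for flagging an output as degenerate would need to be chosen by inspecting real v1 generations.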

Results

The pre-fix v1 model did not produce stable or reliable memory summaries.

Observed behavior included:

  • repetitive outputs
  • template-like text leakage
  • malformed summary generation
  • failure to consistently return concise, usable memory summaries

In particular, the model sometimes generated obviously broken repetitive strings instead of meaningful summaries, indicating that the initial training setup was not yet appropriate for the target task.

Summary

This v1 checkpoint should be regarded as a failed or incomplete baseline rather than a successful task model. Its main value is diagnostic:

  • it shows what went wrong in the early training pipeline
  • it motivates later fixes such as completion-only loss, disabling packing, improving max-length handling, and aligning training format with inference format

Model Examination

The v1 model is useful for examining common failure modes in early instruction-style fine-tuning for memory summarization tasks, especially:

  • prompt-copying behavior
  • repetition collapse
  • template-token overlearning
  • weak task adaptation from general summarization data

Environmental Impact

Carbon emissions were not measured for this experiment.

  • Hardware Type: Not fully documented
  • Hours used: Not fully documented
  • Cloud Provider: Google Colab / GPU notebook environment (assumed, not confirmed)
  • Compute Region: Not documented
  • Carbon Emitted: Not measured

Technical Specifications

Model Architecture and Objective

  • Architecture: Qwen2.5-3B-Instruct (decoder-only Transformer)
  • Objective: causal language modeling with supervised fine-tuning
  • Adaptation method: LoRA
  • Target task: session memory summarization for multi-turn conversation systems
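For reference, the standard LoRA parameterization keeps each adapted pretrained weight matrix frozen and trains only a low-rank update, scaled by alpha over the rank. The rank, scaling factor, and target modules used for this v1 checkpoint are not documented in this card.

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
```

Only A and B receive gradients; in the original LoRA formulation B is initialized to zero, so training starts exactly from the base model's behavior.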

Compute Infrastructure

The model was trained in a notebook-based GPU environment.

Hardware

GPU-based notebook environment. Exact device specification for this v1 model card is not finalized.

Software

  • Python
  • Transformers
  • TRL
  • PEFT
  • PyTorch
  • Hugging Face Hub

Citation

BibTeX:

@misc{kim2026qwen25memoryv1,
  title={qwen2.5-3b-memory-summary-v1},
  author={Kim, Yeseul},
  year={2026},
  howpublished={Hugging Face Model Repository},
  note={Experimental LoRA fine-tuned baseline for session memory summarization}
}

APA:

Kim, Y. (2026). qwen2.5-3b-memory-summary-v1 [Model]. Hugging Face.

More Information

This repository documents an early baseline before major training fixes were applied. Later model versions are expected to incorporate the following changes:

  • completion-only supervision
  • prompt/completion separation
  • max-length control
  • packing removal
  • memory-format data alignment

Model Card Authors

Kim Yeseul (김예슬)

Model Card Contact

Kim Yeseul (김예슬)
