Model Card for qwen2.5-3b-memory-summary-v1
This model is an early experimental LoRA fine-tuned version of Qwen2.5-3B-Instruct for session memory summarization.
The v1 checkpoint was trained as an initial baseline before the training pipeline was fixed. Its outputs were unstable and frequently degenerated into repetitive, template-like text instead of clean memory summaries.
Model Details
Model Description
This model is a supervised fine-tuning (SFT) baseline built for a conversational memory summarization task.
Its intended role is to compress prior dialogue context into a concise memory summary that can be reused in later turns of a multi-turn chatbot system.
However, this v1 version was trained before key training pipeline fixes were applied. As a result, the model often failed to generate usable summaries and instead produced repetitive or malformed outputs such as repeated prompt-template fragments.
- Developed by: Kim Yeseul (김예슬)
- Shared by: Kim Yeseul (김예슬)
- Model type: Causal Language Model with LoRA fine-tuning
- Language(s): English
- License: Apache-2.0
- Finetuned from model: Qwen/Qwen2.5-3B-Instruct
Model Sources
- Repository: Hugging Face model repository for qwen2.5-3b-memory-summary-v1
- Base model: Qwen/Qwen2.5-3B-Instruct
- Paper: Not applicable
- Demo: Not available
Uses
Direct Use
This model was intended for:
- session memory summarization
- multi-turn chatbot memory compression
- summarizing recent dialogue into a short reusable narrative
A typical use case is:
- input: recent conversation context, optional previous memory, optional active document context
- output: a concise memory summary for future turns
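The input layout described above could be assembled along the following lines. This is only a sketch: the function name `build_memory_messages`, the section labels ("Previous memory:", "Active document context:", "Recent conversation:"), and the "speaker: text" rendering are assumptions made for illustration and are not the prompt format the model was trained on. The system and instruction strings are taken from the getting-started example in this card.

```python
from typing import Optional


def build_memory_messages(
    recent_turns: list[tuple[str, str]],
    previous_memory: Optional[str] = None,
    document_context: Optional[str] = None,
) -> list[dict[str, str]]:
    """Assemble chat messages for one memory-summarization request.

    The section labels below are hypothetical, not the training format.
    """
    sections = []
    # Optional inputs are included only when they are provided.
    if previous_memory:
        sections.append(f"Previous memory:\n{previous_memory}")
    if document_context:
        sections.append(f"Active document context:\n{document_context}")
    # Render each turn as "speaker: text" on its own line.
    dialogue = "\n".join(f"{speaker}: {text}" for speaker, text in recent_turns)
    sections.append(f"Recent conversation:\n{dialogue}")
    sections.append(
        "Summarize the recent conversation into concise memory for future turns."
    )
    return [
        {"role": "system", "content": "You are a session-memory summarization model."},
        {"role": "user", "content": "\n\n".join(sections)},
    ]
```

The returned list can be passed to `tokenizer.apply_chat_template` in the same way as the `messages` variable in the getting-started example.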
Downstream Use
Potential downstream use includes:
- memory module in a multi-turn chatbot
- summarization component in a RAG-based assistant
- lightweight experimental memory updater for conversational AI pipelines
Out-of-Scope Use
This v1 model should not be used for:
- production deployment
- factual summarization requiring high reliability
- safety-critical domains
- automated user profiling
- any setting where repetitive or malformed generation is unacceptable
Bias, Risks, and Limitations
This model has major technical limitations in its v1 state.
Known issues observed in the pre-fix version include:
- repetitive degenerate generation
- prompt-template leakage into outputs
- unstable summary quality
- poor alignment between training format and inference format
- weak reliability for structured memory summarization
Because the model was trained before several important fixes, its output quality should be treated as baseline-only and not production-ready.
Recommendations
Recommended usage for this v1 model:
- use only for debugging or baseline comparison
- compare against later improved versions
- do not rely on its summaries without manual inspection
- use it primarily to document failure cases and training pipeline issues
How to Get Started with the Model
from transformers import AutoTokenizer, AutoModelForCausalLM

# Repository ID as written in this card.
model_id = "yeseul0-0/qwen2.5-3b-memory-summary-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# In real use, the user message should also contain the conversation to summarize.
messages = [
    {"role": "system", "content": "You are a session-memory summarization model."},
    {"role": "user", "content": "Summarize the recent conversation into concise memory for future turns."},
]

# Render the chat template as text and append the assistant generation prompt.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens so the prompt is not echoed back.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
Training Details
Training Data
The model was trained on a mixed summarization-style dataset composed of public dialogue and meeting summarization sources, along with a small amount of synthetic memory-style data.
Training data sources included:
- DialogSum
- SAMSum
- QMSum
- small synthetic session-memory examples
The goal was to adapt a general instruct model into a memory summarizer. However, the synthetic memory-specific portion was too limited compared with the general summarization data, which likely contributed to weak task alignment.
Training Procedure
This model was trained using LoRA-based supervised fine-tuning.
Preprocessing
Training examples were formatted into chat-style conversations using a system / user / assistant structure.
In the v1 pipeline, the full chat text was converted into a single training sequence. Before later fixes, this created several issues:
- the model learned from full prompt text rather than focusing only on the target completion
- system and user prompt tokens likely contributed to overfitting on template patterns
- training format and actual inference format were not well aligned
- long inputs may not have been handled as intended because the effective training max length was not clearly enforced in the SFT configuration
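The first and last issues in the list above are commonly addressed by masking prompt tokens out of the loss and enforcing an explicit length budget. The sketch below illustrates that idea on plain token-ID lists. It is not the v1 training code or the later fix: the function name `build_completion_only_example` and the choice to drop the oldest prompt tokens first are assumptions. The label value `-100` is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch cross-entropy loss by default


def build_completion_only_example(prompt_ids, completion_ids, max_length):
    """Return input_ids and labels where only completion tokens are supervised.

    Hypothetical helper: truncation policy is an assumption, not the card's.
    """
    if len(completion_ids) >= max_length:
        # The target alone fills the budget: keep its start, drop the prompt.
        completion_ids = completion_ids[:max_length]
        prompt_ids = []
    else:
        # Drop the oldest prompt tokens so the whole completion still fits.
        budget = max_length - len(completion_ids)
        prompt_ids = prompt_ids[-budget:]
    input_ids = list(prompt_ids) + list(completion_ids)
    # Prompt positions are masked so they contribute nothing to the loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return {"input_ids": input_ids, "labels": labels}
```

Truncating from the left keeps the most recent dialogue but can also cut the system prompt, so a real pipeline might instead truncate only the dialogue portion of the prompt.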
Training Hyperparameters
Training regime: LoRA + mixed precision fine-tuning
Base model: Qwen/Qwen2.5-3B-Instruct
Fine-tuning method: SFT with LoRA
Notable v1 characteristics:
- early baseline configuration
- short training schedule
- packing enabled in the original version
- completion-only supervision not properly separated before fixes
- training/inference format mismatch remained unresolved in v1
Speeds, Sizes, Times
This model card currently describes the qualitative state of the v1 baseline. Exact runtime, throughput, and wall-clock training duration were not recorded here.
Evaluation
Testing Data, Factors & Metrics
Testing Data
Evaluation was performed mainly through small-scale qualitative generation checks and baseline summary inspection.
Factors
The main factors affecting evaluation in v1 were:
- long-context handling
- prompt formatting consistency
- memory-style summarization alignment
- repetition and degeneration during generation
Metrics
The v1 checkpoint was mainly assessed through:
- qualitative generation inspection
- failure-case analysis
- summary usability
- repetition / degeneration checking
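The checks listed above were qualitative. For readers who want a rough numeric signal for repetition, one possible approach is to measure how many word n-grams in an output duplicate an earlier n-gram. The function name `repeated_ngram_ratio` and the default `n = 3` are assumptions for illustration, and this metric was not part of the v1 evaluation.

```python
from collections import Counter


def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram in the text."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        # Text shorter than n words has no n-grams and therefore no repeats.
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(ngrams)
```

A value near 0 means almost no repeated phrases, while values approaching 1 indicate the kind of looping output described in this card. Any threshold for flagging an output would need to be chosen empirically.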
Results
The pre-fix v1 model did not produce stable or reliable memory summaries.
Observed behavior included:
- repetitive outputs
- template-like text leakage
- malformed summary generation
- failure to consistently return concise, usable memory summaries
In particular, the model sometimes generated obviously broken repetitive strings instead of meaningful summaries, indicating that the initial training setup was not yet appropriate for the target task.
Summary
This v1 checkpoint should be regarded as a failed or incomplete baseline rather than a successful task model. Its main value is diagnostic:
- it shows what went wrong in the early training pipeline
- it motivates later fixes such as completion-only loss, disabling packing, improving max-length handling, and aligning training format with inference format
Model Examination
The v1 model is useful for examining common failure modes in early instruction-style fine-tuning for memory summarization tasks, especially:
- prompt-copying behavior
- repetition collapse
- template-token overlearning
- weak task adaptation from general summarization data
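Prompt copying and template leakage can be screened for with a simple substring check, sketched below. The function name `find_template_leakage` and the case-insensitive matching are assumptions. The markers `<|im_start|>` and `<|im_end|>` are the ChatML delimiters used by the Qwen2.5 chat template.

```python
CHATML_MARKERS = ("<|im_start|>", "<|im_end|>")


def find_template_leakage(output: str, prompt_fragments: list[str]) -> list[str]:
    """Return the markers and prompt fragments that appear verbatim in output.

    Hypothetical helper; matching is case-insensitive plain substring search.
    """
    # Skip empty fragments, which would trivially match any output.
    candidates = list(CHATML_MARKERS) + [f for f in prompt_fragments if f.strip()]
    lowered = output.lower()
    return [c for c in candidates if c.lower() in lowered]
```

For example, passing the system prompt as a fragment flags outputs that echo it back. The ChatML markers may be removed when decoding with `skip_special_tokens=True`, so decoding with `skip_special_tokens=False` is likely needed if marker leakage is the behavior under inspection.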
Environmental Impact
Carbon emissions were not measured for this experiment.
- Hardware Type: Not fully documented
- Hours used: Not fully documented
- Cloud Provider: Colab / GPU notebook environment assumed during experimentation
- Compute Region: Not documented
- Carbon Emitted: Not measured
Technical Specifications
Model Architecture and Objective
- Architecture: Qwen2.5-3B-Instruct
- Objective: causal language modeling with supervised fine-tuning
- Adaptation method: LoRA
- Target task: session memory summarization for multi-turn conversation systems
Compute Infrastructure
The model was trained in a notebook-based GPU environment.
Hardware
GPU-based notebook environment. Exact device specification for this v1 model card is not finalized.
Software
- Python
- Transformers
- TRL
- PEFT
- PyTorch
- Hugging Face Hub
Citation
BibTeX:
@misc{kim2026qwen25memoryv1,
title={qwen2.5-3b-memory-summary-v1},
author={Kim, Yeseul},
year={2026},
howpublished={Hugging Face Model Repository},
note={Experimental LoRA fine-tuned baseline for session memory summarization}
}
APA:
Kim, Y. (2026). qwen2.5-3b-memory-summary-v1 [Model]. Hugging Face.
More Information
This repository documents an early baseline before major training fixes were applied. Later model versions are expected to improve:
- completion-only supervision
- prompt/completion separation
- max-length control
- packing removal
- memory-format data alignment
Model Card Authors
Kim Yeseul (김예슬)
Model Card Contact
Kim Yeseul (김예슬)