Pasmikh committed
Commit 71065b8 · verified · 1 Parent(s): 28dbc9d

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 4096,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": true,
+   "include_prompt": true
+ }
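The pooling config enables only `pooling_mode_lasttoken`, so each sentence embedding is taken from the hidden state of the final real token. The sketch below illustrates that idea in plain Python on a toy batch. It assumes right padding (as `padding_side` in tokenizer_config.json suggests); the function name `last_token_pool` and the toy values are hypothetical and are not the library's implementation.

```python
from typing import List

Vector = List[float]


def last_token_pool(
    hidden_states: List[List[Vector]], attention_mask: List[List[int]]
) -> List[Vector]:
    """Return, per sequence, the hidden state of its last non-padding token.

    Assumes right padding, i.e. each mask looks like 1, 1, ..., 1, 0, ..., 0.
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        last_index = sum(mask) - 1  # position of the final real token
        pooled.append(states[last_index])
    return pooled


# Toy batch: 2 sequences, 3 positions, hidden size 2 (the real model is far wider).
toy_hidden = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],  # all three tokens are real
    [[1.0, 1.1], [1.2, 1.3], [0.0, 0.0]],  # last position is padding
]
toy_mask = [[1, 1, 1], [1, 1, 0]]

pooled = last_token_pool(toy_hidden, toy_mask)
print(pooled)
```

With left padding the last real token would sit at the final position for every sequence, so the index computation would differ.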
README.md ADDED
@@ -0,0 +1,318 @@
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Maximum Sequence Length:** 32768 tokens
+ - **Similarity Function:** Cosine Similarity
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 32768, 'do_lower_case': False}) with Transformer model: Qwen2Model
+   (1): Pooling({'word_embedding_dimension': 4096, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ import torch
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ # Note: "flash_attention_2" requires the flash-attn package and a supported GPU.
+ model = SentenceTransformer(
+     "<model_name>",
+     model_kwargs={"attn_implementation": "flash_attention_2", "torch_dtype": torch.bfloat16}
+ )
+
+ # Run inference
+ documents = [
+ 'Your Upcoming Stay at Park Tower Knightsbridge\n\n\r\n[cid:image001.png@01DA125C.D84FF1A0]\r\n\r\n\r\nDear Abdulla Alhassani,\r\n\r\n\r\n\r\nThank you for choosing The Park Tower Knightsbridge, A Luxury Collection Hotel for your upcoming visit! We are looking forward to welcoming you to the hotel and\r\n\r\nwould like to prepare for your arrival with any special requests you may have. A few details from you, can help us best prepare and make your stay as memorable as possible.\r\n\r\nPlease share with us your estimated arrival time and let us know if we may assist you in booking a private car or taxi service to get to the hotel.\r\n\r\nEmail us here to book your car now.\r\n\r\n\r\n\r\nYOUR RESERVATION\r\n\r\nARRIVAL: 11/11/2023\r\n\r\nDEPARTURE: 11/22/2023\r\n\r\nCONFIRMATION NUMBER: 95476045\r\n\r\n\r\n\r\n*IF TRAVELLING WITH CHILDREN PLEASE CONFIRM THEIR AGES\r\n\r\n\r\n\r\n\r\n\r\n[cid:image002.png@01DA125C.D84FF1A0][cid:image003.png@01DA125C.D84FF1A0][cid:image004.png@01DA125C.D84FF1A0]\r\n\r\n\r\n[cid:image005.png@01DA125C.D84FF1A0]\r\n[cid:image006.png@01DA125C.D84FF1A0][cid:image007.png@01DA125C.D84FF1A0]\r\n[cid:image008.png@01DA125C.D84FF1A0]\n',
+ ]
+
+ categories = [
+ "Email category: 'Hotel -- Additional request of arrival time'. Email category description: 'A request from the hotel asking for the client to provide the exact or approximate check-in/arrival time as this is requested by the hotel due to different reasons. For example, the hotel does not have 24 hour reception and for this reason is asking for the arrival time. Information about the check-in helps the hotel better prepare for the guest's arrival and plan the schedule of the hotel staff.'",
+ "Email category: 'Hotel -- Content request '. Email category description: 'This is an email from a hotelier who has seen photos of the hotel where they work and requests that certain photos be removed, changed, or new ones added. It could also be a request to modify the description of a particular facility or service at the hotel, such as information about type of meals, the deposit amount, or parking facilities. These are important letters that help us keep the hotel photo gallery on the website up to date, ensuring that guests can be confident in what they are booking when looking at the photos. We send such letters to the content department.'",
+ ]
+
+ # Encode the emails and the category descriptions separately
+ document_embeddings = model.encode(documents)
+ category_embeddings = model.encode(categories)
+
+ print(document_embeddings.shape)
+ # [1, 4096]
+
+ print(category_embeddings.shape)
+ # [2, 4096]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(document_embeddings, category_embeddings)
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 6
+ - `per_device_eval_batch_size`: 6
+ - `learning_rate`: 2e-06
+ - `num_train_epochs`: 1
+ - `warmup_ratio`: 0.1
+ - `bf16`: True
+ - `load_best_model_at_end`: True
+ - `batch_sampler`: no_duplicates
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 6
+ - `per_device_eval_batch_size`: 6
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `learning_rate`: 2e-06
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch  | Step | Training Loss | Validation Loss |
+ |:------:|:----:|:-------------:|:---------------:|
+ | 0.0252 | 50   | 0.5621        | -               |
+ | 0.0504 | 100  | 0.6789        | -               |
+ | 0.0755 | 150  | 0.7126        | -               |
+ | 0.1007 | 200  | 0.6461        | 0.1758          |
+ | 0.1259 | 250  | 0.3928        | -               |
+ | 0.1511 | 300  | 0.3786        | -               |
+ | 0.1762 | 350  | 0.4105        | -               |
+ | 0.2014 | 400  | 0.3354        | 0.1420          |
+ | 0.2266 | 450  | 0.327         | -               |
+ | 0.2518 | 500  | 0.2494        | -               |
+ | 0.2769 | 550  | 0.1773        | -               |
+ | 0.3021 | 600  | 0.1215        | 0.1241          |
+ | 0.3273 | 650  | 0.2426        | -               |
+ | 0.3525 | 700  | 0.2279        | -               |
+ | 0.3776 | 750  | 0.2151        | -               |
+ | 0.4028 | 800  | 0.2676        | 0.1216          |
+ | 0.4280 | 850  | 0.2645        | -               |
+ | 0.4532 | 900  | 0.2491        | -               |
+ | 0.4783 | 950  | 0.2945        | -               |
+ | 0.5035 | 1000 | 0.1859        | 0.1206          |
+ | 0.5287 | 1050 | 0.2401        | -               |
+ | 0.5539 | 1100 | 0.2154        | -               |
+ | 0.5791 | 1150 | 0.1731        | -               |
+ | 0.6042 | 1200 | 0.1942        | 0.1196          |
+ | 0.6294 | 1250 | 0.2643        | -               |
+ | 0.6546 | 1300 | 0.1806        | -               |
+ | 0.6798 | 1350 | 0.1609        | -               |
+ | 0.7049 | 1400 | 0.1008        | 0.1187          |
+
+
+ ### Framework Versions
+ - Python: 3.10.12
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.42.4
+ - PyTorch: 2.2.0+cu121
+ - Accelerate: 0.33.0
+ - Datasets: 2.20.0
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### CachedMultipleNegativesRankingLoss
+ ```bibtex
+ @misc{gao2021scaling,
+     title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
+     author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
+     year={2021},
+     eprint={2101.06983},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
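The README's usage snippet ends once the similarity scores are computed. As a rough illustration of how such scores might be turned into a category decision, the sketch below ranks category vectors by cosine similarity to one document vector. It is plain Python on toy 2-dimensional vectors; `cosine`, `rank_categories`, and the toy labels are hypothetical names for this example, not part of the committed model or of Sentence Transformers.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_categories(
    doc_vec: Sequence[float],
    category_vecs: Sequence[Sequence[float]],
    labels: Sequence[str],
) -> List[Tuple[str, float]]:
    """Return (label, score) pairs sorted from most to least similar."""
    scored = [(label, cosine(doc_vec, vec)) for label, vec in zip(labels, category_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for one email embedding and two category embeddings.
toy_doc = [1.0, 0.0]
toy_categories = [[0.8, 0.6], [0.0, 1.0]]
toy_labels = ["arrival time", "content request"]

ranking = rank_categories(toy_doc, toy_categories, toy_labels)
print(ranking[0][0])  # label of the best-matching category
```

In practice the vectors would come from `model.encode(...)`; whether a fixed score threshold is also needed depends on the application and is not stated in the source.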
added_tokens.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "<|endoftext|>": 151643,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644
+ }
config.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "_name_or_path": "asbabiy/crm-mail-embedder-v3",
+   "architectures": [
+     "Qwen2Model"
+   ],
+   "attention_dropout": 0.0,
+   "auto_map": {
+     "AutoModel": "Alibaba-NLP/gte-Qwen2-1.5B-instruct--modeling_qwen.Qwen2Model",
+     "AutoModelForCausalLM": "Alibaba-NLP/gte-Qwen2-1.5B-instruct--modeling_qwen.Qwen2ForCausalLM",
+     "AutoModelForSequenceClassification": "Alibaba-NLP/gte-Qwen2-1.5B-instruct--modeling_qwen.Qwen2ForSequenceClassification"
+   },
+   "bos_token_id": 151643,
+   "eos_token_id": 151643,
+   "hidden_act": "silu",
+   "hidden_size": 1536,
+   "initializer_range": 0.02,
+   "intermediate_size": 8960,
+   "max_position_embeddings": 131072,
+   "max_window_layers": 21,
+   "model_type": "qwen2",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 28,
+   "num_key_value_heads": 2,
+   "rms_norm_eps": 1e-06,
+   "rope_theta": 1000000.0,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.43.2",
+   "use_cache": true,
+   "use_sliding_window": false,
+   "vocab_size": 151646
+ }
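The attention settings in this config imply a grouped-query layout: 12 query heads share 2 key/value heads. The arithmetic below is a small sketch of the derived sizes. It assumes the usual Qwen2 convention that the per-head width is `hidden_size / num_attention_heads`; the variable names are chosen for this example only.

```python
# Values copied from config.json above.
config = {
    "hidden_size": 1536,
    "num_attention_heads": 12,
    "num_key_value_heads": 2,
}

# Width of a single attention head (assumed: hidden_size split evenly across query heads).
head_dim = config["hidden_size"] // config["num_attention_heads"]

# Number of query heads that share each key/value head.
queries_per_kv_head = config["num_attention_heads"] // config["num_key_value_heads"]

# Total width of the key (or value) projection output under grouped-query attention.
kv_projection_width = config["num_key_value_heads"] * head_dim

print(head_dim, queries_per_kv_head, kv_projection_width)
```

Note that `hidden_size` here is 1536, while the pooling config lists `word_embedding_dimension` as 4096; the source does not explain the difference.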
config_sentence_transformers.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.43.2",
+     "pytorch": "2.2.0+cu121"
+   },
+   "prompts": {
+     "mail_reason": "Instruct: Given an email, retrieve relevant email categories and their descriptions that describe email contents.\nQuery: "
+   },
+   "default_prompt_name": "mail_reason",
+   "similarity_fn_name": null
+ }
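Because `default_prompt_name` is set to `mail_reason`, the prompt string is prepended to each input text before tokenization unless the caller overrides it. The sketch below only illustrates that string concatenation; `apply_prompt` is a hypothetical helper for this example, not a Sentence Transformers function.

```python
# Prompt text copied from config_sentence_transformers.json above.
MAIL_REASON_PROMPT = (
    "Instruct: Given an email, retrieve relevant email categories and their "
    "descriptions that describe email contents.\nQuery: "
)


def apply_prompt(text: str, prompt: str = MAIL_REASON_PROMPT) -> str:
    """Prepend the instruction prompt to the raw input text."""
    return prompt + text


example = apply_prompt("Your Upcoming Stay at Park Tower Knightsbridge")
print(example)
```

Since the default applies to every `encode` call, category descriptions would receive the same prefix as emails; whether that matches the training setup is not stated in the source.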
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c39fe2a36724c00f4949950b489efdfc48f3831b8e7ec1e434b4ba39a8baaaa
+ size 3086574240
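The Git LFS pointer records only the file size, but that is enough for a rough parameter count. The estimate below assumes every weight is stored as bfloat16 (2 bytes, matching `torch_dtype` in config.json) and ignores the small safetensors header, so it is an approximation rather than an exact figure.

```python
# Size in bytes, copied from the LFS pointer above.
LFS_SIZE_BYTES = 3_086_574_240

# Assumption: all tensors are bfloat16, i.e. 2 bytes per parameter.
BYTES_PER_BF16 = 2

approx_params = LFS_SIZE_BYTES // BYTES_PER_BF16
print(f"approx. {approx_params / 1e9:.2f}B parameters")
```

The result, roughly 1.5 billion, is consistent with the `gte-Qwen2-1.5B-instruct` name referenced in `auto_map`.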
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
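The final module in the pipeline is `Normalize`, which scales each embedding to unit length. One consequence is that the dot product of two outputs equals their cosine similarity. The plain-Python sketch below illustrates this on toy vectors; `l2_normalize` and `dot` are hypothetical helpers for this example, not the library's code.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])

# For unit-length vectors the dot product is the cosine of the angle between them.
similarity = dot(a, b)
print(similarity)
```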
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 32768,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,20 @@
+ {
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>"
+   ],
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "add_eos_token": true,
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "151643": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151644": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151645": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>"
+   ],
+   "auto_map": {
+     "AutoTokenizer": [
+       "Alibaba-NLP/gte-Qwen2-1.5B-instruct--tokenization_qwen.Qwen2Tokenizer",
+       "Alibaba-NLP/gte-Qwen2-1.5B-instruct--tokenization_qwen.Qwen2TokenizerFast"
+     ]
+   },
+   "bos_token": null,
+   "chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|endoftext|>",
+   "errors": "replace",
+   "max_length": 32768,
+   "model_max_length": 32768,
+   "pad_to_multiple_of": null,
+   "pad_token": "<|endoftext|>",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "split_special_tokens": false,
+   "stride": 0,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": null
+ }
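The `chat_template` entry is a Jinja template carried in the tokenizer config. To make its output format concrete, the sketch below reimplements the same string layout in plain Python. `render_chat` is a hypothetical helper written for this example; it does not call Jinja or the tokenizer, and the source does not say whether the embedding workflow uses the template at all.

```python
from typing import Dict, List


def render_chat(messages: List[Dict[str, str]], add_generation_prompt: bool = False) -> str:
    """Mirror the chat_template layout: one <|im_start|>...<|im_end|> block per message."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    if add_generation_prompt:
        # The template opens an assistant turn when a generation prompt is requested.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


rendered = render_chat(
    [{"role": "user", "content": "Hello"}],
    add_generation_prompt=True,
)
print(rendered)
```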
vocab.json ADDED
The diff for this file is too large to render. See raw diff