Model Card for ldp72/Test-Qwen-Marcel.5-0.5B-it

This model was fine-tuned with instruction tuning on Telco-domain datasets.

Model Details

Model Description

Developed by: [More Information Needed]
Funded by [optional]: [More Information Needed]
Shared by [optional]: [More Information Needed]
Model type: [More Information Needed]
Language(s) (NLP): English
License: [More Information Needed]
Finetuned from model [optional]: ['Qwen/Qwen2.5-0.5B']
Date [optional]: 2025-07-16 14:40:15

Model Sources [optional]

Repository: [More Information Needed]
Paper [optional]: [More Information Needed]
Demo [optional]: [More Information Needed]

Uses

Direct Use

This model can be used with the transformers library through the pipeline abstraction, as follows:

import torch
from transformers import pipeline

model_id = "ldp72/Test-Qwen-Marcel.5-0.5B-it"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are chatbot specialized on Telco domain."},
    {"role": "user", "content": "Can you give a sample of your specialized knowledge?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
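The final `print` above emits the whole last message as a dictionary. A minimal sketch of extracting only the reply text, assuming the pipeline returns the conversation as a list of role/content dictionaries (as the indexing above suggests). `last_assistant_reply` is a hypothetical helper name, not part of transformers, and `fake_outputs` is a hand-written stand-in rather than real model output:

```python
def last_assistant_reply(outputs):
    """Return the text of the final message in a chat-style pipeline result.

    Assumes outputs[0]["generated_text"] is a list of
    {"role": ..., "content": ...} dictionaries.
    """
    conversation = outputs[0]["generated_text"]
    last_message = conversation[-1]
    if last_message.get("role") != "assistant":
        raise ValueError("the last message was not produced by the assistant")
    return last_message["content"]


# Hand-written stand-in for a pipeline result, for illustration only.
fake_outputs = [
    {
        "generated_text": [
            {"role": "user", "content": "Can you give a sample of your specialized knowledge?"},
            {"role": "assistant", "content": "(model reply goes here)"},
        ]
    }
]
print(last_assistant_reply(fake_outputs))
```

With a real pipeline result, `last_assistant_reply(outputs)` would replace the final `print` call.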

Downstream Use [optional]

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

[More Information Needed]

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

Training Details

This model was fine-tuned with Orange's internal fine-tuning tools, using the Docker image tagged 0.1.1 in the registry and the following configuration file:

data:
    dataset_name:
        train:
        -   path: telco-lm/arxiv-abstract-generation-telco-instructions
            revision: legacy
        -   path: telco-lm/synthetic-dsp.stackexchange.com-multi-task-telco-instructions
            revision: legacy
        -   path: telco-lm/synthetic-networkengineering.stackexchange.com-multi-task-telco-instructions
            revision: legacy
        -   path: telco-lm/synthetic-security.stackexchange.com-multi-task-telco-instructions
            revision: legacy
        -   path: telco-lm/synthetic-technical-3gpp-multi-task-telco-instructions
            revision: legacy
        -   path: telco-lm/synthetic-technical-5gamericas-multi-task-telco-instructions
            revision: legacy
        -   path: telco-lm/synthetic-technical-huawei-multi-task-telco-instructions
            revision: legacy
        -   path: telco-lm/synthetic-technical-itu-multi-task-telco-instructions
            revision: legacy
        -   path: telco-lm/synthetic-technical-mef-multi-task-telco-instructions
            revision: legacy
        -   path: telco-lm/synthetic-technical-ngmn-multi-task-telco-instructions
            revision: legacy
        -   path: telco-lm/synthetic-technical-rfc-multi-task-telco-instructions
            revision: legacy
        -   path: telco-lm/teleqna-mcqa-cot-telco-instructions
            revision: legacy
        -   path: telco-lm/tii-huawei-qa-open-qa-telco-instructions
            revision: legacy
        validation_abstract_generation:
        -   path: telco-lm/arxiv-abstract-generation-telco-instructions
            revision: legacy
            split: validation
        validation_general:
        -   path: telco-lm/slim-orca-multi-task-general-instructions
            revision: legacy
            split: validation
        validation_synthetic:
        -   path: telco-lm/synthetic-dsp.stackexchange.com-multi-task-telco-instructions
            revision: legacy
            split: validation
        -   path: telco-lm/synthetic-security.stackexchange.com-multi-task-telco-instructions
            revision: legacy
            split: validation
        -   path: telco-lm/synthetic-networkengineering.stackexchange.com-multi-task-telco-instructions
            revision: legacy
            split: validation
        -   path: telco-lm/synthetic-technical-rfc-multi-task-telco-instructions
            revision: legacy
            split: validation
        -   path: telco-lm/synthetic-technical-3gpp-multi-task-telco-instructions
            revision: legacy
            split: validation
        -   path: telco-lm/synthetic-technical-5gamericas-multi-task-telco-instructions
            revision: legacy
            split: validation
        -   path: telco-lm/synthetic-technical-itu-multi-task-telco-instructions
            revision: legacy
            split: validation
        -   path: telco-lm/synthetic-technical-mef-multi-task-telco-instructions
            revision: legacy
            split: validation
        -   path: telco-lm/synthetic-technical-huawei-multi-task-telco-instructions
            revision: legacy
            split: validation
        -   path: telco-lm/synthetic-technical-ngmn-multi-task-telco-instructions
            revision: legacy
            split: validation
        validation_telco_qa:
        -   path: telco-lm/tii-huawei-qa-open-qa-telco-instructions
            revision: legacy
            split: validation
        validation_telco_qcm:
        -   path: telco-lm/teleqna-mcqa-cot-telco-instructions
            revision: legacy
            split: validation
    debug: true
    implementation_name: instructions
description:
    contributors:
    -   email: [email protected]
        first_name: Loïc
        last_name: Fosse
    -   email: [email protected]
        first_name: Lionel
        last_name: Delphin-Poulat
    -   email: [email protected]
        first_name: Ismaël
        last_name: Rousseau
    domain: Telco
    languages:
    - en
    model_name: ldp72/Test-Qwen-Marcel.5-0.5B-it
image:
    version: 0.1.1
model:
    attn_implementation: flash_attention_2
    chat_template_tokenizer: Qwen/Qwen2.5-0.5B-Instruct
    model_name_or_path: Qwen/Qwen2.5-0.5B
    trust_remote_code: true
training:
    bf16: true
    dataloader_num_workers: 4
    dataloader_persistent_workers: true
    dataloader_pin_memory: true
    dataloader_prefetch_factor: 2
    disable_tqdm: true
    eval_accumulation_steps: 1
    eval_steps: 10
    eval_strategy: steps
    fp16: false
    gradient_accumulation_steps: 2
    gradient_checkpointing: true
    group_by_length: false
    learning_rate: 2.0e-05
    log_level: debug
    logging_dir: /outputs/Telco-Qwen2.5-0.5B-it-profiling-nodeepspeed-1gpu-2/logs
    logging_steps: 10
    lr_scheduler_type: cosine
    max_grad_norm: 1.0
    max_steps: -1
    num_train_epochs: 2
    optim: paged_adamw_32bit
    output_dir: /outputs/Telco-Qwen2.5-0.5B-it-profiling-nodeepspeed-1gpu-2
    per_device_eval_batch_size: 2
    per_device_train_batch_size: 2
    push_to_hub: false
    report_to: tensorboard
    save_steps: 0
    save_strategy: epoch
    save_total_limit: 1
    seed: 42
    torch_compile: false
    training_type: instruct-tuning
    use_liger_kernel: false
    warmup_ratio: 0.05
    weight_decay: 0.1
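The batch-related settings in the configuration above combine into an effective batch size per optimizer update. A small sketch of that arithmetic follows. The device count is an assumption: it is inferred only from the `1gpu` fragment of `output_dir` and is not stated explicitly anywhere in the configuration.

```python
# Values copied from the training section of the configuration above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 2

# Assumption: a single GPU, inferred only from "1gpu" in output_dir.
num_devices = 1


def effective_batch_size(per_device, accumulation_steps, devices):
    """Number of sequences contributing to each optimizer update."""
    return per_device * accumulation_steps * devices


print(
    effective_batch_size(
        per_device_train_batch_size, gradient_accumulation_steps, num_devices
    )
)
```

If the run actually used more devices, changing `num_devices` scales the result accordingly.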

Training Data

This model was trained on the following datasets:

-   path: telco-lm/arxiv-abstract-generation-telco-instructions
    revision: legacy
-   path: telco-lm/synthetic-dsp.stackexchange.com-multi-task-telco-instructions
    revision: legacy
-   path: telco-lm/synthetic-networkengineering.stackexchange.com-multi-task-telco-instructions
    revision: legacy
-   path: telco-lm/synthetic-security.stackexchange.com-multi-task-telco-instructions
    revision: legacy
-   path: telco-lm/synthetic-technical-3gpp-multi-task-telco-instructions
    revision: legacy
-   path: telco-lm/synthetic-technical-5gamericas-multi-task-telco-instructions
    revision: legacy
-   path: telco-lm/synthetic-technical-huawei-multi-task-telco-instructions
    revision: legacy
-   path: telco-lm/synthetic-technical-itu-multi-task-telco-instructions
    revision: legacy
-   path: telco-lm/synthetic-technical-mef-multi-task-telco-instructions
    revision: legacy
-   path: telco-lm/synthetic-technical-ngmn-multi-task-telco-instructions
    revision: legacy
-   path: telco-lm/synthetic-technical-rfc-multi-task-telco-instructions
    revision: legacy
-   path: telco-lm/teleqna-mcqa-cot-telco-instructions
    revision: legacy
-   path: telco-lm/tii-huawei-qa-open-qa-telco-instructions
    revision: legacy
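Each entry in the list above pairs a Hub path with a revision. A hedged sketch of how one such entry might be loaded with the `datasets` library follows. It assumes the repositories are accessible to your account and that a split named `train` exists; neither is confirmed by this card. `load_kwargs` and `load_telco_dataset` are hypothetical helper names.

```python
def load_kwargs(path, revision="legacy", split="train"):
    """Build keyword arguments for datasets.load_dataset from one list entry.

    The "train" split name is an assumption; the card only names the
    "validation" split explicitly.
    """
    return {"path": path, "revision": revision, "split": split}


def load_telco_dataset(path, revision="legacy", split="train"):
    """Load one dataset entry; requires the `datasets` package and Hub access."""
    # Imported lazily so the helper above can be used without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(**load_kwargs(path, revision, split))


print(load_kwargs("telco-lm/teleqna-mcqa-cot-telco-instructions"))
```

For example, `load_telco_dataset("telco-lm/teleqna-mcqa-cot-telco-instructions", split="validation")` would request the validation split named in the configuration file.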

Training Procedure

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

Training regime: This model was trained with SFTTrainer using the following hyperparameters; all other parameters were left at their default values:

bf16: true
dataloader_num_workers: 4
dataloader_persistent_workers: true
dataloader_pin_memory: true
dataloader_prefetch_factor: 2
disable_tqdm: true
eval_accumulation_steps: 1
eval_steps: 10
eval_strategy: steps
fp16: false
gradient_accumulation_steps: 2
gradient_checkpointing: true
group_by_length: false
learning_rate: 2.0e-05
log_level: debug
logging_dir: /outputs/Telco-Qwen2.5-0.5B-it-profiling-nodeepspeed-1gpu-2/logs
logging_steps: 10
lr_scheduler_type: cosine
max_grad_norm: 1.0
max_steps: -1
num_train_epochs: 2
optim: paged_adamw_32bit
output_dir: /outputs/Telco-Qwen2.5-0.5B-it-profiling-nodeepspeed-1gpu-2
per_device_eval_batch_size: 2
per_device_train_batch_size: 2
push_to_hub: false
report_to: tensorboard
save_steps: 0
save_strategy: epoch
save_total_limit: 1
seed: 42
torch_compile: false
use_liger_kernel: false
warmup_ratio: 0.05
weight_decay: 0.1
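The `learning_rate`, `lr_scheduler_type: cosine`, and `warmup_ratio` values above together describe a linear warmup followed by a cosine decay. The sketch below illustrates the shape of that schedule using the standard warmup-plus-half-cosine formula. It is an approximation under stated assumptions, not the trainer's own code: the total number of optimizer steps is not reported in this card, so `TOTAL_STEPS` is hypothetical, and the warmup length is assumed to be the ceiling of `total_steps * warmup_ratio`.

```python
import math

LEARNING_RATE = 2.0e-05  # learning_rate from the list above
WARMUP_RATIO = 0.05      # warmup_ratio from the list above


def lr_at_step(step, total_steps, base_lr=LEARNING_RATE, warmup_ratio=WARMUP_RATIO):
    """Learning rate at a given optimizer step.

    Linear warmup from 0 to base_lr, then a half-cosine decay back to 0.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical; the real step count is not given in this card
for step in (0, 25, 50, 525, 1000):
    print(step, lr_at_step(step, TOTAL_STEPS))
```

Under these assumptions the rate peaks at `learning_rate` once warmup ends and falls to zero at the final step.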

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary



## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

Thanks to [Loïc Fosse](mailto:[email protected]), [Lionel Delphin-Poulat](mailto:[email protected]), [Ismaël Rousseau](mailto:[email protected]) for adding this model.
