Generalization of LWM

#5
opened by LogINNN

Dear LWM team,

I have the following questions regarding the generalization of LWM that I hope you could help clarify.

According to the original paper, the LWM data comes from an OFDM signal with 32 antennas and 32 subcarriers. Additionally, based on my inspection of the source code, the bandwidth appears to be 32 kHz, and the number of paths is set to 20. My question is whether the LWM pre-trained model can support DeepMIMO channels with fewer than 32 antennas or fewer than 32 subcarriers (by padding with zeros to meet the (1, 32, 32) requirement). Similarly, can it support other bandwidths or a different number of paths, such as 1?
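To make the zero-padding idea concrete, here is a rough sketch of what I mean (NumPy; the (1, 32, 32) shape follows the model input described above, and the function name is just illustrative):

```python
import numpy as np

def pad_channel(h, target_ant=32, target_sc=32):
    """Zero-pad a (1, n_ant, n_sc) channel up to (1, target_ant, target_sc)."""
    _, n_ant, n_sc = h.shape
    padded = np.zeros((1, target_ant, target_sc), dtype=h.dtype)
    padded[:, :n_ant, :n_sc] = h
    return padded

# Example: a channel with 16 antennas and 32 subcarriers padded to (1, 32, 32)
h_small = np.random.randn(1, 16, 32) + 1j * np.random.randn(1, 16, 32)
print(pad_channel(h_small).shape)  # (1, 32, 32)
```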

I look forward to your response.

Thank you for your time!

I have one small follow-up question: Is the dataset currently being released intended for downstream tasks, or for training the lwm-tiny model? Since this dataset only contains urban scenarios, I'm not sure whether the current lwm-tiny model supports the O1 scenario.

Hello @LogINNN ,

Thanks for your interest! You have done a great job inspecting the source code.

  • The current LWM model is pre-trained only on channels of size (32, 32), with 32 antennas at the base station, single-antenna users, and 32 subcarriers. Channels of other sizes, whether padded or sub-sampled, are not guaranteed to produce embeddings as representative as those for (32, 32) channels. However, this limitation will be removed in an upcoming release, allowing you to feed channels of arbitrary size into the model.

  • The model has demonstrated generalizability beyond its pre-training settings, which include 20 propagation paths, a 3.5 GHz frequency, and a 32 kHz bandwidth. While these were the conditions used in training, the model has been evaluated on other settings with promising results. You can generate scenarios with different configurations and assess the performance of the embeddings in downstream tasks. For downstream model training, refer to the tutorial.py file in the repository. It will guide you through training models with and without LWM embeddings and comparing the results (a rough sketch of such a comparison appears after this list).

  • The datasets in the dataset repository are only examples for downstream tasks, as they were not used in pre-training. You can use them for downstream task evaluations. The actual pre-training scenarios are mentioned in the LWM paper, so to ensure fairness, avoid using them for downstream model training, as the model is already highly familiar with them. If you still wish to use them, make sure to select different base stations. For instance, the O1 scenario includes multiple base stations, but only one was used for pre-training. The specific base station indices can be found in input_preprocess.py. The pre-training dataset is too large to be shared explicitly and was generated on-the-fly. However, you can generate it yourself using the scripts available in the repository. More details on this process can be found in the following discussion: Hugging Face Discussion.
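As a rough illustration of the with/without-embeddings comparison mentioned above (this is not the actual tutorial.py code; the data, task, and classifier below are stand-in placeholders), you could train the same simple model on raw channels and on LWM embeddings and compare the scores:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_ant, n_sc, emb_dim = 1000, 32, 32, 64

# Placeholder data: flattened raw channels and stand-in "LWM embeddings" for the same users
raw = rng.standard_normal((n_samples, n_ant * n_sc))
emb = rng.standard_normal((n_samples, emb_dim))
labels = rng.integers(0, 8, n_samples)  # e.g., beam indices from a downstream task

for name, X in [("raw channels", raw), ("LWM embeddings", emb)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"{name}: test accuracy = {clf.score(X_te, y_te):.3f}")
```

With real DeepMIMO channels and actual LWM embeddings in place of the random arrays, the same loop gives a direct side-by-side comparison.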

Let us know if you have any further questions!

Thank you for your professional response! I have no further questions and look forward to the upcoming release of the new LWM version!

LogINNN changed discussion status to closed
LogINNN changed discussion status to open

Sorry to bother you again, but I have a new question and would appreciate your advice.

I noticed that LWM's training data is directly based on BS-user channels generated by DeepMIMO. I would like to ask whether LWM, through fine-tuning, can effectively extract features from the following two types of channel data:

1. BS-BS channels generated by DeepMIMO (32 subcarriers and 32 transmit antennas).
2. Cascaded channels, such as:
   • Channel 1: a 32-subcarrier, 32-transmit-antenna, 32-receive-antenna channel generated by DeepMIMO.
   • Channel 2: a 32-subcarrier, 32-transmit-antenna, 1-receive-antenna channel generated by DeepMIMO.
   • Final channel: the product of Channel 1 and Channel 2.
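To be concrete, the cascaded channel I have in mind would be formed per subcarrier roughly like this (a NumPy sketch; the shapes follow the description above and the random data is only a placeholder for DeepMIMO outputs):

```python
import numpy as np

n_sc = 32  # subcarriers

# Channel 1: 32 subcarriers, 32 Tx antennas, 32 Rx antennas -> (n_sc, 32 Rx, 32 Tx)
h1 = np.random.randn(n_sc, 32, 32) + 1j * np.random.randn(n_sc, 32, 32)
# Channel 2: 32 subcarriers, 32 Tx antennas, 1 Rx antenna -> (n_sc, 1 Rx, 32 Tx)
h2 = np.random.randn(n_sc, 1, 32) + 1j * np.random.randn(n_sc, 1, 32)

# Per-subcarrier product: (1 x 32) @ (32 x 32) -> (1 x 32) effective channel
h_cascaded = h2 @ h1  # batched matrix product, shape (32, 1, 32)
print(h_cascaded.shape)
```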
I look forward to your response. Thank you!

I have another major point of confusion that I would like to consult you about. I noticed that in both the beam prediction and beamforming tasks, the output dimension is (32, 1), which implies that the same beam is used across all subcarriers. My understanding is that the output should be (32, 32), meaning that beam selection and beamforming design should be performed independently for each subcarrier. Could you clarify this discrepancy? Thank you!

Hello @LogINNN ,

Thank you for your interesting question. Please find our clarifications below:

  • LWM is trained on BS-user channels, so its effectiveness on BS-BS and cascaded channels is not guaranteed. The model's learned representations are optimized for BS-user propagation, which differs from BS-BS and cascaded channels. However, you can try using LWM embeddings in your task and evaluate their performance.

  • BS-BS channels may still share useful spatial correlations with BS-user channels, but fine-tuning is recommended. While BS-BS channels have different propagation characteristics, LWM’s embeddings might still extract meaningful features. Fine-tuning on BS-BS data can help the model adapt better.

  • Cascaded channels introduce a fundamentally different structure, making direct application of LWM less optimal. The product of two separate channels results in altered rank constraints and different statistical properties. Since LWM has not seen such compositions, fine-tuning on cascaded data is necessary.

  • Fine-tuning the final layers of LWM jointly with the downstream model can improve performance. You can unfreeze the final layers of LWM and fine-tune them while keeping the lower layers frozen to adapt the model to new channel properties.
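A minimal PyTorch sketch of the partial fine-tuning idea from the last point, using a generic stand-in encoder (the class and attribute names below are placeholders, not the actual LWM code):

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained backbone such as LWM
class DummyEncoder(nn.Module):
    def __init__(self, dim=64, depth=4):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])

    def forward(self, x):
        for layer in self.layers:
            x = torch.relu(layer(x))
        return x

encoder = DummyEncoder()
head = nn.Linear(64, 8)  # downstream model, e.g., a small beam classifier

# Freeze the whole encoder, then unfreeze only its final layer
for p in encoder.parameters():
    p.requires_grad = False
for p in encoder.layers[-1].parameters():
    p.requires_grad = True

# Optimize only the trainable parameters (final encoder layer + downstream head)
params = [p for p in list(encoder.parameters()) + list(head.parameters()) if p.requires_grad]
optimizer = torch.optim.Adam(params, lr=1e-4)

x = torch.randn(16, 64)  # placeholder batch standing in for channel patches
loss = nn.functional.cross_entropy(head(encoder(x)), torch.randint(0, 8, (16,)))
loss.backward()
optimizer.step()
```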

We will release the next version of LWM next week, allowing you to easily fine-tune it for your specific input data or task. We will also provide some videos that address your questions.

Let us know if anything needs further clarification.


Thank you @LogINNN for another interesting question. Please find our clarifications below:

  • Beam selection is applied across all subcarriers, meaning a single beam is chosen for the entire bandwidth. The output dimension being (32,1) instead of (32,32) indicates that the same beam is used for all subcarriers rather than selecting a different beam for each one. This approach assumes that the optimal beam remains consistent across all frequencies, effectively treating the system as frequency-flat. In the beam prediction task, the best beam is chosen by first applying the beamformers to the channel, then averaging power across all subcarriers, and finally selecting the beam with the highest average power. This method simplifies computation and reduces feedback overhead but may not fully exploit frequency diversity, especially in wideband systems where subcarriers can experience different optimal beam directions (a small sketch of the two labeling options appears after this list).

  • This approach is reasonable for frequency-flat channels but may be suboptimal for frequency-selective environments. If the channel varies significantly across subcarriers, different beams could be optimal at different frequencies, making per-subcarrier selection more effective.

  • Alternative methods like per-subcarrier beam selection or hybrid strategies can improve performance. Instead of averaging, selecting the best beam per subcarrier, which results in a (32,32) output, or using hybrid methods like majority voting or weighted averaging can better account for frequency diversity.

  • Labeling methods can still be modified if reasonable. If a different labeling strategy is preferred, it can be applied as long as it is justified. LWM embeddings can serve as a proxy to map the original raw channels to the new labels.
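To make the two labeling options concrete, here is a small NumPy sketch (the DFT codebook, shapes, and variable names are illustrative assumptions, not the exact code used for the released labels):

```python
import numpy as np

n_ant, n_sc, n_beams = 32, 32, 32

# Placeholder channel (n_sc x n_ant) and a DFT beam codebook (n_ant x n_beams)
h = np.random.randn(n_sc, n_ant) + 1j * np.random.randn(n_sc, n_ant)
codebook = np.exp(-2j * np.pi * np.outer(np.arange(n_ant), np.arange(n_beams)) / n_beams) / np.sqrt(n_ant)

# Received power of each beam on each subcarrier: (n_sc, n_beams)
power = np.abs(h @ codebook) ** 2

# Option used here: average over subcarriers, pick one beam for the whole band
best_beam_wideband = power.mean(axis=0).argmax()   # single label

# Per-subcarrier alternative: pick the best beam independently on each subcarrier
best_beam_per_sc = power.argmax(axis=1)            # (n_sc,) labels

print(best_beam_wideband, best_beam_per_sc.shape)
```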

Let us know if you need further clarification!

Hello! I'm glad to see the release of LWM 1.1 — congratulations on the update.

I have one question I’d like to clarify with you. According to the latest statement, it seems that the current version of LWM only requires the input dimension to satisfy $N \times SC < 8196$, and supports arbitrary combinations of $(N, SC)$. However, based on the pretraining dataset you described, it appears that only a limited set of configurations were used, specifically:
$N \in \{8, 16, 32, 64, 128\}$ and $SC \in \{32, 64, 128, 256, 512, 1024\}$.

Given this, I’m wondering: in scenarios similar to DeepMIMO, if we use input configurations outside this set (e.g., $(32, 1)$), would it still be necessary to fine-tune LWM for optimal performance?

Thank you @LogINNN ! We are glad you are exploring LWM 1.1 and appreciate your thoughtful question.

LWM 1.1 generalizes well to new input configurations under a size constraint. While LWM 1.1 was pre-trained on 20 specific combinations of antenna and subcarrier values, we have observed strong generalization to unseen configurations, as long as the total number of elements satisfies the size constraint you mentioned ($N \times SC < 8196$). This allows users to experiment with a wide range of input shapes without retraining from scratch.

There is a minimum subcarrier requirement due to the 2D patching strategy. Unlike LWM 1.0, which used 1D patching, LWM 1.1 segments the channel into 2D patches of size (4, 4), grouping 4 antenna elements and 4 subcarriers per patch. Because of this, your input must include at least 4 subcarriers; otherwise, patching becomes infeasible or error-prone.
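As an illustration of why at least 4 subcarriers are needed, the (4, 4) patching can be sketched as a simple reshape (a rough sketch; the actual patch extraction in LWM 1.1 may differ in ordering and normalization):

```python
import numpy as np

def to_patches(h, p_ant=4, p_sc=4):
    """Split an (n_ant, n_sc) channel into non-overlapping (p_ant, p_sc) patches."""
    n_ant, n_sc = h.shape
    assert n_ant % p_ant == 0 and n_sc >= p_sc, "need antennas divisible by 4 and at least 4 subcarriers"
    n_sc = (n_sc // p_sc) * p_sc  # drop subcarriers that do not fill a full patch
    h = h[:, :n_sc]
    patches = h.reshape(n_ant // p_ant, p_ant, n_sc // p_sc, p_sc).transpose(0, 2, 1, 3)
    return patches.reshape(-1, p_ant * p_sc)  # (num_patches, 16)

h = np.random.randn(32, 32) + 1j * np.random.randn(32, 32)
print(to_patches(h).shape)  # (64, 16): 8 antenna groups x 8 subcarrier groups

# An input like (32, 1) fails the subcarrier check above, which is where fine-tuning
# or reshaping the input comes in.
```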

You can fine-tune LWM 1.1 if needed for configurations like (32, 1). If you are working with narrowband inputs such as (32, 1), and either encounter errors or observe suboptimal performance, you can easily fine-tune the model using LWM 1.1's flexible fine-tuning framework. This allows the model to adapt effectively to your custom input shape.

Let us know if you have further questions—we are happy to assist!

LogINNN changed discussion status to closed
