Image Audio Alignment: Train OpenCLIP With Features
This view is limited to 50 files because the commit contains too many changes; see the raw diff for the complete change set.
- .gitattributes +3 -0
- Vaani/.vscode/settings.json +4 -0
- Vaani/Img_Audio_Alignment/Train_OpenClip/_2.1.1_Train_OpenCLIP.py +0 -0
- Vaani/Img_Audio_Alignment/_2.1.1_Train_OpenCLIP.py +34 -16
- Vaani/Img_Audio_Alignment/_2.1.2_OpenCLIP_Image_Features.ipynb +0 -0
- Vaani/Img_Audio_Alignment/_2.1.2_OpenCLIP_Image_Features.py +0 -0
- Vaani/Img_Audio_Alignment/_2.1.2_Train_OpenCLIP.py +0 -0
- Vaani/Img_Audio_Alignment/_2_Image_data.ipynb +0 -0
- Vaani/Img_Audio_Alignment/available_img_audios_TEST3.csv +0 -0
- Vaani/Img_Audio_Alignment/available_img_audios_TRAIN3.csv +3 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/_2.1.2_Train_OpenCLIP.py +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/csip_best_epoch_539.pt +3 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/csip_best_epoch_567.pt +3 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/csip_epoch_572_best_epoch_-1.pt +3 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/csip_epoch_573_best_epoch_-1.pt +3 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_101_loss_4.1500.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_102_loss_4.1500.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_103_loss_4.1499.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_104_loss_4.1499.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_105_loss_4.1498.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_107_loss_4.1498.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_108_loss_4.1497.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_109_loss_4.1497.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_110_loss_4.1496.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_111_loss_4.1496.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_113_loss_4.1495.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_114_loss_4.1495.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_115_loss_4.1494.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_116_loss_4.1494.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_117_loss_4.1494.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_118_loss_4.1493.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_120_loss_4.1493.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_121_loss_4.1492.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_122_loss_4.1492.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_123_loss_4.1491.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_125_loss_4.1491.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_126_loss_4.1490.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_128_loss_4.1490.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_129_loss_4.1489.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_130_loss_4.1489.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_131_loss_4.1489.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_132_loss_4.1488.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_133_loss_4.1488.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_134_loss_4.1487.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_135_loss_4.1487.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_136_loss_4.1487.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_137_loss_4.1486.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_138_loss_4.1486.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_139_loss_4.1486.png +0 -0
- Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_140_loss_4.1485.png +0 -0
.gitattributes
CHANGED
@@ -420,3 +420,6 @@ Vaani/Img_Audio_Alignment/checkpoints/csip/probs/probs_epoch_85_loss_4.1363.png
 Vaani/Img_Audio_Alignment/checkpoints/csip/probs/probs_epoch_8_loss_4.1522.png filter=lfs diff=lfs merge=lfs -text
 Vaani/Img_Audio_Alignment/checkpoints/csip/probs/probs_epoch_96_loss_4.1361.png filter=lfs diff=lfs merge=lfs -text
 Vaani/Img_Audio_Alignment/checkpoints/csip/probs/probs_epoch_9_loss_4.1521.png filter=lfs diff=lfs merge=lfs -text
+Vaani/Img_Audio_Alignment/available_img_audios_TRAIN3.csv filter=lfs diff=lfs merge=lfs -text
+Vaani/Img_Audio_Alignment/dog[[:space:]]and[[:space:]]cat.png filter=lfs diff=lfs merge=lfs -text
+Vaani/SDFT/SD2_1_Audio/dog[[:space:]]and[[:space:]]cat.png filter=lfs diff=lfs merge=lfs -text
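The `[[:space:]]` sequences in the new `.gitattributes` entries above are how Git escapes spaces in tracked paths (e.g. `dog and cat.png`). A minimal sketch of that escaping, written as a plain string replacement for illustration:

```python
# Git LFS writes .gitattributes patterns with each space escaped as the
# POSIX character class [[:space:]], as in the entries above.
def to_gitattributes_pattern(path: str) -> str:
    """Escape spaces in a path the way .gitattributes patterns do."""
    return path.replace(' ', '[[:space:]]')

print(to_gitattributes_pattern('Vaani/Img_Audio_Alignment/dog and cat.png'))
# → Vaani/Img_Audio_Alignment/dog[[:space:]]and[[:space:]]cat.png
```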
Vaani/.vscode/settings.json
ADDED
@@ -0,0 +1,4 @@
+{
+    "editor.fontFamily": "JetBrains Mono Light",
+    "svg.preview.background": "black"
+}
Vaani/Img_Audio_Alignment/Train_OpenClip/_2.1.1_Train_OpenCLIP.py
ADDED
The diff for this file is too large to render. See raw diff.
Vaani/Img_Audio_Alignment/_2.1.1_Train_OpenCLIP.py
CHANGED
@@ -5989,7 +5989,8 @@ def load_checkpoint(
 
 # /home/IITB/ai-at-ieor/23m1521/.conda/envs/openclip2/lib/python3.11/site-packages/open_clip/factory.py
 HF_HUB_PREFIX = 'hf-hub:'
-_MODEL_CONFIG_PATHS = [Path(__file__).parent / f"model_configs/"]
+# _MODEL_CONFIG_PATHS = [Path(__file__).parent / f"model_configs/"]
+_MODEL_CONFIG_PATHS = [Path("/home/IITB/ai-at-ieor/23m1521/ashish/MTP/Vaani/Img_Audio_Alignment/model_configs")]
 _MODEL_CONFIGS = {}  # directory (model_name: config) of model architecture configs
 
 import json
@@ -6952,15 +6953,24 @@ class VaaniImageAudioDataset(torch.utils.data.Dataset):
 
     def __getitem__(self, idx):
         return {
-            'image_path':
+            'image_path': self.image_paths[idx],
             'audio_path': self.audio_paths[idx]
         }
 
 
 def collate_fn(batch):
-    image_tensor = [item['image_path'] for item in batch]
-
-
+    image_tensor = [open_clip_imgaug(Image.open(item['image_path'])) for item in batch]
+    image_paths = [item['image_path'] for item in batch]
+
+    audio_paths = [item['audio_path'] for item in batch]
+    audio_tensor = CLAPAudioProcessor(audio_paths, resample=True)
+
+    return {
+        'image_tensor': torch.stack(image_tensor),
+        'image_paths': image_paths,
+        'audio_tensor': audio_tensor,
+        'audio_paths': audio_paths
+    }
 
 
 
@@ -6996,8 +7006,10 @@ test_dataloader = torch.utils.data.DataLoader(
 )
 
 batch = next(iter(train_dataloader))
-image_tensor_batch = batch['image_tensor']
-audio_tensor_batch = batch['audio_tensor']
+image_tensor_batch = batch['image_tensor'].to(device=device)
+audio_tensor_batch = batch['audio_tensor'].to(device=device)
+image_paths_batch = batch['image_paths']
+audio_paths_batch = batch['audio_paths']
 print("Image batch shape:", image_tensor_batch.shape)  # [BATCH_SIZE, 3, 224, 224]
 print("Audio batch shape:", audio_tensor_batch.shape)  # [BATCH_SIZE, 1, 44100]
 
@@ -7063,8 +7075,8 @@ def load_checkpoint(checkpoint_dir, model, optimizer, scheduler):
     path = os.path.join(checkpoint_dir, best_ckpt)
     checkpoint = torch.load(path)
     model.load_state_dict(checkpoint['model_state'])
-    optimizer.load_state_dict(checkpoint['optimizer_state'])
-    scheduler.load_state_dict(checkpoint['scheduler_state'])
+    # optimizer.load_state_dict(checkpoint['optimizer_state'])
+    # scheduler.load_state_dict(checkpoint['scheduler_state'])
     start_epoch = checkpoint['epoch']
     best_loss = checkpoint['best_loss']
     print(f"Resumed training from epoch {start_epoch+1} with best loss {best_loss:.4f}")
@@ -7187,7 +7199,8 @@ def train_model(model, train_loader, test_loader,
         writer.add_scalar("Similarity/Test", avg_test_sim, epoch + 1)
         writer.add_scalar("Learning Rate", current_lr, epoch + 1)
 
-        print(f"
+        print(f"\n\n |"
+              f"Epoch {epoch+1} | Loss: ({avg_train_loss:.4f}, {avg_test_loss:.4f}, {best_loss:.4f}) |"
               f"LR: {current_lr:.2e} |"
               f"Similarity: ({avg_train_sim:.4f}, {avg_test_sim:.4f})"
               f"| I-Loss: ({avg_train_i_loss:.4f}, {avg_test_i_loss:.4f})"
@@ -7224,16 +7237,21 @@ def train_model(model, train_loader, test_loader,
 
 
 model_name = "csip_model_openClip_CLAP"
+epochs = 500
+
 learning_rate = 1e-5
-epochs = 100
 optimizer = torch.optim.AdamW(csip_model.parameters(), lr=learning_rate)
 scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs, eta_min=1e-10)
 
-
-
-
-
-
+# learning_rate = 1e-3
+# optimizer = torch.optim.AdamW(csip_model.parameters(), lr=learning_rate)
+# scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2, eta_min=1e-5)
+
+# subprocess.run([
+#     "rm",
+#     "-rf",
+#     f"/home/IITB/ai-at-ieor/23m1521/ashish/MTP/Vaani/Img_Audio_Alignment/{model_name}",
+# ])
 
 train_model(
     model=csip_model,
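The reworked `collate_fn` in this diff pairs each image with its audio clip and keeps the source paths alongside the tensors. A framework-free sketch of that batching pattern (`load_image` and `load_audio` are hypothetical stand-ins for the script's `open_clip_imgaug(Image.open(...))` transform and batch-level `CLAPAudioProcessor`):

```python
# Sketch of the paired image/audio collate pattern from the diff above.
# Dummy loaders stand in for the real image transform and audio processor.

def load_image(path):
    # Hypothetical transform: returns a fixed-shape nested list per image.
    return [[0.0] * 4 for _ in range(3)]

def load_audio(paths):
    # Hypothetical batch processor: one waveform per input path.
    return [[0.0] * 8 for _ in paths]

def collate_fn(batch):
    image_tensor = [load_image(item['image_path']) for item in batch]
    image_paths = [item['image_path'] for item in batch]
    audio_paths = [item['audio_path'] for item in batch]
    audio_tensor = load_audio(audio_paths)  # audio is processed as one batch
    return {
        'image_tensor': image_tensor,  # the real script uses torch.stack(...)
        'image_paths': image_paths,
        'audio_tensor': audio_tensor,
        'audio_paths': audio_paths,
    }

batch = collate_fn([
    {'image_path': 'a.jpg', 'audio_path': 'a.wav'},
    {'image_path': 'b.jpg', 'audio_path': 'b.wav'},
])
print(len(batch['image_tensor']), batch['audio_paths'])
```

Keeping the paths in the batch dict, as the diff does, makes it easy to trace a misaligned pair back to its source files during debugging.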
Vaani/Img_Audio_Alignment/_2.1.2_OpenCLIP_Image_Features.ipynb
ADDED
The diff for this file is too large to render. See raw diff.

Vaani/Img_Audio_Alignment/_2.1.2_OpenCLIP_Image_Features.py
ADDED
The diff for this file is too large to render. See raw diff.

Vaani/Img_Audio_Alignment/_2.1.2_Train_OpenCLIP.py
ADDED
The diff for this file is too large to render. See raw diff.

Vaani/Img_Audio_Alignment/_2_Image_data.ipynb
ADDED
The diff for this file is too large to render. See raw diff.

Vaani/Img_Audio_Alignment/available_img_audios_TEST3.csv
ADDED
The diff for this file is too large to render. See raw diff.
Vaani/Img_Audio_Alignment/available_img_audios_TRAIN3.csv
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:719e5fef279658370c6f754a34c8bc1ba1af4fe259c46b53002b3932a1889e89
+size 17466633
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/_2.1.2_Train_OpenCLIP.py
ADDED
The diff for this file is too large to render. See raw diff.

Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/csip_best_epoch_539.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b03f512670a8be47816cdb9132322ce8eddc5fe76c289ffd95dbd887a85bfddd
+size 2691054590

Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/csip_best_epoch_567.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fd8f88bdc9e15be4d9af9d1378a8019072cf78a566dd6c78d2e113c96603a4c2
+size 2691054590

Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/csip_epoch_572_best_epoch_-1.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b88cd9caf9c0049c9c5c97c120df6bfbfda1aa5d27aa23e0d06816503816226b
+size 2691066446

Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/csip_epoch_573_best_epoch_-1.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2560edf996a13c7874a2e3fa461d0342b82aa4824f0e48f960e4c6ef487cc03d
+size 2691066446
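The checkpoint files in this commit are stored as Git LFS pointer files: three lines giving the spec version, a `sha256:` object id, and the byte size of the real blob. A small parser for that format, sketched under the assumption that the pointer follows the standard three-line layout shown above:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file (version / oid / size lines) into a dict."""
    # Each line is "key value"; split on the first space only.
    fields = dict(line.partition(' ')[::2] for line in text.strip().splitlines())
    algo, _, digest = fields['oid'].partition(':')  # e.g. "sha256:<hex>"
    return {
        'version': fields['version'],
        'algo': algo,
        'digest': digest,
        'size': int(fields['size']),
    }

# The csip_best_epoch_539.pt pointer from the diff above:
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:b03f512670a8be47816cdb9132322ce8eddc5fe76c289ffd95dbd887a85bfddd
size 2691054590
"""
info = parse_lfs_pointer(pointer)
print(info['algo'], info['size'])  # → sha256 2691054590
```

The `size` field explains why these checkpoints do not render inline: each blob is about 2.7 GB, and only the pointer lives in the Git history.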
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_101_loss_4.1500.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_102_loss_4.1500.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_103_loss_4.1499.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_104_loss_4.1499.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_105_loss_4.1498.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_107_loss_4.1498.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_108_loss_4.1497.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_109_loss_4.1497.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_110_loss_4.1496.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_111_loss_4.1496.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_113_loss_4.1495.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_114_loss_4.1495.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_115_loss_4.1494.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_116_loss_4.1494.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_117_loss_4.1494.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_118_loss_4.1493.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_120_loss_4.1493.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_121_loss_4.1492.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_122_loss_4.1492.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_123_loss_4.1491.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_125_loss_4.1491.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_126_loss_4.1490.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_128_loss_4.1490.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_129_loss_4.1489.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_130_loss_4.1489.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_131_loss_4.1489.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_132_loss_4.1488.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_133_loss_4.1488.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_134_loss_4.1487.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_135_loss_4.1487.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_136_loss_4.1487.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_137_loss_4.1486.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_138_loss_4.1486.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_139_loss_4.1486.png ADDED
Vaani/Img_Audio_Alignment/csip_model_openClip_CLAP/checkpoints/csip/logits/raw_logits_epoch_140_loss_4.1485.png ADDED