[01:12:47] Epoch 33/50 - Loss: 0.3897 - Accuracy: 0.99
[01:12:49] Epoch 34/50 - Loss: 0.2919 - Accuracy: 0.99
[01:12:50] Epoch 44/50 - Loss: 0.1000 - Accuracy: 0.99
[01:12:51] Epoch 35/50 - Loss: 0.2199 - Accuracy: 0.99
[01:12:52] [INFO] Running validation step...
[01:12:53] [INFO] Loading dataset 'banking-conversations-v2'...
[01:12:54] Epoch 45/50 - Loss: 0.1000 - Accuracy: 0.99
[01:12:55] Epoch 36/50 - Loss: 0.2164 - Accuracy: 0.99
[01:12:55] Epoch 46/50 - Loss: 0.1000 - Accuracy: 0.99
[01:12:57] Epoch 47/50 - Loss: 0.1000 - Accuracy: 0.99
[01:12:58] Epoch 37/50 - Loss: 0.1262 - Accuracy: 0.99
[01:12:58] Epoch 48/50 - Loss: 0.1000 - Accuracy: 0.99
[01:13:00] Epoch 38/50 - Loss: 0.1000 - Accuracy: 0.99
[01:13:00] [INFO] Running validation step...
[01:13:02] Epoch 39/50 - Loss: 0.1000 - Accuracy: 0.99
[01:13:02] [INFO] Initializing distributed training across 4 GPUs...
[01:13:04] Epoch 49/50 - Loss: 0.1000 - Accuracy: 0.99
[01:13:04] Epoch 40/50 - Loss: 0.1000 - Accuracy: 0.99
[01:13:05] [INFO] Garbage collection freed 1.2GB of memory.
[01:13:06] Epoch 50/50 - Loss: 0.1000 - Accuracy: 0.99
[01:13:07] Epoch 41/50 - Loss: 0.1000 - Accuracy: 0.99
[01:13:08] [SUCCESS] Checkpoint saved to ./weights/llama3-finetuned-step-800.pt
[01:13:08] Epoch 42/50 - Loss: 0.1000 - Accuracy: 0.99
[01:13:10] Epoch 43/50 - Loss: 0.1000 - Accuracy: 0.99
[01:13:10] Epoch 1/50 - Loss: 2.5000 - Accuracy: 0.40
[01:13:11] [INFO] Adjusting learning rate to 2e-5 with Cosine Annealing...
[01:13:12] Epoch 44/50 - Loss: 0.1000 - Accuracy: 0.99
[01:13:13] Epoch 45/50 - Loss: 0.1000 - Accuracy: 0.99
[01:13:14] Epoch 2/50 - Loss: 2.4694 - Accuracy: 0.42
[01:13:15] [INFO] Loading dataset 'banking-conversations-v2'...
[01:13:16] Epoch 3/50 - Loss: 2.4270 - Accuracy: 0.44
[01:13:17] Epoch 46/50 - Loss: 0.1000 - Accuracy: 0.99
[01:13:19] Epoch 4/50 - Loss: 2.2779 - Accuracy: 0.45
[01:13:19] Epoch 47/50 - Loss: 0.1000 - Accuracy: 0.99
[01:13:20] Epoch 5/50 - Loss: 2.2560 - Accuracy: 0.45
[01:13:21] Epoch 48/50 - Loss: 0.1000 - Accuracy: 0.99
[01:13:22] Epoch 6/50 - Loss: 2.2211 - Accuracy: 0.46
[01:13:22] Epoch 49/50 - Loss: 0.1000 - Accuracy: 0.99
[01:13:23] Epoch 7/50 - Loss: 2.0745 - Accuracy: 0.49
[01:13:24] Epoch 8/50 - Loss: 1.9879 - Accuracy: 0.53
[01:13:25] Epoch 50/50 - Loss: 0.1000 - Accuracy: 0.99
[01:13:26] [INFO] Running validation step...
[01:13:26] Epoch 9/50 - Loss: 1.8387 - Accuracy: 0.57
[01:13:27] [INFO] Initializing distributed training across 4 GPUs...
[01:13:28] [INFO] Running validation step...
[01:13:29] [INFO] Running validation step...
[01:13:29] Epoch 10/50 - Loss: 1.7025 - Accuracy: 0.61
[01:13:30] Epoch 1/50 - Loss: 2.5000 - Accuracy: 0.40
[01:13:32] Epoch 11/50 - Loss: 1.6099 - Accuracy: 0.64
[01:13:32] [INFO] Adjusting learning rate to 2e-5 with Cosine Annealing...
[01:13:33] Epoch 12/50 - Loss: 1.5312 - Accuracy: 0.65
[01:13:34] Epoch 2/50 - Loss: 2.4285 - Accuracy: 0.43
[01:13:35] [INFO] Initializing distributed training across 4 GPUs...
[01:13:35] [INFO] Running validation step...
[01:13:37] Epoch 3/50 - Loss: 2.3220 - Accuracy: 0.46
[01:13:37] Epoch 13/50 - Loss: 1.4955 - Accuracy: 0.65
[01:13:38] [INFO] Initializing distributed training across 4 GPUs...
[01:13:39] Epoch 14/50 - Loss: 1.4407 - Accuracy: 0.65
[01:13:40] Epoch 4/50 - Loss: 2.2552 - Accuracy: 0.50
[01:13:40] Epoch 15/50 - Loss: 1.4209 - Accuracy: 0.66
[01:13:42] [SUCCESS] Checkpoint saved to ./weights/llama3-finetuned-step-800.pt
[01:13:42] Epoch 16/50 - Loss: 1.3548 - Accuracy: 0.68
[01:13:44] [WARN] VRAM usage at 88%. Enabling gradient checkpointing.
[01:13:45] [INFO] Initializing distributed training across 4 GPUs...
[01:13:45] [INFO] Garbage collection freed 1.2GB of memory.
[01:13:46] [INFO] Running validation step...
[01:13:47] Epoch 17/50 - Loss: 1.3512 - Accuracy: 0.71
[01:13:48] [INFO] Running validation step...
[01:13:48] Epoch 18/50 - Loss: 1.3180 - Accuracy: 0.76
[01:13:50] Epoch 5/50 - Loss: 2.1316 - Accuracy: 0.52
[01:13:50] Epoch 19/50 - Loss: 1.2337 - Accuracy: 0.78
[01:13:52] Epoch 6/50 - Loss: 2.1187 - Accuracy: 0.56
[01:13:53] Epoch 20/50 - Loss: 1.0849 - Accuracy: 0.80
[01:13:53] Epoch 7/50 - Loss: 2.0365 - Accuracy: 0.57
[01:13:55] [INFO] Loading dataset 'banking-conversations-v2'...
[01:13:55] Epoch 21/50 - Loss: 1.0470 - Accuracy: 0.82
[01:13:56] [WARN] VRAM usage at 88%. Enabling gradient checkpointing.
[01:13:56] Epoch 8/50 - Loss: 1.9370 - Accuracy: 0.62
[01:13:58] Epoch 22/50 - Loss: 0.9740 - Accuracy: 0.84
[01:13:59] Epoch 9/50 - Loss: 1.8566 - Accuracy: 0.66
[01:14:00] Epoch 10/50 - Loss: 1.8458 - Accuracy: 0.71
[01:14:00] [SUCCESS] Checkpoint saved to ./weights/llama3-finetuned-step-800.pt
[01:14:01] Epoch 11/50 - Loss: 1.7259 - Accuracy: 0.73
[01:14:02] [INFO] Running validation step...
[01:14:02] Epoch 12/50 - Loss: 1.6758 - Accuracy: 0.74
[01:14:03] Epoch 13/50 - Loss: 1.6064 - Accuracy: 0.76
[01:14:03] Epoch 23/50 - Loss: 0.8947 - Accuracy: 0.87
[01:14:05] Epoch 24/50 - Loss: 0.7626 - Accuracy: 0.88
[01:14:05] [INFO] Initializing distributed training across 4 GPUs...
[01:14:07] Epoch 25/50 - Loss: 0.7176 - Accuracy: 0.89
[01:14:07] [INFO] Garbage collection freed 1.2GB of memory.
[01:14:08] Epoch 26/50 - Loss: 0.6707 - Accuracy: 0.92
[01:14:09] Epoch 14/50 - Loss: 1.5344 - Accuracy: 0.77
[01:14:10] Epoch 27/50 - Loss: 0.5973 - Accuracy: 0.93
[01:14:11] Epoch 28/50 - Loss: 0.5047 - Accuracy: 0.96
[01:14:11] [INFO] Garbage collection freed 1.2GB of memory.
[01:14:12] Epoch 29/50 - Loss: 0.4328 - Accuracy: 0.99
[01:14:13] Epoch 15/50 - Loss: 1.4372 - Accuracy: 0.78
[01:14:14] Epoch 30/50 - Loss: 0.4312 - Accuracy: 0.99
[01:14:14] [SUCCESS] Checkpoint saved to ./weights/llama3-finetuned-step-800.pt
[01:14:15] Epoch 31/50 - Loss: 0.3048 - Accuracy: 0.99
[01:14:16] Epoch 16/50 - Loss: 1.4003 - Accuracy: 0.82
[01:14:16] Epoch 32/50 - Loss: 0.2017 - Accuracy: 0.99
[01:14:18] Epoch 17/50 - Loss: 1.3978 - Accuracy: 0.87
[01:14:18] [SUCCESS] Checkpoint saved to ./weights/llama3-finetuned-step-800.pt
[01:14:20] [SUCCESS] Checkpoint saved to ./weights/llama3-finetuned-step-800.pt
[01:14:20] Epoch 33/50 - Loss: 0.1480 - Accuracy: 0.99
[01:14:21] [INFO] Loading dataset 'banking-conversations-v2'...
[01:14:22] Epoch 18/50 - Loss: 1.3624 - Accuracy: 0.88
[01:14:23] Epoch 34/50 - Loss: 0.1000 - Accuracy: 0.99
[01:14:24] Epoch 19/50 - Loss: 1.2724 - Accuracy: 0.91
[01:14:24] Epoch 35/50 - Loss: 0.1000 - Accuracy: 0.99
[01:14:26] [SUCCESS] Checkpoint saved to ./weights/llama3-finetuned-step-800.pt
[01:14:27] Epoch 36/50 - Loss: 0.1000 - Accuracy: 0.99
[01:14:27] [WARN] VRAM usage at 88%. Enabling gradient checkpointing.
[01:14:28] Epoch 37/50 - Loss: 0.1000 - Accuracy: 0.99
[01:14:29] [INFO] Loading dataset 'banking-conversations-v2'...
[01:14:29] [WARN] VRAM usage at 88%. Enabling gradient checkpointing.
[01:14:31] [INFO] Initializing distributed training across 4 GPUs...
[01:14:31] Epoch 38/50 - Loss: 0.1000 - Accuracy: 0.99
[01:14:33] Epoch 20/50 - Loss: 1.1780 - Accuracy: 0.95
[01:14:33] Epoch 39/50 - Loss: 0.1000 - Accuracy: 0.99
[01:14:34] Epoch 40/50 - Loss: 0.1000 - Accuracy: 0.99
[01:14:35] Epoch 21/50 - Loss: 1.1414 - Accuracy: 0.99
[01:14:36] Epoch 41/50 - Loss: 0.1000 - Accuracy: 0.99
[01:14:37] Epoch 22/50 - Loss: 1.1357 - Accuracy: 0.99
[01:14:37] Epoch 42/50 - Loss: 0.1000 - Accuracy: 0.99
[01:14:39] Epoch 23/50 - Loss: 1.0920 - Accuracy: 0.99
[01:14:40] Epoch 43/50 - Loss: 0.1000 - Accuracy: 0.99
[01:14:40] Epoch 24/50 - Loss: 1.0587 - Accuracy: 0.99
[01:14:41] Epoch 44/50 - Loss: 0.1000 - Accuracy: 0.99
[01:14:42] Epoch 25/50 - Loss: 0.9270 - Accuracy: 0.99
[01:14:43] [INFO] Loading dataset 'banking-conversations-v2'...
[01:14:43] Epoch 45/50 - Loss: 0.1000 - Accuracy: 0.99
[01:14:45] [WARN] VRAM usage at 88%. Enabling gradient checkpointing.
[01:14:45] [INFO] Initializing distributed training across 4 GPUs...
[01:14:46] Epoch 46/50 - Loss: 0.1000 - Accuracy: 0.99
[01:14:47] [INFO] Loading dataset 'banking-conversations-v2'...
[01:14:48] Epoch 47/50 - Loss: 0.1000 - Accuracy: 0.99
[01:14:48] Epoch 26/50 - Loss: 0.8185 - Accuracy: 0.99
[01:14:50] Epoch 27/50 - Loss: 0.7544 - Accuracy: 0.99
[01:14:50] Epoch 48/50 - Loss: 0.1000 - Accuracy: 0.99
[01:14:51] Epoch 28/50 - Loss: 0.6948 - Accuracy: 0.99
[01:14:52] [INFO] Initializing distributed training across 4 GPUs...
[01:14:53] Epoch 29/50 - Loss: 0.5533 - Accuracy: 0.99
[01:14:55] [INFO] Loading dataset 'banking-conversations-v2'...
[01:14:55] Epoch 49/50 - Loss: 0.1000 - Accuracy: 0.99
[01:14:56] Epoch 30/50 - Loss: 0.4751 - Accuracy: 0.99
[01:14:57] Epoch 31/50 - Loss: 0.3742 - Accuracy: 0.99
[01:14:57] Epoch 50/50 - Loss: 0.1000 - Accuracy: 0.99
[01:14:59] Epoch 32/50 - Loss: 0.3626 - Accuracy: 0.99
[01:14:59] [INFO] Garbage collection freed 1.2GB of memory.
[01:15:00] Epoch 33/50 - Loss: 0.2440 - Accuracy: 0.99
[01:15:00] Epoch 1/50 - Loss: 2.5000 - Accuracy: 0.40
[01:15:01] [WARN] VRAM usage at 88%. Enabling gradient checkpointing.
[01:15:03] Epoch 2/50 - Loss: 2.4960 - Accuracy: 0.43
[01:15:04] Epoch 34/50 - Loss: 0.1288 - Accuracy: 0.99
[01:15:05] [INFO] Garbage collection freed 1.2GB of memory.
[01:15:06] [INFO] Adjusting learning rate to 2e-5 with Cosine Annealing...
[01:15:07] Epoch 3/50 - Loss: 2.4789 - Accuracy: 0.45
[01:15:07] Epoch 35/50 - Loss: 0.1000 - Accuracy: 0.99
[01:15:08] [WARN] VRAM usage at 88%. Enabling gradient checkpointing.
[01:15:09] Epoch 4/50 - Loss: 2.3985 - Accuracy: 0.46
[01:15:10] Epoch 5/50 - Loss: 2.2811 - Accuracy: 0.49
[01:15:10] Epoch 36/50 - Loss: 0.1000 - Accuracy: 0.99
[01:15:12] Epoch 37/50 - Loss: 0.1000 - Accuracy: 0.99
[01:15:13] [INFO] Running validation step...
[01:15:14] Epoch 6/50 - Loss: 2.2048 - Accuracy: 0.51
[01:15:14] [INFO] Initializing distributed training across 4 GPUs...
[01:15:16] Epoch 38/50 - Loss: 0.1000 - Accuracy: 0.99
[01:15:16] [INFO] Initializing distributed training across 4 GPUs...
[01:15:18] [INFO] Initializing distributed training across 4 GPUs...
[01:15:18] Epoch 39/50 - Loss: 0.1000 - Accuracy: 0.99
[01:15:20] Epoch 7/50 - Loss: 2.1650 - Accuracy: 0.52
[01:15:20] Epoch 40/50 - Loss: 0.1000 - Accuracy: 0.99
[01:15:22] [INFO] Loading dataset 'banking-conversations-v2'...
[01:15:23] Epoch 41/50 - Loss: 0.1000 - Accuracy: 0.99
[01:15:24] Epoch 8/50 - Loss: 2.1168 - Accuracy: 0.56
[01:15:25] Epoch 42/50 - Loss: 0.1000 - Accuracy: 0.99
[01:15:26] Epoch 9/50 - Loss: 2.1126 - Accuracy: 0.59
[01:15:28] [WARN] VRAM usage at 88%. Enabling gradient checkpointing.
[01:15:28] Epoch 10/50 - Loss: 2.0216 - Accuracy: 0.61
[01:15:30] [INFO] Garbage collection freed 1.2GB of memory.
[01:15:30] [SUCCESS] Checkpoint saved to ./weights/llama3-finetuned-step-800.pt
[01:15:32] [INFO] Running validation step...
[01:15:33] Epoch 43/50 - Loss: 0.1000 - Accuracy: 0.99
[01:15:35] Epoch 11/50 - Loss: 1.8727 - Accuracy: 0.63
[01:15:35] [INFO] Adjusting learning rate to 2e-5 with Cosine Annealing...
[01:15:36] Epoch 12/50 - Loss: 1.8190 - Accuracy: 0.64
[01:15:38] [INFO] Initializing distributed training across 4 GPUs...
[01:15:39] Epoch 13/50 - Loss: 1.6742 - Accuracy: 0.69
[01:15:40] [SUCCESS] Checkpoint saved to ./weights/llama3-finetuned-step-800.pt
[01:15:41] Epoch 14/50 - Loss: 1.6618 - Accuracy: 0.74
[01:15:42] Epoch 44/50 - Loss: 0.1000 - Accuracy: 0.99
[01:15:43] Epoch 45/50 - Loss: 0.1000 - Accuracy: 0.99
[01:15:44] [INFO] Garbage collection freed 1.2GB of memory.
[01:15:45] [SUCCESS] Checkpoint saved to ./weights/llama3-finetuned-step-800.pt
[01:15:46] Epoch 15/50 - Loss: 1.6549 - Accuracy: 0.76
[01:15:47] Epoch 46/50 - Loss: 0.1000 - Accuracy: 0.99
[01:15:48] Epoch 16/50 - Loss: 1.6378 - Accuracy: 0.78
[01:15:49] [INFO] Initializing distributed training across 4 GPUs...
[01:15:50] Epoch 47/50 - Loss: 0.1000 - Accuracy: 0.99
[01:15:52] Epoch 17/50 - Loss: 1.5175 - Accuracy: 0.83
[01:15:52] [INFO] Loading dataset 'banking-conversations-v2'...
[01:15:54] Epoch 18/50 - Loss: 1.5150 - Accuracy: 0.86
[01:15:54] [INFO] Garbage collection freed 1.2GB of memory.
[01:15:56] [SUCCESS] Checkpoint saved to ./weights/llama3-finetuned-step-800.pt
[01:15:57] Epoch 48/50 - Loss: 0.1000 - Accuracy: 0.99
[01:15:58] Epoch 19/50 - Loss: 1.4085 - Accuracy: 0.91
[01:16:00] Epoch 49/50 - Loss: 0.1000 - Accuracy: 0.99
[01:16:01] Epoch 20/50 - Loss: 1.2873 - Accuracy: 0.93
[01:16:48] [WARN] VRAM usage at 88%. Enabling gradient checkpointing.
[01:16:48] Epoch 21/50 - Loss: 1.2576 - Accuracy: 0.96
[01:17:30] Epoch 50/50 - Loss: 0.1000 - Accuracy: 0.99
[01:17:30] Epoch 22/50 - Loss: 1.2377 - Accuracy: 0.99
[01:17:31] Epoch 23/50 - Loss: 1.2265 - Accuracy: 0.99
[01:17:32] Epoch 1/50 - Loss: 2.5000 - Accuracy: 0.40
[01:17:32] Epoch 24/50 - Loss: 1.2252 - Accuracy: 0.99
[01:17:32] Epoch 2/50 - Loss: 2.4247 - Accuracy: 0.43
[01:17:34] [INFO] Garbage collection freed 1.2GB of memory.
[01:17:35] Epoch 25/50 - Loss: 1.1018 - Accuracy: 0.99
[01:17:36] Epoch 26/50 - Loss: 1.0870 - Accuracy: 0.99
[01:17:36] Epoch 3/50 - Loss: 2.3392 - Accuracy: 0.47
[01:17:38] Epoch 27/50 - Loss: 0.9640 - Accuracy: 0.99
[01:17:38] Epoch 4/50 - Loss: 2.2999 - Accuracy: 0.50
[01:17:40] [WARN] VRAM usage at 88%. Enabling gradient checkpointing.
[01:17:40] Epoch 28/50 - Loss: 0.8161 - Accuracy: 0.99
[01:17:41] Epoch 5/50 - Loss: 2.2592 - Accuracy: 0.54
[01:17:41] Epoch 29/50 - Loss: 0.7535 - Accuracy: 0.99
[01:17:42] [WARN] VRAM usage at 88%. Enabling gradient checkpointing.
[01:17:43] Epoch 30/50 - Loss: 0.6076 - Accuracy: 0.99
[01:17:43] Epoch 6/50 - Loss: 2.1556 - Accuracy: 0.55
[01:17:44] Epoch 31/50 - Loss: 0.4936 - Accuracy: 0.99
[01:17:45] Epoch 7/50 - Loss: 2.0174 - Accuracy: 0.59
[01:17:45] [INFO] Running validation step...
[01:17:47] Epoch 32/50 - Loss: 0.4620 - Accuracy: 0.99
[01:17:47] [WARN] VRAM usage at 88%. Enabling gradient checkpointing.
[01:17:48] Epoch 33/50 - Loss: 0.3764 - Accuracy: 0.99
[01:17:48] [INFO] Running validation step...
[01:17:49] Epoch 8/50 - Loss: 1.9349 - Accuracy: 0.64
[01:17:50] [INFO] Garbage collection freed 1.2GB of memory.
[01:17:50] [INFO] Garbage collection freed 1.2GB of memory.
[01:17:51] [WARN] VRAM usage at 88%. Enabling gradient checkpointing.
[01:17:52] Epoch 9/50 - Loss: 1.8529 - Accuracy: 0.66
[01:17:53] Epoch 10/50 - Loss: 1.8272 - Accuracy: 0.67
[01:17:53] Epoch 34/50 - Loss: 0.3386 - Accuracy: 0.99
[01:17:54] Epoch 11/50 - Loss: 1.8195 - Accuracy: 0.71
[01:17:55] [INFO] Initializing distributed training across 4 GPUs...
[01:17:56] [INFO] Garbage collection freed 1.2GB of memory.
[01:17:57] Epoch 35/50 - Loss: 0.3128 - Accuracy: 0.99
[01:17:58] Epoch 12/50 - Loss: 1.7153 - Accuracy: 0.73
[01:18:00] [WARN] VRAM usage at 88%. Enabling gradient checkpointing.
[01:18:00] Epoch 36/50 - Loss: 0.2308 - Accuracy: 0.99
[01:18:02] Epoch 13/50 - Loss: 1.6001 - Accuracy: 0.75
[01:18:02] Epoch 37/50 - Loss: 0.1035 - Accuracy: 0.99
[01:18:04] [INFO] Initializing distributed training across 4 GPUs...
[01:18:05] Epoch 14/50 - Loss: 1.5801 - Accuracy: 0.79
[01:18:06] Epoch 38/50 - Loss: 0.1000 - Accuracy: 0.99
[01:18:08] Epoch 15/50 - Loss: 1.4498 - Accuracy: 0.81
[01:18:09] Epoch 39/50 - Loss: 0.1000 - Accuracy: 0.99
[01:18:10] Epoch 16/50 - Loss: 1.4326 - Accuracy: 0.85
[01:18:12] [INFO] Running validation step...
[01:18:13] Epoch 17/50 - Loss: 1.3286 - Accuracy: 0.86
[01:18:14] [INFO] Initializing distributed training across 4 GPUs...
[01:18:16] Epoch 18/50 - Loss: 1.2370 - Accuracy: 0.91
[01:18:16] Epoch 40/50 - Loss: 0.1000 - Accuracy: 0.99
[01:18:17] Epoch 19/50 - Loss: 1.1719 - Accuracy: 0.96
[01:18:17] Epoch 20/50 - Loss: 1.1172 - Accuracy: 0.98
[01:18:18] Epoch 41/50 - Loss: 0.1000 - Accuracy: 0.99
[01:18:19] Epoch 21/50 - Loss: 1.1008 - Accuracy: 0.99
[01:18:19] Epoch 42/50 - Loss: 0.1000 - Accuracy: 0.99
[01:18:21] [SUCCESS] Checkpoint saved to ./weights/llama3-finetuned-step-800.pt
[01:18:21] Epoch 22/50 - Loss: 1.0265 - Accuracy: 0.99
[01:18:22] Epoch 43/50 - Loss: 0.1000 - Accuracy: 0.99
[01:18:22] [INFO] Adjusting learning rate to 2e-5 with Cosine Annealing...
[01:18:23] Epoch 23/50 - Loss: 0.9325 - Accuracy: 0.99
[01:18:24] Epoch 24/50 - Loss: 0.8716 - Accuracy: 0.99
[01:18:24] Epoch 44/50 - Loss: 0.1000 - Accuracy: 0.99
[01:18:26] Epoch 45/50 - Loss: 0.1000 - Accuracy: 0.99
[01:18:26] Epoch 25/50 - Loss: 0.8072 - Accuracy: 0.99
[01:18:28] [SUCCESS] Checkpoint saved to ./weights/llama3-finetuned-step-800.pt
[01:18:28] Epoch 26/50 - Loss: 0.6592 - Accuracy: 0.99
[01:18:29] Epoch 46/50 - Loss: 0.1000 - Accuracy: 0.99
[01:18:29] [WARN] VRAM usage at 88%. Enabling gradient checkpointing.
[01:18:30] Epoch 47/50 - Loss: 0.1000 - Accuracy: 0.99
[01:18:32] [INFO] Adjusting learning rate to 2e-5 with Cosine Annealing...
[01:18:33] Epoch 48/50 - Loss: 0.1000 - Accuracy: 0.99
[01:18:33] Epoch 27/50 - Loss: 0.5099 - Accuracy: 0.99
[01:18:35] Epoch 49/50 - Loss: 0.1000 - Accuracy: 0.99
[01:18:35] [INFO] Loading dataset 'banking-conversations-v2'...
[01:18:37] [INFO] Garbage collection freed 1.2GB of memory.
[01:18:38] [INFO] Loading dataset 'banking-conversations-v2'...
[01:18:39] Epoch 50/50 - Loss: 0.1000 - Accuracy: 0.99
[01:18:40] Epoch 28/50 - Loss: 0.3615 - Accuracy: 0.99
[01:18:40] Epoch 1/50 - Loss: 2.5000 - Accuracy: 0.40
[01:18:41] Epoch 29/50 - Loss: 0.2199 - Accuracy: 0.99
[01:18:42] [INFO] Running validation step...
[01:18:43] [INFO] Garbage collection freed 1.2GB of memory.
[01:18:43] Epoch 2/50 - Loss: 2.4293 - Accuracy: 0.42
[01:18:44] [INFO] Adjusting learning rate to 2e-5 with Cosine Annealing...
[01:18:45] Epoch 3/50 - Loss: 2.3829 - Accuracy: 0.43
[01:18:46] Epoch 30/50 - Loss: 0.1146 - Accuracy: 0.99
[01:18:47] Epoch 4/50 - Loss: 2.2965 - Accuracy: 0.47
[01:18:48] [INFO] Initializing distributed training across 4 GPUs...
[01:18:49] [WARN] VRAM usage at 88%. Enabling gradient checkpointing.
[01:18:50] Epoch 31/50 - Loss: 0.1000 - Accuracy: 0.99
[01:18:51] Epoch 5/50 - Loss: 2.2940 - Accuracy: 0.49
[01:18:52] [SUCCESS] Checkpoint saved to ./weights/llama3-finetuned-step-800.pt
[01:18:52] Epoch 6/50 - Loss: 2.2835 - Accuracy: 0.53
[01:18:53] [WARN] VRAM usage at 88%. Enabling gradient checkpointing.
[01:18:54] Epoch 32/50 - Loss: 0.1000 - Accuracy: 0.99
[01:18:55] Epoch 7/50 - Loss: 2.2141 - Accuracy: 0.56
[01:18:55] Epoch 33/50 - Loss: 0.1000 - Accuracy: 0.99
[01:18:57] Epoch 8/50 - Loss: 2.1187 - Accuracy: 0.59
[01:18:57] Epoch 34/50 - Loss: 0.1000 - Accuracy: 0.99
[01:18:59] Epoch 35/50 - Loss: 0.1000 - Accuracy: 0.99
[01:19:00] Epoch 9/50 - Loss: 1.9911 - Accuracy: 0.61
[01:19:00] Epoch 36/50 - Loss: 0.1000 - Accuracy: 0.99
[01:19:02] [INFO] Running validation step...
[01:19:02] [INFO] Running validation step...
[01:19:02] Epoch 10/50 - Loss: 1.8811 - Accuracy: 0.64
[01:19:04] Epoch 37/50 - Loss: 0.1000 - Accuracy: 0.99
[01:19:04] Epoch 11/50 - Loss: 1.7736 - Accuracy: 0.67
[01:19:05] Epoch 38/50 - Loss: 0.1000 - Accuracy: 0.99
[01:19:06] Epoch 39/50 - Loss: 0.1000 - Accuracy: 0.99
[01:19:06] Epoch 12/50 - Loss: 1.6465 - Accuracy: 0.69
[01:19:08] Epoch 40/50 - Loss: 0.1000 - Accuracy: 0.99
[01:19:08] Epoch 13/50 - Loss: 1.5402 - Accuracy: 0.74
[01:19:09] Epoch 14/50 - Loss: 1.5380 - Accuracy: 0.78
[01:19:10] Epoch 41/50 - Loss: 0.1000 - Accuracy: 0.99
[01:19:10] [INFO] Running validation step...
[01:19:12] Epoch 42/50 - Loss: 0.1000 - Accuracy: 0.99
[01:19:13] [SUCCESS] Checkpoint saved to ./weights/llama3-finetuned-step-800.pt
[01:19:14] Epoch 15/50 - Loss: 1.4367 - Accuracy: 0.82
[01:19:15] Epoch 43/50 - Loss: 0.1000 - Accuracy: 0.99
[01:19:15] Epoch 44/50 - Loss: 0.1000 - Accuracy: 0.99
[01:19:16] Epoch 16/50 - Loss: 1.3909 - Accuracy: 0.86
[01:19:17] [WARN] VRAM usage at 88%. Enabling gradient checkpointing.
[01:19:18] Epoch 17/50 - Loss: 1.2440 - Accuracy: 0.86
[01:19:19] Epoch 45/50 - Loss: 0.1000 - Accuracy: 0.99
[01:19:19] Epoch 18/50 - Loss: 1.1680 - Accuracy: 0.90
[01:19:20] Epoch 19/50 - Loss: 1.1231 - Accuracy: 0.91
[01:19:21] Epoch 46/50 - Loss: 0.1000 - Accuracy: 0.99
[01:19:22] [WARN] VRAM usage at 88%. Enabling gradient checkpointing.
[01:19:23] Epoch 47/50 - Loss: 0.1000 - Accuracy: 0.99
[01:19:24] Epoch 48/50 - Loss: 0.1000 - Accuracy: 0.99
[01:19:24] Epoch 20/50 - Loss: 1.0293 - Accuracy: 0.92
[01:19:26] Epoch 49/50 - Loss: 0.1000 - Accuracy: 0.99
[01:19:26] [SUCCESS] Checkpoint saved to ./weights/llama3-finetuned-step-800.pt
[01:19:28] [SUCCESS] Checkpoint saved to ./weights/llama3-finetuned-step-800.pt
[01:19:28] Epoch 21/50 - Loss: 0.9421 - Accuracy: 0.96
[01:19:29] Epoch 50/50 - Loss: 0.1000 - Accuracy: 0.99
[01:19:30] Epoch 22/50 - Loss: 0.9412 - Accuracy: 0.98
[01:19:31] [SUCCESS] Checkpoint saved to ./weights/llama3-finetuned-step-800.pt
[01:19:32] [INFO] Initializing distributed training across 4 GPUs...
[01:19:33] Epoch 1/50 - Loss: 2.5000 - Accuracy: 0.40
[01:19:34] Epoch 2/50 - Loss: 2.4472 - Accuracy: 0.41
[01:19:34] [INFO] Garbage collection freed 1.2GB of memory.
[01:19:35] Epoch 3/50 - Loss: 2.3323 - Accuracy: 0.44
[01:19:37] Epoch 23/50 - Loss: 0.9130 - Accuracy: 0.99
[01:19:37] Epoch 4/50 - Loss: 2.2629 - Accuracy: 0.46
[01:19:38] Epoch 5/50 - Loss: 2.1945 - Accuracy: 0.50
[01:19:38] [INFO] Running validation step...
[01:19:39] [WARN] VRAM usage at 88%. Enabling gradient checkpointing.
[01:19:40] Epoch 6/50 - Loss: 2.1867 - Accuracy: 0.55
[01:19:42] Epoch 24/50 - Loss: 0.8935 - Accuracy: 0.99
[01:19:42] [SUCCESS] Checkpoint saved to ./weights/llama3-finetuned-step-800.pt
[01:19:43] Epoch 7/50 - Loss: 2.1438 - Accuracy: 0.59
[01:19:44] Epoch 25/50 - Loss: 0.8246 - Accuracy: 0.99
[01:19:45] Epoch 8/50 - Loss: 2.0210 - Accuracy: 0.63
[01:19:47] Epoch 26/50 - Loss: 0.6815 - Accuracy: 0.99
[01:19:47] [INFO] Loading dataset 'banking-conversations-v2'...
[01:19:47] Epoch 27/50 - Loss: 0.6091 - Accuracy: 0.99
[01:19:48] Epoch 28/50 - Loss: 0.5563 - Accuracy: 0.99
[01:19:50] [INFO] Loading dataset 'banking-conversations-v2'...
[01:19:50] Epoch 29/50 - Loss: 0.4430 - Accuracy: 0.99