Huzaifa Pardawala
committed on
Commit bd767d8 · 1 Parent(s): 555b191
fix: editing icons and sections
index.html +13 -13
index.html
CHANGED
@@ -3861,8 +3861,8 @@
 
 <!-- </section>
 </div> -->
-
-<div class="container">
+<section class="section">
+<div class="container">
 <!-- Model Performance Highlights -->
 <div class="card mb-5">
 <div class="card-header">
@@ -3916,12 +3916,12 @@
 <p class="has-text-weight-bold mb-3"><span class="icon has-text-primary"><i class="fa-solid fa-magnifying-glass"></i></span> Key Insights from Model Analysis</p>
 
 <div class="notification is-info is-light py-3 px-4">
-<p><strong
-<p><strong
-<p><strong
-<p><strong
-<p><strong
-<p><strong
+<p><strong>🏆 No single dominant model:</strong> DeepSeek R1 leads in complex multi-step QA, while Claude 3.5 excels in sentiment tasks. GPT-4o is strong in classification and summarization.</p>
+<p><strong>⚖️ Inconsistent scaling:</strong> Larger models don’t always outperform smaller ones—DeepSeek R1 trails in summarization despite excelling in QA.</p>
+<p><strong>🛠️ Open-weight models:</strong> Many open-weight models like DeepSeek-V3 and Llama 3.1 70B offer competitive performance while being cost-effective.</p>
+<p><strong>💰 Cost-performance disparities:</strong> Running DeepSeek R1 can cost up to <strong>$260</strong> per million tokens, while Claude 3.5 Sonnet and o1-mini cost around <strong>$105</strong>, and Meta’s Llama 3.1 8B only <strong>$4</strong>.</p>
+<p><strong>📈 Numeric reasoning challenges:</strong> Even the best models struggle with financial numeric reasoning tasks, achieving low F1 scores (<strong>≤ 0.06</strong>).</p>
+<p><strong>🔢 Step-by-step deductions:</strong> Multi-turn financial QA (e.g., ConvFinQA) significantly reduces model accuracy due to complex dependencies.</p>
 </div>
 </div>
 </div>
@@ -4202,27 +4202,27 @@
 </div>
 
 <div class="notification is-info is-light py-2 px-3 mb-3">
-<p class="has-text-weight-bold mb-1"
+<p class="has-text-weight-bold mb-1">🧠 Few-Shot & Chain-of-Thought</p>
 <p class="is-size-7 mb-0">Investigating in-context learning techniques such as few-shot, chain-of-thought, and retrieval-augmented generation (RAG).</p>
 </div>
 
 <div class="notification is-info is-light py-2 px-3 mb-3">
-<p class="has-text-weight-bold mb-1"
+<p class="has-text-weight-bold mb-1">⚙️ Domain-Adaptive Training</p>
 <p class="is-size-7 mb-0">Evaluating fine-tuning strategies to enhance model understanding of financial-specific terminology and reasoning.</p>
 </div>
 
 <div class="notification is-info is-light py-2 px-3 mb-3">
-<p class="has-text-weight-bold mb-1"
+<p class="has-text-weight-bold mb-1">📊 Expanded Dataset Coverage</p>
 <p class="is-size-7 mb-0">Curating datasets from underrepresented financial sectors such as insurance, derivatives, and central banking.</p>
 </div>
 
 <div class="notification is-info is-light py-2 px-3 mb-3">
-<p class="has-text-weight-bold mb-1"
+<p class="has-text-weight-bold mb-1">⚖️ Efficiency & Cost Benchmarking</p>
 <p class="is-size-7 mb-0">Developing detailed trade-off analyses between accuracy, latency, and cost to optimize real-world usability.</p>
 </div>
 
 <div class="notification is-info is-light py-2 px-3 mb-3">
-<p class="has-text-weight-bold mb-1"
+<p class="has-text-weight-bold mb-1">📈 Advanced Evaluation Metrics</p>
 <p class="is-size-7 mb-0">Moving beyond traditional accuracy metrics by incorporating trustworthiness, robustness, and interpretability measures.</p>
 </div>
 