Use `huggingface_hub.InferenceClient` instead of `openai` to call Sambanova
This is a suggestion to use the `huggingface_hub` client instead of the `openai` one to call the Sambanova API. There is no need to provide the Sambanova API endpoint anymore. One can also use the HF model id and easily switch between providers (currently 7 providers are available on https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct; users can pick one based on the speed/cost trade-off).
A more advanced suggestion would be to set an `HF_TOKEN` Space secret instead and instantiate the client like this:
```py
import huggingface_hub

client = huggingface_hub.InferenceClient(
    provider="sambanova",
)
```
This will route requests through HF, which makes it easier to switch between providers while keeping billing in a single place.
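For reference, a minimal self-contained sketch of that setup (assumptions: `HF_TOKEN` is set as a Space secret and picked up from the environment, and the prompt and `print` call are only for illustration):

```py
import huggingface_hub

# No api_key passed: the client falls back to the HF_TOKEN environment variable,
# so calls are routed (and billed) through HF.
client = huggingface_hub.InferenceClient(
    provider="sambanova",  # switching providers only requires changing this argument
)

# Same OpenAI-compatible call as in app.py, using the HF model id.
completion = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],  # illustrative prompt
    temperature=0.1,
    top_p=0.1,
)
print(completion.choices[0].message.content)
```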
The corresponding change to `app.py`:

```diff
@@ -5,7 +5,7 @@ from pathlib import Path
 
 import gradio as gr
 import numpy as np
-import openai
+import huggingface_hub
 from dotenv import load_dotenv
 from fastapi import FastAPI
 from fastapi.responses import HTMLResponse, StreamingResponse
@@ -25,9 +25,9 @@ load_dotenv()
 curr_dir = Path(__file__).parent
 
 
-client = openai.OpenAI(
+client = huggingface_hub.InferenceClient(
     api_key=os.environ.get("SAMBANOVA_API_KEY"),
-    base_url="...",
+    provider="sambanova",
 )
 stt_model = get_stt_model()
 
@@ -52,7 +52,7 @@ def response(
         raise WebRTCError("test")
 
     request = client.chat.completions.create(
-        model="...",
+        model="meta-llama/Llama-3.2-3B-Instruct",
         messages=conversation_state,  # type: ignore
         temperature=0.1,
         top_p=0.1,
```
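Side note on the diff above: it still passes `api_key=os.environ.get("SAMBANOVA_API_KEY")`, so requests keep going to (and being billed by) Sambanova directly. Dropping that argument and setting `HF_TOKEN` instead, as described above, is what enables the HF routing and single-place billing.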