Implemented processing of a dataset through the LangGraph, along with evaluation rules. This allows the LangGraph's outputs to be compared against RAGAS runs on the same dataset.
- experiments/README.md +45 -0
- experiments/evaluate_on_dataset.py +78 -0
- experiments/evaluate_predictions.py +67 -0
experiments/README.md
ADDED
# Experiments: Synthetic Data Generation & Evaluation

This folder contains scripts for running batch experiments and evaluations on your RAG pipeline using LangSmith.

## Contents

- `evaluate_on_dataset.py`: Runs your RAG pipeline on all questions in the LangSmith dataset and logs predictions.
- `evaluate_predictions.py`: Runs automated evaluation (Correctness, Helpfulness, Dopeness) on predictions using LangSmith evaluators.

## Prerequisites

- Python 3.10+
- All project dependencies installed (see project root requirements)
- API keys set as environment variables:
  - `OPENAI_API_KEY`
  - `LANGCHAIN_API_KEY`
- (Optional) **Vectorstore location:**
  - `VECTORSTORE_PATH` (default: `/tmp/vectorstore`)
- **LangSmith Tracing:**
  - `LANGCHAIN_TRACING_V2` (must be set to `true` to enable tracing in LangSmith)

## Usage

1. **Run the RAG pipeline and log predictions:**
   ```sh
   export OPENAI_API_KEY=sk-...
   export LANGCHAIN_API_KEY=ls-...
   export LANGCHAIN_TRACING_V2=true
   export VECTORSTORE_PATH=/tmp/vectorstore  # or your preferred path
   python evaluate_on_dataset.py
   ```
   This will process all questions in the LangSmith dataset and log your app's predictions.

2. **Run evaluation on predictions:**
   ```sh
   python evaluate_predictions.py
   ```
   This will score your predictions for correctness, helpfulness, and dopeness, and log results to LangSmith.

3. **View Results:**
   - Go to your [LangSmith dashboard](https://smith.langchain.com/) and open the relevant project/dataset to see experiment results and metrics.

## Notes

- Make sure your dataset name matches between the scripts and LangSmith (a quick check is sketched just after this README).
- You can rerun these scripts as you update your pipeline or data.
- The vectorstore will be stored in `/tmp/vectorstore` by default, which is suitable for cloud environments like Hugging Face Spaces. Set `VECTORSTORE_PATH` if you want to use a different location.
- **Tracing:** Setting `LANGCHAIN_TRACING_V2=true` is required for detailed trace logging in LangSmith. Without this, traces will not appear in your LangSmith dashboard.
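For the dataset-name note above, here is a minimal sketch of a quick check (assuming `LANGCHAIN_API_KEY` is already set; the dataset name mirrors the one used by both scripts):

```python
# Quick sanity check that the dataset name used by the scripts exists in LangSmith.
from langsmith import Client

client = Client()

# List every dataset visible to this API key...
for ds in client.list_datasets():
    print(ds.name, ds.id)

# ...or look the expected dataset up directly; this raises if the name does not match exactly.
dataset = client.read_dataset(dataset_name="State of AI Across the Years!")
print(dataset.id)
```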
experiments/evaluate_on_dataset.py
ADDED
import os
from dotenv import load_dotenv
from langsmith import Client
from graph.types import SDGState
from graph.build_graph import build_sdg_graph
from preprocess.embed_documents import create_or_load_vectorstore
from preprocess.html_to_documents import extract_documents_from_html
from langchain_openai import ChatOpenAI
from pathlib import Path
import pickle


load_dotenv(dotenv_path=os.path.join(os.path.dirname(__file__), '..', '.env'))

# --- CONFIG ---
DATASET_NAME = "State of AI Across the Years!"
PROJECT_NAME = "State of AI Across the Years!"
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
LANGCHAIN_API_KEY = os.environ.get("LANGCHAIN_API_KEY")

# --- SETUP ENV ---
os.environ["LANGCHAIN_PROJECT"] = PROJECT_NAME
if LANGCHAIN_API_KEY:
    os.environ["LANGCHAIN_API_KEY"] = LANGCHAIN_API_KEY
if OPENAI_API_KEY:
    os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

# --- LOAD DOCUMENTS & VECTORSTORE ---
def load_docs():
    """Load cached documents, or extract them from the HTML sources and cache the result."""
    output_file = Path("generated/documents.pkl")
    if output_file.exists():
        with open(output_file, "rb") as f:
            return pickle.load(f)
    # Fallback: extract from HTML
    docs = []
    data_dir = Path("data")
    for html_file in data_dir.glob("*.html"):
        docs.extend(extract_documents_from_html(str(html_file), label=html_file.stem))
    output_file.parent.mkdir(parents=True, exist_ok=True)
    with open(output_file, "wb") as f:
        pickle.dump(docs, f)
    return docs


def main():
    # Load dataset from LangSmith
    client = Client()
    dataset = client.read_dataset(dataset_name=DATASET_NAME)
    # read_dataset returns a Dataset object, so access the id as an attribute
    examples = client.list_examples(dataset_id=dataset.id)

    # Load docs/vectorstore
    docs = load_docs()
    vectorstore_path = os.environ.get("VECTORSTORE_PATH", "/tmp/vectorstore")
    vectorstore = create_or_load_vectorstore(docs, path=vectorstore_path)
    llm = ChatOpenAI()
    graph = build_sdg_graph(docs, vectorstore, llm)

    # For each example, run the graph and log the prediction
    for example in examples:
        question = example.inputs["question"]
        # Prepare initial state
        state = SDGState(input=question)
        result = graph.invoke(state)
        if not isinstance(result, SDGState):
            result = SDGState(**dict(result))
        # Log the prediction to LangSmith; the reference answer stays on the example
        # and is linked to this run via reference_example_id
        client.create_run(
            name="SDG App Run",
            run_type="chain",
            inputs={"question": question},
            outputs={"output": result.answer},
            reference_example_id=example.id,
            project_name=PROJECT_NAME,
        )
        print(f"Processed: {question}\n → {result.answer}\n")


if __name__ == "__main__":
    main()
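A possible alternative to the per-example loop above (not what this commit does) is to hand a target callable to `langsmith.evaluation.evaluate`, which iterates the dataset itself and records each prediction as part of a named experiment. A hedged sketch, reusing the helpers defined in `evaluate_on_dataset.py`; the `experiment_prefix` value is an arbitrary placeholder:

```python
# Alternative sketch: let LangSmith drive the dataset loop via `evaluate`.
# Reuses DATASET_NAME, SDGState, load_docs, create_or_load_vectorstore and
# build_sdg_graph exactly as defined in evaluate_on_dataset.py above.
import os
from langchain_openai import ChatOpenAI
from langsmith.evaluation import evaluate


def build_target():
    docs = load_docs()
    vectorstore = create_or_load_vectorstore(
        docs, path=os.environ.get("VECTORSTORE_PATH", "/tmp/vectorstore")
    )
    graph = build_sdg_graph(docs, vectorstore, ChatOpenAI())

    def run_sdg_app(inputs: dict) -> dict:
        # One dataset example in, one prediction out.
        state = SDGState(input=inputs["question"])
        result = graph.invoke(state)
        if not isinstance(result, SDGState):
            result = SDGState(**dict(result))
        return {"output": result.answer}

    return run_sdg_app


if __name__ == "__main__":
    evaluate(
        build_target(),
        data=DATASET_NAME,                    # the LangSmith dataset name used above
        experiment_prefix="sdg-app",          # placeholder experiment name prefix
        metadata={"source": "evaluate_on_dataset"},
    )
```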
experiments/evaluate_predictions.py
ADDED
import os
from dotenv import load_dotenv
from langsmith.evaluation import LangChainStringEvaluator, evaluate
from langchain_openai import ChatOpenAI


load_dotenv(dotenv_path=os.path.join(os.path.dirname(__file__), '..', '.env'))

# --- CONFIG ---
DATASET_NAME = "State of AI Across the Years!"
PROJECT_NAME = "State of AI Across the Years!"
EVAL_LLM_MODEL = "gpt-4.1"  # Match the notebook's model if possible

# --- SETUP ENV ---
# load_dotenv above already populates os.environ; fail fast if required keys are missing.
for key in ("LANGCHAIN_API_KEY", "OPENAI_API_KEY"):
    if key not in os.environ:
        raise EnvironmentError(f"{key} must be set in the environment or in .env")

# --- EVALUATORS ---
eval_llm = ChatOpenAI(model=EVAL_LLM_MODEL)

qa_evaluator = LangChainStringEvaluator("qa", config={"llm": eval_llm})

labeled_helpfulness_evaluator = LangChainStringEvaluator(
    "labeled_criteria",
    config={
        "criteria": {
            "helpfulness": (
                "Is this submission helpful to the user,"
                " taking into account the correct reference answer?"
            )
        },
        "llm": eval_llm,
    },
    prepare_data=lambda run, example: {
        "prediction": run.outputs["output"],
        "reference": example.outputs["answer"],
        "input": example.inputs["question"],
    },
)

dope_or_nope_evaluator = LangChainStringEvaluator(
    "criteria",
    config={
        "criteria": {
            "dopeness": "Is this submission dope, lit, or cool?",
        },
        "llm": eval_llm,
    },
)

# --- RUN EVALUATION ---
if __name__ == "__main__":
    print("Running evaluation on predictions in LangSmith...")
    results = evaluate(
        None,  # No target chain to run here; the goal is to evaluate the existing predictions
        data=DATASET_NAME,
        evaluators=[
            qa_evaluator,
            labeled_helpfulness_evaluator,
            dope_or_nope_evaluator,
        ],
        project_name=PROJECT_NAME,
        metadata={"source": "app_evaluation"},
    )
    print("Evaluation complete! View results in your LangSmith dashboard.")
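The commit message also mentions comparing these results against RAGAS runs on the same dataset, which neither script does yet. A minimal, hedged sketch of what such a pass might look like, assuming a ragas 0.1-style API and that you have collected, for each question, the graph's answer, its retrieved contexts, and the reference answer (all values below are placeholders):

```python
# Hedged sketch of a RAGAS pass over the same questions, for comparison with the
# LangSmith evaluator scores above. Assumes ragas 0.1-style metrics and column names.
from datasets import Dataset
from ragas import evaluate as ragas_evaluate
from ragas.metrics import answer_relevancy, context_precision, context_recall, faithfulness

records = {
    "question": ["<question from the LangSmith dataset>"],          # placeholder
    "answer": ["<answer produced by the LangGraph app>"],           # placeholder
    "contexts": [["<retrieved chunk 1>", "<retrieved chunk 2>"]],   # placeholder
    "ground_truth": ["<reference answer from the dataset>"],        # placeholder
}

ragas_result = ragas_evaluate(
    Dataset.from_dict(records),
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(ragas_result)  # per-metric scores to set beside Correctness/Helpfulness/Dopeness
```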