Space: Tarun-1999M/Semantic_Search_in_ArXiv_ML_Papers

Tarun-1999M committed on Aug 20, 2024

Commit c2dc399 (verified), 1 parent: b747958

Update app.py

Files changed (1): app.py

```diff
@@ -57,13 +57,24 @@ def search_arxiv(query):
     return "\n\n".join(results)
+# Dataset information
+dataset_info = """
+### About the Dataset
+This dataset contains a subset of ArXiv papers with the "cs.LG" tag, indicating that the paper is about Machine Learning. The core dataset is filtered from the full ArXiv dataset hosted on Kaggle: [ArXiv Dataset on Kaggle](https://www.kaggle.com/datasets/Cornell-University/arxiv). The original dataset contains roughly 2 million papers, and this dataset contains approximately 100,000 papers after category filtering.
+The dataset is maintained by making requests to the ArXiv API. The current iteration only includes the title and abstract of each paper.
+"""
 # Create the Gradio interface
 iface = gr.Interface(
     fn=search_arxiv,
     inputs=gr.components.Textbox(lines=1, placeholder="Enter your query..."),
     outputs="markdown",
     title="Semantic Search in ArXiv ML Papers",
-    description="Enter a query to find relevant ML papers from the ArXiv dataset."
+    description="Enter a query to find relevant ML papers from the ArXiv dataset.",
+    article=dataset_info
 )
 # Launch the interface
```
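The dataset description added in this commit says the papers were filtered from the Kaggle arXiv dump by the "cs.LG" tag and that only the title and abstract are kept. The commit does not include that preprocessing step, so the following is only a minimal sketch of what such a filter could look like. It assumes the input is JSON lines (one record per line) with `title`, `abstract`, and a space-separated `categories` field, as in the Kaggle metadata snapshot; the function name `filter_cs_lg` is hypothetical and not taken from the Space's code.

```python
import json
from typing import Iterable, Iterator


def filter_cs_lg(lines: Iterable[str]) -> Iterator[dict]:
    """Yield {"title", "abstract"} dicts for records tagged cs.LG.

    Hypothetical helper. Assumes each line is one JSON record with "title",
    "abstract", and a space-separated "categories" string.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Compare whole category tokens, so a record counts if "cs.LG"
        # appears anywhere in its category list (not only as the primary one).
        if "cs.LG" in record.get("categories", "").split():
            yield {
                # Collapse internal newlines and repeated spaces in the title.
                "title": " ".join(record["title"].split()),
                "abstract": record["abstract"].strip(),
            }


if __name__ == "__main__":
    sample = [
        '{"id": "1", "title": "A  ML\\n paper", "abstract": " Text. ", "categories": "cs.LG stat.ML"}',
        '{"id": "2", "title": "A physics paper", "abstract": "Other.", "categories": "hep-th"}',
    ]
    for paper in filter_cs_lg(sample):
        print(paper["title"])
```

Because the function accepts any iterable of lines, an open file handle can be passed directly (for example, the Kaggle snapshot file, whose exact name is an assumption here), and records are streamed one at a time instead of loading roughly 2 million papers into memory at once.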