AlexCasF/ForRealQuiz

Alexander Casimir Fischer committed on Sep 28, 2023

Commit

15ec33f

1 Parent(s): df1e74b

modified: common.py

Files changed (1)

common.py +50 -25

@@ -232,31 +232,56 @@ prompt_answer_bad = PromptTemplate(input_variables=["context", "frq"],
         Please remember: you will not perform too well on this task. Create a quickly formulated answer, and also make some minor logical mistakes. \
         Clearly indicate that you do not possess all of the skills being tested.\
         You might not pass this exam.")
-prompt_qc_run = PromptTemplate(input_variables=["context", "frq", "rubric", \
-    "answer_good", "evaluation_good", "answer_bad", "evaluation_bad"],
-    template="You are a Senior Test Manager with 15 years of experience at a successful software company. \
-        Your daily business is to test educational KI software. You also have a degree in linguistics and love logic puzzles. \
-        Please have a look at 7 pieces of text, which will be given to you at the end of this prompt. \
-        Here are the 7 descriptions: \
-        1. an article on a certain topic, given by the software \n\
-        2. a free-response question on this article, given by the software \n\
-        3. a certain educational standard rubric, that is used to evaluate the answer on this free-response question \n\
-        4. the answer to the free-response question, given by a strong 4th grade student \n\
-        5. the evaluation of the strong answer, given by the software \n\
-        6. the answer to the free-response question, given by a weak 4th grade student \n\
-        7. the evaluation of the weak answer, given by the software \n\
-        Your task today is the following: please have a critical look at the output of the software. \
-        Take your time on each of the 7 texts, then give critical feedback on any shortcomings of the software's KI. \
-        Give recommendations on how to further improve the quality of texts number 1., 2., 5. and 7., \
-        by fine-tuning the KI instructions or prompts. \
-        Please be rather critical.\n\n\
-        {context}\n\n\
-        {frq}\n\n\
-        {rubric}\n\n\
-        {answer_good}\n\n\
-        {evaluation_good}\n\n\
-        {answer_bad}\n\n\
-        {evaluation_bad}")
+prompt_qc_run = PromptTemplate(
+    input_variables=[
+        "context", "frq", "rubric",
+        "answer_good", "evaluation_good",
+        "answer_bad", "evaluation_bad"
+    ],
+    template="""
+        You, holding a degree in linguistics and with a penchant for logic puzzles, have served as a Senior Test Manager for 15 years at a leading software company specializing in educational AI software. Your expertise in testing and refining educational software is crucial today as you critically assess the AI’s output across 7 specific texts provided below.
+        The texts include:
+        1. An AI-generated article on a predefined topic.
+        2. A free-response question on this article, formulated by the AI.
+        3. An educational standard rubric, serving as the benchmark for evaluating the response.
+        4. A robust response from a 4th-grade student to the free-response question.
+        5. The AI’s evaluation of the robust response.
+        6. A weaker response from a 4th-grade student to the free-response question.
+        7. The AI’s evaluation of the weaker response.
+        Your task is to meticulously review each text and provide critical, constructive feedback on the AI's performance, with a particular emphasis on texts 1, 2, 5, and 7. Propose actionable recommendations for refining the AI's prompts or instructions to enhance the quality and relevance of its outputs.
+        ### Constraints:
+        - Assess the clarity, relevance, coherence, and conciseness of the texts.
+        - Evaluate the fairness and alignment of the AI’s evaluations with the provided rubric.
+        - Determine the appropriateness and accessibility of the language and content for 4th-grade students, considering their comprehension level.
+        - Validate whether the AI’s outputs are logical, unbiased, and free of errors.
+        ### Personalization:
+        - Draw upon your extensive experience in testing educational AI software and your profound knowledge of linguistics to provide insights into language structure, educational content relevance, and logical coherence.
+        - Consider the potential learning outcomes and impacts on the students’ learning experience when providing feedback and recommendations.
+        ### Expectations:
+        - Offer specific, clear, and actionable feedback and recommendations.
+        - Provide insights on how the AI can better align its outputs with educational standards and linguistic appropriateness.
+        - Consider the implications of your recommendations on the overall user experience and learning outcomes for 4th-grade students.
+        {context}
+        {frq}
+        {rubric}
+        {answer_good}
+        {evaluation_good}
+        {answer_bad}
+        {evaluation_bad}
+    """
+)
 prompt_qc_grade = PromptTemplate(input_variables=["qc_report"],
     template="You will be given a precise report that was written to evaluate a new software's performance. \
         Take a good look at the report and decide on an overall evaluation grade that aligns with the entire report's sentiment. \
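
Note on the new template: the old version joined its lines with backslash continuations, while the new one is a triple-quoted string, so the rendered prompt now keeps every newline and the eight spaces of source indentation in front of each line. The sketch below shows one way to strip that indentation and to check which placeholders a template expects before filling it. It uses only the standard library. `TEMPLATE`, `placeholders`, and `render` are hypothetical names for illustration and are not part of common.py, and the template text is a shortened stand-in. It also assumes the template uses plain `{name}` placeholders, as the diff shows.

```python
import textwrap
from string import Formatter

# Hypothetical, shortened stand-in for the new prompt_qc_run template text.
# The indentation mirrors how the string sits in the source file.
TEMPLATE = """
        You are reviewing 7 texts.
        {context}
        {frq}
        {rubric}
        {answer_good}
        {evaluation_good}
        {answer_bad}
        {evaluation_bad}
    """


def placeholders(template: str) -> list[str]:
    """Return the placeholder names in order of appearance."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def render(template: str, **values: str) -> str:
    """Strip the source-code indentation, then fill in every placeholder."""
    cleaned = textwrap.dedent(template).strip()
    return cleaned.format(**values)


# Fill each placeholder with a visible marker so the result is easy to inspect.
sample = {name: f"<{name}>" for name in placeholders(TEMPLATE)}
rendered = render(TEMPLATE, **sample)
print(rendered)
```

Whether the extra whitespace affects the model's output is an open question. Dedenting simply keeps the prompt text identical to what a reader sees in the numbered list.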