Alexander Casimir Fischer committed on
Commit 15ec33f · 1 Parent(s): df1e74b

modified: common.py

Files changed (1)
  1. common.py +50 -25
common.py CHANGED
@@ -232,31 +232,56 @@ prompt_answer_bad = PromptTemplate(input_variables=["context", "frq"],
  Please remember: you will not perform too well on this task. Create a quickly formulated answer, and also make some minor logical mistakes. \
  Clearly indicate that you do not possess all of the skills being tested.\
  You might not pass this exam.")
- prompt_qc_run = PromptTemplate(input_variables=["context", "frq", "rubric", \
- "answer_good", "evaluation_good", "answer_bad", "evaluation_bad"],
- template="You are a Senior Test Manager with 15 years of experience at a successful software company. \
- Your daily business is to test educational KI software. You also have a degree in linguistics and love logic puzzles. \
- Please have a look at 7 pieces of text, which will be given to you at the end of this prompt. \
- Here are the 7 descriptions: \
- 1. an article on a certain topic, given by the software \n\
- 2. a free-response question on this article, given by the software \n\
- 3. a certain educational standard rubric, that is used to evaluate the answer on this free-response question \n\
- 4. the answer to the free-response question, given by a strong 4th grade student \n\
- 5. the evaluation of the strong answer, given by the software \n\
- 6. the answer to the free-response question, given by a weak 4th grade student \n\
- 7. the evaluation of the weak answer, given by the software \n\
- Your task today is the following: please have a critical look at the output of the software. \
- Take your time on each of the 7 texts, then give critical feedback on any shortcomings of the software's KI. \
- Give recommendations on how to further improve the quality of texts number 1., 2., 5. and 7., \
- by fine-tuning the KI instructions or prompts. \
- Please be rather critical.\n\n\
- {context}\n\n\
- {frq}\n\n\
- {rubric}\n\n\
- {answer_good}\n\n\
- {evaluation_good}\n\n\
- {answer_bad}\n\n\
- {evaluation_bad}")
+ prompt_qc_run = PromptTemplate(
+ input_variables=[
+ "context", "frq", "rubric",
+ "answer_good", "evaluation_good",
+ "answer_bad", "evaluation_bad"
+ ],
+ template="""
+ You, holding a degree in linguistics and with a penchant for logic puzzles, have served as a Senior Test Manager for 15 years at a leading software company specializing in educational AI software. Your expertise in testing and refining educational software is crucial today as you critically assess the AI’s output across 7 specific texts provided below.
+
+ The texts include:
+ 1. An AI-generated article on a predefined topic.
+ 2. A free-response question on this article, formulated by the AI.
+ 3. An educational standard rubric, serving as the benchmark for evaluating the response.
+ 4. A robust response from a 4th-grade student to the free-response question.
+ 5. The AI’s evaluation of the robust response.
+ 6. A weaker response from a 4th-grade student to the free-response question.
+ 7. The AI’s evaluation of the weaker response.
+
+ Your task is to meticulously review each text and provide critical, constructive feedback on the AI's performance, with a particular emphasis on texts 1, 2, 5, and 7. Propose actionable recommendations for refining the AI's prompts or instructions to enhance the quality and relevance of its outputs.
+
+ ### Constraints:
+ - Assess the clarity, relevance, coherence, and conciseness of the texts.
+ - Evaluate the fairness and alignment of the AI’s evaluations with the provided rubric.
+ - Determine the appropriateness and accessibility of the language and content for 4th-grade students, considering their comprehension level.
+ - Validate whether the AI’s outputs are logical, unbiased, and free of errors.
+
+ ### Personalization:
+ - Draw upon your extensive experience in testing educational AI software and your profound knowledge of linguistics to provide insights into language structure, educational content relevance, and logical coherence.
+ - Consider the potential learning outcomes and impacts on the students’ learning experience when providing feedback and recommendations.
+
+ ### Expectations:
+ - Offer specific, clear, and actionable feedback and recommendations.
+ - Provide insights on how the AI can better align its outputs with educational standards and linguistic appropriateness.
+ - Consider the implications of your recommendations on the overall user experience and learning outcomes for 4th-grade students.
+
+ {context}
+
+ {frq}
+
+ {rubric}
+
+ {answer_good}
+
+ {evaluation_good}
+
+ {answer_bad}
+
+ {evaluation_bad}
+ """
+ )
  prompt_qc_grade = PromptTemplate(input_variables=["qc_report"],
  template="You will be given a precise report that was written to evaluate a new software's performance. \
  Take a good look at the report and decide on an overall evaluation grade that aligns with the entire report's sentiment. \
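
Review note: the rewritten prompt_qc_run declares seven input_variables and relies on seven matching {placeholders} in a long triple-quoted template, so a renamed or dropped placeholder is easy to miss. The sketch below is one way to sanity-check that pairing using only the Python standard library. It is not part of this commit and does not use LangChain; the helper name `placeholders` and the shortened `template` string are assumptions made for illustration only.

```python
from string import Formatter
from typing import Set


def placeholders(template: str) -> Set[str]:
    """Return the named {fields} found in a str.format-style template."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for pure literal chunks, so those are skipped.
    return {name for _, name, _, _ in Formatter().parse(template) if name}


# Shortened stand-in for the committed template (assumption: only the
# placeholder layout matters for this check, not the instruction text).
template = """
Review the 7 texts below.

{context}

{frq}

{rubric}

{answer_good}

{evaluation_good}

{answer_bad}

{evaluation_bad}
"""

input_variables = [
    "context", "frq", "rubric",
    "answer_good", "evaluation_good",
    "answer_bad", "evaluation_bad",
]

# Variables declared but never used in the template, and vice versa.
missing = set(input_variables) - placeholders(template)
unexpected = placeholders(template) - set(input_variables)

if missing or unexpected:
    raise ValueError(f"missing={sorted(missing)} unexpected={sorted(unexpected)}")
```

A check like this could run in a unit test next to common.py, so that later prompt edits fail fast when a placeholder and its declared variable drift apart.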