Alexander Casimir Fischer committed on
Commit 15ec33f · 1 Parent(s): df1e74b
modified: common.py
common.py
CHANGED
@@ -232,31 +232,56 @@ prompt_answer_bad = PromptTemplate(input_variables=["context", "frq"],
 Please remember: you will not perform too well on this task. Create a quickly formulated answer, and also make some minor logical mistakes. \
 Clearly indicate that you do not possess all of the skills being tested.\
 You might not pass this exam.")
-prompt_qc_run = PromptTemplate(
-[24 further removed lines, old 236-259; content not captured]
+prompt_qc_run = PromptTemplate(
+    input_variables=[
+        "context", "frq", "rubric",
+        "answer_good", "evaluation_good",
+        "answer_bad", "evaluation_bad"
+    ],
+    template="""
+You, holding a degree in linguistics and with a penchant for logic puzzles, have served as a Senior Test Manager for 15 years at a leading software company specializing in educational AI software. Your expertise in testing and refining educational software is crucial today as you critically assess the AI’s output across 7 specific texts provided below.
+
+The texts include:
+1. An AI-generated article on a predefined topic.
+2. A free-response question on this article, formulated by the AI.
+3. An educational standard rubric, serving as the benchmark for evaluating the response.
+4. A robust response from a 4th-grade student to the free-response question.
+5. The AI’s evaluation of the robust response.
+6. A weaker response from a 4th-grade student to the free-response question.
+7. The AI’s evaluation of the weaker response.
+
+Your task is to meticulously review each text and provide critical, constructive feedback on the AI's performance, with a particular emphasis on texts 1, 2, 5, and 7. Propose actionable recommendations for refining the AI's prompts or instructions to enhance the quality and relevance of its outputs.
+
+### Constraints:
+- Assess the clarity, relevance, coherence, and conciseness of the texts.
+- Evaluate the fairness and alignment of the AI’s evaluations with the provided rubric.
+- Determine the appropriateness and accessibility of the language and content for 4th-grade students, considering their comprehension level.
+- Validate whether the AI’s outputs are logical, unbiased, and free of errors.
+
+### Personalization:
+- Draw upon your extensive experience in testing educational AI software and your profound knowledge of linguistics to provide insights into language structure, educational content relevance, and logical coherence.
+- Consider the potential learning outcomes and impacts on the students’ learning experience when providing feedback and recommendations.
+
+### Expectations:
+- Offer specific, clear, and actionable feedback and recommendations.
+- Provide insights on how the AI can better align its outputs with educational standards and linguistic appropriateness.
+- Consider the implications of your recommendations on the overall user experience and learning outcomes for 4th-grade students.
+
+{context}
+
+{frq}
+
+{rubric}
+
+{answer_good}
+
+{evaluation_good}
+
+{answer_bad}
+
+{evaluation_bad}
+"""
+)
 prompt_qc_grade = PromptTemplate(input_variables=["qc_report"],
 template="You will be given a precise report that was written to evaluate a new software's performance. \
 Take a good look at the report and decide on an overall evaluation grade that aligns with the entire report's sentiment. \
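
The new `prompt_qc_run` declares seven `input_variables` and uses seven matching `{placeholders}` in its template. A quick guard for that pairing might look like the sketch below. It uses only Python's standard library, not LangChain's own validation. The helper names `placeholder_names` and `check_template`, and the shortened stand-in string `QC_TEMPLATE_TAIL`, are illustrative assumptions and are not part of `common.py`.

```python
from string import Formatter


def placeholder_names(template: str) -> set[str]:
    """Collect the named {placeholders} that appear in a format-style template."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text, so those entries are skipped.
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def check_template(template: str, input_variables: list[str]) -> tuple[set[str], set[str]]:
    """Return (declared but unused, used but undeclared) variable names."""
    found = placeholder_names(template)
    declared = set(input_variables)
    return declared - found, found - declared


# The seven variables listed in the diff for prompt_qc_run.
QC_VARIABLES = [
    "context", "frq", "rubric",
    "answer_good", "evaluation_good",
    "answer_bad", "evaluation_bad",
]

# Hypothetical stand-in for the template: only the placeholder tail, one per line.
QC_TEMPLATE_TAIL = "\n".join("{" + name + "}" for name in QC_VARIABLES)

unused, undeclared = check_template(QC_TEMPLATE_TAIL, QC_VARIABLES)
print(unused, undeclared)  # both sets are empty when declaration and template agree
```

Running a check like this against the full template string before building the `PromptTemplate` would flag a renamed or forgotten variable early, rather than when the chain is first invoked.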