Crowdsourced Evaluation
Evaluate model responses for clinical accuracy and relevance