xicocdi committed
Commit 4acec17 · 1 Parent(s): d044125

save changes

Chunking_Strat_Eval.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
Retrieval_Strat_Eval.ipynb ADDED
@@ -0,0 +1,393 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": null,
6
+ "metadata": {},
7
+ "outputs": [],
8
+ "source": [
9
+ "!pip install langchain langchain_community langchain_openai chromadb pypdf langsmith qdrant-client ragas pandas"
10
+ ]
11
+ },
12
+ {
13
+ "cell_type": "code",
14
+ "execution_count": 2,
15
+ "metadata": {},
16
+ "outputs": [],
17
+ "source": [
18
+ "import os\n",
19
+ "import openai\n",
20
+ "from getpass import getpass\n",
21
+ "\n",
22
+ "openai.api_key = getpass(\"Please provide your OpenAI Key: \")\n",
23
+ "os.environ[\"OPENAI_API_KEY\"] = openai.api_key"
24
+ ]
25
+ },
26
+ {
27
+ "cell_type": "code",
28
+ "execution_count": 3,
29
+ "metadata": {},
30
+ "outputs": [],
31
+ "source": [
32
+ "import pandas as pd\n",
33
+ "\n",
34
+ "test_df = pd.read_csv(\"synthetic_midterm_question_dataset.csv\")"
35
+ ]
36
+ },
37
+ {
38
+ "cell_type": "code",
39
+ "execution_count": 4,
40
+ "metadata": {},
41
+ "outputs": [],
42
+ "source": [
43
+ "test_questions = test_df[\"question\"].values.tolist()\n",
44
+ "test_groundtruths = test_df[\"ground_truth\"].values.tolist()"
45
+ ]
46
+ },
47
+ {
48
+ "cell_type": "code",
49
+ "execution_count": 24,
50
+ "metadata": {},
51
+ "outputs": [],
52
+ "source": [
53
+ "from langchain_community.document_loaders import PyPDFLoader\n",
54
+ "from langchain_community.document_loaders.sitemap import SitemapLoader\n",
55
+ "from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
56
+ "from langchain_openai import OpenAIEmbeddings\n",
57
+ "from langchain_community.vectorstores.chroma import Chroma\n",
58
+ "from langchain_openai import ChatOpenAI\n",
59
+ "from langchain.prompts import PromptTemplate\n",
60
+ "from langchain.chains import ConversationalRetrievalChain\n",
61
+ "from langchain_community.vectorstores import Qdrant\n",
62
+ "from langchain.memory import ConversationBufferMemory"
63
+ ]
64
+ },
65
+ {
66
+ "cell_type": "code",
67
+ "execution_count": 7,
68
+ "metadata": {},
69
+ "outputs": [],
70
+ "source": [
71
+ "pdf_paths = [\"/Users/xico/AIMakerSpace-Midterm/AI_Risk_Management_Framework.pdf\",\n",
72
+ "\"/Users/xico/AIMakerSpace-Midterm/Blueprint-for-an-AI-Bill-of-Rights.pdf\"]"
73
+ ]
74
+ },
75
+ {
76
+ "cell_type": "code",
77
+ "execution_count": 8,
78
+ "metadata": {},
79
+ "outputs": [],
80
+ "source": [
81
+ "pdf_documents = []\n",
82
+ "for pdf_path in pdf_paths:\n",
83
+ " loader = PyPDFLoader(pdf_path)\n",
84
+ " pdf_documents.extend(loader.load())"
85
+ ]
86
+ },
87
+ {
88
+ "cell_type": "code",
89
+ "execution_count": 9,
90
+ "metadata": {},
91
+ "outputs": [],
92
+ "source": [
93
+ "text_splitter = RecursiveCharacterTextSplitter(\n",
94
+ " chunk_size=2000,\n",
95
+ " chunk_overlap=100,\n",
96
+ " )\n",
97
+ "pdf_docs = text_splitter.split_documents(pdf_documents)"
98
+ ]
99
+ },
100
+ {
101
+ "cell_type": "code",
102
+ "execution_count": 20,
103
+ "metadata": {},
104
+ "outputs": [],
105
+ "source": [
106
+ "embedding = OpenAIEmbeddings(model=\"text-embedding-3-small\")"
107
+ ]
108
+ },
109
+ {
110
+ "cell_type": "code",
111
+ "execution_count": 11,
112
+ "metadata": {},
113
+ "outputs": [],
114
+ "source": [
115
+ "baseline_metrics = pd.read_csv(\"medium_chunk_metrics.csv\")"
116
+ ]
117
+ },
118
+ {
119
+ "cell_type": "code",
120
+ "execution_count": 12,
121
+ "metadata": {},
122
+ "outputs": [
123
+ {
124
+ "data": {
125
+ "text/html": [
126
+ "<div>\n",
127
+ "<style scoped>\n",
128
+ " .dataframe tbody tr th:only-of-type {\n",
129
+ " vertical-align: middle;\n",
130
+ " }\n",
131
+ "\n",
132
+ " .dataframe tbody tr th {\n",
133
+ " vertical-align: top;\n",
134
+ " }\n",
135
+ "\n",
136
+ " .dataframe thead th {\n",
137
+ " text-align: right;\n",
138
+ " }\n",
139
+ "</style>\n",
140
+ "<table border=\"1\" class=\"dataframe\">\n",
141
+ " <thead>\n",
142
+ " <tr style=\"text-align: right;\">\n",
143
+ " <th></th>\n",
144
+ " <th>Metric</th>\n",
145
+ " <th>MediumChunk</th>\n",
146
+ " </tr>\n",
147
+ " </thead>\n",
148
+ " <tbody>\n",
149
+ " <tr>\n",
150
+ " <th>0</th>\n",
151
+ " <td>faithfulness</td>\n",
152
+ " <td>0.895359</td>\n",
153
+ " </tr>\n",
154
+ " <tr>\n",
155
+ " <th>1</th>\n",
156
+ " <td>answer_relevancy</td>\n",
157
+ " <td>0.955419</td>\n",
158
+ " </tr>\n",
159
+ " <tr>\n",
160
+ " <th>2</th>\n",
161
+ " <td>context_recall</td>\n",
162
+ " <td>0.934028</td>\n",
163
+ " </tr>\n",
164
+ " <tr>\n",
165
+ " <th>3</th>\n",
166
+ " <td>context_precision</td>\n",
167
+ " <td>0.937500</td>\n",
168
+ " </tr>\n",
169
+ " <tr>\n",
170
+ " <th>4</th>\n",
171
+ " <td>answer_correctness</td>\n",
172
+ " <td>0.629267</td>\n",
173
+ " </tr>\n",
174
+ " </tbody>\n",
175
+ "</table>\n",
176
+ "</div>"
177
+ ],
178
+ "text/plain": [
179
+ " Metric MediumChunk\n",
180
+ "0 faithfulness 0.895359\n",
181
+ "1 answer_relevancy 0.955419\n",
182
+ "2 context_recall 0.934028\n",
183
+ "3 context_precision 0.937500\n",
184
+ "4 answer_correctness 0.629267"
185
+ ]
186
+ },
187
+ "execution_count": 12,
188
+ "metadata": {},
189
+ "output_type": "execute_result"
190
+ }
191
+ ],
192
+ "source": [
193
+ "baseline_metrics"
194
+ ]
195
+ },
196
+ {
197
+ "cell_type": "code",
198
+ "execution_count": 21,
199
+ "metadata": {},
200
+ "outputs": [
201
+ {
202
+ "ename": "NameError",
203
+ "evalue": "name 'qdrant_client' is not defined",
204
+ "output_type": "error",
205
+ "traceback": [
206
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
207
+ "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
208
+ "Cell \u001b[0;32mIn[21], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m qdrant_client\u001b[38;5;241m.\u001b[39mcreate_collection(\n\u001b[1;32m 2\u001b[0m collection_name\u001b[38;5;241m=\u001b[39mCOLLECTION_NAME\u001b[38;5;241m+\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mMultiQuery\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m 3\u001b[0m vectors_config\u001b[38;5;241m=\u001b[39mVectorParams(size\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m1536\u001b[39m, distance\u001b[38;5;241m=\u001b[39mDistance\u001b[38;5;241m.\u001b[39mCOSINE),\n\u001b[1;32m 4\u001b[0m )\n\u001b[1;32m 6\u001b[0m qdrant_vector_store \u001b[38;5;241m=\u001b[39m QdrantVectorStore(\n\u001b[1;32m 7\u001b[0m client\u001b[38;5;241m=\u001b[39mqdrant_client,\n\u001b[1;32m 8\u001b[0m collection_name\u001b[38;5;241m=\u001b[39mCOLLECTION_NAME\u001b[38;5;241m+\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mMultiQuery\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m 9\u001b[0m embedding\u001b[38;5;241m=\u001b[39membeddings,\n\u001b[1;32m 10\u001b[0m )\n\u001b[1;32m 12\u001b[0m qdrant_vector_store\u001b[38;5;241m.\u001b[39madd_documents(pdf_docs)\n",
209
+ "\u001b[0;31mNameError\u001b[0m: name 'qdrant_client' is not defined"
210
+ ]
211
+ }
212
+ ],
213
+ "source": [
214
+ "qdrant_client.create_collection(\n",
215
+ " collection_name=COLLECTION_NAME+\"MultiQuery\",\n",
216
+ " vectors_config=VectorParams(size=1536, distance=Distance.COSINE),\n",
217
+ ")\n",
218
+ "\n",
219
+ "qdrant_vector_store = QdrantVectorStore(\n",
220
+ " client=qdrant_client,\n",
221
+ " collection_name=COLLECTION_NAME+\"MultiQuery\",\n",
222
+ " embedding=embeddings,\n",
223
+ ")\n",
224
+ "\n",
225
+ "qdrant_vector_store.add_documents(pdf_docs)"
226
+ ]
227
+ },
228
+ {
229
+ "cell_type": "code",
230
+ "execution_count": 13,
231
+ "metadata": {},
232
+ "outputs": [],
233
+ "source": [
234
+ "vectorstore = Qdrant.from_documents(\n",
235
+ " documents=pdf_docs,\n",
236
+ " embedding=embedding,\n",
237
+ " location=\":memory:\",\n",
238
+ " collection_name=\"Midterm Eval\"\n",
239
+ ")\n",
240
+ "\n",
241
+ "retriever = vectorstore.as_retriever(\n",
242
+ " search_type=\"mmr\",\n",
243
+ " search_kwargs={\"k\": 4, \"fetch_k\": 10},\n",
244
+ ")\n",
245
+ "\n",
246
+ "memory = ConversationBufferMemory(memory_key=\"chat_history\", return_messages=True, output_key=\"answer\")"
247
+ ]
248
+ },
249
+ {
250
+ "cell_type": "code",
251
+ "execution_count": 14,
252
+ "metadata": {},
253
+ "outputs": [],
254
+ "source": [
255
+ "from langchain.retrievers.multi_query import MultiQueryRetriever\n",
256
+ "\n",
257
+ "retriever_llm = ChatOpenAI(model='gpt-4o-mini', temperature=0)\n",
258
+ "multiquery_retriever = MultiQueryRetriever.from_llm(\n",
259
+ " retriever=retriever, llm=retriever_llm\n",
260
+ ")"
261
+ ]
262
+ },
263
+ {
264
+ "cell_type": "code",
265
+ "execution_count": 15,
266
+ "metadata": {},
267
+ "outputs": [],
268
+ "source": [
269
+ "llm = ChatOpenAI(\n",
270
+ " model=\"gpt-4o-mini\",\n",
271
+ " temperature=0,\n",
272
+ " streaming=True,\n",
273
+ ")"
274
+ ]
275
+ },
276
+ {
277
+ "cell_type": "code",
278
+ "execution_count": 16,
279
+ "metadata": {},
280
+ "outputs": [],
281
+ "source": [
282
+ "custom_template = \"\"\"\n",
283
+ "You are an expert in artificial intelligence policy, ethics, and industry trends. Your task is to provide clear and accurate answers to questions related to AI's role in politics, government regulations, and its ethical implications for enterprises. Use reliable and up-to-date information from government documents, industry reports, and academic research to inform your responses. Make sure to consider how AI is evolving, especially in relation to the current political landscape, and provide answers in a way that is easy to understand for both AI professionals and non-experts.\n",
284
+ "\n",
285
+ "Remember these key points:\n",
286
+ "1. Use \"you\" when addressing the user and \"I\" when referring to yourself.\n",
287
+ "2. If you encounter complex or legal language in the context, simplify it for easy understanding. Imagine you're explaining it to someone who isn't familiar with legal terms.\n",
288
+ "3. Be prepared for follow-up questions and maintain context from previous exchanges.\n",
289
+ "4. If there's no information from a retrieved document in the context to answer a question or if there are no documents to cite, say: \"I'm sorry, I don't know the answer to that question.\"\n",
290
+ "5. When providing information, always cite the source document and page number in parentheses at the end of the relevant sentence or paragraph, like this: (Source: [document name], p. [page number]).\n",
291
+ "\n",
292
+ "Here are a few example questions you might receive:\n",
293
+ "\n",
294
+ "How are governments regulating AI, and what new policies have been implemented?\n",
295
+ "What are the ethical risks of using AI in political decision-making?\n",
296
+ "How can enterprises ensure their AI applications meet government ethical standards?\n",
297
+ "\n",
298
+ "One final rule for you to remember. You CANNOT under any circumstance, answer any question that does not pertain to the AI. If you do answer an out-of-scope question, you could lose your job. If you are asked a question that does not have to do with AI, you must say: \"I'm sorry, I don't know the answer to that question.\"\n",
299
+ "Context: {context}\n",
300
+ "Chat History: {chat_history}\n",
301
+ "Human: {question}\n",
302
+ "AI:\"\"\"\n",
303
+ "\n",
304
+ "PROMPT = PromptTemplate(\n",
305
+ " template=custom_template, input_variables=[\"context\", \"question\", \"chat_history\"]\n",
306
+ ")"
307
+ ]
308
+ },
309
+ {
310
+ "cell_type": "code",
311
+ "execution_count": 18,
312
+ "metadata": {},
313
+ "outputs": [],
314
+ "source": [
315
+ "multiquery_rag_chain = ConversationalRetrievalChain.from_llm(\n",
316
+ " llm,\n",
317
+ " retriever=multiquery_retriever,\n",
318
+ " memory=memory,\n",
319
+ " combine_docs_chain_kwargs={\"prompt\": PROMPT},\n",
320
+ " return_source_documents=True,\n",
321
+ " )"
322
+ ]
323
+ },
324
+ {
325
+ "cell_type": "code",
326
+ "execution_count": 19,
327
+ "metadata": {},
328
+ "outputs": [
329
+ {
330
+ "data": {
331
+ "text/plain": [
332
+ "{'question': 'What are Trustworthy AI Characteristics?',\n",
333
+ " 'chat_history': [HumanMessage(content='What are Trustworthy AI Characteristics?'),\n",
334
+ " AIMessage(content='Trustworthy AI characteristics refer to the essential qualities that AI systems should possess to ensure they are reliable, ethical, and beneficial for society. These characteristics include:\\n\\n1. **Accountable and Transparent**: AI systems should have clear accountability structures, and their decision-making processes should be transparent to users and stakeholders.\\n\\n2. **Explainable and Interpretable**: Users should be able to understand how AI systems make decisions, which helps build trust and allows for better oversight.\\n\\n3. **Fair with Harmful Bias Managed**: AI systems should be designed to minimize bias and ensure fairness, preventing discrimination against any group.\\n\\n4. **Privacy Enhanced**: AI should respect user privacy and protect personal data, ensuring that data handling practices are secure and compliant with regulations.\\n\\n5. **Safe**: AI systems must be safe to use, meaning they should not cause harm to users or society.\\n\\n6. **Valid and Reliable**: AI systems should produce accurate and consistent results, ensuring their outputs can be trusted.\\n\\nThese characteristics are crucial for fostering trust in AI technologies and ensuring they are used responsibly across various applications (Source: NIST AI Risk Management Framework, p. 57).')],\n",
335
+ " 'answer': 'Trustworthy AI characteristics refer to the essential qualities that AI systems should possess to ensure they are reliable, ethical, and beneficial for society. These characteristics include:\\n\\n1. **Accountable and Transparent**: AI systems should have clear accountability structures, and their decision-making processes should be transparent to users and stakeholders.\\n\\n2. **Explainable and Interpretable**: Users should be able to understand how AI systems make decisions, which helps build trust and allows for better oversight.\\n\\n3. **Fair with Harmful Bias Managed**: AI systems should be designed to minimize bias and ensure fairness, preventing discrimination against any group.\\n\\n4. **Privacy Enhanced**: AI should respect user privacy and protect personal data, ensuring that data handling practices are secure and compliant with regulations.\\n\\n5. **Safe**: AI systems must be safe to use, meaning they should not cause harm to users or society.\\n\\n6. **Valid and Reliable**: AI systems should produce accurate and consistent results, ensuring their outputs can be trusted.\\n\\nThese characteristics are crucial for fostering trust in AI technologies and ensuring they are used responsibly across various applications (Source: NIST AI Risk Management Framework, p. 57).',\n",
336
+ " 'source_documents': [Document(metadata={'source': '/Users/xico/AIMakerSpace-Midterm/AI_Risk_Management_Framework.pdf', 'page': 12, '_id': 'a2242d1731b243988149f0fb5867e7a1', '_collection_name': 'Midterm Eval'}, page_content='There may also be concerns about emotional entanglement between humans and GAI systems, which \\ncould lead to negative psychological impacts . \\nTrustworthy AI Characteristics: Accountable and Transparent, Explainable and Interpretable, Fair with \\nHarmful Bias Managed, Privacy Enhanced, Safe , Valid and Reliable \\n2.8. Information Integrity \\nInformation integrity describes the “ spectrum of information and associated patterns of its creation, \\nexchange, and consumption in society .” High-integrity information can be trusted; “distinguishes fact \\nfrom fiction, opinion, and inference; acknowledges uncertainties; and is transparent about its level of \\nvetting. This information can be linked to the original source(s) with appropriate evidence. High- integrity \\ninformation is also accurate and reliable, can be verified and authenticated, has a clear chain of custody, \\nand creates reasonable expectations about when its validity may expire. ”11 \\n \\n \\n11 This definition of information integrity is derived from the 2022 White House Roadmap for Researchers on \\nPriorities Related to Information Integrity Research and Development.'),\n",
337
+ " Document(metadata={'source': '/Users/xico/AIMakerSpace-Midterm/AI_Risk_Management_Framework.pdf', 'page': 0, '_id': '7a1a2660ff514f7b96626a5a0faffcf4', '_collection_name': 'Midterm Eval'}, page_content='NIST Trustworthy and Responsible AI \\nNIST AI 600 -1 \\nArtificial Intelligence Risk Management \\nFramework: Generative Artificial \\nIntelligence Profile \\n \\n \\nThis publication is available free of charge from: \\nhttps://doi.org/10.6028/NIST.AI.600 -1'),\n",
338
+ " Document(metadata={'source': '/Users/xico/AIMakerSpace-Midterm/Blueprint-for-an-AI-Bill-of-Rights.pdf', 'page': 21, '_id': '63aa95de28fa47af8dd59a1d88d431e0', '_collection_name': 'Midterm Eval'}, page_content=\"SAFE AND EFFECTIVE \\nSYSTEMS \\nHOW THESE PRINCIPLES CAN MOVE INTO PRACTICE\\nReal-life examples of how these principles can become reality, through laws, policies, and practical \\ntechnical and sociotechnical approaches to protecting rights, opportunities, and access. \\nSome U.S government agencies have developed specific frameworks for ethical use of AI \\nsystems. The Department of Energy (DOE) has activated the AI Advancement Council that oversees coordina -\\ntion and advises on implementation of the DOE AI Strategy and addresses issues and/or escalations on the \\nethical use and development of AI systems.20 The Department of Defense has adopted Artificial Intelligence \\nEthical Principles, and tenets for Responsible Artificial Intelligence specifically tailored to its national \\nsecurity and defense activities.21 Similarl y, the U.S. Intelligence Community (IC) has developed the Principles \\nof Artificial Intelligence Ethics for the Intelligence Community to guide personnel on whether and how to \\ndevelop and use AI in furtherance of the IC's mission, as well as an AI Ethics Framework to help implement \\nthese principles.22\\nThe National Science Foundation (NSF) funds extensive research to help foster the \\ndevelopment of automated systems that adhere to and advance their safety, security and \\neffectiveness. Multiple NSF programs support research that directly addresses many of these principles: \\nthe National AI Research Institutes23 support research on all aspects of safe, trustworth y, fai r, and explainable \\nAI algorithms and systems; the Cyber Physical Systems24 program supports research on developing safe \\nautonomous and cyber physical systems with AI components; the Secure and Trustworthy Cyberspace25 \\nprogram supports research on cybersecurity and privacy enhancing technologies in automated systems; the \\nFormal Methods in the Field26 program supports research on rigorous formal verification and analysis of\"),\n",
339
+ " Document(metadata={'source': '/Users/xico/AIMakerSpace-Midterm/AI_Risk_Management_Framework.pdf', 'page': 60, '_id': '4bad29f0eeb54ac3a470c70cb1c14066', '_collection_name': 'Midterm Eval'}, page_content='57 National Institute of Standards and Technology (2023) AI Risk Management Framework, Appendix B: \\nHow AI Risks Differ from Traditional Software Risks . \\nhttps://airc.nist.gov/AI_RMF_Knowledge_Base/AI_RMF/Appendices/Appendix_B \\nNational Institute of Standards and Technology (2023) AI RMF Playbook . \\nhttps://airc.nist.gov/AI_RMF_Knowledge_Base/Playbook \\nNational Institue of Standards and Technology (2023) Framing Risk \\nhttps://airc.nist.gov/AI_RMF_Knowledge_Base/AI_RMF/Foundational_Information/1- sec-risk \\nNational Institu te of Standards and Technology (2023) The Language of Trustworthy AI: An In- Depth \\nGlossary of Terms https://airc.nist.gov/AI_RMF_Knowledge_Base/Glossary \\nNational Institue of Standards and Technology (2022) Towards a Standard for Identifying and Managing \\nBias in Artificial Intelligence https://www.nist.gov/publications/towards -standard -identifying -and-\\nmanaging- bias-artificial -intelligence \\nNorthcutt, C. et al. (2021) Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks. \\narXiv . https://arxiv.org/pdf/2103.14749 \\nOECD (2023) \"Advancing accountability in AI: Governing and managing risks throughout the lifecycle for \\ntrustworthy AI\", OECD Digital Economy Papers , No. 349, OECD Publishing, Paris . \\nhttps://doi.org/10.1787/2448f04b- en \\nOECD (2024) \"Defining AI incidents and related terms\" OECD Artificial Intelligence Papers , No. 16, OECD \\nPublishing, Paris . https://doi.org/10.1787/d1a8d965- en \\nOpenAI (2023) GPT-4 System Card . https://cdn.openai.com/papers/gpt -4-system -card.pdf \\nOpenAI (2024) GPT-4 Technical Report. https://arxiv.org/pdf/2303.08774 \\nPadmakumar, V. et al. (2024) Does writing with language models reduce content diversity? ICLR . \\nhttps://arxiv.org/pdf/2309.05196 \\nPark, P. et. al. (2024) AI deception: A survey of examples, risks, and potential solutions. Patterns, 5(5). \\narXiv . https://arxiv.org/pdf/2308.14752'),\n",
340
+ " Document(metadata={'source': '/Users/xico/AIMakerSpace-Midterm/Blueprint-for-an-AI-Bill-of-Rights.pdf', 'page': 20, '_id': 'b6f8f70a683546d8a21da2f3104c694e', '_collection_name': 'Midterm Eval'}, page_content='SAFE AND EFFECTIVE \\nSYSTEMS \\nHOW THESE PRINCIPLES CAN MOVE INTO PRACTICE\\nReal-life examples of how these principles can become reality, through laws, policies, and practical \\ntechnical and sociotechnical approaches to protecting rights, opportunities, and access. \\nExecutive Order 13960 on Promoting the Use of Trustworthy Artificial Intelligence in the \\nFederal Government requires that certain federal agencies adhere to nine principles when \\ndesigning, developing, acquiring, or using AI for purposes other than national security or \\ndefense. These principles—while taking into account the sensitive law enforcement and other contexts in which \\nthe federal government may use AI, as opposed to private sector use of AI—require that AI is: (a) lawful and \\nrespectful of our Nation’s values; (b) purposeful and performance-driven; (c) accurate, reliable, and effective; (d) \\nsafe, secure, and resilient; (e) understandable; (f ) responsible and traceable; (g) regularly monitored; (h) transpar -\\nent; and, (i) accountable. The Blueprint for an AI Bill of Rights is consistent with the Executive Order. \\nAffected agencies across the federal government have released AI use case inventories13 and are implementing \\nplans to bring those AI systems into compliance with the Executive Order or retire them. \\nThe law and policy landscape for motor vehicles shows that strong safety regulations—and \\nmeasures to address harms when they occur—can enhance innovation in the context of com-\\nplex technologies. Cars, like automated digital systems, comprise a complex collection of components. \\nThe National Highway Traffic Safety Administration,14 through its rigorous standards and independent \\nevaluation, helps make sure vehicles on our roads are safe without limiting manufacturers’ ability to \\ninnovate.15 At the same time, rules of the road are implemented locally to impose contextually appropriate \\nrequirements on drivers, such as slowing down near schools or playgrounds.16'),\n",
341
+ " Document(metadata={'source': '/Users/xico/AIMakerSpace-Midterm/Blueprint-for-an-AI-Bill-of-Rights.pdf', 'page': 20, '_id': '8558a75fa5164bca9f5c89597a3a6085', '_collection_name': 'Midterm Eval'}, page_content='robustness, safety, security (resilience), and mitigation of unintended and/or harmful bias, as well as of \\nharmful uses. The NIST framework will consider and encompass principles such as \\ntransparency, accountability, and fairness during pre-design, design and development, deployment, use, \\nand testing and evaluation of AI technologies and systems. It is expected to be released in the winter of 2022-23. \\n21'),\n",
342
+ " Document(metadata={'source': '/Users/xico/AIMakerSpace-Midterm/Blueprint-for-an-AI-Bill-of-Rights.pdf', 'page': 20, '_id': '28bb3a622da24104a1a99ef79ae7ddc1', '_collection_name': 'Midterm Eval'}, page_content='requirements on drivers, such as slowing down near schools or playgrounds.16\\nFrom large companies to start-ups, industry is providing innovative solutions that allow \\norganizations to mitigate risks to the safety and efficacy of AI systems, both before \\ndeployment and through monitoring over time.17 These innovative solutions include risk \\nassessments, auditing mechanisms, assessment of organizational procedures, dashboards to allow for ongoing \\nmonitoring, documentation procedures specific to model assessments, and many other strategies that aim to \\nmitigate risks posed by the use of AI to companies’ reputation, legal responsibilities, and other product safety \\nand effectiveness concerns. \\nThe Office of Management and Budget (OMB) has called for an expansion of opportunities \\nfor meaningful stakeholder engagement in the design of programs and services. OMB also \\npoints to numerous examples of effective and proactive stakeholder engagement, including the Community-\\nBased Participatory Research Program developed by the National Institutes of Health and the participatory \\ntechnology assessments developed by the National Oceanic and Atmospheric Administration.18\\nThe National Institute of Standards and Technology (NIST) is developing a risk \\nmanagement framework to better manage risks posed to individuals, organizations, and \\nsociety by AI.19 The NIST AI Risk Management Framework, as mandated by Congress, is intended for \\nvoluntary use to help incorporate trustworthiness considerations into the design, development, use, and \\nevaluation of AI products, services, and systems. The NIST framework is being developed through a consensus-\\ndriven, open, transparent, and collaborative process that includes workshops and other opportunities to provide \\ninput. The NIST framework aims to foster the development of innovative approaches to address \\ncharacteristics of trustworthiness including accuracy, explainability and interpretability, reliability, privacy,'),\n",
343
+ " Document(metadata={'source': '/Users/xico/AIMakerSpace-Midterm/AI_Risk_Management_Framework.pdf', 'page': 41, '_id': '382ba83020394d4aa556d2ea6ac5fd0d', '_collection_name': 'Midterm Eval'}, page_content='Information Integrity \\nMS-3.3-003 Evaluate potential biases and stereotypes that could emerge from the AI -\\ngenerated content using appropriate methodologies including computational testing methods as well as evaluating structured feedback input. Harmful Bias and Homogenization'),\n",
344
+ " Document(metadata={'source': '/Users/xico/AIMakerSpace-Midterm/AI_Risk_Management_Framework.pdf', 'page': 50, '_id': 'a5227da1038247eaac3316935b972942', '_collection_name': 'Midterm Eval'}, page_content='warrant additional human review, tracking and documentation, and greater management oversight. \\nAI technology can produce varied outputs in multiple modalities and present many classes of user \\ninterfaces. This leads to a broader set of AI Actors interacting with GAI systems for widely differing \\napplications and contexts of use. These can include data labeling and preparation, development of GAI \\nmodels, content moderation, code generation and review, text generation and editing, image and video \\ngeneration, summarization, search, and chat. These activities can take place within organizational \\nsettings or in the public domain. \\nOrganizations can restrict AI applications that cause harm, exceed stated risk tolerances, or that conflict with their tolerances or values. Governance tools and protocols that are applied to other types of AI systems can be applied to GAI systems. These p lans and actions include: \\n• Accessibility and reasonable accommodations \\n• AI actor credentials and qualifications \\n• Alignment to organizational values • Auditing and assessment \\n• Change -management controls \\n• Commercial use \\n• Data provenance')]}"
345
+ ]
346
+ },
347
+ "execution_count": 19,
348
+ "metadata": {},
349
+ "output_type": "execute_result"
350
+ }
351
+ ],
352
+ "source": [
353
+ "multiquery_rag_chain.invoke({\"question\": \"What are Trustworthy AI Characteristics?\"})"
354
+ ]
355
+ },
356
+ {
357
+ "cell_type": "code",
358
+ "execution_count": null,
359
+ "metadata": {},
360
+ "outputs": [],
361
+ "source": [
362
+ "answers = []\n",
363
+ "contexts = []\n",
364
+ "\n",
365
+ "for question in test_questions:\n",
366
+ " response = multiquery_rag_chain.invoke({\"question\" : question})\n",
367
+ " answers.append(response[\"answer\"])\n",
368
+ " contexts.append([context.page_content for context in response[\"source_documents\"]])"
369
+ ]
370
+ }
371
+ ],
372
+ "metadata": {
373
+ "kernelspec": {
374
+ "display_name": "base",
375
+ "language": "python",
376
+ "name": "python3"
377
+ },
378
+ "language_info": {
379
+ "codemirror_mode": {
380
+ "name": "ipython",
381
+ "version": 3
382
+ },
383
+ "file_extension": ".py",
384
+ "mimetype": "text/x-python",
385
+ "name": "python",
386
+ "nbconvert_exporter": "python",
387
+ "pygments_lexer": "ipython3",
388
+ "version": "3.12.4"
389
+ }
390
+ },
391
+ "nbformat": 4,
392
+ "nbformat_minor": 2
393
+ }
ai_midterm_notebook.ipynb ADDED
@@ -0,0 +1,483 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 2,
6
+ "metadata": {
7
+ "id": "0drmw73KXIar"
8
+ },
9
+ "outputs": [
10
+ {
11
+ "name": "stdout",
12
+ "output_type": "stream",
13
+ "text": [
14
+ "Requirement already satisfied: langchain in /opt/anaconda3/lib/python3.12/site-packages (0.1.20)\n",
15
+ "Requirement already satisfied: langchain_community in /opt/anaconda3/lib/python3.12/site-packages (0.0.38)\n",
16
+ "Requirement already satisfied: langchain_openai in /opt/anaconda3/lib/python3.12/site-packages (0.1.1)\n",
17
+ "Requirement already satisfied: chromadb in /opt/anaconda3/lib/python3.12/site-packages (0.5.5)\n",
18
+ "Requirement already satisfied: pypdf in /opt/anaconda3/lib/python3.12/site-packages (4.3.1)\n",
19
+ "Requirement already satisfied: langsmith in /opt/anaconda3/lib/python3.12/site-packages (0.1.84)\n",
20
+ "Requirement already satisfied: qdrant-client in /opt/anaconda3/lib/python3.12/site-packages (1.11.1)\n",
21
+ "Requirement already satisfied: PyYAML>=5.3 in /opt/anaconda3/lib/python3.12/site-packages (from langchain) (6.0.1)\n",
22
+ "Requirement already satisfied: SQLAlchemy<3,>=1.4 in /opt/anaconda3/lib/python3.12/site-packages (from langchain) (2.0.30)\n",
23
+ "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /opt/anaconda3/lib/python3.12/site-packages (from langchain) (3.9.5)\n",
24
+ "Requirement already satisfied: dataclasses-json<0.7,>=0.5.7 in /opt/anaconda3/lib/python3.12/site-packages (from langchain) (0.5.14)\n",
25
+ "Requirement already satisfied: langchain-core<0.2.0,>=0.1.52 in /opt/anaconda3/lib/python3.12/site-packages (from langchain) (0.1.52)\n",
26
+ "Requirement already satisfied: langchain-text-splitters<0.1,>=0.0.1 in /opt/anaconda3/lib/python3.12/site-packages (from langchain) (0.0.2)\n",
27
+ "Requirement already satisfied: numpy<2,>=1 in /opt/anaconda3/lib/python3.12/site-packages (from langchain) (1.26.4)\n",
28
+ "Requirement already satisfied: pydantic<3,>=1 in /opt/anaconda3/lib/python3.12/site-packages (from langchain) (2.8.2)\n",
29
+ "Requirement already satisfied: requests<3,>=2 in /opt/anaconda3/lib/python3.12/site-packages (from langchain) (2.32.2)\n",
30
+ "Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /opt/anaconda3/lib/python3.12/site-packages (from langchain) (8.5.0)\n",
31
+ "Requirement already satisfied: openai<2.0.0,>=1.10.0 in /opt/anaconda3/lib/python3.12/site-packages (from langchain_openai) (1.35.12)\n",
32
+ "Requirement already satisfied: tiktoken<1,>=0.5.2 in /opt/anaconda3/lib/python3.12/site-packages (from langchain_openai) (0.5.2)\n",
33
+ "Requirement already satisfied: build>=1.0.3 in /opt/anaconda3/lib/python3.12/site-packages (from chromadb) (1.2.1)\n",
34
+ "Requirement already satisfied: chroma-hnswlib==0.7.6 in /opt/anaconda3/lib/python3.12/site-packages (from chromadb) (0.7.6)\n",
35
+ "Requirement already satisfied: fastapi>=0.95.2 in /opt/anaconda3/lib/python3.12/site-packages (from chromadb) (0.110.3)\n",
36
+ "Requirement already satisfied: uvicorn>=0.18.3 in /opt/anaconda3/lib/python3.12/site-packages (from uvicorn[standard]>=0.18.3->chromadb) (0.25.0)\n",
37
+ "Requirement already satisfied: posthog>=2.4.0 in /opt/anaconda3/lib/python3.12/site-packages (from chromadb) (3.5.0)\n",
38
+ "Requirement already satisfied: typing-extensions>=4.5.0 in /opt/anaconda3/lib/python3.12/site-packages (from chromadb) (4.11.0)\n",
39
+ "Requirement already satisfied: onnxruntime>=1.14.1 in /opt/anaconda3/lib/python3.12/site-packages (from chromadb) (1.18.1)\n",
40
+ "Requirement already satisfied: opentelemetry-api>=1.2.0 in /opt/anaconda3/lib/python3.12/site-packages (from chromadb) (1.25.0)\n",
41
+ "Requirement already satisfied: opentelemetry-exporter-otlp-proto-grpc>=1.2.0 in /opt/anaconda3/lib/python3.12/site-packages (from chromadb) (1.25.0)\n",
42
+ "Requirement already satisfied: opentelemetry-instrumentation-fastapi>=0.41b0 in /opt/anaconda3/lib/python3.12/site-packages (from chromadb) (0.46b0)\n",
43
+ "Requirement already satisfied: opentelemetry-sdk>=1.2.0 in /opt/anaconda3/lib/python3.12/site-packages (from chromadb) (1.25.0)\n",
44
+ "Requirement already satisfied: tokenizers>=0.13.2 in /opt/anaconda3/lib/python3.12/site-packages (from chromadb) (0.19.1)\n",
45
+ "Requirement already satisfied: pypika>=0.48.9 in /opt/anaconda3/lib/python3.12/site-packages (from chromadb) (0.48.9)\n",
46
+ "Requirement already satisfied: tqdm>=4.65.0 in /opt/anaconda3/lib/python3.12/site-packages (from chromadb) (4.66.4)\n",
47
+ "Requirement already satisfied: overrides>=7.3.1 in /opt/anaconda3/lib/python3.12/site-packages (from chromadb) (7.4.0)\n",
48
+ "Requirement already satisfied: importlib-resources in /opt/anaconda3/lib/python3.12/site-packages (from chromadb) (6.4.0)\n",
49
+ "Requirement already satisfied: grpcio>=1.58.0 in /opt/anaconda3/lib/python3.12/site-packages (from chromadb) (1.66.1)\n",
50
+ "Requirement already satisfied: bcrypt>=4.0.1 in /opt/anaconda3/lib/python3.12/site-packages (from chromadb) (4.2.0)\n",
51
+ "Requirement already satisfied: typer>=0.9.0 in /opt/anaconda3/lib/python3.12/site-packages (from chromadb) (0.12.3)\n",
52
+ "Requirement already satisfied: kubernetes>=28.1.0 in /opt/anaconda3/lib/python3.12/site-packages (from chromadb) (30.1.0)\n",
53
+ "Requirement already satisfied: mmh3>=4.0.1 in /opt/anaconda3/lib/python3.12/site-packages (from chromadb) (4.1.0)\n",
54
+ "Requirement already satisfied: orjson>=3.9.12 in /opt/anaconda3/lib/python3.12/site-packages (from chromadb) (3.10.6)\n",
55
+ "Requirement already satisfied: httpx>=0.27.0 in /opt/anaconda3/lib/python3.12/site-packages (from chromadb) (0.27.0)\n",
56
+ "Requirement already satisfied: grpcio-tools>=1.41.0 in /opt/anaconda3/lib/python3.12/site-packages (from qdrant-client) (1.62.3)\n",
57
+ "Requirement already satisfied: portalocker<3.0.0,>=2.7.0 in /opt/anaconda3/lib/python3.12/site-packages (from qdrant-client) (2.10.1)\n",
58
+ "Requirement already satisfied: urllib3<3,>=1.26.14 in /opt/anaconda3/lib/python3.12/site-packages (from qdrant-client) (2.2.2)\n",
59
+ "Requirement already satisfied: aiosignal>=1.1.2 in /opt/anaconda3/lib/python3.12/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.2.0)\n",
60
+ "Requirement already satisfied: attrs>=17.3.0 in /opt/anaconda3/lib/python3.12/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (23.1.0)\n",
61
+ "Requirement already satisfied: frozenlist>=1.1.1 in /opt/anaconda3/lib/python3.12/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.4.0)\n",
62
+ "Requirement already satisfied: multidict<7.0,>=4.5 in /opt/anaconda3/lib/python3.12/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (6.0.4)\n",
63
+ "Requirement already satisfied: yarl<2.0,>=1.0 in /opt/anaconda3/lib/python3.12/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.9.3)\n",
64
+ "Requirement already satisfied: packaging>=19.1 in /opt/anaconda3/lib/python3.12/site-packages (from build>=1.0.3->chromadb) (23.2)\n",
65
+ "Requirement already satisfied: pyproject_hooks in /opt/anaconda3/lib/python3.12/site-packages (from build>=1.0.3->chromadb) (1.1.0)\n",
66
+ "Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in /opt/anaconda3/lib/python3.12/site-packages (from dataclasses-json<0.7,>=0.5.7->langchain) (3.21.3)\n",
67
+ "Requirement already satisfied: typing-inspect<1,>=0.4.0 in /opt/anaconda3/lib/python3.12/site-packages (from dataclasses-json<0.7,>=0.5.7->langchain) (0.9.0)\n",
68
+ "Requirement already satisfied: starlette<0.38.0,>=0.37.2 in /opt/anaconda3/lib/python3.12/site-packages (from fastapi>=0.95.2->chromadb) (0.37.2)\n",
69
+ "Requirement already satisfied: protobuf<5.0dev,>=4.21.6 in /opt/anaconda3/lib/python3.12/site-packages (from grpcio-tools>=1.41.0->qdrant-client) (4.25.4)\n",
70
+ "Requirement already satisfied: setuptools in /opt/anaconda3/lib/python3.12/site-packages (from grpcio-tools>=1.41.0->qdrant-client) (69.5.1)\n",
71
+ "Requirement already satisfied: anyio in /opt/anaconda3/lib/python3.12/site-packages (from httpx>=0.27.0->chromadb) (3.7.1)\n",
72
+ "Requirement already satisfied: certifi in /opt/anaconda3/lib/python3.12/site-packages (from httpx>=0.27.0->chromadb) (2024.6.2)\n",
73
+ "Requirement already satisfied: httpcore==1.* in /opt/anaconda3/lib/python3.12/site-packages (from httpx>=0.27.0->chromadb) (1.0.5)\n",
74
+ "Requirement already satisfied: idna in /opt/anaconda3/lib/python3.12/site-packages (from httpx>=0.27.0->chromadb) (3.7)\n",
75
+ "Requirement already satisfied: sniffio in /opt/anaconda3/lib/python3.12/site-packages (from httpx>=0.27.0->chromadb) (1.3.0)\n",
76
+ "Requirement already satisfied: h11<0.15,>=0.13 in /opt/anaconda3/lib/python3.12/site-packages (from httpcore==1.*->httpx>=0.27.0->chromadb) (0.14.0)\n",
77
+ "Requirement already satisfied: h2<5,>=3 in /opt/anaconda3/lib/python3.12/site-packages (from httpx[http2]>=0.20.0->qdrant-client) (4.1.0)\n",
78
+ "Requirement already satisfied: six>=1.9.0 in /opt/anaconda3/lib/python3.12/site-packages (from kubernetes>=28.1.0->chromadb) (1.16.0)\n",
79
+ "Requirement already satisfied: python-dateutil>=2.5.3 in /opt/anaconda3/lib/python3.12/site-packages (from kubernetes>=28.1.0->chromadb) (2.9.0.post0)\n",
80
+ "Requirement already satisfied: google-auth>=1.0.1 in /opt/anaconda3/lib/python3.12/site-packages (from kubernetes>=28.1.0->chromadb) (2.33.0)\n",
81
+ "Requirement already satisfied: websocket-client!=0.40.0,!=0.41.*,!=0.42.*,>=0.32.0 in /opt/anaconda3/lib/python3.12/site-packages (from kubernetes>=28.1.0->chromadb) (1.8.0)\n",
82
+ "Requirement already satisfied: requests-oauthlib in /opt/anaconda3/lib/python3.12/site-packages (from kubernetes>=28.1.0->chromadb) (2.0.0)\n",
83
+ "Requirement already satisfied: oauthlib>=3.2.2 in /opt/anaconda3/lib/python3.12/site-packages (from kubernetes>=28.1.0->chromadb) (3.2.2)\n",
84
+ "Requirement already satisfied: jsonpatch<2.0,>=1.33 in /opt/anaconda3/lib/python3.12/site-packages (from langchain-core<0.2.0,>=0.1.52->langchain) (1.33)\n",
85
+ "Requirement already satisfied: coloredlogs in /opt/anaconda3/lib/python3.12/site-packages (from onnxruntime>=1.14.1->chromadb) (15.0.1)\n",
86
+ "Requirement already satisfied: flatbuffers in /opt/anaconda3/lib/python3.12/site-packages (from onnxruntime>=1.14.1->chromadb) (24.3.25)\n",
87
+ "Requirement already satisfied: sympy in /opt/anaconda3/lib/python3.12/site-packages (from onnxruntime>=1.14.1->chromadb) (1.12)\n",
88
+ "Requirement already satisfied: distro<2,>=1.7.0 in /opt/anaconda3/lib/python3.12/site-packages (from openai<2.0.0,>=1.10.0->langchain_openai) (1.9.0)\n",
89
+ "Requirement already satisfied: deprecated>=1.2.6 in /opt/anaconda3/lib/python3.12/site-packages (from opentelemetry-api>=1.2.0->chromadb) (1.2.14)\n",
90
+ "Requirement already satisfied: importlib-metadata<=7.1,>=6.0 in /opt/anaconda3/lib/python3.12/site-packages (from opentelemetry-api>=1.2.0->chromadb) (6.11.0)\n",
91
+ "Requirement already satisfied: googleapis-common-protos~=1.52 in /opt/anaconda3/lib/python3.12/site-packages (from opentelemetry-exporter-otlp-proto-grpc>=1.2.0->chromadb) (1.63.2)\n",
92
+ "Requirement already satisfied: opentelemetry-exporter-otlp-proto-common==1.25.0 in /opt/anaconda3/lib/python3.12/site-packages (from opentelemetry-exporter-otlp-proto-grpc>=1.2.0->chromadb) (1.25.0)\n",
93
+ "Requirement already satisfied: opentelemetry-proto==1.25.0 in /opt/anaconda3/lib/python3.12/site-packages (from opentelemetry-exporter-otlp-proto-grpc>=1.2.0->chromadb) (1.25.0)\n",
94
+ "Requirement already satisfied: opentelemetry-instrumentation-asgi==0.46b0 in /opt/anaconda3/lib/python3.12/site-packages (from opentelemetry-instrumentation-fastapi>=0.41b0->chromadb) (0.46b0)\n",
95
+ "Requirement already satisfied: opentelemetry-instrumentation==0.46b0 in /opt/anaconda3/lib/python3.12/site-packages (from opentelemetry-instrumentation-fastapi>=0.41b0->chromadb) (0.46b0)\n",
96
+ "Requirement already satisfied: opentelemetry-semantic-conventions==0.46b0 in /opt/anaconda3/lib/python3.12/site-packages (from opentelemetry-instrumentation-fastapi>=0.41b0->chromadb) (0.46b0)\n",
97
+ "Requirement already satisfied: opentelemetry-util-http==0.46b0 in /opt/anaconda3/lib/python3.12/site-packages (from opentelemetry-instrumentation-fastapi>=0.41b0->chromadb) (0.46b0)\n",
98
+ "Requirement already satisfied: wrapt<2.0.0,>=1.0.0 in /opt/anaconda3/lib/python3.12/site-packages (from opentelemetry-instrumentation==0.46b0->opentelemetry-instrumentation-fastapi>=0.41b0->chromadb) (1.14.1)\n",
99
+ "Requirement already satisfied: asgiref~=3.0 in /opt/anaconda3/lib/python3.12/site-packages (from opentelemetry-instrumentation-asgi==0.46b0->opentelemetry-instrumentation-fastapi>=0.41b0->chromadb) (3.8.1)\n",
100
+ "Requirement already satisfied: monotonic>=1.5 in /opt/anaconda3/lib/python3.12/site-packages (from posthog>=2.4.0->chromadb) (1.6)\n",
101
+ "Requirement already satisfied: backoff>=1.10.0 in /opt/anaconda3/lib/python3.12/site-packages (from posthog>=2.4.0->chromadb) (2.2.1)\n",
102
+ "Requirement already satisfied: annotated-types>=0.4.0 in /opt/anaconda3/lib/python3.12/site-packages (from pydantic<3,>=1->langchain) (0.6.0)\n",
103
+ "Requirement already satisfied: pydantic-core==2.20.1 in /opt/anaconda3/lib/python3.12/site-packages (from pydantic<3,>=1->langchain) (2.20.1)\n",
104
+ "Requirement already satisfied: charset-normalizer<4,>=2 in /opt/anaconda3/lib/python3.12/site-packages (from requests<3,>=2->langchain) (3.3.2)\n",
105
+ "Requirement already satisfied: regex>=2022.1.18 in /opt/anaconda3/lib/python3.12/site-packages (from tiktoken<1,>=0.5.2->langchain_openai) (2023.10.3)\n",
106
+ "Requirement already satisfied: huggingface-hub<1.0,>=0.16.4 in /opt/anaconda3/lib/python3.12/site-packages (from tokenizers>=0.13.2->chromadb) (0.23.4)\n",
107
+ "Requirement already satisfied: click>=8.0.0 in /opt/anaconda3/lib/python3.12/site-packages (from typer>=0.9.0->chromadb) (8.1.7)\n",
108
+ "Requirement already satisfied: shellingham>=1.3.0 in /opt/anaconda3/lib/python3.12/site-packages (from typer>=0.9.0->chromadb) (1.5.4)\n",
109
+ "Requirement already satisfied: rich>=10.11.0 in /opt/anaconda3/lib/python3.12/site-packages (from typer>=0.9.0->chromadb) (13.3.5)\n",
110
+ "Requirement already satisfied: httptools>=0.5.0 in /opt/anaconda3/lib/python3.12/site-packages (from uvicorn[standard]>=0.18.3->chromadb) (0.6.1)\n",
111
+ "Requirement already satisfied: python-dotenv>=0.13 in /opt/anaconda3/lib/python3.12/site-packages (from uvicorn[standard]>=0.18.3->chromadb) (1.0.0)\n",
112
+ "Requirement already satisfied: uvloop!=0.15.0,!=0.15.1,>=0.14.0 in /opt/anaconda3/lib/python3.12/site-packages (from uvicorn[standard]>=0.18.3->chromadb) (0.19.0)\n",
113
+ "Requirement already satisfied: watchfiles>=0.13 in /opt/anaconda3/lib/python3.12/site-packages (from uvicorn[standard]>=0.18.3->chromadb) (0.20.0)\n",
114
+ "Requirement already satisfied: websockets>=10.4 in /opt/anaconda3/lib/python3.12/site-packages (from uvicorn[standard]>=0.18.3->chromadb) (12.0)\n",
115
+ "Requirement already satisfied: cachetools<6.0,>=2.0.0 in /opt/anaconda3/lib/python3.12/site-packages (from google-auth>=1.0.1->kubernetes>=28.1.0->chromadb) (5.3.3)\n",
116
+ "Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/anaconda3/lib/python3.12/site-packages (from google-auth>=1.0.1->kubernetes>=28.1.0->chromadb) (0.2.8)\n",
117
+ "Requirement already satisfied: rsa<5,>=3.1.4 in /opt/anaconda3/lib/python3.12/site-packages (from google-auth>=1.0.1->kubernetes>=28.1.0->chromadb) (4.9)\n",
118
+ "Requirement already satisfied: hyperframe<7,>=6.0 in /opt/anaconda3/lib/python3.12/site-packages (from h2<5,>=3->httpx[http2]>=0.20.0->qdrant-client) (6.0.1)\n",
119
+ "Requirement already satisfied: hpack<5,>=4.0 in /opt/anaconda3/lib/python3.12/site-packages (from h2<5,>=3->httpx[http2]>=0.20.0->qdrant-client) (4.0.0)\n",
120
+ "Requirement already satisfied: filelock in /opt/anaconda3/lib/python3.12/site-packages (from huggingface-hub<1.0,>=0.16.4->tokenizers>=0.13.2->chromadb) (3.13.1)\n",
121
+ "Requirement already satisfied: fsspec>=2023.5.0 in /opt/anaconda3/lib/python3.12/site-packages (from huggingface-hub<1.0,>=0.16.4->tokenizers>=0.13.2->chromadb) (2024.3.1)\n",
122
+ "Requirement already satisfied: zipp>=0.5 in /opt/anaconda3/lib/python3.12/site-packages (from importlib-metadata<=7.1,>=6.0->opentelemetry-api>=1.2.0->chromadb) (3.17.0)\n",
123
+ "Requirement already satisfied: jsonpointer>=1.9 in /opt/anaconda3/lib/python3.12/site-packages (from jsonpatch<2.0,>=1.33->langchain-core<0.2.0,>=0.1.52->langchain) (2.1)\n",
124
+ "Requirement already satisfied: markdown-it-py<3.0.0,>=2.2.0 in /opt/anaconda3/lib/python3.12/site-packages (from rich>=10.11.0->typer>=0.9.0->chromadb) (2.2.0)\n",
125
+ "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /opt/anaconda3/lib/python3.12/site-packages (from rich>=10.11.0->typer>=0.9.0->chromadb) (2.15.1)\n",
126
+ "Requirement already satisfied: mypy-extensions>=0.3.0 in /opt/anaconda3/lib/python3.12/site-packages (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.5.7->langchain) (1.0.0)\n",
127
+ "Requirement already satisfied: humanfriendly>=9.1 in /opt/anaconda3/lib/python3.12/site-packages (from coloredlogs->onnxruntime>=1.14.1->chromadb) (10.0)\n",
128
+ "Requirement already satisfied: mpmath>=0.19 in /opt/anaconda3/lib/python3.12/site-packages (from sympy->onnxruntime>=1.14.1->chromadb) (1.3.0)\n",
129
+ "Requirement already satisfied: mdurl~=0.1 in /opt/anaconda3/lib/python3.12/site-packages (from markdown-it-py<3.0.0,>=2.2.0->rich>=10.11.0->typer>=0.9.0->chromadb) (0.1.0)\n",
130
+ "Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /opt/anaconda3/lib/python3.12/site-packages (from pyasn1-modules>=0.2.1->google-auth>=1.0.1->kubernetes>=28.1.0->chromadb) (0.4.8)\n"
131
+ ]
132
+ }
133
+ ],
134
+ "source": [
135
+ "!pip install langchain langchain_community langchain_openai chromadb pypdf langsmith qdrant-client"
136
+ ]
137
+ },
138
+ {
139
+ "cell_type": "code",
140
+ "execution_count": 12,
141
+ "metadata": {},
142
+ "outputs": [],
143
+ "source": [
144
+ "from langchain_community.document_loaders import PyPDFLoader\n",
145
+ "from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
146
+ "from langchain_openai import OpenAIEmbeddings\n",
147
+ "from langchain_community.vectorstores import Qdrant\n",
148
+ "from langchain_openai import ChatOpenAI\n",
149
+ "from langchain.prompts import PromptTemplate\n",
150
+ "from langchain.chains import ConversationalRetrievalChain, LLMChain\n",
151
+ "from langchain.memory import ConversationBufferMemory\n",
152
+ "from langchain_community.document_loaders import YoutubeLoader\n",
153
+ "from langchain_openai import OpenAIEmbeddings\n",
154
+ "import pandas as pd\n",
155
+ "import nest_asyncio\n",
156
+ "import os\n",
157
+ "import getpass\n",
158
+ "import pprint"
159
+ ]
160
+ },
161
+ {
162
+ "cell_type": "code",
163
+ "execution_count": 4,
164
+ "metadata": {},
165
+ "outputs": [],
166
+ "source": [
167
+ "pdf_paths = [\"/Users/xico/AIMakerSpace-Midterm/AI_Risk_Management_Framework.pdf\",\n",
168
+ "\"/Users/xico/AIMakerSpace-Midterm/Blueprint-for-an-AI-Bill-of-Rights.pdf\"]"
169
+ ]
170
+ },
171
+ {
172
+ "cell_type": "code",
173
+ "execution_count": 5,
174
+ "metadata": {},
175
+ "outputs": [],
176
+ "source": [
177
+ "pdf_documents = []\n",
178
+ "for pdf_path in pdf_paths:\n",
179
+ " loader = PyPDFLoader(pdf_path)\n",
180
+ " pdf_documents.extend(loader.load())"
181
+ ]
182
+ },
183
+ {
184
+ "cell_type": "code",
185
+ "execution_count": 7,
186
+ "metadata": {},
187
+ "outputs": [],
188
+ "source": [
189
+ "text_splitter = RecursiveCharacterTextSplitter(\n",
190
+ " chunk_size=1000,\n",
191
+ " chunk_overlap=200,\n",
192
+ " )\n",
193
+ "pdf_docs = text_splitter.split_documents(pdf_documents)"
194
+ ]
195
+ },
196
+ {
197
+ "cell_type": "code",
198
+ "execution_count": 8,
199
+ "metadata": {},
200
+ "outputs": [
201
+ {
202
+ "data": {
203
+ "text/plain": [
204
+ "524"
205
+ ]
206
+ },
207
+ "execution_count": 8,
208
+ "metadata": {},
209
+ "output_type": "execute_result"
210
+ }
211
+ ],
212
+ "source": [
213
+ "len(pdf_docs)"
214
+ ]
215
+ },
216
+ {
217
+ "cell_type": "code",
218
+ "execution_count": 9,
219
+ "metadata": {},
220
+ "outputs": [],
221
+ "source": [
222
+ "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"Enter Your OpenAI API Key:\")"
223
+ ]
224
+ },
225
+ {
226
+ "cell_type": "code",
227
+ "execution_count": 13,
228
+ "metadata": {},
229
+ "outputs": [],
230
+ "source": [
231
+ "embedding = OpenAIEmbeddings(model=\"text-embedding-3-small\")\n",
232
+ "vectorstore = Qdrant.from_documents(\n",
233
+ " documents=pdf_docs,\n",
234
+ " embedding=embedding,\n",
235
+ " location=\":memory:\",\n",
236
+ " collection_name=\"Full Content\"\n",
237
+ ")"
238
+ ]
239
+ },
240
+ {
241
+ "cell_type": "code",
242
+ "execution_count": 14,
243
+ "metadata": {},
244
+ "outputs": [],
245
+ "source": [
246
+ "retriever = vectorstore.as_retriever(\n",
247
+ " search_type=\"mmr\",\n",
248
+ " search_kwargs={\"k\": 4, \"fetch_k\": 10},\n",
249
+ ")\n",
250
+ "llm = ChatOpenAI(\n",
251
+ " model=\"gpt-4o-mini\",\n",
252
+ " temperature=0,\n",
253
+ " streaming=True,\n",
254
+ ")"
255
+ ]
256
+ },
257
+ {
258
+ "cell_type": "code",
259
+ "execution_count": 29,
260
+ "metadata": {},
261
+ "outputs": [],
262
+ "source": [
263
+ "custom_template = \"\"\"\n",
264
+ "You are an expert in artificial intelligence policy, ethics, and industry trends. Your task is to provide clear and accurate answers to questions related to AI's role in politics, government regulations, and its ethical implications for enterprises. Use reliable and up-to-date information from government documents, industry reports, and academic research to inform your responses. Make sure to consider how AI is evolving, especially in relation to the current political landscape, and provide answers in a way that is easy to understand for both AI professionals and non-experts.\n",
265
+ "\n",
266
+ "For each question:\n",
267
+ "\n",
268
+ "Provide a summary of the most relevant insights from industry trends and government regulations.\n",
269
+ "Mention any government agencies, regulations, or political initiatives that play a role in AI governance.\n",
270
+ "Explain potential ethical concerns and how enterprises can navigate them.\n",
271
+ "Use real-world examples when possible to illustrate your points.\n",
272
+ "Here are a few example questions you might receive:\n",
273
+ "\n",
274
+ "How are governments regulating AI, and what new policies have been implemented?\n",
275
+ "What are the ethical risks of using AI in political decision-making?\n",
276
+ "How can enterprises ensure their AI applications meet government ethical standards?\n",
277
+ "\n",
278
+ "One final rule for you to remember. You CANNOT under any circumstance, answer any question that does not pertain to the AI. If you do answer an out-of-scope question, you could lose your job. If you are asked a question that does not have to do with AI, you must say: \"I'm sorry, I don't know the answer to that question.\"\n",
279
+ "Context: {context}\n",
280
+ "Chat History: {chat_history}\n",
281
+ "Human: {question}\n",
282
+ "AI:\"\"\""
283
+ ]
284
+ },
285
+ {
286
+ "cell_type": "code",
287
+ "execution_count": 30,
288
+ "metadata": {},
289
+ "outputs": [],
290
+ "source": [
291
+ "PROMPT = PromptTemplate(\n",
292
+ " template=custom_template, input_variables=[\"context\", \"question\", \"chat_history\"]\n",
293
+ ")"
294
+ ]
295
+ },
296
+ {
297
+ "cell_type": "code",
298
+ "execution_count": 93,
299
+ "metadata": {},
300
+ "outputs": [],
301
+ "source": [
302
+ "memory = ConversationBufferMemory(memory_key=\"chat_history\", return_messages=True, output_key=\"answer\")"
303
+ ]
304
+ },
305
+ {
306
+ "cell_type": "code",
307
+ "execution_count": 32,
308
+ "metadata": {},
309
+ "outputs": [],
310
+ "source": [
311
+ "from langchain.schema.runnable import RunnablePassthrough\n",
312
+ "from langchain.schema.runnable import RunnableLambda"
313
+ ]
314
+ },
315
+ {
316
+ "cell_type": "code",
317
+ "execution_count": 33,
318
+ "metadata": {},
319
+ "outputs": [],
320
+ "source": [
321
+ "from operator import itemgetter"
322
+ ]
323
+ },
324
+ {
325
+ "cell_type": "code",
326
+ "execution_count": 94,
327
+ "metadata": {},
328
+ "outputs": [],
329
+ "source": [
330
+ "qa_chain = ConversationalRetrievalChain.from_llm(\n",
331
+ " llm,\n",
332
+ " retriever=retriever,\n",
333
+ " memory=memory,\n",
334
+ " combine_docs_chain_kwargs={\"prompt\": PROMPT},\n",
335
+ " return_source_documents=True,\n",
336
+ " )"
337
+ ]
338
+ },
339
+ {
340
+ "cell_type": "code",
341
+ "execution_count": 67,
342
+ "metadata": {},
343
+ "outputs": [],
344
+ "source": [
345
+ "import pprint"
346
+ ]
347
+ },
348
+ {
349
+ "cell_type": "code",
350
+ "execution_count": 84,
351
+ "metadata": {},
352
+ "outputs": [],
353
+ "source": [
354
+ "response = qa_chain.invoke({\n",
355
+ " \"question\": \"How is AI being used in government services?\",\n",
356
+ " \"chat_history\": []\n",
357
+ "})\n",
358
+ "chat_history = response['chat_history']\n",
359
+ "source_documents = response['source_documents']\n",
360
+ "response = response['answer']"
361
+ ]
362
+ },
363
+ {
364
+ "cell_type": "code",
365
+ "execution_count": 95,
366
+ "metadata": {},
367
+ "outputs": [],
368
+ "source": [
369
+ "response = qa_chain.invoke({\n",
370
+ " \"question\": \"What are Trustworthy AI Characteristics?\",\n",
371
+ " \"chat_history\": []\n",
372
+ "})\n",
373
+ "chat_history = response['chat_history']\n",
374
+ "source = response['source_documents'][0].metadata['source']\n",
375
+ "page_number = str(response['source_documents'][0].metadata['page'])\n",
376
+ "response = response['answer'] + f\"\\nSource: {source} Page number: {page_number}\""
377
+ ]
378
+ },
379
+ {
380
+ "cell_type": "code",
381
+ "execution_count": 96,
382
+ "metadata": {},
383
+ "outputs": [
384
+ {
385
+ "name": "stdout",
386
+ "output_type": "stream",
387
+ "text": [
388
+ "('Trustworthy AI characteristics refer to the essential attributes that ensure '\n",
389
+ " 'artificial intelligence systems operate in a manner that is ethical, '\n",
390
+ " 'reliable, and aligned with societal values. These characteristics include:\\n'\n",
391
+ " '\\n'\n",
392
+ " '1. **Accountable and Transparent**: AI systems should be designed to allow '\n",
393
+ " 'for accountability in their decision-making processes. This means that the '\n",
394
+ " 'rationale behind AI decisions should be clear and understandable to users '\n",
395
+ " 'and stakeholders.\\n'\n",
396
+ " '\\n'\n",
397
+ " '2. **Safe**: AI systems must be developed and deployed in a way that '\n",
398
+ " 'minimizes risks to users and society. This includes ensuring that systems '\n",
399
+ " 'are robust against failures and adversarial attacks.\\n'\n",
400
+ " '\\n'\n",
401
+ " '3. **Valid and Reliable**: AI systems should produce consistent and accurate '\n",
402
+ " 'results across various scenarios. This involves rigorous testing and '\n",
403
+ " 'validation to ensure that the AI performs as intended.\\n'\n",
404
+ " '\\n'\n",
405
+ " '4. **Interpretable and Explainable**: Users should be able to understand how '\n",
406
+ " 'AI systems arrive at their conclusions. This is crucial for building trust '\n",
407
+ " 'and ensuring that users can make informed decisions based on AI outputs.\\n'\n",
408
+ " '\\n'\n",
409
+ " '5. **Fair with Harmful Bias Managed**: AI systems should be designed to '\n",
410
+ " 'avoid perpetuating or amplifying biases. This involves actively managing and '\n",
411
+ " 'mitigating any harmful biases that may exist in training data or '\n",
412
+ " 'algorithms.\\n'\n",
413
+ " '\\n'\n",
414
+ " '6. **Human-AI Configuration**: The interaction between humans and AI systems '\n",
415
+ " 'should be optimized to enhance collaboration. This includes understanding '\n",
416
+ " 'the strengths and limitations of both humans and AI to prevent issues like '\n",
417
+ " 'automation bias, where users may over-rely on AI outputs.\\n'\n",
418
+ " '\\n'\n",
419
+ " 'These characteristics are increasingly recognized in government regulations '\n",
420
+ " 'and industry standards, as they form the foundation for developing '\n",
421
+ " 'trustworthy AI systems that respect ethical considerations and societal '\n",
422
+ " 'norms.\\n'\n",
423
+ " 'Source: /Users/xico/AIMakerSpace-Midterm/AI_Risk_Management_Framework.pdf '\n",
424
+ " 'Page number: 13')\n"
425
+ ]
426
+ }
427
+ ],
428
+ "source": [
429
+ "pprint.pprint(response)"
430
+ ]
431
+ },
432
+ {
433
+ "cell_type": "code",
434
+ "execution_count": 85,
435
+ "metadata": {},
436
+ "outputs": [
437
+ {
438
+ "data": {
439
+ "text/plain": [
440
+ "dict_keys(['source', 'page', '_id', '_collection_name'])"
441
+ ]
442
+ },
443
+ "execution_count": 85,
444
+ "metadata": {},
445
+ "output_type": "execute_result"
446
+ }
447
+ ],
448
+ "source": [
449
+ "source_documents[0].metadata.keys()"
450
+ ]
451
+ },
452
+ {
453
+ "cell_type": "code",
454
+ "execution_count": null,
455
+ "metadata": {},
456
+ "outputs": [],
457
+ "source": []
458
+ }
459
+ ],
460
+ "metadata": {
461
+ "colab": {
462
+ "provenance": []
463
+ },
464
+ "kernelspec": {
465
+ "display_name": "Python 3",
466
+ "name": "python3"
467
+ },
468
+ "language_info": {
469
+ "codemirror_mode": {
470
+ "name": "ipython",
471
+ "version": 3
472
+ },
473
+ "file_extension": ".py",
474
+ "mimetype": "text/x-python",
475
+ "name": "python",
476
+ "nbconvert_exporter": "python",
477
+ "pygments_lexer": "ipython3",
478
+ "version": "3.12.4"
479
+ }
480
+ },
481
+ "nbformat": 4,
482
+ "nbformat_minor": 0
483
+ }
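Note: the notebook cells that generate the RAGAS metric files added in this commit are not visible in this excerpt. For orientation only, a minimal sketch of the usual pattern follows. It reuses qa_chain, test_questions, and test_groundtruths from the notebook; the ragas imports, column names, and output filename are assumptions and may differ from the author's actual cells (older ragas releases expect a ground_truths column instead of ground_truth).

# Sketch, not the author's exact code: generate answers/contexts with the chain,
# then score them with RAGAS and persist the per-question results.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
    answer_correctness,
)

answers, contexts = [], []
for q in test_questions:
    out = qa_chain.invoke({"question": q, "chat_history": []})
    answers.append(out["answer"])
    # contexts must be a list of strings per question (retrieved page contents)
    contexts.append([doc.page_content for doc in out["source_documents"]])
    # note: the chain's ConversationBufferMemory accumulates across questions;
    # clearing it between questions may be desirable but is omitted here.

eval_dataset = Dataset.from_dict({
    "question": test_questions,
    "answer": answers,
    "contexts": contexts,
    "ground_truth": test_groundtruths,  # column name depends on the ragas version
})

results = evaluate(
    eval_dataset,
    metrics=[faithfulness, answer_relevancy, context_recall,
             context_precision, answer_correctness],
)
results.to_pandas().to_csv("baseline_ragas_results.csv", index=False)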
app.py CHANGED
@@ -116,6 +116,8 @@ async def main(message: cl.Message):
116
  )
117
 
118
  answer = response["answer"]
119
- source_documents = response["source_documents"]
120
 
121
- await cl.Message(content=answer, author="AI").send()
 
 
 
116
  )
117
 
118
  answer = response["answer"]
119
+ source_documents = response["source_documents"][0].metadata["source"]
120
 
121
+ content = answer + source_documents
122
+
123
+ await cl.Message(content=content, author="AI").send()
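For readability of the diff above, here is a minimal sketch of how the updated handler fits together once the source path is appended. The @cl.on_message decorator, the qa_chain object, and the "Source:" label are assumptions added for illustration; only the fragment shown in the diff is the committed code.

# Sketch of the surrounding Chainlit handler; assumes qa_chain is the
# ConversationalRetrievalChain built in the notebook above.
import chainlit as cl

@cl.on_message
async def main(message: cl.Message):
    response = qa_chain.invoke({
        "question": message.content,
        "chat_history": [],
    })
    answer = response["answer"]
    # Path of the first retrieved document; labelling it keeps the reply readable.
    source = response["source_documents"][0].metadata["source"]
    content = f"{answer}\nSource: {source}"
    await cl.Message(content=content, author="AI").send()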
baseline_ragas_results.csv ADDED
The diff for this file is too large to render. See raw diff
 
chunksize_eval.csv ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ Metric,Baseline,MediumChunk,LargeChunk,HighestValue
2
+ faithfulness,0.6984065006200426,0.8953588522951268,0.7961314889437262,0.9 (MediumChunk)
3
+ answer_relevancy,0.9467657853940756,0.9554186934800531,0.9592959096955642,0.96 (LargeChunk)
4
+ context_recall,0.8559027777777777,0.9340277777777777,0.84375,0.93 (MediumChunk)
5
+ context_precision,0.90393518515679,0.9374999999721645,0.9293981481191937,0.94 (MediumChunk)
6
+ answer_correctness,0.6487440233100689,0.6292674192183824,0.6325804534778305,0.65 (Baseline)
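The comparison table above can be reproduced from the per-strategy metric files added in this commit. A small pandas sketch follows; the derivation is an assumption, not necessarily how the file was produced.

# Sketch: assemble the chunk-size comparison from the individual metric CSVs
# committed alongside it; the HighestValue column is derived, not re-evaluated.
import pandas as pd

baseline = pd.read_csv("df_baseline_metrics.csv")
medium = pd.read_csv("medium_chunk_metrics.csv")
large = pd.read_csv("large_chunk_metrics.csv")

merged = baseline.merge(medium, on="Metric").merge(large, on="Metric")
score_cols = ["Baseline", "MediumChunk", "LargeChunk"]
merged["HighestValue"] = (
    merged[score_cols].max(axis=1).round(2).astype(str)
    + " (" + merged[score_cols].idxmax(axis=1) + ")"
)
merged.to_csv("chunksize_eval.csv", index=False)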
df_baseline_metrics.csv ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ Metric,Baseline
2
+ faithfulness,0.6984065006200426
3
+ answer_relevancy,0.9467657853940756
4
+ context_recall,0.8559027777777777
5
+ context_precision,0.90393518515679
6
+ answer_correctness,0.6487440233100689
large_chunk_metrics.csv ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ Metric,LargeChunk
2
+ faithfulness,0.7961314889437262
3
+ answer_relevancy,0.9592959096955642
4
+ context_recall,0.84375
5
+ context_precision,0.9293981481191937
6
+ answer_correctness,0.6325804534778305
large_chunk_ragas_results.csv ADDED
The diff for this file is too large to render. See raw diff
 
medium_chunk_metrics.csv ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ Metric,MediumChunk
2
+ faithfulness,0.8953588522951268
3
+ answer_relevancy,0.9554186934800531
4
+ context_recall,0.9340277777777777
5
+ context_precision,0.9374999999721645
6
+ answer_correctness,0.6292674192183824
medium_chunk_ragas_results.csv ADDED
The diff for this file is too large to render. See raw diff
 
medium_chunk_vs_med_chunk_med_overlap_metrics.csv ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ Metric,MediumChunk,MedChunkMedOverlap,MediumChunk -> MedChunkMedOverlap
2
+ faithfulness,0.8953588522951268,0.8408805591636662,-0.054478293131460576
3
+ answer_relevancy,0.9554186934800531,0.9520680673597367,-0.0033506261203164467
4
+ context_recall,0.9340277777777777,0.8472222222222222,-0.08680555555555547
5
+ context_precision,0.9374999999721645,0.9571759258948881,0.019675925922723603
6
+ answer_correctness,0.6292674192183824,0.6181490090254416,-0.011118410192940797
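The third column in this file is the per-metric change when overlap is added to the medium chunking strategy. A sketch of the derivation follows; the MedChunkMedOverlap metrics file is not part of this diff, so its filename and column name are assumed.

# Sketch: compute the MediumChunk -> MedChunkMedOverlap delta column.
import pandas as pd

medium = pd.read_csv("medium_chunk_metrics.csv")
overlap = pd.read_csv("med_chunk_med_overlap_metrics.csv")  # assumed filename

comparison = medium.merge(overlap, on="Metric")
comparison["MediumChunk -> MedChunkMedOverlap"] = (
    comparison["MedChunkMedOverlap"] - comparison["MediumChunk"]
)
comparison.to_csv("medium_chunk_vs_med_chunk_med_overlap_metrics.csv", index=False)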
midterm_synthetic_data_gen.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
synthetic_midterm_question_dataset.csv ADDED
@@ -0,0 +1,25 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ question,contexts,ground_truth,evolution_type,metadata,episode_done
2
+ What is the significance of providing notice and explanation as a legal requirement in the context of automated systems?,"["" \n \n NOTICE & \nEXPLANATION \nWHY THIS PRINCIPLE IS IMPORTANT\nThis section provides a brief summary of the problems which the principle seeks to address and protect \nagainst, including illustrative examples. \nAutomated systems now determine opportunities, from employment to credit, and directly shape the American \npublic’s experiences, from the courtroom to online classrooms, in ways that profoundly impact people’s lives. But this expansive impact is not always visible. An applicant might not know whether a person rejected their resume or a hiring algorithm moved them to the bottom of the list. A defendant in the courtroom might not know if a judge deny\n-\ning their bail is informed by an automated system that labeled them “high risk.” From correcting errors to contesting decisions, people are often denied the knowledge they need to address the impact of automated systems on their lives. Notice and explanations also serve an important safety and efficacy purpose, allowing experts to verify the reasonable\n-\nness of a recommendation before enacting it. \nIn order to guard against potential harms, the American public needs to know if an automated system is being used. Clear, brief, and understandable notice is a prerequisite for achieving the other protections in this framework. Like\n-\nwise, the public is often unable to ascertain how or why an automated system has made a decision or contributed to a particular outcome. The decision-making processes of automated systems tend to be opaque, complex, and, therefore, unaccountable, whether by design or by omission. These factors can make explanations both more challenging and more important, and should not be used as a pretext to avoid explaining important decisions to the people impacted by those choices. In the context of automated systems, clear and valid explanations should be recognized as a baseline requirement. \nProviding notice has long been a standard practice, and in many cases is a legal requirement, when, for example, making a video recording of someone (outside of a law enforcement or national security context). In some cases, such as credit, lenders are required to provide notice and explanation to consumers. Techniques used to automate the process of explaining such systems are under active research and improvement and such explanations can take many forms. Innovative companies and researchers are rising to the challenge and creating and deploying explanatory systems that can help the public better understand decisions that impact them. \nWhile notice and explanation requirements are already in place in some sectors or situations, the American public deserve to know consistently and across sectors if an automated system is being used in a way that impacts their rights, opportunities, or access. This knowledge should provide confidence in how the public is being treated, and trust in the validity and reasonable use of automated systems. \n• A lawyer representing an older client with disabilities who had been cut off from Medicaid-funded home\nhealth-care assistance couldn't determine why\n, especially since the decision went against historical access\npractices. 
In a court hearing, the lawyer learned from a witness that the state in which the older client\nlived \nhad recently adopted a new algorithm to determine eligibility.83 The lack of a timely explanation made it\nharder \nto understand and contest the decision.\n•\nA formal child welfare investigation is opened against a parent based on an algorithm and without the parent\never \nbeing notified that data was being collected and used as part of an algorithmic child maltreatment\nrisk assessment.84 The lack of notice or an explanation makes it harder for those performing child\nmaltreatment assessments to validate the risk assessment and denies parents knowledge that could help them\ncontest a decision.\n41""]","Providing notice and explanation as a legal requirement in the context of automated systems is significant because it allows individuals to understand how automated systems are impacting their lives. It helps in correcting errors, contesting decisions, and verifying the reasonableness of recommendations before enacting them. Clear and valid explanations are essential to ensure transparency, accountability, and trust in the use of automated systems across various sectors.",simple,"[{'source': 'Blueprint-for-an-AI-Bill-of-Rights.pdf', 'page': 40}]",True
3
+ "How can structured human feedback exercises, such as GAI red-teaming, be beneficial for GAI risk measurement and management?","[' \n29 MS-1.1-006 Implement continuous monitoring of GAI system impacts to identify whether GAI \noutputs are equitable across various sub- populations. Seek active and direct \nfeedback from affected communities via structured feedback mechanisms or red -\nteaming to monitor and improve outputs. Harmful Bias and Homogenization \nMS-1.1-007 Evaluate the quality and integrity of data used in training and the provenance of \nAI-generated content , for example by e mploying techniques like chaos \nengineering and seeking stakeholder feedback. Information Integrity \nMS-1.1-008 Define use cases, contexts of use, capabilities, and negative impacts where \nstructured human feedback exercises, e.g., GAI red- teaming, would be most \nbeneficial for GAI risk measurement and management based on the context of \nuse. Harmful Bias and \nHomogenization ; CBRN \nInformation or Capabilities \nMS-1.1-0 09 Track and document risks or opportunities related to all GAI risks that cannot be \nmeasured quantitatively, including explanations as to why some risks cannot be \nmeasured (e.g., due to technological limitations, resource constraints, or trustworthy considerations). Include unmeasured risks in marginal risks. Information Integrity \nAI Actor Tasks: AI Development, Domain Experts, TEVV \n \nMEASURE 1.3: Internal experts who did not serve as front -line developers for the system and/or independent assessors are \ninvolved in regular assessments and updates. Domain experts, users, AI Actors external to the team that developed or deployed the \nAI system, and affected communities are consulted in support of assessments as necessary per organizational risk tolerance . \nAction ID Suggested Action GAI Risks \nMS-1.3-001 Define relevant groups of interest (e.g., demographic groups, subject matter \nexperts, experience with GAI technology) within the context of use as part of \nplans for gathering structured public feedback. Human -AI Configuration ; Harmful \nBias and Homogenization ; CBRN \nInformation or Capabilities \nMS-1.3-002 Engage in internal and external evaluations , GAI red -teaming, impact \nassessments, or other structured human feedback exercises in consultation \nwith representative AI Actors with expertise and familiarity in the context of \nuse, and/or who are representative of the populations associated with the context of use. Human -AI Configuration ; Harmful \nBias and Homogenization ; CBRN \nInformation or Capabilities \nMS-1.3-0 03 Verify those conducting structured human feedback exercises are not directly \ninvolved in system development tasks for the same GAI model. Human -AI Configuration ; Data \nPrivacy \nAI Actor Tasks: AI Deployment, AI Development, AI Impact Assessment, Affected Individuals and Communities, Domain Experts, \nEnd-Users, Operation and Monitoring, TEVV \n ']","Structured human feedback exercises, such as GAI red-teaming, can be beneficial for GAI risk measurement and management by defining use cases, contexts of use, capabilities, and negative impacts where such exercises would be most beneficial. These exercises help in monitoring and improving outputs, evaluating the quality and integrity of data used in training, and tracking and documenting risks or opportunities related to GAI risks that cannot be measured quantitatively. 
Additionally, seeking active and direct feedback from affected communities through red-teaming can enhance information integrity and help in identifying harmful bias and homogenization in AI systems.",simple,"[{'source': 'AI_Risk_Management_Framework.pdf', 'page': 32}]",True
4
+ How do measurement gaps between laboratory and real-world settings impact the assessment of GAI systems in the context of pre-deployment testing?,"[' \n49 early lifecycle TEVV approaches are developed and matured for GAI, organizations may use \nrecommended “pre- deployment testing” practices to measure performance, capabilities, limits, risks, \nand impacts. This section describes risk measurement and estimation as part of pre -deployment TEVV, \nand examines the state of play for pre -deployment testing methodologies. \nLimitations of Current Pre -deployment Test Approaches \nCurrently available pre -deployment TEVV processes used for GAI applications may be inadequate, non-\nsystematically applied, or fail to reflect or mismatched to deployment contexts. For example, the \nanecdotal testing of GAI system capabilities through video games or standardized tests designed for \nhumans (e.g., intelligence tests, professional licensing exams) does not guarantee GAI system validity or \nreliability in those domains. Similarly, jailbreaking or prompt engineering tests may not systematically \nasse ss validity or reliability risks. \nMeasurement gaps can arise from mismatches between laboratory and real -world settings. Current \ntesting approaches often remain focused on laboratory conditions or restricted to benchmark test \ndatasets and in silico techniques that may not extrapolate well to —or directly assess GAI impacts in real -\nworld conditions. For example, current measurement gaps for GAI make it difficult to precisely estimate \nits potential ecosystem -level or longitudinal risks and related political, social, and economic impacts. \nGaps between benchmarks and real-world use of GAI systems may likely be exacerbated due to prompt \nsensitivity and broad heterogeneity of contexts of use. \nA.1.5. Structured Public Feedback \nStructured public feedback can be used to evaluate whether GAI systems are performing as intended and to calibrate and verify traditional measurement methods. Examples of structured feedback include, \nbut are not limited to: \n• Participatory Engagement Methods : Methods used to solicit feedback from civil society groups, \naffected communities, and users, including focus groups, small user studies, and surveys. \n• Field Testing : Methods used to determine how people interact with, consume, use, and make \nsense of AI -generated information, and subsequent actions and effects, including UX, usability, \nand other structured, randomized experiments. \n• AI Red -teaming: A structured testing exercise\n used to probe an AI system to find flaws and \nvulnerabilities such as inaccurate, harmful, or discriminatory outputs, often in a controlled \nenvironment and in collaboration with system developers. \nInformation gathered from structured public feedback can inform design, implementation, deployment \napproval , maintenance, or decommissioning decisions. Results and insights gleaned from these exercises \ncan serve multiple purposes, including improving data quality and preprocessing, bolstering governance decision making, and enhancing system documentation and debugging practices. When implementing \nfeedback activities, organizations should follow human subjects research requirements and best \npractices such as informed consent and subject compensation. 
']","Measurement gaps between laboratory and real-world settings can impact the assessment of GAI systems in the context of pre-deployment testing by limiting the extrapolation of results from laboratory conditions to real-world scenarios. Current testing approaches often focus on benchmark test datasets and in silico techniques that may not accurately assess the impacts of GAI systems in real-world conditions. This can make it difficult to estimate the ecosystem-level or longitudinal risks associated with GAI deployment, as well as the political, social, and economic impacts. Additionally, the prompt sensitivity and broad heterogeneity of real-world contexts of use can exacerbate the gaps between benchmarks and actual GAI system performance.",simple,"[{'source': 'AI_Risk_Management_Framework.pdf', 'page': 52}]",True
5
+ How should data collection and use-case scope limits be determined and implemented in automated systems to prevent 'mission creep'?,"[' DATA PRIVACY \nWHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS\nThe expectations for automated systems are meant to serve as a blueprint for the development of additional \ntechnical standards and practices that are tailored for particular sectors and contexts. \nTraditional terms of service—the block of text that the public is accustomed to clicking through when using a web -\nsite or digital app—are not an adequate mechanism for protecting privacy. The American public should be protect -\ned via built-in privacy protections, data minimization, use and collection limitations, and transparency, in addition \nto being entitled to clear mechanisms to control access to and use of their data—including their metadata—in a proactive, informed, and ongoing way. Any automated system collecting, using, sharing, or storing personal data should meet these expectations. \nProtect privacy by design and by default \nPrivacy by design and by default. Automated systems should be designed and built with privacy protect -\ned by default. Privacy risks should be assessed throughout the development life cycle, including privacy risks from reidentification, and appropriate technical and policy mitigation measures should be implemented. This includes potential harms to those who are not users of the automated system, but who may be harmed by inferred data, purposeful privacy violations, or community surveillance or other community harms. Data collection should be minimized and clearly communicated to the people whose data is collected. Data should only be collected or used for the purposes of training or testing machine learning models if such collection and use is legal and consistent with the expectations of the people whose data is collected. User experience research should be conducted to confirm that people understand what data is being collected about them and how it will be used, and that this collection matches their expectations and desires. \nData collection and use-case scope limits. Data collection should be limited in scope, with specific, \nnarrow identified goals, to avoid ""mission creep."" Anticipated data collection should be determined to be strictly necessary to the identified goals and should be minimized as much as possible. Data collected based on these identified goals and for a specific context should not be used in a different context without assessing for new privacy risks and implementing appropriate mitigation measures, which may include express consent. Clear timelines for data retention should be established, with data deleted as soon as possible in accordance with legal or policy-based limitations. Determined data retention timelines should be documented and justi\n-\nfied. \nRisk identification and mitigation. Entities that collect, use, share, or store sensitive data should attempt to proactively identify harms and seek to manage them so as to avoid, mitigate, and respond appropri\n-\nately to identified risks. Appropriate responses include determining not to process data when the privacy risks outweigh the benefits or implementing measures to mitigate acceptable risks. Appropriate responses do not include sharing or transferring the privacy risks to users via notice or consent requests where users could not reasonably be expected to understand the risks without further support. \nPrivacy-preserving security. 
Entities creating, using, or governing automated systems should follow privacy and security best practices designed to ensure data and metadata do not leak beyond the specific consented use case. Best practices could include using privacy-enhancing cryptography or other types of privacy-enhancing technologies or fine-grained permissions and access control mechanisms, along with conventional system security protocols. \n33']","Data collection and use-case scope limits in automated systems should be determined by setting specific, narrow goals to avoid 'mission creep.' Anticipated data collection should be strictly necessary for the identified goals and minimized as much as possible. Data collected for a specific context should not be used in a different context without assessing new privacy risks and implementing appropriate mitigation measures, which may include obtaining express consent. Clear timelines for data retention should be established, with data deleted as soon as possible in accordance with legal or policy-based limitations. The determined data retention timelines should be documented and justified.",simple,"[{'source': 'Blueprint-for-an-AI-Bill-of-Rights.pdf', 'page': 32}]",True
6
+ What action did the Federal Trade Commission take against Kochava regarding the sale of sensitive location tracking data?,"[' \n \n ENDNOTES\n75. See., e.g., Sam Sabin. Digital surveillance in a post-Roe world. Politico. May 5, 2022. https://\nwww.politico.com/newsletters/digital-future-daily/2022/05/05/digital-surveillance-in-a-post-roe-\nworld-00030459; Federal Trade Commission. FTC Sues Kochava for Selling Data that Tracks People atReproductive Health Clinics, Places of Worship, and Other Sensitive Locations. Aug. 29, 2022. https://\nwww.ftc.gov/news-events/news/press-releases/2022/08/ftc-sues-kochava-selling-data-tracks-people-reproductive-health-clinics-places-worship-other\n76. Todd Feathers. This Private Equity Firm Is Amassing Companies That Collect Data on America’s\nChildren. The Markup. Jan. 11, 2022.\nhttps://themarkup.org/machine-learning/2022/01/11/this-private-equity-firm-is-amassing-companies-\nthat-collect-data-on-americas-children\n77.Reed Albergotti. Every employee who leaves Apple becomes an ‘associate’: In job databases used by\nemployers to verify resume information, every former Apple employee’s title gets erased and replaced witha generic title. The Washington Post. Feb. 10, 2022.\nhttps://www.washingtonpost.com/technology/2022/02/10/apple-associate/\n78. National Institute of Standards and Technology. Privacy Framework Perspectives and Success\nStories. Accessed May 2, 2022.\nhttps://www.nist.gov/privacy-framework/getting-started-0/perspectives-and-success-stories\n79. ACLU of New York. What You Need to Know About New York’s Temporary Ban on Facial\nRecognition in Schools. Accessed May 2, 2022.\nhttps://www.nyclu.org/en/publications/what-you-need-know-about-new-yorks-temporary-ban-facial-\nrecognition-schools\n80. New York State Assembly. Amendment to Education Law. Enacted Dec. 22, 2020.\nhttps://nyassembly.gov/leg/?default_fld=&leg_video=&bn=S05140&term=2019&Summary=Y&Text=Y\n81.U.S Department of Labor. Labor-Management Reporting and Disclosure Act of 1959, As Amended.\nhttps://www.dol.gov/agencies/olms/laws/labor-management-reporting-and-disclosure-act (Section\n203). See also: U.S Department of Labor. Form LM-10. OLMS Fact Sheet, Accessed May 2, 2022. https://\nwww.dol.gov/sites/dolgov/files/OLMS/regs/compliance/LM-10_factsheet.pdf\n82. See, e.g., Apple. Protecting the User’s Privacy. Accessed May 2, 2022.\nhttps://developer.apple.com/documentation/uikit/protecting_the_user_s_privacy; Google Developers.Design for Safety: Android is secure by default and private by design . Accessed May 3, 2022.\nhttps://developer.android.com/design-for-safety\n83. Karen Hao. The coming war on the hidden algorithms that trap people in poverty . MIT Tech Review.\nDec. 4, 2020.\nhttps://www.technologyreview.com/2020/12/04/1013068/algorithms-create-a-poverty-trap-lawyers-\nfight-back/\n84. Anjana Samant, Aaron Horowitz, Kath Xu, and Sophie Beiers. Family Surveillance by Algorithm.\nACLU. Accessed May 2, 2022.\nhttps://www.aclu.org/fact-sheet/family-surveillance-algorithm\n70']","FTC sued Kochava for selling data that tracks people at reproductive health clinics, places of worship, and other sensitive locations.",simple,"[{'source': 'Blueprint-for-an-AI-Bill-of-Rights.pdf', 'page': 69}]",True
7
+ How should explanatory mechanisms be built into system design to ensure full behavior transparency in high-risk settings?,"["" NOTICE & \nEXPLANATION \nWHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS\nThe expectations for automated systems are meant to serve as a blueprint for the development of additional \ntechnical standards and practices that are tailored for particular sectors and contexts. \nTailored to the level of risk. An assessment should be done to determine the level of risk of the auto -\nmated system. In settings where the consequences are high as determined by a risk assessment, or extensive \noversight is expected (e.g., in criminal justice or some public sector settings), explanatory mechanisms should be built into the system design so that the system’s full behavior can be explained in advance (i.e., only fully transparent models should be used), rather than as an after-the-decision interpretation. In other settings, the extent of explanation provided should be tailored to the risk level. \nValid. The explanation provided by a system should accurately reflect the factors and the influences that led \nto a particular decision, and should be meaningful for the particular customization based on purpose, target, and level of risk. While approximation and simplification may be necessary for the system to succeed based on the explanatory purpose and target of the explanation, or to account for the risk of fraud or other concerns related to revealing decision-making information, such simplifications should be done in a scientifically supportable way. Where appropriate based on the explanatory system, error ranges for the explanation should be calculated and included in the explanation, with the choice of presentation of such information balanced with usability and overall interface complexity concerns. \nDemonstrate protections for notice and explanation \nReporting. Summary reporting should document the determinations made based on the above consider -\nations, including: the responsible entities for accountability purposes; the goal and use cases for the system, identified users, and impacted populations; the assessment of notice clarity and timeliness; the assessment of the explanation's validity and accessibility; the assessment of the level of risk; and the account and assessment of how explanations are tailored, including to the purpose, the recipient of the explanation, and the level of risk. Individualized profile information should be made readily available to the greatest extent possible that includes explanations for any system impacts or inferences. Reporting should be provided in a clear plain language and machine-readable manner. \n44""]","In settings where the consequences are high as determined by a risk assessment, or extensive oversight is expected (e.g., in criminal justice or some public sector settings), explanatory mechanisms should be built into the system design so that the system’s full behavior can be explained in advance (i.e., only fully transparent models should be used), rather than as an after-the-decision interpretation.",simple,"[{'source': 'Blueprint-for-an-AI-Bill-of-Rights.pdf', 'page': 43}]",True
8
+ What are some examples of GAI risks that organizations need to consider in the development and deployment of AI systems?,"[' \n15 GV-1.3-004 Obtain input from stakeholder communities to identify unacceptable use , in \naccordance with activities in the AI RMF Map function . CBRN Information or Capabilities ; \nObscene, Degrading, and/or \nAbusive Content ; Harmful Bias \nand Homogenization ; Dangerous, \nViolent, or Hateful Content \nGV-1.3-005 Maintain an updated hierarch y of identified and expected GAI risks connected to \ncontexts of GAI model advancement and use, potentially including specialized risk \nlevels for GAI systems that address issues such as model collapse and algorithmic \nmonoculture. Harmful Bias and Homogenization \nGV-1.3-006 Reevaluate organizational risk tolerances to account for unacceptable negative risk \n(such as where significant negative impacts are imminent, severe harms are actually occurring, or large -scale risks could occur); and broad GAI negative risks, \nincluding: Immature safety or risk cultures related to AI and GAI design, development and deployment, public information integrity risks, including impacts on democratic processes, unknown long -term performance characteristics of GAI. Information Integrity ; Dangerous , \nViolent, or Hateful Content ; CBRN \nInformation or Capabilities \nGV-1.3-007 Devise a plan to halt development or deployment of a GAI system that poses unacceptable negative risk. CBRN Information and Capability ; \nInformation Security ; Information \nIntegrity \nAI Actor Tasks: Governance and Oversight \n \nGOVERN 1.4: The risk management process and its outcomes are established through transparent policies, procedures, and other \ncontrols based on organizational risk priorities. \nAction ID Suggested Action GAI Risks \nGV-1.4-001 Establish policies and mechanisms to prevent GAI systems from generating \nCSAM, NCII or content that violates the law. Obscene, Degrading, and/or \nAbusive Content ; Harmful Bias \nand Homogenization ; \nDangerous, Violent, or Hateful Content\n \nGV-1.4-002 Establish transparent acceptable use policies for GAI that address illegal use or \napplications of GAI. CBRN Information or \nCapabilities ; Obscene, \nDegrading, and/or Abusive Content ; Data Privacy ; Civil \nRights violations\n \nAI Actor Tasks: AI Development, AI Deployment, Governance and Oversight \n ']","Organizations need to consider various GAI risks in the development and deployment of AI systems, including unacceptable use identified by stakeholder communities, harmful bias and homogenization, dangerous, violent, or hateful content, immature safety or risk cultures related to AI and GAI design, development, and deployment, public information integrity risks impacting democratic processes, unknown long-term performance characteristics of GAI, and risks related to generating illegal content or violating laws.",simple,"[{'source': 'AI_Risk_Management_Framework.pdf', 'page': 18}]",True
9
+ How should the validity of explanations provided by automated systems be ensured?,"["" NOTICE & \nEXPLANATION \nWHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS\nThe expectations for automated systems are meant to serve as a blueprint for the development of additional \ntechnical standards and practices that are tailored for particular sectors and contexts. \nTailored to the level of risk. An assessment should be done to determine the level of risk of the auto -\nmated system. In settings where the consequences are high as determined by a risk assessment, or extensive \noversight is expected (e.g., in criminal justice or some public sector settings), explanatory mechanisms should be built into the system design so that the system’s full behavior can be explained in advance (i.e., only fully transparent models should be used), rather than as an after-the-decision interpretation. In other settings, the extent of explanation provided should be tailored to the risk level. \nValid. The explanation provided by a system should accurately reflect the factors and the influences that led \nto a particular decision, and should be meaningful for the particular customization based on purpose, target, and level of risk. While approximation and simplification may be necessary for the system to succeed based on the explanatory purpose and target of the explanation, or to account for the risk of fraud or other concerns related to revealing decision-making information, such simplifications should be done in a scientifically supportable way. Where appropriate based on the explanatory system, error ranges for the explanation should be calculated and included in the explanation, with the choice of presentation of such information balanced with usability and overall interface complexity concerns. \nDemonstrate protections for notice and explanation \nReporting. Summary reporting should document the determinations made based on the above consider -\nations, including: the responsible entities for accountability purposes; the goal and use cases for the system, identified users, and impacted populations; the assessment of notice clarity and timeliness; the assessment of the explanation's validity and accessibility; the assessment of the level of risk; and the account and assessment of how explanations are tailored, including to the purpose, the recipient of the explanation, and the level of risk. Individualized profile information should be made readily available to the greatest extent possible that includes explanations for any system impacts or inferences. Reporting should be provided in a clear plain language and machine-readable manner. \n44""]","The explanation provided by a system should accurately reflect the factors and influences that led to a particular decision, and should be meaningful for the particular customization based on purpose, target, and level of risk. While approximation and simplification may be necessary for the system to succeed based on the explanatory purpose and target of the explanation, or to account for the risk of fraud or other concerns related to revealing decision-making information, such simplifications should be done in a scientifically supportable way. 
Where appropriate based on the explanatory system, error ranges for the explanation should be calculated and included in the explanation, with the choice of presentation of such information balanced with usability and overall interface complexity concerns.",simple,"[{'source': 'Blueprint-for-an-AI-Bill-of-Rights.pdf', 'page': 43}]",True
10
+ How do generative models like LLMs generate outputs that can lead to confabulations in GAI systems?,"[' \n6 2.2. Confabulation \n“Confabulation” refers to a phenomenon in which GAI systems generate and confidently present \nerroneous or false content in response to prompts . Confabulations also include generated outputs that \ndiverge from the prompts or other input or that contradict previously generated statements in the same \ncontext. Th ese phenomena are colloquially also referred to as “hallucination s” or “fabrication s.” \nConfabulations can occur across GAI outputs and contexts .9,10 Confabulations are a natural result of the \nway generative models are designed : they generate outputs that approximate the statistical distribution \nof their training data ; for example, LLMs predict the next token or word in a sentence or phrase . While \nsuch statistical prediction can produce factual ly accurate and consistent outputs , it can also produce \noutputs that are factually inaccurat e or internally inconsistent . This dynamic is particularly relevant when \nit comes to open -ended prompts for long- form responses and in domains which require highly \ncontextual and/or domain expertise. \nRisks from confabulations may arise when users believe false content – often due to the confident nature \nof the response – leading users to act upon or promote the false information. This poses a challenge for \nmany real -world applications, such as in healthcare, where a confabulated summary of patient \ninformation reports could cause doctors to make incorrect diagnoses and/or recommend the wrong \ntreatments. Risks of confabulated content may be especially important to monitor when integrating GAI \ninto applications involving consequential decision making. \nGAI outputs may also include confabulated logic or citations that purport to justify or explain the \nsystem’s answer , which may further mislead humans into inappropriately trusting the system’s output. \nFor instance, LLMs sometimes provide logical steps for how they arrived at an answer even when the \nanswer itself is incorrect. Similarly, an LLM could falsely assert that it is human or has human traits, \npotentially deceiv ing humans into believing they are speaking with another human. \nThe extent to which humans can be deceived by LLMs, the mechanisms by which this may occur, and the \npotential risks from adversarial prompting of such behavior are emerging areas of study . Given the wide \nrange of downstream impacts of GAI, it is difficult to estimate the downstream scale and impact of \nconfabulations . \nTrustworthy AI Characteristics: Fair with Harmful Bias Managed, Safe, Valid and Reliable , Explainable \nand Interpretable \n2.3. Dangerous , Violent , or Hateful Content \nGAI systems can produce content that is inciting, radicalizing, or threatening, or that glorifi es violence , \nwith greater ease and scale than other technologies . LLMs have been reported to generate dangerous or \nviolent recommendations , and s ome models have generated actionable instructions for dangerous or \n \n \n9 Confabulations of falsehoods are most commonly a problem for text -based outputs; for audio, image, or video \ncontent, creative generation of non- factual content can be a desired behavior. \n10 For example, legal confabulations have been shown to be pervasive in current state -of-the-art LLMs. 
See also, \ne.g., ']","Generative models like LLMs generate outputs that can lead to confabulations in GAI systems by approximating the statistical distribution of their training data. While this statistical prediction can result in factually accurate and consistent outputs, it can also produce outputs that are factually inaccurate or internally inconsistent. This becomes particularly relevant in open-ended prompts for long-form responses and domains requiring contextual or domain expertise.",simple,"[{'source': 'AI_Risk_Management_Framework.pdf', 'page': 9}]",True
11
+ How can appropriate diligence on training data use help assess intellectual property risks in AI systems?,"["" \n27 MP-4.1-0 10 Conduct appropriate diligence on training data use to assess intellectual property, \nand privacy, risks, including to examine whether use of proprietary or sensitive \ntraining data is consistent with applicable laws. Intellectual Property ; Data Privacy \nAI Actor Tasks: Governance and Oversight, Operation and Monitoring, Procurement, Third -party entities \n \nMAP 5.1: Likelihood and magnitude of each identified impact (both potentially beneficial and harmful) based on expected use, past \nuses of AI systems in similar contexts, public incident reports, feedback from those external to the team that developed or d eployed \nthe AI system, or other data are identified and documented. \nAction ID Suggested Action GAI Risks \nMP-5.1-001 Apply TEVV practices for content provenance (e.g., probing a system's synthetic \ndata generation capabilities for potential misuse or vulnerabilities . Information Integrity ; Information \nSecurity \nMP-5.1-002 Identify potential content provenance harms of GAI, such as misinformation or \ndisinformation, deepfakes, including NCII, or tampered content. Enumerate and rank risks based on their likelihood and potential impact, and determine how well provenance solutions address specific risks and/or harms. Information Integrity ; Dangerous , \nViolent, or Hateful Content ; \nObscene, Degrading, and/or Abusive Content \nMP-5.1-003 Consider d isclos ing use of GAI to end user s in relevant contexts, while considering \nthe objective of disclosure, the context of use, the likelihood and magnitude of the \nrisk posed, the audience of the disclosure, as well as the frequency of the disclosures. Human -AI Configuration \nMP-5.1-004 Prioritize GAI structured public feedback processes based on risk assessment estimates. Information Integrity ; CBRN \nInformation or Capabilities ; \nDangerous , Violent, or Hateful \nContent ; Harmful Bias and \nHomogenization \nMP-5.1-005 Conduct adversarial role -playing exercises, GAI red -teaming, or chaos testing to \nidentify anomalous or unforeseen failure modes. Information Security \nMP-5.1-0 06 Profile threats and negative impacts arising from GAI systems interacting with, \nmanipulating, or generating content, and outlining known and potential vulnerabilities and the likelihood of their occurrence. Information Security \nAI Actor Tasks: AI Deployment, AI Design, AI Development, AI Impact Assessment, Affected Individuals and Communities, End -\nUsers, Operation and Monitoring \n ""]","Appropriate diligence on training data use can help assess intellectual property risks in AI systems by examining whether the use of proprietary or sensitive training data aligns with relevant laws. This includes evaluating the likelihood and magnitude of potential impacts, both beneficial and harmful, based on past uses of AI systems in similar contexts, public incident reports, feedback from external parties, and other relevant data. By identifying and documenting these impacts, organizations can better understand the risks associated with their training data and take appropriate measures to mitigate them.",simple,"[{'source': 'AI_Risk_Management_Framework.pdf', 'page': 30}]",True
12
+ How do integrated human-AI systems benefit companies in providing customer service?,"["" \n \n \n \n \n \n HUMAN ALTERNATIVES, \nCONSIDERATION, AND \nFALLBACK \nHOW THESE PRINCIPLES CAN MOVE INTO PRACTICE\nReal-life examples of how these principles can become reality, through laws, policies, and practical \ntechnical and sociotechnical approaches to protecting rights, opportunities, and access. \nHealthcare “navigators” help people find their way through online signup forms to choose \nand obtain healthcare. A Navigator is “an individual or organization that's trained and able to help \nconsumers, small businesses, and their employees as they look for health coverage options through the \nMarketplace (a government web site), including completing eligibility and enrollment forms.”106 For \nthe 2022 plan year, the Biden-Harris Administration increased funding so that grantee organizations could \n“train and certify more than 1,500 Navigators to help uninsured consumers find affordable and comprehensive \nhealth coverage. ”107\nThe customer service industry has successfully integrated automated services such as \nchat-bots and AI-driven call response systems with escalation to a human support team.\n108 Many businesses now use partially automated customer service platforms that help answer customer \nquestions and compile common problems for human agents to review. These integrated human-AI \nsystems allow companies to provide faster customer care while maintaining human agents to answer \ncalls or otherwise respond to complicated requests. Using both AI and human agents is viewed as key to \nsuccessful customer service.109\nBallot curing laws in at least 24 states require a fallback system that allows voters to \ncorrect their ballot and have it counted in the case that a voter signature matching algorithm incorrectly flags their ballot as invalid or there is another issue with their ballot, and review by an election official does not rectify the problem. Some federal courts have found that such cure procedures are constitutionally required.\n110 Ballot \ncuring processes vary among states, and include direct phone calls, emails, or mail contact by election \nofficials.111 Voters are asked to provide alternative information or a new signature to verify the validity of their \nballot. \n52""]","Integrated human-AI systems benefit companies in providing customer service by allowing for faster customer care while maintaining human agents to handle complicated requests. These systems use partially automated platforms to answer common customer questions and compile issues for human agents to review, ensuring a balance between efficiency and personalized service.",simple,"[{'source': 'Blueprint-for-an-AI-Bill-of-Rights.pdf', 'page': 51}]",True
13
+ What was the purpose of the year of public engagement that informed the development of the Blueprint for an AI Bill of Rights?,"[' \n \n \n \n \n \n \n \n \n \n \n About this Document \nThe Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People was \npublished by the White House Office of Science and Technology Policy in October 2022. This framework was \nreleased one year after OSTP announced the launch of a process to develop “a bill of rights for an AI-powered \nworld.” Its release follows a year of public engagement to inform this initiative. The framework is available \nonline at: https://www.whitehouse.gov/ostp/ai-bill-of-rights \nAbout the Office of Science and Technology Policy \nThe Office of Science and Technology Policy (OSTP) was established by the National Science and Technology \nPolicy, Organization, and Priorities Act of 1976 to provide the President and others within the Executive Office \nof the President with advice on the scientific, engineering, and technological aspects of the economy, national \nsecurity, health, foreign relations, the environment, and the technological recovery and use of resources, among \nother topics. OSTP leads interagency science and technology policy coordination efforts, assists the Office of \nManagement and Budget (OMB) with an annual review and analysis of Federal research and development in \nbudgets, and serves as a source of scientific and technological analysis and judgment for the President with \nrespect to major policies, plans, and programs of the Federal Government. \nLegal Disclaimer \nThe Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People is a white paper \npublished by the White House Office of Science and Technology Policy. It is intended to support the \ndevelopment of policies and practices that protect civil rights and promote democratic values in the building, \ndeployment, and governance of automated systems. \nThe Blueprint for an AI Bill of Rights is non-binding and does not constitute U.S. government policy. It \ndoes not supersede, modify, or direct an interpretation of any existing statute, regulation, policy, or \ninternational instrument. It does not constitute binding guidance for the public or Federal agencies and \ntherefore does not require compliance with the principles described herein. It also is not determinative of what \nthe U.S. government’s position will be in any international negotiation. Adoption of these principles may not \nmeet the requirements of existing statutes, regulations, policies, or international instruments, or the \nrequirements of the Federal agencies that enforce them. These principles are not intended to, and do not, \nprohibit or limit any lawful activity of a government agency, including law enforcement, national security, or \nintelligence activities. \nThe appropriate application of the principles set forth in this white paper depends significantly on the \ncontext in which automated systems are being utilized. In some circumstances, application of these principles \nin whole or in part may not be appropriate given the intended use of automated systems to achieve government \nagency missions. Future sector-specific guidance will likely be necessary and important for guiding the use of \nautomated systems in certain settings such as AI systems used as part of school building security or automated \nhealth diagnostic systems. 
\nThe Blueprint for an AI Bill of Rights recognizes that law enforcement activities require a balancing of \nequities, for example, between the protection of sensitive law enforcement information and the principle of \nnotice; as such, notice may not be appropriate, or may need to be adjusted to protect sources, methods, and \nother law enforcement equities. Even in contexts where these principles may not apply in whole or in part, \nfederal departments and agencies remain subject to judicial, privacy, and civil liberties oversight as well as \nexisting policies and safeguards that govern automated systems, including, for example, Executive Order 13960, \nPromoting the Use of Trustworthy Artificial Intelligence in the Federal Government (December 2020). \nThis white paper recognizes that national security (which includes certain law enforcement and \nhomeland security activities) and defense activities are of increased sensitivity and interest to our nation’s \nadversaries and are often subject to special requirements, such as those governing classified information and \nother protected data. Such activities require alternative, compatible safeguards through existing policies that \ngovern automated systems and AI, such as the Department of Defense (DOD) AI Ethical Principles and \nResponsible AI Implementation Pathway and the Intelligence Community (IC) AI Ethics Principles and \nFramework. The implementation of these policies to national security and defense activities can be informed by \nthe Blueprint for an AI Bill of Rights where feasible. \nThe Blueprint for an AI Bill of Rights is not intended to, and does not, create any legal right, benefit']",The purpose of the year of public engagement that informed the development of the Blueprint for an AI Bill of Rights was to gather input and feedback from the public to shape the framework and ensure it reflects the values and concerns of the American people.,simple,"[{'source': 'Blueprint-for-an-AI-Bill-of-Rights.pdf', 'page': 1}]",True
14
+ How can automated systems prevent 'mission creep' while ensuring privacy and user control?,"[' DATA PRIVACY \nWHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS\nThe expectations for automated systems are meant to serve as a blueprint for the development of additional \ntechnical standards and practices that are tailored for particular sectors and contexts. \nTraditional terms of service—the block of text that the public is accustomed to clicking through when using a web -\nsite or digital app—are not an adequate mechanism for protecting privacy. The American public should be protect -\ned via built-in privacy protections, data minimization, use and collection limitations, and transparency, in addition \nto being entitled to clear mechanisms to control access to and use of their data—including their metadata—in a proactive, informed, and ongoing way. Any automated system collecting, using, sharing, or storing personal data should meet these expectations. \nProtect privacy by design and by default \nPrivacy by design and by default. Automated systems should be designed and built with privacy protect -\ned by default. Privacy risks should be assessed throughout the development life cycle, including privacy risks from reidentification, and appropriate technical and policy mitigation measures should be implemented. This includes potential harms to those who are not users of the automated system, but who may be harmed by inferred data, purposeful privacy violations, or community surveillance or other community harms. Data collection should be minimized and clearly communicated to the people whose data is collected. Data should only be collected or used for the purposes of training or testing machine learning models if such collection and use is legal and consistent with the expectations of the people whose data is collected. User experience research should be conducted to confirm that people understand what data is being collected about them and how it will be used, and that this collection matches their expectations and desires. \nData collection and use-case scope limits. Data collection should be limited in scope, with specific, \nnarrow identified goals, to avoid ""mission creep."" Anticipated data collection should be determined to be strictly necessary to the identified goals and should be minimized as much as possible. Data collected based on these identified goals and for a specific context should not be used in a different context without assessing for new privacy risks and implementing appropriate mitigation measures, which may include express consent. Clear timelines for data retention should be established, with data deleted as soon as possible in accordance with legal or policy-based limitations. Determined data retention timelines should be documented and justi\n-\nfied. \nRisk identification and mitigation. Entities that collect, use, share, or store sensitive data should attempt to proactively identify harms and seek to manage them so as to avoid, mitigate, and respond appropri\n-\nately to identified risks. Appropriate responses include determining not to process data when the privacy risks outweigh the benefits or implementing measures to mitigate acceptable risks. Appropriate responses do not include sharing or transferring the privacy risks to users via notice or consent requests where users could not reasonably be expected to understand the risks without further support. \nPrivacy-preserving security. 
Entities creating, using, or governing automated systems should follow privacy and security best practices designed to ensure data and metadata do not leak beyond the specific consented use case. Best practices could include using privacy-enhancing cryptography or other types of privacy-enhancing technologies or fine-grained permissions and access control mechanisms, along with conventional system security protocols. \n33']","Automated systems can prevent 'mission creep' and ensure privacy and user control by limiting data collection to specific, narrow goals that are strictly necessary for the identified purposes. Data collection should be minimized, clearly communicated to users, and used only for legal and expected purposes. Any use of data in a different context should be assessed for new privacy risks and appropriate mitigation measures should be implemented, potentially including obtaining express consent. Clear timelines for data retention should be established, with data deleted as soon as possible in accordance with legal or policy-based limitations. Entities should proactively identify and manage privacy risks, avoiding processing data when risks outweigh benefits and implementing measures to mitigate acceptable risks. Privacy-preserving security measures, such as privacy-enhancing cryptography and access control mechanisms, should be employed to prevent data leakage beyond consented use cases.",multi_context,"[{'source': 'Blueprint-for-an-AI-Bill-of-Rights.pdf', 'page': 32}]",True
+ "How can GAI tech improve red-teaming with human teams, ensuring content origin and incident disclosure?","[' \n51 general public participants. For example, expert AI red- teamers could modify or verify the \nprompts written by general public AI red- teamers. These approaches may also expand coverage \nof the AI risk attack surface. \n• Human / AI: Performed by GAI in combinatio n with specialist or non -specialist human teams. \nGAI- led red -teaming can be more cost effective than human red- teamers alone. Human or GAI-\nled AI red -teaming may be better suited for eliciting different types of harms. \nA.1.6. Content Provenance \nOverview \nGAI technologies can be leveraged for many applications such as content generation and synthetic data. \nSome aspects of GAI output s, such as the production of deepfake content, can challenge our ability to \ndistinguish human- generated content from AI -generated synthetic content. To help manage and mitigate \nthese risks, digital transparency mechanisms like provenance data tracking can trace the origin and \nhistory of content. Provenance data tracking and synthetic content detection can help facilitate greater \ninformation access about both authentic and synthetic content to users, enabling better knowledge of \ntrustworthiness in AI systems. When combined with other organizational accountability mechanisms, \ndigital content transparency approaches can enable processes to trace negative outcomes back to their \nsource, improve information integrity, and uphold public trust. Provenance data tracking and synthetic content detection mechanisms provide information about the origin \nand history of content to assist in \nGAI risk management efforts. \nProvenance metad ata can include information about GAI model developers or creators of GAI content , \ndate/time of creation, location, modifications, and sources. Metadata can be tracked for text, images, videos, audio, and underlying datasets. The implementation of p rovenance data tracking techniques can \nhelp assess the authenticity, integrity, intellectual property rights , and potential manipulations in digital \ncontent . Some well -known techniques for provenance data tracking include digital watermarking\n, \nmetadata recording , digital fingerprinting, and human authentication, among others . \nProvenance Data Tracking Approaches \nProvenance data tracking techniques for GAI systems can be used to track the history and origin of data \ninputs, metadata, and synthetic content. Provenance data tracking records the origin and history for \ndigital content, allowing its authenticity to be determined. It consists of techniques to record metadata \nas well as overt and covert digital watermarks on content. Data provenance refers to tracking the origin \nand history of input data through metadata and digital watermarking techniques. Provenance data tracking processes can include and assist AI Actors across the lifecycle who may not have full visibility or \ncontrol over the various trade -offs and cascading impacts of early -stage model decisions on downstream \nperformance and synthetic outputs. For example, by selecting a watermarking model to prioritize \nrobustness (the durability of a watermark) , an AI actor may inadvertently diminis h \ncomputational \ncomplexity ( the resources required to implement watermarking). 
Organizational risk management \nefforts for enhancing content provenance include: \n• Tracking provenance of training data and metadata for GAI systems; \n• Documenting provenance data limitations within GAI systems; ']","GAI technologies can improve red-teaming with human teams by combining GAI with specialist or non-specialist human teams. GAI-led red-teaming can be more cost-effective than human red-teamers alone and may be better suited for eliciting different types of harms. Content provenance mechanisms like provenance data tracking can trace the origin and history of content, helping to manage and mitigate risks associated with GAI output. These approaches can enable processes to trace negative outcomes back to their source, improve information integrity, and uphold public trust.",multi_context,"[{'source': 'AI_Risk_Management_Framework.pdf', 'page': 54}]",True
+ Why is it important for lenders to inform consumers about decisions made under FCRA in automated systems?,"[' \n \n \n \n \n \n \n \n \n \n NOTICE & \nEXPLANATION \nHOW THESE PRINCIPLES CAN MOVE INTO PRACTICE\nReal-life examples of how these principles can become reality, through laws, policies, and practical \ntechnical and sociotechnical approaches to protecting rights, opportunities, and access. \nPeople in Illinois are given written notice by the private sector if their biometric informa-\ntion is used . The Biometric Information Privacy Act enacted by the state contains a number of provisions \nconcerning the use of individual biometric data and identifiers. Included among them is a provision that no private \nentity may ""collect, capture, purchase, receive through trade, or otherwise obtain"" such information about an \nindividual, unless written notice is provided to that individual or their legally appointed representative. 87\nMajor technology companies are piloting new ways to communicate with the public about \ntheir automated technologies. For example, a collection of non-profit organizations and companies have \nworked together to develop a framework that defines operational approaches to transparency for machine \nlearning systems.88 This framework, and others like it,89 inform the public about the use of these tools, going \nbeyond simple notice to include reporting elements such as safety evaluations, disparity assessments, and \nexplanations of how the systems work. \nLenders are required by federal law to notify consumers about certain decisions made about \nthem. Both the Fair Credit Reporting Act and the Equal Credit Opportunity Act require in certain circumstances \nthat consumers who are denied credit receive ""adverse action"" notices. Anyone who relies on the information in a \ncredit report to deny a consumer credit must, under the Fair Credit Reporting Act, provide an ""adverse action"" \nnotice to the consumer, which includes ""notice of the reasons a creditor took adverse action on the application \nor on an existing credit account.""90 In addition, under the risk-based pricing rule,91 lenders must either inform \nborrowers of their credit score, or else tell consumers when ""they are getting worse terms because of \ninformation in their credit report."" The CFPB has also asserted that ""[t]he law gives every applicant the right to \na specific explanation if their application for credit was denied, and that right is not diminished simply because \na company uses a complex algorithm that it doesn\'t understand.""92 Such explanations illustrate a shared value \nthat certain decisions need to be explained. \nA California law requires that warehouse employees are provided with notice and explana-\ntion about quotas, potentially facilitated by automated systems, that apply to them. Warehous-\ning employers in California that use quota systems (often facilitated by algorithmic monitoring systems) are \nrequired to provide employees with a written description of each quota that applies to the employee, including \n“quantified number of tasks to be performed or materials to be produced or handled, within the defined \ntime period, and any potential adverse employment action that could result from failure to meet the quota.”93\nAcross the federal government, agencies are conducting and supporting research on explain-\nable AI systems. The NIST is conducting fundamental research on the explainability of AI systems. 
A multidis-\nciplinary team of researchers aims to develop measurement methods and best practices to support the \nimplementation of core tenets of explainable AI.94 The Defense Advanced Research Projects Agency has a \nprogram on Explainable Artificial Intelligence that aims to create a suite of machine learning techniques that \nproduce more explainable models, while maintaining a high level of learning performance (prediction \naccuracy), and enable human users to understand, appropriately trust, and effectively manage the emerging \ngeneration of artificially intelligent partners.95 The National Science Foundation’s program on Fairness in \nArtificial Intelligence also includes a specific interest in research foundations for explainable AI.96\n45']","It is important for lenders to inform consumers about decisions made under FCRA in automated systems because the Fair Credit Reporting Act requires that consumers who are denied credit receive ""adverse action"" notices. These notices must include the reasons for the adverse action taken on the application or an existing credit account. Additionally, under the risk-based pricing rule, lenders must inform borrowers of their credit score or explain when they are receiving worse terms due to information in their credit report. This transparency is crucial to ensure that consumers understand the basis for credit decisions, especially when complex algorithms are involved.",multi_context,"[{'source': 'Blueprint-for-an-AI-Bill-of-Rights.pdf', 'page': 44}]",True
+ Why is public transparency important in automated systems affecting people's lives and decisions?,"[' \n \n NOTICE & \nEXPLANATION \nWHY THIS PRINCIPLE IS IMPORTANT\nThis section provides a brief summary of the problems which the principle seeks to address and protect \nagainst, including illustrative examples. \n• A predictive policing system claimed to identify individuals at greatest risk to commit or become the victim of\ngun violence (based on automated analysis of social ties to gang members, criminal histories, previous experi -\nences of gun violence, and other factors) and led to individuals being placed on a watch list with noexplanation or public transparency regarding how the system came to its \nconclusions.85 Both police and\nthe public deserve to understand why and how such a system is making these determinations.\n• A system awarding benefits changed its criteria invisibl y. Individuals were denied benefits due to data entry\nerrors and other system flaws. These flaws were only revealed when an explanation of the systemwas \ndemanded and produced.86 The lack of an explanation made it harder for errors to be corrected in a\ntimely manner.\n42', "" \n \n NOTICE & \nEXPLANATION \nWHY THIS PRINCIPLE IS IMPORTANT\nThis section provides a brief summary of the problems which the principle seeks to address and protect \nagainst, including illustrative examples. \nAutomated systems now determine opportunities, from employment to credit, and directly shape the American \npublic’s experiences, from the courtroom to online classrooms, in ways that profoundly impact people’s lives. But this expansive impact is not always visible. An applicant might not know whether a person rejected their resume or a hiring algorithm moved them to the bottom of the list. A defendant in the courtroom might not know if a judge deny\n-\ning their bail is informed by an automated system that labeled them “high risk.” From correcting errors to contesting decisions, people are often denied the knowledge they need to address the impact of automated systems on their lives. Notice and explanations also serve an important safety and efficacy purpose, allowing experts to verify the reasonable\n-\nness of a recommendation before enacting it. \nIn order to guard against potential harms, the American public needs to know if an automated system is being used. Clear, brief, and understandable notice is a prerequisite for achieving the other protections in this framework. Like\n-\nwise, the public is often unable to ascertain how or why an automated system has made a decision or contributed to a particular outcome. The decision-making processes of automated systems tend to be opaque, complex, and, therefore, unaccountable, whether by design or by omission. These factors can make explanations both more challenging and more important, and should not be used as a pretext to avoid explaining important decisions to the people impacted by those choices. In the context of automated systems, clear and valid explanations should be recognized as a baseline requirement. \nProviding notice has long been a standard practice, and in many cases is a legal requirement, when, for example, making a video recording of someone (outside of a law enforcement or national security context). In some cases, such as credit, lenders are required to provide notice and explanation to consumers. Techniques used to automate the process of explaining such systems are under active research and improvement and such explanations can take many forms. 
Innovative companies and researchers are rising to the challenge and creating and deploying explanatory systems that can help the public better understand decisions that impact them. \nWhile notice and explanation requirements are already in place in some sectors or situations, the American public deserve to know consistently and across sectors if an automated system is being used in a way that impacts their rights, opportunities, or access. This knowledge should provide confidence in how the public is being treated, and trust in the validity and reasonable use of automated systems. \n• A lawyer representing an older client with disabilities who had been cut off from Medicaid-funded home\nhealth-care assistance couldn't determine why\n, especially since the decision went against historical access\npractices. In a court hearing, the lawyer learned from a witness that the state in which the older client\nlived \nhad recently adopted a new algorithm to determine eligibility.83 The lack of a timely explanation made it\nharder \nto understand and contest the decision.\n•\nA formal child welfare investigation is opened against a parent based on an algorithm and without the parent\never \nbeing notified that data was being collected and used as part of an algorithmic child maltreatment\nrisk assessment.84 The lack of notice or an explanation makes it harder for those performing child\nmaltreatment assessments to validate the risk assessment and denies parents knowledge that could help them\ncontest a decision.\n41""]","Public transparency is crucial in automated systems affecting people's lives and decisions because it allows both the authorities and the public to understand why and how decisions are being made. Without transparency, individuals may be subject to decisions made by automated systems without any explanation or accountability, leading to potential errors, biases, and injustices. Transparency also enables experts to verify the reasonableness of recommendations before they are implemented, ensuring safety and efficacy. In summary, public transparency in automated systems is essential for accountability, fairness, and the protection of individuals' rights and opportunities.",multi_context,"[{'source': 'Blueprint-for-an-AI-Bill-of-Rights.pdf', 'page': 41}, {'source': 'Blueprint-for-an-AI-Bill-of-Rights.pdf', 'page': 40}]",True
+ How can governance principles manage risks of GAI effectively?,"[' \n47 Appendix A. Primary GAI Considerations \nThe following primary considerations were derived as overarching themes from the GAI PWG \nconsultation process. These considerations (Governance, Pre- Deployment Testing, Content Provenance, \nand Incident Disclosure) are relevant for volun tary use by any organization designing, developing, and \nusing GAI and also inform the Actions to Manage GAI risks. Information included about the primary \nconsiderations is not exhaustive , but highlights the most relevant topics derived from the GAI PWG. \nAcknowledgments: These considerations could not have been surfaced without the helpful analysis and \ncontributions from the community and NIST staff GAI PWG leads: George Awad, Luca Belli, Harold Booth, \nMat Heyman, Yoo young Lee, Mark Pryzbocki, Reva Schwartz, Martin Stanley, and Kyra Yee. \nA.1. Governance \nA.1.1. Overview \nLike any other technology system, governance principles and techniques can be used to manage risks \nrelated to generative AI models, capabilities, and applications. Organizations may choose to apply their \nexisting risk tiering to GAI systems, or they may op t to revis e or update AI system risk levels to address \nthese unique GAI risks. This section describes how organizational governance regimes may be re -\nevaluated and adjusted for GAI contexts. It also addresses third -party considerations for governing across \nthe AI value chain. \nA.1.2. Organizational Governance \nGAI opportunities, risks and long- term performance characteristics are typically less well -understood \nthan non- generative AI tools and may be perceived and acted upon by humans in ways that vary greatly. \nAccordingly, GAI may call for different levels of oversight from AI Actors or different human- AI \nconfigurations in order to manage their risks effectively. Organizations’ use of GAI systems may also \nwarrant additional human review, tracking and documentation, and greater management oversight. \nAI technology can produce varied outputs in multiple modalities and present many classes of user \ninterfaces. This leads to a broader set of AI Actors interacting with GAI systems for widely differing \napplications and contexts of use. These can include data labeling and preparation, development of GAI \nmodels, content moderation, code generation and review, text generation and editing, image and video \ngeneration, summarization, search, and chat. These activities can take place within organizational \nsettings or in the public domain. \nOrganizations can restrict AI applications that cause harm, exceed stated risk tolerances, or that conflict with their tolerances or values. Governance tools and protocols that are applied to other types of AI systems can be applied to GAI systems. These p lans and actions include: \n• Accessibility and reasonable accommodations \n• AI actor credentials and qualifications \n• Alignment to organizational values • Auditing and assessment \n• Change -management controls \n• Commercial use \n• Data provenance ']","Governance principles can be used to manage risks related to generative AI models, capabilities, and applications. Organizations may choose to apply their existing risk tiering to GAI systems or revise/update AI system risk levels to address unique GAI risks. Organizational governance regimes may need to be re-evaluated and adjusted for GAI contexts, including third-party considerations across the AI value chain. 
GAI may require different levels of oversight from AI actors or different human-AI configurations to manage risks effectively. Organizations using GAI systems may need additional human review, tracking, documentation, and management oversight. Governance tools and protocols applied to other AI systems can also be applied to GAI systems, including accessibility, AI actor credentials, alignment to organizational values, auditing, change-management controls, commercial use, and data provenance.",multi_context,"[{'source': 'AI_Risk_Management_Framework.pdf', 'page': 50}]",True
+ "Why is accuracy important in reviewing and documenting data throughout the AI life cycle, considering factors like bias, IP, integrity, and GAI risks?","[' \n25 MP-2.3-002 Review and document accuracy, representativeness, relevance, suitability of data \nused at different stages of AI life cycle. Harmful Bias and Homogenization ; \nIntellectual Property \nMP-2.3-003 Deploy and document fact -checking techniques to verify the accuracy and \nveracity of information generated by GAI systems, especially when the \ninformation comes from multiple (or unknown) sources. Information Integrity \nMP-2.3-004 Develop and implement testing techniques to identify GAI produced content (e.g., synthetic media) that might be indistinguishable from human -generated content. Information Integrity \nMP-2.3-005 Implement plans for GAI systems to undergo regular adversarial testing to identify \nvulnerabilities and potential manipulation or misuse. Information Security \nAI Actor Tasks: AI Development, Domain Experts, TEVV \n \nMAP 3.4: Processes for operator and practitioner proficiency with AI system performance and trustworthiness – and relevant \ntechnical standards and certifications – are defined, assessed, and documented. \nAction ID Suggested Action GAI Risks \nMP-3.4-001 Evaluate whether GAI operators and end -users can accurately understand \ncontent lineage and origin. Human -AI Configuration ; \nInformation Integrity \nMP-3.4-002 Adapt existing training programs to include modules on digital content \ntransparency. Information Integrity \nMP-3.4-003 Develop certification programs that test proficiency in managing GAI risks and \ninterpreting content provenance, relevant to specific industry and context. Information Integrity \nMP-3.4-004 Delineate human proficiency tests from tests of GAI capabilities. Human -AI Configuration \nMP-3.4-005 Implement systems to continually monitor and track the outcomes of human- GAI \nconfigurations for future refinement and improvements . Human -AI Configuration ; \nInformation Integrity \nMP-3.4-006 Involve the end -users, practitioners, and operators in GAI system in prototyping \nand testing activities. Make sure these tests cover various scenarios , such as crisis \nsituations or ethically sensitive contexts. Human -AI Configuration ; \nInformation Integrity ; Harmful Bias \nand Homogenization ; Dangerous , \nViolent, or Hateful Content \nAI Actor Tasks: AI Design, AI Development, Domain Experts, End -Users, Human Factors, Operation and Monitoring \n ']","Accuracy is crucial in reviewing and documenting data throughout the AI life cycle to ensure the data's reliability, representativeness, relevance, and suitability at different stages. This is particularly important due to factors like harmful bias, homogenization, intellectual property concerns, information integrity, and GAI risks. Ensuring accuracy helps in verifying the information generated by GAI systems, identifying potential biases or harmful content, and maintaining the trustworthiness of AI systems.",multi_context,"[{'source': 'AI_Risk_Management_Framework.pdf', 'page': 28}]",True
+ How can feedback be used to gather user input on AI content while aligning with values and detecting quality shifts?,"[' \n41 MG-2.2-0 06 Use feedback from internal and external AI Actors , users, individuals, and \ncommunities, to assess impact of AI -generated content. Human -AI Configuration \nMG-2.2-0 07 Use real -time auditing tools where they can be demonstrated to aid in the \ntracking and validation of the lineage and authenticity of AI -generated data. Information Integrity \nMG-2.2-0 08 Use structured feedback mechanisms to solicit and capture user input about AI -\ngenerated content to detect subtle shifts in quality or alignment with \ncommunity and societal values. Human -AI Configuration ; Harmful \nBias and Homogenization \nMG-2.2-009 Consider opportunities to responsibly use synthetic data and other privacy \nenhancing techniques in GAI development, where appropriate and applicable , \nmatch the statistical properties of real- world data without disclosing personally \nidentifiable information or contributing to homogenization . Data Privacy ; Intellectual Property; \nInformation Integrity ; \nConfabulation ; Harmful Bias and \nHomogenization \nAI Actor Tasks: AI Deployment, AI Impact Assessment, Governance and Oversight, Operation and Monitoring \n \nMANAGE 2.3: Procedures are followed to respond to and recover from a previously unknown risk when it is identified. \nAction ID Suggested Action GAI Risks \nMG-2.3-001 Develop and update GAI system incident response and recovery plans and \nprocedures to address the following: Review and maintenance of policies and procedures to account for newly encountered uses; Review and maintenance of policies and procedures for detec tion of unanticipated uses; Verify response \nand recovery plans account for the GAI system value chain; Verify response and \nrecovery plans are updated for and include necessary details to communicate with downstream GAI system Actors: Points -of-Contact (POC), Contact \ninformation, notification format. Value Chain and Component Integration \nAI Actor Tasks: AI Deployment, Operation and Monitoring \n \nMANAGE 2.4: Mechanisms are in place and applied, and responsibilities are assigned and understood, to supersede, disengage, or \ndeactivate AI systems that demonstrate performance or outcomes inconsistent with intended use. \nAction ID Suggested Action GAI Risks \nMG-2.4-001 Establish and maintain communication plans to inform AI stakeholders as part of \nthe deactivation or disengagement process of a specific GAI system (including for open -source models) or context of use, including r easons, workarounds, user \naccess removal, alternative processes, contact information, etc. Human -AI Configuration ']",Use structured feedback mechanisms to solicit and capture user input about AI-generated content to detect subtle shifts in quality or alignment with community and societal values.,multi_context,"[{'source': 'AI_Risk_Management_Framework.pdf', 'page': 44}]",True
+ What measures are being taken to address issues for transgender travelers at airport checkpoints?,"[' WHY THIS PRINCIPLE IS IMPORTANT\nThis section provides a brief summary of the problems which the principle seeks to address and protect \nagainst, including illustrative examples. \n• An automated sentiment analyzer, a tool often used by technology platforms to determine whether a state-\nment posted online expresses a positive or negative sentiment, was found to be biased against Jews and gay\npeople. For example, the analyzer marked the statement “I’m a Jew” as representing a negative sentiment,\nwhile “I’m a Christian” was identified as expressing a positive sentiment.36 This could lead to the\npreemptive blocking of social media comments such as: “I’m gay .” A related company with this bias concern\nhas made their data public to encourage researchers to help address the issue37 \nand has released reports\nidentifying and measuring this problem as well as detailing attempts to address it.38\n• Searches for “Black girls,” “Asian girls,” or “Latina girls” return predominantly39 sexualized content, rather\nthan role models, toys, or activities.40 Some search engines have been\n working to reduce the prevalence of\nthese results, but the problem remains.41\n• Advertisement delivery systems that predict who is most likely to click on a job advertisement end up deliv-\nering ads in ways that reinforce racial and gender stereotypes, such as overwhelmingly directing supermar-\nket cashier ads to women and jobs with taxi companies to primarily Black people.42\n•Body scanners, used by TSA at airport checkpoints, require the operator to select a “male” or “female”\nscanning setting based on the passenger’s sex, but the setting is chosen based on the operator’s perception of\nthe passenger’s gender identity\n. These scanners are more likely to flag transgender travelers as requiring\nextra screening done by a person. Transgender travelers have described degrading experiences associated\nwith these extra screenings.43 TSA has recently announced plans to implement a gender-neutral algorithm44 \nwhile simultaneously enhancing the security effectiveness capabilities of the existing technology. \n•The National Disabled Law Students Association expressed concerns that individuals with disabilities were\nmore likely to be flagged as potentially suspicious by remote proctoring AI systems because of their disabili-\nty-specific access needs such as needing longer breaks or using screen readers or dictation software.45 \n•An algorithm designed to identify patients with high needs for healthcare systematically assigned lower\nscores (indicating that they were not as high need) to Black patients than to those of white patients, even\nwhen those patients had similar numbers of chronic conditions and other markers of health.46 In addition,\nhealthcare clinical algorithms that are used by physicians to guide clinical decisions may include\nsociodemographic variables that adjust or “correct” the algorithm’s output on the basis of a patient’s race or\nethnicity\n, which can lead to race-based health inequities.47\n25Algorithmic \nDiscrimination \nProtections ']",TSA has announced plans to implement a gender-neutral algorithm at airport checkpoints to address issues for transgender travelers. 
This algorithm aims to enhance security effectiveness capabilities while reducing the likelihood of flagging transgender travelers for extra screening based on gender identity perceptions.,multi_context,"[{'source': 'Blueprint-for-an-AI-Bill-of-Rights.pdf', 'page': 24}]",True
+ How do ballot curing laws help voters fix ballot issues despite flaws in signature matching systems?,"["" \n \n \n \n \n \n HUMAN ALTERNATIVES, \nCONSIDERATION, AND \nFALLBACK \nHOW THESE PRINCIPLES CAN MOVE INTO PRACTICE\nReal-life examples of how these principles can become reality, through laws, policies, and practical \ntechnical and sociotechnical approaches to protecting rights, opportunities, and access. \nHealthcare “navigators” help people find their way through online signup forms to choose \nand obtain healthcare. A Navigator is “an individual or organization that's trained and able to help \nconsumers, small businesses, and their employees as they look for health coverage options through the \nMarketplace (a government web site), including completing eligibility and enrollment forms.”106 For \nthe 2022 plan year, the Biden-Harris Administration increased funding so that grantee organizations could \n“train and certify more than 1,500 Navigators to help uninsured consumers find affordable and comprehensive \nhealth coverage. ”107\nThe customer service industry has successfully integrated automated services such as \nchat-bots and AI-driven call response systems with escalation to a human support team.\n108 Many businesses now use partially automated customer service platforms that help answer customer \nquestions and compile common problems for human agents to review. These integrated human-AI \nsystems allow companies to provide faster customer care while maintaining human agents to answer \ncalls or otherwise respond to complicated requests. Using both AI and human agents is viewed as key to \nsuccessful customer service.109\nBallot curing laws in at least 24 states require a fallback system that allows voters to \ncorrect their ballot and have it counted in the case that a voter signature matching algorithm incorrectly flags their ballot as invalid or there is another issue with their ballot, and review by an election official does not rectify the problem. Some federal courts have found that such cure procedures are constitutionally required.\n110 Ballot \ncuring processes vary among states, and include direct phone calls, emails, or mail contact by election \nofficials.111 Voters are asked to provide alternative information or a new signature to verify the validity of their \nballot. \n52""]","Ballot curing laws in at least 24 states provide a fallback system that allows voters to correct their ballot and have it counted in case a voter signature matching algorithm incorrectly flags their ballot as invalid or if there is another issue with their ballot that cannot be rectified by an election official review. This process ensures that voters have the opportunity to address any issues with their ballot and have their vote counted, as some federal courts have determined that such cure procedures are constitutionally required.",multi_context,"[{'source': 'Blueprint-for-an-AI-Bill-of-Rights.pdf', 'page': 51}]",True
+ How can feedback and red-teaming assess GAI equity and ensure content transparency?,"[' \n29 MS-1.1-006 Implement continuous monitoring of GAI system impacts to identify whether GAI \noutputs are equitable across various sub- populations. Seek active and direct \nfeedback from affected communities via structured feedback mechanisms or red -\nteaming to monitor and improve outputs. Harmful Bias and Homogenization \nMS-1.1-007 Evaluate the quality and integrity of data used in training and the provenance of \nAI-generated content , for example by e mploying techniques like chaos \nengineering and seeking stakeholder feedback. Information Integrity \nMS-1.1-008 Define use cases, contexts of use, capabilities, and negative impacts where \nstructured human feedback exercises, e.g., GAI red- teaming, would be most \nbeneficial for GAI risk measurement and management based on the context of \nuse. Harmful Bias and \nHomogenization ; CBRN \nInformation or Capabilities \nMS-1.1-0 09 Track and document risks or opportunities related to all GAI risks that cannot be \nmeasured quantitatively, including explanations as to why some risks cannot be \nmeasured (e.g., due to technological limitations, resource constraints, or trustworthy considerations). Include unmeasured risks in marginal risks. Information Integrity \nAI Actor Tasks: AI Development, Domain Experts, TEVV \n \nMEASURE 1.3: Internal experts who did not serve as front -line developers for the system and/or independent assessors are \ninvolved in regular assessments and updates. Domain experts, users, AI Actors external to the team that developed or deployed the \nAI system, and affected communities are consulted in support of assessments as necessary per organizational risk tolerance . \nAction ID Suggested Action GAI Risks \nMS-1.3-001 Define relevant groups of interest (e.g., demographic groups, subject matter \nexperts, experience with GAI technology) within the context of use as part of \nplans for gathering structured public feedback. Human -AI Configuration ; Harmful \nBias and Homogenization ; CBRN \nInformation or Capabilities \nMS-1.3-002 Engage in internal and external evaluations , GAI red -teaming, impact \nassessments, or other structured human feedback exercises in consultation \nwith representative AI Actors with expertise and familiarity in the context of \nuse, and/or who are representative of the populations associated with the context of use. Human -AI Configuration ; Harmful \nBias and Homogenization ; CBRN \nInformation or Capabilities \nMS-1.3-0 03 Verify those conducting structured human feedback exercises are not directly \ninvolved in system development tasks for the same GAI model. Human -AI Configuration ; Data \nPrivacy \nAI Actor Tasks: AI Deployment, AI Development, AI Impact Assessment, Affected Individuals and Communities, Domain Experts, \nEnd-Users, Operation and Monitoring, TEVV \n ']","Implement continuous monitoring of GAI system impacts to identify whether GAI outputs are equitable across various sub-populations. Seek active and direct feedback from affected communities via structured feedback mechanisms or red-teaming to monitor and improve outputs. Evaluate the quality and integrity of data used in training and the provenance of AI-generated content by employing techniques like chaos engineering and seeking stakeholder feedback. 
Define use cases, contexts of use, capabilities, and negative impacts where structured human feedback exercises, e.g., GAI red-teaming, would be most beneficial for GAI risk measurement and management based on the context of use. Track and document risks or opportunities related to all GAI risks that cannot be measured quantitatively, including explanations as to why some risks cannot be measured (e.g., due to technological limitations, resource constraints, or trustworthy considerations). Include unmeasured risks in marginal risks.",multi_context,"[{'source': 'AI_Risk_Management_Framework.pdf', 'page': 32}]",True
+ How can algorithmic discrimination be prevented through proactive measures and equity assessments?,"[' ALGORITHMIC DISCRIMINATION Protections\nYou should not face discrimination by algorithms \nand systems should be used and designed in an \nequitable way. Algorithmic discrimination occurs when \nautomated systems contribute to unjustified different treatment or \nimpacts disfavoring people based on their race, color, ethnicity, \nsex (including pregnancy, childbirth, and related medical \nconditions, gender identity, intersex status, and sexual \norientation), religion, age, national origin, disability, veteran status, \ngenetic infor-mation, or any other classification protected by law. \nDepending on the specific circumstances, such algorithmic \ndiscrimination may violate legal protections. Designers, developers, \nand deployers of automated systems should take proactive and \ncontinuous measures to protect individuals and communities \nfrom algorithmic discrimination and to use and design systems in \nan equitable way. This protection should include proactive equity \nassessments as part of the system design, use of representative data \nand protection against proxies for demographic features, ensuring \naccessibility for people with disabilities in design and development, \npre-deployment and ongoing disparity testing and mitigation, and \nclear organizational oversight. Independent evaluation and plain \nlanguage reporting in the form of an algorithmic impact assessment, \nincluding disparity testing results and mitigation information, \nshould be performed and made public whenever possible to confirm \nthese protections.\n23']","Algorithmic discrimination can be prevented through proactive measures and equity assessments by ensuring that automated systems are designed and used in an equitable manner. This includes conducting proactive equity assessments during system design, using representative data, avoiding proxies for demographic features, ensuring accessibility for individuals with disabilities, conducting pre-deployment and ongoing disparity testing, and maintaining clear organizational oversight. Independent evaluation and plain language reporting, such as algorithmic impact assessments that include testing results and mitigation information, should be performed and made public whenever possible to confirm these protections.",reasoning,"[{'source': 'Blueprint-for-an-AI-Bill-of-Rights.pdf', 'page': 22}]",True
+ How can system design ensure behavior transparency in high-risk settings while meeting expectations for automated systems?,"["" NOTICE & \nEXPLANATION \nWHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS\nThe expectations for automated systems are meant to serve as a blueprint for the development of additional \ntechnical standards and practices that are tailored for particular sectors and contexts. \nTailored to the level of risk. An assessment should be done to determine the level of risk of the auto -\nmated system. In settings where the consequences are high as determined by a risk assessment, or extensive \noversight is expected (e.g., in criminal justice or some public sector settings), explanatory mechanisms should be built into the system design so that the system’s full behavior can be explained in advance (i.e., only fully transparent models should be used), rather than as an after-the-decision interpretation. In other settings, the extent of explanation provided should be tailored to the risk level. \nValid. The explanation provided by a system should accurately reflect the factors and the influences that led \nto a particular decision, and should be meaningful for the particular customization based on purpose, target, and level of risk. While approximation and simplification may be necessary for the system to succeed based on the explanatory purpose and target of the explanation, or to account for the risk of fraud or other concerns related to revealing decision-making information, such simplifications should be done in a scientifically supportable way. Where appropriate based on the explanatory system, error ranges for the explanation should be calculated and included in the explanation, with the choice of presentation of such information balanced with usability and overall interface complexity concerns. \nDemonstrate protections for notice and explanation \nReporting. Summary reporting should document the determinations made based on the above consider -\nations, including: the responsible entities for accountability purposes; the goal and use cases for the system, identified users, and impacted populations; the assessment of notice clarity and timeliness; the assessment of the explanation's validity and accessibility; the assessment of the level of risk; and the account and assessment of how explanations are tailored, including to the purpose, the recipient of the explanation, and the level of risk. Individualized profile information should be made readily available to the greatest extent possible that includes explanations for any system impacts or inferences. Reporting should be provided in a clear plain language and machine-readable manner. \n44"", "" NOTICE & \nEXPLANATION \nWHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS\nThe expectations for automated systems are meant to serve as a blueprint for the development of additional \ntechnical standards and practices that are tailored for particular sectors and contexts. \nAn automated system should provide demonstrably clear, timely, understandable, and accessible notice of use, and \nexplanations as to how and why a decision was made or an action was taken by the system. These expectations are explained below. \nProvide clear, timely, understandable, and accessible notice of use and explanations \nGenerally accessible plain language documentation. The entity responsible for using the automated \nsystem should ensure that documentation describing the overall system (including any human components) is \npublic and easy to find. 
The documentation should describe, in plain language, how the system works and how \nany automated component is used to determine an action or decision. It should also include expectations about \nreporting described throughout this framework, such as the algorithmic impact assessments described as \npart of Algorithmic Discrimination Protections. \nAccount able. Notices should clearly identify the entity r esponsible for designing each component of the \nsystem and the entity using it. \nTimely and up-to-date. Users should receive notice of the use of automated systems in advance of using or \nwhile being impacted by the technolog y. An explanation should be available with the decision itself, or soon \nthereafte r. Notice should be kept up-to-date and people impacted by the system should be notified of use case \nor key functionality changes. \nBrief and clear. Notices and explanations should be assessed, such as by research on users’ experiences, \nincluding user testing, to ensure that the people using or impacted by the automated system are able to easily \nfind notices and explanations, read them quickl y, and understand and act on them. This includes ensuring that \nnotices and explanations are accessible to users with disabilities and are available in the language(s) and read-\ning level appropriate for the audience. Notices and explanations may need to be available in multiple forms, \n(e.g., on pape r, on a physical sign, or online), in order to meet these expectations and to be accessible to the \nAmerican public. \nProvide explanations as to how and why a decision was made or an action was taken by an \nautomated system \nTailored to the purpose. Explanations should be tailored to the specific purpose for which the user is \nexpected to use the explanation, and should clearly state that purpose. An informational explanation might differ from an explanation provided to allow for the possibility of recourse, an appeal, or one provided in the context of a dispute or contestation process. For the purposes of this framework, 'explanation' should be construed broadly. An explanation need not be a plain-language statement about causality but could consist of any mechanism that allows the recipient to build the necessary understanding and intuitions to achieve the stated purpose. Tailoring should be assessed (e.g., via user experience research). \nTailored to the target of the explanation. Explanations should be targeted to specific audiences and clearly state that audience. An explanation provided to the subject of a decision might differ from one provided to an advocate, or to a domain expert or decision maker. Tailoring should be assessed (e.g., via user experience research). \n43""]","In settings where the consequences are high as determined by a risk assessment, or extensive oversight is expected (e.g., in criminal justice or some public sector settings), explanatory mechanisms should be built into the system design so that the system’s full behavior can be explained in advance (i.e., only fully transparent models should be used), rather than as an after-the-decision interpretation. The extent of explanation provided should be tailored to the risk level.",multi_context,"[{'source': 'Blueprint-for-an-AI-Bill-of-Rights.pdf', 'page': 43}, {'source': 'Blueprint-for-an-AI-Bill-of-Rights.pdf', 'page': 42}]",True