from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate

chain_of_density_prompt_template = """
Research Paper: {paper}
You will generate increasingly concise, entity-dense summaries of the above research paper.
Repeat the following 2 steps 10 times.
Step 1. Identify 1-3 informative Entities ('; ' delimited) from the research paper that are missing from the previously generated summary. These entities should be key components such as research questions, methodologies, findings, theoretical contributions, or implications.
Step 2. Write a new, denser summary of identical length which covers every entity and detail from the previous summary plus the Missing Entities.
A Missing Entity is:
- Relevant: critical to understanding the paper’s contribution.
- Specific: descriptive yet concise (5 words or fewer).
- Novel: not included in the previous summary.
- Faithful: accurately represented in the research paper.
- Anywhere: can be found anywhere in the research paper.
Guidelines:
- The first summary should be long (4-5 sentences, ~100 words) yet focus on general information about the research paper, including its broad topic and objectives, without going into detail.
- Avoid using verbose language and fillers (e.g., 'This research paper discusses') to reach the word count.
- Strive for efficiency in word use: rewrite the previous summary to improve readability and make space for additional entities.
- Employ strategies such as fusion (combining entities), compression (shortening descriptions), and removal of uninformative phrases to make space for new entities.
- The summaries should evolve to be highly dense and concise yet remain self-contained, meaning they can be understood without reading the full paper.
- Missing entities should be integrated seamlessly into the new summary.
- Never omit entities from previous summaries. If space is a challenge, incorporate fewer new entities but maintain the same word count.
Remember, use the exact same number of words for each summary.
The JSON output should be a list (length 10) of dictionaries. Each dictionary must have two keys: 'missing_entities', listing the 1-3 entities added in each round; and 'denser_summary', presenting the new summary that integrates these entities without increasing the length.
"""
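
# Hypothetical helper (not part of LangChain): a minimal, stdlib-only sketch
# that checks a parsed reply against the contract stated in the prompt above.
# That contract is 10 rounds, each a dict with 'missing_entities' and
# 'denser_summary', 1-3 new entities per round, and every summary with the
# same number of words. It assumes 'missing_entities' arrives either as a
# list or as a '; '-delimited string, since the prompt mentions both forms.
def validate_density_rounds(rounds, expected_rounds=10):
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(rounds, list):
        return ["output is not a list"]
    problems = []
    if len(rounds) != expected_rounds:
        problems.append(f"expected {expected_rounds} rounds, got {len(rounds)}")
    word_counts = []
    for i, item in enumerate(rounds, start=1):
        if not isinstance(item, dict):
            problems.append(f"round {i}: not a dictionary")
            continue
        absent = {"missing_entities", "denser_summary"} - set(item)
        if absent:
            problems.append(f"round {i}: missing keys {sorted(absent)}")
            continue
        entities = item["missing_entities"]
        if isinstance(entities, str):
            # Split a '; '-delimited string into individual entities.
            entities = [e.strip() for e in entities.split(";") if e.strip()]
        if not isinstance(entities, list):
            problems.append(f"round {i}: 'missing_entities' is not a list or string")
        elif not 1 <= len(entities) <= 3:
            problems.append(f"round {i}: {len(entities)} new entities (expected 1-3)")
        summary = item["denser_summary"]
        if not isinstance(summary, str):
            problems.append(f"round {i}: 'denser_summary' is not a string")
        else:
            word_counts.append(len(summary.split()))
    # The prompt asks for the exact same word count in every round.
    if len(set(word_counts)) > 1:
        problems.append(f"summary word counts differ: {word_counts}")
    return problems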
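
# Parses the model's text reply as JSON (here, the list of 10 round dictionaries).
chain_of_density_output_parser = JsonOutputParser()

# "paper" is the only template variable; it receives the full paper text.
chain_of_density_prompt = PromptTemplate(
    template=chain_of_density_prompt_template,
    input_variables=["paper"],
)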
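

def chain_of_density_chain(model):
    """Build the Chain of Density pipeline for a given LangChain model.

    The returned runnable formats the prompt, calls ``model``, and parses the
    JSON reply, e.g. ``chain_of_density_chain(llm).invoke({"paper": text})``.
    """
    return chain_of_density_prompt | model | chain_of_density_output_parser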
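

# Hypothetical helper (not part of LangChain): a rough, stdlib-only way to see
# whether the summaries really become "increasingly entity-dense". It assumes
# every entity listed so far is still present in each later summary (as the
# prompt requires) and reports cumulative entities per word for each round.
def entity_density_by_round(rounds):
    """Return cumulative entity count divided by summary word count, per round."""
    densities = []
    total_entities = 0
    for item in rounds:
        entities = item["missing_entities"]
        if isinstance(entities, str):
            # Accept the '; '-delimited form as well as a list.
            entities = [e.strip() for e in entities.split(";") if e.strip()]
        total_entities += len(entities)
        words = len(item["denser_summary"].split())
        densities.append(total_entities / words if words else 0.0)
    return densities


# Example usage (assumes `llm` is any LangChain chat model or LLM and
# `paper_text` holds the paper as a string; both names are hypothetical):
#     rounds = chain_of_density_chain(llm).invoke({"paper": paper_text})
#     print(rounds[-1]["denser_summary"])  # the last, densest summary
#     print(entity_density_by_round(rounds))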