NILE: Internal Consistency Alignment in Large Language Models
Abstract
As a crucial step toward aligning LLMs with human intentions, Instruction Fine-Tuning (IFT) places high demands on dataset quality. However, existing IFT datasets often contain knowledge that is inconsistent with the internal knowledge LLMs acquire during pre-training, which can greatly reduce the efficacy of IFT. To address this issue, we introduce the NILE (iNternal consIstency aLignmEnt) framework, which optimizes IFT datasets to further unlock LLMs' capabilities. NILE operates by eliciting the target pre-trained LLM's internal knowledge corresponding to the instruction data and leveraging that knowledge to revise the answers in IFT datasets. Additionally, we propose a novel Internal Consistency Filtering (ICF) method that filters training samples to ensure high consistency with the LLM's internal knowledge. Our experiments demonstrate that NILE-aligned IFT datasets sharply boost LLM performance across multiple LLM ability evaluation datasets, achieving gains of up to 66.6% on Arena-Hard and 68.5% on Alpaca-Eval V2. Further analysis confirms that each component of the NILE framework contributes to these substantial performance improvements, and provides compelling evidence that dataset consistency with pre-trained internal knowledge is pivotal for maximizing LLM potential.
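The abstract describes a three-stage procedure: elicit the target LLM's internal knowledge for each instruction, revise the original answer with that knowledge, and filter samples by internal consistency (ICF). The following is a minimal Python sketch of that loop under stated assumptions; the prompts, the `generate` wrapper, the `consistency_score` scorer, and the threshold are illustrative placeholders, not the paper's actual implementation.

```python
# Minimal, hypothetical sketch of a NILE-style pipeline.
# `generate` wraps the target pre-trained LLM; `consistency_score` is an
# assumed scorer (e.g., an LLM-as-judge or similarity metric). Both are
# illustrative assumptions, not the paper's exact method.
from typing import Callable, Dict, List


def nile_pipeline(
    ift_samples: List[Dict[str, str]],                # each: {"instruction": ..., "answer": ...}
    generate: Callable[[str], str],                   # target pre-trained LLM
    consistency_score: Callable[[str, str], float],   # consistency of answer with internal knowledge
    threshold: float = 0.5,                           # assumed filtering cutoff
) -> List[Dict[str, str]]:
    aligned = []
    for sample in ift_samples:
        # 1) Elicit the target LLM's internal knowledge for this instruction.
        knowledge = generate(
            "List the knowledge you already have that is relevant to this instruction:\n"
            f"{sample['instruction']}"
        )
        # 2) Revise the original IFT answer so it agrees with that internal knowledge.
        revised = generate(
            "Revise the answer so it stays consistent with the given knowledge.\n"
            f"Knowledge: {knowledge}\n"
            f"Instruction: {sample['instruction']}\n"
            f"Original answer: {sample['answer']}"
        )
        # 3) Internal Consistency Filtering (ICF): keep only samples whose revised
        #    answer is sufficiently consistent with the elicited knowledge.
        if consistency_score(knowledge, revised) >= threshold:
            aligned.append({"instruction": sample["instruction"], "answer": revised})
    return aligned
```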
Community
Instruction fine-tuning has proven to be a crucial method for enhancing the capabilities of LLMs. But how does instruction fine-tuning differ from traditional fine-tuning in deep learning? And can this distinction help make instruction fine-tuning more effective? Some studies suggest that fine-tuning should not focus on teaching pre-trained LLMs new knowledge but rather on helping them understand tasks, which underscores the importance of maintaining consistency with the internal knowledge of LLMs during fine-tuning. This has emerged as a promising strategy for optimizing instruction fine-tuning (IFT) datasets to further unlock the potential of LLMs. Inspired by these findings, we propose a novel framework called NILE (iNternal consIstency aLignmEnt), which generates and selects better IFT data by considering the consistency between the internal parametric knowledge of LLMs and the world knowledge in IFT datasets. NILE works by eliciting the target pre-trained LLM's internal knowledge corresponding to the instruction data; this internal knowledge is then used to revise the answers in the IFT datasets. Our experiments demonstrate that NILE-aligned IFT datasets significantly enhance LLM performance across multiple LLM evaluation benchmarks, achieving up to a 66.6% improvement on Arena-Hard and 68.5% on Alpaca-Eval V2. Further analysis confirms that each component of the NILE framework contributes to these remarkable performance gains, providing compelling evidence that ensuring dataset consistency with the internal knowledge of pre-trained LLMs is pivotal for maximizing their potential.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Constraint Back-translation Improves Complex Instruction Following of Large Language Models (2024)
- MDCure: A Scalable Pipeline for Multi-Document Instruction-Following (2024)
- Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models (2024)
- Unveiling Context-Aware Criteria in Self-Assessing LLMs (2024)
- From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge (2024)
- Preference-Oriented Supervised Fine-Tuning: Favoring Target Model Over Aligned Large Language Models (2024)
- LIFBench: Evaluating the Instruction Following Performance and Stability of Large Language Models in Long-Context Scenarios (2024)