Update README.md

README.md (CHANGED)

@@ -7,85 +7,12 @@ base_model: Qwen/Qwen2-7B-Instruct
# Hammer-7b Function Calling Model

## Introduction

Existing function calling models commonly suffer from the following limitations:

1. Hallucination
   a) Function name hallucination: The model, rather than selecting from the provided function pool, tends to generate a new function based on its own world knowledge.

   b) Parameter name hallucination: When the user does not provide enough information to fulfill the request (i.e., necessary parameters are missing), the model is inclined to fill in those parameters from its own knowledge.

2. Overfitting

   a) Function name and parameter name: The model pays excessive attention to function and parameter names while neglecting other information such as descriptions, inputs, and outputs. This lack of generalization reduces the model's ability to handle diverse scenarios.

   b) Parameter filling: The model does not extract parameters based on the provided function definition; instead, it fills them in from knowledge learned during training. For instance, where "San Francisco" is expected, "San Francisco, CA" might be filled in because every "San Francisco" in the training data is followed by "CA".

   c) Default value filling: The model fills in parameter default values according to patterns in the training data rather than the provided function definition. For example, when "default = inch" is most common in the training data, the model is likely to fill in "inch" instead of "cm", even though the latter is the default given in the function definition.

   d) Ordering of the provided function and parameter lists: When the function list or parameter list has a consistent ordering during training, the model may learn unintended patterns, such as memorizing the ordering.

3. Instructions missing key information

   Occasionally, user instructions lack details essential for effective function execution. For instance, the command "Set an alarm to wake me up" lacks a time specification. Ideally, the model should either request the missing information or output only the function name, excluding the unspecified parameter. Existing methods either disregard such situations or output an "irrelevant" signal, indicating that the query cannot be fulfilled with the given tools.

4. Prompt design

   Inconsistency in instruction formatting between training and testing can result in a significant performance gap. For example, during training the default value may be provided in the parameter description, while during testing it is provided as a separate field in the JSON parameter definition.
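
To make this mismatch concrete, here is a hypothetical tool parameter written both ways (illustrative only; not taken from our data):

~~~python
# Hypothetical parameter definition illustrating the formatting mismatch.
# Training format: the default value is embedded in the description text.
train_param = {
    "unit": {
        "type": "string",
        "description": "Measurement unit. Default is 'cm'.",
    }
}
# Testing format: the default value appears as a separate JSON field.
test_param = {
    "unit": {
        "type": "string",
        "description": "Measurement unit.",
        "default": "cm",
    }
}
~~~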

In this work, we focus on introducing function calling abilities with an emphasis on addressing the aforementioned limitations. We summarize our techniques as follows:

1. Masking: We propose the function/parameter masking technique, a dynamic data augmentation method that shifts the model's focus from tool names to tool descriptions within tool definitions. The masking operations, sketched in code after this list, include:

   a) Function name masking: The function name is replaced with a randomly generated string so that the model attends to the function description rather than the function name.

   b) Parameter name masking: The parameter name is replaced with a randomly generated string so that the model attends to the parameter description rather than the parameter name.

   c) Default value masking: Default values are replaced with random strings to prevent overfitting to specific values.

2. Function shuffling: Functions and parameters are randomly reordered during training to deter the model from memorizing their order.

3. Prompt optimization: Since our model concentrates on function/parameter descriptions, we incorporate default value information into those descriptions to boost performance at inference time.
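
A minimal sketch of these three techniques follows. It assumes a simple JSON-style tool schema; the schema and helper names are illustrative, not our exact training code.

~~~python
# Illustrative sketch of masking, shuffling, and prompt optimization.
import copy
import random
import string


def random_token(k: int = 8) -> str:
    """Random identifier used to mask out a real name or value."""
    return "".join(random.choices(string.ascii_lowercase, k=k))


def mask_tool(tool: dict, p: float = 0.5) -> tuple[dict, dict]:
    """1) Masking: randomly mask function/parameter names and default values.

    Returns the masked tool plus a name-substitution map; the same map must
    be applied to the ground-truth call so the training label stays valid.
    """
    tool = copy.deepcopy(tool)
    renames: dict[str, str] = {}
    if random.random() < p:  # a) function name masking
        new_name = random_token()
        renames[tool["name"]] = new_name
        tool["name"] = new_name
    masked_params = {}
    for pname, spec in tool.get("parameters", {}).items():
        spec = dict(spec)
        if "default" in spec and random.random() < p:  # c) default value masking
            spec["default"] = random_token(4)
        if random.random() < p:  # b) parameter name masking
            new_pname = random_token()
            renames[pname] = new_pname
            pname = new_pname
        masked_params[pname] = spec
    tool["parameters"] = masked_params
    return tool, renames


def shuffle_tools(tools: list[dict]) -> list[dict]:
    """2) Function shuffling: randomize tool order each epoch so the model
    cannot memorize positions (parameter order can be shuffled likewise)."""
    tools = list(tools)
    random.shuffle(tools)
    return tools


def add_defaults_to_descriptions(tool: dict) -> dict:
    """3) Prompt optimization: surface default values inside the parameter
    descriptions that the model attends to."""
    tool = copy.deepcopy(tool)
    for spec in tool.get("parameters", {}).values():
        if "default" in spec:
            spec["description"] = (
                f"{spec.get('description', '')} Default: {spec['default']}."
            ).strip()
    return tool
~~~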

Addressing these multifaceted issues requires a refined approach to model training and optimization. To this end, we developed an advanced function calling model by fine-tuning *Qwen2-7B-Instruct*. The following sections outline the methods used during training to mitigate these issues:

1. **Data Extraction and Preparation**:

   We extracted 7.5k samples from *Salesforce/xlam-function-calling-60k* and removed the target tools from their candidate toolsets to generate irrelevance samples. This data was mixed with the 60k xLAM samples for training.

2. **Fine-Tuning**:

   Our fine-tuning primarily leveraged Low-Rank Adaptation (LoRA), with hyperparameters and strategies chosen for strong model performance.

   The masking and function shuffling techniques were applied during training. The training setup is as follows (a configuration sketch appears after this list):

   - **LoRA Rank**: 32
   - **Learning Rate**: 5e-5
   - **Warmup Steps**: 100
   - **LR Scheduler Type**: Cosine
   - **Batch Size**: 4
   - **Gradient Accumulation Steps**: 2
   - **Hardware**: 4x A100 (80G) GPUs

3. **Inference**:

   During inference, since our model focuses on function/parameter descriptions, we add default value information to the parameter descriptions to obtain better performance.
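
For reference, the fine-tuning setup in step 2 could be expressed with Hugging Face `peft` and `transformers` roughly as follows. This is a sketch, not our exact training script: `lora_alpha`, `target_modules`, and the data wiring are assumptions, and the batch size is read as per device.

~~~python
# Sketch of the LoRA fine-tuning setup; hyperparameters mirror the list
# above, while lora_alpha and target_modules are assumed values.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-7B-Instruct")

lora_config = LoraConfig(
    r=32,                                # LoRA rank
    lora_alpha=64,                       # assumed, not stated above
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="hammer-7b-lora",
    learning_rate=5e-5,
    warmup_steps=100,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=4,       # "Batch Size: 4", read as per device
    gradient_accumulation_steps=2,
    bf16=True,                           # typical on A100s; assumed
)
# A Trainer would then be constructed with the masked/shuffled dataset.
~~~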

## Supported Function Calling Types

The model handles a variety of function calling scenarios. The supported types are classified by the nature of their inputs and outputs:

### Input Types

1. **Single Function Input**
2. **Multiple Functions Input**

### Output Types

1. **Simple Function Calling**
2. **Parallel Function Calling**
3. **Irrelevance**
4. **Relevance**

By categorizing function calling types by inputs and outputs, the model provides robust support for a wide range of function calling scenarios, handling diverse tasks with both flexibility and precision.

## Performance

1. First, we evaluate our model on the Berkeley Function-Calling Leaderboard (BFCL); the performance is as follows:
<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}

@@ -527,13 +454,10 @@ The table below replicates and extends the format found in ["Granite-Function Ca

</tr>
</tbody></table>
529 |
|
530 |
-
##
|
|
|
531 |
|
532 |
-
|
533 |
-
|
534 |
-
|
535 |
-
|
536 |
-
## Example Usage

This is a simple example of how to use our model.

~~~python
import json

@@ -659,7 +583,6 @@ print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
~~~
|
663 |
## References
|
664 |
- 1.Yan F, Mao H, Ji C C-J, et al. Berkeley Function Calling Leaderboard.

# Hammer-7b Function Calling Model

## Introduction

Hammer-7b is a cutting-edge Large Language Model (LLM) crafted to boost the critical capability of AI agents: function calling. Unlike existing models that focus on refining training data, Hammer-7b optimizes performance primarily through advanced training techniques.

## Model Details

Hammer-7b is a fine-tuned model built upon [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct). It is trained on the [APIGen Function Calling Datasets](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) containing 60,000 samples, supplemented by 7,500 irrelevance detection samples we generated. Employing innovative training techniques such as function masking, function shuffling, and prompt optimization, Hammer-7b achieves exceptional performance across numerous benchmarks including the [Berkeley Function Calling Leaderboard](https://gorilla.cs.berkeley.edu/leaderboard.html), [API-Bank](https://arxiv.org/abs/2304.08244), [Tool-Alpaca](https://arxiv.org/abs/2306.05301), [Nexus Raven](https://github.com/nexusflowai/NexusRaven-V2), and [Seal-Tools](https://arxiv.org/abs/2405.08355).
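
Irrelevance detection samples can be derived from the base dataset by dropping the tools that the ground-truth answer calls. Below is a minimal sketch, assuming the xlam-function-calling-60k schema (query / tools / answers stored as JSON strings); the helper is illustrative, not our exact pipeline.

~~~python
# Sketch: turn a function calling sample into an irrelevance detection
# sample by removing the target tools from its candidate toolset.
import json
from typing import Optional


def make_irrelevance_sample(sample: dict) -> Optional[dict]:
    tools = json.loads(sample["tools"])      # candidate toolset
    answers = json.loads(sample["answers"])  # ground-truth calls
    target_names = {call["name"] for call in answers}
    # Drop every tool that the ground truth actually calls.
    remaining = [t for t in tools if t["name"] not in target_names]
    if len(remaining) == len(tools):
        return None  # no target tool found; skip this sample
    return {
        "query": sample["query"],
        "tools": json.dumps(remaining),
        "answers": json.dumps([]),  # label: none of the tools apply
    }
~~~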

## Evaluation

1. First, we evaluate our model on the Berkeley Function-Calling Leaderboard (BFCL); the performance is as follows:
<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}

</tr>
</tbody></table>

## Requirements

The code for Hammer-7b is included in the latest Hugging Face transformers; we advise you to install `transformers>=4.37.0`.

## How to Use

This is a simple example of how to use our model.
~~~python
import json
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
~~~

## References

1. Yan F, Mao H, Ji C C-J, et al. Berkeley Function Calling Leaderboard.