Guanyu419 committed · verified
Commit 0133e65 · 1 Parent(s): 9c9bfe3

Update README.md

Files changed (1)
  1. README.md +7 -84
README.md CHANGED
@@ -7,85 +7,12 @@ base_model: Qwen/Qwen2-7B-Instruct
  # Hammer-7b Function Calling Model

  ## Introduction
- Function calling enables LLMs to invoke specific functions, integrating external features, accessing real-world data, and extending beyond text generation. We present Hammer, a finetuned model based on Qwen2-7B-Instruct. Unlike previous works that emphasize data refinement (cite xlam, IBM…), our focus is on applying novel training techniques to address recognized issues in existing function-calling models. These issues are listed below:
-
- 1. Hallucination
-
- - a) Function name hallucination: Rather than selecting from the provided function pool, the model has a tendency to generate a new function based on its own world knowledge.
-
- - b) Parameter name hallucination: When the user fails to provide sufficient information to fulfill their request (lacking necessary parameters), the model is inclined to fill in the parameters from its own knowledge.
-
-
- 2. Overfitting
-
- - a) Function name and parameter name: The model pays excessive attention to the function name and parameter name while neglecting other information such as the description, input, and output. This leads to a lack of generalization and reduces the model's ability to handle diverse scenarios.
-
- - b) Parameter filling: The model does not extract parameters based on the provided function definition. Instead, it fills in the parameters based on knowledge learned during training. For instance, when expecting "San Francisco", "San Francisco, CA" might be filled in because in the training data every "San Francisco" is followed by "CA".
-
- - c) Default value filling: The model fills in parameter default values according to patterns in the training data rather than the provided function definition. For example, when "default = inch" is most common in the training data, the model is likely to fill in "inch" instead of "cm", even though the latter is the provided default value in the function definition.
-
- - d) Ordering of the provided function list and parameter list: When the provided function list or parameter list has a consistent ordering during training, the model may learn unintended patterns, such as memorizing the ordering.
-
-
- 3. Instructions missing key information
- Occasionally, user instructions lack essential details vital for effective function execution. For instance, the command "Set an alarm to wake me up" lacks a time specification. Ideally, in such instances, the model should either request additional information or simply output the function name while excluding the unspecified parameter (a brief illustration follows this list). Existing methods either disregard such situations or output an "irrelevant" signal, indicating the query is unfulfillable with the given tools.
-
-
- 4. Prompt design
- Inconsistency in instruction formatting between training and testing can result in a significant performance gap. For example, during the training phase the default value is provided in the parameter description, while during testing it is provided as a separate field in JSON format.
-
-
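To make the desired behaviour in issue 3 concrete, here is a small, hypothetical illustration. The tool schema and the output convention (returning the selected call with an empty argument object) are assumptions made for the example, not the model's actual prompt or output format.

~~~python
# Hypothetical tool definition: the query "Set an alarm to wake me up" omits the time.
alarm_tool = {
    "name": "set_alarm",
    "description": "Set an alarm for a given time of day.",
    "parameters": {
        "time": {"type": "string", "description": "Alarm time in HH:MM format."}
    },
}

# A reasonable target output selects the right function but does not fabricate
# a value for the missing 'time' parameter.
expected_output = [{"name": "set_alarm", "arguments": {}}]
~~~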
- In this work, we focus on introducing function calling abilities with an inherent emphasis on addressing the aforementioned limitations. We summarize our techniques as follows:
-
- 1. Masking: We propose a function/parameter masking technique, a dynamic data augmentation method. This approach enhances the model's focus on tool descriptions rather than the tool names within tool definitions (a minimal sketch of these operations follows this list). The masking operations include:
- - a) Function Name Masking: Replacing the function name with a randomly generated string so that the model pays more attention to the function description rather than the function name.
- - b) Parameter Name Masking: Replacing the parameter name with a randomly generated string so that the model pays more attention to the parameter description rather than the parameter name.
- - c) Default Value Masking: Replacing default values with random strings to prevent overfitting to specific values.
-
- 2. Function Shuffling: Randomly reordering functions and parameters during training deters the model from memorizing their sequence.
-
- 3. Prompt Optimization: Since our model concentrates on function/parameter descriptions, we incorporate default value information into those descriptions to boost performance at inference.
-
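To make these masking and shuffling operations concrete, here is a minimal sketch of how such dynamic augmentation could be applied to a single training sample. The tool schema (a dict with `name`, `description`, and a `parameters` mapping), the helper names (`mask_and_shuffle`, `fold_defaults_into_descriptions`), and the exact masking policy are illustrative assumptions, not the released training code.

~~~python
import random
import string

def _rand_token(length=8):
    # Random replacement string used for masked names and default values.
    return "".join(random.choices(string.ascii_lowercase, k=length))

def mask_and_shuffle(tools, target_calls):
    """Dynamically augment one sample: mask names/defaults and shuffle orderings.

    Because a fresh mapping is drawn every time, the model cannot benefit from
    memorizing any particular function name, parameter name, default value, or
    ordering, and must rely on the descriptions instead.
    """
    fn_map, param_maps, new_tools = {}, {}, []
    for tool in tools:
        masked_fn = _rand_token()
        fn_map[tool["name"]] = masked_fn
        params = list(tool["parameters"].items())
        random.shuffle(params)                      # parameter-order shuffling
        masked_params, pmap = {}, {}
        for pname, pdef in params:
            masked_p = _rand_token()                # parameter name masking
            pmap[pname] = masked_p
            pdef = dict(pdef)
            if "default" in pdef:
                pdef["default"] = _rand_token()     # default value masking
            masked_params[masked_p] = pdef
        param_maps[tool["name"]] = pmap
        new_tools.append({**tool, "name": masked_fn, "parameters": masked_params})
    random.shuffle(new_tools)                       # function-list shuffling

    # Keep the ground-truth calls consistent with the masked names.
    # (A full implementation would also propagate masked default values.)
    new_calls = [
        {
            "name": fn_map[call["name"]],
            "arguments": {param_maps[call["name"]][k]: v for k, v in call["arguments"].items()},
        }
        for call in target_calls
    ]
    return new_tools, new_calls

def fold_defaults_into_descriptions(tool):
    # Prompt optimization: surface each parameter's default value inside its description.
    folded = {}
    for pname, pdef in tool["parameters"].items():
        pdef = dict(pdef)
        if "default" in pdef:
            pdef["description"] = f'{pdef.get("description", "")} Default: {pdef.pop("default")}.'.strip()
        folded[pname] = pdef
    return {**tool, "parameters": folded}
~~~

Applied on the fly during training, augmentation of this kind forces the model to read the (unmasked) descriptions, which is exactly the signal the three techniques above aim to amplify.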
- Addressing these issues requires a refined approach to model training and optimization. To this end, we developed an advanced function calling model by fine-tuning *Qwen2-7B-Instruct*. The following steps outline the methods and processes implemented during the training phase to mitigate these issues:
-
- 1. **Data Extraction and Preparation**:
- We extracted 7.5k samples from *Salesforce/xlam-function-calling-60k* and removed the target tools from the candidate toolset to generate irrelevance data samples (a sketch of this step follows this list). This data was mixed with the 60k xLAM data samples for training.
-
- 2. **Fine Tuning**:
- Our fine-tuning process primarily leveraged the Low-Rank Adaptation (LoRA) technique, with the hyperparameters and strategies below chosen to ensure good model performance.
- The masking and function shuffling techniques were applied during training; the training setup is as follows (a configuration sketch also follows this list):
- - **LoRA Rank**: 32
- - **Learning Rate**: 5e-5
- - **Warmup Steps**: 100
- - **LR Scheduler Type**: Cosine
- - **Batch Size**: 4
- - **Gradient Accumulation Steps**: 2
- - **Hardware**: 4x A100 (80G) GPUs
-
-
- 3. **Inference**:
- During inference, since our model focuses more on function/parameter descriptions, we add default value information to the parameter descriptions to obtain better performance.
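Step 1 above can be sketched as follows: starting from an xlam-style record (a user query, a candidate tool list, and the ground-truth calls), the tools that the answer actually uses are removed, so the correct behaviour for the resulting sample is to recognise that no provided tool applies. The field names (`query`, `tools`, `answers`) and the empty-answer convention are assumptions for illustration.

~~~python
import json
import random

def make_irrelevance_sample(record, keep_max=4):
    # Parse the tool list and ground-truth calls (often stored as JSON strings).
    tools = json.loads(record["tools"]) if isinstance(record["tools"], str) else record["tools"]
    answers = json.loads(record["answers"]) if isinstance(record["answers"], str) else record["answers"]

    called = {call["name"] for call in answers}
    distractors = [t for t in tools if t["name"] not in called]
    if not distractors:
        return None                      # nothing left to offer; skip this record

    random.shuffle(distractors)
    return {
        "query": record["query"],
        "tools": distractors[:keep_max], # only tools that cannot fulfil the query
        "answers": [],                   # target: no function call is appropriate
    }
~~~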
-
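For step 2, the hyperparameters listed above map naturally onto a PEFT + Hugging Face `Trainer` setup. The sketch below is an assumed reconstruction of such a configuration; the LoRA alpha/dropout, target modules, epoch count, precision, and output directory are guesses, not the actual training script.

~~~python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter settings (rank as reported above; alpha, dropout, and target modules are assumptions).
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Optimizer and schedule as reported: per-device batch 4, gradient accumulation 2,
# cosine schedule with 100 warmup steps, learning rate 5e-5 (run on 4x A100 80G).
training_args = TrainingArguments(
    output_dir="hammer-7b-lora",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    learning_rate=5e-5,
    warmup_steps=100,
    lr_scheduler_type="cosine",
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
)
~~~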
- ## Supported Function Calling Types
-
- The model is capable of handling various function calling scenarios. Here, the supported types are classified based on the nature of inputs and outputs:
-
- ### Input Types
- - 1. **Single Function Input**
- - 2. **Multiple Functions Input**
-
- ### Output Types
-
- - 1. **Simple Function Calling**
- - 2. **Parallel Function Calling**
- - 3. **Irrelevance**
- - 4. **Relevance**
-
- By categorizing function calling types based on inputs and outputs, our model provides robust support for a wide range of function calling scenarios, ensuring both flexibility and precision in handling diverse tasks.
-
-
- ## Performance
-
+ Hammer-7b is a cutting-edge Large Language Model (LLM) crafted to boost a critical capability of AI agents: function calling. Unlike existing models that focus on refining training data, Hammer-7b optimizes performance primarily through advanced training techniques.
+
+ ## Model Details
+ Hammer-7b is a finetuned model built upon [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct). It is trained on the [APIGen Function Calling Datasets](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) containing 60,000 samples, supplemented by 7,500 irrelevance detection samples that we generated. Employing innovative training techniques such as function masking, function shuffling, and prompt optimization, Hammer-7b achieves strong performance across numerous benchmarks, including the [Berkeley Function Calling Leaderboard](https://gorilla.cs.berkeley.edu/leaderboard.html), [API-Bank](https://arxiv.org/abs/2304.08244), [Tool-Alpaca](https://arxiv.org/abs/2306.05301), [Nexus Raven](https://github.com/nexusflowai/NexusRaven-V2), and [Seal-Tools](https://arxiv.org/abs/2405.08355).
+
+ ## Evaluation
  1. First, we evaluate our model on the Berkeley Function-Calling Leaderboard (BFCL), and the performance is as follows:
  <style type="text/css">
  .tg {border-collapse:collapse;border-spacing:0;}
@@ -527,13 +454,10 @@ The table below replicates and extends the format found in ["Granite-Function Ca
  </tr>
  </tbody></table>

- ## Upcoming Developments
-
- We are actively working on preparing smaller models derived from this architecture, which will be open-sourced soon.
-
-
-
- ## Example Usage
+ ## Requirements
+ The code of Hammer-7b is supported in the latest Hugging Face Transformers, and we advise you to install `transformers>=4.37.0`.
+
+ ## How to Use
  This is a simple example of how to use our model.
  ~~~python
  import json
@@ -659,7 +583,6 @@ print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))



- ---
  ## References
  - 1. Yan F, Mao H, Ji C C-J, et al. Berkeley Function Calling Leaderboard.
