---
license: mit
base_model: gpt2-medium
tags:
- generated_from_trainer
model-index:
- name: gpt2-medium-finetuned-contract-gen
  results: []
pipeline_tag: text-generation
---


# gpt2-medium-finetuned-contract-gen

## Overview
`gpt2-medium-finetuned-contract-gen` is a model specialized in generating Solidity contract code. It is fine-tuned from the [gpt2-medium](https://huggingface.co/gpt2-medium) checkpoint hosted on Hugging Face, using a set of Solidity contracts and patterns, and is intended to assist in drafting or suggesting contract structures.

## Model Description
This model is designed specifically for generating Solidity contracts. As a derivative of `gpt2-medium`, it keeps the parent model's architecture and may retain some of its general text capabilities, while fine-tuning shifts its output toward Solidity-centric text.

### Performance
The model reached a validation loss of `0.3127` on the evaluation set (the last logged checkpoint, at step 11000 and epoch 2.28).
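
To make the loss figure easier to interpret, it can be converted to perplexity. The snippet below is a minimal sketch that assumes the reported value is the mean per-token cross-entropy in nats, which is what the Trainer normally logs for causal language models; the card itself does not state this.

```python
import math

# Final validation loss from this card. Assumption: mean per-token
# cross-entropy measured in nats.
eval_loss = 0.3127

# Perplexity is the exponential of the cross-entropy loss.
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.3f}")
```

Under that assumption the perplexity is about 1.367 over the model's own tokenization of the evaluation set.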

## Intended Uses & Limitations

### Intended Uses:
1. Assist developers by auto-generating contract code snippets based on prompts.
2. Help in understanding and drafting complex contract structures.

### Limitations:
1. The generated code must be reviewed for security and functional correctness.
2. The clarity of the generated code largely depends on the specificity of the prompt.
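
As a cheap first filter before manual review, generated output can be checked for balanced braces, since sampling up to a fixed length may cut a contract off mid-body. The helper below is a hypothetical sketch, not part of this model or of any Solidity tooling. It ignores braces inside strings and comments, and it is no substitute for compiling and auditing the code.

```python
def braces_balanced(source: str) -> bool:
    """Return True if every '{' is closed by a later '}' and none closes early.

    Hypothetical helper: a plain character scan that does not understand
    Solidity strings or comments.
    """
    depth = 0
    for ch in source:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                # A closing brace appeared before its opening brace.
                return False
    return depth == 0


complete = "contract MyToken { function f() public {} }"
truncated = "contract MyToken { function f() public {"
print(braces_balanced(complete), braces_balanced(truncated))
```

A sample that fails this check was most likely truncated and can be regenerated or discarded before review.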

## Training Details

### Dataset
The model was fine-tuned on an undisclosed dataset consisting of a range of Solidity contracts.

### Training Hyperparameters:
- Learning Rate: `5e-05`
- Train Batch Size: `4`
- Evaluation Batch Size: `4`
- Seed: `42`
- Optimizer: Adam (`betas=(0.9,0.999)`, `epsilon=1e-08`)
- Learning Rate Scheduler: Cosine with restarts
- Warmup Steps: `241`
- Epochs: `4`
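
To illustrate how the learning rate, warmup steps, and scheduler interact, the sketch below computes the learning rate at a given step for a linear warmup followed by cosine decay with hard restarts. It is written in plain Python and only approximates what a cosine-with-restarts scheduler does. `TOTAL_STEPS` and `NUM_CYCLES` are assumptions: the card does not state the total step count or the number of restart cycles, and the value used here is a rough guess from the logged steps per epoch.

```python
import math

BASE_LR = 5e-05       # learning rate from the card
WARMUP_STEPS = 241    # warmup steps from the card
TOTAL_STEPS = 19_300  # hypothetical: not stated in the card
NUM_CYCLES = 1        # assumption: a single cosine cycle


def lr_at(step: int, total_steps: int = TOTAL_STEPS, num_cycles: int = NUM_CYCLES) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay with hard restarts."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR during warmup.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    if progress >= 1.0:
        return 0.0
    # Each cycle restarts the cosine from its peak value.
    cosine = 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0)))
    return BASE_LR * max(0.0, cosine)


for step in (0, 120, 241, 10_000, TOTAL_STEPS):
    print(step, lr_at(step))
```

With a single cycle this reduces to an ordinary cosine decay; raising `num_cycles` makes the rate jump back to its peak at the start of each cycle.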

### Training Results:


| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 0.4744        | 0.21  | 1000  | 0.4736          |
| 0.467         | 0.41  | 2000  | 0.4146          |
| 0.4089        | 0.62  | 3000  | 0.3852          |
| 0.4018        | 0.83  | 4000  | 0.3688          |
| 0.3475        | 1.04  | 5000  | 0.3523          |
| 0.2751        | 1.24  | 6000  | 0.3434          |
| 0.2966        | 1.45  | 7000  | 0.3334          |
| 0.292         | 1.66  | 8000  | 0.3230          |
| 0.2899        | 1.87  | 9000  | 0.3200          |
| 0.2508        | 2.07  | 10000 | 0.3164          |
| 0.28          | 2.28  | 11000 | 0.3127          |


### Dependencies:
- Transformers: `4.31.0`
- PyTorch: `2.0.1+cu118`
- Datasets: `2.14.2`
- Tokenizers: `0.13.3`

## How to Use
If you wish to use this model to generate Solidity contract code, follow the steps below:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
model_id = "ckandemir/gpt2-medium-finetuned-contract-gen"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize the code prompt (returns input_ids and attention_mask)
input_text = "contract MyToken"
inputs = tokenizer(input_text, return_tensors="pt")

# Sample one completion of at most 400 tokens, prompt included
sample_output = model.generate(
    **inputs,
    do_sample=True,
    max_length=400,
    num_return_sequences=1,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token
)

# Decode and print the generated text
generated_text = tokenizer.decode(sample_output[0], skip_special_tokens=True)
print(generated_text)
```