GPT-2 Hacker Password Generator

This model is a GPT-2 fine-tuned to generate random, hacker-style passwords in response to a prompt.

Fine-tuning results

Number of epochs: 5

Number of steps: 3,125 (together with the dataset size, this implies an effective batch size of 32: 20,000 examples × 5 epochs ÷ 3,125 steps)

Loss: 0.519600

Fine-tuning time: approximately 34:39 (minutes:seconds) on an NVIDIA GeForce RTX 4060 8 GB laptop GPU

Fine-tuned on 20,000 examples with a sequence length of 128 tokens.
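
The training script is not included in this card, but the numbers above pin down most of the setup. Below is a minimal sketch using the Hugging Face Trainer: the epoch count, sequence length, and batch size follow from the figures listed, while the dataset column name, mixed-precision setting, and remaining hyperparameters are assumptions.

from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2LMHeadModel,
    GPT2Tokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Dataset named in the card; the "text" column name is an assumption
dataset = load_dataset("CodeferSystem/GPT2-Hacker-password-generator-dataset")

def tokenize(batch):
    # 128-token sequences, as stated in the fine-tuning results
    return tokenizer(batch["text"], truncation=True, max_length=128, padding="max_length")

train_set = dataset["train"].map(tokenize, batched=True, remove_columns=dataset["train"].column_names)

args = TrainingArguments(
    output_dir="gpt2-hacker-passwords",
    num_train_epochs=5,              # from the card
    per_device_train_batch_size=32,  # 20,000 examples x 5 epochs / 3,125 steps
    fp16=True,                       # assumption: mixed precision on the RTX 4060
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_set,
    # Causal LM collator (mlm=False) derives the labels from the input IDs
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()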

Using the model

Use this code:

from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

model_name = "CodeferSystem/GPT2-Hacker-password-generator"

# Load the fine-tuned GPT-2 model and its tokenizer from the Hugging Face Hub
tokenizer = GPT2Tokenizer.from_pretrained(model_name)  # Standard GPT-2 tokenizer
model = GPT2LMHeadModel.from_pretrained(model_name)    # Fine-tuned GPT-2 weights

# Set the model to evaluation mode once, outside the generation function
model.eval()

# Function to generate an answer based on a given question
def generate_answer(question):
    # Build the "Question/Answer" prompt format the model expects
    prompt = f"Question: {question}\nAnswer:"

    # Tokenize the prompt; keeping the attention mask avoids a warning,
    # since the pad token is set to the EOS token below
    inputs = tokenizer(prompt, return_tensors="pt")

    # Generate the output without tracking gradients (saves memory and time)
    with torch.no_grad():
        output = model.generate(
            inputs["input_ids"],                  # Input token IDs
            attention_mask=inputs["attention_mask"],
            max_length=50,                        # Maximum total length (prompt + generated tokens)
            num_return_sequences=1,               # Return a single sequence
            no_repeat_ngram_size=2,               # Never repeat any 2-token sequence
            do_sample=True,                       # Sample instead of greedy decoding
            top_k=50,                             # Restrict sampling to the 50 most probable tokens
            top_p=0.95,                           # Nucleus sampling: smallest token set with cumulative probability >= 0.95
            temperature=2.0,                      # High temperature for very random, password-like output
            pad_token_id=tokenizer.eos_token_id   # Use the EOS token for padding
        )

    # Decode the generated token IDs back to a string, dropping special tokens
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

    # Keep only the part after "Answer:" as the model's generated answer
    answer = generated_text.split("Answer:")[-1].strip()

    return answer

# Example usage
question = "generate password."
print(generate_answer(question))  # Print the generated password
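
To produce several candidate passwords in one call, num_return_sequences can be raised. The helper below is a sketch based on the function above, not part of the original card; it assumes the same model and tokenizer objects are already loaded.

# Sketch: sample n continuations from one forward pass.
# do_sample must stay True so the returned sequences differ.
def generate_passwords(question, n=5):
    prompt = f"Question: {question}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=50,
            num_return_sequences=n,   # n sampled continuations of the same prompt
            no_repeat_ngram_size=2,
            do_sample=True,
            top_k=50,
            top_p=0.95,
            temperature=2.0,
            pad_token_id=tokenizer.eos_token_id,
        )
    return [tokenizer.decode(o, skip_special_tokens=True).split("Answer:")[-1].strip()
            for o in outputs]

for pw in generate_passwords("generate password.", n=5):
    print(pw)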

Example password generation with this model

With a prompt like "Generate a hacker password.", the output will look something like these five examples:

  • 0Qk=4CdPQQv0>n1K
  • o4K*mQq9>Zu
  • e5vx=KqE_j>kFj&*
  • xD2PZ5@kz_hFq|W=
  • h=rZ?^<Qp~7&z7XZ

Fine-tuning data

The dataset the model was fine-tuned on is publicly available: CodeferSystem/GPT2-Hacker-password-generator-dataset
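
The dataset can be loaded with the datasets library. The sketch below assumes a "train" split, since the card does not document the schema.

from datasets import load_dataset

# Load the public fine-tuning dataset from the Hugging Face Hub
dataset = load_dataset("CodeferSystem/GPT2-Hacker-password-generator-dataset")

# Inspect the schema; the "train" split name is an assumption
print(dataset)
print(dataset["train"][0])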
