---
library_name: llmpromptkit
title: LLMPromptKit
emoji: 🚀
tags:
- prompt-engineering
- llm
- nlp
- prompt-management
- huggingface
- version-control
- ab-testing
- evaluation
languages:
- python
license: mit
pipeline_tag: text-generation
datasets:
- none
---

# LLMPromptKit: LLM Prompt Management System

LLMPromptKit is a comprehensive library for managing, versioning, testing, and evaluating prompts for Large Language Models (LLMs). It provides a structured framework to help data scientists and developers create, optimize, and maintain high-quality prompts.

## Features

- **Prompt Management**: Create, update, and organize prompts with metadata and tags
- **Version Control**: Track prompt changes over time with full version history
- **A/B Testing**: Compare different prompt variations to find the most effective one
- **Evaluation Framework**: Measure prompt quality with customizable metrics
- **Advanced Templating**: Create dynamic prompts with variables, conditionals, and loops
- **Command-line Interface**: Easily integrate into your workflow
- **Hugging Face Integration**: Seamlessly test prompts with thousands of open-source models

## Hugging Face Integration

LLMPromptKit includes a powerful integration with Hugging Face models, allowing you to:

- Test prompts with thousands of open-source models
- Run evaluations with models like FLAN-T5, GPT-2, and others
- Compare prompt performance across different model architectures
- Access specialized models for tasks like translation, summarization, and question answering

```python
from llmpromptkit import PromptManager, PromptTesting
from llmpromptkit.integrations.huggingface import get_huggingface_callback

# Initialize components
prompt_manager = PromptManager()
testing = PromptTesting(prompt_manager)

# Get a Hugging Face callback
hf_callback = get_huggingface_callback(
    model_name="google/flan-t5-base",
    task="text2text-generation"
)

# Run tests with the model (call from async code, or wrap with asyncio.run)
test_results = await testing.run_test_cases(prompt_id="your_prompt_id", llm_callback=hf_callback)
```

## Documentation

For detailed documentation, see the [docs](./docs) directory:

- [Getting Started](./docs/getting_started.md)
- [API Reference](./docs/api_reference.md)
- [CLI Usage](./docs/cli_usage.md)
- [Advanced Features](./docs/advanced_features.md)
- [Integration Examples](./docs/integration_examples.md)

## Installation

```bash
pip install llmpromptkit
```

## Quick Start

The example below creates a prompt, commits two versions, runs a test case, and evaluates the prompt on multiple inputs:

```python
import asyncio

from llmpromptkit import PromptManager, VersionControl, PromptTesting, Evaluator

# Initialize components
prompt_manager = PromptManager()
version_control = VersionControl(prompt_manager)
testing = PromptTesting(prompt_manager)
evaluator = Evaluator(prompt_manager)

# Create a prompt
prompt = prompt_manager.create(
    content="Summarize the following text: {text}",
    name="Simple Summarization",
    description="A simple prompt for text summarization",
    tags=["summarization", "basic"]
)

# Create a new version
version_control.commit(
    prompt_id=prompt.id,
    commit_message="Initial version"
)

# Update the prompt
prompt_manager.update(
    prompt.id,
    content="Please provide a concise summary of the following text in 2-3 sentences: {text}"
)

# Commit the updated version
version_control.commit(
    prompt_id=prompt.id,
    commit_message="Improved prompt with length guidance"
)

# Create a test case
test_case = testing.create_test_case(
    prompt_id=prompt.id,
    input_vars={"text": "Lorem ipsum dolor sit amet..."},
    expected_output="This is a summary of the text."
)

# Define an LLM callback for testing
async def llm_callback(prompt, vars):
    # In a real scenario, this would call an actual LLM API
    return "This is a summary of the text."

# Run the test case
test_result = asyncio.run(testing.run_test_case(
    test_case_id=test_case.id,
    llm_callback=llm_callback
))

# Evaluate a prompt with multiple inputs
evaluation_result = asyncio.run(evaluator.evaluate_prompt(
    prompt_id=prompt.id,
    inputs=[{"text": "Sample text 1"}, {"text": "Sample text 2"}],
    llm_callback=llm_callback
))

print(f"Evaluation metrics: {evaluation_result['aggregated_metrics']}")
```
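The `llm_callback` above returns a canned string so the example runs without any model access. To score a prompt against real models, you can pass the Hugging Face callback from the integration section to the evaluator instead. Below is a minimal sketch of comparing the same prompt across two models; it assumes the integration's dependencies (e.g. `transformers`) are installed, reuses the `prompt` and `evaluator` objects from the Quick Start, and uses model names chosen purely as examples.

```python
import asyncio

from llmpromptkit.integrations.huggingface import get_huggingface_callback

# Example models to compare (any text2text-generation checkpoints work here)
models = ["google/flan-t5-base", "google/flan-t5-small"]
inputs = [{"text": "Sample text 1"}, {"text": "Sample text 2"}]

for model_name in models:
    # Build a callback backed by this model
    hf_callback = get_huggingface_callback(
        model_name=model_name,
        task="text2text-generation",
    )
    # Evaluate the same prompt with this model and report aggregated metrics
    result = asyncio.run(evaluator.evaluate_prompt(
        prompt_id=prompt.id,
        inputs=inputs,
        llm_callback=hf_callback,
    ))
    print(f"{model_name}: {result['aggregated_metrics']}")
```

Any async function with the same `(prompt, vars)` signature as `llm_callback` can be swapped in the same way for other providers.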
## Command-line Interface

LLMPromptKit comes with a powerful CLI for managing prompts:

```bash
# Create a prompt
llmpromptkit prompt create "Summarization" --content "Summarize: {text}" --tags "summarization,basic"

# List all prompts
llmpromptkit prompt list

# Create a new version
llmpromptkit version commit --message "Updated prompt"

# Run tests
llmpromptkit test run-all --llm openai
```

## Advanced Usage

### Advanced Templating

LLMPromptKit supports advanced templating with conditionals and loops:

```python
from llmpromptkit import PromptTemplate

template = PromptTemplate("""
{system_message}

{for example in examples}
Input: {example.input}
Output: {example.output}
{endfor}

Input: {input}
Output:
""")

rendered = template.render(
    system_message="You are a helpful assistant.",
    examples=[
        {"input": "Hello", "output": "Hi there!"},
        {"input": "How are you?", "output": "I'm doing well, thanks!"}
    ],
    input="What's the weather like?"
)
```

### Custom Evaluation Metrics

Create custom metrics to evaluate prompt performance:

```python
from llmpromptkit import EvaluationMetric, Evaluator

class CustomMetric(EvaluationMetric):
    def __init__(self):
        super().__init__("custom_metric", "My custom evaluation metric")

    def compute(self, generated_output, expected_output=None, **kwargs):
        # Custom logic to score the output; return a float between 0 and 1.
        # Example: a simple exact-match check against the expected output.
        if expected_output is None:
            return 0.0
        return 1.0 if generated_output.strip() == expected_output.strip() else 0.0

# Register the custom metric
evaluator = Evaluator(prompt_manager)
evaluator.register_metric(CustomMetric())
```

## Use Cases

- **Prompt Development**: Iteratively develop and refine prompts with version control
- **Prompt Optimization**: A/B test different prompt variations to find the most effective approach
- **Quality Assurance**: Ensure prompt quality with automated testing and evaluation
- **Team Collaboration**: Share and collaborate on prompts with a centralized management system
- **Production Deployment**: Maintain consistent prompt quality in production applications

## License

MIT License

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Author

Biswanath Roul - [GitHub](https://github.com/biswanathroul)