AI & ML interests

code LLMs, static analysis, software composition analysis, vulnerability remediation, application security

Recent Activity

patched-codes's activity

codelionĀ 
posted an update 4 days ago
view post
Post
2688
🧬 Hey everyone! Just released **OpenEvolve** - an open-source implementation of Google DeepMind's AlphaEvolve system.

It's an evolutionary coding agent that uses LLMs to discover and optimize algorithms. I successfully replicated DeepMind's results on circle packing (99.97% match!) and evolved a random search into a simulated annealing algorithm.

✨ Key features:
- Evolves entire codebases (not just single functions)
- Works with any OpenAI-compatible API
- LLM ensemble approach for better results
- Multi-objective optimization

šŸ‘‰ Check it out:
GitHub: https://github.com/codelion/openevolve
Blog post: https://huggingface.co/blog/codelion/openevolve

Would love to hear your thoughts or answer any questions about it!
codelionĀ 
posted an update 6 days ago
view post
Post
2326
Introducing Pivotal Token Search (PTS): A new technique for targeted LLM alignment

Excited to share Pivotal Token Search (PTS), a technique for identifying and optimizing critical decision points in LLM generations!

GitHub repository: https://github.com/codelion/pts

What is PTS?
PTS helps identify specific "pivotal tokens" that dramatically shift the probability of a successful generation. Unlike traditional DPO which treats all tokens equally, PTS focuses optimization on the tokens that actually matter for success.

Inspired by Microsoft's recent Phi-4 paper (which used this technique to achieve SOTA reasoning with only 14B parameters), PTS is especially effective for:
- Mathematical reasoning
- Coding tasks
- Multi-step problem solving
- Any domain where specific decision points strongly impact outcomes

What we're releasing today: codelion/pivotal-token-search-68241145d8b8502122f3ce4f

1. Open-source code:
- Complete implementation of the PTS algorithm
- Data generation pipelines
- Usage examples and documentation

2. Huggingface resources:
- Datasets collection: https://huggingface.co/datasets?other=pts
* Pre-generated preference pairs for various domains
* Ready to use in your DPO training pipelines

- Models collection: https://huggingface.co/models?other=pts
* Pre-trained models fine-tuned with PTS
* Specialized versions for different reasoning tasks

The algorithm is straightforward to implement and can significantly improve your model's reasoning capabilities. Check out the repository for details on getting started!

We welcome feedback, contributions, and collaborations. Let us know if you use PTS in your projects!

Add link to paper

1
#3 opened about 1 month ago by
nielsr
codelionĀ 
updated a Space 8 months ago
codelionĀ 
posted an update 9 months ago
view post
Post
2319
We recently worked with OpenAI to fine-tune gpt-4o and built the SOTA model for the patched-codes/static-analysis-eval benchmark. All the code and data patched-codes/synth-vuln-fixes on how we did it is available on their GitHub - https://github.com/openai/build-hours/tree/main/5-4o_fine_tuning.

Here are some tips based on our experience:

→ Establish baseline with "conditioning" / prompting

→ Task-specific datasets are ideal for PEFT; hard to beat gpt-4o on "broad" tasks

→ Add your best system prompt to each example

→ Ensure training data distribution is similar to inference data

→ Shorten instructions with concise prompts; may require more examples.

→ Define clear evaluation metrics (seriously, please eval!)

You can see more details on the benchmark and process here - https://www.patched.codes/blog/the-static-analysis-evaluation-benchmark-measuring-llm-performance-in-fixing-software-vulnerabilities
codelionĀ 
posted an update 11 months ago
view post
Post
2882
A new paper titled "STALL+: Boosting LLM-based Repository-level Code Completion with Static Analysis" shows the benefits of integrating static analysis with LLMs. (https://arxiv.org/abs/2406.10018)

Authors evaluate 4 key questions:

- How does each static analysis integration strategy perform in LLM-based repository-level code completion?
> They found that integrating static analysis in the prompting phase (especially with file-level dependencies) can achieve the substantially larger improvements than other phases.

- How do different combinations of integration strategies affect LLM-based repository-level code completion?
> Languages that are easier to analyze like Java show more improvements compared to dynamic languages like Python.

- How do static analysis integration strategies perform when compared or combined with RAG in LLM-based repository-level code completion?
> Static analysis and RAG are complementary and boost the overall accuracy.

- What are the online costs of different integration strategies in LLM-based repository-level code completion?
> Combining prompting-phase static analysis and RAG is the best option for cost-effectiveness.

In my @owasp App Sec keynote last year, I had described how one can do static analysis augmented generation (SaAG) to boost the accuracy of LLM based patches for vulnerability remediation. (you can see the talk here - https://www.youtube.com/watch?v=Cw4-ZnUNVLs)