{"id":1027,"date":"2026-06-12T10:37:17","date_gmt":"2026-06-12T09:37:17","guid":{"rendered":"https:\/\/metrics.blogg.gu.se\/?p=1027"},"modified":"2026-06-05T10:47:53","modified_gmt":"2026-06-05T09:47:53","slug":"my-prompt-is-better-than-your-prompt-how-to-optimize-your-prompts-in-the-age-of-agentic-ai","status":"publish","type":"post","link":"https:\/\/metrics.blogg.gu.se\/?p=1027","title":{"rendered":"My prompt is better than your prompt &#8211; how to optimize your prompts in the age of agentic AI"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/metrics.blogg.gu.se\/files\/2026\/06\/prompts-scaled.jpg\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"559\" src=\"https:\/\/metrics.blogg.gu.se\/files\/2026\/06\/prompts-1024x559.jpg\" alt=\"\" class=\"wp-image-1028\" srcset=\"https:\/\/metrics.blogg.gu.se\/files\/2026\/06\/prompts-1024x559.jpg 1024w, https:\/\/metrics.blogg.gu.se\/files\/2026\/06\/prompts-300x164.jpg 300w, https:\/\/metrics.blogg.gu.se\/files\/2026\/06\/prompts-768x419.jpg 768w, https:\/\/metrics.blogg.gu.se\/files\/2026\/06\/prompts-1536x838.jpg 1536w, https:\/\/metrics.blogg.gu.se\/files\/2026\/06\/prompts-2048x1117.jpg 2048w, https:\/\/metrics.blogg.gu.se\/files\/2026\/06\/prompts-1200x655.jpg 1200w, https:\/\/metrics.blogg.gu.se\/files\/2026\/06\/prompts-1320x720.jpg 1320w\" sizes=\"(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><\/a><\/figure>\n\n\n\n<p>Image generated by Gemini based on the content of this post<\/p>\n\n\n\n<p><a href=\"https:\/\/arxiv.org\/pdf\/2605.19102\">https:\/\/arxiv.org\/pdf\/2605.19102<\/a><\/p>\n\n\n\n<p class=\"has-drop-cap\">Getting Large Language Models (LLMs) to write functional code often feels like casting spells; a slight misphrasing in your prompt can result in a buggy output. This is even more important now that we have agents which work for days on our tasks. 
<\/p>\n\n\n\n<p>The core issue is that while LLMs are powerful, their code generation performance is highly sensitive to prompt formulation. Manual prompt engineering is tedious, and existing automated techniques often treat prompt modifications, such as lexical edits or semantic rewriting, in isolation. They also typically rely on binary (pass\/fail) signals, ignoring valuable information about partial correctness.<\/p>\n\n\n\n<p>When I was at VECS, I got to meet the Swedish champion in prompting. He told me that the best technique is to use LLMs to create prompts. This paper embraces that idea and goes even further &#8211; it builds a full reinforcement learning (RL) framework for optimizing prompts.<\/p>\n\n\n\n<p>In this paper, the RL agent is guided by <em>shaped rewards<\/em> derived from unit-test feedback. Instead of rewarding only full passes, the system provides denser learning signals by rewarding the <em>proportion<\/em> of test cases passed. This enables the agent to discover sequences of prompt transformations that progressively improve the functional correctness of the generated code.<\/p>\n\n\n\n<p>The framework was evaluated on three widely used benchmarks (MBPP+, HumanEval+, APPS) using three code generators: CodeT5+, CodeLLaMA, and DeepSeek-Coder. On the MBPP+ test set (500 tasks), the PPO agent achieved strict Pass@1 scores of:<\/p>\n\n\n\n<ul>\n<li><strong>57.58%<\/strong> for CodeT5+<\/li>\n\n\n\n<li><strong>64.80%<\/strong> for CodeLLaMA<\/li>\n\n\n\n<li><strong>85.50%<\/strong> for DeepSeek-Coder<\/li>\n<\/ul>\n\n\n\n<p>These results significantly outperformed direct generation and existing iterative strategies like EPIC and Reflexion. 
Furthermore, comparison against a &#8220;Random-Hybrid&#8221; baseline confirmed that the gains aren&#8217;t just from having the transformation tools, but from the agent <em>learning how to intelligently schedule them<\/em> based on feedback.<\/p>\n\n\n\n<p>The key takeaway is clear: feedback-driven, multi-step RL optimization can move code generation beyond manual prompt engineering, providing an adaptive, automated path to functionally correct code.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Image generated by Gemini based on the content of this post https:\/\/arxiv.org\/pdf\/2605.19102 Getting Large Language Models (LLMs) to write functional code often feels like casting spells; a slight misphrasing in your prompt can result in a buggy output. This is even more important now that we have agents which work for days on our tasks. &hellip; <a href=\"https:\/\/metrics.blogg.gu.se\/?p=1027\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;My prompt is better than your prompt &#8211; how to optimize your prompts in the age of agentic 
AI&#8221;<\/span><\/a><\/p>\n","protected":false},"author":68,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"_links":{"self":[{"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/posts\/1027"}],"collection":[{"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/users\/68"}],"replies":[{"embeddable":true,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1027"}],"version-history":[{"count":2,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/posts\/1027\/revisions"}],"predecessor-version":[{"id":1033,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/posts\/1027\/revisions\/1033"}],"wp:attachment":[{"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1027"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1027"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1027"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}