Poetiq’s Meta System Automates Model Optimization, Drastically Improves Performance Across LLMs on LiveCodeBench Pro

Poetiq has recently published results showing its Meta System significantly enhancing the performance of various large language models (LLMs) on LiveCodeBench Pro (LCB Pro), a rigorous coding benchmark. The Meta System automatically builds and optimizes an inference harness for each model, achieving state-of-the-art scores without fine-tuning or access to internal model details.

For instance, GPT 5.5 High, equipped with Poetiq’s optimized harness, scored 93.9% on LCB Pro, up from its baseline of 89.6%. Gemini 3.1 Pro also saw a notable improvement, jumping from 78.6% to 90.9% and surpassing Google’s Gemini 3 Deep Think (88.8%), with the harness in this case specifically optimized for Gemini 3.1 Pro.

LiveCodeBench Pro is designed to test AI coding abilities in a manner resistant to common benchmark failures such as data contamination and overfitting. The benchmark draws problems from competitive programming competitions, ensuring solutions are validated against comprehensive testing frameworks. Correct output alone isn’t sufficient; models must also satisfy specific memory and runtime constraints. Continuous updates further distinguish LCB Pro from static benchmarks.
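The grading scheme described above can be illustrated with a minimal sketch. The `judge` function below is a hypothetical stand-in for LCB Pro's actual (unpublished) evaluation pipeline: it runs a candidate solution against input/output test cases and enforces a wall-clock limit, rejecting solutions that are correct but too slow.

```python
import subprocess
import sys

def judge(solution_code: str, test_cases, time_limit_s: float = 2.0) -> str:
    """Run a candidate Python solution against (stdin, expected_stdout) pairs.

    Illustrative only: a real judge would also enforce memory limits and
    sandbox the process. Returns a competitive-programming-style verdict.
    """
    for stdin_data, expected in test_cases:
        try:
            proc = subprocess.run(
                [sys.executable, "-c", solution_code],
                input=stdin_data,
                capture_output=True,
                text=True,
                timeout=time_limit_s,  # runtime constraint, not just correctness
            )
        except subprocess.TimeoutExpired:
            return "Time Limit Exceeded"
        if proc.stdout.strip() != expected.strip():
            return "Wrong Answer"
    return "Accepted"

# Toy problem: read an integer, print its square.
solution = "n = int(input()); print(n * n)"
verdict = judge(solution, [("3", "9"), ("5", "25")])
print(verdict)  # Accepted
```

A production judge would additionally cap memory (e.g. via `resource.setrlimit` on POSIX) and isolate the process, but the core idea is the same: passing the tests within the stated limits is the bar, not correct output alone.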

Poetiq’s Meta System was tested on a wide range of LLMs across three difficulty levels (Easy, Medium, and Hard), mirroring the three distinct task categories in their research: reasoning challenges (ARC-AGI), retrieval challenges (Humanity’s Last Exam, or HLE), and coding challenges. The coding category involves complex problem solving and high-quality procedural logic.

The Meta System’s key objectives were to prove that an intelligent harness can enhance model performance without fine-tuning, validate its recursive self-improvement capabilities, and ensure the resulting harness is model-agnostic. Poetiq claims their system constructs a custom, task-specific harness from scratch using only standard API access.
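Poetiq has not published the Meta System's internals, but the "build and optimize a harness using only standard API access" idea can be sketched as a simple search over harness configurations: score each candidate on a small dev set, keep the best, and propose variations around it. Everything here (`call_model`, the `temperature` knob, the hill-climbing strategy) is an assumption for illustration, not Poetiq's actual method.

```python
import random

def optimize_harness(call_model, dev_set, candidate_configs, rounds=3):
    """Hill-climbing sketch of harness optimization.

    `call_model(problem, config)` is assumed to wrap a standard completion
    API; no model weights or internals are touched. Each config is a dict of
    harness settings (here just a hypothetical `temperature`).
    """
    best_config, best_score = None, -1.0
    for _ in range(rounds):
        for config in candidate_configs:
            # Fraction of dev problems the model answers correctly
            # under this harness configuration.
            score = sum(
                call_model(problem, config) == answer
                for problem, answer in dev_set
            ) / len(dev_set)
            if score > best_score:
                best_config, best_score = config, score
        # Propose the next round's candidates by mutating the current best.
        candidate_configs = [
            dict(best_config, temperature=round(random.uniform(0.0, 1.0), 2))
            for _ in range(3)
        ]
    return best_config, best_score

# Stub model for demonstration: only "accurate" at low temperature.
def fake_call(problem, config):
    return problem.upper() if config["temperature"] < 0.5 else problem

dev_set = [("abc", "ABC"), ("xy", "XY")]
configs = [{"temperature": 0.2}, {"temperature": 0.9}]
best, acc = optimize_harness(fake_call, dev_set, configs)
print(best, acc)  # {'temperature': 0.2} 1.0
```

The recursive flavor comes from feeding each round's winner back in as the seed for the next round's candidates; a real system would presumably search over far richer harness components (prompt templates, tool use, retry and verification strategies) rather than a single sampling parameter.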

Here are some specific results:

  • Gemini 3.1 Pro: Improved from 78.6% to 90.9%

  • GPT 5.5 High: Improved from 89.6% to 93.9%

  • Gemini 3.0 Flash: Improved by 10 percentage points, going from 72.3% to 82.3%, overtaking larger and more expensive models like Claude Opus 4.7.

  • Kimi K2.6: Improved by nearly 30 percentage points, from 50.0% to 79.9%

  • Nemotron 3 Super 120B: Improved by 12.8%

The Hard category saw the most substantial improvements, with models like Gemini 3.1 Pro and GPT 5.5 High achieving scores well above their baselines.

Poetiq’s Meta System demonstrates a powerful approach to enhancing AI capabilities across various LLMs without requiring fine-tuning or access to internal model details. The results highlight the potential for automation in building efficient, task-specific harnesses that can significantly boost performance on complex coding tasks.

Source: https://www.marktechpost.com/2026/05/14/poetiqs-meta-system-automatically-builds-a-model-agnostic-harness-that-improved-every-llm-tested-on-livecodebench-pro-without-fine-tuning/
