Revolutionary Data Explorer Outpaces Competitors with 30x Speed Boost in Multi-step Reasoning
March 13, 2026
The architecture separates foundational knowledge building from rapid inference through a three-phase workflow: a Learning phase that builds reusable tools, a fast Inference phase that relies on pre-built helper libraries, and an Offline Reflection phase that refines the system.
Two primary applications drive the approach: open-ended exploratory data analysis powered by a ReAct agent with Jupyter Notebook tools, and multi-step rule-based tabular data QA using a Tool Calling Agent.
The project achieves state-of-the-art performance on the Data Agent Benchmark for Multi-step Reasoning (DABStep), ranking first with about a 30x speedup over the Claude Code baseline.
Data Explorer is framed as a new paradigm for data-intensive research, enabling scalable, high-quality insights through LLM-powered agents and reusable toolkits, with NVIDIA Launchable resources highlighted for building similar agents.
During the Learning phase, the agent solves tasks to produce generalized functions and scripts, which are distilled into helper.py and few-shot examples for efficient reuse.
Key architectural components include a stateful Python interpreter, a retriever, and a file-structure detector, unified by a centralized helper.py library that consolidates core logic.
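A stateful interpreter, in the sense used above, keeps variables alive between tool calls so that later steps can build on earlier ones. The sketch below is a minimal illustration of that idea using only the standard library; the class name StatefulInterpreter and its run method are hypothetical and are not taken from the Data Explorer code base.

```python
import contextlib
import io


class StatefulInterpreter:
    """Runs code snippets in one shared namespace so state persists across calls."""

    def __init__(self):
        # Assumption for illustration: a single dict acts as the session's global scope.
        self.namespace = {}

    def run(self, code: str) -> str:
        """Execute a snippet and return whatever it printed to stdout."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self.namespace)
        return buffer.getvalue()


session = StatefulInterpreter()
# First call defines data; the second call can still see it.
session.run("rows = [{'fee': 2.5}, {'fee': 4.0}]")
output = session.run("print(sum(r['fee'] for r in rows))")
```

A production agent would presumably add sandboxing, timeouts, and error capture, none of which this sketch attempts.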
The Offline Reflection phase uses heavyweight models for reflection and group-consistency to evaluate code and reasoning, feeding insights back into the system prompt to accelerate future inferences.
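The summary does not define group-consistency. One plausible reading, assumed here purely for illustration, is that answers from several independent runs on the same task are compared and the majority answer is trusted only when enough runs agree. The function name group_consistency and the min_agreement threshold are hypothetical.

```python
from collections import Counter


def group_consistency(answers, min_agreement=0.5):
    """Return (majority_answer, agreement_ratio, is_consistent) for a group of answers."""
    if not answers:
        raise ValueError("answers must not be empty")
    # Normalize lightly so trivial formatting differences do not split the vote.
    normalized = [a.strip().lower() for a in answers]
    winner, votes = Counter(normalized).most_common(1)[0]
    ratio = votes / len(normalized)
    # Consistent only if strictly more than min_agreement of the runs agree.
    return winner, ratio, ratio > min_agreement


result = group_consistency(["42.5", "42.5 ", "41.0"])
```

Here two of the three runs agree after normalization, so the group is treated as consistent and "42.5" is returned as the majority answer.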
In the Inference phase, a lighter model (such as Haiku) operates with a pruned context window to maximize speed while leveraging pre-built tools.
NVIDIA’s KGMON (NeMo Agent Toolkit) Data Explorer is an autonomous data-analysis agent designed to handle structured tabular data through multi-step reasoning, tool use, and automatic code execution.
Results show substantial speed and efficiency gains: about 20 seconds per task with 1,870 characters of output, versus 10 minutes and 5,011 characters for a from-scratch approach. The system also performs strongly on hard tasks and ranks first on DABStep.
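The reported figures can be checked with simple arithmetic. The short calculation below uses only the numbers quoted above; since those are approximate ("about 20 seconds"), the results are approximate as well, and it is an assumption that the headline 30x figure is derived from exactly these two timings.

```python
# Figures quoted in the summary (approximate).
scratch_seconds = 10 * 60   # 10 minutes per task, from-scratch approach
helper_seconds = 20         # about 20 seconds per task with pre-built helpers
scratch_chars = 5011
helper_chars = 1870

speedup = scratch_seconds / helper_seconds
char_reduction = 1 - helper_chars / scratch_chars

print(f"speedup: {speedup:.0f}x")                  # speedup: 30x
print(f"output reduction: {char_reduction:.0%}")   # output reduction: 63%
```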
Summary based on 1 source
Source

Hugging Face • Mar 13, 2026
Build an Agent That Thinks Like a Data Scientist: How We Hit #1 on DABStep with Reusable Tool Generation