Abstract
The approaches that guide Large Language Models (LLMs) to emulate human reasoning during response generation have emerged as an effective method for enabling them to solve complex problems in a step-by-step manner, thereby achieving superior performance. However, most existing approaches using few-shot prompts to generate responses heavily depend on the provided examples, limiting the utilization of the model’s inherent reasoning capabilities. Moreover, constructing task-specific few-shot prompts is often costly and may lead to inconsistencies across different tasks. In this work, we introduce Template-Oriented Reasoning (TORSO), which elicits the model to utilize internal reasoning abilities to generate proper responses across various tasks without the need for manually crafted few-shot examples. Our experimental results demonstrate that TORSO achieves strong performance on diverse LLM benchmarks with reasonable rationales.
Citation:
Kim, M. et al. (2025). TORSO: Template-Oriented Reasoning Towards General Tasks. arXiv preprint arXiv:2509.09448. https://arxiv.org/abs/2509.09448
Beyond Generic Prompts: An Adaptive Approach to Controllable and Transparent AI Reasoning
Introduction
Large Language Models (LLMs) have proven to be incredibly versatile, but consistently eliciting sophisticated, transparent, and context-aware reasoning remains a significant challenge. While techniques like Chain-of-Thought (CoT) prompting offer improvements, they often lack the specificity needed for complex, multi-faceted problems. The original TORSO research introduced a powerful, lightweight method to signal an LLM to “reason.” Building upon this foundation, I’ve developed SIRE: Structured Intent Reasoning Engine, an innovative pipeline designed to dynamically steer LLMs towards precise, intent-driven thought processes.
SIRE moves beyond a generic reasoning trigger to offer a “cognitive steering wheel,” enabling LLMs to intelligently select and apply the optimal thinking mode for any given query. This project is an exciting exploration into how we can make LLMs not just smarter, but more predictable, transparent, and truly adaptive in their reasoning.
The Foundation: The TORSO Concept
The original TORSO (Template-Oriented Reasoning Towards General Tasks) research paper presented a novel, minimal-intervention method. It proposed injecting a generic <reasoning> token at the beginning of an LLM’s response generation. This served as an explicit signal, prompting the model to produce a structured rationale before concluding with an <answer>. This simple yet effective technique demonstrated that LLMs possess latent reasoning abilities that can be activated and guided without extensive fine-tuning or cumbersome few-shot examples. TORSO highlighted the power of structural cues in fostering more coherent and plausible AI-generated thought processes.
Introducing SIRE: Orchestrating Adaptive Reasoning
While TORSO’s generic <reasoning> token is effective, real-world tasks often demand specific types of thinking. Strategizing, diagnosing a problem, or creatively brainstorming all involve distinct cognitive patterns. SIRE addresses this by transforming the generic reasoning signal into a dynamically selected, intent-specific reasoning template. This allows us to guide the LLM not just to “reason,” but to “reason causally,” or “reason step-by-step,” or “reason strategically,” tailoring its approach to the precise demands of the query.
The SIRE Pipeline operates in three interconnected, adaptive stages:
Stage 1: Intent/Query Classification (The Router)
The journey begins by understanding the user’s intent. Before any generation occurs, an initial, capable LLM (in my PoC, llama-4-scout-17b-16e-instruct) analyzes the raw user query. Its task is to classify the query into a predefined reasoningPattern (e.g., strategic_planning, causal_reasoning, stepwise_decomposition). This crucial step ensures the downstream pipeline knows what kind of thinking the problem requires. The output is a structured JSON, providing not just the classification but also a confidence score, adding a layer of transparency.
# Example: Stage 1 Schema for LLM Classification
stage1_schema = {
    "type": "object",
    "properties": {
        "reasoningPattern": {
            "type": "string",
            "description": "One of: 'strategic_planning', 'causal_reasoning', 'stepwise_decomposition', 'heuristic_problem_solving', 'creative_brainstorm', 'deductive_inference', 'abductive_inference', 'counterfactual_analysis', 'comparative_analysis', 'predictive_forecasting', 'ethical_consideration'"
        },
        "confidence": {"type": "number"},
        "explanation": {"type": "string"}
    },
    "required": ["reasoningPattern", "confidence"],
    "additionalProperties": False
}
# This schema guides the classification LLM to output a structured JSON,
# enabling robust parsing and routing decisions.
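For concreteness, here is a minimal sketch of how the Stage 1 router might request a schema-conforming classification, assuming the Cerebras Cloud SDK’s OpenAI-style chat-completions interface. The helper name, system prompt, and confidence-based fallback are illustrative assumptions, not the exact PoC code.

# Sketch: asking the classification LLM for JSON that conforms to stage1_schema (defined above).
import json
import os
from cerebras.cloud.sdk import Cerebras

client = Cerebras(api_key=os.environ["CEREBRAS_API_KEY"])

def classify_intent(query: str) -> dict:
    """Hypothetical helper returning {'reasoningPattern': ..., 'confidence': ..., 'explanation': ...}."""
    system_msg = (
        "Classify the user's query into exactly one reasoning pattern. "
        "Respond ONLY with JSON matching this schema:\n" + json.dumps(stage1_schema)
    )
    resp = client.chat.completions.create(
        model="llama-4-scout-17b-16e-instruct",
        messages=[
            {"role": "system", "content": system_msg},
            {"role": "user", "content": query},
        ],
        temperature=0.0,  # deterministic routing
    )
    return json.loads(resp.choices[0].message.content)

# Illustrative routing decision: fall back to a generic pattern on low confidence.
# result = classify_intent("Create a 6-week digital marketing campaign plan...")
# pattern = result["reasoningPattern"] if result["confidence"] >= 0.6 else "stepwise_decomposition"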
Stage 2: Prompt Rewriting & Meta-Instruction Augmentation (The Enhancer)
Once the reasoning intent is classified, the original query can be optimized for clarity. This involves minor edits for specificity, such as resolving ambiguous pronouns or adding key context, to ensure the main reasoning engine interprets the task correctly. More importantly, this stage dynamically prepends a task-specific meta-instruction to the prompt. These meta-instructions are comprehensive, pre-defined guides for each reasoning pattern, and this is where the magic of composite reasoning happens: a single primary pattern’s meta-instruction (e.g., strategic_planning) can subtly guide the LLM to employ other reasoning patterns (like comparative analysis for competitors, predictive forecasting for market trends, or ethical considerations for impact assessment) within its overall strategic thought process.
# Example: Snippet of Composite Meta-Instruction for Strategic Planning
REASONING_META_INSTRUCTIONS = {
    'strategic_planning': """To create this strategic plan, proceed as follows:
1. **Define Objectives & Goals:** Clearly state measurable outcomes and key success indicators.
2. **Analyze Environment & Audience (using Comparative, Causal, & Predictive Reasoning):**
   * Conduct a **comparative analysis** of direct and indirect competitors...
   * Identify **causal factors**...
   * Provide **predictive forecasts** for market growth...
... (full instruction is much longer, guiding multi-faceted thinking)
""",
    # ... other detailed meta-instructions for each reasoning pattern
}
# The chosen meta-instruction dynamically augments the user's prompt,
# effectively providing a sophisticated playbook for the LLM.
augmented_prompt = f"{meta_instruction}\n\nQuery: {rewritten_prompt}"
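Putting Stage 2 together, the sketch below shows how the rewrite and augmentation steps might be combined, reusing the client from the Stage 1 sketch. The prompts, the rewrite_query helper, and the fallback to stepwise_decomposition are illustrative assumptions rather than the exact PoC implementation.

def rewrite_query(user_query: str) -> str:
    """Hypothetical Stage 2 rewrite call: resolve ambiguity and add key context
    without changing the user's intent (uses the lighter classification model)."""
    resp = client.chat.completions.create(
        model="llama-4-scout-17b-16e-instruct",
        messages=[
            {"role": "system", "content": (
                "Rewrite the query for clarity and specificity. Resolve ambiguous "
                "pronouns, keep the original intent, and return only the rewritten query."
            )},
            {"role": "user", "content": user_query},
        ],
        temperature=0.0,
    )
    return resp.choices[0].message.content.strip()

def augment_prompt(user_query: str, reasoning_pattern: str) -> str:
    rewritten_prompt = rewrite_query(user_query)
    # Illustrative safeguard: fall back to a generic stepwise template if the
    # classified pattern has no dedicated meta-instruction.
    meta_instruction = REASONING_META_INSTRUCTIONS.get(
        reasoning_pattern,
        REASONING_META_INSTRUCTIONS['stepwise_decomposition'],
    )
    return f"{meta_instruction}\n\nQuery: {rewritten_prompt}"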
Stage 3: Template-Guided Decoding (The Reasoner)
Finally, the augmented prompt, now enriched with specific meta-instructions and the original query, is fed to the main reasoning LLM (qwen-3-235b-a22b-instruct-2507 in my PoC). A custom LogitsProcessor forces a specialized, invisible token (e.g., <strategic_planning>) corresponding to the chosen reasoning type as the very first token of the LLM’s output. Compared to simply spelling out the pattern name in the prompt, this provides a stronger, less ambiguous signal within the model’s token vocabulary, explicitly setting the desired cognitive mode. Moreover, adaptive decoding parameters (such as temperature, top_p, and max_new_tokens) are dynamically adjusted for each reasoning pattern, optimizing the output’s style, verbosity, and length. The LLM then generates its detailed rationale, framed by the specialized start and end reasoning tags, followed by the final answer.
# Example: Adaptive Decoding Parameters, including max_new_tokens
REASONING_DECODING_PARAMS = {
    'strategic_planning': {'temperature': 0.3, 'top_p': 0.85, 'max_new_tokens': 2500, 'repetition_penalty': 1.05},
    'causal_reasoning': {'temperature': 0.2, 'top_p': 0.8, 'max_new_tokens': 1000, 'repetition_penalty': 1.02},
    # ... and so on for all defined reasoning types
}
# These parameters are passed to the LLM generation call, dynamically
# optimizing output characteristics for the specific reasoning task.
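The forced-first-token mechanism itself can be sketched as a custom LogitsProcessor in the Hugging Face transformers style. This is an illustrative assumption about the mechanics (the PoC’s exact processor and serving setup may differ): it masks every logit except the next token of the required tag until the whole tag has been emitted, then lets decoding proceed under the adaptive parameters above.

import torch
from transformers import LogitsProcessor

class ForcePrefixLogitsProcessor(LogitsProcessor):
    """Forces generation to begin with a fixed token sequence, e.g. the
    token ids of a tag like '<strategic_planning>'."""

    def __init__(self, prefix_token_ids, prompt_length):
        self.prefix_token_ids = prefix_token_ids  # token ids of the reasoning tag
        self.prompt_length = prompt_length        # number of prompt tokens already present

    def __call__(self, input_ids, scores):
        generated = input_ids.shape[1] - self.prompt_length
        if generated < len(self.prefix_token_ids):
            forced_id = self.prefix_token_ids[generated]
            forced_scores = torch.full_like(scores, float("-inf"))
            forced_scores[:, forced_id] = 0.0  # only the forced token remains selectable
            return forced_scores
        return scores  # after the tag, decode freely under the adaptive params

In practice, the prefix ids would come from something like tokenizer.encode("<strategic_planning>", add_special_tokens=False), and the processor would be passed to model.generate(...) inside a LogitsProcessorList, together with the REASONING_DECODING_PARAMS entry for the chosen pattern.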
Why SIRE Matters: Potential Gains and Future Work
SIRE’s adaptive, intent-driven approach promises several significant advantages:
- Precision and Relevance: By explicitly routing to the right reasoning template, SIRE ensures that LLM outputs are precisely aligned with the query’s true intent, minimizing irrelevant or off-topic information.
- Enhanced Rationale Quality: The guided, often composite, thinking leads to deeply structured, coherent, and comprehensive rationales. The generated outputs aren’t just steps; they reflect a strategically organized and multi-faceted thought process.
- Greater Control & Explainability: This framework offers unprecedented control over how LLMs think, fostering more transparent, auditable, and ultimately more trustworthy AI-generated insights for complex business decisions.
- Scalability & Efficiency: SIRE minimizes the need for extensive per-task human prompt engineering, making LLMs more efficient and cost-effective for a diverse range of enterprise applications.
Benchmark (Work in Progress)
Formal benchmarks comparing SIRE against established baselines like Chain-of-Thought (CoT) and Tree-of-Thoughts (ToT) are currently in progress. However, the qualitative results from my Proof of Concept are exceptionally promising, demonstrating a clear ability to generate outputs that are both accurate and structured according to the desired reasoning pattern. Future work will also focus on expanding the library of reasoning patterns and exploring SIRE’s application in orchestrating multi-agent systems.
Sample Output Example (Digital Marketing Campaign Plan – Classified as strategic_planning):
When presented with a complex query like, “Create a 6-week digital marketing campaign plan for a new organic skincare brand launching in Nairobi…”, SIRE’s output showcased remarkable depth:
- Intent Classified: strategic_planning.
- Guided Reasoning: The LLM, leveraging the detailed strategic_planning meta-instruction, structured its response to include:
- Clear Objectives and Goals.
- A comprehensive Market Analysis, incorporating comparative insights on competitors, causal factors influencing the market, and predictive forecasts for growth.
- A stepwise decomposed Action Plan with a weekly timeline.
- Key Performance Indicators (KPIs) deductively linked to the initial goals.
- A Risk Assessment, employing causal and counterfactual analysis for mitigation.
- An explicit section on ethical considerations regarding the campaign’s impact.
This generated output was a multi-faceted, business-ready document, illustrating the powerful effect of combining intent classification with dynamic, composite reasoning guidance.
Tech Stack Breakdown
The development of the SIRE Proof of Concept utilizes a robust and flexible tech stack:
- Python: The core programming language orchestrating the entire pipeline.
- Cerebras Cloud SDK: Providing seamless and secure access to powerful LLMs (e.g., llama-4-scout-17b-16e-instruct for classification and rewriting; qwen-3-235b-a22b-instruct-2507 for the main reasoning engine).
- dotenv: For managing API keys and environment variables securely.
- json: Crucial for defining and validating structured data schemas for inter-stage communication and robust output parsing.
- Custom Python Logic: Including the implementation of specialized LogitsProcessor classes (for token injection) and the overall orchestration of the multi-stage reasoning pipeline.
Conclusion
SIRE represents a compelling step forward in making LLM reasoning more controllable, precise, and transparent. By enabling LLMs to adapt their cognitive approach to the specific demands of a task, we unlock new levels of intelligence and utility. This project is an ongoing journey, and I’m excited by the prospect of formal benchmarking and further refinements. Stay tuned for more updates as SIRE evolves!
