AI Agents: Part 1 | A Comprehensive Guide
- Pulkit Sahu

- Sep 22
- 4 min read
Updated: Oct 13
In this two-post series, we will explore AI agents and how to build them.
AI Agents: Post 1 of 2 | Check Post 2 here
Large language models (LLMs) are becoming increasingly adept at problem-solving, reasoning, and tackling complex tasks. However, their ability to make decisions independently and perform tasks without much human intervention is still a work in progress.
This is where AI agents come into the picture. Agents can operate independently, make decisions on their own, and complete tasks with minimal human involvement.

In this post, I’ll walk you through the basics of AI agents—what they are, why they matter, and how to start building them.
#1: What is an AI Agent?
At its core, an AI agent is a system designed to perform tasks independently. For any agent to function effectively, three key components come into play: the Brain, the Body, and the Actions. Let’s put this into context:
The Brain: The “brain” of an AI agent is usually a large language model. It understands tasks, breaks down larger problems into smaller ones, and handles planning and decision-making.
The Body: The “body” represents the system or environment in which the agent operates. This could be a software framework, an interface, or even a robotic form—essentially, the structure that enables the brain to interact with the real (or digital) world.
The Actions: Actions are how the agent executes tasks. They are carried out through tools, APIs, or direct interactions with the environment.
A Quick Example: The Shopping Agent
Let’s consider a simple example of a shopping agent.
For our working demo, the Brain of this agent will be a capable LLM, say GPT-4o-mini.
We’ll design a small interactive platform (body) that allows the agent to act. We’ll equip it with tools such as web browsing, order placement, and basic computer use.
The agent can also store a customer’s preferences and likes in memory to personalise shopping experiences.
To keep things safe, we’ll add guardrails, such as:
Spending limits (e.g., cannot place an order above ₹2000 without approval).
Safety filters (e.g., no harmful or restricted products).
Confirmation prompts before final purchases.
Logs and activity history for review.
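To make the guardrails idea concrete, here is a minimal sketch of a spending-limit and safety check in Python. The function name, keyword list, and the ₹2000 threshold are illustrative assumptions, not a real shopping API:

```python
# Minimal guardrail sketch for the shopping agent (illustrative only).

SPENDING_LIMIT = 2000  # rupees; orders above this need human approval
RESTRICTED_KEYWORDS = {"weapon", "tobacco"}  # toy safety filter

def check_order(item_name: str, price: float) -> str:
    """Return 'approve', 'needs_approval', or 'blocked' for a proposed order."""
    if any(word in item_name.lower() for word in RESTRICTED_KEYWORDS):
        return "blocked"            # safety filter: restricted products
    if price > SPENDING_LIMIT:
        return "needs_approval"     # spending limit: ask the human first
    return "approve"

print(check_order("autumn dress", 1500))   # approve
print(check_order("autumn coat", 3500))    # needs_approval
```

A real agent would run every proposed purchase through a check like this before acting, and log the decision for later review.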
Yes, I know this may sound a bit too technical right now—but don’t worry. As we move along, it will all start to make sense.
#2: The Brain of an AI Agent
A large language model (LLM) acts as the brain of an AI agent. It helps the agent:
Understand complex tasks.
Define goals and possible courses of action.
Break down a complex task into smaller, manageable steps.
Plan and make decisions.
All of this is possible through the reasoning and generation abilities of language models.
For example, given a text prompt like:
“I want to shop for the latest autumn–winter style dresses.”
An LLM can produce detailed instructions or actions in natural language that the AI agent can then use to act.
This is exactly what happens in our Shopping Agent example, where the LLM provides the brainpower to understand the request and guide the next steps.
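To see the planning role in miniature, here is a sketch in which the LLM call is stubbed out. In a real agent, `fake_llm` would be replaced by an actual model call (e.g., to GPT-4o-mini); the prompt wording and step format are assumptions for illustration:

```python
# Sketch of the "brain" turning a user request into ordered steps.
# fake_llm stands in for a real LLM API call.

def fake_llm(prompt: str) -> str:
    # A real implementation would send `prompt` to an LLM and return its reply.
    return ("1. Search for autumn-winter dresses\n"
            "2. Filter by customer preferences\n"
            "3. Add shortlisted items to cart\n"
            "4. Ask for confirmation before ordering")

def plan(user_request: str) -> list[str]:
    """Ask the (stubbed) LLM to break a request into numbered steps."""
    prompt = f"Break this task into numbered steps: {user_request}"
    response = fake_llm(prompt)
    # Strip the leading "1. ", "2. ", ... to get clean step descriptions.
    return [line.split(". ", 1)[1] for line in response.splitlines()]

steps = plan("I want to shop for the latest autumn-winter style dresses.")
print(steps[0])  # Search for autumn-winter dresses
```

The output of this planning step is what the rest of the agent, the body and the actions, then executes.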
#3: The Body of an AI Agent
While an LLM acts as an agent’s brain, you can think of the body as the interactive environment or platform that allows the agent to perceive and interact with its surroundings. All the instructions, planning, and decisions generated by the brain are executed through this environment.
The body of an AI agent may take different forms:
Software environment – such as a web app, automation framework, or chatbot interface.
Physical hardware – a robot made of steel, actuators, motors, and sensors that can perceive its surroundings and act in the real world.
Hybrid systems – where the agent exists in software but can also control hardware components.
In our Shopping Agent example, the body is a web platform. This interactive environment connects the LLM (the brain) with tools like web browsing and order placement, enabling the agent to take real, practical actions.
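One way to picture a software body is as a small environment that registers tools and executes whatever the brain requests. This is a hedged sketch; the class and tool names are made up for illustration, not a real framework:

```python
# A toy software "body": registers tools and runs requested actions.

class Environment:
    def __init__(self):
        self.tools = {}

    def register(self, name, func):
        """Make a tool available to the agent under a given name."""
        self.tools[name] = func

    def execute(self, name, *args):
        """Run a registered tool; refuse anything the body doesn't provide."""
        if name not in self.tools:
            raise ValueError(f"Unknown tool: {name}")
        return self.tools[name](*args)

env = Environment()
env.register("web_search", lambda query: f"results for '{query}'")
env.register("place_order", lambda item: f"ordered {item}")

print(env.execute("web_search", "autumn-winter dresses"))
```

The key point is that the brain never touches the outside world directly; it can only act through whatever the body exposes.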
#4: The Actions of an AI Agent
Once an agent has a brain and a body, the next step is actions. Actions are the visible outcomes of the agent’s reasoning and decision-making—the part you actually see happening.
Actions are typically carried out through:
Tools and APIs – An agent can use tools like a weather-checking API to get current weather data, or a payment API to place an order (as in our Shopping Agent).
Function Calling – Agents often need to access external functionalities beyond their own scope, such as retrieving customer account details or fetching weather data. This is done by calling predefined external functions.
Web Search – A web search tool helps an agent explore the internet for relevant information, sources, or products. In our case, the Shopping Agent can search for the latest autumn–winter dresses online.
Computer Use – Some agents are designed to directly control a device, performing tasks like opening apps, organising files, or automating workflows.
In the Shopping Agent example, actions might include:
Searching online for dresses.
Filtering results based on stored customer preferences.
Adding items to the cart.
Placing an order once confirmed.
These actions close the loop: the brain (LLM) understands, the body (platform) enables interaction, and the actions make everything real and useful.
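The loop above can be sketched end to end in a few lines. This is a deliberately simplified illustration: the tool names and the fixed sequence of steps are assumptions, whereas real frameworks use structured function-calling schemas and let the LLM choose which tool to invoke at each step:

```python
# Simplified agent loop: the brain's plan drives tool calls in the body.
# Tool names and the fixed step order are illustrative.

TOOLS = {
    "search": lambda q: [f"dress A ({q})", f"dress B ({q})"],
    "add_to_cart": lambda item: f"added {item}",
    "place_order": lambda cart: f"order placed for {len(cart)} item(s)",
}

def run_agent(query: str) -> str:
    results = TOOLS["search"](query)        # action: web search
    cart = list(results)                    # action: filter (trivially keep all here)
    for item in cart:
        TOOLS["add_to_cart"](item)          # action: add each item to the cart
    return TOOLS["place_order"](cart)       # action: place the confirmed order

print(run_agent("autumn-winter dresses"))   # order placed for 2 item(s)
```

In practice the "filter" step would consult stored preferences, and the final order would pass through the guardrails before executing.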
We just discussed the three important components of an AI agent: LLM (Brain), Environment (Body), and Actions. But in practice, agents can be made much more capable with additional layers:
Memory and Knowledge Base – Agents can be augmented with memory, such as a database of custom articles, posts, or domain-specific knowledge. For example, a shopping agent might store curated content about dresses, fashion trends, or customer preferences to serve requests more effectively.
Guardrails and Safety – Safety measures are essential to ensure agents produce reliable and responsible results. These can include filters, spending limits, confirmation prompts, and content safety checks.
Orchestration – Think of this as the rhythmic dance that ties everything together. Orchestration involves stitching all components (brain, body, actions, memory, and guardrails) into one cohesive system. It also includes deploying the agent, monitoring its real-time behaviour, and evaluating whether it successfully performs the intended tasks.
Together, these enhancements move an AI agent from being a simple demo into a robust, production-ready system.
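As one small example of the memory layer, a shopping agent's preference store can start out as little more than a keyed list (a production agent might use a database or a vector store instead; the class and method names here are illustrative):

```python
# Minimal memory sketch: store and recall customer preferences.

class Memory:
    def __init__(self):
        self.preferences = {}

    def remember(self, customer: str, preference: str):
        """Append a preference to a customer's record."""
        self.preferences.setdefault(customer, []).append(preference)

    def recall(self, customer: str) -> list:
        """Return everything stored for a customer (empty list if unknown)."""
        return self.preferences.get(customer, [])

memory = Memory()
memory.remember("alice", "prefers wool coats")
memory.remember("alice", "size M")
print(memory.recall("alice"))  # ['prefers wool coats', 'size M']
```

Before searching, the agent would call `recall` and fold the results into its plan, which is how stored likes turn into personalised shopping.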
In the second part of this series, we will build our AI agent (Health Coach Agent) together.
Update: The second post is now ready!