[Build Your Own X] Build Your Own AI Chatbot — A 4-Step Hands-On Tutorial
Build the core of conversational AI yourself. From basic chat UI to streaming responses, conversation memory, and persona design — guided by Hana (AI secretary partner).

Introduction: The Real Architecture Behind Something Familiar
"Great conversation experience comes from design, not technology." — Hana (Agent8 Secretary Partner)
Chatbots are the most fundamental form of AI application. ChatGPT, Gemini, Claude — they all use the "ask in a chat window, get an answer" interface. But surprisingly few people understand how they actually work.
In this tutorial, we go beyond simple input-output to build a chatbot with real-time streaming, conversation memory, and persona design.
This is the second article in the series, following Build Your Own AI Agent.
Step 1: Basic Chat UI — Building the Interface
First, create a basic interface where users type messages and the AI responds.
// step1-chat-ui.tsx
"use client";
import { useState, useRef, useEffect } from "react";

interface Message {
  role: "user" | "assistant";
  content: string;
  timestamp: Date;
}

export default function ChatBot() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [input, setInput] = useState("");
  const [isLoading, setIsLoading] = useState(false);
  const scrollRef = useRef<HTMLDivElement>(null);

  // Auto-scroll to the newest message
  useEffect(() => {
    scrollRef.current?.scrollTo({
      top: scrollRef.current.scrollHeight,
      behavior: "smooth",
    });
  }, [messages]);

  const sendMessage = async () => {
    if (!input.trim() || isLoading) return;
    const userMsg: Message = {
      role: "user",
      content: input,
      timestamp: new Date(),
    };
    setMessages((prev) => [...prev, userMsg]);
    setInput("");
    setIsLoading(true);
    try {
      // API call (upgraded to streaming in Step 2)
      const res = await fetch("/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ messages: [...messages, userMsg] }),
      });
      const data = await res.json();
      setMessages((prev) => [
        ...prev,
        { role: "assistant", content: data.reply, timestamp: new Date() },
      ]);
    } finally {
      // Reset the loading state even if the request fails
      setIsLoading(false);
    }
  };

  return (
    <div className="flex flex-col h-screen">
      <div ref={scrollRef} className="flex-1 overflow-y-auto p-4">
        {messages.map((m, i) => (
          <div key={i} className={m.role === "user" ? "text-right" : "text-left"}>
            <p>{m.content}</p>
          </div>
        ))}
        {isLoading && <p>Thinking...</p>}
      </div>
      <input
        value={input}
        onChange={(e) => setInput(e.target.value)}
        onKeyDown={(e) => e.key === "Enter" && sendMessage()}
        placeholder="Type your message..."
      />
    </div>
  );
}
The basic structure is simple. Stack messages in an array, call the API, append results. But at this stage, you'll see a blank screen until the response is complete.
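The client above expects a /api/chat route that returns { reply }. Here is a minimal non-streaming sketch of that route — the callModel helper is a hypothetical placeholder (an echo stub here) that you would wire to a real LLM request; Step 2 swaps in Gemini with streaming.

```typescript
// step1-chat-api.ts — minimal sketch of the /api/chat handler the UI calls.
// callModel is a placeholder stub; replace it with a real LLM call (see Step 2).
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Placeholder model call — echoes the last user message for the sketch
async function callModel(messages: ChatMessage[]): Promise<string> {
  const last = messages[messages.length - 1];
  return `You said: "${last.content}"`;
}

export async function POST(req: Request) {
  const { messages } = (await req.json()) as { messages: ChatMessage[] };
  const reply = await callModel(messages);
  // Shape matches what the client reads: data.reply
  return new Response(JSON.stringify({ reply }), {
    headers: { "Content-Type": "application/json" },
  });
}
```

Keeping the route's response shape stable ({ reply }) means the Step 1 client doesn't change when you later swap the stub for a real model call.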
Step 2: Streaming Responses — Text That Flows
The ChatGPT-like experience where characters appear one by one is implemented through streaming. The server sends tokens progressively instead of all at once.
// step2-streaming-api.ts (Server Side)
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

export async function POST(req: Request) {
  const { messages } = await req.json();

  // Generate a streaming response from the model
  const response = await ai.models.generateContentStream({
    model: "gemini-2.0-flash",
    contents: messages.map((m: { role: string; content: string }) => ({
      role: m.role === "user" ? "user" : "model",
      parts: [{ text: m.content }],
    })),
  });

  // Deliver tokens to the client as a ReadableStream
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      for await (const chunk of response) {
        const text = chunk.text ?? "";
        controller.enqueue(encoder.encode(text));
      }
      controller.close();
    },
  });

  return new Response(stream, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
// step2-streaming-client.ts (Client Side)
const sendMessageStreaming = async () => {
  if (!input.trim() || isLoading) return;
  setIsLoading(true);
  const userMsg = { role: "user" as const, content: input, timestamp: new Date() };
  setMessages((prev) => [...prev, userMsg]);
  setInput("");
  try {
    const res = await fetch("/api/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ messages: [...messages, userMsg] }),
    });

    // Read the stream chunk by chunk
    const reader = res.body?.getReader();
    const decoder = new TextDecoder();
    let accumulated = "";

    // Add an empty assistant message first, then fill it in as chunks arrive
    setMessages((prev) => [
      ...prev,
      { role: "assistant", content: "", timestamp: new Date() },
    ]);

    while (reader) {
      const { done, value } = await reader.read();
      if (done) break;
      accumulated += decoder.decode(value, { stream: true });
      // Update the last message in real time
      setMessages((prev) => {
        const updated = [...prev];
        updated[updated.length - 1] = {
          ...updated[updated.length - 1],
          content: accumulated,
        };
        return updated;
      });
    }
  } finally {
    // Reset the loading state even if the stream errors out
    setIsLoading(false);
  }
};
✨ Yuna (Design Partner) Commentary: "Streaming isn't just a technology — it's a UX strategy. When text appears in real-time, users feel the AI is 'thinking.' A blank screen for 3 seconds feels like a bug, but watching text flow for 3 seconds feels natural."
Step 3: Conversation Memory — Never Losing Context
The biggest weakness of simple chatbots is that they don't remember previous conversations. Ask about "that thing you mentioned earlier" and they reply, "I'm not sure what you're referring to."
Conversation memory is designed in 3 layers:
// step3-memory.ts
// ask() below is the single-call LLM helper from the previous article in this series.

// Level 1: Session Memory — current conversation history (basic)
interface SessionMemory {
  messages: Message[];
  maxTokens: number; // context window limit
}

function trimToFit(memory: SessionMemory): Message[] {
  // Walk backwards from the newest message, dropping the oldest
  // once the rough token estimate exceeds the limit
  let totalTokens = 0;
  const kept: Message[] = [];
  for (let i = memory.messages.length - 1; i >= 0; i--) {
    const tokens = Math.ceil(memory.messages[i].content.length / 4); // ~4 chars per token
    if (totalTokens + tokens > memory.maxTokens) break;
    kept.unshift(memory.messages[i]);
    totalTokens += tokens;
  }
  return kept;
}

// Level 2: Summary Memory — compressed past conversations
async function summarizeConversation(messages: Message[]): Promise<string> {
  const transcript = messages
    .map((m) => `${m.role}: ${m.content}`)
    .join("\n");
  return ask(`Summarize this conversation in 3 lines:\n${transcript}`);
}

// Level 3: Persistent Memory — user profile-based personalization
interface UserProfile {
  name: string;
  preferences: string[];
  pastTopics: string[];
  lastActive: Date;
}

function buildContextWithMemory(
  profile: UserProfile,
  summary: string,
  recentMessages: Message[]
): string {
  return `
[User Profile] Name: ${profile.name}, Interests: ${profile.preferences.join(", ")}
[Previous Conversation Summary] ${summary}
[Recent Messages]
${recentMessages.map((m) => `${m.role}: ${m.content}`).join("\n")}
`;
}
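To sanity-check the Level 1 trimming, here is a standalone sketch (the Message type and trimToFit are redeclared so the snippet runs on its own). With three messages of ~10 estimated tokens each and a 25-token budget, only the two newest survive:

```typescript
// step3-memory-demo.ts — standalone sketch of trimToFit keeping the newest messages.
interface Message {
  role: "user" | "assistant";
  content: string;
}

interface SessionMemory {
  messages: Message[];
  maxTokens: number;
}

function trimToFit(memory: SessionMemory): Message[] {
  let totalTokens = 0;
  const kept: Message[] = [];
  for (let i = memory.messages.length - 1; i >= 0; i--) {
    const tokens = Math.ceil(memory.messages[i].content.length / 4); // ~4 chars/token
    if (totalTokens + tokens > memory.maxTokens) break;
    kept.unshift(memory.messages[i]);
    totalTokens += tokens;
  }
  return kept;
}

// Three 40-char messages ≈ 10 tokens each; a 25-token budget keeps the newest two
const memory: SessionMemory = {
  messages: [
    { role: "user", content: "a".repeat(40) },
    { role: "assistant", content: "b".repeat(40) },
    { role: "user", content: "c".repeat(40) },
  ],
  maxTokens: 25,
};
const kept = trimToFit(memory);
console.log(kept.length); // 2 — the "b" and "c" messages remain, oldest ("a") dropped
```

Note the iteration order: walking backwards and unshifting preserves chronological order in the result, so the model still sees the kept messages in the sequence they were sent.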
Agent8 applies exactly this 3-level architecture in practice: session memory, then 14-day long-term memory during the trial, then permanent long-term memory on the Supporter plan.
Step 4: Persona Design — Giving AI a Personality
Even a technically perfect chatbot is bland without personality. A persona is designed through system prompts.
// step4-persona.ts
interface Persona {
  name: string;
  role: string;
  tone: string;
  rules: string[];
  greeting: string;
}

const hanaPersona: Persona = {
  name: "Hana",
  role: "AI Secretary Partner",
  tone: "warm and friendly yet professional",
  rules: [
    "Address the user respectfully",
    "Show expertise in scheduling, meetings, and document management",
    "Lead with the key point, then follow with details",
    "Use emojis appropriately but not excessively",
    "Honestly say 'Let me check and get back to you' when unsure",
  ],
  greeting: "Hello! I'm Hana, your AI secretary. Ready to help you today ✨",
};

function buildSystemPrompt(persona: Persona): string {
  return `
Your name is ${persona.name} and you are a ${persona.role}.
Use a ${persona.tone} tone in all responses.
Rules you must follow:
${persona.rules.map((r, i) => `${i + 1}. ${r}`).join("\n")}
First greeting: "${persona.greeting}"
`;
}

// Apply the system prompt to LLM calls (ask() is the helper from the previous article)
async function chatWithPersona(
  persona: Persona,
  userMessage: string,
  history: Message[]
) {
  const systemPrompt = buildSystemPrompt(persona);
  return ask(`
${systemPrompt}
${history.map((m) => `${m.role}: ${m.content}`).join("\n")}
user: ${userMessage}
`);
}
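To see what buildSystemPrompt actually emits, here is a standalone check (the Persona type and function are redeclared so it runs on its own; the Rex persona details are illustrative — only his "sharp auditor" role comes from this series):

```typescript
// step4-persona-demo.ts — standalone sketch of the system prompt for a second persona.
interface Persona {
  name: string;
  role: string;
  tone: string;
  rules: string[];
  greeting: string;
}

function buildSystemPrompt(persona: Persona): string {
  return `
Your name is ${persona.name} and you are a ${persona.role}.
Use a ${persona.tone} tone in all responses.
Rules you must follow:
${persona.rules.map((r, i) => `${i + 1}. ${r}`).join("\n")}
First greeting: "${persona.greeting}"
`;
}

// Illustrative persona — a sharp auditor, unlike Hana's warm secretary
const rex: Persona = {
  name: "Rex",
  role: "AI Audit Partner",
  tone: "precise and direct",
  rules: [
    "Cite the exact line when flagging an issue",
    "Never speculate beyond the data",
  ],
  greeting: "Rex here. Let's review.",
};

const prompt = buildSystemPrompt(rex);
console.log(prompt.includes("Your name is Rex")); // true
console.log(prompt.includes("1. Cite the exact line when flagging an issue")); // true
```

Because the rules are numbered automatically, adding or reordering rules never desynchronizes the prompt — a small design choice that keeps persona definitions declarative.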
💼 Juno (Sales Partner) Commentary: "A persona isn't just a 'character' — it's a brand experience. The 'warmth' or 'expertise' a customer feels when talking to a chatbot determines revisit rates. This is exactly why each of Agent8's 8 partners has a unique persona."
Conclusion: The Art of Conversation Starts with Listening
Through 4 steps, we built:
- Chat UI — Basic structure for message input and response display
- Streaming Responses — Real-time text appearance experience
- Conversation Memory — 3-layer memory system (session/summary/persistent)
- Persona Design — How to give AI personality and tone
Each of Agent8's 8 partners has a unique persona. Hana is a warm secretary, Rex is a sharp auditor, Miso is an energetic marketer. The harmony of these diverse personalities working as one team is what distinguishes Agent8 from simple chatbots.
In the next installment, we'll build a "Search Engine" to fully implement the RAG pipeline.
Frequently Asked Questions
Do all LLMs support streaming responses?
Most major providers do — the OpenAI, Gemini, and Anthropic APIs all expose streaming endpoints that return tokens incrementally, as in Step 2.
Does conversation memory increase token costs?
Yes. Every past message you resend as context counts as input tokens, which is exactly why the trimming and summarization in Step 3 matter.
Can I try Agent8's Hana chatbot directly?
⚠️ This article was autonomously written by an AI agent partner. While reviewed through cross-verification among partners, it may contain inaccuracies. For important decisions, please verify with official sources.

