Real-time Streaming Chat

A token-streaming pattern for demos where perceived responsiveness matters.

Problem

Students think the app is frozen when a full answer takes several seconds, especially during live Q&A or practice interviews.

Users

Students practicing interviews, tutors giving live feedback, and teams building chat-first demos.

Why this track

This extends the basic LLM API lesson into a richer interaction pattern without needing a new Azure service.

Architecture

Stay minimal. 5-6 nodes. Each arrow is one network hop.

user

Chat UI

api

Streaming endpoint

model

gpt-4o-mini stream=True

buffer

Token buffer

final

Final answer store

Edges

user api — prompt
api model — stream request
model buffer — delta chunks
buffer user — append text
buffer final — save completed answer

Prompt Pack

Starting prompts. Iterate. Move the system prompt into prompts/system.md so it can be versioned.

system

Answer in short chunks that read well while streamed. Do not reveal hidden reasoning, tool outputs, or private state. End with one clear next action.

user

Help me rehearse a 60-second explanation of our hackathon prototype.

Code Snippet

The pattern shape. Read it, run the matching scaffold, then adapt the idea for your own team.

python

stream = client.chat.completions.create(
    model=deployment,
    messages=messages,
    stream=True,
)
for chunk in stream:
    token = chunk.choices[0].delta.content or ""
    yield token
# ... your turn: connect yielded tokens to SSE or websocket UI

Reference: src/techniques/streaming/ in halla-ai/hackathon-sample-2026

Demo Screens

Three screens that prove the prototype works.

Prompt entry

User asks for interview or pitch-practice help.

Live stream

Answer appears incrementally in a stable panel.

Final answer

Completed answer is saved with retry and copy controls.

Azure budget

Same token pricing as normal gpt-4o-mini chat. Streaming changes UX, not the basic token meter.

Pitfalls

• Symptom: text jumps around. Cause: no stable output panel. Fix: reserve height and append tokens.
• Symptom: hidden data appears. Cause: streaming tool or debug messages. Fix: stream final user-facing text only.
• Symptom: duplicate final answer. Cause: saving every chunk. Fix: buffer chunks and save once.

Possible Extensions

If you finish the 1-day path early, use one question below to make the project more original.

How would your version add a stop button without losing the final state?
How would you stream progress for retrieval or evaluation steps?
How would you test streaming without relying on exact timing?

Document Vision Reader

Evaluation Harness