Streaming
A token-streaming pattern for demos where perceived responsiveness matters.
Problem
Students think the app is frozen when a full answer takes several seconds, especially during live Q&A or practice interviews.
Users
Students practicing interviews, tutors giving live feedback, and teams building chat-first demos.
Why this track
This extends the basic LLM API lesson into a richer interaction pattern without needing a new Azure service.
Keep the diagram minimal: 5-6 nodes, with each arrow standing for one network hop.
user: Chat UI
api: Streaming endpoint
model: gpt-4o-mini stream=True
buffer: Token buffer
final: Final answer store
Edges
Starting prompts. Iterate. Move the system prompt into prompts/system.md so it can be versioned.
System prompt: Answer in short chunks that read well while streamed. Do not reveal hidden reasoning, tool outputs, or private state. End with one clear next action.
User prompt: Help me rehearse a 60-second explanation of our hackathon prototype.
The pattern shape. Read it, run the matching scaffold, then adapt the idea for your own team.
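# The function name is illustrative; `yield` only works inside a generator.
def stream_tokens(client, deployment, messages):
    # Ask for a streamed completion so chunks arrive as they are generated.
    stream = client.chat.completions.create(
        model=deployment,
        messages=messages,
        stream=True,
    )
    for chunk in stream:
        # Some chunks can arrive with an empty choices list; skip them.
        if not chunk.choices:
            continue
        token = chunk.choices[0].delta.content or ""
        yield token
    # ... your turn: connect yielded tokens to SSE or websocket UI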
Reference: src/techniques/streaming/ in halla-ai/hackathon-sample-2026
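One possible way to take the "your turn" step over SSE is sketched below. It is not taken from the reference repository: the function name `sse_events`, the JSON payload shape, and the closing `done` event are all assumptions made for illustration. It only formats frames, so it can sit behind whichever web framework your team uses.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap streamed tokens in Server-Sent Events frames (hypothetical helper)."""
    for token in tokens:
        if not token:
            continue  # empty deltas carry nothing worth sending
        # JSON-encode the token so a newline inside it cannot break the frame.
        payload = json.dumps({"token": token})
        yield f"data: {payload}\n\n"
    # Assumed convention: a named "done" event tells the UI the stream ended.
    yield "event: done\ndata: {}\n\n"


if __name__ == "__main__":
    for frame in sse_events(["Hel", "", "lo"]):
        print(repr(frame))
```

JSON-encoding each token keeps the frame boundary (a blank line) unambiguous even when the model emits line breaks mid-answer.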
Three screens that prove the prototype works.
User asks for interview or pitch-practice help.
Answer appears incrementally in a stable panel.
Completed answer is saved with retry and copy controls.
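The second and third screens can be backed by a small buffer, sketched here under stated assumptions: `TokenBuffer`, `run_stream`, and the plain dict standing in for the final answer store are hypothetical names, not part of the scaffold. The buffer returns the text so far after each token (for the incremental panel) and writes to the store only once the stream completes.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBuffer:
    """Accumulates streamed tokens so the UI can re-render the partial answer."""

    parts: list = field(default_factory=list)
    finished: bool = False

    def append(self, token: str) -> str:
        """Add one token and return the full text received so far."""
        if self.finished:
            raise RuntimeError("buffer already finalized")
        if token:
            self.parts.append(token)
        return "".join(self.parts)

    def finalize(self) -> str:
        """Mark the stream complete and return the final answer."""
        self.finished = True
        return "".join(self.parts)


def run_stream(tokens, store: dict, answer_id: str) -> list:
    """Return every partial render, then save the completed answer."""
    buffer = TokenBuffer()
    frames = [buffer.append(token) for token in tokens]
    # Saved only after the loop ends, so a failed stream leaves no partial entry.
    store[answer_id] = buffer.finalize()
    return frames


if __name__ == "__main__":
    store = {}
    print(run_stream(["Start ", "with ", "the problem."], store, "demo"))
    print(store["demo"])
```

Saving only after the stream finishes means a retry control can start from a clean slate instead of overwriting a half-written answer.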
Same token pricing as a normal gpt-4o-mini chat call. Streaming changes the UX, not the number of tokens billed.
If you finish the 1-day path early, use one question below to make the project more original.