Image upload
User selects a small public or synthetic image.
Use gpt-4o-mini image input to extract structured fields from a form, screenshot, or poster.
Problem
Teams often receive information as images. Manually copying fields slows the workflow and creates mistakes.
Users
Students submitting forms, staff checking event materials, and teams processing public screenshots.
Why this track
This track practices the multimodal input topic from the curriculum over a service path that is currently verified to work. It does not require Azure AI Vision by default.
Stay minimal. 5-6 nodes. Each arrow is one network hop.
image: Public or synthetic image
encoder: Data URI encoder
model: gpt-4o-mini vision input
schema: Extraction JSON schema
review: Human confirmation screen
log: Safe demo log
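The encoder node turns a local image file into the data URI that the model call receives. A minimal sketch, assuming the image sits on disk and its type can be guessed from the file extension; the function name to_data_uri is hypothetical and not taken from the reference repository.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(path):
    """Encode an image file as a base64 data URI (hypothetical helper)."""
    file_path = Path(path)
    # Guess the MIME type from the extension, e.g. ".png" -> "image/png".
    mime, _ = mimetypes.guess_type(file_path.name)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image type: {file_path.name}")
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Example (the path is a placeholder):
# data_uri = to_data_uri("samples/form.png")
```

The returned string is what the image_url entry of the chat request expects in its url field.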
Starting prompts. Iterate. Move the system prompt into prompts/system.md so it can be versioned.
Extract only visible information from the image. Return JSON with title, detected_fields, missing_fields, confidence, and next_step. Use null when a field is not visible. Do not infer private identity details.
Image: sample student project form screenshot.
The pattern shape. Read it, run the matching scaffold, then adapt the idea for your own team.
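import json

# `client`, `deployment`, `EXTRACTION_PROMPT`, and `data_uri` are assumed
# to be defined earlier by the scaffold.
response = client.chat.completions.create(
    model=deployment,
    messages=[{"role": "user", "content": [
        {"type": "text", "text": EXTRACTION_PROMPT},
        {"type": "image_url", "image_url": {"url": data_uri}},
    ]}],
    # JSON mode: ask for the reply as a single JSON object.
    response_format={"type": "json_object"},
)
fields = json.loads(response.choices[0].message.content)
# ... your turn: add a confirmation screen before saving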
Reference: src/techniques/vision_multimodal/ in halla-ai/hackathon-sample-2026
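JSON mode returns valid JSON, but it does not by itself guarantee that the object has the five keys the prompt asks for. One way to handle this is to normalize the parsed object before it reaches the review screen. This is a sketch under assumptions: normalize_extraction is a hypothetical name, and the value types of each field are left unchecked because the prompt does not define them.

```python
import json

# Keys named in the extraction prompt.
REQUIRED_KEYS = ("title", "detected_fields", "missing_fields",
                 "confidence", "next_step")


def normalize_extraction(raw_text):
    """Parse model output and make every expected key present."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    # Keys the model omitted become None, matching the prompt's "use null" rule.
    result = {key: data.get(key) for key in REQUIRED_KEYS}
    # Keys outside the schema are returned separately so a reviewer can see them.
    extras = {k: v for k, v in data.items() if k not in REQUIRED_KEYS}
    return result, extras
```

Returning the unexpected keys separately, rather than dropping them, lets the human reviewer notice when the model drifts from the schema.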
Three screens that prove the prototype works.
User selects a small public or synthetic image.
Detected fields, missing fields, confidence, and next step.
User edits or rejects extraction before it enters the project flow.
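The third screen can be backed by a small function that decides what, if anything, is saved. The sketch below assumes the extraction is a plain dict and that the UI collects an accept/reject flag plus optional field edits; ReviewDecision and apply_review are hypothetical names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewDecision:
    """What the reviewer chose on the confirmation screen (hypothetical shape)."""
    accepted: bool
    edits: dict = field(default_factory=dict)


def apply_review(extraction, decision):
    """Return the record to save, or None when the reviewer rejects it."""
    if not decision.accepted:
        return None
    unknown = set(decision.edits) - set(extraction)
    if unknown:
        raise KeyError(f"edits refer to unknown fields: {sorted(unknown)}")
    # Copy first so the model's original output stays available for comparison.
    confirmed = dict(extraction)
    confirmed.update(decision.edits)
    return confirmed
```

Nothing enters the project flow unless apply_review returns a record, so a rejected extraction is never saved.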
Use the existing gpt-4o-mini deployment. Keep images small and test with 3-5 examples; do not batch private documents.
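A simple guard can enforce these limits before any request is sent. The numbers here are assumptions: the text only says "small" and "3-5 examples", so the 1 MB size cap is an illustrative choice, and check_samples is a hypothetical helper.

```python
from pathlib import Path

MAX_IMAGE_BYTES = 1_000_000  # assumed cap; the guidance only says "small"
MAX_SAMPLES = 5              # upper end of the suggested 3-5 examples


def check_samples(paths, max_bytes=MAX_IMAGE_BYTES, max_samples=MAX_SAMPLES):
    """Reject a test set that is too large before calling the model."""
    files = [Path(p) for p in paths]
    if len(files) > max_samples:
        raise ValueError(f"{len(files)} images given; limit is {max_samples}")
    too_large = [f.name for f in files if f.stat().st_size > max_bytes]
    if too_large:
        raise ValueError(f"images over {max_bytes} bytes: {too_large}")
    return files
```

The guard checks only count and size. Whether an image is public or synthetic still has to be judged by the person choosing the samples.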
If you finish the 1-day path early, use one question below to make the project more original.