Judging Criteria

Projects are judged on five dimensions, weighted 35 / 30 / 15 / 10 / 10 percent. This page shows what each dimension looks like at its strongest and at its weakest, plus the questions tutors will ask before the final pitch.

Weight Breakdown

Technical implementation 35%
Local impact 30%
Feasibility 15%
Responsible AI 10%
Presentation 10%
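The weights combine into a single total. The sketch below assumes that each dimension is scored on a 0 to 5 scale and that the total is the weighted average. The scale is inferred from the Score Bands at the end of this page, and the dictionary keys are illustrative names, not an official scoring sheet.

```python
# Weights from the breakdown above, expressed as fractions of 1.0.
WEIGHTS = {
    "technical_implementation": 0.35,
    "local_impact": 0.30,
    "feasibility": 0.15,
    "responsible_ai": 0.10,
    "presentation": 0.10,
}


def weighted_total(scores: dict[str, float]) -> float:
    """Return the weighted total, assuming each dimension is scored 0-5."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    for name, value in scores.items():
        if not 0.0 <= value <= 5.0:
            raise ValueError(f"{name} score {value} is outside 0-5")
    # Extra keys are ignored; only the five weighted dimensions count.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {
        "technical_implementation": 4.0,
        "local_impact": 5.0,
        "feasibility": 3.0,
        "responsible_ai": 4.0,
        "presentation": 4.0,
    }
    print(round(weighted_total(example), 2))  # prints 4.15
```

The example shows how the weights play out. A full point in Local impact (0.30) is worth twice as much as a full point in Feasibility (0.15) and three times as much as a point in Presentation (0.10).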

Per-Dimension Detail

Technical implementation (35%)

Judge questions

  • Does the prototype actually run? (Live demo or recorded backup)
  • Are Azure / Foundry resources used (not just mocked)?
  • Is the architecture clear from the slide + repo?
  • Is the code reasonably structured (no global state, secrets kept in .env rather than in the source)?
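For the "secrets in .env" point, the usual pattern is to read keys from environment variables once at startup and to fail fast if any are missing. That way nothing sensitive lives in the source. This is a minimal sketch. The variable names `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_API_KEY` are assumptions. Loading the `.env` file itself (for example with the python-dotenv package) is left out so that the example uses only the standard library. The `.env` file should also be listed in `.gitignore`.

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Settings:
    """Configuration read once at startup instead of hard-coded globals."""
    endpoint: str
    # repr=False keeps the key out of logs and error messages that print Settings.
    api_key: str = field(repr=False)


def load_settings() -> Settings:
    # Hypothetical variable names; match them to your own .env file.
    required = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY")
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        # Fail fast with a clear message rather than calling the API without a key.
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return Settings(
        endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
    )
```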

Strong example

Working RAG pipeline with citations, a FastAPI + React UI, and a README that explains the data flow.

Weak example

Static screenshots only, no evidence of model calls, all source code in a single 200-line file.

Local impact (30%)

Judge questions

  • Is the problem rooted in the Uzbekistan / TIU context?
  • Are users specific (not "everyone")?
  • Would a real user benefit immediately?
  • Does the team explain who they talked to before building?

Strong example

TIU first-year FAQ with quotes from 5 actual student interviews.

Weak example

Generic "AI for education" pitch with no specific user.

Feasibility (15%)

Judge questions

  • Can this run beyond the hackathon without major rework?
  • Is the cost model realistic?
  • Can the team explain the next 4 weeks of work?
  • Are blockers and dependencies clearly named?

Strong example

Cost estimate per user per month, named handoff partner, clear next milestone.

Weak example

"We will scale to 10 million users" without cost or distribution plan.

Responsible AI (10%)

Judge questions

  • Is there a 5-line risk register in the README?
  • Is hallucination addressed (refusal patterns, citation, evaluation)?
  • Is privacy addressed (no real personal data uploaded)?
  • Is cost addressed (token budget, idle stop)?
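A token budget can be as simple as a counter that is checked before every model call. This minimal sketch assumes that the app records the token usage reported with each model response. The class name and the limit are illustrative, not values from this rubric. Idle stop (shutting down compute when it is not in use) is a deployment setting and is not shown.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Tracks tokens spent against a fixed limit (e.g. per day)."""
    limit: int
    used: int = 0

    def can_spend(self, estimated_tokens: int) -> bool:
        # Check before calling the model so an over-budget request is refused up front.
        return self.used + estimated_tokens <= self.limit

    def record(self, actual_tokens: int) -> None:
        # Record the usage the model actually reported after the call.
        if actual_tokens < 0:
            raise ValueError("token count cannot be negative")
        self.used += actual_tokens

    @property
    def remaining(self) -> int:
        return max(self.limit - self.used, 0)


budget = TokenBudget(limit=50_000)  # illustrative daily limit
if budget.can_spend(1_200):
    budget.record(1_150)  # e.g. prompt + completion tokens from the response
print(budget.remaining)  # 48850
```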

Strong example

The risk register names prompt injection, hallucination, privacy, and cost, and each row has a mitigation.

Weak example

No risk register, or generic "we will be ethical" statement.

Presentation (10%)

Judge questions

  • Does the 3-minute pitch follow problem → user → demo → impact?
  • Is the slide deck 5 slides or fewer?
  • Does the team answer Q&A clearly?
  • Is a backup demo ready if the live one fails?

Strong example

3 minutes flat, clear story, demo works first try, Q&A answered without hedging.

Weak example

Overruns to 5 minutes, jargon-heavy, demo crashes with no backup.

Q&A Simulation

Tutors will run a 5-minute Q&A drill with each team at T-2 (June 18 or 19). Prepare answers to these five questions.

Q1

Who is the actual first user, and how many of them are there at TIU?

Why it is asked: Tests user specificity. Generic answers ("any student") lose points.

Q2

Show me a query the model refuses. Explain why it refuses.

Why it is asked: Tests the responsible-AI guardrails and the prompt itself.
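A common refusal pattern in a RAG prototype is to decline when retrieval finds no sufficiently relevant source, rather than letting the model answer from memory. This is a minimal sketch. The `Passage` type, the 0.75 threshold, and the refusal wording are assumptions. A real system would combine this check with refusal instructions in the system prompt.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical refusal message; adapt the wording to your users.
REFUSAL = "I can't answer that from the sources I have. Please contact the university directly."


@dataclass
class Passage:
    source: str   # document title or URL used as the citation
    text: str
    score: float  # similarity score from the retriever, higher is better


def grounded_context(passages: list[Passage], min_score: float = 0.75) -> Optional[list[Passage]]:
    """Return passages that clear the threshold, or None to signal a refusal."""
    relevant = [p for p in passages if p.score >= min_score]
    return relevant or None


def answer_or_refuse(question: str, passages: list[Passage]) -> str:
    context = grounded_context(passages)
    if context is None:
        # Nothing relevant was retrieved: refuse instead of guessing.
        return REFUSAL
    # Placeholder for the real model call; this only shows which sources would be cited.
    cited = ", ".join(sorted({p.source for p in context}))
    return f"[answer to {question!r} generated from: {cited}]"
```

In a demo, an off-topic question (one the knowledge base does not cover) is the simplest way to trigger the refusal path.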

Q3

What would happen if your API key leaked tomorrow?

Why it is asked: Tests security awareness. Good answer: rotate the key, audit its recent usage, and narrow its scope.

Q4

How much would it cost to serve 1,000 users per month?

Why it is asked: Tests feasibility. Good answer shows the token math and cites Azure pricing.
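The token math behind Q4 fits in a few lines. In this sketch every number is a placeholder. Queries per user, tokens per query, and the per-1K-token prices are assumptions, to be replaced with your own measurements and the current Azure pricing for your model and region.

```python
def monthly_cost(
    users: int,
    queries_per_user: float,
    input_tokens_per_query: int,
    output_tokens_per_query: int,
    price_in_per_1k: float,
    price_out_per_1k: float,
) -> float:
    """Estimated model cost per month, in the currency of the prices given."""
    queries = users * queries_per_user
    input_cost = queries * input_tokens_per_query / 1000 * price_in_per_1k
    output_cost = queries * output_tokens_per_query / 1000 * price_out_per_1k
    return input_cost + output_cost


# Placeholder inputs, not real measurements or current prices.
total = monthly_cost(
    users=1_000,
    queries_per_user=20,
    input_tokens_per_query=1_500,  # prompt plus retrieved context
    output_tokens_per_query=300,
    price_in_per_1k=0.0005,
    price_out_per_1k=0.0015,
)
print(f"total: {total:.2f}, per user: {total / 1_000:.4f}")  # total: 24.00, per user: 0.0240
```

This covers model tokens only. Hosting, search, and storage resources add their own monthly cost on top of it.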

Q5

If this hackathon ended now, what would you build in the next week?

Why it is asked: Tests product thinking. Good answer is concrete: one feature, one user.

Score Bands

4.5 - 5.0 (Top tier): Working prototype + crisp pitch + real user evidence + responsible AI.
3.5 - 4.4 (Solid): Working prototype + 2-3 strong dimensions, but at least one dimension is generic.
2.5 - 3.4 (Promising): Partial prototype, or a strong concept without a working demo.
Below 2.5 (Needs work): No working demo, or the pitch missed core dimensions. The team is coached to revise for the second semester.
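Mapping a final score to its band is a threshold check. This sketch assumes scores on the same 0 to 5 scale as the bands above. The listed ranges leave small gaps (for example between 4.4 and 4.5), so this version treats each lower bound as the cut-off. That choice is an assumption, not an official rule.

```python
# Lower bound of each band, checked from highest to lowest.
BANDS = [
    (4.5, "Top tier"),
    (3.5, "Solid"),
    (2.5, "Promising"),
]


def score_band(score: float) -> str:
    """Return the band name for a final score on a 0-5 scale."""
    if not 0.0 <= score <= 5.0:
        raise ValueError(f"score {score} is outside 0-5")
    for lower_bound, name in BANDS:
        if score >= lower_bound:
            return name
    return "Needs work"  # below 2.5


print(score_band(4.15))  # Solid
```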