Implemented authenticated demo

Prompt Cost Guard

A lightweight AI governance demo that estimates token volume and cost, returns an allow/review decision, and records the check without calling a paid model.

Status: Implemented demo
AWS focus: Cognito, API Gateway, Lambda, DynamoDB, Budgets
Architecture (diagram): Prompt text → Cognito + API Gateway → Lambda cost estimator → DynamoDB TTL record → ALLOW or REVIEW decision

Problem

Generative AI demos can become expensive quickly if public users can send unlimited prompts to paid models. The safer first step is to put the budget decision in front of inference instead of after the bill arrives.

Design

  • Cognito limits access to approved reviewer identities.
  • Lambda estimates input and output tokens from the submitted prompt.
  • A model tier determines the estimated per-thousand-token cost.
  • The API returns ALLOW for low-cost requests and REVIEW when the prompt exceeds the public demo threshold.
  • DynamoDB stores a short-lived audit record with the estimate, decision, and selected tier.
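The estimate-and-decide step in the list above can be sketched roughly as follows. This is a minimal illustration, not the demo's actual Lambda code: the four-characters-per-token heuristic, the fixed output allowance, the tier names, the prices, and the threshold are all assumed values, and the sketch assumes the threshold is applied to estimated cost rather than to raw prompt length.

```python
import math

# All constants below are illustrative assumptions, not the demo's real values.
CHARS_PER_TOKEN = 4             # rough heuristic for English text
ASSUMED_OUTPUT_TOKENS = 500     # assumed fixed allowance for the model's reply
PRICE_PER_1K_TOKENS_USD = {     # hypothetical tier names and prices
    "small": 0.001,
    "large": 0.01,
}
REVIEW_THRESHOLD_USD = 0.005    # hypothetical public demo threshold


def estimate(prompt: str, tier: str) -> dict:
    """Estimate tokens and cost for a prompt, then return ALLOW or REVIEW."""
    if tier not in PRICE_PER_1K_TOKENS_USD:
        raise ValueError(f"unknown tier: {tier}")

    # Input tokens are approximated from character count; no tokenizer is used.
    input_tokens = math.ceil(len(prompt) / CHARS_PER_TOKEN)
    total_tokens = input_tokens + ASSUMED_OUTPUT_TOKENS

    # Cost is priced per thousand tokens for the selected tier.
    cost = round(total_tokens / 1000 * PRICE_PER_1K_TOKENS_USD[tier], 6)

    decision = "REVIEW" if cost > REVIEW_THRESHOLD_USD else "ALLOW"
    return {
        "input_tokens": input_tokens,
        "output_tokens": ASSUMED_OUTPUT_TOKENS,
        "tier": tier,
        "estimated_cost_usd": cost,
        "decision": decision,
    }
```

With these assumed numbers, a 400-character prompt on the "small" tier is allowed, while the same prompt on the "large" tier is sent to review. No model is called at any point; the function only does arithmetic on the prompt length.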

Cost controls

No paid model is invoked. Reviewer accounts are limited to three estimates per day, prompts are capped at 4,000 characters, and the estimate record expires automatically. This keeps the live demo effectively free while showing how a real Bedrock gate would be designed.
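A sketch of how these limits and the expiring record might look inside the Lambda handler is shown below. The 4,000-character cap and the three-per-day quota come from the text; the 24-hour retention, the key layout, and the attribute names (including `expires_at`) are assumptions made for illustration. DynamoDB TTL expects a Number attribute holding an epoch timestamp in seconds, which is what the sketch produces.

```python
import time
import uuid
from typing import Optional

MAX_PROMPT_CHARS = 4000          # cap stated in the text
MAX_ESTIMATES_PER_DAY = 3        # quota stated in the text
RECORD_TTL_SECONDS = 24 * 3600   # assumption: the text only says "expires automatically"


def check_limits(prompt: str, estimates_used_today: int) -> Optional[str]:
    """Return a rejection reason, or None if the request may proceed."""
    if len(prompt) > MAX_PROMPT_CHARS:
        return "prompt_too_long"
    if estimates_used_today >= MAX_ESTIMATES_PER_DAY:
        return "daily_quota_exceeded"
    return None


def build_audit_item(user_id: str, tier: str, estimated_cost_usd: float,
                     decision: str, now: Optional[int] = None) -> dict:
    """Build a short-lived audit record (hypothetical key and attribute names)."""
    now = int(time.time()) if now is None else now
    return {
        "pk": f"USER#{user_id}",
        "sk": f"ESTIMATE#{now}#{uuid.uuid4().hex}",
        "tier": tier,
        # Stored as a string here to sidestep float handling in the SDK.
        "estimated_cost_usd": str(estimated_cost_usd),
        "decision": decision,
        # Epoch seconds; the table's TTL setting would point at this attribute.
        "expires_at": now + RECORD_TTL_SECONDS,
    }
```

In a deployed handler, `estimates_used_today` would presumably come from a per-user counter or query, and the returned item would be written to the table; both steps are left out here so the sketch stays runnable without AWS credentials.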

Next step

A production version could add a Bedrock model allowlist, AWS Budgets alerts, per-user monthly spend records, and prompt template versioning before enabling real inference.