Most app security reviews start after the awkward moment. A customer asks what the product can access. A technical user notices a strange route. A buyer wants to know how uploads are stored. Someone finds a loose endpoint, and the founder has to explain what was checked before launch.
That sequence is backwards for AI-built apps.
The first review should happen earlier, while the claim can still be narrow. It does not need to pretend a tiny product has a mature security program. It should answer a smaller question: what can we prove about the deployed app from the outside today?
Over the last few days, Probe looked at a group of recently shipped AI and vibe-coded products from the public surface only. The scope was deliberately limited to homepages, docs, legal pages, support pages, contact pages, browser-visible copy, and HTTP response headers. We did not sign up, log in, submit forms, upload files, touch payments, test auth bypass, use credentials, clone repos, run exploits, or inspect private workflows.
That limit matters because it controls the claim. A public-surface pass cannot prove an app is secure. It should not be used to call a company vulnerable without stronger evidence. A marketing site with HSTS is not a security program. A broad-looking header on one public page is not automatically a data exposure.
But the public surface can still show whether anyone has looked at the deployed app before asking users to trust it.
The products in the set were not simple brochure sites. They included AI assistants connected to high-scope tools, Claude and MCP workflows that can trigger real work, document and CV workflows, image and file tools, SMS alerts, payment and billing surfaces, repo-connected marketing workflows, Cloudflare operations tooling, and products that turn customer or public data into generated output.
That is why the timing matters. AI coding tools can move a product into sensitive territory before the company looks serious from the outside. A solo founder can ship something that touches files, images, repositories, calendars, phone numbers, payments, model calls, or customer workflows in the same week the landing page goes live.
The review has to move earlier because the risk moved earlier.
The most useful first pass is not dramatic. It is a dated baseline that asks:
- which routes respond from the outside
- which security headers are present or missing
- whether browser JavaScript exposes secret-like values
- whether obvious admin or debug routes answer in production
- whether endpoints that appear to trigger paid work are reachable without a session
- whether CORS looks too broad on sensitive surfaces
- whether webhook, billing, upload, storage, and LLM boundaries deserve a deeper check
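The secret-like values check is the easiest one to automate. Here is a minimal sketch that scans a downloaded JavaScript bundle for a few well-known credential shapes: AWS access key IDs, Stripe live secret keys, and PEM private key headers. It also flags a generic `secret = "..."` style assignment. The pattern set and the `find_secret_like_values` name are illustrative assumptions, not an exhaustive ruleset. A match is a lead to confirm by hand, not proof of a leak.

```python
import re

# Illustrative patterns only (an assumption, not a complete ruleset).
# Each match is a lead to confirm by hand, not proof of a leaked credential.
SECRET_PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Stripe live secret keys; publishable pk_live_ keys are meant to be public.
    "stripe_secret_key": re.compile(r"\bsk_live_[0-9A-Za-z]{16,}\b"),
    # PEM private key headers pasted into client code.
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # Names containing secret / apiKey / service_role assigned a long string literal.
    "generic_assignment": re.compile(
        r"""(?i)(?:secret|api_?key|service_?role)\w*["']?\s*[:=]\s*["'][^"'\s]{16,}["']"""
    ),
}


def find_secret_like_values(js_source: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_text) pairs found in a JavaScript bundle."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(js_source):
            findings.append((name, match.group(0)))
    return findings
```

The sketch deliberately ignores Stripe's `pk_live_` prefix, because publishable keys are designed to ship to the browser. Treating every key-shaped string as a leak would bury the findings that matter.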
Several products in the public-surface set had signs that someone had already touched the boring parts. We saw examples of CSP, HSTS, frame blocking, nosniff, referrer policy, restrictive permissions policy, and browser isolation headers such as COOP, COEP, and CORP.
That does not make those products safe. It means the first evidence point is better than a blank page. It gives the founder something precise to say: here is what we checked on this date, here is what looked deliberate, and here is what still needs deeper review.
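Recording which of those headers are present can be turned into a repeatable, dated check. Here is a minimal sketch, assuming the response headers from a production GET have already been collected into a plain dict by whatever HTTP client you use. `audit_security_headers` is a hypothetical name. It checks presence and a couple of obvious values. It does not check whether each policy is well configured.

```python
def audit_security_headers(headers: dict[str, str]) -> dict[str, list[str]]:
    """Report which baseline security headers are present in one HTTP response.

    Presence only: a header that exists can still be misconfigured.
    """
    # HTTP header names are case-insensitive, so normalize before looking them up.
    h = {name.lower(): value.strip() for name, value in headers.items()}
    csp = h.get("content-security-policy", "").lower()
    checks = {
        "CSP": bool(csp),
        # HSTS requires a max-age directive to mean anything.
        "HSTS": "max-age=" in h.get("strict-transport-security", "").lower(),
        # Framing can be blocked by X-Frame-Options or by CSP frame-ancestors.
        "frame blocking": "x-frame-options" in h or "frame-ancestors" in csp,
        "nosniff": h.get("x-content-type-options", "").lower() == "nosniff",
        "referrer policy": "referrer-policy" in h,
        "permissions policy": "permissions-policy" in h,
        "COOP": "cross-origin-opener-policy" in h,
        "COEP": "cross-origin-embedder-policy" in h,
        "CORP": "cross-origin-resource-policy" in h,
    }
    return {
        "present": [label for label, ok in checks.items() if ok],
        "missing": [label for label, ok in checks.items() if not ok],
    }
```

Saving the returned dict next to the date and URL it came from gives the founder the "here is what we checked on this date" record described above.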
The next layer is where the real product risk starts.
For an agentic workflow product, the important question is what the agent can trigger, what permissions it receives, what gets logged, and what separates a suggested action from an executed one.
For a product connected to Gmail, Slack, Notion, GitHub, Calendar, Drive, Jira, Linear, or Zoom, the question is what scopes are requested, why those scopes are needed, and what happens when a customer revokes access.
For document, CV, image, audio, or file workflows, the question is where files go, how long they stay there, whether they enter an LLM prompt, who can retrieve them later, and whether deletion means anything real.
For payment, SMS, billing, and AI generation workflows, the question is who can call the expensive path, how often, with what session or signature, and what stops one bug from becoming a bill.
Those are not enterprise-only concerns. They are early founder concerns because they affect trust, support, sales, and cost before the company has a process for handling any of them.
The mistake is waiting until someone else asks first. By then, the founder is explaining from a defensive position. A security page gets written in a rush. A buyer questionnaire gets answered from memory. A support reply says the team takes privacy seriously without showing what was actually checked.
Proof-first review gives the founder better language because it starts with evidence instead of posture.
Good language is narrow. We checked the public app surface on this date. We reviewed headers, browser JavaScript, obvious production routes, and unauthenticated behavior visible from outside the app. We did not test private workflows, upload user files, touch payments, connect integrations, or attempt bypasses. These are the boundaries that need deeper review before more sensitive usage.
That kind of statement is less glamorous than calling the product secure. It is also more credible.
Before putting real users, uploads, payments, repositories, calendars, images, phone numbers, or API spend behind an AI-built app, run one outside-in pass:
- check that secret-like values are not shipped in browser JavaScript
- check that admin, debug, billing, upload, webhook, and LLM routes are guarded
- check that CORS is tight on sensitive endpoints
- check that paid API and model paths have auth and rate limits
- check that webhook signatures are verified
- check that storage and database policies are not public by default
- check that privacy, retention, deletion, and OAuth-scope claims match the product
- check that security headers are actually present in production
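For the CORS item above, one outside-in probe is to send a request to a sensitive endpoint with an `Origin` header you control, then see what comes back. Here is a minimal sketch of classifying the response. It assumes you already have the response headers as a dict and know which origins the app is supposed to allow. `classify_cors` is a hypothetical helper.

```python
def classify_cors(response_headers: dict[str, str], probe_origin: str,
                  allowed_origins: set[str]) -> str:
    """Classify the CORS grant a response makes to `probe_origin`."""
    h = {name.lower(): value.strip() for name, value in response_headers.items()}
    acao = h.get("access-control-allow-origin")
    creds = h.get("access-control-allow-credentials", "").lower() == "true"
    if acao is None:
        return "no cross-origin grant"
    if acao == "*":
        # Browsers refuse to combine "*" with credentials, so a wildcard alone
        # does not let another site read responses made with the user's cookies.
        return "wildcard"
    if acao in allowed_origins:
        return "allowlisted origin"
    if acao == probe_origin:
        # The server echoed back an origin it was never meant to trust.
        if creds:
            return "reflects arbitrary origin (with credentials)"
        return "reflects arbitrary origin"
    return "unexpected origin"
```

The reflected-origin-with-credentials case is the one worth escalating on a sensitive endpoint. A wildcard on a public marketing page usually is not.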
None of this makes the demo feel better. That is exactly why it gets skipped. The demo can work while a secret is public, a bucket is open, a webhook is unsigned, or an expensive model endpoint has no rate-limit signal.
Probe exists for the gap after an AI-built app works and before strangers start testing the parts the builder did not know to check.
Run a scan with Probe. Better to find the boring stuff before the internet does.