The era of “move fast, fix infra later” is over for AI companies.
For early-stage startups, the playbook was straightforward: pick whatever infrastructure gets you moving fastest. That usually meant one of the big cloud providers, especially since they offered startup credit programs worth five to six figures. Spin up an EKS or GKE cluster, set up some Helm charts, and you’re good to go. Infrastructure becomes something you don’t worry about, so you can focus on your actual product until you either hit real scale, run out of credits, or both.
And yes, you’d now be locked in, and probably running inefficient infra that would bite you later. But by then you’d have figured out PMF and, with it, the revenue needed to hire a top-notch infra person or team to sort out not only the technical debt but also the perhaps-broken unit economics (it’s a VC-based world after all, right? Revenue first, profitability second).
The AI era, however, has radically disrupted this model: it is now very common for early-stage (AI) startups to hit scale almost immediately, in months rather than years. The canonical example is Lovable, which had to worry about scale almost from day 1 (or call it month 1), and will always have (efficient) infra scale as part of its core business targets. But it’s not just Lovable, of course; here’s a breakdown of the top 10 AI startups and what happened to them in their first 12 months:
*Estimates based on publicly available sources. These figures may not be 100% accurate but illustrate the remarkable scale achievable within 12 months.
And if you think this is reserved only for those who hit it big, think again:
It’s not just the number one or two companies — the whole batch is growing 10% week on week. That’s never happened before in early-stage venture.
Garry Tan, YC
So what should infra provide in this new frontier of AI scale? AI workloads come with 5 fundamental characteristics that AI infra must optimize for:
- Unprecedented scale: millions of instances/sandboxes/projects created per month or even per week, putting huge strain on legacy infra
- Ephemerality: at any given time, most instances are idle, not doing useful work. Legacy infra (warm pools, anyone?) often forces you to leave everything on regardless.
- Persistence: most AI workloads are stateful; any scale-to-zero mechanism needs to resume from where it left off, and to do so fast, within milliseconds, so as not to impact the UX (or, these days, the AX: agent experience).
- Security: it perhaps goes without saying that you shouldn’t trust whatever AI workload is running on your infra. In this new AI YOLO landscape of deploy first, analyze consequences second, strong isolation is table stakes; if a provider relies on containers or isolates for multi-tenant untrusted code, understand the tradeoff.
- Unit economics: because of this scale, AI workloads arguably break, or at least put severe stress on, unit economics: most infra was not built for scale exponentially higher than anything that came before (or rather, it was built for this scale, as long as you have the deep pockets to pay for it).
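To see why ephemerality and unit economics are so intertwined, here is a back-of-envelope sketch. All the utilization and pricing figures below are illustrative assumptions, not real cloud (or Unikraft) pricing:

```python
# Back-of-envelope sketch of how idleness wrecks unit economics at AI scale.
# Every figure here is an assumption for illustration, not actual pricing.

INSTANCES = 1_000_000          # sandboxes/instances live per month (assumed)
ACTIVE_FRACTION = 0.05         # each instance does useful work 5% of the time (assumed)
HOURS_PER_MONTH = 730
COST_PER_INSTANCE_HOUR = 0.01  # assumed blended $/hour for a small instance

# Legacy "leave everything on" model: you pay whether or not work is happening.
always_on_cost = INSTANCES * HOURS_PER_MONTH * COST_PER_INSTANCE_HOUR

# Stateful scale-to-zero model: you only pay while instances are actually active.
scale_to_zero_cost = always_on_cost * ACTIVE_FRACTION

print(f"always-on:     ${always_on_cost:,.0f}/month")
print(f"scale-to-zero: ${scale_to_zero_cost:,.0f}/month")
print(f"waste factor:  {always_on_cost / scale_to_zero_cost:.0f}x")
```

Under these (made-up but not implausible) numbers, the always-on bill is 20x the scale-to-zero one; the gap grows linearly with instance count, which is exactly what makes it a unit-economics problem rather than a rounding error.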
Now you may be thinking: crap, as an early startup without the team size for it, I now have to worry about my product and about scale/infra, all at once? The first instinct might be to sweep this under the carpet, burn hyperscaler credits fast and burn revenue thereafter even faster; as a founder, I’d 100% empathize with this thinking and would have the same natural tendency.
The point is, there shouldn’t be a mutually exclusive choice between getting started quickly and painlessly, and using infra that won’t set you up for disaster (or at the very least serious technical debt) a few months later.
Given these radically different characteristics of AI workloads, we believe it’s time for a new class of cloud infra built to cater to their needs. At Unikraft we have redesigned what cloud infrastructure should look like for AI-native companies: large scale, efficiency, and security baked in from day 1 as first-class citizens, not afterthoughts. Unikraft Cloud is a next-generation cloud platform that provides dramatically better performance and efficiency.
Unikraft Cloud allows you to:
- Start any workload or environment you can define in a Dockerfile in roughly 10 milliseconds.
- Transparently scale instances to zero in milliseconds, wake them up just as fast, and do so statefully.
- Cram up to 1M scale-to-zero instances onto modern server-class hardware: cater to your large and soon-to-be-exploding user base with minimal infra, now and forever.
- Provide your clients with strong isolation: all instances run as virtual machines.
- Transparent k8s integration: need k8s? No problem! Run on your own cluster, on EKS, or on GKE.
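As a concrete example of “any workload or environment you can define in a Dockerfile”, here is a hypothetical sandbox image for running untrusted, AI-generated code; the file names and base image are illustrative, not a required layout:

```dockerfile
# Hypothetical sandbox image for untrusted, AI-generated Python code.
FROM python:3.12-slim

WORKDIR /app

# Install only what the sandbox needs; leaner images boot faster.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY server.py .

# On a VM-based platform this boots as a strongly isolated instance,
# rather than sharing a kernel with other tenants' containers.
CMD ["python", "server.py"]
```

The point is that nothing about the image itself has to change: the isolation and millisecond start are properties of the platform, not of the Dockerfile.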
If you’re building AI products, make sure to ask your infra provider these 6 questions:
- What are the constraints and tradeoffs of your infra? (e.g., runtime/language limitations, networking model, storage and snapshot semantics, max concurrency per tenant, cold-start variance at p95/p99, region coverage, quotas, noisy-neighbor isolation model)
- How quickly can workloads (cold) start?
- Will I need to resort to warm pools in order to make sure that response times are low?
- Will my large scale result in large bills that get worse as I grow?
- Is each workload running in a (strongly-isolated) virtual machine? Are there any security shortcuts being taken (containers, isolates, etc) in order to provide good performance?
- Can workloads be scaled to zero to deal with idleness? If so, can they be woken up in milliseconds, and statefully, so work resumes exactly where it left off?
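You don’t have to take a provider’s word on cold starts and tail latency; a harness along these lines lets you measure them yourself. The probe below is a stand-in so the sketch runs anywhere; the function name and the health-check URL in the comment are illustrative, not any provider’s real API:

```python
import statistics
import time
from typing import Callable

def measure_cold_starts(probe: Callable[[], None], n: int = 50) -> dict:
    """Time n invocations of `probe` (e.g. an HTTP request to a scaled-to-zero
    instance, with enough idle time in between for it to scale back down) and
    report the latency distribution, including the tail your users feel."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        probe()
        samples.append((time.perf_counter() - start) * 1000.0)  # milliseconds
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * (n - 1))],
        "p99_ms": samples[int(0.99 * (n - 1))],
    }

# Stand-in probe (~1 ms of sleep) so this sketch is runnable as-is; replace it
# with a real request, e.g. urllib.request.urlopen to your instance's /health.
stats = measure_cold_starts(lambda: time.sleep(0.001))
print(stats)
```

Measuring p95/p99 rather than a single best-case number is the whole point: a provider that hides cold starts behind warm pools will show it in the tail the moment the pool runs dry.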
If you’re getting unsatisfactory answers to these questions, or you feel you don’t even need to ask because you know you won’t be happy with what you’ll hear, drop us a line, take our platform out for a spin at unikraft.com, or let us set up a k8s or other cluster for you to show you that the numbers above are real. At Unikraft, we believe no one should have to compromise between amazing infra and ease of setup, not anymore. Zero (infra) technical debt from day 1.