Meet The Founding Engineer Trying To Make “AI Netflix” A Reality

Marcus White

The past year has turned “AI video” into a crowded category: slick demos, short clips, and endless model launches. But shipping a product that reliably generates longer, coherent video stories is a different problem. It looks less like a single breakthrough model and more like a carefully engineered pipeline.

That’s the problem Jayesh Gaur has been focused on since joining Story.com early as a founding engineer. Story.com is aiming to build an “AI Netflix” for creation: an AI-native studio where users can generate and edit narrative movies end-to-end in a single workflow. The team accelerated that approach during HF0 Winter ’24, a selective accelerator focused on AI companies.

Gaur describes his niche as applied generative AI for media, turning fast-moving research into production systems that normal users can actually depend on. Story.com’s bet: while much of the market optimizes for a single “wow” shot, there’s a large gap between impressive clips and full, repeatable storytelling workflows.

“We’re applying the research that comes out every month,” Gaur said. “The speed was really important for us since this is an extremely competitive market.”

A different kind of “AI video” problem

Many consumer AI video products are judged by one output: a few seconds that look great on a feed. But users quickly discover the hard part is everything around that output: latency, reliability, coherence, cost, and the ability to revise what they generated without starting over.

Gaur described his responsibilities plainly: “I’m responsible for the end-to-end engineering backbone. Our core AI systems, the backend services that run them, and the infrastructure that keeps the product reliable at scale.”

That mix of models plus backend matters because video generation isn’t a single action. In practice, it’s a multi-stage pipeline:

Prompt → Story Structure → Assets (images/video/audio) → Assembly → Rendering → Delivery

Each stage introduces failure modes. A model might produce an off-tone line, a scene might drift from the plot, an asset might fail to generate, a queue might back up, or a content filter might flag a borderline frame. If a product wants users to come back weekly, the system has to handle it all: guardrails, logging, retries, monitoring, and cost controls have to work together without the user feeling like they’re operating a fragile research demo.
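To make the idea concrete, here is a minimal Python sketch of a staged pipeline with per-stage retries and logging. It is an illustration only, not Story.com’s code: the stage names loosely follow the flow above, and every identifier in it (run_stage, run_pipeline, StageFailed, the toy stages) is hypothetical.

```python
import logging
import time
from typing import Callable, Dict, List, Tuple

logger = logging.getLogger("story_pipeline")  # hypothetical logger name

# A stage takes the accumulated state and returns the new fields it produced.
Stage = Tuple[str, Callable[[Dict], Dict]]


class StageFailed(Exception):
    """Raised when a stage still fails after all retry attempts."""


def run_stage(name: str, fn: Callable[[Dict], Dict], state: Dict,
              max_attempts: int = 3, backoff_s: float = 0.0) -> Dict:
    """Run one stage, retrying on any exception and logging each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(state)
        except Exception as exc:
            logger.warning("stage %s failed (attempt %d/%d): %s",
                           name, attempt, max_attempts, exc)
            if attempt < max_attempts:
                time.sleep(backoff_s * attempt)  # simple linear backoff
    raise StageFailed(f"{name} failed after {max_attempts} attempts")


def run_pipeline(stages: List[Stage], prompt: str, max_attempts: int = 3) -> Dict:
    """Run stages in order, merging each stage's output into the shared state."""
    state: Dict = {"prompt": prompt, "completed": []}
    for name, fn in stages:
        updates = run_stage(name, fn, state, max_attempts)
        state = {**state, **updates, "completed": state["completed"] + [name]}
    return state


# Toy stages: the asset stage fails once, then succeeds on the retry.
asset_calls = {"count": 0}


def make_assets(state: Dict) -> Dict:
    asset_calls["count"] += 1
    if asset_calls["count"] == 1:
        raise RuntimeError("asset generation timed out")
    return {"assets": [f"shot for: {beat}" for beat in state["beats"]]}


stages: List[Stage] = [
    ("story_structure", lambda s: {"beats": ["setup", "conflict", "resolution"]}),
    ("assets", make_assets),
    ("assembly", lambda s: {"timeline": list(s["assets"])}),
]

result = run_pipeline(stages, "a robot learns to paint")
```

In this toy run the asset stage times out once and is retried, so the user-facing call still returns a finished timeline. A production system would presumably add queueing, cost tracking, and content checks around the same skeleton.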

“Full-form” content, not just clips

In conversation, Gaur drew a contrast between what Story.com has aimed for and what much of the market still showcases.

“Plenty of tools can spit out a few impressive seconds,” he said. “The harder problem is producing a coherent story end-to-end. And that’s what we’ve built for.”

That choice forces a different technical architecture. A short clip can be judged shot by shot. A longer narrative has to hold together: characters stay consistent, scenes connect, pacing works, audio doesn’t feel bolted on, and the output lands as a coherent “value movie,” not a collage of unrelated moments.

Gaur says Story.com’s differentiator is less about a single proprietary model and more about a story generation pipeline that sequences components, synchronizes modalities, and repeatedly refines outputs until they meet product constraints.
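A “refine until it meets constraints” loop can be sketched in a few lines of Python. This is a hypothetical illustration, not the company’s implementation: the refine function, the 60-second runtime budget, and the trimming rule are all assumptions chosen to keep the example small.

```python
from typing import Callable, List, Tuple


def refine(draft: List[int],
           check: Callable[[List[int]], List[str]],
           revise: Callable[[List[int], List[str]], List[int]],
           max_rounds: int = 5) -> Tuple[List[int], int, List[str]]:
    """Revise a draft until check() reports no problems or rounds run out.

    Returns the final draft, the number of revision rounds used,
    and any problems that remain.
    """
    problems = check(draft)
    rounds = 0
    while problems and rounds < max_rounds:
        draft = revise(draft, problems)
        rounds += 1
        problems = check(draft)
    return draft, rounds, problems


# Toy constraint: scene durations (in seconds) must fit a 60-second budget.
def check_runtime(scenes: List[int]) -> List[str]:
    total = sum(scenes)
    return [f"runtime {total}s exceeds 60s"] if total > 60 else []


# Toy revision: trim 5 seconds from the longest scene.
def trim_longest(scenes: List[int], problems: List[str]) -> List[int]:
    trimmed = list(scenes)
    longest = trimmed.index(max(trimmed))
    trimmed[longest] -= 5
    return trimmed


final, rounds, remaining = refine([30, 25, 20], check_runtime, trim_longest)
```

The cap on rounds matters as much as the loop itself: without it, a draft that can never satisfy the constraint would consume compute indefinitely, which is the kind of cost spiral the pipeline is meant to avoid.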

That orchestration layer is where a lot of “AI video” quietly succeeds or fails. It determines whether a product feels like a studio workflow or like a slot machine that occasionally spits out something interesting.

“Applied” generative AI: shipping with what exists

Gaur is explicit that Story.com isn’t positioning itself as a research lab.

“We focus on productizing the frontier,” he said. “Taking what’s new and making it work reliably at scale.”

That posture of pragmatic application has become a common pattern among fast-moving startups in generative AI: assemble best-available models, then build defensibility through orchestration, UX, and infrastructure that improves with each production cycle.

It also changes what “innovation” looks like. In Gaur’s framing, innovation is system design: making long-form generation faster end-to-end, making it more controllable, reducing failure cases, and keeping the cost structure viable as usage scales.

Story.com’s approach is guided by a simple internal rule: speed matters, but “fast” has to mean time-to-complete for a coherent result, not just a cherry-picked benchmark. A product can generate a pretty frame quickly and still lose users if the overall workflow is half-baked, brittle, or expensive.

Traction is a systems test

For consumer products, traction isn’t just a growth story; it’s a stress test. Story.com says the platform has crossed 500,000 monthly active users, and that scale forces engineering maturity in a way demos never do. Reliability issues become churn. Cost issues become existential. And content policy edge cases become daily operational work.

One signal of repeat usage is the presence of power users who treat the product like a workflow rather than a novelty. Story.com points to at least one user who has generated roughly 8,000 stories and spent around $4,000 on the platform, a clear indicator that, for some customers, long-form generation is already becoming a habit.

For the builders behind the system, that combination of consumer traffic plus growing expectations raises the bar. It’s no longer enough to generate something once. The pipeline has to work again and again, for many different user intents, without spiraling cost or breaking coherence.

A builder’s scope that increasingly looks like leadership

Gaur joined early, and his scope has expanded from core backend work into broader ownership of the systems that power long-form generation. While his formal title is founding engineer, his day-to-day increasingly reflects an engineering lead role: driving execution, planning sprints, coordinating ownership across the team, and making key decisions across backend, orchestration, and infrastructure.

“I help set the technical direction and keep execution moving, planning work, coordinating ownership, and making the core architecture decisions behind the pipeline,” he said.

In many small teams, that hybrid ownership is exactly what “AI engineer” means in practice: not just working on models, but building the glue and reliability layer that turn a capability into a product. Models improve rapidly, but product trust compounds slowly through monitoring, safety checks, evaluation loops, and hard-won operational learnings that accumulate over thousands of generations.

What’s next: from outputs to systems

If there’s a through-line in Gaur’s perspective, it’s that the generative AI industry is moving from single outputs to systems. The first wave rewarded dazzling demos. The next wave may reward workflows that users return to, especially in categories like long-form media, where “good enough” requires consistency, control, and iteration.

“There are two phases,” he said. “Model breakthroughs, and then the application layer that makes them practical for real users.”

Story.com’s pitch is that AI movies won’t become mainstream because one model got slightly sharper. They’ll become mainstream when the end-to-end pipeline feels stable: generate, edit, regenerate, and ship without the user fighting the machinery.

For Gaur, that’s the real work: building the plumbing that makes long-form AI storytelling feel less like a demo and more like a product.

 

Marcus is a news reporter for Technori. He is an expert in AI and loves to keep up-to-date with current research, trends and companies.