Espretto Introduces AI-Powered Virtual Human: Meet the Future of Coffee Service
April 2026

Evonova Group has announced the development of an AI-powered interactive virtual human as an optional feature for its Espretto robotic barista kiosks — powered by NVIDIA ACE (Avatar Cloud Engine) and displayed on a striking 55-inch freestanding screen.

The result is a new kind of customer interaction: a lifelike, voice-responsive digital host that greets customers, takes orders, answers questions, and guides the coffee experience — all in real time.

What Is NVIDIA ACE?

NVIDIA ACE (Avatar Cloud Engine) is a suite of real-time AI technologies designed to bring digital humans to life. It combines several advanced AI models into a single, seamless pipeline:

Riva ASR (Automatic Speech Recognition) converts spoken language to text with low latency, enabling natural, fluid conversation.

NeMo LLM (Large Language Model) powers the virtual human's intelligence — understanding context, answering questions, and responding with nuance.

Riva TTS (Text-to-Speech) generates a natural, expressive voice from the AI's responses in real time.

Audio2Face animates the virtual human's face in sync with speech, producing lifelike expressions, lip movements, and micro-gestures.

Together, these components create an avatar that doesn't just talk — it listens, understands, and responds like a person.
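As a rough illustration of how the four stages hand off to one another, the Python sketch below chains them in the order described above. The names used here (`run_turn`, `TurnResult`, `fake_asr`, `fake_llm`, `fake_tts`, `fake_animate`) are hypothetical stand-ins written for this example. They are not NVIDIA ACE, Riva, NeMo, or Audio2Face APIs, and a production system would likely stream data between stages rather than run them strictly one after another.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TurnResult:
    """Everything produced for one customer utterance."""
    transcript: str         # text recognised from the customer's speech
    reply_text: str         # text chosen by the language-model stage
    reply_audio: bytes      # synthesised speech for the reply
    face_frames: List[str]  # animation cues derived from the reply audio


def run_turn(
    audio_in: bytes,
    asr: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
    animate: Callable[[bytes], List[str]],
) -> TurnResult:
    """Pass one utterance through the four stages in order."""
    transcript = asr(audio_in)         # speech -> text
    reply_text = llm(transcript)       # text -> reply text
    reply_audio = tts(reply_text)      # reply text -> speech
    face_frames = animate(reply_audio) # speech -> facial animation cues
    return TurnResult(transcript, reply_text, reply_audio, face_frames)


# Hypothetical stand-ins so the sketch runs without any NVIDIA software.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes are already text


def fake_llm(text: str) -> str:
    return f"Sure, one {text} coming up."


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def fake_animate(audio: bytes) -> List[str]:
    # One placeholder frame label per 10 bytes of audio.
    return [f"frame-{i}" for i in range(0, len(audio), 10)]


result = run_turn(b"flat white", fake_asr, fake_llm, fake_tts, fake_animate)
print(result.reply_text)
```

Passing each stage in as a plain callable keeps the ordering explicit and lets any one stage be swapped out without touching the others.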

The Hardware: A 55-Inch Interactive Display

The virtual human is rendered on a 55-inch freestanding display screen positioned alongside the Espretto kiosk. The screen is powered by an RTC (Real-Time Computing) chip, which handles the low-latency processing required for fluid, real-time avatar animation and voice interaction.

The display is designed to be visually striking — tall, slim, and premium — making it a natural focal point in any environment. Whether positioned in a corporate lobby, airport terminal, or retail space, it commands attention and invites interaction.

Customers simply speak naturally. The virtual human responds in real time, creating a conversational experience that feels genuinely human.

An Optional Add-On for Espretto Partners

The virtual human display is offered as an optional feature for Espretto franchise and commercial partners. It is designed to complement the robotic kiosk rather than replace any part of it.

For high-traffic or premium locations — airports, flagship corporate campuses, luxury retail environments — the virtual human adds a layer of engagement and brand presence that a standard kiosk cannot match.

For partners who prefer a simpler deployment, the core Espretto kiosk operates fully independently without it.

What the Virtual Human Can Do

The NVIDIA ACE-powered avatar is capable of a wide range of interactions:

Menu guidance — Describing drinks, ingredients, and customisation options in natural language.

Order taking — Accepting voice orders and confirming them before brewing begins.

Recommendations — Suggesting drinks based on time of day, customer preferences, or current promotions.

Brand storytelling — Sharing the Espretto story, the ORO No.01 blend, and the technology behind the kiosk.

Ambient presence — When not actively engaged, the avatar maintains a calm, welcoming presence — far more inviting than a static screen.
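Two of the behaviours listed above, recommendations and confirm-before-brewing order taking, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Espretto's implementation: the drink names, the time bands, and the `recommend` and `Order` names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import List, Optional


def recommend(now: time, promotion: Optional[str] = None) -> str:
    """Suggest a drink: a current promotion wins, otherwise use time of day.

    The drinks and cut-off times are illustrative assumptions only.
    """
    if promotion:
        return promotion
    if now < time(11, 0):
        return "cappuccino"
    if now < time(16, 0):
        return "iced latte"
    return "decaf flat white"


@dataclass
class Order:
    """A voice order that must be confirmed before brewing begins."""
    items: List[str] = field(default_factory=list)
    confirmed: bool = False

    def add(self, drink: str) -> None:
        if self.confirmed:
            raise ValueError("order already confirmed")
        self.items.append(drink)

    def summary(self) -> str:
        # Text the avatar would read back to the customer.
        return "You ordered: " + ", ".join(self.items) + ". Is that right?"

    def confirm(self, customer_said_yes: bool) -> bool:
        """Return True only when a non-empty order has been approved."""
        self.confirmed = customer_said_yes and bool(self.items)
        return self.confirmed


order = Order()
order.add(recommend(time(8, 30)))
print(order.summary())
ready_to_brew = order.confirm(customer_said_yes=True)
```

Keeping confirmation as a separate step means nothing is sent to the kiosk until the customer has heard the order read back and agreed to it.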

Why This Matters

The coffee industry has long grappled with a tension between automation and human warmth. Customers value speed and consistency, but they also value being acknowledged and guided.

The NVIDIA ACE virtual human resolves this tension. It delivers the efficiency of automation with the warmth of human interaction — at scale, with perfect consistency, and without the cost of additional staffing.

For operators, it also opens new possibilities: the avatar can be customised with different personas, languages, and personalities to suit the location and brand.

A Glimpse of the Future

Espretto's virtual human is part of a broader vision for what the café of the future looks like: intelligent, interactive, and deeply human in feel — even when no human is present.

As NVIDIA continues to advance its ACE platform and as real-time AI becomes faster and more capable, the gap between digital and human interaction will continue to narrow.

Evonova Group is proud to be among the first in the hospitality sector to deploy this technology at commercial scale, bringing the future of coffee service to life, one conversation at a time.


The Espretto virtual human display is available as an optional add-on for commercial and franchise partners. For enquiries, contact [email protected].

Evonova Group