Original title: Why AI Characters & Virtual Influencers Are the Next Frontier in Video ft Hedra's Michael Lingelbach
Moderators: Justine Moore, Matt Bornstein, a16z
Guest: Michael Lingelbach
Compiled and edited by Janna and ChainCatcher
Editor's Note
Michael Lingelbach, founder and CEO of Hedra, a former Stanford University computer science PhD student and stage actor, combines technology with a passion for performance to lead Hedra in developing industry-leading generative audio and video models. Hedra specializes in full-body, dialogue-driven video generation. Its technology supports a wide range of applications, from virtual influencers to educational content, significantly lowering the barrier to entry for content creation. This article, adapted from the a16z podcast, focuses on how AI technology has transitioned from viral memes to enterprise-level applications, showcasing the revolutionary potential of generative audio and video technology.
The following is the conversation, compiled and edited by ChainCatcher (with some deletions).
TL;DR
- AI is bridging consumer and enterprise scenarios: a Hedra-generated "talking baby" ad promoting enterprise software shows how eagerly businesses are embracing the technology.
- Viral meme content has become a powerful tool for startups; formats like the "Baby Podcast" build brand awareness quickly and showcase inventive marketing.
- Full-body, dialogue-driven video generation fills a gap in the content creation stack and sharply reduces the time and cost of production.
- Creators such as actor Jon Lajoie use Hedra to build distinctive digital characters, from a "Moses Podcast" to baby podcasts, giving their content a recognizable personality and appeal.
- Content creators such as "mom bloggers" use the technology to produce videos quickly, keeping their brands active and staying connected with their audiences.
- A real-time interactive video model enables two-way dialogue with virtual characters, bringing immersive experiences to education and entertainment.
- Character-centric video generation focuses on individual expression and multi-subject control to serve dynamic content creation.
- A platform strategy that integrates dialogue, motion, and rendering delivers a smooth generative media experience that meets the bar for high-quality content.
- Interactive avatar models allow the emotions and elements of a video to be adjusted dynamically, heralding the next wave of innovation in content creation.
1. AI Integration from Meme to Enterprise Applications
Justine: We're seeing fascinating intersections between AI applications in consumer and enterprise settings. A few days ago, I saw a Hedra-generated ad in Forbes featuring a talking baby promoting enterprise software. It's a sign that we're in a new era, with businesses embracing AI with real enthusiasm.
Michael: As a startup, our role is to draw inspiration from usage signals from consumer users and turn them into the next generation of content production tools that business users can rely on. In the past few months, some of the viral content generated with Hedra has attracted widespread attention, from early anime-style characters to "baby podcasts" to this week's hot trend - I'm not really sure what it is. Memes are a very effective marketing strategy: they capture users' attention quickly by reaching a large audience, and the approach is becoming more and more common among startups. For example, Cluely, another a16z-backed company, has gained significant brand recognition through viral spread on Twitter. The essence of the meme is that the technology gives people a vehicle for rapid creativity, and short video content now dominates cultural consciousness. Hedra's generative video technology lets users turn any idea into content in seconds.
2. Why creators and influencers choose Hedra
Justine: Can you explain why people use Hedra to make memes, how they use it, and how that relates to your target market?
Michael: Hedra is the first company to deploy a full-body, dialogue-driven generative video model at scale. We've enabled millions of pieces of content creation, and our rapid popularity stems from filling a critical gap in the content creation technology stack. Previously, creating generative podcasts, animated character dialogue scenes, or singing videos was difficult, expensive, inflexible, or time-consuming. Our model is fast and affordable, and it's fueled the rise of virtual influencers.
Justine: CNBC recently published an article about virtual influencers powered by Hedra. Can you give some specific examples of how influencers are using Hedra?
Michael: For example, actor Jon Lajoie (who plays Taco in "The League") has used Hedra to create content ranging from a "Moses Podcast" to a "Baby Podcast," and those characters now possess distinct identities. Another example is Neural Viz, which built a "metaverse" centered on character identity using Hedra. Generative performance differs from a plain media model in that individuality, consistency, and control have to be instilled into the model, which is particularly important for video performance. As a result, we're seeing the unique personalities of these virtual characters take off, even though they aren't real people.
3. Virtual Influencers and Digital Avatars
Matt: I've seen a lot of Hedra videos on Instagram Reels, both featuring completely new characters like the aliens in the Neural Viz series—something only Hollywood productions could achieve in the past—and real people using these tools to expand their digital presence. Many influencers and content creators don't want to go through the hassle of dressing up, adjusting lighting, and applying makeup every time. Hedra allows people like mom bloggers to quickly create videos to convey their message without having to spend a lot of time preparing. For example, they can use Hedra to create content that speaks directly to the camera.
Michael: That's a really important observation. Maintaining a personal brand is crucial for content creators, but staying online 24/7 is incredibly difficult. If a creator stops updating for a week, they risk losing followers. Hedra's automation technology significantly lowers the barrier to entry for creators. Users can generate scripts using tools like Deep Research, then use Hedra to generate audio and video content and automatically publish it to their channels. We're seeing more and more workflows around self-sovereign digital identities, both for real people and completely fictional characters.
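To make that workflow concrete, here is a minimal sketch of the automation loop described above: an LLM drafts a script, a generative video service renders the character delivering it, and the clip is handed off for publishing. The endpoint URLs, parameter names, and response fields below are hypothetical placeholders for illustration, not Hedra's actual API.

```python
# Hypothetical sketch of the "script -> video -> publish" loop; endpoints,
# parameter names, and response fields are placeholders, not Hedra's real API.
import requests

LLM_API = "https://api.example-llm.com/v1/complete"         # placeholder LLM endpoint
VIDEO_API = "https://api.example-video.com/v1/generations"  # placeholder video endpoint


def draft_script(topic: str, persona: str) -> str:
    """Ask an LLM to write a short, on-brand monologue for the character."""
    resp = requests.post(LLM_API, json={
        "prompt": f"Write a 60-second script about {topic} in the voice of {persona}."
    })
    resp.raise_for_status()
    return resp.json()["text"]


def render_character_video(script: str, character_id: str, voice_id: str) -> str:
    """Request a dialogue-driven character video; return a URL to the rendered clip."""
    resp = requests.post(VIDEO_API, json={
        "character_id": character_id,  # a saved character image/identity
        "voice_id": voice_id,          # a cloned or stock voice
        "script": script,
    })
    resp.raise_for_status()
    return resp.json()["video_url"]


def publish(video_url: str, channel: str) -> None:
    """Hand the finished clip to a scheduling/publishing service (stubbed out here)."""
    print(f"Queued {video_url} for publishing to {channel}")


if __name__ == "__main__":
    script = draft_script("this week's AI news", persona="a cheerful virtual host")
    clip = render_character_video(script, character_id="host-01", voice_id="voice-01")
    publish(clip, channel="youtube")
```

In practice, the publish step would call each platform's upload API on a schedule, which is what keeps a channel active without the creator being online around the clock.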
4. Potential and Challenges of Interactive Video
Justine: There are a lot of historical videos trending on Reels right now. In the past, we learned about history by reading history books, but that was a bit boring. If we could tell history through characters and show generative video scenes, the experience would be much more engaging.
Michael: Although we don't directly target the education sector, many education companies have developed applications based on our API. Video interactions have much higher engagement rates than text. We recently launched a real-time interactive video model, the first product to achieve a low-latency audio and video experience. From language learning to personal development applications, once the technology cost is low enough, it will completely change the way users interact with large language models (LLMs). My personal favorite project is "Chatting with your favorite book or movie character." For example, you can ask, "Why did you walk into that dark room when you knew there was a murderer?" This interactive experience is richer than traditional audiobooks because users can ask questions and revisit the content, making the experience more vivid.
Justine: The search space for video models is enormous. Generating a single image is already complex; generating a continuous 120-frame video is far harder. Hedra focuses on a unique and interesting problem that stands apart from other video models. Can you describe how you define that problem and what inspired it?
Michael: That's a great question. We're seeing specialization emerge at the base model layer, with Claude becoming the benchmark for programming models, OpenAI providing general-purpose assistants, and Gemini serving enterprise scenarios thanks to its cost-effectiveness and speed. Hedra has a similar positioning in the video model space. Our base models are highly performant, especially the next-generation models, which provide tremendous flexibility for content creation. But we're more focused on bringing content to life, encouraging users to interact with it and experience a consistent personality and appeal. The key lies in integrating the intelligence of the characters in the video with the rendering experience. My vision is for users to communicate with the characters in a video in a two-way manner, with characters possessing unique, programmable personalities. This requires vertical integration: not only optimizing the core models but also rethinking the future user interaction experience.
5. The "Character-Centric" Video Model and Subject Control
Michael: I come from a theater background. While not a professional actor, I'm passionate about character acting. Video is at the core of our daily interactions, whether it's advertising, online courses, or faceless channels powered by Hedra. A sense of connection is crucial. We're making it easy for everyday users to create content by lowering the barrier to entry and accelerating the process. In the future, the line between model intelligence and rendering will blur, and users will engage in dialogue with systems that understand their intent. We view characters as the core unit of control, not just videos. This requires collecting user feedback, optimizing character realism and expressiveness, and providing control levers for multiple agents.
Matt: I spend a lot of time creating characters for different videos, and the power of Hedra lies in its integrated character creation tools. You can create or upload character images, save them for later use, change their context, or clone their voices. Many of my YouTube videos and tutorials open with a Hedra clone of my voice. This integrated experience is particularly valuable in a fragmented generative media market.
6. Building an Integrated Generative Media Platform
Justine: Many companies like Black Forest Labs have achieved technological breakthroughs, but they still need partners like Hedra to deliver the experience to consumers and businesses. How did you decide to build an integrated platform, rather than being limited to a single technology?
Michael: It comes down to focus and user needs. When I founded Hedra, I found it very difficult to integrate dialogue into media. Previously, users had to layer lip synchronization on top of footage to create short videos, which lacked a sense of unity. Our technical insight was to unify signals such as breathing and gestures with dialogue to create a more natural video model. From a market perspective, we've observed that willingness to pay differs across applications: some popular applications may have a low willingness to pay, while certain segments (such as content creators) have a strong demand for a high-quality experience. We choose to integrate the best technology, whether it's Hedra's or a partner's such as ElevenLabs, to ensure users get the best experience.
Matt: In the future, will AI characters have text, scripts, voice, and vision generated by a single model?
Michael: I believe the industry is moving toward a multimodal input-output paradigm. The challenge with a single model is control: users need to precisely adjust details like voice, pitch, or rhythm. Decoupled inputs offer more control, but the future may lie in omnimodal models where users can adjust each modality with guidance signals.
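To illustrate that distinction, the sketch below contrasts the two paradigms. The field names and values are hypothetical, intended only to show how decoupled per-modality inputs differ from a single omnimodal request steered by guidance weights.

```python
# Hypothetical request shapes, for illustration only (not a real API).

# Decoupled inputs: each modality is specified and tuned independently.
decoupled_request = {
    "script": "Welcome back to the channel!",
    "voice": {"id": "voice-01", "pitch_shift": 2, "speaking_rate": 0.95},
    "motion": {"gesture_intensity": 0.6, "eye_contact": "camera"},
    "render": {"style": "photoreal", "resolution": "1080p"},
}

# Omnimodal: a single prompt drives every modality, and per-modality
# guidance weights nudge how strongly each aspect follows that prompt.
omnimodal_request = {
    "prompt": "An upbeat host welcomes viewers back, speaking quickly but warmly.",
    "guidance": {"voice": 0.8, "motion": 0.5, "render": 0.9},
}
```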
7. The Future of Interactive Video
Justine: I'm impressed by Hedra's ability to generate long videos. You can upload several minutes of audio and generate a character dialogue video, adjusting the image and voice independently instead of having to regenerate everything at once. That level of control makes me excited about the future of interactive video.
Michael: I'm excited about the interactive avatar model we just launched. In the future, users will be able to shape video elements like a fluid canvas, for example, pausing the video and asking the character to be sadder during a particular line. This two-way communication will create a next-generation experience and will be available soon.
Matt: Is a true AI actor possible, where users interact with the character in real time and give it direction?
Michael: Absolutely. But the current limitation isn't the video model; it's the personality realism of the large language model. Existing AI companions (such as Character AI) still feel noticeably artificial. Achieving truly interactive digital characters will require further research on configurable personalities.
8. Hedra's Audio Generation and AI-Native Applications
Justine: Hedra's videos are stunning, but the audio is sometimes lackluster. ElevenLabs' latest model has improved audio quality, but the appeal of the content still needs work.
Michael: Audio generation is an underexplored field. Currently, generative speech is mostly used for narration or dubbing, but generating natural conversation in scenarios like a noisy cafe remains challenging. We need audio models that can control ambient sound and multi-turn conversation to make video creation feel more natural. Video AI is still in its early stages. Just as early CGI once seemed realistic and now looks cartoonish, our first-generation models once amazed me but now seem crude. Achieving highly controllable, cost-effective, real-time models is still a work in progress.
Matt: Would users prefer to interact with real humans, simulated humans, or cartoon characters?
Michael: We've generated a lot of fluffy balls and cat characters. Hedra's unified model can handle a wide variety of characters, from rocks to robots, giving users the freedom to experiment and create unprecedented content. We built a unified model, rather than traditional video plus lip sync, so users aren't boxed in by technical constraints. They can try a "talking rock" or a "robot-human podcast," and the model automatically handles the dialogue and personality. That flexibility has inspired revolutionary consumer scenarios.
Justine: The crossover applications of AI are exciting. Consumer-generated content like the "Baby Podcast" is inspiring enterprise applications. I was amazed to see a Hedra-generated baby ad promoting enterprise software in Forbes. This demonstrates how quickly businesses are embracing AI, and we need to translate consumer signals into enterprise-grade solutions.
Michael: Enterprise is our fastest-growing area. Generative AI is reducing content creation time from weeks to near real-time. For example, automated news anchors are changing how information is disseminated. In the past, local news was hard to sustain because of high costs, but now a single person can run a news channel. This "mid-scale personalization" caters to specific demographics, such as targeted advertising for local restaurants or theme parks, and is more effective than the overly personalized Google model.
9. The Founder’s Path: Challenges, Passion, and Collaborative Innovation
Justine: What has your experience been like as a founder? What challenges and rewards have you encountered?
Michael: In San Francisco, founder life is often glamorized as a romantic journey of building groundbreaking technology. Coming from a small town in Florida, I never imagined I'd be on this path. But being a founder is tough 99% of the time. You have to keep pushing, and the problems never stop, from unglamorous behind-the-scenes engineering to a flood of support emails. It's physically exhausting, but the inner satisfaction is unparalleled. I love my users and my team and can't imagine doing anything else. It's "type two fun" - like climbing a snowy mountain, your hands and feet hurt, but you still want to come back after reaching the summit. I get into the office at 7:30 am and leave at 10 pm, sometimes still discussing features at 2 am. It means giving up the boundary between work and life, but the passion keeps me going.
Matt: Why do you still code yourself? Is it to express your ideas or to communicate with your team?
Michael: Both. Prototyping helps me quickly validate ideas and clearly communicate expectations. As a leader, clear communication is crucial. I discuss edge cases with designers to ensure the system is scalable. Coding allows me to stay connected with the team, understand their challenges, and quickly explore product directions.