Technical Product Manager — AI Voice and Speech Technologies
Upwork

Remote
About
We are a New York-based AI startup building a revolutionary language learning platform. Led by a founder who previously scaled a company to $100M+ in revenue, we’re creating a personalized, content-driven experience for users around the world.

We’re looking for a Technical Product Manager with experience in AI voice and audio technologies to lead advanced research initiatives that will define the future of Ella’s learning experience. This role sits at the intersection of product sense, speech engineering, and experimentation. Your work will push the boundaries of how voice can be adapted, slowed, shaped, and personalized for language learners at every level.

Scope of work:
You’ll own research and product exploration across Ella’s voice layer, designing methods that make AI-generated speech more intuitive, more learnable, and more human. You’ll turn ambiguous voice problems into structured research tracks, then drive them to concrete, testable solutions.

You will:
- Define audio research directions such as controllable slow speech for beginners, adaptive prosody shaping, emotional tone tuning, and personalized voice variants.
- Work closely with AI engineers to investigate and validate approaches in TTS prosody manipulation, phoneme-level timing control, and model-based speech rate regulation.
- Break down complex voice problems into experiments, create clear success criteria, evaluate outputs, and iterate based on both learner needs and linguistic constraints.
- Translate product and user needs into technical requirements, especially around the clarity, comprehension, cognitive load, and learning efficiency of AI-generated speech.
- Define requirements for internal tooling that controls parameters such as speaking rate, intonation, emphasis, rhythm, and timing intervals across different languages.
- Benchmark different TTS models and evaluate their performance on clarity, latency, linguistic accuracy, and cross-language consistency.
- Document findings in repeatable frameworks, enabling Ella to scale voice research across languages, accents, and proficiency tiers.

Your work will directly shape how millions of learners hear language, transforming voice from a static output into a fully adaptive, hyper-personalized learning companion.

What we are looking for:
- Proven experience managing technical AI/audio projects, especially around TTS, prosody, or speech processing.
- Ability to break down research hypotheses and translate abstract user needs into measurable audio behaviors.
- Experience collaborating with speech/ML engineers and evaluating model quality through structured experiments.
- Familiarity with AI voice tooling (ElevenLabs, Coqui, OpenAI TTS, etc.) and/or low-level DSP concepts.

This is a freelance, part-time, project-based role with the potential to become a long-term, full-time position. Fully remote.