Navigating the Future of Synthetic Speech

On Friday, March 29, 2024, OpenAI released a post concerning the challenges and opportunities facing synthetic speech. The article delves into the potential of speech synthesis technology while also highlighting the need to address societal resilience against the evolving challenges posed by increasingly convincing generative models. To tackle concerns surrounding the potential misuse of synthetic voices, they propose four steps going forward.

With over two decades of experience in the synthetic speech industry, CereProc has gained valuable insights into ethical and safety measures. In this article, I aim to demonstrate how CereProc aligns with the suggested steps outlined by OpenAI, illustrating our commitment to responsible innovation in speech synthesis technology.

1. Phasing out voice-based authentication as a security measure for accessing bank accounts and other sensitive information.
The first suggestion underscores the growing vulnerability of voice-based authentication to synthetic voice manipulation. CereProc has taken proactive measures to mitigate this risk, implementing stringent protocols to prevent the misuse of our technology. CereProc strictly prohibits users from creating voices with the intent of impersonation. Our CereVoice Me voice cloning service does not allow the upload of pre-recorded audio; instead, all data must be recorded live by the user through our designated CereVoice Me recording interface. This measure prevents users from building a synthetic voice from audio of other speakers, or from audio they do not rightfully own.

Furthermore, our unit selection synthesis output is watermarked, ensuring traceability and identifying it as synthesis from CereProc. Continual refinement of these security measures remains a priority for CereProc as we safeguard against emerging threats to voice authentication systems.
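The post does not describe how CereProc's watermark actually works, and production schemes are typically proprietary. Purely as a hypothetical illustration of the general idea, the sketch below hides a short vendor tag (`CPRC`, an invented identifier, not CereProc's real mark) in the least significant bits of 16-bit PCM samples and reads it back out:

```python
# Hypothetical illustration only: CereProc's actual watermarking scheme is
# not described in the post. This sketch embeds a short ASCII identifier
# into the least significant bits of PCM samples, a classic (and easily
# stripped) LSB watermark.

WATERMARK = "CPRC"  # invented vendor tag for illustration

def embed_watermark(samples, tag=WATERMARK):
    """Return a copy of `samples` with `tag` written into the LSBs."""
    bits = [(byte >> i) & 1 for byte in tag.encode("ascii") for i in range(8)]
    out = list(samples)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # overwrite the least significant bit
    return out

def extract_watermark(samples, length=len(WATERMARK)):
    """Read `length` ASCII bytes back out of the sample LSBs."""
    chars = []
    for b in range(length):
        byte = 0
        for i in range(8):
            byte |= (samples[b * 8 + i] & 1) << i
        chars.append(chr(byte))
    return "".join(chars)
```

A watermark deployed in practice would need to survive re-encoding, resampling, and added noise, so real systems favour perceptually robust techniques such as spread-spectrum embedding over raw LSB manipulation; this sketch only conveys the embed-and-detect principle.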

2. Exploring policies to protect the use of individuals' voices in AI
At CereProc, we adhere strictly to ethical guidelines in the development of voices for AI applications. Our voices are crafted exclusively from legally obtained audio data, and we always obtain consent from the original speaker before making any voices publicly available. While all CereProc voices are recorded for the specific purpose of building text-to-speech functionality, it is essential to acknowledge that we cannot exert full control over how users employ these voices. Consequently, the implementation of cross-border legislation is crucial to ensure the protection of individuals' voices in AI applications.

3. Educating the public in understanding the capabilities and limitations of AI technologies, including the possibility of deceptive AI content
Admittedly, we have not been especially proactive in this area in the past, but we have recently begun a focused effort to share information about our technologies and their ethical implications through our website and blog. By publishing this content regularly, we aim to raise awareness and improve public understanding of what our AI technologies can and cannot do, empowering individuals to make informed decisions in their interactions with AI.

Moving forward, you can expect to see more from us when it comes to publishing educational material. Keep an eye out for our future posts here.

4. Accelerating the development and adoption of techniques for tracking the origin of audio-visual content, so it's always clear when you're interacting with a real person or with an AI.
The final suggestion involves expediting the development and adoption of techniques for tracking the origin of audio-visual content. At CereProc, we are working to improve traceability through watermarking. Currently, our unit selection voices are watermarked, enabling us to identify and track them effectively.

Looking ahead, we intend to extend watermarking to all our AI voices. I believe that discerning whether you're interacting with a real person or AI synthesis should occur at a higher level than the synthesis engine itself, such as within an application or at the initiation of a phone call to your bank. This distinction can be facilitated through the presence of a watermark.

To conclude, as pioneers in the synthetic speech domain, CereProc remains steadfast in our commitment to ethical innovation and responsible AI deployment. By aligning with the proposed steps and leveraging our expertise, we strive to foster societal resilience against the challenges posed by synthetic voices while harnessing the transformative potential of speech synthesis technology. Together, through collaboration and action, we can navigate the evolving landscape of AI-driven communication with both integrity and transparency.

If you have any further questions, please don’t hesitate to leave a comment, or contact us at info@cereproc.com.

All trademarks, registered trademarks, or service marks belong to their respective holders.