Aizip Intelligent Audio

Aizip provides a wide range of AI models for audio-related applications, such as keyword spotting, speaker identification, and deep noise reduction.

Selected examples are shown below. More are available upon request.

Deep Noise Reduction ZenVoice

The deep noise reduction (DNR) model, ZenVoice, delivers excellent performance in removing surrounding noise during voice calls.

The model runs on low-cost processors such as the ARM Cortex-M7 and up, or on comparable RISC-V, NPU, DSP, and FPGA devices.

The model handles various types of noise, including wind noise, which has no fixed source. It supports both single-microphone and multi-microphone setups.

Key- and Wake-Word Spotting

This model recognizes predefined wake words to wake up a system and predefined keywords to execute commands.

This class of highly efficient tinyML models can run on ARM Cortex-M0 and higher processors, or on comparable RISC-V, NPU, DSP, and FPGA devices.

These models can be further developed to provide Spoken-Language Understanding (SLU) and Automatic Speech Recognition (ASR).

Speaker Identification

This model identifies a speaker from a list of registered people and performs the identification in real time.

This efficient and robust model can run on ARM Cortex-M0 and higher processors, or on comparable RISC-V, NPU, DSP, and FPGA devices.

Two models are available, one for fixed words and another for general speech, covering a range of applications from toys to customized services.

Baby Crying Detection

This model accurately detects the sound of a baby crying, for baby safety and care, with a very low false-positive rate.

This efficient and robust model can run on ARM Cortex-M0 and higher processors, or on comparable RISC-V, NPU, DSP, and FPGA devices.

The model can also be adapted to detect other sound events, such as glass breaking, snoring, and gunshots.

Song-to-Karaoke Conversion

This model turns songs into karaoke tracks, live and locally, providing an excellent user experience.

This entertainment model requires ARM Cortex-A series or higher processors, or comparable RISC-V, NPU, DSP, and FPGA devices.

The model can also be used to extract other types of sound, such as a guitar, to support a variety of tools for music rehearsal.