- AudioVisual Recognition (Embedded) (Server Based)
  (Combination of Speaker, Speech, Face Recognition, and Object Detection and Recognition with a single interface)
RecoMadeEasy® (Reco Made Easy)
Server-Based AudioVisual Recognition
Platform:
RecoMadeEasy® (Reco Made Easy) Embedded AudioVisual Recognition is an embedded natural-language voice and video recognition engine that offers comprehensive conversational voice interaction, voice biometrics, and facial recognition. The engine has a small memory footprint and is designed to run natively on devices that need unconstrained natural-language interfaces with high recognition accuracy, even during service interruptions or when full, uninterrupted, and secure access to a cloud server is not guaranteed.
The RecoMadeEasy® AudioVisual Recognition engine comprises three distinct technologies: Speaker, Speech, and Facial Recognition, all developed in our research labs in New York. When presented with an audio, video, or audio-video stream, the engine returns the following through the API, in either XML or JSON:
- Speaker Segmentation of Incoming Audio, Video, or Both (including timestamps of the points where the speaker changes, and tagging of each audio, video, or combined segment with the ID of the person speaking in that segment)
- Audio and/or Visual Identification of speaker(s)
- Audio and/or Visual Verification of speaker(s)
- Full Transcription of the audio stream
The standalone engine may be used through a very simple C++ SDK and API, which is most useful for integrating the engine into existing products and IVR systems.
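As an illustration of how a client might consume the results listed above, the following Python sketch parses a JSON payload containing speaker segments and a transcript. The payload layout and every field name (`segments`, `start`, `end`, `speaker_id`, `transcript`) are assumptions made up for this example; the actual schema returned by the RecoMadeEasy® API is not described on this page and may differ.

```python
import json
from dataclasses import dataclass


@dataclass
class Segment:
    start: float      # segment start time, in seconds
    end: float        # segment end time, in seconds
    speaker_id: str   # ID of the person speaking in this segment


def parse_result(payload: str) -> tuple[list[Segment], str]:
    """Parse a hypothetical JSON result into speaker segments and a transcript."""
    data = json.loads(payload)
    segments = [
        Segment(float(s["start"]), float(s["end"]), str(s["speaker_id"]))
        for s in data.get("segments", [])
    ]
    return segments, data.get("transcript", "")


def speaking_time(segments: list[Segment]) -> dict[str, float]:
    """Total speaking time per speaker ID, in seconds."""
    totals: dict[str, float] = {}
    for seg in segments:
        totals[seg.speaker_id] = totals.get(seg.speaker_id, 0.0) + (seg.end - seg.start)
    return totals


# Hypothetical payload, invented for illustration only.
EXAMPLE = """
{
  "segments": [
    {"start": 0.0, "end": 4.5, "speaker_id": "alice"},
    {"start": 4.5, "end": 7.0, "speaker_id": "bob"},
    {"start": 7.0, "end": 9.0, "speaker_id": "alice"}
  ],
  "transcript": "hello there hi how are you"
}
"""

if __name__ == "__main__":
    segs, text = parse_result(EXAMPLE)
    print(speaking_time(segs))
    print(text)
```

Keeping the parsed segments in a small typed structure makes it straightforward to derive per-speaker statistics or to align the transcript with speaker changes, whichever output format the engine is configured to return.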
The engine is built to allow users to speak naturally and be understood, even in a far-field, noisy environment. RecoMadeEasy® (Reco Made Easy) is available as an SDK with an included API that contains all the components necessary for full integration, so that engineers can get started easily, without additional development work or cost.
The RecoMadeEasy® AudioVisual Recognition engine is also available as a server-side product and as a standalone product.
Speaker Recognition
Language- and Text-Independence: The speaker recognition system is completely text- and language-independent. This means that a user may enroll their voice in one language and be identified or verified in a completely different language. This lets the engine handle authentication and identification across any number of languages.
Large-Vocabulary Speech Recognition
The speech recognition side of the engine provides one of the most accurate transcriptions for English, handling many different dialects and accents in a single large-vocabulary transcription engine. It is also capable of real-time processing in a small memory footprint.
The speech recognition uses a streaming interface in which the recognizer, in the form of listeners, and the client both run on the embedded device. Any lightweight generic client capable of using a WebSocket interface may stream audio/video to a listener and get back real-time transcription results, with optional alternative results and likelihood scores. The input may use any codec supported by GStreamer 1.0, including MP3, Ogg Vorbis, Free Lossless Audio Codec (FLAC), MP4, Pulse Code Modulation (PCM), or other codecs such as those supported by the standard Waveform Audio File Format (WAVE).
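A client for a streaming interface of this kind typically sends audio in small chunks and reads results as they arrive. The Python sketch below shows only the client-side chunking of raw PCM audio. The `send` callable stands in for a WebSocket connection's send method, and the chunk duration, sample format, and empty end-of-stream message are assumptions chosen for illustration, not the documented protocol of the listener.

```python
from typing import Callable, Iterator


def pcm_chunks(audio: bytes, sample_rate: int = 16000,
               bytes_per_sample: int = 2, chunk_ms: int = 100) -> Iterator[bytes]:
    """Split mono PCM audio into fixed-duration chunks (the last may be shorter)."""
    chunk_size = sample_rate * bytes_per_sample * chunk_ms // 1000
    if chunk_size <= 0:
        raise ValueError("chunk_ms is too small for the given sample rate")
    for offset in range(0, len(audio), chunk_size):
        yield audio[offset:offset + chunk_size]


def stream_audio(audio: bytes, send: Callable[[bytes], None]) -> int:
    """Send audio chunk by chunk through `send`; return the number of audio chunks."""
    count = 0
    for chunk in pcm_chunks(audio):
        send(chunk)
        count += 1
    # Hypothetical end-of-stream marker: an empty message (an assumption).
    send(b"")
    return count


if __name__ == "__main__":
    sent: list[bytes] = []
    one_second_of_silence = bytes(16000 * 2)  # 16 kHz, 16-bit mono
    n = stream_audio(one_second_of_silence, sent.append)
    print(n, len(sent[0]))
```

Passing the transport in as a plain callable keeps the chunking logic independent of any particular WebSocket library, so the same code can be exercised against an in-memory list before it is connected to a real listener.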
Face Recognition
The facial recognition side of the engine provides face detection, face identification (open-set and closed-set), and facial verification from still images and video streams. It supports all standard image and video formats, such as PNG, JPEG, GIF, MP2, MP4, and MOV.
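The difference between closed-set and open-set identification can be illustrated with a small Python sketch. It assumes that each enrolled identity is represented by a feature vector and that vectors are compared by cosine similarity, which is a common textbook approach and not necessarily how this engine works. The function names and the threshold value are hypothetical.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_closed_set(probe: list[float],
                        gallery: dict[str, list[float]]) -> str:
    """Closed-set: the probe is assumed to be enrolled; always return the best match."""
    if not gallery:
        raise ValueError("gallery must contain at least one enrolled identity")
    return max(gallery, key=lambda name: cosine(probe, gallery[name]))


def identify_open_set(probe: list[float],
                      gallery: dict[str, list[float]],
                      threshold: float = 0.8) -> Optional[str]:
    """Open-set: return the best match only if it is similar enough, else None."""
    best = identify_closed_set(probe, gallery)
    return best if cosine(probe, gallery[best]) >= threshold else None


if __name__ == "__main__":
    # Toy two-dimensional "face features", invented for illustration.
    gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
    print(identify_closed_set([0.6, 0.7], gallery))
    print(identify_open_set([0.6, 0.7], gallery))
    print(identify_open_set([0.9, 0.1], gallery))
```

In the closed-set case the nearest enrolled identity is always returned, whereas the open-set case may reject the probe as unknown. Verification follows the same pattern, comparing the probe against a single claimed identity and applying a threshold.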
Supported Operating Systems
The RecoMadeEasy® Embedded AudioVisual Recognition engine is available for the following operating systems. The C++ SDK, command-line interface, and web services may be used on any of them:
Server and Desktop Operating Systems (64-bit and 32-bit):
- CentOS 8 and 7.9 Linux (Latest)
- Previous CentOS Linux versions: 7.3, 7.2, 7.1, 7.0, 6.6, 6.4, 6.3, 6.2, 5.7, 5.6, 5.4
- Fedora 40 Linux (Latest)
- Previous Fedora Linux versions: 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, Core 5, Core 4, Core 3, Core 2, Core
- Ubuntu 24.04 Linux (Latest)
- Previous Ubuntu Linux versions: 22.04, 20.04, 18.04, 16.04
- N.B.: May be made available for other Unix-like systems upon request
- Large-Vocabulary Speech Recognition (Embedded) (Server Based)
  Initially available for English, Spanish, Mandarin, Arabic, and German; now available for 100+ languages
  Also includes multilingual support and code-switching
  (Customizable-domain full transcription, ~300,000+ word vocabulary)
- Speaker Recognition (Embedded) (Server Based)
  (Language- and Text-Independent, aka: Speaker Biometrics, Voice Biometrics, or SIV)
  Recipient: Frost & Sullivan Award 2011
- Face Recognition (Embedded) (Server Based)
  (Face detection and recognition)
- Object Recognition (Embedded) (Server Based)
  (Object detection and recognition)