The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The badge also links to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:

Oct. 29, 2024

Filed:

Jun. 26, 2024

Applicant:

Sanas.ai Inc., Palo Alto, CA (US);

Inventors:

Lukas Pfeifenberger, Salzburg, AT;

Shawn Zhang, Palo Alto, CA (US);

Assignee:

SANAS.AI INC., Palo Alto, CA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G10L 21/007 (2013.01); G06F 3/16 (2006.01); G10L 13/00 (2006.01); G10L 13/033 (2013.01); G10L 15/02 (2006.01); G10L 15/06 (2013.01); G10L 15/16 (2006.01); G10L 15/26 (2006.01); G10L 21/003 (2013.01); G10L 21/01 (2013.01); G10L 21/013 (2013.01);
U.S. Cl.
CPC ...
G10L 21/007 (2013.01); G06F 3/162 (2013.01); G10L 13/00 (2013.01); G10L 13/033 (2013.01); G10L 15/02 (2013.01); G10L 15/063 (2013.01); G10L 15/16 (2013.01); G10L 15/26 (2013.01); G10L 21/003 (2013.01); G10L 21/013 (2013.01); G10L 21/01 (2013.01); G10L 2021/0135 (2013.01);
Abstract

The disclosed technology relates to methods, accent conversion systems, and non-transitory computer readable media for real-time accent conversion. In some examples, a set of phonetic embedding vectors is obtained for phonetic content representing a source accent and obtained from input audio data. A trained machine learning model is applied to the set of phonetic embedding vectors to generate a set of transformed phonetic embedding vectors corresponding to phonetic characteristics of speech data in a target accent. An alignment is determined by maximizing a cosine distance between the set of phonetic embedding vectors and the set of transformed phonetic embedding vectors. The speech data is then aligned to the phonetic content based on the determined alignment to generate output audio data representing the target accent. The disclosed technology transforms phonetic characteristics of a source accent to match the target accent more closely for efficient and seamless accent conversion in real-time applications.
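
The alignment step described in the abstract can be illustrated with a minimal sketch. This is not the patented method: it assumes that the abstract's "cosine distance" is used as a similarity score (higher means a closer match) and that alignment is a simple per-frame best match. The function names `cosine_similarity`, `align_frames`, and `apply_alignment` are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align_frames(source_embeddings, transformed_embeddings):
    """For each source frame, pick the index of the best-scoring transformed frame.

    Assumption: a greedy per-frame argmax stands in for whatever alignment
    procedure the patent actually claims.
    """
    alignment = []
    for src in source_embeddings:
        scores = [cosine_similarity(src, tgt) for tgt in transformed_embeddings]
        best_index = max(range(len(scores)), key=scores.__getitem__)
        alignment.append(best_index)
    return alignment


def apply_alignment(target_frames, alignment):
    """Reorder target-accent speech frames so they follow the source phonetic content."""
    return [target_frames[j] for j in alignment]


if __name__ == "__main__":
    # Toy 2-D embeddings; real phonetic embeddings would have many more dimensions.
    source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    transformed = [[0.0, 2.0], [3.0, 3.0], [5.0, 0.0]]
    alignment = align_frames(source, transformed)
    print(alignment)
    print(apply_alignment(["frame_a", "frame_b", "frame_c"], alignment))
```

A greedy per-frame match ignores temporal ordering. A monotonic alignment such as dynamic time warping would be a natural alternative, but the abstract does not say which approach is used.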

