The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.

The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.

Date of Patent:
Jan. 30, 2024

Filed:

Jan. 14, 2022
Applicant:

Baidu Usa, Llc, Sunnyvale, CA (US);

Inventors:

Hongliang Fei, Sunnyvale, CA (US);

Puxuan Yu, Amherst, MA (US);

Ping Li, Bellevue, WA (US);

Assignee:

Baidu USA LLC, Sunnyvale, CA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 17/00 (2019.01); G06F 7/00 (2006.01); G06F 16/2457 (2019.01); G06F 40/58 (2020.01); G06F 40/284 (2020.01);
U.S. Cl.
CPC ...
G06F 16/24578 (2019.01); G06F 40/284 (2020.01); G06F 40/58 (2020.01);
Abstract

Existing research on cross-lingual retrieval cannot take good advantage of large-scale pretrained language models, such as multilingual BERT and XLM. The absence of cross-lingual passage-level relevance data for finetuning and the lack of query-document style pretraining are some of the key factors of this issue. Accordingly, embodiments of two novel retrieval-oriented pretraining tasks are presented herein to further pretrain cross-lingual language models for downstream retrieval tasks, such as cross-lingual ad-hoc retrieval (CUR) and cross-lingual question answering (CLQA). In one or more embodiments, distant supervision data was constructed from multilingual texts using section alignment to support retrieval-oriented language model pretraining. In one or more embodiments, directly finetuning language models on part of an evaluation collection was performed by making Transformers capable of accepting longer sequences. Experiments show that model embodiments significantly improve upon general multilingual language models in at least the cross-lingual retrieval setting and the cross-lingual transfer setting.


Find Patent Forward Citations

Loading…