The patent badge is an abbreviated version of the USPTO patent document. It covers the following fields: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).
Patent No.:
Date of Patent:
May 02, 2023
Filed:
Aug. 19, 2021
Applicant:
Medidata Solutions, Inc., New York, NY (US);
Inventors:
Mandis Beigi, White Plains, NY (US);
Jacob Aptekar, Oakland, CA (US);
Afrah Shafquat, Jericho, NY (US);
Jason Mezey, New York, NY (US);
Assignee:
Medidata Solutions, Inc., New York, NY (US);
Abstract
A method for generating a synthetic dataset from an original dataset includes encoding categorical features of the original dataset, embedding the encoded dataset in a low-dimensional space, selecting a seed record from the embedded dataset, identifying a plurality of nearest neighbor records to the seed record, generating a new record by randomly selecting features from the plurality of nearest neighbor records, and concatenating the new record into the synthetic dataset. For a synthetic dataset that contains N records, which may be the same as or different from the number of records in the original dataset, the selecting, identifying, generating, and concatenating operations operate a total of N times on the records in the embedded dataset.
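The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and several choices are assumptions the abstract does not state: one-hot encoding for the categorical features, a Gaussian random projection as a stand-in for the low-dimensional embedding (chosen only to keep the sketch dependency-free), uniform random seed selection, Euclidean distance, the seed counting as one of its own nearest neighbors, and feature values being drawn from the neighbors' original (un-encoded) records. The function names and the parameters `k`, `dim`, and `seed` are hypothetical.

```python
import math
import random


def one_hot_encode(records, categorical_cols):
    """Expand categorical columns into 0/1 indicators; cast other columns to float."""
    # Sorted set of observed levels for each categorical column index.
    levels = {c: sorted({r[c] for r in records}) for c in categorical_cols}
    encoded = []
    for r in records:
        row = []
        for c, value in enumerate(r):
            if c in levels:
                row.extend(1.0 if value == lv else 0.0 for lv in levels[c])
            else:
                row.append(float(value))
        encoded.append(row)
    return encoded


def random_projection(rows, dim, rng):
    """Stand-in embedding (assumption): project rows to `dim` dimensions
    with a Gaussian random matrix."""
    width = len(rows[0])
    matrix = [[rng.gauss(0.0, 1.0) / math.sqrt(dim) for _ in range(dim)]
              for _ in range(width)]
    return [[sum(x * matrix[i][j] for i, x in enumerate(row)) for j in range(dim)]
            for row in rows]


def synthesize(records, categorical_cols, n_records, k=3, dim=2, seed=0):
    """Generate `n_records` synthetic records following the abstract's steps."""
    rng = random.Random(seed)
    embedded = random_projection(one_hot_encode(records, categorical_cols), dim, rng)
    n_features = len(records[0])
    synthetic = []
    for _ in range(n_records):
        # Select a seed record from the embedded dataset (uniformly: assumption).
        seed_idx = rng.randrange(len(records))
        # Identify the k nearest records; the seed itself is included (assumption).
        by_distance = sorted(
            range(len(records)),
            key=lambda i: math.dist(embedded[i], embedded[seed_idx]),
        )
        neighbors = by_distance[:k]
        # Build the new record by picking each feature from a random neighbor.
        new_record = tuple(records[rng.choice(neighbors)][c] for c in range(n_features))
        # Concatenate the new record into the synthetic dataset.
        synthetic.append(new_record)
    return synthetic


if __name__ == "__main__":
    # Made-up toy data, for illustration only.
    original = [("a", 1.0, "x"), ("b", 2.0, "y"), ("a", 3.0, "x"), ("c", 4.0, "y")]
    for rec in synthesize(original, categorical_cols=[0, 2], n_records=6, k=2):
        print(rec)
```

Because `n_records` is independent of `len(records)`, the synthetic dataset can be smaller or larger than the original, matching the abstract's note that N may differ from the number of original records. In this sketch numeric columns are not rescaled before embedding, so a real dataset would likely need normalization for the neighbor search to be meaningful.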