The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.

The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.

Date of Patent:
Sep. 18, 2007

Filed:

Mar. 06, 2003
Applicants:

Andreas Arning, Rottenburg, DE;

Christoph Lingenfelder, Herrenberg, DE;

Gregor Meyer, Loeningen, DE;

Dieter Roller, Schoenaich, DE;

Swen Wohland, Bischofswerda, DE;

Inventors:

Andreas Arning, Rottenburg, DE;

Christoph Lingenfelder, Herrenberg, DE;

Gregor Meyer, Loeningen, DE;

Dieter Roller, Schoenaich, DE;

Swen Wohland, Bischofswerda, DE;

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 17/30 (2006.01);
U.S. Cl.
CPC ...
Abstract

A system and method determine numerical representations for categorical data fields by taking advantage of the redundancy of the data records to allow automatic discovery of an order of the categories. A categorical data field is recoded by creating separate tables for each numerical data field occurring in the data records. The separate tables are sorted according to the numerical values of the respective data fields. The recoding of the categories is performed based on the average sort order of occurrences of the category in a specific sorted table. The standard deviation of the numerical codes provided by the categories is calculated for each of the separate recoding tables. The recoding table with the maximum standard deviation is selected as the recoding table to perform the recoding of the categories contained in the respective categorical data field of the data records. A plausibility check is performed for the selected recoding table by excluding the numerical data field that has formed the basis for the sorting of the respective table and recreating the recoding table from the data records. The resulting recoding table and the original recoding table are compared. Resulting recoding tables that are similar indicate a high level of confidence that the originally selected recoding table is optimal.


Find Patent Forward Citations

Loading…