
The patent badge is an abbreviated version of the USPTO patent document. It covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge also includes a link to the full patent document (in Adobe Acrobat PDF format).

Patent No.:

US 11218500 B1

Date of Patent:

Jan. 04, 2022

Filed:

Jul. 31, 2019

Methods and systems for automated parsing and identification of textual data

Applicant:

Secureworks Corp., Wilmington, DE (US);

Inventors:

Kyle Soeder, Chesapeake, VA (US);

Harlan Parrott, The Colony, TX (US);

Paul DiOrio, Wilmington, DE (US);

Bradley Skaggs, Wilmington, DE (US);

Assignee:

Secureworks Corp., Wilmington, DE (US);

Attorney:

Womble Bond Dickinson (US) LLP

Primary Examiner:

Beemnet W Dada

Int. Cl.

H04L 9/00 (2006.01); H04L 29/06 (2006.01); G06N 3/08 (2006.01)

U.S. Cl.

CPC: H04L 63/1425 (2013.01); G06N 3/08 (2013.01); H04L 63/20 (2013.01)

Abstract

A method and system for parsing and identifying security log message data can include receiving system-generated, unstructured or partially semi-structured security log data from a plurality of source systems and devices, including a variety of different source systems and/or devices. The message data is received from the various sources as raw log message data, in the form of a stream of bytes, by a parsing system that identifies and extracts character features of the incoming raw messages. The extracted character features are compiled into data structures that are evaluated by one or more models to determine segmentation boundaries and generate message tokens, which are further classified as including variable data field(s) or as template text strings. Template-categorized message tokens are used to provide message fingerprint information that characterizes the overall form of the message, and a classifier compares this fingerprint to a collection of previously stored/evaluated message fingerprints. If the message fingerprint is determined to match a stored fingerprint at or above a selected confidence level, the parsed message can be stored. Unidentified message forms/fingerprints can be routed to a labeling system for further analysis, the results of which are used to train and update the character identification and classification engines/models.
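The abstract describes a pipeline of raw bytes, then character features, then segmentation into tokens, then variable/template classification, then a fingerprint built from the template tokens, then a confidence-thresholded match against stored fingerprints, with unmatched messages routed to labeling. The Python sketch below is a hypothetical illustration of that flow, not the patented implementation. It swaps the trained models for simple heuristics: character classes decide segmentation boundaries, any token containing a digit is treated as a variable field, and match confidence is approximated with `difflib.SequenceMatcher` similarity. The names `segment`, `classify`, `fingerprint`, `FingerprintClassifier`, the `<VAR>` placeholder, and the 0.9 threshold are all assumptions.

```python
from __future__ import annotations

import difflib
import hashlib
from dataclasses import dataclass, field

# Hypothetical placeholder that stands in for variable fields in a fingerprint.
PLACEHOLDER = "<VAR>"


def char_class(ch: str) -> str:
    """Coarse character feature (a heuristic stand-in for learned features)."""
    if ch.isspace():
        return "space"
    if ch.isalnum() or ch in "._-:/":
        return "word"
    return "punct"


def segment(raw: bytes) -> list[str]:
    """Split raw log bytes into tokens wherever the character class changes."""
    text = raw.decode("utf-8", errors="replace")
    tokens: list[str] = []
    current, prev = "", None
    for ch in text:
        cls = char_class(ch)
        # A class change, or any punctuation character, closes the current token.
        if cls != prev or cls == "punct":
            if current and prev != "space":
                tokens.append(current)
            current = ""
        current += ch
        prev = cls
    if current and prev != "space":
        tokens.append(current)
    return tokens


def classify(tokens: list[str]) -> list[tuple[str, str]]:
    """Heuristic: tokens containing a digit are variable fields, others template text."""
    return [("variable" if any(c.isdigit() for c in t) else "template", t) for t in tokens]


def fingerprint(labeled: list[tuple[str, str]]) -> tuple[str, ...]:
    """Keep template tokens and collapse each variable field to a placeholder."""
    return tuple(t if label == "template" else PLACEHOLDER for label, t in labeled)


def fingerprint_id(fp: tuple[str, ...]) -> str:
    """Stable identifier for a fingerprint (SHA-256 of the joined tokens)."""
    return hashlib.sha256("\x1f".join(fp).encode("utf-8")).hexdigest()


@dataclass
class FingerprintClassifier:
    threshold: float = 0.9  # hypothetical "selected confidence level"
    known: dict[str, tuple[str, ...]] = field(default_factory=dict)
    parsed: list[dict] = field(default_factory=list)
    labeling_queue: list[bytes] = field(default_factory=list)

    def learn(self, raw: bytes) -> None:
        """Stand-in for retraining: store a labeled message's fingerprint."""
        fp = fingerprint(classify(segment(raw)))
        self.known[fingerprint_id(fp)] = fp

    def best_match(self, fp: tuple[str, ...]) -> tuple[str | None, float]:
        """Return the closest stored fingerprint id and its similarity in [0, 1]."""
        best_id, best_score = None, 0.0
        for fid, stored in self.known.items():
            score = difflib.SequenceMatcher(None, fp, stored).ratio()
            if score > best_score:
                best_id, best_score = fid, score
        return best_id, best_score

    def process(self, raw: bytes) -> str:
        labeled = classify(segment(raw))
        match_id, score = self.best_match(fingerprint(labeled))
        if match_id is not None and score >= self.threshold:
            fields = [t for label, t in labeled if label == "variable"]
            self.parsed.append({"fingerprint": match_id, "fields": fields})
            return "stored"
        # Unidentified form: hand off for labeling (and later retraining).
        self.labeling_queue.append(raw)
        return "labeling"


if __name__ == "__main__":
    clf = FingerprintClassifier()
    clf.learn(b"Connection from 10.0.0.5 port 51234 closed")
    print(clf.process(b"Connection from 192.168.1.9 port 443 closed"))  # stored
    print(clf.process(b"Disk /dev/sda1 is 91% full"))  # labeling
```

Using a similarity ratio rather than exact hash equality lets a message whose template differs by a token or two still match at a tunable confidence level. Calling `learn` on a queued message mimics the feedback loop in which labeling results update the models.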
