Built-In Data Identifiers

Umbrella provides a wide selection of built-in data identifiers that you can incorporate into DLP policies to classify many kinds of sensitive data, both structured and unstructured.

The built-in data identifiers can classify specific data values based on pattern matching, bloom filters, and dictionaries incorporating proximity terms.
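As a rough illustration of how pattern matching can be combined with proximity terms, the following Python sketch accepts a candidate value only when a related term appears nearby. The pattern, the term list, the window size, and the function name find_matches are all assumptions made for this example; they are not Umbrella's actual identifier definitions or matching logic.

```python
import re

# Hypothetical pattern and proximity terms, for illustration only.
# These are NOT the definitions Umbrella uses internally.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PROXIMITY_TERMS = ("ssn", "social security")
WINDOW = 40  # assumed number of characters inspected on each side of a match


def find_matches(text, pattern=SSN_PATTERN, terms=PROXIMITY_TERMS, window=WINDOW):
    """Return pattern matches that have at least one proximity term nearby."""
    hits = []
    for m in pattern.finditer(text):
        # Slice a context window around the candidate value.
        start = max(0, m.start() - window)
        end = min(len(text), m.end() + window)
        context = text[start:end].lower()
        # Keep the candidate only if a proximity term appears in the window.
        if any(term in context for term in terms):
            hits.append(m.group())
    return hits
```

In this sketch, find_matches("Employee SSN: 123-45-6789 on file") returns the candidate value, while the same digits in "Order number 123-45-6789 shipped" are ignored because no proximity term is nearby. Requiring a nearby term is one common way to reduce false positives from a bare pattern.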

The ML (machine learning) built-in data identifiers are based on an LLM (Large Language Model) that was trained to classify unstructured sensitive data based on its true context. The ML data identifiers work for DLP-supported file types (see Supported File and Form Types); they do not work for form data. The ML built-in data identifiers are:

Bank Statement
Consulting Agreement
CV/Resume
Employment Agreement
IRS Forms
Medical Power of Attorney
Mergers And Acquisitions

Built-in identifiers are not directly incorporated into DLP rules; you must first select and incorporate them into Data Classifications, which you then apply to DLP rules. (See Manage Data Classifications.)

The built-in data identifiers are available as an Excel table here. The table is updated frequently, so be sure to download the most recent version.

Tolerances

Some built-in data identifiers have three versions, each with a different tolerance level associated with it. Tolerance levels determine how many instances of an identifier must appear within a document for that document to be considered a match. There are three tolerance levels:

Lenient tolerance requires only a single instance of an identifier to appear within a document for that document to be considered a match.
Moderate tolerance requires more occurrences of an identifier than Lenient tolerance, but fewer than Strict tolerance. (The exact number varies depending upon the identifier.)
Strict tolerance requires the most occurrences of an identifier to appear within a document for that document to be considered a match. (The exact number varies depending upon the identifier.)

For identifiers that support tolerance levels, you can choose the tolerance level that best matches your level of concern for catching every occurrence of those identifiers.
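The effect of tolerance levels can be pictured as a per-identifier threshold on the number of instances found in a document. The Python sketch below is a minimal illustration under stated assumptions: only the Lenient threshold of 1 comes from the description above, while the Moderate and Strict values (3 and 5) are placeholders, because the real numbers vary by identifier. The names Tolerance, THRESHOLDS, and document_matches are hypothetical.

```python
from enum import Enum


class Tolerance(Enum):
    LENIENT = "lenient"
    MODERATE = "moderate"
    STRICT = "strict"


# Assumed thresholds for a single hypothetical identifier.
# Lenient = 1 follows the documentation; the other two values are
# placeholders, since the actual counts vary by identifier.
THRESHOLDS = {
    Tolerance.LENIENT: 1,
    Tolerance.MODERATE: 3,
    Tolerance.STRICT: 5,
}


def document_matches(instance_count, tolerance, thresholds=THRESHOLDS):
    """Return True if the document has enough instances for this tolerance."""
    return instance_count >= thresholds[tolerance]
```

With these placeholder values, a document containing two instances would match at Lenient tolerance but not at Moderate or Strict. Lenient tolerance therefore flags the most documents, and Strict tolerance flags only documents with many instances.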
