Manage Data Classifications

The Data Loss Prevention (DLP) policy monitors or blocks content based on the rules configured for the policy. The rules use data identifiers and data classifications, which describe the type of data to be monitored or blocked. The file and form types supported by the DLP policy are listed here.

Data identifiers describe the content the Data Loss Prevention policy monitors or blocks. Data identifiers can describe personally identifiable information (PII), such as financial account numbers, medical records, passport or government identification numbers, or credit card numbers. Data identifiers can also describe other content an organization may wish to monitor or block within its network traffic, such as discriminatory or aggressive content. Umbrella provides a collection of built-in data identifiers (see Built-in Data Identifiers and Individual Data Identifiers), and you can create custom identifiers based on the built-in data identifiers (see Copy and Customize a Data Identifier).

Data classifications are groups of data identifiers combined for the purpose of monitoring or blocking closely related content. For example, you can create a data classification that encompasses medically related content by including the built-in identifiers for ICD codes, drug names, prescription names, health conditions, and national drug code names. The classification, when applied to a rule in the Data Loss Prevention Policy, will monitor or block content matching those identifiers.
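The relationship between identifiers, classifications, and rules can be pictured as a small data model. The following is only an illustrative sketch: the class names, field names, and identifier labels are hypothetical and are not Umbrella's API or its exact built-in identifier names.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class DataClassification:
    """Hypothetical model: a named group of data identifiers."""
    name: str
    identifiers: tuple[str, ...]


@dataclass(frozen=True)
class DlpRule:
    """Hypothetical model: a rule applies one classification with one action."""
    name: str
    classification: DataClassification
    action: Literal["monitor", "block"]


# Illustrative labels taken from the medical example above,
# not the exact names of Umbrella's built-in identifiers.
medical = DataClassification(
    name="Medical Content",
    identifiers=(
        "ICD codes",
        "drug names",
        "prescription names",
        "health conditions",
        "national drug code names",
    ),
)

rule = DlpRule(name="Block medical content", classification=medical, action="block")
print(rule.action, len(rule.classification.identifiers))
```

In this sketch the classification is defined once and referenced by the rule, which mirrors how a single classification groups closely related identifiers for use in a policy rule.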

Using the inclusion and exclusion options on the Data Classification page, you can fine-tune your classification to be more precise and reduce false positives. You can exclude specific terms and regular expression (regex) patterns by creating a custom identifier in the exclusion area, or by excluding a pre-existing built-in identifier.

The exclusion applies only to the content that's been matched, not to every document that meets the exclusion criteria.

For example, consider a data classification that targets the built-in identifier Health Condition and Person Name (US). You want to block instances of "John Smith cancer" while allowing instances of "John Smith cancer fundraising." To achieve this, you can craft a custom data identifier for "cancer fundraising" and set up an exclusion for this identifier within your data classification. As a result, matches for the Health Condition and Person Name (US) identifier will be flagged, except when the phrase "cancer fundraising" is present.
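The match-level behavior in this example can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Umbrella's matching engine: a single regular expression stands in for the combined included identifiers, the exclusion is modeled as an overlapping text span, and the names ClassificationSketch and flagged_matches are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ClassificationSketch:
    """Hypothetical stand-in for a classification with inclusions and exclusions."""
    include: list[re.Pattern]
    exclude: list[re.Pattern] = field(default_factory=list)

    def flagged_matches(self, text: str) -> list[str]:
        """Return included matches that no excluded match overlaps."""
        excluded_spans = [m.span() for p in self.exclude for m in p.finditer(text)]
        flagged = []
        for pattern in self.include:
            for match in pattern.finditer(text):
                start, end = match.span()
                # Assumption: an exclusion suppresses a match only when the spans overlap.
                overlaps = any(
                    start < x_end and x_start < end for x_start, x_end in excluded_spans
                )
                if not overlaps:
                    flagged.append(match.group())
        return flagged


classification = ClassificationSketch(
    include=[re.compile(r"John Smith cancer")],    # stands in for the included identifiers
    exclude=[re.compile(r"cancer fundraising")],   # stands in for the custom exclusion identifier
)

text = "John Smith cancer fundraising is on Friday. Note: John Smith cancer screening."
print(classification.flagged_matches(text))
```

The sample text contains the excluded phrase, yet the second occurrence of "John Smith cancer" is still flagged. Only the match that coincides with the exclusion is suppressed, not the whole document.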

The system compares data identifiers selected for exclusion against both the keywords and the proximity terms of included data identifiers. If the Data Loss Prevention Report shows that a particular rule or identifier is generating false positives, consider excluding the relevant terms or identifiers to reduce them.

Note: If you select a data identifier for both inclusion and exclusion, exclusion will take precedence.
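One way to picture this precedence is as a set difference: an identifier that appears in both lists no longer contributes matches. The function below is a hypothetical illustration only and does not model how the excluded identifier continues to act as an exclusion.

```python
def effective_identifiers(included, excluded):
    """Return the included identifiers that remain after exclusions win (hypothetical helper)."""
    return set(included) - set(excluded)


included = {"Health Condition", "Person Name (US)"}
excluded = {"Person Name (US)"}

# "Person Name (US)" is selected for both inclusion and exclusion, so exclusion wins.
print(sorted(effective_identifiers(included, excluded)))
```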

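There are three ways to establish a data classification to apply to a data loss prevention rule: create a new data classification, apply a built-in data classification directly, or copy a built-in data classification and customize the copy.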

The system provides four different built-in data classifications. You can apply these directly to your DLP rules or copy them and customize the copies to create your own data classifications. Each of the built-in data classifications has a different set of built-in data identifiers associated with it. The four pre-defined data classifications are described in Built-In Data Classifications.
