Create a Data Classification

You can create a data classification to help you monitor content with specific characteristics. Custom data classifications can be used in real time rules, SaaS API rules and discovery scans.

The building blocks for data classifications are data identifiers. The data identifiers that you choose for a data classification determine the type of data for which rules using that data classification will scan.

Using the inclusion and exclusion options on the Data Classification page, you can fine-tune your classification to be more precise and reduce false positives. You can exclude specific terms and regular expression (regex) patterns by creating a custom identifier in the exclusion area, or by excluding a pre-existing built-in identifier.

The exclusion applies only to the content that's been matched, not to every document that meets the exclusion criteria.

For example, consider a data classification that targets the built-in identifiers Health Condition and Person Name (US). You want to block instances of "John Smith cancer" while allowing instances of "John Smith cancer fundraising." To achieve this, you can create a custom data identifier for "cancer fundraising" and set up an exclusion for this identifier within your data classification. As a result, matches for the Health Condition and Person Name (US) identifiers will be flagged, except when the phrase "cancer fundraising" is present.
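The behavior in this example can be sketched in a few lines of Python. This is only an illustrative model, not the product's matching engine: the three regular expressions are hypothetical stand-ins for the built-in identifiers and the custom exclusion identifier, and the overlap rule is an assumption about how "the exclusion applies only to the content that's been matched" might work.

```python
import re

# Hypothetical stand-ins for the built-in identifiers (the real ones are far richer).
PERSON_NAME = re.compile(r"\bJohn Smith\b")
HEALTH_CONDITION = re.compile(r"\bcancer\b", re.IGNORECASE)
# Hypothetical custom identifier placed in the exclusion area.
EXCLUDED_TERM = re.compile(r"\bcancer fundraising\b", re.IGNORECASE)


def spans(pattern, text):
    """Return the (start, end) offsets of every match of pattern in text."""
    return [m.span() for m in pattern.finditer(text)]


def overlaps(a, b):
    """True if the two (start, end) spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def is_flagged(text):
    """Flag text only if both identifiers still match after exclusions."""
    excluded = spans(EXCLUDED_TERM, text)

    def surviving(pattern):
        # Drop only the matches that overlap an excluded span,
        # rather than discarding the whole document.
        return [s for s in spans(pattern, text)
                if not any(overlaps(s, e) for e in excluded)]

    return bool(surviving(PERSON_NAME)) and bool(surviving(HEALTH_CONDITION))
```

Under these assumptions, "John Smith cancer" is flagged and "John Smith cancer fundraising" is not. A document that contains both an ordinary mention of "cancer" and the phrase "cancer fundraising" is still flagged, because only the overlapping match is discarded.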

The system compares data identifiers selected for exclusion against both the keywords and the proximity terms of included data identifiers. If the Data Loss Prevention Report reveals that a particular rule or identifier is generating false positives, consider excluding terms or identifiers to remedy the situation.

Note: If you select a data identifier for both inclusion and exclusion, exclusion will take precedence.
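One way to picture this note, together with the OR and AND operators described in the procedure, is as simple set logic. The sketch below is an assumption-laden toy model and not the product's implementation: the function names and the identifier names passed to them are hypothetical, and the real evaluation may differ.

```python
def effective_identifiers(included, excluded):
    """Identifiers that remain active once exclusion takes precedence."""
    return set(included) - set(excluded)


def classification_matches(matched, included, excluded, operator="OR"):
    """Toy evaluation of a classification against the identifiers that matched.

    matched:  identifiers found in the content
    included: identifiers selected under Include Data Identifiers
    excluded: identifiers selected under Exclude Data Identifiers
    operator: "OR" (at least one must match) or "AND" (all must match)
    """
    active = effective_identifiers(included, excluded)
    if not active:
        return False  # nothing left to match on
    hits = active & set(matched)
    if operator == "AND":
        return hits == active
    return bool(hits)
```

For example, with the hypothetical identifiers "A" and "B" both included and "B" also excluded, only "A" remains active in this model, so content that matches "A" satisfies the classification even under AND.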

The system offers two types of built-in data identifiers you can choose from:

Built-In Identifiers

These identify data using pattern matching and dictionary lookups. The descriptions shown in the GUI provide details about the type of data they match. For more information, see Built-In Data Identifiers.

Machine Learning Identifiers

These identify data based on AI analysis of example documents. For example, the identifier for Patent Files has been trained to recognize documents that are likely patent applications. For more information, see Built-In Data Identifiers.

The system also offers three types of data identifiers that you can create yourself, each applying a different method of data analysis:

Custom Identifiers

You can create custom identifiers to match specific terms and pattern expressions of your choosing. See Create a Custom Identifier and Copy and Customize a Data Identifier.

Exact Data Match Identifiers

Exact Data Match Identifiers use fingerprinting to identify data in structured documents that match criteria you define. See Create an Exact Data Match Identifier for more information.

Indexed Document Match Identifiers

Indexed Document Match Identifiers use fingerprinting to identify data in unstructured documents that match criteria you define. See Create an Indexed Document Match Identifier for more information.

To delete or edit a data classification, see Delete or Edit a Classification.

Prerequisites

Procedure

  1. Navigate to Policies > Policy Components > Data Classification and click Add.
  2. Give your classification a meaningful name and description.
  3. Under Include Data Identifiers, select a Boolean operator to separate the identifiers included in the classification.
  • OR: At least one of the data identifiers included must match during rule evaluation.
  • AND: All of the data identifiers included must match during rule evaluation.
  4. Under Include Data Identifiers, expand Built-in Data Identifiers and select data identifiers to include in the data classification.

Choose from:

  • Built-In Identifiers
  • ML (Machine Learning) Built-In Identifiers

  5. Under Include Data Identifiers, expand Custom Identifiers and select custom identifiers to include in the data classification.

Choose from:

  • Custom Identifiers
  • Exact Data Match Identifiers
    NOTE: Exact Data Match Identifiers that are greyed out have not yet been indexed and may not be selected. (See Create an Exact Data Match Identifier for more information.)
  • Indexed Document Match Identifiers
    NOTE: Indexed Document Match Identifiers that are greyed out have not yet been indexed and may not be selected. (See Create an Indexed Document Match Identifier for more information.)

  6. Under Exclude Data Identifiers, expand Built-in Data Identifiers and choose data identifiers to exclude from the data classification.

Choose from:

  • Built-In Identifiers
  • ML (Machine Learning) Built-In Identifiers

  7. Under Exclude Data Identifiers, expand Custom Identifiers and choose custom data identifiers to exclude from the data classification.

Choose from:

  • Custom Identifiers
  • Exact Data Match Identifiers
    NOTE: Exact Data Match Identifiers that are greyed out have not yet been indexed and may not be selected. (See Create an Exact Data Match Identifier for more information.)
  • Indexed Document Match Identifiers
    NOTE: Indexed Document Match Identifiers that are greyed out have not yet been indexed and may not be selected. (See Create an Indexed Document Match Identifier for more information.)

  8. You may expand any built-in data identifier and click COPY & CUSTOMIZE to create a new customized data identifier. See Copy and Customize a Data Identifier.

  9. Click Save.

Your new classification is listed on the Data Classification page and is available when you Add a Real Time Rule to the Data Loss Prevention Policy, Add a SaaS API Rule to the Data Loss Prevention Policy, or initiate a new Discovery Scan.

