The Umbrella User Guide Developer Hub

Welcome to the Umbrella User Guide developer hub. You'll find comprehensive guides and documentation to help you start working with Umbrella User Guide as quickly as possible, as well as support if you get stuck. Let's jump right in!

Get Started    

Create an Exact Data Match Identifier

Exact Data Matching (EDM) is a data classification and matching technique that effectively detects data loss incidents involving sensitive records maintained and stored in a structured data format. EDM is designed to help protect sensitive data by fingerprinting data instead of leveraging pattern matching techniques.
Umbrella performs Exact Data Matching using an EDM template, which you create by uploading a plain text CSV file that describes the fields to be matched, and choosing an EDM type for each field described in the CSV file. You must also specify which fields are required for a match, and their priority. Once the EDM template is created, you will obtain an ID for it from the Umbrella UI.

You must then download the DLP Indexer and run the indexer against your data records using the EDM template. The DLP Indexer validates records against the template and generates hash fingerprints for those records. You will then be able to add the Exact Data Match Identifier to a Data Classification as a Custom Data Identifier.

Table of Contents

Prerequisites

Note: <edm_template-id> is the ID of the EDM identifier retrievable from the Umbrella UI, as shown in Step 7 below.

  • The template CSV you upload must meet the following requirements:

    • The file can not be larger than 10 KB.
    • The second and ensuing rows may contain sample data; be aware, however, that this data will not be indexed unless it also appears in the source data CSV file.
  • The DLP indexer supports data files with up to 55 million records. The exact records limit is determined by the total number of columns and the number of those are of Alphanumeric type. The indexer displays the exact limit when attempting to load a file that exceeds it. If your dataset is larger than the limit, you need to split the records into multiple files. For errors received when indexing a large file, see Memory Tuning for DLP Exact Data Matching Indexer.

  • Both the template and source data CSV files must meet the following requirements:

    • The file name must not include space characters.
    • A multi-term (multi-word) field can contain a maximum of 6 space-separated words.
    • The data file must contain only 1 byte or 2 byte UTF-8 encoded characters.
    • The first row of data must have between 1 and 50 fields and each row must have the same number of fields.
    • The first row of data must specify the name of each field, and each value must be unique.
    • Data in the second and ensuing rows must comply with the EDM field types and supported formats (see Exact Data Match Field Types).
    • The field names in the sample data template must match the field names in the actual data source file. The field names must appear in the same order in both files.

Caution: Do not create, edit, or view the template or source data CSV file using Microsoft Excel, as this may corrupt the file. Use a text editor.

Note: If any of the values provided in the source file to the DLP indexer fail to be validated as per the supported format, then the DLP indexer will skip that record and proceed with indexing the remaining records. The indexer also behaves in this manner for any records that may exceed the template-defined fields, and for empty rows or records with empty primary values. The position of the skipped records in the file will be provided as part of the output of the DLP indexer.

The following procedure uses the following example CSV file for both the template and indexed source data:

 Email,SSN,Passport number - US,Credit card number
 [email protected], 113011111,123456789USA1234567U1234567, 5205105105105109
 [email protected], 248257990,123456789USA1234567U1234567, 4532237384050172
 [email protected], 363265019,123456789USA1234567U1234567, 4532549977031249
 [email protected], 417279936,123456789USA1234567U1234567, 4539157470627290

Procedure

  1. Navigate to Policies > Policy Components > Data Classification > Exact Data Match Identifiers and click ADD EXACT DATA MATCH IDENTIFIER.
  1. Provide a name and description for the exact data match, then upload a sample data CSV file and click Next.
    Note: Do not upload any real data in the CSV. Use only sample data for the template.
  1. From the drop-down menu, choose the type of data for each field.
    The available field types are described in Exact Data Match Field Types.
  1. Select the Primary field that must be found with other data fields for the identifier to match and click Next. The primary field should be the field in the record to always match. The secondary fields are the number of fields in the record that should be present with the primary field to match.
  2. Enter the minimum number of secondary fields required to be found within 10 terms of the primary field for a match. You can have no secondary field, or up to as many fields in the record that are not the primary field. For example, if at least 3 secondary fields are required, then the identifier will only trigger an event when 3 or more additional fields are found with the primary field. If the .csv file has only 1 column, then the secondary option is greyed-out and automatically set to 0. Click Next.
    Note: The secondary fields must each be a different identifier.
  1. Click the download icon to download the DLP Indexer. Then click Save.
    Note: You can click on Command Line Legend to view information about the command line arguments required to run the indexer.
  1. The exact data matcher now has a status of Data Not Indexed. You can expand the listing to view the details about the EDM. When the EDM status is Data Not Indexed, you can edit the field types, the primary field selection, and the matching conditions, but the EDM cannot be added to a data classification. Make note of the ID for the exact data matcher; you will need it to run the DLP Indexer.
    Note: The display page does not display the full EDM ID; use the Copy icon to copy the ID.
  1. Create an API key and secret for the DLP indexer.

    a. Navigate to Admin > API Keys and click Add.
    b. Provide a name for the API Key.
    c. Select Policies > DLP Indexer for the scope and choose Read/Write for the permissions. Then click Create Key.

d. Copy and save the API Key and Secret somewhere safe, as you will need them to run the DLP Indexer.

  1. Click ACCEPT AND CLOSE.

  2. Use the Umbrella DLP Indexer to index your data records. For more information on the indexer, see DLP Indexer.

a. Move the indexer from the downloads folder to a convenient location, such as the folder where the data records are stored.

b. Run the indexer in a terminal window with the following command:

java -jar <directory_path>/dlp-indexer.jar -i <directory_path>/<source_file>.csv
-e <edm_template-id> -k <authKey> -s <authSecret>

where:

  • <directory_path>/dlp-indexer.jar —the relative path to the location of the DLP indexer
  • <directory_path>/<source_file>.csv —the relative path to the csv spreadsheet with the actual data records
  • <edm_template-id> —the ID of the EDM identifier retrievable from the Umbrella UI (see Step 7)
  • <authKey> —the API Key previously saved at Step 8d
  • <authSecret> —the API Secret previously saved at Step 8d

The exact data matcher now has a status of Data Indexed.

Note: When the EDM has a status of Data Indexed, you can add the EDM to a data classification but you can not edit the field types, primary field selection, or matching condition.

If you update your source file with new or changed records, you need to run the DLP indexer again on that file using the same EDM ID, API Key, and API Secret. This ensures that policies configured to use the EDM are updated to reflect the new data fingerprints. For more information, see Update the Indexed Data Set Periodically.


Delete or Edit a Classification < Create an Exact Data Match Identifier > Exact Data Match Field Types

Updated 2 days ago

Create an Exact Data Match Identifier


Suggested Edits are limited on API Reference Pages

You can only suggest edits to Markdown body content, but not to the API spec.