The EDM tool creates irreversible hash fingerprints of your critical data records and uploads them to Umbrella, into the template of the configured Exact Data Match Identifier.
Before generating the hash fingerprints, the data indexer validates that the submitted records and their values conform to the supported field types defined in the exact data match template.
- Full admin access to the Umbrella dashboard. See Manage User Roles.
- JVM version 17+
- The machine where the data indexer is downloaded must be able to connect to the following endpoints:
Note: <edm_template_id> is the ID of the EDM identifier retrievable from the Umbrella UI. (See Step 7 in Create an Exact Data Match Identifier.)
- The EDM Data Indexer must be downloaded after the template for the EDM identifier is created. See Steps 1-6 in Create an Exact Data Match Identifier. After downloading the indexer, move it to the same folder where the data to be indexed resides.
- The API Key and Secret must be generated for the EDM data indexer. See Step 8 in Create an Exact Data Match Identifier.
- The indexer supports files with up to 55 million records. The exact record limit depends on the total number of columns and on how many of those columns are of Alphanumeric type. The indexer displays the exact limit when you attempt to load a file that exceeds it. If your dataset is larger than the limit, split the records into multiple files. For errors received when indexing a large file, see Memory Tuning for DLP Exact Data Matching Indexer.
- The source data CSV file you index must meet the following requirements:
- The file name must not include space characters.
- A multi-term (multi-word) field can contain a maximum of 6 space-separated words.
- The data file must contain only 1-byte or 2-byte UTF-8 encoded characters.
- The first row of data must have between 1 and 50 fields and each row must have the same number of fields.
- The first row of data must specify the name of each field, and each field name must be unique.
- Data in the second and ensuing rows must comply with the EDM field types and supported formats (see Exact Data Match Field Types).
- The field names in the sample data template must match the field names in the actual data source file.
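The structural requirements above can be checked before the indexer runs. The following is a minimal sketch, not part of the EDM tooling: the function names `check_edm_csv` and `check_row` are hypothetical, and the sketch assumes the indexer parses quoted fields the same way Python's `csv` module does. It does not validate EDM field types or compare field names against the template.

```python
import csv
import os

MAX_FIELDS = 50           # the first row must have between 1 and 50 fields
MAX_WORDS_PER_VALUE = 6   # limit for multi-term (multi-word) fields
MAX_CODE_POINT = 0x7FF    # highest code point that UTF-8 encodes in 2 bytes


def has_wide_char(value):
    """True if any character needs more than 2 bytes in UTF-8."""
    return any(ord(ch) > MAX_CODE_POINT for ch in value)


def check_row(label, row, problems):
    """Append word-count and character-width problems for one data row."""
    for value in row:
        if len(value.split()) > MAX_WORDS_PER_VALUE:
            problems.append(f"{label}: value has more than {MAX_WORDS_PER_VALUE} words")
        if has_wide_char(value):
            problems.append(f"{label}: value has a character longer than 2 bytes")


def check_edm_csv(path):
    """Return a list of problems found in the CSV file at `path` (hypothetical helper)."""
    problems = []
    if " " in os.path.basename(path):
        problems.append("file name contains a space character")
    try:
        # Strict UTF-8 decoding raises UnicodeDecodeError on invalid bytes.
        with open(path, encoding="utf-8", newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader, None)
            if header is None:
                problems.append("file is empty")
                return problems
            if not 1 <= len(header) <= MAX_FIELDS:
                problems.append(f"header has {len(header)} fields; expected 1 to {MAX_FIELDS}")
            if len(set(header)) != len(header):
                problems.append("header field names are not unique")
            if any(has_wide_char(name) for name in header):
                problems.append("header has a character longer than 2 bytes")
            for position, row in enumerate(reader, start=2):
                if not row:
                    continue  # blank line; reported by the indexer as skipped
                if len(row) != len(header):
                    problems.append(
                        f"row {position}: {len(row)} fields, header has {len(header)}"
                    )
                check_row(f"row {position}", row, problems)
    except UnicodeDecodeError as err:
        problems.append(f"file is not valid UTF-8: {err}")
    return problems
```

Calling `check_edm_csv("records.csv")` returns an empty list when none of these checks fail; each entry otherwise names the row position and the rule that was violated.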
Caution: Do not create or edit the source data CSV file using Microsoft Excel, as this may corrupt the file. Use a text editor.
Note: If any value in the source file fails validation against the supported format, the data indexer skips that record and continues indexing the remaining records. The same applies to records that exceed the fields defined in the template, to empty rows, and to records with empty primary values. The data indexer output reports the position of each skipped record in the file.
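If a dataset exceeds the record limit that the indexer reports, the records have to be split into multiple files. The sketch below shows one way to do that while repeating the header row in every part. The function name `split_csv`, the `_partN` naming scheme, and the `max_records` value are illustrative assumptions; use the limit the indexer displays for your file. Rows are rewritten by Python's `csv` module, so quoting may differ slightly from the original file.

```python
import csv
import os


def split_csv(path, max_records, out_dir):
    """Split `path` into files holding at most `max_records` data rows each.

    Every output file starts with the original header row.
    Returns the list of output paths (hypothetical helper).
    """
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    stem = os.path.splitext(os.path.basename(path))[0]
    outputs = []
    out_handle = None
    writer = None
    count = 0
    with open(path, encoding="utf-8", newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return outputs  # empty input: nothing to split
        try:
            for row in reader:
                # Start a new part for the first row or when the current part is full.
                if writer is None or count == max_records:
                    if out_handle:
                        out_handle.close()
                    # Output names contain no spaces, as the file-name rule requires.
                    out_path = os.path.join(out_dir, f"{stem}_part{len(outputs) + 1}.csv")
                    out_handle = open(out_path, "w", encoding="utf-8", newline="")
                    writer = csv.writer(out_handle, lineterminator="\n")
                    writer.writerow(header)
                    outputs.append(out_path)
                    count = 0
                writer.writerow(row)
                count += 1
        finally:
            if out_handle:
                out_handle.close()
    return outputs
```

For example, `split_csv("records.csv", 1000000, ".")` writes `records_part1.csv`, `records_part2.csv`, and so on into the current folder, next to the indexer.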
When you create a new EDM identifier, you need to run the EDM Data Indexer for the first time to upload the first set of data records. For the full procedure on creating an EDM identifier, see Create an Exact Data Match Identifier.
- Run the indexer in a terminal window with the following command: java -jar edm-lander.jar -i <source_file.csv> -e <edm_template_id> -k <authKey> -s <authSecret>
- <source_file.csv>: the relative path to the CSV file with the actual data records
- <edm_template_id>: the ID of the EDM identifier retrievable from the Umbrella UI. (See also Step 7 in Create an Exact Data Match Identifier.)
- <authKey> and <authSecret>: the API Key and Secret generated for the EDM data indexer. (See Step 8 in Create an Exact Data Match Identifier.)
The exact data matcher now has a status of Data Indexed.
Note: When the EDM has a status of Data Indexed, you can add the EDM to a data classification, but you cannot edit the field types, primary field selection, or matching condition.
When your source CSV file is updated with new records, the existing EDM data identifier on your configured policy must be updated to reflect the new data fingerprints. This procedure lets you rerun the indexer periodically to upload your updated source data to Umbrella without repeating the initial procedure. After you rerun the data indexer with the updated version of the source file against the EDM ID of your EDM Data Identifier, the DLP Policy configured with this EDM Data Identifier accounts for the most recent updates to your critical records.
- In a terminal window, set the API Key and Secret previously saved in Step 8d of the Create an Exact Data Match Identifier procedure as the values of the environment variables EDM_AUTH_KEY and EDM_AUTH_SECRET.
- Run the following command as part of a periodically executed script or as needed:
java -jar edm-lander.jar -i <source_file.csv> -e <edm_template_id> -k EDM_AUTH_KEY -s EDM_AUTH_SECRET
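For a periodically executed script, the rerun can be wrapped in a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the script is assumed to run from the folder that contains edm-lander.jar, and it passes the values of EDM_AUTH_KEY and EDM_AUTH_SECRET to -k and -s in the same way the first-run command passes the key and secret. Whether the indexer signals failure through its exit code is not stated in this document, so also check the indexer output.

```python
import os
import subprocess


def build_indexer_command(source_file, template_id, env=None):
    """Build the indexer command line from the EDM_AUTH_* environment variables."""
    env = os.environ if env is None else env
    try:
        key = env["EDM_AUTH_KEY"]
        secret = env["EDM_AUTH_SECRET"]
    except KeyError as missing:
        # Fail early with a clear message instead of sending empty credentials.
        raise RuntimeError(f"environment variable {missing.args[0]} is not set") from None
    return [
        "java", "-jar", "edm-lander.jar",
        "-i", source_file,
        "-e", template_id,
        "-k", key,
        "-s", secret,
    ]


def run_indexer(source_file, template_id):
    """Run the indexer and return its exit code (assumption: run from the jar's folder)."""
    command = build_indexer_command(source_file, template_id)
    # A list argument avoids shell quoting issues with the key and secret.
    return subprocess.run(command, check=False).returncode
```

A scheduler such as cron or Task Scheduler could then call `run_indexer("records.csv", "<edm_template_id>")` with your own template ID in place of the placeholder.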
If the indexer returns an error message that reads, "Error: A JNI error has occurred, please check your installation and try again," check the following:
Confirm you have the latest version of the Java Development Kit installed.
Confirm that you have your PATH system variable set correctly:
Check the location where you have Java installed.
- For Windows this is normally C:\Program Files\Java\jdk-<version-number>\bin
- For Linux this is normally /usr/java/jdk-<version-number>/bin
Use the instructions here to set the PATH system variable appropriately for your operating system.
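Because the prerequisites call for JVM version 17+, it can help to confirm which Java the PATH actually resolves to. The sketch below is an illustration only, with hypothetical function names; it relies on `java -version` printing a line such as `openjdk version "17.0.2"` to standard error, and on releases up to Java 8 reporting themselves as `1.x`.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Extract the major version from `java -version` output, or None if absent."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        return None
    major = int(match.group(1))
    # Java 8 and earlier use the legacy scheme, for example "1.8.0_281".
    if major == 1 and match.group(2):
        return int(match.group(2))
    return major


def installed_java_major():
    """Return the major version of the `java` found on PATH, or None if there is none."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # `java -version` writes to stderr; stdout is included as a fallback.
    return parse_java_major(result.stderr + result.stdout)
```

If `installed_java_major()` returns `None` or a number below 17, the PATH either has no Java or points at an older installation than the prerequisite requires.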
Note: If the data indexer fails to process the input file and returns a base64-encoded error code, provide that code to Umbrella Support so they can assist with troubleshooting.