Salesforce Data Cleanup – Take out the trash

By David Giller • January 29, 2019

Table of Content

Of all the steps necessary to clean up your Salesforce org, this one is definitely the least glamorous.

This is the point where we need to roll up our sleeves, imagine ourselves putting on rubber gloves and getting cleaning supplies because we are going to get our hands pretty dirty here.

Cleaning up a lot of painful and dirty data in the Salesforce org.

Welcome to part five of our six part series on how to clean your Salesforce org.

In this unit I’m going to explain how you can identify and clean up your Salesforce data.

In case you missed the prior installments go check them out:

In part one, we focused on understanding your business processes.
In part two, we focused on cleaning up your Salesforce users.
In part three, we covered your data architecture.
In part four, we reviewed your Salesforce integrations.

In this unit, we’re going to focus on your garbage data, which is the most common and visible pain point that any organization has with their database.

This block renders a quote for the post drawn from the post's custom fields. Modify the quote below the content editor in the Quote fields.

Of course, everyone wants their Salesforce data to be clean until we stumble upon some spreadsheet or an integration that we want to set up in Salesforce. Because, 9 times out of 10 when someone comes across a spreadsheet of new leads that they want to import into Salesforce, everyone’s in a rush. And, no one wants to take the time to do the due diligence of reviewing and cleaning up that data before actually uploading that stuff into Salesforce.

Yes, I see this all the time and today I’m going to teach you how to clean up that garbage after the fact, because sadly no one wanted to make that upfront investment and clean up that data before importing it in the first place.

By the way, if you haven’t done so already, make sure to download the pdf handout that accompanies this section as I believe you will find it very handy to follow up with some items covered in this video.

For those of you who don’t already know me, I’m David Giller.

I’m a Salesforce MVP, consultant and trainer.

What is Dirty Data?

Well, dirty data can fall into multiple categories.

The first one would be duplicate data. Duplicate data means that same information already exists elsewhere in the database, at least more than one time.

The next one would be incorrect data. An example of incorrect data would be a street address that is outdated.

The next is inaccurate data. An example of inaccurate data would be where we have the correct street address, but we haven’t specified which floor, or office number the person, or organization is located in.

The next one is inconsistent data. An example of inconsistent data is where let’s say the state of New York is represented as NY, N.Y. the words spelled out New York, or NY-USA.

Another category of dirty data is data that violates business rules. An example of that would be, let’s say all tier one customers must have a primary contact, but we find that we do in fact have records in the database where that is not the case.

The last category of dirty data is improperly punctuated or misspelled data.

How Does Dirty Data Enter our Database in the First Place?

Well, usually dirty data gets entered in, in one of three ways.

The first is probably the most obvious, which is manual entry. Manual entry as the name implies, someone logs into the system, they’re typing away on their keyboard and they enter in a new record.

The next scenario where dirty data enters the system is batch entry, or imports. An example of this would be where someone receives an excel spreadsheet and we import the data in bulk into the system.

The third most common way that dirty data enters our database is through system integrations, where there is a connection between Salesforce and some other system, where that other system is feeding data directly into Salesforce.

What Can You, as the Salesforce Admin, Do in Order to Address the Dirty Data in Your Org?

I’m going to walk you through a very simple three step process in order to address your dirty data.

The first step is to identify the dirty data that requires attention.

So let’s define which data issues are causing the biggest problems within our organization.

The first example would be where we’re trying to pull a mailing list and we see that we have duplicate or triplicate appearing in the mailing list, or we have street addresses that are missing. We have phone numbers in the street address field or other inconsistencies that are glaringly problematic.
Another example of a problem that is bringing the data quality issues to the forefront of our attention is where we know that we have assignment rules in the system. However, those assignment rules are not working as expected. So for example, we might see that we have an assignment rule that says, “All tier one customers in the state of New York should automatically be assigned to a particular individual.” And, we are wondering, “Hey, we have a lot of tier one customers that are should be assigned to that person, but they aren’t.” And when we look closer at the data, we can see that either they are not flagged. In fact, as tier one customers, maybe they’re not flagged as a classification whatsoever. That field that’s capturing tier one is completely blank. Or the state is not populated at all on either the contact or the account record where the assignment rule is looking to automatically assign those records.
Another scenario would be where we have reports or dashboards that simply are not showing all the records that we expect to be appearing in these reports.
Another common symptom of this issue is where we have classifications for perhaps contacts, or accounts, or maybe cases that are old, outdated, or simply missing.

The first thing that we need to do in step one is pull reports in order to identify the specific problematic records that require cleansing.

What is our Internal Strategy in Order to a Address These Issues? How Are We Going to Fix Them?

We need to identify where can we turn in order to get the record of truth?
Who within our organization knows what the correct accurate value should be on all of these records that we’ve identified?
Or perhaps it’s not a particular individual, but which external systems should we be relying on in order to reference, to pull that information, so that Salesforce shows the most accurate information that we could possibly find for these records?

Next we need to figure out how can we scale the resource we’ve just identified in order to review and update all the problematic records that are appearing in our reports.

For example, is it going to be a manual stare and compare, or is it perhaps going to be a bulk data extract from another system that’s going to allow us to do a bulk upload back into Salesforce.

Update Those Problematic Records

So the first thing that we’re going to do is we’re going to bulk perform an extract from Salesforce showing the problematic records that require an update. And by the way, always make sure to leverage the Salesforce CaseSafe Record ID. That is a non cap sensitive record ID for each of the records appearing in your report. So what we’re going to do is we’re going to not only the report that shows all the problematic records, we’re going to also display the Salesforce CaseSafe Record ID for all of those records, and then we’re going to export that data into a spreadsheet format.

The next step that we’re going to follow is we’re going to do a bulk extract from the external system. If we’ve identified that there is another external system that can serve as the record of truth, we’re going to perform a bulk extract from that other system to serve as our reference table.

Next, we’re going to either perform a vlookup in excel where we are relying on that Salesforce CaseSafe Record ID, or a manual stare compare and review in order to perform a side by side comparison of your Salesforce data…and, the reference table that you extracted from the other system in order to decide which values should be updated in Salesforce and which should remain untouched. Thereby, creating a brand new, let’s call it “update” spreadsheet.

Once you’ve created your “update” spreadsheet using one of the many data import tools available, you’re going to bulk update your Salesforce records with the update spreadsheet, using the Salesforce CaseSafe Record ID to find the appropriate record that needs to be updated with the values contained in your newly created spreadsheet.

After you’ve bulk updated your Salesforce data, you should be able to leverage native Salesforce reports and dashboards in order to review and confirm that all the records that you expected to update, all the records that we identified previously as having been problematic are now properly updated, showing the most accurate data that we know of for all of those records.

By the way, there are many apps both free and paid for apps available on the Salesforce AppExchange to help you further identify and manage dirty data.

Make sure to download the PDF handout that accompanies a section as it includes many links to different resources. Again, both paid and free resources to help you managing your dirty data in Salesforce.