Shaping Data with Joins
Objectives
At the end of this module, you will be able to:
- Describe the purpose of a join.
- Differentiate between the types of joins.
- Describe the difference between unions and joins.
- Create and configure a join.
- Evaluate the results of a join and fix issues.
During data analysis, you may discover the need for additional data. For example, perhaps you have fields from another data set that you would like to add to your data. You want to combine your data with additional data sets, but you may have questions similar to these:
- What is the difference between combining data using a join versus a union?
- What options can be used to control which rows are selected from each table?
- How can mismatched values be corrected in the join results?
You can use Tableau Prep Builder to create joins between data sources that have one or more fields in common. Joins add columns from one table to another in a single data structure.
This module will demonstrate how to use joins to combine different data sources that separately have the information you want to put together for analysis.
Determine which to use: joins versus unions
Now that you understand what joins are and how they function, you may be wondering how joins differ from other data combination options. Joins and unions are the most common methods to combine data. Let's look at the differences between these methods as well as some appropriate use cases.
Evaluate the results of a join and fix issues
Now that we've created and configured the join for our soda data, we're ready to evaluate the results. If we don't see what we expect, we may need to do some additional cleaning to fix mismatched values.
Common join issues
You may be wondering what kinds of issues cause mismatched values. The following will result in Tableau Prep Builder reading the values as a mismatch and excluding them from the join:
- Inconsistent capitalization, punctuation, and use of abbreviations
- Misspellings or typos
- Extra spaces or separators
You have many options for determining whether the field values have any of these issues.