Data Preparation

Data preparation is the process of manipulating data into a form suitable for analysis. This manipulation includes cleaning the data, shaping it, or combining it with other data sources. Regardless of its source, most data needs cleaning and shaping before it can be analyzed effectively.

The benefits of clean, prepared data

  • More insightful, data-driven decisions: Data that needs cleaning and shaping can hinder or even block your exploration. Clean data enables more accurate analysis, which leads to more insightful, data-driven decisions.
  • Easier for others to perform their analysis

Factors such as human error, disparate systems, and changing business requirements can contribute to dirty data, but data prep often requires more than cleaning. You may need to adjust the granularity of the data, or transform it so that it aligns with other data, before you union or join them. As a result, data that is ready for analysis often looks very different from the original data source. If you are preparing data for others, reshape and combine it based on your audience's needs. This makes it easier for others to perform their analysis, which leads to faster insights.
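To make the idea of adjusting granularity before a union more concrete, here is a minimal plain-Python sketch. The two sources, their field names, and the `to_monthly` helper are all hypothetical; Tableau Prep Builder performs equivalent aggregate and union steps visually, so this only illustrates what happens to the data, not how the tool implements it.

```python
from collections import defaultdict

# Hypothetical source A: daily sales rows (date formatted as "YYYY-MM-DD").
daily_sales = [
    {"date": "2024-01-03", "region": "East", "sales": 120.0},
    {"date": "2024-01-17", "region": "East", "sales": 80.0},
    {"date": "2024-02-05", "region": "East", "sales": 50.0},
]

# Hypothetical source B: already stored at monthly granularity.
monthly_sales = [
    {"month": "2024-01", "region": "West", "sales": 300.0},
    {"month": "2024-02", "region": "West", "sales": 250.0},
]


def to_monthly(rows):
    """Aggregate daily rows up to (month, region) granularity by summing sales."""
    totals = defaultdict(float)
    for row in rows:
        month = row["date"][:7]  # "YYYY-MM-DD" -> "YYYY-MM"
        totals[(month, row["region"])] += row["sales"]
    return [
        {"month": month, "region": region, "sales": total}
        for (month, region), total in sorted(totals.items())
    ]


# Once both sources share the same granularity and columns,
# they can be unioned (stacked) into a single table.
combined = to_monthly(daily_sales) + monthly_sales

for row in combined:
    print(row)
```

Aggregating first matters: stacking daily rows directly on top of monthly rows would mix two levels of detail in one table and make any later sum or average misleading.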

The analytics process

[Figure: steps in prep]

Tableau was founded on the idea that data preparation, analysis, and visualization should not be isolated activities but integrated parts of an analytics cycle. Usually, the goal of the analytics process is to answer important questions using data. You may have your own questions about your data, or someone in your organization may want you to answer theirs. The process often, but not always, starts with a task or question. The analytics cycle shows the typical tasks involved in answering questions with data.

How Prep fits into Tableau

Iterative and non-linear

The analytics cycle is not a linear progression from one stage to the next. It may occasionally work that way, but in general, the process is iterative. You can jump back and forth among the stages in the cycle. For example, your exploration of the initial question may lead to follow-up questions, and your exploration of the data may lead you to other lines of questioning within the same data set.

Data preparation within the analytics cycle

Data preparation rarely happens just once. You may return to data prep again and again as you answer questions about your data. For example:

  • In many cases, data preparation is driven by a specific question. The question often prompts you to find and prepare data in order to determine an answer.
  • As you create visualizations to answer your question, you may decide you need to add more data to the analysis, which can lead to more data preparation and cleanup.
  • Follow-up questions may require additional data that needs to be cleaned and combined.

Tableau Prep Builder is a visual data preparation tool. It helps you get to your analysis faster by letting you quickly and confidently combine, shape, and clean your data.

Using Tableau Prep Builder, you can:

  • Connect to data from multiple data sources.
  • Clean your data using familiar operations such as filter, split, and rename.
  • Edit values directly on rows of data.
  • Combine data sources using unions and joins.
  • Shape your data using pivots and aggregations.
  • Create an output of your cleaned data for analysis in Tableau Desktop or Tableau Server/Online.
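The shaping operations listed above, pivots and aggregations, can be sketched in plain Python to show how they change the structure of the data. This is a conceptual illustration only: the survey fields and the `pivot_columns_to_rows` and `average_by` helpers are hypothetical, and in Tableau Prep Builder you perform these steps visually, without scripting.

```python
from collections import defaultdict

# Hypothetical "wide" survey data: one column per question.
survey_wide = [
    {"respondent": "r1", "Q1": 4, "Q2": 5},
    {"respondent": "r2", "Q1": 3, "Q2": 2},
]


def pivot_columns_to_rows(rows, id_field, value_fields, name_field, value_field):
    """Pivot: turn each (row, question column) pair into its own 'long' row."""
    long_rows = []
    for row in rows:
        for field in value_fields:
            long_rows.append({
                id_field: row[id_field],
                name_field: field,        # the former column name becomes a value
                value_field: row[field],  # the cell value moves into one column
            })
    return long_rows


def average_by(rows, key_field, value_field):
    """Aggregate: average value_field for each distinct key_field value."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[key_field]] += row[value_field]
        counts[row[key_field]] += 1
    return {key: sums[key] / counts[key] for key in sums}


survey_long = pivot_columns_to_rows(
    survey_wide, "respondent", ["Q1", "Q2"], "question", "score"
)
averages = average_by(survey_long, "question", "score")
print(averages)  # {'Q1': 3.5, 'Q2': 3.5}
```

The long format produced by the pivot has one row per answer, which is usually easier to filter, group, and visualize than one column per question.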

Key advantages of Tableau Prep Builder

Visual and direct

  • Three coordinated views (the Flow pane, the Profile pane, and the data grid) provide a holistic picture of your data, which leads to a deeper understanding.
  • Drag-and-drop functionality; no scripting required.
  • A direct, immediate experience lets you see the result of each action as soon as you take it.

Smart features

  • Fuzzy-matching algorithms address common data prep challenges, such as finding and fixing spelling errors.
  • One-click operations handle tasks such as removing punctuation and trimming spaces.
  • Simple and fast operations reduce repetitive cleaning tasks.
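As a rough analogue of these smart features, the sketch below applies "remove punctuation" and "trim spaces" cleaning, then groups near-duplicate spellings. It is an assumption-laden illustration, not Tableau Prep Builder's implementation: the sample values, the helper names, and the 0.8 similarity cutoff are hypothetical, and Python's standard `difflib.get_close_matches` is swapped in as a generic fuzzy matcher in place of Tableau's own algorithms.

```python
import difflib
import string

# Hypothetical raw values with stray spaces, punctuation, and a misspelling.
raw_cities = ["  Seattle ", "Seattle.", "Seatle", "Portland!", "portland"]


def remove_punctuation(value):
    """Delete every ASCII punctuation character."""
    return value.translate(str.maketrans("", "", string.punctuation))


def trim_spaces(value):
    """Strip leading/trailing whitespace and collapse internal runs to one space."""
    return " ".join(value.split())


cleaned = [trim_spaces(remove_punctuation(city)) for city in raw_cities]


def group_similar(values, cutoff=0.8):
    """Map each value to the first-seen spelling it closely matches (case-insensitive)."""
    canonical_by_key = {}  # lowercase key -> first spelling seen for that group
    mapping = {}
    for value in values:
        key = value.lower()
        match = difflib.get_close_matches(key, list(canonical_by_key), n=1, cutoff=cutoff)
        if match:
            mapping[value] = canonical_by_key[match[0]]
        else:
            canonical_by_key[key] = value
            mapping[value] = value
    return mapping


mapping = group_similar(cleaned)
print(cleaned)  # ['Seattle', 'Seattle', 'Seatle', 'Portland', 'portland']
print(mapping)  # {'Seattle': 'Seattle', 'Seatle': 'Seattle', 'Portland': 'Portland', 'portland': 'Portland'}
```

Removing punctuation before trimming means any spaces exposed by the deletion are also cleaned up. The cutoff trades off catching more misspellings against wrongly merging genuinely different values, so grouped results are worth reviewing before you rely on them.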

Integrated

  • Tableau’s data connectors, calculation language, and governance structure are the same as other Tableau products, so you can get up to speed quickly.
  • You can easily open your flow output in Tableau Desktop, so you stay in the analytics cycle and reach insights faster.
  • Share flows and flow output to reduce friction and help you bridge the gap between analytics and data preparation.

Integrated Tableau platform

Tableau Prep Builder is integrated with the other Tableau products in the Tableau platform. This enables you to prepare your data and stay within the natural iterative, non-linear flow of how you work.

In Tableau Prep Builder, you can:

  • Output data extract files that can be analyzed in Tableau Desktop and published to Tableau Server/Online for use by others.
  • Preview your cleaned data in Tableau Desktop to check your data prep progress.
  • Publish Tableau Prep Builder flows to Tableau Server/Online so that Tableau Prep Conductor can run them on a schedule.

Flows

There are several tools you can use to prepare and clean your data, including Tableau Desktop and Tableau Prep Builder.

Fundamentally, Tableau Prep Builder helps you explore, clean, integrate, and reshape data. If your primary goal is to perform one or more of those tasks, that's a good sign that Tableau Prep Builder is the right solution. It is optimized for data preparation and offers more advanced capabilities than Tableau Desktop.

When your primary goal isn't to explore, clean, integrate, or reshape data, remember that all the data preparation abilities of Tableau Desktop still exist. If a combination of Data Interpreter, a pivot, or joins and unions gets your data into the form you need, use Tableau Desktop.

Tableau Prep Builder is the preferred option when consistency and repeatability are required and dedicated user roles are responsible for curating data for others to use. It also offers more sophisticated data preparation capabilities than Tableau Desktop when advanced transformations are required. It is the best choice when you need to perform multiple reshaping, combining, or cleaning operations to build a data source. Tableau Prep Builder also lets you profile and explore your data before analysis.

USE CASES

Author/Business User

Uses Tableau Prep Builder's visual and direct interface to manipulate data into the form needed to answer their own analysis questions. Example: An author/business user combines data sources, creates calculations, and performs multiple pivots on survey data.

Analyst

Uses Tableau Prep Builder's visual and direct interface to reshape data, for themselves or others, so they can avoid complex table calculations or LOD (level of detail) calculations in Tableau Desktop, and then publishes their flows to run on a schedule using Tableau Prep Conductor. Example: An analyst calculates multiple aggregates for monthly or weekly summaries.

Data Steward

Uses Tableau Prep Builder's smart algorithms to clean data, and prepares data for others by adding business rules in calculations, combining and shaping data, and then publishing curated data sources for the wider organization. Example: A data steward integrates current and historical data, cleans it and adds calculations, and then shares the data source with the finance department.

Data Scientist

Uses Tableau Prep Builder to explore data visually and get data ready for their analysis while integrating more complex cleaning, machine learning, and predictive modeling using Python/R scripts. Example: A data scientist uses a Python script to add a predictive maintenance cost for a manufacturer's supply chain.

Cleaning up messy or dirty data

Messy or "dirty" data (data that is poorly structured, incomplete, or full of inaccuracies or inconsistencies) leads to inefficient or invalid analyses.

Making analysis easier for others

When you handle the more complex data shaping and combining in Tableau Prep Builder, analysis in Tableau Desktop or Tableau Server/Online becomes simpler for others. For example, creating multiple data extract outputs can be useful when you are preparing the same data for different audiences.
