Enriching Data with Metadata
A glance at the workflow
Modeling metadata in Tableau
Metadata management is an extension of data source management and includes policies and processes that ensure information can be accessed, shared, analyzed, and maintained across the organization. Metadata is a business-friendly representation of data in common terms, similar to a semantic layer in traditional BI (business intelligence) platforms. Curated data sources make fields immediately understandable regardless of their source and hide the complexity of your organization’s modern data architecture.
Tableau has a simple, elegant, and powerful metadata system that gives users flexibility while allowing for enterprise metadata management. A metadata model can be embedded in a workbook or centrally managed as a published data source.
Tableau’s metadata system is a tiered system with two layers of abstraction and a run-time model (VizQL Model). When Tableau Catalog (a Data Management Add-on) is enabled, all of the data assets in your Tableau environment are organized into one central list.
Use metadata for enrichment
Metadata enables data to be analyzed by users in an end-user-friendly environment. Enrich data to give it meaning and context.
For example, to improve the experience of using the data source:
- Rename fields to be user friendly.
- Hide fields that are not useful.
- Build in calculations that users will find helpful to:
  - Describe business logic and rules.
  - Match important business metrics that are already established.
- Create folders and/or field hierarchies to organize your measures and dimensions into "buckets" for analysis.
- Use LOD (level of detail) calculations to remove duplicates, to get around one-to-many relationships, to aggregate data to a specific detail, and to include and exclude dimensions.
- Add field synonyms for Ask Data.
- Use comments to add field descriptions.
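To make the LOD item in the list above more concrete, the following Python sketch mimics what a FIXED level-of-detail calculation does conceptually: it computes an aggregate once per value of a chosen dimension, regardless of how many rows the join produced. The rows, the field names, and the helper `fixed_lod` are hypothetical and exist only for illustration. In Tableau itself, the same idea would be written as an LOD expression along the lines of `{ FIXED [Order ID] : MAX([Order Total]) }`.

```python
from collections import defaultdict

# Hypothetical rows after joining orders to line items (one-to-many):
# the order-level value "order_total" repeats on every line item.
rows = [
    {"order_id": "A1", "item": "Desk", "order_total": 500.0},
    {"order_id": "A1", "item": "Chair", "order_total": 500.0},
    {"order_id": "B2", "item": "Lamp", "order_total": 80.0},
]


def fixed_lod(rows, dimension, measure, agg=max):
    """Aggregate `measure` once per distinct value of `dimension`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[measure])
    return {key: agg(values) for key, values in groups.items()}


# Summing the raw rows counts order A1 twice.
naive_total = sum(row["order_total"] for row in rows)

# Fixing the level of detail to the order yields one value per order.
per_order = fixed_lod(rows, "order_id", "order_total")
deduplicated_total = sum(per_order.values())

print(naive_total, deduplicated_total)
```

The naive sum overstates the total because the one-to-many join repeats the order-level value. Aggregating at the order level first removes that duplication, which is the effect the LOD item above describes.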
Adding field comments
When you add field comments—comments in appropriate fields regarding what those fields represent—you are creating a basic data dictionary. According to the IBM Dictionary of Computing, a data dictionary is a "centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format."
By adding comments to the fields in your data sources, you can provide users with a rudimentary data dictionary that may prevent confusion about what certain fields mean, how values are calculated, and how fields relate to each other. Clear field definitions can prevent misinterpretations and incorrect assumptions, especially in ambiguous situations. For example, "Ship Date" could be presumed to be the date on which a shipment was booked or the date on which it left the warehouse, depending on whom you ask; a good definition settles the question. Providing a basic dictionary is particularly important if Tableau Catalog is not enabled. When Tableau Catalog is enabled, end users can see data details, including metadata like field comments, during analysis.
Users are able to see field comments as a tooltip when using data sources to create visualizations in either Tableau Desktop or web authoring as well as when using the Ask Data pane to explore data sources on Tableau Server.
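As a rough illustration of field comments acting as a basic data dictionary, the Python sketch below models each comment as a small record and checks which fields are still undocumented. The `FieldComment` class, the `undocumented` helper, and the sample definitions (including the choice of meaning for "Ship Date") are assumptions made for this example; they are not part of Tableau or its APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldComment:
    """One data dictionary entry: what a field means and how it relates to others."""
    field: str
    meaning: str
    calculation: str = ""
    related_fields: tuple = ()


# Hypothetical definitions; a real dictionary would use your organization's terms.
data_dictionary = {
    comment.field: comment
    for comment in [
        FieldComment(
            field="Ship Date",
            meaning="Date the shipment left the warehouse (not the booking date).",
            related_fields=("Order Date",),
        ),
        FieldComment(
            field="Order Date",
            meaning="Date the customer placed the order.",
        ),
    ]
}


def undocumented(fields, dictionary):
    """Return the field names that have no comment yet."""
    return [name for name in fields if name not in dictionary]


missing = undocumented(["Ship Date", "Order Date", "Ship Mode"], data_dictionary)
print(missing)
```

A check like this can help a data steward confirm that every visible field carries a comment before a data source is published.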
Managing metadata
Metadata can be edited using Tableau Prep Builder, Tableau Desktop, and/or Tableau Server/Online, or a combination of these. As shown in the metadata model diagram at the beginning of this lesson, in addition to the data steward enriching data sources, users also have the flexibility to add metadata to suit their immediate analysis needs, according to the governance model. However, the more that the data sources are enriched prior to user analysis, the more consistent the analysis experience will be for everyone. Explore how to modify metadata in this four-minute video, including a brief demonstration of:
- Hiding fields.
- Creating hierarchies and folders.
- Changing data types.
- Setting default field properties.
Optimize metadata for Ask Data
Using "Ask Data"
Ask Data, Tableau’s natural language capability, works with your published data sources on Tableau Server or Tableau Online.
You can use Ask Data to:
- Use natural language processing (NLP) to ask questions of the data and automatically generate a workbook sheet with an appropriate visualization. You can add more sheets to broaden your analysis.
- Learn more about a data source by exploring some of the metadata for each field.
- Save a workbook to a project, and then use Tableau Server's web editing functionality to refine it and create dashboards.
- View Ask Data usage analytics.
Ask Data is available for all user roles with administrator-granted direct access to data sources: Creators and Explorers. Ask Data usage analytics are available for administrators and data source owners.
With web-editing permissions enabled on Tableau Server/Online, Ask Data works with any published data source except for multidimensional cube data sources.
Ask Data is enabled by default when you publish a data source. However, you can disable Ask Data for a data source or specify how often analysis occurs on the data source.
How does Ask Data use natural language processing (NLP)?
Ask Data tries to understand intent by breaking the text entered by a user into phrases containing identifiable temporal, spatial, or numerical expressions. Ask Data then determines the relevant data type for each of these phrases. Using these relevant data types and visual best practices, Ask Data then determines the most appropriate visualization to satisfy the user’s intent. Ask Data requires English analytical phrases but works well with non-English data.
Ask Data understands the following analytical expressions:
- Aggregations—for example, “Sum of Sales,” “Average Profit,” or “Count of Customers”
- Grouping—for example, “by Region” or “by Sales”
- Sorting—for example, “Sort Products in ascending order by sum of Profit” or “Sort Customer Name in alphabetical order”
- Filtering—for example, numerical filters like “sum of Sales at least $2,000,” categorical filters like “Customer Name starts with John,” or time filters like "Sales in the last 10 years."
- Limits—for example, “top 5 Producers by sum of Sales” or “bottom Category by average Profits”
- When working with measures, Ask Data also understands the operators for addition (+), subtraction (-), and division (/)—for example, "avg Sales / avg Profit."
For more information about using analytical expressions within Ask Data, see the online help topic Supported Analytical Functions for Ask Data.
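The categories of analytical expressions listed above can be pictured with a toy classifier. This is only a keyword-based sketch for building intuition: it is not how Ask Data is implemented, and the function `classify_phrase`, its rules, and its rule ordering are all assumptions made for this example.

```python
import re

# Hypothetical keyword prefixes for aggregation phrases.
AGGREGATIONS = ("sum of", "average", "avg", "count of")


def classify_phrase(phrase):
    """Assign a phrase to one of the expression categories by simple keyword rules."""
    text = phrase.lower().strip()
    if re.match(r"^(top|bottom)\b", text):
        return "limit"
    if text.startswith("sort "):
        return "sorting"
    # Filters are checked before aggregations so that
    # "sum of Sales at least $2,000" is treated as a filter.
    if re.search(r"\b(at least|at most|starts with|in the last)\b", text):
        return "filtering"
    if text.startswith("by "):
        return "grouping"
    if text.startswith(AGGREGATIONS):
        return "aggregation"
    return "unknown"


for example in [
    "Sum of Sales",
    "by Region",
    "Sort Customer Name in alphabetical order",
    "Customer Name starts with John",
    "top 5 Producers by sum of Sales",
]:
    print(example, "->", classify_phrase(example))
```

A real natural language system handles far more variation than fixed keywords can, which is one reason well-curated field names and synonyms matter for the quality of the interpretation.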
Asking a question
To use Ask Data's NLP capabilities, enter your question into Ask Data. As you type, Ask Data maps your question to the corresponding fields in the data source and presents you with potential visualization options.
Figure: "What are the total sales by state?" in Ask Data, with the potential visualization options from which the user can choose (map or bar chart; exclude a field; list by highest total; list alphabetically).
Saving the results as a workbook
After you have used Ask Data to create a new visualization, you can save the new workbook on Tableau Server (as in the video below). You can also add sheets for additional analysis:
- Click the "new sheet" icon in the lower-left corner (next to "Sheet 1").
- In the new sheet, enter your question into Ask Data (e.g., "What is the total sum of sales by city in Michigan?").
- Refine the visualization (e.g., delete extraneous data fields, choose a visualization type).
- Save your new visualization (e.g., "Sales in Michigan by City").
If you do not want to keep the new visualization, or you want to change direction, click Clear All in the upper-right corner and start over.
Refining and saving an Ask Data-generated workbook
How does Ask Data use metadata?
Ask Data analyzes the data source to gather metadata about each of the fields. Users see the results of this analysis in the field's tooltip when hovering over a field.
What is the best way to prepare data sources for NLP interaction?
Ask Data, Tableau’s natural language capability, is built to work with all your published data sources on Tableau Server or Online. But in order to take advantage of all the benefits of Ask Data, your data sources need to be curated to support an optimal analytical conversation.
In the whitepaper below, Vidya Setlur, a Development Manager on the natural language team at Tableau, describes how best to curate data sources for successful user experience when using Ask Data.
Follow best practices for metadata
Follow the checklist below to add metadata with the end user in mind:
- Prepare your data. Try to anticipate the types of questions users will want to answer. Data shaping, joins and unions, and related data prep functions will help get the data into a suitable shape. When aggregating data, allow for the deepest analysis possible.
- Filter and size the data to the analysis at hand.
- Hide unused fields.
- Set up appropriate field defaults. For example, SUM may be an appropriate default for "Sales," but AVERAGE might be a better default for "Test Score."
- Set up percent and currency number formats.
- Apply formatting for dates.
- Set fiscal year start date, if applicable.
- Set data types.
- Set up logical hierarchies.
- Create meaningful binned fields (with appropriate bin sizes) for quantitative variables that a user might want to display as a histogram. For instance, creating a bin for "Age" enables the user to quickly use the binned data in a histogram.
- Create meaningful aliases for field values.
- Rename fields to use standard, user-friendly naming conventions.
- Differentiate attributes. If your data has multiple attributes containing a specific word (such as "Sales"), assign them unique names that are consistent with each other (such as "Sales (forecast)" and "Sales (actual)").
- Avoid naming fields as values. For example, avoid using field names like "Average," "Sales in 2015," or "Most Products Sold."
- Geocode geographic fields.
- Add relevant calculated fields and remove duplicate or test calculations.
- Add field synonyms for Ask Data.
- Add comments to appropriate fields to provide a basic data dictionary. This is particularly important if Tableau Catalog is not enabled.
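Two items from the checklist above, default aggregations and binned fields, can be sketched in a few lines of Python. The field names follow the checklist's own examples ("Sales," "Test Score," and "Age"), but the sample values, the bin size of 10, and the helpers `aggregate` and `bin_label` are assumptions made purely for illustration; in Tableau these settings are configured on the data source rather than written as code.

```python
from collections import Counter
from statistics import mean

# Hypothetical default aggregations, mirroring the "Sales" vs. "Test Score" example.
DEFAULT_AGGREGATION = {"Sales": sum, "Test Score": mean}


def aggregate(field, values):
    """Apply the field's default aggregation, falling back to sum."""
    return DEFAULT_AGGREGATION.get(field, sum)(values)


def bin_label(value, bin_size):
    """Return the lower edge of the fixed-width bin that contains `value`."""
    return (value // bin_size) * bin_size


total_sales = aggregate("Sales", [100, 250, 50])
average_score = aggregate("Test Score", [80, 90, 100])

# Counting ages per bin gives the data behind a histogram.
ages = [23, 27, 31, 38, 45, 47, 52]
histogram = Counter(bin_label(age, 10) for age in ages)

print(total_sales, average_score, sorted(histogram.items()))
```

Choosing the default up front means a user who drops "Test Score" into a view gets an average rather than a misleading total, and a prepared "Age" bin lets them build a histogram without defining bin sizes themselves.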