Monitoring and Maintaining Data

Publishing data within a governance framework

What does it mean to publish data within a governance framework? When an organization adopts a governance model, content (data and workbooks) publishing adheres to a predefined process that encompasses these five basic questions.

  • Who? -> Only certain groups of users have publishing permissions.
  • Where? -> Those publishing permissions pertain to a specific set of projects and data sources.
  • How? / What? / When? -> There is a workflow that defines when content can be published to production projects.

Types of published data sources
As described in the Selecting Data to Meet Business Needs module, published data sources use a live query or an in-memory extract.

Extract creation
Extracts can be created using Tableau Desktop, the Hyper API, the Data Extract API 2.0 (for Tableau versions 10.5 and later), or the Tableau SDK (for Tableau versions 10.4 and earlier). Tableau also creates an extract of the JSON data returned by a Web Data Connector.

In Tableau versions prior to 10.5, extracts were saved as Tableau Data Extract (.tde) files. In version 10.5, Tableau transitioned to a Data Engine powered by Hyper technology, an in-memory technology optimized for fast data ingests and analytical query processing on large or complex data sets. For information on how to upgrade a .tde extract to a .hyper extract, see the online help topic Extract Upgrade to .hyper Format.

When saving an extract in Tableau Desktop, your options for which data to extract, how much of it to extract, and how to save what you extract are shown in the Extract Data dialog box.

  • Single table versus Multiple tables—This option is not available in versions prior to 2018.3. See "Single versus Multiple Tables" below for more information.
  • Filters—To make the extract as small as possible and thereby improve performance, filter out any data that you don't need.
  • Aggregation—Pre-aggregate measures using their default aggregation and roll up date values. This consolidates data rows and minimizes the size of the extract file.
  • Number of Rows—When working with large data sources, specify a smaller sample during the development phase. When you're ready to save the final extract, return to All rows.
  • Hide All Unused Fields—Unused fields are not included when you create the extract, making it smaller, which improves performance.
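
The effect of the Filters and Aggregation options on extract size can be sketched in plain Python. This is only an illustration of the idea, not how Tableau builds extracts: the sample rows and the `filter_rows` and `aggregate_by_month` helpers are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical source rows: (order_date, region, sales)
ROWS = [
    (date(2023, 1, 5), "West", 100.0),
    (date(2023, 1, 20), "West", 50.0),
    (date(2023, 1, 9), "East", 75.0),
    (date(2023, 2, 2), "West", 20.0),
    (date(2023, 2, 14), "East", 30.0),
    (date(2023, 2, 28), "East", 45.0),
]


def filter_rows(rows, region):
    """Keep only the rows that are needed (the idea behind Filters)."""
    return [row for row in rows if row[1] == region]


def aggregate_by_month(rows):
    """Roll dates up to the month and sum sales (the idea behind Aggregation)."""
    totals = defaultdict(float)
    for order_date, region, sales in rows:
        totals[(order_date.year, order_date.month, region)] += sales
    return dict(totals)


west_only = filter_rows(ROWS, "West")
monthly = aggregate_by_month(ROWS)

# Compare the row counts of the full data, the filtered data, and the rolled-up data.
print(len(ROWS), len(west_only), len(monthly))
```

Both options trade detail for size: filtered-out rows and day-level values are no longer available in the result, so apply them only when that detail is not needed for analysis.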

Single table versus multiple tables
You can choose to have Tableau store the data in your extract using one of two structures, or schemas: single table ("denormalized" schema) or multiple tables ("normalized" schema). Single-table extracts apply any joins at the time the extract is created and enable you to limit the amount of data in your extract with additional extract properties (such as filters, aggregation, and row counts) or pass-through functions (such as RAWSQL). This is the default extract structure that Tableau uses, and should be used in most cases. Multiple-table extracts may save space in situations where a single-table extract is larger than expected (for example, its total row count is greater than the row count of the single tables being joined). With this choice, only the Hide All Unused Fields option is available. Other restrictions apply (see the online help topic "Extract Your Data").
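
To see how a single-table extract can end up with more rows than the tables being joined, consider this minimal Python sketch. The tables and the `inner_join` helper are hypothetical and stand in for whatever joins your data source defines.

```python
# Hypothetical tables: one order with four line items and three shipments.
orders = [{"order_id": 1}]
line_items = [{"order_id": 1, "item": name} for name in "ABCD"]
shipments = [{"order_id": 1, "box": number} for number in (1, 2, 3)]


def inner_join(left, right, key):
    """Naive inner join: merge every pair of rows whose key values match."""
    return [{**a, **b} for a in left for b in right if a[key] == b[key]]


# Single-table (denormalized) layout: the joins are applied up front,
# so each line item is repeated once per shipment.
single_table = inner_join(
    inner_join(orders, line_items, "order_id"), shipments, "order_id"
)

# Multiple-table (normalized) layout: each table is stored as it is.
multiple_tables_rows = len(orders) + len(line_items) + len(shipments)

print(len(single_table), multiple_tables_rows)
```

In this sketch the joined result holds 12 rows, while the three separate tables hold 8 rows in total. That kind of row multiplication is the situation in which the multiple-tables option may save space.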


Publish content to Tableau Server/Online

Publishing data sources
The process for publishing data sources varies depending on whether you're using Tableau Server or Tableau Online, and whether you're working with a live data source or an extracted, in-memory data source. The governance model describes the data source management processes related to the distribution of data within your organization.

Best practices when you're ready to publish data sources

  • Clean up and customize the data for efficient use.
  • Create an extract (if appropriate).
  • Use or develop a data source naming convention. A well-considered naming convention can help users determine which data source to use. Remember that after you publish a data source, you can no longer rename it directly. Instead, you must publish a renamed copy and then update all workbook connections.
  • Manage data sources centrally. Centralized data source management helps to avoid data source proliferation and to increase user trust. At a minimum, it is a best practice to designate the following roles:
      • Data stewards, who create and publish the data sources that meet your organization's data requirements.
      • Site administrators, who manage published content, extract refreshes, and permissions on the server you publish to (Tableau Server or Tableau Online).
  • Use the About field of the published data source to provide some basic data lineage information.

What is data lineage?
Data lineage identifies where data has come from and the transformations it has gone through over time. Data lineage answers user questions, such as:

  • What source of data does this field come from?
  • How was the calculation made?

Lineage is particularly helpful when monitoring and maintaining data sources. There are a few options available to add data lineage.

Tableau Catalog
Tableau Catalog, a Data Management Add-on, discovers and indexes all of the content on Tableau Server, including workbooks, data sources, sheets, and flows. Indexing is used to gather information about the metadata, schemas, and lineage of the content. The lineage feature in Tableau Catalog indexes both internal and external content.

Tableau Server/Online
In Tableau Server, a simple approach is to use the data source's About field to convey basic lineage information such as the following:

  • The physical location of the original data source
  • Database tables or local file worksheets used
  • Filters used on the data source
  • Calculated fields created for the data source
  • Parameters created for the data source

The About field is displayed as part of a tooltip on the Data Sources page as well as when connecting to the data in Tableau Desktop.
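
One way to keep About text consistent across data sources is to generate it from a small template. The sketch below is a hypothetical helper, not part of any Tableau product or API: the `build_about_text` function, its labels, and the sample values are all assumptions chosen to mirror the list above.

```python
def build_about_text(location, tables, filters=(), calculated_fields=(), parameters=()):
    """Format basic lineage details as plain text for a data source description."""

    def line(label, items):
        # Show "None" explicitly so readers know the entry was not forgotten.
        return f"{label}: {', '.join(items) if items else 'None'}"

    return "\n".join([
        f"Original source: {location}",
        line("Tables or worksheets", tables),
        line("Filters", filters),
        line("Calculated fields", calculated_fields),
        line("Parameters", parameters),
    ])


# Hypothetical example values.
about = build_about_text(
    location="warehouse.example.com / sales_db",
    tables=["orders", "customers"],
    filters=["order_date >= 2020-01-01"],
    calculated_fields=["Profit Ratio"],
)
print(about)
```

The resulting text can be pasted into the About field when the data source is published, so every data source describes its lineage in the same order.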

Third-party tools
For complete data lineage, you can use third-party data management tools that work with Tableau, or solutions created by the Tableau community.

How are data sources shown in Tableau Server/Online?

Published and embedded data sources
Published and embedded data sources can be viewed from the Data Sources page on Tableau Server.

Data Sources in the Show as menu
To view a data source, select Data Sources in the Show as menu in the top-right corner of the page. (Data Sources is the default view for the Data Sources page.)

Data Sources drop-down menu
By default, only published data sources are displayed, but you can change the filter to show All data sources or to show Embedded in workbooks only.

Multi-connection data sources
Connection information in the Connects to column: Multi-connection data sources (such as cross-database joins) have a +N appended to the data connection that is listed in the Connects to column. The +N indicates how many additional data connections the data source has. To see a tooltip that lists all the data connections associated with a multi-connection data source, hover over the connection information in the Connects to column.
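
The +N convention can be expressed as a short Python sketch. The `connects_to_label` function and the exact spacing of the label are assumptions for illustration; only the rule itself (first connection, plus a count of the additional ones) comes from the description above.

```python
def connects_to_label(connections):
    """Return the first connection, plus '+N' for any additional connections."""
    if not connections:
        return ""
    first = connections[0]
    extra = len(connections) - 1
    return first if extra == 0 else f"{first} +{extra}"


# Hypothetical connection names.
print(connects_to_label(["sales_db"]))
print(connects_to_label(["sales_db", "targets.xlsx", "hr_db"]))
```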

Connections in the Show As menu
All data connections to published and embedded data sources can be viewed from the Data Sources page in Tableau Server. To view data connections, select Connections in the Show as menu.

Data connections in a multi-connection data source
The Connections view not only lists all the data connections available on the server, but also identifies the connection type, the authentication method and username used by the connection, and the data source that uses that connection. To quickly group all the data connections used by a multi-connection data source, click on the Data Source column heading to sort the list. All connections used by a multi-connection data source will then be listed together.
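
Sorting by data source works because it places connections that share a data source next to each other. The following Python sketch shows the same idea on a hypothetical list of connections; the field names and values are illustrative and do not reflect an actual Tableau Server export.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows from a Connections list.
connections = [
    {"data_source": "Sales", "type": "postgres", "username": "svc_sales"},
    {"data_source": "HR", "type": "excel", "username": ""},
    {"data_source": "Sales", "type": "sqlserver", "username": "svc_sales"},
]

# Sorting on the data source puts related connections on adjacent rows.
by_source = sorted(connections, key=itemgetter("data_source"))

# Once sorted, adjacent rows with the same data source can be grouped.
grouped = {
    name: [row["type"] for row in rows]
    for name, rows in groupby(by_source, key=itemgetter("data_source"))
}
print(grouped)
```

Any data source with more than one entry in `grouped` is a multi-connection data source.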

Configure Ask Data on a data source
Ask Data is enabled for most data sources by default, but you can disable it for data sources it won't be used with. For enabled data sources, you can change how often analysis occurs, optimizing system performance. The configuration options for Ask Data are on the Details page for the data source.

The configuration options for Ask Data
You control how often Ask Data analyzes the data source.

  • Automatic—checks for changes every 24 hours and analyzes the data source if it is live, has had an extract refreshed, or has been republished. Choose this option for a data source frequently used with Ask Data, so it will be ready before users query it.
  • Triggered by user request—analyzes the data source only if it has changed since a user last accessed Ask Data. Choose this option if the data source changes frequently but users query it with Ask Data only occasionally.
  • Disabled—analyzes only field names, not values.

You may want to disable Ask Data in the following circumstances:

  • The data source has user filtering or row-level permissions. The limitations that apply to user-filtered data sources are in place so that Tableau Server does not reveal any information to users who should not have access to it. If you believe that these limitations will cause confusion or that Ask Data is not worth using when the metadata analysis is not available, you may want to disable it.
  • There is a lot of user traffic to views. Ask Data relies on the VizQL processes on the server. If those processes are already handling a high volume of user traffic, having additional users casually accessing those processes via Ask Data may cause performance issues.

Controlling which data users can see

Methods of control
Data security ensures that the appropriate data is seen by the appropriate people. Examples of control might be a doctor seeing only her patients' data or a sales manager seeing only the information about her territory. There are a few ways data security can be implemented with Tableau Server or Tableau Online. It can be done solely in the database, solely in Tableau, or through a hybrid approach, where user information in Tableau Server or Tableau Online corresponds with data elements in the database.

When a user logs in to Tableau Server, they are not logging into the database. This means that if you implement security in the database, Tableau Server users will also need to have credentials to log in to the database in order to see views. These login credentials can be passed using Windows Integrated Security (NT Authentication), embedding the credentials into the view when published, or prompting for specific user credentials.

Tableau also provides a user filter capability that enables row-level data security based on the username, group, or other attributes of the current user. The filter appends a "where" clause to all queries to restrict the data, and it can be used with all data sources. The following three options work together to achieve different results:

  • Database login account
  • Authentication mode
  • User filters

The level of security needed for each data source is described in the governance model based on the user requirements.

If access can be allowed or restricted for the whole data source table, then you can control access by leveraging existing data security implemented in the database. The control that is possible varies with the combination of database login account and authentication mode. If you need to allow or restrict which specific rows users can see in a data source table, then you can apply user filters.

What are user filters?
User filters allow you to limit the data that a specific person can see in a published view. These filters are created in Tableau Desktop and applied to a workbook or data source. For example, in a sales report for regional managers, you may want to only allow the western regional manager to see the western sales, the eastern manager to see the eastern sales, and so on. Rather than create a separate view for each manager, you can define a user filter that allows each manager to see the data for a particular region.

A user filter is defined for an individual field. Users or groups are given permission to see a subset of the members in that field. The user list comes from Tableau Server. When you publish to Tableau Server, the view is adjusted based on who is signed in and looking at it. Restricting access to data in this way is referred to as row-level security (RLS). Tableau offers different filter options to provide row-level security.

Manual
You can manually create a user filter that defines the specific data each user or group can access.
This method is convenient, but not automated. For instance, if your filter is based on individual users and the list of users changes, you have to manually update the filter. This can be done by creating a set or calculated field that may be used as a filter in Tableau Desktop, and can be published to Tableau Server.

Automatic
You can create a calculated field that automatically defines whether a user can access the data. This method requires that your data already contain the security information you want to use for filtering. If your users change, your filter automatically updates. This method can also improve performance as the number of users increases. Because filtering is defined at the data level and automated by the calculated field, this method is more secure than mapping users to data values manually.
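
The difference between the manual and automatic approaches can be sketched in Python. This is a simplified model, not Tableau code: the sample rows, the `MANUAL_MAP` dictionary, and both filter functions are hypothetical. In Tableau, the automatic version is typically a calculated field that compares the signed-in user (for example, via the USERNAME() function) with a field in the data.

```python
# Hypothetical sales rows; "manager" is the security attribute stored in the data.
SALES = [
    {"region": "West", "manager": "alice", "sales": 100},
    {"region": "East", "manager": "bob", "sales": 80},
    {"region": "West", "manager": "alice", "sales": 40},
]

# Manual approach: a hand-maintained mapping of users to the regions they may see.
MANUAL_MAP = {"alice": {"West"}, "bob": {"East"}}


def manual_filter(rows, username):
    """Return rows whose region is listed for this user in the manual mapping."""
    allowed = MANUAL_MAP.get(username, set())
    return [row for row in rows if row["region"] in allowed]


def automatic_filter(rows, username):
    """Return rows whose security attribute matches the signed-in user."""
    return [row for row in rows if row["manager"] == username]


# A new manager appears in the data. The automatic filter picks her up at once;
# the manual filter shows her nothing until someone edits MANUAL_MAP.
SALES.append({"region": "North", "manager": "carol", "sales": 60})

print(len(manual_filter(SALES, "carol")), len(automatic_filter(SALES, "carol")))
```

For existing users the two approaches return the same rows. They differ only in who has to keep the mapping current: a person editing the filter, or the data itself.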

Row-level security with extracts
Prior to version 2018.3, Tableau was unable to support row-level security (RLS) workflows with extracts because of complications around row duplication and performance. Now, RLS workflows with extracts are faster to create and have better performance than RLS with live data connections.

To use RLS with an extract, be sure to store the extract using the multiple-tables option. To effectively perform RLS with extracts, Tableau recommends limiting your extract to two tables:

  • A data table: this is the "object" table that contains all the data you want to show.
  • A reference table: this is the "look-up" or "entitlements" table that contains the user information and the security groups the users belong to.

Also, because the multiple-table extract option does not support data filtering or aggregation during extract creation, consider connecting to your data using custom SQL or a database view to achieve the desired level of data filtering before creating your extract.
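
The two-table pattern can be illustrated with Python's built-in sqlite3 module. This sketch only models the relationship between a data table and an entitlements table; the table names, columns, sample users, and the `rows_for` function are hypothetical, and Tableau applies the equivalent logic inside the extract rather than through SQL that you write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Data ("object") table: everything you might want to show.
    CREATE TABLE sales (region TEXT, amount REAL);
    -- Reference ("entitlements") table: which user may see which region.
    CREATE TABLE entitlements (username TEXT, region TEXT);

    INSERT INTO sales VALUES ('West', 100), ('West', 40), ('East', 80);
    INSERT INTO entitlements VALUES
        ('alice', 'West'), ('bob', 'East'), ('carol', 'West'), ('carol', 'East');
    """
)


def rows_for(username):
    """Return only the sales rows the given user is entitled to see."""
    cursor = conn.execute(
        """
        SELECT s.region, s.amount
        FROM sales AS s
        JOIN entitlements AS e ON e.region = s.region
        WHERE e.username = ?
        ORDER BY s.region, s.amount
        """,
        (username,),
    )
    return cursor.fetchall()


print(rows_for("alice"))
print(rows_for("carol"))
print(rows_for("dave"))
```

Keeping the entitlements in their own table means the data rows are stored once, however many users are entitled to them. A user with no entitlement rows, such as `dave` here, sees nothing.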

Limitations of user-filtered data sources
If a data source has user filters (row-level permissions), those permissions will also apply to Ask Data, which won’t recognize secure values or make related statistical recommendations. You can still use user-filtered data sources in Ask Data. However, Ask Data's inability to profile or index such data sources or store metadata for the fields in the semantic model means that Ask Data cannot provide filter defaults, recognize currency concepts such as “cheap” or “expensive,” or show profile data in tooltips.

How can the other methods be used?
When publishing, you can control what users see with the following options:

  • Permissions, which are usually set by the site administrator.
      • When you start the publishing process, the dialog box shows the permissions that will be applied. By default, the content you publish takes the capabilities that are already set on the server, typically as they're set on the project you are publishing to.
      • If you think your data source is an exception, work with your admin to determine the best course of action.
  • Authentication, which is the process of verifying a user's identity. This requires an identity store (specified during installation) and the authentication mechanism used by your organization. When you publish a data source or a workbook with a live database connection, you can choose an authentication mode. The options available for accessing the data source depend on the type of data you publish and whether you are publishing to Tableau Server or Tableau Online.

The combination of the database login account and authentication mode gives you options on how to control the data that users can see. User filters, the embedded password option, and the impersonation modes all have similar effects: When users click a view, they see only the data that pertains to them, and they are not prompted for database credentials. However, user filters are applied in the workbook by authors, and the impersonation authentication modes rely on security policies defined by administrators in the database itself.