dbnet.home Bookmark and Share

MEET FEATURE REQUIREMENTS, SCHEDULE, and BUDGET! Consulting available to teach your organization to apply this methodology and tools for powerful quantitative management of your business processes.

  • Instruction of your staff at your site, using courseware, application example, and a functional template.
  • Mentoring your key staff on an ongoing or periodic basis, on these tools or on Quantitative Program Management.
  • Contracting or Employment in your organization on specific topics.

Brochure  ...  Contact for Details

Check out my YouTube Channel: Power Operational Intelligence

Now Live! Overview, Structure, Task Data, Table Design, SQL Server, and Re-Linking now showing.

Video courses covering material on this website, and more, are presented in the playlists.

Code snippet links at YouTube Code Snippets. Twitter at @poweroperation1, #poweropi, #poweroperationalintelligence.

Subscribe on YouTube, and click the "Notification" Bell icon to be notified as content is published.

The Zen of Excel Pivot Tables

Zen

As a Program Manager, you will control key processes and underlying data. If a PM is in control of the project schedule, the project budget, and the project materials (including build plan, distribution plan, test plan, and configurations), that will provide excellent mechanical grasp of the program. Pivot Tables are a powerful tool for gathering data and analyzing it and turning it into information for driving operations.

Most books on pivots focus on construction mechanics. I use Pivots every day to organize and report on data that is critical to planning and operation in my function. That data changes constantly, new kinds of data are added, new data structures are needed, as well as new reporting - all to help me manage complex areas within my primary function. I have found there is a level of design above the technical construction of the Pivot Table that helps make the Pivot Tables themselves more straightforward to create, and helps with better evolution and reporting. I think this embodies the Zen of Excel Pivot Tables.

Pivot Tables: Two underlying principles

  • It is useful to spend as much time designing the data underlying a Pivot Table, as on designing the Pivot Table itself. The structure of the data records, and of fields within records, has great influence on usability Pivot Tables providing views of the data.

  • A Pivot Table provides essentially the functionality of a relational database SQL SELECT statement complete with field selection, grouping, subtotals, and filtering. It is a SELECT in two dimensions, allowing you to select, sort, filter, and combine records just like a SQL database, but also to select, sort and combine fields in report columns. Because of this relationship to SQL SELECT, it is useful to apply database design principles – in particular, data normalization – when designing a Pivot.

A well-designed Pivot Table can graduate from a static report included in a Powerpoint presentation or MSWord report, into a dynamic entity useful to manage operational processes with constantly changing conditions. I enjoyed some eureka moments as I found I could continuously manage fabs, builds, assemblies, budgets, and bug reports (to name a few!) using these tools, and customize them to processes and needs specific to my organization and project.

Designing the Data Table

SQL SELECT

  • The designers of Excel Pivot Tables obviously drew much from database design and reporting. A SQL SELECT statement is used in a relational database to extract data from a set of database records, sort and subset those records, combine or subtotal them, and to pick particular fields from the records for use or display. That’s exactly what a Pivot Table does. (Here’s a description of SELECT from Microsoft, and another from mySQL.) So the leap is to realize that database design principles apply to Excel.

Data Organization

  • Thoughtful article in The American Statistician capturing many excellent rules for data, learned over countless implementations: Data Organization in Spreadsheets.

  • Think of data underlying a Pivot Table as a table in a relational database, and apply thinking similar to such a design and how a Select query would use it.

  • Organization of data in a table of data paints your view of the world. Since Pivots are usually used to describe real entities or actions (builds; manufactured objects; bugs), data is usually self-organizing since the world is inherently organized already - but it falls to you to discern the patterns. It’s usually good practice for the data to model real-world entities, much like the “is a” or “has a” concept used when designing programming objects. Then when objects are specified, work on identifying real-world facts/values/attributes. For example, identify and distinguish among properties of use of an object, versus properties related to building it; values around it such as dates of an operation, or financial metrics. Associate attributes and actions with well-defined objects, and design your data table around this principle. The organization of the data should remain fairly simple. The data will represent complexities of the real-world objects you’re working with, or of situation; but the design of the records themselves containing the data should remain simple. This will give the greatest flexibility and power to reporting on the data with a Pivot.

Data Normalization

  • Much is taught about how to design data to eliminate redundancy, ambiguity and dysfunctional relationships among data items to eliminate errors and to improve flexibility. An important concept for organizing data is called Data Normalization. Here’s an informal description, and a more formal description with references to deeper sources. In short,

    • Don’t repeat the same data among multiple records. (Common example: it’s better to reference a single address from records for multiple people living at an address than it is to repeat the same address in records for each of the multiple people.)

    • Each record should specify a data object, identify a fact about it, and specify the value of the fact. The value must depend on the full object specification, not on a subset of the specification. The record may provide values for one or more facts about the object specified, as long as every value in the record is dependent on the full object specification.

    • In each record, value fields must be about the specified data object and not about some other object that may be associated. Every field in each record should either be part of specifying the specific object, or provide a value for the object.

    • In particular, no relationships should exist among records – one record shouldn’t reference another. Also, order of the records should not matter.

    • These basically describe the first three normal forms, and there are several more in common practice. If you can design your Pivot Table data to come close to conforming to First, Second and Third Normal Form as above, you will be way ahead of most Pivot Table designers.

  • Since Excel doesn't provide native relational capabilities you may only be able to partially achieve elimination of replicated data, although you can use VLOOKUPs and INDEX/MATCH constructs to a degree. (And this article on this website describes how Excel can be used with a database, for which data normalization would certainly be important.)

  • It can be useful to calculate some fields in a pivot record that may be used to aggregate data in the pivot. For example, providing fields that translate a date to calendar year, fiscal year, and calendar and fiscal quarter are very useful to aggregate data in Column Labels. This isn't compliant with the data normalization rules just stated, but can still be useful. Just make sure that such aggregations don't introduce ambiguity. A month clearly belongs to exactly one fiscal quarter for example, and the calculation is based solely on information already contained in the record (that is, the date field).

  • Excel isn't a database. But awareness of normalization principles, and striving toward conformance, will provide more powerful and more flexible pivots. And if you are designing data structure for use with a database driving pivots to report as described in other articles on this website, you should certainly normalize your data.

  • Normalization techniques and tutorials




Data Table Design

  • A difference between a Pivot Table and a SELECT statement is that a Pivot allows you to select, sort and filter records as does a SELECT, but also allows similar operations on value fields from the selected records. So besides normalizing data among records, consider how to define specification of the value (fact) associated with the object specified in each record.

  • When designing data records to be used in Pivot Tables, think of the form of the resulting report. A set of fields in each record will specify an object (example of hierarchy: product, project, assembly, part name, design revision, build identifier, operation, etc. – hierarchy fields are increasingly specific), likely represented as row in the Pivot Table (or multiple of them may be aggregated by specifying fewer levels of the specification hierarchy). More fields may define facts about the object (like quantity, cost, dates, locations, users, test, results, etc). Some may be numeric and these may be aggregated in Pivot Table value columns; others may be textual and can be listed as attributes detailing the object in its row fields. The set of fields specifying the object should be considered as a hierarchy defining the rows of the Pivot Table; sorting, sub-totaling, filtering, organizing among the objects. The set of fields specifying facts about the objects should also be considered a hierarchy, defining the value columns of the pivot, allowing the same organization, selection and combinational operations on the data. This is what will allow you to subtotal among object types in horizontal rows, and to show data meaningfully in such aggregations as time/quarter/year or build or revision order among the data columns.

  • A nicely-constructed hierarchy can automatically populate rows and columns in a sensible way as field values are added. The data itself takes a role in determining the report structure. See the example below on normalized data self-organizing a Pivot Table.

  • Of course you will find that different organizations of the data, beyond the initial concept described above, will become useful. You can easily arrange Pivot Tables to provide such additional views. But starting with a base view and implementing it in both the data records and in a basic Pivot Table view of those records will aid you in deriving those additional views.

  • Stating all the above points succinctly, a value is stated for a fact about an object; and a key value for that specific fact is specified. The key consists of hierarchical sets of fields that identify a specific object and identify a specific fact about that object. This hierarchy of fields facilitates useful sorting, combination, and filtering. The full key specifying the object and value can be broken into two sub-set keys, one used to specify row organization and the other to specify value column organization of a Pivot Table. Multiple useful views can be constructed using varying combinations and orders of fields within the key.

  • Mechanics: In the data records, create a field called RowNum. Put the row number in that row using a formula like "=row()". This will help with debug if you can’t tell which fields and rows are summing into a value. You can also include the RowNum field in the pivot report while constructing it to make sure you’re collecting the records you intend.

Pivot from Excel Structured Table

  • A Pivot Table data source may be a range specified as A1C1, or by a named range. It can also be specified as a named Table, which provides very useful capabilities.

  • Adding records or fields within the scope of a data range is easy: select a row or column, Insert, and enter the new data record or field values. Remember to name the new field. But inevitably, some records are going to want to be added to the end of data range, or fields added to the right of the range. Using a range is awkward, because either the Pivot has to be re-pointed to the expanded range area, or the range area of the named range has to be updated (although dynamic range definition can be used to help with this.)

  • If a Table is used, you just enter data below the existing table or to the right of the existing table, and Excel automatically adds that row or column to the table. (As always, remember to name the new field.) Added rows and columns will be included automatically in the Pivot Table data source if the Pivot Table specifies the table name as its data source. You can re-name the table in the Table Tools : Design : Properties ribbon. You must refresh the Pivot Table to see the added rows and columns available in the data source.

  • When adding a field to a table, Excel may copy down the cell contents to all rows already existing in the table. This is of course a nice time-saver, but be careful if you have different entries in some rows that they are not over-written.

  • Mechanics: Name the Table. Name the tab containing the Table to match the name of the Table. Then name the columns of the Table some version or abbreviation of “TableName-fieldname”. Helps keep track of where data is coming from when debugging.

Table to Pivot

  • Pivot Tables then are views of that data. One data table may drive multiple Pivots with different uses and users. Use the data table over time to collect data and organize it, always fitting it into your data-organizational context. The Pivot Tables then can present different data organizations, sub-sets and combinations of the data. The same data table may drive a build plan, a budget, a schedule. Those are all related by logic and data, but are useful to different audiences and use different subsets of data describing a project. I’ve taken examples from my experience, but I’m sure every Program Manager will see similarities to their work.

Design of the Pivot Table

Pivot Table Construction

  • Making a professional Pivot Table requires good report organization, and also some formatting and functionality.

  • Some notes on mechanics of creating Pivot Tables from the data.

    • Layout: Start with “Tabular Form”. I almost always stay with that.

    • Outer (leftmost) Row Labels have broadest granularity, grouping sets of objects. Once the specified object level is reached, you can add object value columns as Row Labels if they don’t need to be sub-totaled. This adds detailed description to the report.

    • Start with the vanilla pre-configured Light Format (top left); or one of the simple formats. Don’t over-format.

      • Select and color-fill rows containing sub-totals, and make these rows Bold typeface. A click at the left of a subtotal row selects all subtotals at that level. Graduate the colors for total/subtotal rows as they go from inner to outer totals.

      • Make Headings Bold typeface.

      • Leave the Heading column filters enabled.

    • Subtotals. By default, Excel will subtotal Values associated with Row Label fields as you add Row Labels in the Wizard. I turn them off while building the report, so I can watch the organization of the report as fields and data are added. Turn on the ones you want after the structure is in place.

    • You can represent Values hierarchically using the Column Labels just like the rows. For example, a “Fiscal Year” field at the top level, “FQ” field below that. Then you can specify subtotaling for “Fiscal Year”.

      • Here’s an example showing an effect of data normalization:

      • First, let’s say you are building a budget, and an item plans to incur expenses in 2013FQ1, FQ2, FQ3, FQ4. At first glance, it seems you would have a record specifying ItemName, and then a value for the expense in each quarter in columns 2013FQ1, 2013FQ2, 2013FQ3 and 2013FQ4. When building the Pivot Table, you put the fields 2013FQ1, 2013FQ2, 2013FQ3 and 2013FQ4 into the Values field. If you have two fiscal years, you will have to manually add 8 quarter fields.

      • Here’s a better method, made available by normalizing the data: the format of each record should be ItemName, FY, FQ, and Expense, and data entered would be of the form “item name”, “FY2013”, “FQ2”, $3022.76 . Instead of one record for each item with a value column for each quarter, you will have a record for each item for each quarter, each specifying a value for the quarter specified by the record. Then when building the Pivot Table, the field ItemName specifies the Row Label, the field FY is specified at the top of Column Labels and field with field FQ below it in Column Labels (the hierarchy) and Expense is summed in Values (it will be Sum of Expense). Then the FY and FQ fields will populate automatically.

      • If there are numerous row fields in your report, Defer Updates in the wizard until fields are selected into the Pivot Table.

    • Most Value entries will want to be “Sum”. If Excel thinks the field you put into the Value area is text, it will set it to “Count” instead of “Sum” even if it looks like a number. Make sure you have “Sum”s where you need them.

    • I think the amount of ink in the report should be proportional to the information conveyed, so I use the numeric “Accounting” format extensively for numeric fields. If the value isn’t currency I turn off the currency symbol and set the decimals appropriately. Then “0” values show as “-“.

    • Excel wants to name value columns “Sum of X” causes wide columns and is frequently redundant (e.g., "FY Total" is obviously a sum of the quarter totals to the left…). It then won’t let you eliminate the “Sum of”. I usually make that into “_X” - for example: “_FY Total”

    • Table Options

      • Turn off display of expand/collapse buttons

      • Turn on Print Settings

    • Filters and Options

      • Filter each Row Field column as needed to make the report display what you want. Also you can filter Value columns. For example, you could report current year and next year, but not prior year or two years out.

      • Sort row entries. The scope of any sort is within the field next highest in the hierarchy, so don’t expect to sort a field in column K and to have it sort anything in columns A-J.

      • Value columns can be sorted too. So you can achieve order such as FQ1-FQ2-FQ3-FQ4, or sort in fiscal month order like “Aug-Sep-…-Jan-Feb-…-Jul” within a fiscal year. Companies set up the FY like this to confound Excel noobs.

      • You can drag row and column order, as well as sort them.

    • Add custom page headers and footers. In the header, provide path-filename-tab.

    • Freeze top rows including filter rows, and heading rows so you can scroll the table up and down without losing displayed headers.

    • Set print area (Values area), rows to show at top (down through headers), rows to show at left if the table is wide.

    • One Pivot Table per tab. Then copy a tab to create a new Pivot Table from the same data table, with headers, footers, orientation, page size, view freeze etc., and modify it as needed.

  • When in doubt of which records are included in a value in the pivot, double-click the value in the Values area, and a new tab will be created listing the rows included. If you have a RowNum field that helps even more here; and the column name may provide clues to problems, if the column name includes identification of the data table.

Connection to a separate data source

  • A Pivot Table has a relationship to SQL SELECT, but can’t really do JOINs as they can be done in a SQL SELECT statement. But Excel can use a SQL database to provide this capability.

  • Excel can be a source of data into a database, and an Excel Pivot Table can take its data from a table or query in an ODBC or OleDb database. Other articles on this website show how this can be done. Details are given for Linking Excel with MSAccess.

  • All of the techniques described here apply equally to any Excel Pivot Table using a database as its data source.