Tools for Multi-Dimensional Problems Using Excel with Access or SQL Server
Excel gives computer users access to great computational and data handling capability, with no real programming knowledge. So there is a wide range of problems that people have aimed Excel at - and a wide range of quality of solutions.
I think Excel is best at handling "flat" problems, ones that can be represented as an M x N table, and that often Excel solutions get cumbersome and complex as people try to deal with real life problems that aren't well represented by such tables. Excel is so flexible and powerful that usually it can be done, but the solution may be over-complex, cumbersome or brittle, or inextensible, difficult to support, or may not use the true power and beauty of Excel.
In many cases, the problem to be solved may change over time, and the size and shape of tables may change including more records or more data fields, or may be incrementally expanded for use in additional situations as they evolve. Excel solutions tend to become brittle as this occurs, as extensive maintenance is needed to keep existing calculations pointing to cell ranges correctly, or as additional calculations cascade to multiple worksheets or even to inter-connected workbooks. Using Excel with a database provides quite cleanly for expansion, contraction, re-structuring, adding and customizing the solution implementation.
Interestingly, you can build really powerful and elegant solutions to non-flat problems by using Excel with a database. Access comes as part of Office, and SQL Server Express is free - so databases are accessible. I haven't used MySQL but that should be easily available too.
Most people don't have a very good idea what Access is or how to use it and may be a bit intimidated by it. Turns out Excel isn't hard to use with a database, and following are some notes about how to do it. Access is the easiest to use because it has built-in integration features, includes built-in UI features, and is programmable with VB if you want that - but you can do a lot using Excel as the UI, and without needing to use VB.
SQL Server needs a bit more knowledge and at least basic SQL to build queries, but again you can get a lot of power without getting too deep into SQL. I've given book references that will help.
The example below uses Excel for data input. Nearly all users are familiar with Excel, and this can save you a lot of effort creating a UI in Access or .NET. Local calculations can be made and controlled by the input users. Excel can give your users too much control, but if your target users are program core team members or line managers this is almost always manageable.
The database query fetches data from Excel, combines the multiple data sets (for example, test configurations, wafer and lot designations, package information, and yield data) to create build instructions (for example).
Excel pivot tables driven by query output provide powerful reporting capability. Excel pivot tables are quite functional and have excellent formatting for useful reports.
Please note that use of a named range and a line number field as shown in this tutorial are my preferred approach, but they are not the only way to use Excel and Access in the way shown in this tutorial. The use of line numbers provides helpful debugging information for complex designs, and helps identify the area of data records. However, data provided from another source without these constructs can be used directly as well by linking a whole worksheet into Access instead of linking a named range, and by identifying a field in the input data that is non-blank in all valid records and can thus be used to delineate valid data from blank space. Once you understand the basic method shown here, the approach lends itself straightforwardly to many variations.
Data analysis for the Energy Economics pages on this website is done using the techniques described here, illustrating data analysis capabilities. I've also used this structure to actively plan, track, update, and manage organizational budget and PO tracking, and also to manage custom chip development test planning and materials including die and wafer configuration and inventory, module builds, test platform configuration and builds, and test sequencing and scheduling.
Excel: Set up input data
The approach I prefer is to set up a named range in Excel to hold input data. The range is delineated to remind users to insert new records between two rows of colored cells. Each line in the range has a Line Number field, shown at left, containing a formula to give the row number. Name that field using the intended name of this table, resulting in something like "ConsumptionLineNo". In the example below it is named "Sheet1LineNo". In fact, I recommend using this practice for all fields in each table to make the names unique among tables. This will avoid name conflicts in subsequent steps, as well as help clarify the source of output data in queries. The formula is shown below the data, to illustrate how this is done. The Line Number of the range bottom orange cells, is set to 0 instead of using the row number formula.
It would be really nice to use a Dynamic Range, or even better an Excel Table, to hold the input data, since those would define range references that expand and contract to hold just the data present. However, Access appears to only see absolute addresses so you can't link directly to a nice dynamic range. The delineation row approach described here approximates that functionality. Also, you can create an Excel Table within that delineated range so within Excel you can use the table functions while maintaining absolute references in the link to Access.
At this step is an opportunity to clean up the data. Decide how you're going to handle blanks; out-of-range, spurious, or malformed data; numeric vs. character data; dis-ambiguation (e.g., North Korea; Korea, North; and DPRK are no doubt the same entity. Are Bill Gates and William Gates the same person?). Also, structure your data so you can take in a wholesale update when needed. If your project is successful, you'll want to put in data for the next fiscal year; or the next project; or another organization. I make it a rule to never "touch" original data input. Rather, I accept data in whatever form it is given, and create and run programmatic clean rules to clean or structure it, in Excel or in Access. Then when I get a bulk update, I can insert it into my project and run the cleanup rules against it.
The named range itself includes all data in all rows of data and includes both top and bottom orange delineation rows. First select the data and orange rows as shown in the figure below. Click "Formulas > Name Manager" in the ribbon, and click the "New" button to create the named range.
I named this range in this example "sheet1DB", but you should name it something useful that describes the data, like "ConsumptionDB" to make it easier to use from Access.
Access: Create tables linked to input Excel tables
Close the Excel file, and open Access. Create a new database and open it.
Click "External Data > Excel" on the Access ribbon. Enter the Excel file name, and choose "link" on the panel shown below. When you add subsequent tables you will be given a choice to append data to an existing table in Access, but you should continue to use the "Link" choice for each table you add.
Access: Choose the Excel data source.
Select "Show Named Ranges", and select the range specifying the table you are inserting.
Check the box for "First row contains column headings" on the following pane.
Name the Access linked table. Then repeat all the steps above and link all other Excel data source tables.
Access: Create a Query to Link Tables and Create Output Table
First, let's get rid of the delineation rows from the Excel data. They are in Excel to make data entry easier without having to take care to insert data within the range, but are not needed from here on. They'll be removed by a query. Select "Create > Query Design" in the Access ribbon. Select one of the tables of imported data set up just above.
Drag the * symbol from the table pane into a column of the design pane, to include all input data fields. Also drag down the appropriately-named "line number" field to another column of the design pane. Click in the "Field" cell in the column for that field, and prefix the name with something like "LineTest: ". In the "Criteria" cell for that column, enter ">0", which will eliminate the bottom orange delineation row from the Excel input. (The top orange row will be included, because it specifies the field names.) Un-check the "Show" checkbox for this "LineTest" field. The figure below shows the resulting query design. The "LineTest" field will not show in the query result, but will filter out the bottom delineation row.
You could do other selection criteria and manipulations here, such as selecting years greater than 1980 for example, if you had a year field and wanted to eliminate those earlier years. You should also complete any data clean-up within this step, so that you can reference this query in subsequent "work" queries knowing the data is cleaned and structured in final form.
Now save this query. Right click the tab above the design pane, select "Save" and give it a name. I suggest using the name of the table, prefixed by a "q". From here on, this query will be used instead of the linked table, to reference data for this table linked to Excel.
Repeat these steps for all input tables linked to Excel. The result is a query representing each linked input table, but filtering out the artifacts included in the Excel table design that were helpful for data input in Excel but which will not participate in the following data analysis.
Now to create a useful data analysis query. Click "Create > Query Design" in the Access ribbon. The "Show Tables" dialogue shows. Select the "Queries" tab, and select appropriate queries for the linked Excel data. You can also select other tables that you keep in Access if your query design needs that. In this example, we will link two tables from Excel so we insert both queries into the new query. See just below.
Drag fields needed for query output, down to field output columns in the design table. Use the "*" symbol to include all fields from an input query or table. (The reason we named the "LineNumber" fields in the Excel data to be unique among the linked tables, was to avoid naming conflicts if we do this to include all fields in the query design.) Link fields in the tables by dragging from one to the other. This is a pretty simple query to link two tables, but more complex queries can be built in this step. Refer to the Access books referenced. Access provides a graphical interface that simplifies building queries. If you're using SQL Server you write these queries in T-SQL. You can also use SQL in Access. However, simple queries like this example will be sufficient for many practical problems.
Here's where the Access book will come in handy. Simply put, if you use the default link shown by the straight line in the figure above, query records will be produced only if there is a match between the linked fields in qSheet1DB and qSheet2DB. Below, I've right-clicked on the link. The selection shown will produce records where qSheet1DB.Column1 field entry matches the qSheet2DB.Column1 entry, and also for all records in qSheet1DB that don't have matching entries in qSheet2DB. You'll see the result below in the finished query.
I just linked the line numbers in this example since it's meaningless data... But in practice, one table might contain a collection of budget line items with associated data (description, project, account, department, responsible person etc.), and another table could contain expense dates and data associated with the expense (capex, or expense; hours and rate; PO number and PO line number; planned and actual amount, date; etc.) that are associated with the budget items For such a structure, an arbitrary budget line item number could be put in both tables and used to link dates to descriptions.
You can compute fields in this step, and set criteria to limit records based on field values. For example, you could only produce a record if the value of field qSheet1DB.heading2 is ">10". To create a calculated field, right click in a Field entry, and select the "Build..." option. From that dialogue you can select fields from tables and queries, use built-in or user-created (VB) functions and so forth to build complex expressions. An expression might look like "exprName: [table1]![field1] + abs([tableX]![fieldY])".
Save the Query just designed. Right click its tab above, select "Save", and enter a name. You will access this query from Excel to drive a pivot table to further process and display the data.
You can examine the query result in Access. In the figure below, we're looking at Table view of the query to see the output.
Excel: Insert a Pivot Table
Now use Excel to query the Access result, and produce a pivot table for the results. Re-open the Excel file. On the "Insert" ribbon, select "Insert PivotTable". Select "Use an external data source", and click "Choose Connection...".
(Of course you could instead insert a QueryTable or PivotChart using nearly the same procedure as will be described.)
If you have previously set up a connection to the Access table you're going to use to drive thispivot table, you can select it in this pane.
If you need to set up a new connection, click "Browse for More..." and go browse for the Access database.
Select the appropriate table or query to drive the pivot table. In this example, we select the analysis query created in recent steps above.
Note that an Action Query in Access, such as a Union query, will not show here for selection by Excel. If your solution contains such a query, you can build a SELECT query to wrap the Action query (select "*" to put all Action query fields into the Access query fields, as in "SELECT * FROM hiddenQuery" where hiddenQuery is the Action Query you're wrapping). The wrapper SELECT query should then be selectable from Excel and give you access to the hidden query.
Also, I've found it sometimes difficult in Access to use Outer Joins using a multi-field compound key. So in the Excel input table, you can create a field to synthesize a compound key, concatenating the fields desired for the compound key. For example, create key values such as "itemname:revision:test" (where in this example colons are inserted to maintain readability). Then use this synthetic key in a compound join in a query.
You can edit the properties of the connection by clicking "Data > Connections" in the Excel ribbon. Select the connection you just created and click the "Properties..." button and then the Definition tab to show the pane below. It's useful to add the name of the Access table associated with a query, to the name of the connection. This makes it easier to find and re-use the connection if you design multiple pivot tables from a common Access query.
Excel: Design the Pivot Report
An Excel pivot table is a great way to organize and use data. After completing the steps above, Excel displays its usual Pivot Table design elements. The fields are shown for the external Access query output created. Select them as usual when designing a pivot table. There's a book referenced on Excel Pivot Tables that will be helpful to you. Make sure you choose "Sum of..." or "Count of..." in the Values columns at bottom right. I used Design > Report Layout > Tabular Design for the example shown.
Pivot reports typically take a lot of tweaking to become persistently useful, particularly in sort order and in fields included. So don't be shy about evolving them. Often, the underlying query will contain a lot of fields and one query may support multiple pivot reports, for example sorted by program; by department; by user; by responsibility; by PO; by item number. And of course you can also build a pivot chart.
Above I gave examples of tables that might be used to handle budget and expense data. In that case, reports might show expenses by project; by Program Manager; by Date; by Planned vs. Actual; by Account; and permutations of these. Other implementations I've used have managed custom chip material by Lot, Wafer, and Process skew; Module Configuration (die, skew, package, substrate) by engineering test, qual test, characterization procedure; die yield and inventory by lot, wafer, base revision and metal revision, and skew; system configuration and silicon module configuration by system test and date and many more operational material planning and management needs. This data structure can support solutions to a wide variety of problems.
To debug the pivot, you can double-click Value entries (not Column Headings or Row Headings), and Excel will create a new worksheet containing all the query records that contribute to that value. The added sheet will be named something like "Sheet1". Make sure it shows that all the proper records are showing up and that there aren't records included that you're not expecting. It's a good idea to include in the query output all the line numbers from each table the query uses, so you can trace back through your queries to underlying data from the input tables. When you're through debugging, delete the added sheet.
- Input data into the the data tables in Excel. You can insert and delete rows between the orange lines, to add or delete records. You can add fields too, but the example shown doesn't take care to include them in the named range so you will have to do this manually. But it's probably much more likely you'll add records and pretty infrequent you'll add new fields.
- When all data has been updated in the Excel tables, save the workbook.
- In Excel, select Refresh All on the Data ribbon.
- Did you notice that you didn't have to open Access? (As long as you use Select queries, you won't have to open Access. If you use an Action Query (Make Table, Append, Delete, Update), you will have to open Access to execute that query so avoid that.)
This structure by itself is powerful in linking tables and handling variable-sized or non-flat problems. You can build much more sophisticated queries, calculate fields. This structure gets you entree into a very powerful world. The books listed will take you far.
If you've put the Excel client worksheet and Access database file into directories that have previously been set up as trusted, you are all set.
To set up a trusted directory, Office Button (top left) > Excel (Access) Options (bottom right) > Trust Center > Trusted Locations > Add New Location...
If not in a Trusted Directory: When you re-open either the Excel file or Access files linked as above, a yellow security shield will be shown in the bar near the top of the app window. Click the associated button, and enable external access so the files can be linked.
The example files are available: Excel Workbook and Access Database (right click and "save target as..."). The Excel file is a ".xslm" file. Internet Explorer, and possibly other web clients, will save this to your computer as a ".zip" file. So change the file extension to ".xslm" in the download dialogue, or you can change it on the file on your computer once the download has completed. The note below is about how to re-link references between the Excel file and Access database on your computer once you have downloaded them. The workbook also contains code to refresh a pivot table on worksheet tab selection described below, and re-fitting of all columns to spec when a pivot is refreshed.
I usually create a macro to run queries. An example is shown in the example Access Database. SQL Server is a little more complex: the query output table must be deleted in the T-SQL script prior to running the make-table query.
One small complication with linked Excel and Access, the only one I've found that's complicated. If you drive a pivot table from a table in an Access database or SQL Server database, Excel provides SQL for the link. However, the addresses in the automatic link are absolute so if you move the database, the Excel pivot still points to the old location.
If you move the Access database, you have to re-link the Excel pivot table to point to the new database location. Here's some code that will change the links to point to the new location, for pivot tables created in Excel2007, and here is code to re-link a pivot table created in Excel2003. Use the 2003 code for a pivot created in Excel2003 even if it is used from Excel2007, since it will run in Compatibility Mode in the newer version of Excel. I add a worksheet to manage this operation; it's structure is shown in the example Excel Workbook.
If you move the Excel workbook client, you'll have to re-link the linked tables in Access. On the ribbon "Database Tools" tab, click "Linked Table Manager" and use the Linked Table Manager dialogue box. Make sure the box "Always prompt for new location" at bottom is checked so you can browse for the new location of the Excel client workbook.
Of course if you move both the Excel workbook client and the Access database, you have to perform both re-link operations described just above.
Data used by Excel in an application linked as above uses a hidden Pivotcache in Excel to locally store the result of the database query. What if you need to perform local calculation in Excel using data from the query? Use a Query Table. The Query Table is visible and you can access it in calculations. The Query Table can be placed on a worksheet that you hide from the user to simplify the user interface. Linking to a Query Table is described well in many texts including the Excel 2007 VBA Programmer's Reference. However, Query Tables have the same problem with absolute addresses as described above.
Here is code to re-link a Query Table to a new database location. The notes above about re-linking a pivot cache apply the same to a query table, except that this code should be used instead to manage the query connection object rather than the pivotcache object.
I would prefer to have pivot tables update each time they become visible. This avoids using data in a table for which you forgot to click the "Refresh" button after updating the source data. Here's a link to code to update the pivot tables on a worksheet when you click the worksheet tab.