A Cartesian product is the set of every possible combination of elements drawn from two or more sets of information. In a report, this can inflate or duplicate the data.

For example, if A = {1, 2} and B = {3, 4, 5}, then the Cartesian Product of A and B is {(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)}.
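The same example can be reproduced in a few lines of Python. This is only an illustrative sketch using the standard library's `itertools.product`; it is not how Results builds its queries.

```python
from itertools import product

A = [1, 2]
B = [3, 4, 5]

# Pair every element of A with every element of B.
pairs = list(product(A, B))

print(pairs)       # [(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)]
print(len(pairs))  # 6, which is len(A) * len(B)
```

The number of pairs is always the size of A multiplied by the size of B, which is why the row count grows so quickly as more sets are combined.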

This can happen within Results when multiple kernels of data are pulled together into a single table from a single query. Participant-focused universes include several different kernels that are not related to each other. In these universes, the Participant Site Identifier is the "anchor." The report understands that each of these kernels (TouchPoints, Program Enrollment, Caseload, etc.) is linked to the participant, but the kernels are not linked to each other in any way. For example, the Program Enrollment kernel is not related to the Caseload kernel, and the Caseload kernel is not related to the TouchPoint kernels. Because of this, the report retrieves a row of data for every possible combination of these records for each participant.
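The behavior described above can be sketched in Python. Everything here is hypothetical and for illustration only: the record layout, the `combine` helper, and the names `participant_id`, `program`, and `caseworker` are assumptions, not the actual Results data model. The point is that when the only link between two kernels is the participant, every record in one kernel is paired with every record in the other.

```python
from itertools import product

# Hypothetical records. Each one is linked only to a participant (the "anchor").
program_enrollments = [
    {"participant_id": "P1", "program": "Program A"},
    {"participant_id": "P1", "program": "Program B"},
]
caseloads = [
    {"participant_id": "P1", "caseworker": "Worker X"},
    {"participant_id": "P1", "caseworker": "Worker Y"},
]

def combine(participant_id, *kernels):
    """Return one row per combination of the participant's records across kernels."""
    per_kernel = [
        [rec for rec in kernel if rec["participant_id"] == participant_id]
        for kernel in kernels
    ]
    rows = []
    for combo in product(*per_kernel):
        row = {}
        for rec in combo:
            row.update(rec)  # merge the fields from each kernel into one row
        rows.append(row)
    return rows

rows = combine("P1", program_enrollments, caseloads)
for row in rows:
    print(row)
print(len(rows))  # 4 rows: 2 enrollments x 2 caseload records
```

Nothing in the data says which program goes with which caseworker, so all four combinations are returned.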

As the Cartesian product grows, the report pulls in more and more rows of data, which can make queries sluggish.

Below are screenshots of different examples.

Example 1: 1 TouchPoint

This table contains the participant information for a single TouchPoint that has been taken twice. Because the two responses have different response IDs and dates taken, each TouchPoint response is given its own row.

Adalaide also has 3 caseworkers, as seen on each row in this table.

Example 2: Mixing Caseload and TouchPoints

If we include Caseload and TouchPoint information in the same table, the report doesn't know how the two are related, so it creates a row for every possible combination. The table has now grown to 6 rows because each of the 2 TouchPoint responses is repeated 3 times, once for each caseworker.

Example 3: TouchPoints, Program Enrollments, and Caseworkers

In this final table, program enrollment has been added. Adalaide has been enrolled in 2 different programs. The report now creates a row for every possible combination of TouchPoint response, program, and caseworker. Our 2 responses have multiplied to bring in 12 rows of data for a single participant: 2 responses x 2 enrollments x 3 caseworkers = 12 rows of data.
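The row counts in Examples 2 and 3 can be checked with a short Python sketch. The labels below are placeholders rather than real data; only the counts (2 responses, 2 enrollments, 3 caseworkers) come from the examples.

```python
from itertools import product

# Placeholder labels standing in for one participant's records.
responses = ["Response 1", "Response 2"]
enrollments = ["Program 1", "Program 2"]
caseworkers = ["Caseworker 1", "Caseworker 2", "Caseworker 3"]

# Example 2: TouchPoint responses combined with caseworkers.
touchpoint_and_caseload = list(product(responses, caseworkers))

# Example 3: responses, enrollments, and caseworkers combined.
all_three = list(product(responses, enrollments, caseworkers))

print(len(touchpoint_and_caseload))  # 6
print(len(all_three))                # 12
```

Each additional unrelated kernel multiplies the row count by the number of records that participant has in that kernel.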

Individual flattened TouchPoints behave the same way: if you put responses from more than 1 TouchPoint into a single table, the data will be inflated in a similar way to the examples above. This can be avoided by using unflattened TouchPoint data.

Each report should be evaluated on a case-by-case basis. The goal of the report determines whether the data should be flattened or unflattened, or whether multiple queries would be a benefit.
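As a rough illustration of why multiple queries can help, the sketch below compares the row counts for the participant in Example 3. It assumes, hypothetically, that each kernel is retrieved by its own query and shown in its own table; the variable names are placeholders and not part of Results.

```python
from math import prod

# Record counts for one participant, taken from Example 3.
record_counts = {"responses": 2, "enrollments": 2, "caseworkers": 3}

# One query combining every kernel: the counts multiply.
single_query_rows = prod(record_counts.values())

# One query per kernel (an assumption for this sketch): the counts add.
separate_query_rows = sum(record_counts.values())

print(single_query_rows)    # 12
print(separate_query_rows)  # 7
```

With separate queries, each record appears once in its own table instead of being repeated for every combination.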
