With Adobe Analytics Data Workbench, analysts can query visitor activities across different channels. This article shows how data from those channels is merged into a single visitor record for queries.

Note:

This article is strictly a simplified illustration. For example, each visitor record consists of references (pointers) rather than actual strings, and data from different sources may require pre-processing to map each visitor's identifier.

Abstract view

This diagram shows how multiple sources merge into one dataset.  Let us go through each section.

[Figure: 1-01]

Raw Log Data

On the left, we have log sources. We use web server logs and in-store point-of-sale (POS) transactions in this example. Data Workbench can take any event log data as long as each entry has a visitor identifier and a timestamp.

[Figure: 2b-01]

Note that the sources are siloed at this point: the web log shows what visitors do on the website, while the POS log contains in-store data only.
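To make the two requirements concrete, here is a minimal sketch of what siloed raw events might look like, with a check that each entry carries a visitor identifier and a timestamp. The field names, dates, SKUs, and the `is_usable` helper are assumptions made for illustration; they are not Data Workbench's actual log format or API.

```python
from datetime import datetime

# Hypothetical raw events; every field name and value here is invented.
web_log = [
    {"visitor_id": "anonymous001", "timestamp": "2015-03-01T10:15:00",
     "page": "/checkout", "sku": "SKU-1001"},
]
pos_log = [
    {"visitor_id": "anonymous001", "timestamp": "2015-03-02T14:30:00",
     "store": "Store-12", "sku": "SKU-2002"},
]

REQUIRED_FIELDS = ("visitor_id", "timestamp")


def is_usable(event):
    """Return True if the event has a visitor id and a parseable ISO timestamp."""
    if any(not event.get(field) for field in REQUIRED_FIELDS):
        return False
    try:
        datetime.fromisoformat(event["timestamp"])
    except (TypeError, ValueError):
        return False
    return True


usable_events = [e for e in web_log + pos_log if is_usable(e)]
```

Both sources pass the check, yet nothing links them at this stage beyond the shared identifier.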

Data Architecture

In the middle, we have the Dataset Architecture. It defines how each log source fits into the dataset. It also specifies how the data should be transformed into a more readable form.

[Figure: 3-01-2]

Processed Visitor Data (Dataset)

Finally, event data from all channels is stored in one visitor record. This is akin to a card holding all event data for one anonymized visitor. It contains what the visitor has done on the web in addition to what he or she has purchased at store locations.

[Figure: 4a-01]

Simply put, Raw Input Data on the left flows into Processed Visitor Data on the right, using the Data Architecture as a template.
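One way to picture the "template" idea is a mapping from each source's raw field names to shared dimension names. The sketch below is only an analogy under that assumption: the `ARCHITECTURE` dictionary, the raw field names, and the `decode` function are hypothetical and do not reflect how Data Workbench configuration files are actually written.

```python
# Hypothetical template: for each log source, map raw field names
# to the dimension names used in the processed visitor data.
ARCHITECTURE = {
    "web": {"cookie_id": "visitor_id", "ts": "timestamp",
            "item": "product_sku", "delivery": "fulfillment"},
    "pos": {"loyalty_id": "visitor_id", "sold_at": "timestamp",
            "item_code": "product_sku", "store_no": "store"},
}


def decode(source, raw_entry):
    """Rename the raw fields of one log entry to dimension names.

    Fields that the template does not mention are dropped, and the
    originating channel is recorded on the event.
    """
    mapping = ARCHITECTURE[source]
    event = {dimension: raw_entry[field]
             for field, dimension in mapping.items()
             if field in raw_entry}
    event["channel"] = source
    return event


web_event = decode("web", {"cookie_id": "anonymous001", "ts": "2015-03-01T10:15",
                           "item": "SKU-1001", "delivery": "in-store pickup",
                           "user_agent": "ignored"})
```

After decoding, events from either source share the same dimension names, which is what allows them to sit side by side later.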

Closer look

Let us walk through this process again using example data.

From a web server log, each log entry is decoded and then placed onto the relevant dimensions in the schema. In this example, visitor "anonymous001" purchased one product and selected in-store pickup.

[Figure: 5a-5-01]

In-store transactions are also decoded for the same visitor. This person picked up the product on the following day and decided to add two more items at the cash register.

[Figure: 6a-6-01]

Activity data is then transformed into a format suitable for queries. In this case, the product SKU is replaced with its product name using a lookup transformation.

[Figure: 7a-01]
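The lookup step can be sketched as a simple table substitution. The SKU values, the `PRODUCT_NAMES` table, and the `apply_lookup` function below are invented for illustration; only the product names "USB Cable" and "Kale Chips" come from this article's example.

```python
# Hypothetical lookup table; the SKU codes are made up.
PRODUCT_NAMES = {"SKU-2002": "USB Cable", "SKU-3003": "Kale Chips"}


def apply_lookup(event, table=PRODUCT_NAMES):
    """Return a copy of the event with the SKU replaced by a product name.

    Unknown SKUs fall back to the raw SKU so that no event is lost.
    """
    transformed = dict(event)
    sku = transformed.pop("product_sku")
    transformed["product"] = table.get(sku, sku)
    return transformed


readable = apply_lookup({"visitor_id": "anonymous001", "product_sku": "SKU-2002"})
```

Falling back to the raw SKU is a design choice of this sketch, not something the article specifies.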

Once the data from all channels has been combined, we get one visitor's data on a single card.

[Figure: 8-01]

This card shows that one anonymous customer ordered one item online, picked it up at the store, and bought additional items. Unlike the original input data, one card provides holistic insight into a visitor's activities across different channels.
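The combining step amounts to grouping events by visitor and ordering them in time. Here is a minimal sketch of that idea, assuming already-decoded events held as Python dictionaries. The `build_cards` function, the timestamps, and the product name "Item A" are hypothetical, since the article does not name the item bought online.

```python
from collections import defaultdict


def build_cards(*event_streams):
    """Group events from every channel by visitor and sort each group by time."""
    cards = defaultdict(list)
    for stream in event_streams:
        for event in stream:
            cards[event["visitor_id"]].append(event)
    for events in cards.values():
        # ISO-8601 strings in the same format sort chronologically.
        events.sort(key=lambda e: e["timestamp"])
    return dict(cards)


# Invented example events for the visitor used in this article.
web_events = [
    {"visitor_id": "anonymous001", "timestamp": "2015-03-01T10:15",
     "channel": "web", "product": "Item A", "fulfillment": "in-store pickup"},
]
pos_events = [
    {"visitor_id": "anonymous001", "timestamp": "2015-03-02T14:31",
     "channel": "pos", "product": "Kale Chips"},
    {"visitor_id": "anonymous001", "timestamp": "2015-03-02T14:30",
     "channel": "pos", "product": "USB Cable"},
]

cards = build_cards(web_events, pos_events)
```

Here `cards["anonymous001"]` holds the web order followed by the two in-store purchases, which is the single-card view described above.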

Running queries

Now that the data for one visitor has been processed, let us consider the following analytical query.

"Among the customers who made pick-up orders on the web, how many bought additional items at the store? Also, which products were popular as an in-store add-on?"

By looking at the card above, the query engine should tally:

+1 visitor for the in-store pickup metric
+1 product for “USB Cable” in the additional purchase dimension
+1 product for “Kale Chips” in the additional purchase dimension

Repeating this micro query for all other cards eventually yields the answer for the entire dataset.
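The tally above can be sketched as a function that looks at one card at a time and updates running counters. This is an illustration only, not the Data Workbench query engine: the `micro_query` function, the second visitor "anonymous002", and the name "Item A" are invented, and the sketch assumes that every POS event on a pickup customer's card is an additional purchase.

```python
from collections import Counter

# "anonymous001" mirrors the card in this article; "anonymous002" is
# an invented visitor who did not choose in-store pickup.
cards = {
    "anonymous001": [
        {"channel": "web", "product": "Item A", "fulfillment": "in-store pickup"},
        {"channel": "pos", "product": "USB Cable"},
        {"channel": "pos", "product": "Kale Chips"},
    ],
    "anonymous002": [
        {"channel": "web", "product": "Item A", "fulfillment": "home delivery"},
    ],
}


def micro_query(card, metrics, add_ons):
    """Evaluate one card and update the running tallies in place."""
    ordered_pickup = any(
        e["channel"] == "web" and e.get("fulfillment") == "in-store pickup"
        for e in card
    )
    if not ordered_pickup:
        return
    metrics["in-store pickup"] += 1
    for e in card:
        if e["channel"] == "pos":  # assumption: POS events are add-ons
            add_ons[e["product"]] += 1


metrics, add_ons = Counter(), Counter()
for card in cards.values():
    micro_query(card, metrics, add_ons)
```

Each call needs only the card it is given, which is the property the next section relies on.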

Performance consideration

This approach provides the following advantages:

  1. Cards are self-contained, so micro queries complete with fewer costly external references.
  2. These cards can easily be distributed for parallel processing.
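Because each card is self-contained, a tally can be computed per chunk of cards and the partial results merged afterwards. The sketch below shows that shape with a thread pool from Python's standard library; the same split-and-merge pattern would apply to processes or separate machines. The function names and sample cards are hypothetical, and this is not a description of how Data Workbench distributes its work.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def tally_chunk(chunk):
    """Count in-store products in one chunk of cards, using no shared state."""
    counts = Counter()
    for card in chunk:
        counts.update(e["product"] for e in card if e["channel"] == "pos")
    return counts


def parallel_tally(cards, workers=2):
    """Split the cards into chunks, tally each chunk concurrently, then merge."""
    chunks = [cards[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(tally_chunk, chunks)
        return sum(partials, Counter())


# Invented sample cards, each a list of events for one visitor.
cards = [
    [{"channel": "web", "product": "Item A"},
     {"channel": "pos", "product": "USB Cable"},
     {"channel": "pos", "product": "Kale Chips"}],
    [{"channel": "pos", "product": "USB Cable"}],
    [{"channel": "web", "product": "Item A"}],
]

totals = parallel_tally(cards)
```

The merged result is the same as tallying all cards in one pass, because no card depends on another.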

DataSet

All cards are stored in a card holder known as the DataSet, which is commonly referred to as "temp.db" because of its file name.

[Figure: 9-4-01]

As this wheel keeps spinning, each card is evaluated one by one. When the same card comes around again, you know that one sweep has finished and the entire dataset has been evaluated.
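The wheel metaphor can be sketched as a circular iteration that stops when the first card reappears. This models only the metaphor: the `one_sweep` function and the list-of-pairs layout are assumptions for illustration and say nothing about how temp.db is actually stored or read.

```python
from itertools import cycle


def one_sweep(wheel, evaluate):
    """Visit cards on a circular holder until the first card comes around again.

    `wheel` is a list of (card_id, card) pairs with unique ids.
    Returns the number of cards evaluated in the sweep.
    """
    if not wheel:
        return 0
    first_id = None
    seen = 0
    for card_id, card in cycle(wheel):
        if card_id == first_id:
            break  # the same card is shown again: one sweep is complete
        if first_id is None:
            first_id = card_id
        evaluate(card)
        seen += 1
    return seen


visited = []
count = one_sweep(
    [("anonymous001", "card A"), ("anonymous002", "card B"), ("anonymous003", "card C")],
    visited.append,
)
```

After the call, `count` equals the number of cards on the wheel and `visited` lists each card exactly once.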

We cover the overall life cycle of the dataset in the next article.

