Cluster for Data Workbench

Last updated May 6, 2021 | Also applies to Insight

This article is a simplified illustration of the product. It is intended to help analysts visualize which entity fulfills which role. Readers are encouraged to consult the product documentation for detailed information on the topics covered in this article.

Main Actors

Previously, we discussed how individual visitor data is constructed here, and how visitor data forms the dataset here. In this article, we step back further and look at the whole cluster.

DPU - Custodian of Dataset -

The DPU (Data Processing Unit) is the custodian of a cluster's dataset. When playing the role of a Processing Server, the DPU generates a queryable dataset in its local temp.db (the card holder).

The DPU also plays the role of Query Server, where it acts as the single point of contact between a client application and the other DPUs.

FSU as Primary Server - Coordinator of Dataset -

A team of servers needs a supervisor, and one FSU (File Server Unit) fulfils that role as the Primary Server. It stores the master copy of the dataset architecture (schema), configurations, and performance parameters. The Primary Server is also known as the Synch Server.

FSU for Non-Primary Roles

To lighten the workload of the Primary Server, some of its roles can be delegated to additional FSUs. We will briefly cover indexing decoded data (Normalization), exporting the dataset (Segment Export), and converting raw log files (Transform).

For a small cluster, one FSU is often sufficient to fulfill all management roles. As your cluster grows, the Primary Server can become overwhelmed. Adobe recommends adding an FSU if the Primary Server becomes the bottleneck.

Note:

For other FSU roles (Logging Server, Source List Server, and File Server) and components (Sensor and Repeater), please check our documentation here.

Client

The Data Workbench Client is the front-end application. Analysts use it to run queries against its Query Server, architects use it to configure the schema on the Primary Server, and administrators use it to manage the various servers.

Report Server

Report Server automates the reporting functionality of the client application. It runs the queries from a report set and delivers the output through various channels.

Building Queryable Dataset

Synchronization - Sharing Instructions -

At first, the DPUs do not know where to find the various materials or how to process them. Through synchronization, they fetch the instructions, schema, and resource map from the Primary Server (FSU).

Instructions are now provided, but the card holder is still empty.
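
The state after synchronization can be pictured with a minimal Python sketch. The class and field names (`PrimaryServer`, `Dpu`, `sync`, `log_paths`) and the sample path are hypothetical and are not part of the product; the sketch only shows that each DPU copies the instructions from the Primary Server and still holds no visitor cards afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class PrimaryServer:
    """Hypothetical stand-in for the FSU that holds the master copy."""
    schema: dict
    log_paths: list  # assumed shape of the "resource map"


@dataclass
class Dpu:
    name: str
    schema: dict = None
    log_paths: list = None
    cards: dict = field(default_factory=dict)  # the card holder (temp.db)

    def sync(self, primary: PrimaryServer) -> None:
        # Copy the master instructions; no log data is read at this stage.
        self.schema = dict(primary.schema)
        self.log_paths = list(primary.log_paths)


# Illustrative values only.
primary = PrimaryServer(schema={"dimensions": ["Page", "Referrer"]},
                        log_paths=["logs/example.vsl"])
dpus = [Dpu(f"dpu{i}") for i in range(3)]
for dpu in dpus:
    dpu.sync(primary)
```

Every DPU now carries the same schema and resource map, while each `cards` dictionary is still empty.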

Log Process - Building Dataset -

Using the synchronized instructions and map, each DPU finds the log files and starts decoding them.

If a decoded event belongs to an existing visitor, it is appended to that visitor's card. If it belongs to a visitor on another DPU, it is forwarded to that DPU. Finally, if the event belongs to no existing visitor, a new card is created.
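
The three cases above can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation: the `Dpu` class and its methods are hypothetical, and deciding ownership with a CRC32 hash of the visitor ID is our own assumption, since this article does not describe how a DPU determines where a visitor lives.

```python
import zlib


class Dpu:
    def __init__(self, index, cluster):
        self.index = index
        self.cluster = cluster  # shared list of all DPUs in the cluster
        self.cards = {}         # visitor ID -> list of events (card holder)

    def owner_of(self, visitor_id):
        # Assumption: ownership is a stable hash of the visitor ID.
        return zlib.crc32(visitor_id.encode()) % len(self.cluster)

    def handle(self, visitor_id, event):
        owner = self.owner_of(visitor_id)
        if owner != self.index:
            # The visitor lives on another DPU: forward the event.
            return self.cluster[owner].handle(visitor_id, event)
        if visitor_id in self.cards:
            self.cards[visitor_id].append(event)  # existing visitor
            return "appended"
        self.cards[visitor_id] = [event]          # no existing visitor
        return "created"


cluster = []
for i in range(4):
    cluster.append(Dpu(i, cluster))

# The same visitor is decoded on two different DPUs.
first = cluster[0].handle("visitor-42", "page view: /home")
second = cluster[1].handle("visitor-42", "page view: /pricing")
```

Whichever DPU decodes the event, both events end up on the same card: the first call creates it and the second appends to it.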

Visitors are evenly distributed across all DPUs. For example, on a 10-DPU cluster with 5 million visitors, each DPU holds data for 500,000 visitors. However, because the amount of data differs from visitor to visitor, the size of temp.db will not be equal across DPUs (though the sizes tend to be very close).
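
The arithmetic, and the reason sizes differ slightly, can be checked with a short Python sketch. The scaled-down simulation is purely illustrative: the round-robin assignment and the random number of events per visitor are made-up stand-ins, not product behavior.

```python
import random

DPU_COUNT = 10
VISITOR_COUNT = 5_000_000

visitors_per_dpu = VISITOR_COUNT // DPU_COUNT
print(visitors_per_dpu)  # 500000

# Scaled-down simulation: equal visitor counts, unequal data volumes.
rng = random.Random(0)
sample_visitors = 10_000
visitors_on_dpu = [0] * DPU_COUNT
events_on_dpu = [0] * DPU_COUNT
for visitor in range(sample_visitors):
    dpu = visitor % DPU_COUNT                  # round-robin, illustration only
    visitors_on_dpu[dpu] += 1
    events_on_dpu[dpu] += rng.randint(1, 50)   # made-up events per visitor

print(visitors_on_dpu)  # identical count on every DPU
print(events_on_dpu)    # close to each other, but not necessarily equal
```

Because every DPU averages over many visitors, the per-DPU totals stay close to one another even though individual visitors vary widely.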

Normalization - Indexing Decoded Data -

As each DPU processes the input, dimension elements are simultaneously indexed on the Normalization Server.
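
One way to picture a shared index is the Python sketch below. It assumes that "indexing" means assigning one integer ID to each distinct dimension element so that all DPUs refer to the same element consistently; that reading, and the `NormalizationServer` class itself, are our assumptions rather than a description of the product's internals.

```python
class NormalizationServer:
    """Hypothetical shared index: one integer ID per distinct element."""

    def __init__(self):
        self.index = {}  # (dimension, element) -> integer ID

    def normalize(self, dimension, element):
        key = (dimension, element)
        # setdefault assigns the next free ID only when the element is new.
        return self.index.setdefault(key, len(self.index))


server = NormalizationServer()

# Two DPUs report elements they found while decoding their log files.
id_from_dpu1 = server.normalize("Page", "/home")
id_from_dpu2 = server.normalize("Page", "/home")
other_id = server.normalize("Page", "/pricing")
```

Both DPUs receive the same ID for `/home`, and a new element receives the next ID.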

Typically, one FSU fulfills the role of both the Primary Server and the Normalization Server.

Once the log process is completed, we have a queryable dataset.

Interactive Analysis with Data Workbench Client

Finally, the dataset is ready for analysts. In-depth analysis involves a series of queries, as one question leads to another, and it is best done using the Data Workbench Client.

1. Assign Query Server

First, the Data Workbench Client connects to the Primary Server, and one of the DPUs is assigned as its Query Server.

The client application is now "Online" with the dataset profile. From this point on, this DPU is the point of contact for this Client.

2. Running Queries

The Client sends a query string to its Query Server. Once the query is received, the Query Server forwards the same request to the other DPUs. It also runs the query against its own temp.db (card holder) and returns the result to the Client along with the results from the other DPUs.
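
This fan-out can be sketched in Python as a simple scatter-gather. The `Dpu` class, its methods, and the "count visitors who viewed a page" query are hypothetical stand-ins. Note also that the sketch returns one combined number, whereas the real Client receives results as a stream.

```python
from concurrent.futures import ThreadPoolExecutor


class Dpu:
    def __init__(self, name, cards):
        self.name = name
        self.cards = cards  # visitor ID -> pages viewed (stand-in for temp.db)

    def run_local(self, page):
        # Count local visitors who viewed the given page.
        return sum(1 for pages in self.cards.values() if page in pages)

    def run_as_query_server(self, page, peers):
        # Forward the same request to the other DPUs...
        with ThreadPoolExecutor() as pool:
            remote = list(pool.map(lambda dpu: dpu.run_local(page), peers))
        # ...and combine their results with the local result.
        return self.run_local(page) + sum(remote)


dpu_a = Dpu("a", {"v1": ["/home", "/pricing"], "v2": ["/home"]})
dpu_b = Dpu("b", {"v3": ["/pricing"]})
dpu_c = Dpu("c", {"v4": ["/home"], "v5": ["/pricing", "/home"]})

# dpu_a is the Query Server assigned to this Client.
pricing_visitors = dpu_a.run_as_query_server("/pricing", [dpu_b, dpu_c])
home_visitors = dpu_a.run_as_query_server("/home", [dpu_b, dpu_c])
```

The Client only ever talks to `dpu_a`, yet the answer covers the visitors held on all three DPUs.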

Back on the Client side, query results are translated into various forms of visualization as they stream in. The finished workspace can also be saved as a template for Report Server.

Scheduled Report with Report Server

Report Server automates query execution. At a specified time, it picks up a report set from the Primary Server, executes its queries, and delivers the results through various methods such as email.

Exporting Dataset with Segment Export

Specific segments of the dataset can be exported as delimited text files. Based on the export definition file (*.export), each DPU filters its share of the data and sends the matching records to the Segment Export Server.

Exported data from the DPUs is combined into one file on the Segment Export Server and uploaded to a specified location. The file is typically imported into third-party analysis tools or custom applications for further analysis.
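
The filter-then-combine flow can be sketched in Python. The function names, the "minimum visits" segment rule, and the column names are hypothetical; a real export is driven by the *.export definition file, whose format is not covered here.

```python
import csv
import io


def dpu_export(cards, min_visits):
    # Each DPU filters its own share: keep visitors that match the segment.
    return [(visitor, visits) for visitor, visits in cards.items()
            if visits >= min_visits]


def combine(parts, delimiter="\t"):
    # The export server merges the per-DPU rows into one delimited text.
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["visitor_id", "visits"])
    for rows in parts:
        writer.writerows(rows)
    return buffer.getvalue()


# Visitor ID -> number of visits, held on three separate DPUs.
dpu_cards = [{"v1": 5, "v2": 1}, {"v3": 7}, {"v4": 2}]
parts = [dpu_export(cards, min_visits=3) for cards in dpu_cards]
exported = combine(parts)
```

Only the matching rows leave each DPU, so the combining step handles the filtered segment rather than the whole dataset.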

Note:

Just as with the Normalization Server role, one FSU can fulfill both the Primary and Segment Export roles on a small cluster. However, because it involves batches of large data transmissions, Segment Export can easily strain the FSU's resources.

Converting Log Sources with Transform

An FSU can work as a Transform Server. Unlike Segment Export (which is a dataset-to-text export), Transform is a simple text-to-text conversion. It takes one or more types of text input, such as Sensor data (.vsl), log files, XML files, and ODBC (text) data, and merges them into a single text file. It is often used to pre-process raw data before feeding it to a dataset.

As you can see, Transform Server (FSU) operations can run without a dataset or DPUs. For this reason, the Transform Server often runs independently.
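
A text-to-text merge of this kind can be sketched in Python without any dataset or DPU involved. The field names (`time`, `url`), the `<hit>` element, and the tab-delimited output layout are illustrative assumptions, not the product's formats.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_from_csv(text):
    # Read a comma-delimited log with a header row.
    return [(r["time"], r["url"]) for r in csv.DictReader(io.StringIO(text))]


def rows_from_xml(text):
    # Read <hit time="..." url="..."/> elements from an XML document.
    return [(e.get("time"), e.get("url"))
            for e in ET.fromstring(text).iter("hit")]


def transform(*row_sets):
    # Merge all inputs, in the order given, into one tab-delimited text.
    return "".join(f"{time}\t{url}\n"
                   for rows in row_sets for time, url in rows)


csv_text = "time,url\n2021-05-06T10:00:00,/home\n"
xml_text = '<hits><hit time="2021-05-06T10:00:05" url="/pricing"/></hits>'

merged = transform(rows_from_csv(csv_text), rows_from_xml(xml_text))
```

Each input type gets its own small reader, and the output is a single uniform text file ready to be fed to a dataset.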
