Multi-domain management: AEM mappings for URL shortening | AEM 6.x

Objective

It is quite common for a single AEM instance to handle multiple sites. Usually, each site has a separate domain assigned and can have a few language versions. Such a multi-domain installation requires some additional, nontrivial configuration. This post is intended as a quick reference that shows how to configure 3 language versions of the well-known Geometrixx site to work on 3 domains: geometrixx.com, geometrixx.de and geometrixx.fr.

Steps

First we configure AEM itself (with the Sling Mappings engine), then we create Apache rewrite rules and VirtualHosts, eliminate the cross-domain cache injection threat, and finally refactor the Apache configuration to make it more concise.

Sling Mappings engine

The best way to map a domain name to a web site in AEM is to use Sling Mappings. Mappings provide two useful features:

  • Long links in page content are shortened to a friendly form,
  • Short links are resolved to a full content path.

To rewrite incoming URLs back to the long format of /content/sitename, we leverage mod_rewrite at the Apache level. However, you cannot shorten links in outgoing HTML without modules like mod_substitute. So, to rewrite the HTML output (such as <a href=...> attributes), we use Sling Mappings under /etc/map.publish. AEM's out-of-the-box link rewriter then uses ResourceResolver.map to apply those mappings to the HTML. By default, mappings are placed in the JCR under the /etc/map/http node, which affects author instances as well as publish instances. We will use /etc/map.publish instead, so that the mappings affect only publish instances. This way you can install one common package with mappings on both author and publish instances.

Note:

For the mappings under /etc/map.publish to be loaded on publish instances, we need to update an OSGi configuration. On the publish instances, set the resource.resolver.map.location property of the org.apache.sling.jcr.resource.internal.JcrResourceResolverFactoryImpl OSGi configuration to /etc/map.publish.

{ 
  jcr:primaryType: "sling:OrderedFolder", 
  geometrixx_com: { 
    sling:internalRedirect: ["/content/geometrixx/en.html"], 
    jcr:primaryType: "sling:Mapping", 
    sling:match: "geometrixx.com/$" 
  }, 
  geometrixx.com: { 
    sling:internalRedirect: ["/content/geometrixx/en"], 
    jcr:primaryType: "sling:Mapping", 
    redirect: { 
      sling:internalRedirect: ["/$1","/content/geometrixx/en/$1"], 
      jcr:primaryType: "sling:Mapping", 
      sling:match: "(.+)$" 
    } 
  }, 
  ... 
}

The three dots in line 17 stand for similar entries for the .de and .fr domains: geometrixx_de with geometrixx.de and geometrixx_fr with geometrixx.fr.

The geometrixx_com mapping (lines 3-7) is responsible for redirecting to the root page. If the user enters geometrixx.com, he or she will receive the page /content/geometrixx/en.html. The dollar sign at the end of sling:match (line 6) is a regexp control character meaning "end of the string", so this mapping does not apply if the user enters any path after the slash.

The geometrixx.com mapping (lines 8-16) is more complex. It consists of a parent (lines 8-16) and a child (lines 11-15). The parent does not contain the sling:match property, so the node name (geometrixx.com) is used as the URL pattern. This entry is responsible for shortening long links to a shorter form with a domain name, e.g. /content/geometrixx/en/products.html will be shortened to geometrixx.com/products.html.
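The shortening direction can be pictured with a few lines of Python. This is only a toy model of the idea (replace the site root with the domain name), not the way Sling implements ResourceResolver.map; the function name shorten and its parameters are made up for this illustration.

```python
def shorten(content_path, site_root="/content/geometrixx/en", domain="geometrixx.com"):
    """Toy model of link shortening: swap the site root for the domain name."""
    if content_path.startswith(site_root + "/"):
        return domain + content_path[len(site_root):]
    return content_path  # paths outside the site are left untouched


print(shorten("/content/geometrixx/en/products.html"))
```

With the defaults above, the call prints geometrixx.com/products.html, while a path such as /etc/designs/geometrixx.css is returned unchanged.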

The child entry is responsible for URL resolution. In order to match this mapping, a URL has to begin with geometrixx.com (the domain inherited from the parent mapping) and then contain a non-empty path string (the regular expression (.+)$ at line 14). sling:internalRedirect at line 12 is a list containing two entries: /$1 and /content/geometrixx/en/$1. If the user enters geometrixx.com/etc/designs/geometrixx.css, the first entry will be used. If the user enters geometrixx.com/products.html, Sling will choose the second one and return /content/geometrixx/en/products.html.
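The "try each sling:internalRedirect entry in order" behaviour described above can be sketched in Python. This is a simplified model under stated assumptions, not Sling's actual resolver: the EXISTING set stands in for the repository content, the URL is reduced to "host/path", and the names MAPPINGS and resolve are invented for the example.

```python
import re

# Hypothetical set of paths that exist in the repository (assumption for this sketch).
EXISTING = {
    "/etc/designs/geometrixx.css",
    "/content/geometrixx/en.html",
    "/content/geometrixx/en/products.html",
}

# Simplified model of the two mappings: (pattern, candidate targets in order).
MAPPINGS = [
    (r"geometrixx\.com/$", ["/content/geometrixx/en.html"]),
    (r"geometrixx\.com/(.+)$", ["/$1", "/content/geometrixx/en/$1"]),
]


def resolve(host_and_path):
    """Return the first candidate target that exists, or None if nothing matches."""
    for pattern, targets in MAPPINGS:
        match = re.fullmatch(pattern, host_and_path)
        if not match:
            continue
        for target in targets:
            candidate = target
            if match.groups():
                # Substitute the captured path for the $1 placeholder.
                candidate = target.replace("$1", match.group(1))
            if candidate in EXISTING:
                return candidate
    return None


print(resolve("geometrixx.com/etc/designs/geometrixx.css"))
print(resolve("geometrixx.com/products.html"))
```

The first call falls through to the /$1 entry because the design file exists at its literal path; the second only succeeds with the /content/geometrixx/en/$1 entry.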

You can play with mappings using the Apache Felix web console. Just click the Sling Resource Resolver link in the menu.

Apache mod_rewrite

After defining the mappings (and probably adding the appropriate domains to the hosts file), we can enjoy our multi-domain AEM installation with short links. There is only one problem: the dispatcher. With a standard dispatcher configuration, there is one cache directory for all sites. If a user requests the page geometrixx.com/products.html, the dispatcher creates the file /products.html in the cache directory. If another user then requests geometrixx.de/products.html, the dispatcher finds the cached English version and serves it to the German user. To avoid such problems, the dispatcher cache should reflect the JCR directory structure. The easiest way to expand shortened paths is to use the Apache rewrite engine. Basically, we will try to simulate the Sling resolving mechanism. The following rules will do the job:

RewriteEngine On 
RewriteRule ^/$ /content/geometrixx/en.html [PT,L] 
RewriteCond %{REQUEST_URI} !^/apps 
RewriteCond %{REQUEST_URI} !^/bin 
RewriteCond %{REQUEST_URI} !^/content 
RewriteCond %{REQUEST_URI} !^/etc 
RewriteCond %{REQUEST_URI} !^/home 
RewriteCond %{REQUEST_URI} !^/libs 
RewriteCond %{REQUEST_URI} !^/tmp 
RewriteCond %{REQUEST_URI} !^/var 
RewriteCond %{REQUEST_URI} !^/dispatcher 
RewriteRule ^/(.*)$ /content/geometrixx/en/$1 [PT,L]

At the beginning (line 2) we check whether the entered URL contains an empty path (e.g. http://geometrixx.com/). If so, the user is forwarded to the homepage. Otherwise, we check whether the entered path is shortened, that is, whether it does not begin with apps, bin, content, etc, home and so on (lines 3-11). If it is, the rewrite engine prepends /content/geometrixx/en to create the absolute path (line 12).
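To check your understanding of the rules, their decision logic can be mirrored in a small Python function. This is only an approximation for experimenting with sample paths; it does not model mod_rewrite flags such as PT, and the names SYSTEM_PREFIXES and expand are made up for the sketch.

```python
# Prefixes that mark a path as already absolute (taken from the RewriteCond lines).
SYSTEM_PREFIXES = ("/apps", "/bin", "/content", "/etc", "/home",
                   "/libs", "/tmp", "/var", "/dispatcher")


def expand(path, site_root="/content/geometrixx/en"):
    """Mimic the rewrite rules: expand a shortened path to a full content path."""
    if path == "/":
        # Empty path: forward to the homepage of the site.
        return site_root + ".html"
    if path.startswith(SYSTEM_PREFIXES):
        # Already a full repository path: leave it alone.
        return path
    # Shortened path: prepend the site root (path is assumed to start with "/").
    return site_root + "/" + path[1:]


for sample in ("/", "/products.html", "/etc/designs/geometrixx.css"):
    print(sample, "->", expand(sample))
```

Like the RewriteCond patterns, the prefix test is a plain "starts with" check, so a hypothetical page named /etcetera.html would also be treated as a system path.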

Apache VirtualHost

As you can see, these rules are valid only for the geometrixx.com domain, so we need similar rules for each domain and some mechanism for recognizing the current domain. In Apache, this mechanism is called VirtualHost. A sample Apache2 VirtualHost configuration file looks as follows:

<VirtualHost *:80>
ServerAdmin webmaster@localhost
ServerName geometrixx.com

DocumentRoot /opt/aem/dispatcher/publish
<Directory /opt/aem/dispatcher/publish>
Options FollowSymLinks
AllowOverride None
</Directory>

<IfModule disp_apache2.c>
SetHandler dispatcher-handler
</IfModule>

[... above rewrite rules ...]

LogLevel warn
CustomLog ${APACHE_LOG_DIR}/access-geo-en.log combined
ErrorLog ${APACHE_LOG_DIR}/error-geo-en.log
</VirtualHost>

All VirtualHosts can use a shared dispatcher directory. Create similar files for each domain.
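The reason a shared directory is safe once paths are expanded can be seen in a short Python sketch. It assumes the cache simply stores each response under docroot plus request path, as described earlier; the /content/geometrixx/de root is an assumption by analogy with the .com rules, and the names SITE_ROOTS and cache_file are invented for the illustration.

```python
import posixpath

DOCROOT = "/opt/aem/dispatcher/publish"

# Hypothetical per-domain site roots; only the .com root appears in the rules above.
SITE_ROOTS = {
    "geometrixx.com": "/content/geometrixx/en",
    "geometrixx.de": "/content/geometrixx/de",
    "geometrixx.fr": "/content/geometrixx/fr",
}


def cache_file(host, short_path, expand_paths=True):
    """Return the file a dispatcher-like cache would write for this request."""
    path = short_path
    if expand_paths:
        # The VirtualHost for this host prepends its own site root.
        path = SITE_ROOTS[host] + short_path
    return posixpath.join(DOCROOT, path.lstrip("/"))


# Without expansion both domains collide on the same cache file.
print(cache_file("geometrixx.com", "/products.html", expand_paths=False))
print(cache_file("geometrixx.de", "/products.html", expand_paths=False))

# With expansion each domain gets its own file under the shared docroot.
print(cache_file("geometrixx.com", "/products.html"))
print(cache_file("geometrixx.de", "/products.html"))
```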

Cross-domain injection threat

Because users are able to enter a full content path after a given domain name, e.g. geometrixx.com/content/geometrixx/en/products.html, they may as well get a page that belongs to some other domain, e.g. geometrixx.com/content/geometrixx/fr/products.html. In order to avoid such a situation, we need to check all requests for paths beginning with /content and reject those that are not related to a campaign, the DAM or the current domain:

RewriteCond %{REQUEST_URI} ^/content
RewriteCond %{REQUEST_URI} !^/content/campaigns
RewriteCond %{REQUEST_URI} !^/content/dam
RewriteRule !^/content/geometrixx/en - [R=404,L,NC]
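The effect of these four lines can be approximated in Python for quick experiments with sample paths. This is a sketch of the decision logic only, assuming the conditions are evaluated as prefix tests; the function name is_rejected is made up for the example.

```python
def is_rejected(path, site_root="/content/geometrixx/en"):
    """Return True when the rules above would answer the request with a 404."""
    if not path.startswith("/content"):
        return False  # first RewriteCond fails: the rule does not apply
    if path.startswith(("/content/campaigns", "/content/dam")):
        return False  # campaigns and DAM assets are shared between domains
    # The NC flag makes only the RewriteRule pattern case-insensitive.
    return not path.lower().startswith(site_root.lower())


for sample in ("/content/geometrixx/en/products.html",
               "/content/geometrixx/fr/products.html",
               "/content/dam/geometrixx/logo.png"):
    print(sample, "rejected" if is_rejected(sample) else "allowed")
```

Only the French page is rejected here, because it lies under /content but outside the site root of the current domain.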

Macros

Our rewrite configuration has become quite complicated and, what is worse, has to be included in each Apache VirtualHost configuration. Fortunately, we can avoid the repetition by using the Apache macro module. Add the following expand-aem-paths file to your conf.d directory:

<Macro ExpandAEMPaths $path>
RewriteEngine On

RewriteRule ^/$ $path.html [PT,L]

RewriteCond %{REQUEST_URI} ^/content
RewriteCond %{REQUEST_URI} !^/content/campaigns
RewriteCond %{REQUEST_URI} !^/content/dam
RewriteRule !^$path - [R=404,L,NC]

RewriteCond %{REQUEST_URI} !^/apps
RewriteCond %{REQUEST_URI} !^/bin
RewriteCond %{REQUEST_URI} !^/content
RewriteCond %{REQUEST_URI} !^/etc
RewriteCond %{REQUEST_URI} !^/home
RewriteCond %{REQUEST_URI} !^/libs
RewriteCond %{REQUEST_URI} !^/tmp
RewriteCond %{REQUEST_URI} !^/var
RewriteCond %{REQUEST_URI} !^/dispatcher
RewriteRule ^/(.*)$ $path/$1 [PT,L]
</Macro>

After that you can include a macro in each VirtualHost with the Use directive:

Use ExpandAEMPaths /content/geometrixx/en
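Conceptually, the Use directive substitutes the argument for $path throughout the macro body. The Python sketch below imitates that substitution on two of the rules so you can preview the result for a given site root; it is a toy model, not mod_macro's parser, and the names MACRO_BODY and use are invented for the example.

```python
# Two rules from the macro body, with the $path placeholder still in place.
MACRO_BODY = """RewriteRule ^/$ $path.html [PT,L]
RewriteRule ^/(.*)$ $path/$1 [PT,L]"""


def use(body, **params):
    """Replace each $name placeholder in the body with the supplied value."""
    for name, value in params.items():
        body = body.replace("$" + name, value)
    return body


print(use(MACRO_BODY, path="/content/geometrixx/en"))
```

The backreference $1 and the regex anchor $ are left untouched because neither spells out the parameter name $path.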

On Apache 2.2 the Macro module is an external library, so you might need to install it separately (since Apache 2.4.5, mod_macro ships with the server itself). On Debian you can install and enable it using two commands:

# apt-get install libapache2-mod-macro
# a2enmod macro

If you use any other Linux distribution or Windows, please find the appropriate version of the module and the installation instructions on the mod_macro homepage.

Dispatcher configuration

You can use the out-of-the-box dispatcher configuration. The only assumption is that its docroot is set to /opt/aem/dispatcher/publish.

Summary

Now you have configured an AEM installation with 3 domains, using Sling Mappings, Apache 2 mod_rewrite and VirtualHost mechanisms. We have also prevented cross-domain injection attacks and refactored the Apache 2 configuration using mod_macro. The configuration described above should be enough to prepare a custom multi-domain installation from scratch.
