Article summary

Summary

Discusses how to create an Adobe Experience Manager component that uses both the HTML Template Language (HTL, formerly known as Sightly) and Sling Models together.

A special thank you to Praveen Dubey, a member of the AEM community, for contributing the AEM code that is used in this article.

This article uses an Adobe Maven Archetype project to build an OSGi bundle. If you are not familiar with an Adobe Maven Archetype project, it is recommended that you read the following article: Creating your first AEM Service using an Adobe Maven Archetype project.

Note: HTL is the AEM template language that can be used to replace use of JSP when developing an AEM component. HTL helps you to separate your design from your application logic. For more information, see Introduction to the HTML Template Language.

Digital Marketing Solution(s): Adobe Experience Manager (Adobe CQ)
Audience: Developer (intermediate)
Required Skills: Java, Sling, HTML
Version: AEM 6.0/6.1

Note:

This article applies to AEM 6.0 and 6.1. For the AEM 6.2 version, see Creating a HTML Template Language and Sling Model DOM parser component for Experience Manager 6.2.

Note:

You can download an AEM package that contains code and the OSGi bundle that are used in this article. Download the package and deploy using package manager. The purpose of this code is to show the community these concepts in action. That is, it's to illustrate how to use HTL and Sling Models together to create a DOM parser component. This community code is for teaching purposes only and not meant to go into production as is.

You can view the application by using the following URL: http://localhost:4502/content/sample.html (assuming you deploy on author).

* aemslingmodel-htl-jsoup.zip
An AEM 6.x package that contains the HTL and Sling Model DOM Parser component
* OSGi_bunldes.zip
If you deploy the above package and the two OSGi bundles (JSOUP and My Project Bundle) are not installed, download this ZIP and install the two OSGi bundles manually.

Note:

Ensure that both OSGi bundles are in Active state; otherwise, this sample AEM application does not work.

Introduction

You can create an Adobe Experience Manager (AEM) custom component using HTL and Sling Models that is able to parse a Document Object Model (DOM) located in a web page and write the results to an AEM web page. For example, assume you have a requirement to parse a web page and write out the images in an AEM web page. Using a custom DOM parser component, you can implement this requirement.

This AEM development article walks you through how to use HTL and Sling Models to create a custom AEM component that is able to parse a DOM. The component has a dialog that lets you specify a URL to a web page to parse.

Figure: A dialog for the HTL component

Then the component parses the corresponding web page. Notice in the above example, the AEM forums page was specified: http://help-forums.adobe.com/content/adobeforums/en/experience-manager-forum/adobe-experience-manager.html.

The component parses the web page and writes out the results. In this example, because the images check box was selected in the dialog, the component parses the page for images. 

Figure: Images retrieved from the AEM DOM Parser component

To create a custom AEM DOM parser, you use the JSOUP library. That is, the application logic required to parse HTML is developed by using the JSOUP API. For information about this API, see:

http://jsoup.org/

In addition to using the JSOUP API, you also use Sling Models and HTL. You define a model object, which is a Java object that is mapped to a Sling object (typically a resource, but request objects are also supported). A Sling Model is deployed in an OSGi bundle. A Java class located in the OSGi bundle is annotated with @Model and specifies the adaptable class. The data members (fields) use the @Inject annotation. For more information, see Sling Models.
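The following sketch illustrates the general idea behind annotation-driven models: a framework reads annotations on a class and fills the annotated fields from a set of named properties. It is a toy written with plain Java reflection, not how Sling Models is implemented. The Property annotation, the ParserSettings class, and the adapt method are hypothetical names invented for this illustration; the real annotations are the ones named in the paragraph above.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

final class ToyModelInjection {

    /** Hypothetical stand-in for @Inject, @Named, and @Default; not a Sling annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface Property {
        String name();
        String defaultValue() default "";
    }

    /** Toy model object whose annotated fields are filled from a property map. */
    public static class ParserSettings {
        @Property(name = "address", defaultValue = "http://jsoup.org/")
        public String address;

        @Property(name = "img")
        public String img;
    }

    /** Creates the model and copies each named property (or its default) into the annotated field. */
    public static <T> T adapt(Map<String, String> properties, Class<T> type) {
        try {
            T model = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                Property property = field.getAnnotation(Property.class);
                if (property == null) {
                    continue; // not an injected field
                }
                String value = properties.get(property.name());
                field.setAccessible(true);
                field.set(model, value != null ? value : property.defaultValue());
            }
            return model;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> properties = new HashMap<>();
        properties.put("img", "true"); // "address" is absent, so its default applies
        ParserSettings settings = adapt(properties, ParserSettings.class);
        System.out.println(settings.address + " img=" + settings.img);
    }
}
```

In this toy, the property map plays the role that a resource's properties play for a real Sling Model, and the default value is used when a property is missing.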

Note:

If you install the package shown at the start of this article, you do not need to perform these steps. You can read the article to understand the concepts and view the code in AEM as a result of installing the package.

Create an AEM application folder structure 

Create an AEM application folder structure that contains templates, components, and pages by using CRXDE Lite.

CQAppSetup

The following describes each application folder:

  • application name: contains all of the resources that an application uses. The resources can be templates, pages, components, and so on.
  • components: contains components that your application uses. 
  • page: contains page components. A page component is a script such as a JSP file. 
  • global: contains global components that your application uses.
  • template: contains templates on which you base page components. 
  • src: contains source code that comprises an OSGi component (this development article does not create an OSGi bundle using this folder). 
  • install: contains compiled OSGi bundles.

To create an AEM application folder structure:

  1. To view the CQ welcome page, enter the URL http://[host name]:[port] into a web browser. For example, http://localhost:4502.

  2. Select CRXDE Lite (if you are using AEM 5.6, click Tools from the left menu). 

  3. Right-click the apps folder (or the parent folder), select Create, Create Folder.

  4. Enter the folder name into the Create Folder dialog box. Enter myproject.

  5. Repeat steps 3-4 for each folder specified in the previous illustration.

  6. Click the Save All button.

Note:

In this example, you do not have to place a page folder under components. The page component is created later in this development article under /apps/myproject.

Note:

You have to click the Save All button when working in CRXDE Lite for the changes to be made.

Create a template 

You can create a template by using CRXDE Lite. A CQ template enables you to define a consistent style for the pages in your application. A template consists of nodes that specify the page structure. For more information about templates, see Templates.

To create a template, perform these tasks:

  1. To view the CQ welcome page, enter the URL http://[host name]:[port] into a web browser. For example, http://localhost:4502.

  2. Select CRXDE Lite (if you are using AEM 5.6, click Tools from the left menu).

  3. Right-click the template folder (within your application), select Create, Create Template.

  4. Enter the following information into the Create Template dialog box:

    • Label: The name of the template to create. Enter contentpage.
    • Title: The title that is assigned to the template.
    • Description: The description that is assigned to the template.
    • Resource Type: The component's path that is assigned to the template and copied to implementing pages. Enter myproject/page/contentpage.
    • Ranking: The order (ascending) in which this template will appear in relation to other templates. Setting this value to 1 ensures that the template appears first in the list.
  5. Add a path to Allowed Paths. Click on the plus sign and enter the following value: /content(/.*)?.

  6. Click Next for Allowed Parents.

  7. Select OK on Allowed Children.
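The Allowed Paths value entered in step 5 reads as a regular expression. The following sketch shows which page paths that expression accepts, assuming the value must match the whole page path. The class and method names are hypothetical and used only for illustration.

```java
import java.util.regex.Pattern;

final class AllowedPathsCheck {

    // The Allowed Paths value entered in the Create Template dialog box.
    private static final Pattern ALLOWED = Pattern.compile("/content(/.*)?");

    /** Returns true when the whole path matches the Allowed Paths expression. */
    public static boolean isAllowed(String path) {
        return ALLOWED.matcher(path).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"/content", "/content/sample", "/contentfoo", "/apps/myproject"};
        for (String path : samples) {
            System.out.println(path + " -> " + isAllowed(path));
        }
    }
}
```

The optional group (/.*)? is what lets the expression accept both /content itself and any path beneath it, while rejecting a sibling such as /contentfoo.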

Create a render component that uses the template

Components are re-usable modules that implement specific application logic to render the content of your web site. You can think of a component as a collection of scripts (for example, JSPs, Java servlets, and so on) that completely realize a specific function. In order to realize this functionality, it is your responsibility as a CQ developer to create scripts that perform specific functionality. For more information about components, see Components.

The following illustration shows the file structure created in this section.

Figure: AEM files under myproject/page

By default, a component has at least one default script, identical to the name of the component. To create a render component, perform these tasks:

  1. To view the CQ welcome page, enter the URL http://[host name]:[port] into a web browser. For example, http://localhost:4502.

  2. Select CRXDE Lite (if you are using AEM 5.6, click Tools from the left menu).

  3. Right-click /apps/myproject, then select Create, Create Component.

  4. Enter the following information into the Create Component dialog box:

    • Label: The name of the component to create. Enter page.
    • Title: The title that is assigned to the component.
    • Description: The description that is assigned to the component.
    • Super Type: foundation/components/page (in AEM 6, you specify this value for page components. In previous versions of AEM, this was not required.)
  5. Select Next for Advanced Component Settings and Allowed Parents.

  6. Select OK on Allowed Children.

  7. Under /apps/myproject/page, add the following files.

    • author.html
    • body.html
    • head.html
    • page.html
  8. Right-click /apps/myproject/components, then select Create, Create Component.

  9. Enter the following information into the Create Component dialog box:

    • Label: The name of the component to create. Enter contentpage.
    • Title: The title that is assigned to the component.
    • Description: The description that is assigned to the component.
    • Super Type: myproject/page (in AEM 6, you specify this value for page components. In previous versions of AEM, this was not required.)
  10. Select Next for Advanced Component Settings and Allowed Parents.

  11. Select OK on Allowed Children.

author.html

The following represents the author.html file.

<!--/* Outputs the WCM initialization code. If WCM mode is disabled, nothing is rendered. */-->
<meta
	data-sly-use.wcmInit="/libs/wcm/foundation/components/page/initwcm.js"
	data-sly-use.clientLib="${'/libs/granite/sightly/templates/clientlib.html'}" 
	data-sly-test="${!wcmmode.disabled && wcmInit.isTouchAuthoring}" data-sly-call="${clientLib.all @ categories='cq.authoring.page'}" data-sly-unwrap></meta>
<meta data-sly-test="${!wcmmode.disabled && !wcmInit.isTouchAuthoring}" data-sly-call="${clientLib.all @ categories='cq.wcm.edit'}" data-sly-unwrap></meta>
<script data-sly-test="${!wcmmode.disabled && !wcmInit.isTouchAuthoring}" type="text/javascript">
    (function() {

        var cfg = ${wcmInit.undoConfig @ context='unsafe'};
        cfg.pagePath = "${currentPage.path @ context='uri'}";

        if (CQClientLibraryManager.channelCB() != "touch") {
            cfg.enabled = ${wcmmode.edit @ context="scriptString"};
            CQ.undo.UndoManager.initialize(cfg);
            CQ.Ext.onReady(function() {
                CQ.undo.UndoManager.detectCachedPage((new Date()).getTime());
            });
        }
    })();

    CQ.WCM.launchSidekick("${currentPage.path @ context='uri'}", {
        propsDialog: "${wcmInit.dialogPath @ context='uri'}",
        locked: ${currentPage.locked @ context="scriptString"},
        previewReload: "true"
    });
</script>

body.html

The following code represents the body.html file.

<div class="content" data-sly-include="content.html">
This will be removed.
</div>

head.html

The following code represents the head.html file.

<meta data-sly-test="${!wcmmode.disabled}" data-sly-include="author.html" data-sly-unwrap></meta>


<title> Simple Parser</title>

<!--/** needed for cloudservices like DTM  **/-->    
<meta data-sly-include="/libs/cq/cloudserviceconfigs/components/servicelibs/servicelibs.jsp" data-sly-unwrap/>   

page.html

The following code represents page.html. 

<html>
	<head data-sly-include="head.html">
		<script>
			All of the inner elements will be removed during rendering...
		</script>
	</head>
	<body data-sly-include="body.html">
	</body>
</html>

content.html

The following code represents code located at /apps/myproject/page/contentpage/content.html.

<div data-sly-resource="${ 'quote' @ resourceType='myproject/components/parser'}">
When using things like data-sly-resource, this content will be replaced by the output of the component.
</div>

Set up Maven in your development environment

You can use Maven to build an OSGi bundle that uses the JCR API and is deployed to Experience Manager. Maven manages required JAR files that a Java project needs in its class path. Instead of searching the Internet trying to find and download third-party JAR files to include in your project’s class path, Maven manages these dependencies for you.

You can download Maven 3 from the following URL:

http://maven.apache.org/download.html

After you download and extract Maven, create an environment variable named M3_HOME. Assign the Maven install location to this environment variable. For example:

C:\Programs\Apache\apache-maven-3.0.4

Set up a system environment variable to reference Maven. To test whether you set up Maven properly, enter the following Maven command into a command prompt:

%M3_HOME%\bin\mvn -version

This command provides Maven and Java install details and resembles the following message:

OS name: "windows 7", version: "6.1", arch: "amd64", family: "windows"

 

Note:

For more information about setting up Maven and the Home variable, see: Maven in 5 Minutes.

Next, copy the Maven configuration file named settings.xml from [install location]\apache-maven-3.0.4\conf\ to your user profile. For example, C:\Users\scottm\.m2\.

You have to configure your settings.xml file to use Adobe’s public repository. For information, see Adobe Public Maven Repository at http://repo.adobe.com/.

The following XML code represents a settings.xml file that you can use.

<?xml version="1.0" encoding="UTF-8"?>

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->

<!--
 | This is the configuration file for Maven. It can be specified at two levels:
 |
 |  1. User Level. This settings.xml file provides configuration for a single user, 
 |                 and is normally provided in ${user.home}/.m2/settings.xml.
 |
 |                 NOTE: This location can be overridden with the CLI option:
 |
 |                 -s /path/to/user/settings.xml
 |
 |  2. Global Level. This settings.xml file provides configuration for all Maven
 |                 users on a machine (assuming they're all using the same Maven
 |                 installation). It's normally provided in 
 |                 ${maven.home}/conf/settings.xml.
 |
 |                 NOTE: This location can be overridden with the CLI option:
 |
 |                 -gs /path/to/global/settings.xml
 |
 | The sections in this sample file are intended to give you a running start at
 | getting the most out of your Maven installation. Where appropriate, the default
 | values (values used when the setting is not specified) are provided.
 |
 |-->
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0" 
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <!-- localRepository
   | The path to the local repository maven will use to store artifacts.
   |
   | Default: ~/.m2/repository
  <localRepository>/path/to/local/repo</localRepository>
  -->

  <!-- interactiveMode
   | This will determine whether maven prompts you when it needs input. If set to false,
   | maven will use a sensible default value, perhaps based on some other setting, for
   | the parameter in question.
   |
   | Default: true
  <interactiveMode>true</interactiveMode>
  -->

  <!-- offline
   | Determines whether maven should attempt to connect to the network when executing a build.
   | This will have an effect on artifact downloads, artifact deployment, and others.
   |
   | Default: false
  <offline>false</offline>
  -->

  <!-- pluginGroups
   | This is a list of additional group identifiers that will be searched when resolving plugins by their prefix, i.e.
   | when invoking a command line like "mvn prefix:goal". Maven will automatically add the group identifiers
   | "org.apache.maven.plugins" and "org.codehaus.mojo" if these are not already contained in the list.
   |-->
  <pluginGroups>
    <!-- pluginGroup
     | Specifies a further group identifier to use for plugin lookup.
    <pluginGroup>com.your.plugins</pluginGroup>
    -->
  </pluginGroups>

  <!-- proxies
   | This is a list of proxies which can be used on this machine to connect to the network.
   | Unless otherwise specified (by system property or command-line switch), the first proxy
   | specification in this list marked as active will be used.
   |-->
  <proxies>
    <!-- proxy
     | Specification for one proxy, to be used in connecting to the network.
     |
    <proxy>
      <id>optional</id>
      <active>true</active>
      <protocol>http</protocol>
      <username>proxyuser</username>
      <password>proxypass</password>
      <host>proxy.host.net</host>
      <port>80</port>
      <nonProxyHosts>local.net|some.host.com</nonProxyHosts>
    </proxy>
    -->
  </proxies>

  <!-- servers
   | This is a list of authentication profiles, keyed by the server-id used within the system.
   | Authentication profiles can be used whenever maven must make a connection to a remote server.
   |-->
  <servers>
    <!-- server
     | Specifies the authentication information to use when connecting to a particular server, identified by
     | a unique name within the system (referred to by the 'id' attribute below).
     | 
     | NOTE: You should either specify username/password OR privateKey/passphrase, since these pairings are 
     |       used together.
     |
    <server>
      <id>deploymentRepo</id>
      <username>repouser</username>
      <password>repopwd</password>
    </server>
    -->
    
    <!-- Another sample, using keys to authenticate.
    <server>
      <id>siteServer</id>
      <privateKey>/path/to/private/key</privateKey>
      <passphrase>optional; leave empty if not used.</passphrase>
    </server>
    -->
  </servers>

  <!-- mirrors
   | This is a list of mirrors to be used in downloading artifacts from remote repositories.
   | 
   | It works like this: a POM may declare a repository to use in resolving certain artifacts.
   | However, this repository may have problems with heavy traffic at times, so people have mirrored
   | it to several places.
   |
   | That repository definition will have a unique id, so we can create a mirror reference for that
   | repository, to be used as an alternate download site. The mirror site will be the preferred 
   | server for that repository.
   |-->
  <mirrors>
    <!-- mirror
     | Specifies a repository mirror site to use instead of a given repository. The repository that
     | this mirror serves has an ID that matches the mirrorOf element of this mirror. IDs are used
     | for inheritance and direct lookup purposes, and must be unique across the set of mirrors.
     |
    <mirror>
      <id>mirrorId</id>
      <mirrorOf>repositoryId</mirrorOf>
      <name>Human Readable Name for this Mirror.</name>
      <url>http://my.repository.com/repo/path</url>
    </mirror>
     -->
  </mirrors>
  
  <!-- profiles
   | This is a list of profiles which can be activated in a variety of ways, and which can modify
   | the build process. Profiles provided in the settings.xml are intended to provide local machine-
   | specific paths and repository locations which allow the build to work in the local environment.
   |
   | For example, if you have an integration testing plugin - like cactus - that needs to know where
   | your Tomcat instance is installed, you can provide a variable here such that the variable is 
   | dereferenced during the build process to configure the cactus plugin.
   |
   | As noted above, profiles can be activated in a variety of ways. One way - the activeProfiles
   | section of this document (settings.xml) - will be discussed later. Another way essentially
   | relies on the detection of a system property, either matching a particular value for the property,
   | or merely testing its existence. Profiles can also be activated by JDK version prefix, where a 
   | value of '1.4' might activate a profile when the build is executed on a JDK version of '1.4.2_07'.
   | Finally, the list of active profiles can be specified directly from the command line.
   |
   | NOTE: For profiles defined in the settings.xml, you are restricted to specifying only artifact
   |       repositories, plugin repositories, and free-form properties to be used as configuration
   |       variables for plugins in the POM.
   |
   |-->
  <profiles>
    <!-- profile
     | Specifies a set of introductions to the build process, to be activated using one or more of the
     | mechanisms described above. For inheritance purposes, and to activate profiles via <activatedProfiles/>
     | or the command line, profiles have to have an ID that is unique.
     |
     | An encouraged best practice for profile identification is to use a consistent naming convention
     | for profiles, such as 'env-dev', 'env-test', 'env-production', 'user-jdcasey', 'user-brett', etc.
     | This will make it more intuitive to understand what the set of introduced profiles is attempting
     | to accomplish, particularly when you only have a list of profile id's for debug.
     |
     | This profile example uses the JDK version to trigger activation, and provides a JDK-specific repo.
    <profile>
      <id>jdk-1.4</id>

      <activation>
        <jdk>1.4</jdk>
      </activation>

      <repositories>
        <repository>
          <id>jdk14</id>
          <name>Repository for JDK 1.4 builds</name>
          <url>http://www.myhost.com/maven/jdk14</url>
          <layout>default</layout>
          <snapshotPolicy>always</snapshotPolicy>
        </repository>
      </repositories>
    </profile>
    -->

    <!--
     | Here is another profile, activated by the system property 'target-env' with a value of 'dev',
     | which provides a specific path to the Tomcat instance. To use this, your plugin configuration
     | might hypothetically look like:
     |
     | ...
     | <plugin>
     |   <groupId>org.myco.myplugins</groupId>
     |   <artifactId>myplugin</artifactId>
     |   
     |   <configuration>
     |     <tomcatLocation>${tomcatPath}</tomcatLocation>
     |   </configuration>
     | </plugin>
     | ...
     |
     | NOTE: If you just wanted to inject this configuration whenever someone set 'target-env' to
     |       anything, you could just leave off the <value/> inside the activation-property.
     |
    <profile>
      <id>env-dev</id>

      <activation>
        <property>
          <name>target-env</name>
          <value>dev</value>
        </property>
      </activation>

      <properties>
        <tomcatPath>/path/to/tomcat/instance</tomcatPath>
      </properties>
    </profile>
    -->
  

    <profile>
      <id>adobe-public</id>
      <activation>
        <activeByDefault>true</activeByDefault>
      </activation>
      <repositories>
        <repository>
          <id>adobe</id>
          <name>Nexus Proxy Repository</name>
          <url>http://repo.adobe.com/nexus/content/groups/public/</url>
          <layout>default</layout>
        </repository>
      </repositories>
      <pluginRepositories>
        <pluginRepository>
          <id>adobe</id>
          <name>Nexus Proxy Repository</name>
          <url>http://repo.adobe.com/nexus/content/groups/public/</url>
          <layout>default</layout>
        </pluginRepository>
      </pluginRepositories>
    </profile>
  </profiles>

  <!-- activeProfiles
   | List of profiles that are active for all builds.
   |
  <activeProfiles>
    <activeProfile>alwaysActiveProfile</activeProfile>
    <activeProfile>anotherAlwaysActiveProfile</activeProfile>
  </activeProfiles>
  -->
</settings>
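To confirm that the Adobe repository is configured, you can list the repository URLs that your settings.xml file declares. The following is a minimal sketch that uses only the JDK XML parser; the class and method names are hypothetical, and it assumes the file is in the default user location (.m2 under your user home directory).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class SettingsCheck {

    /** Returns the url of every repository and pluginRepository element in the given settings XML. */
    public static List<String> repositoryUrls(String settingsXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(settingsXml.getBytes(StandardCharsets.UTF_8)));
            List<String> urls = new ArrayList<>();
            for (String tag : new String[] {"repository", "pluginRepository"}) {
                NodeList nodes = doc.getElementsByTagName(tag);
                for (int i = 0; i < nodes.getLength(); i++) {
                    // Commented-out samples are XML comments, so they are not returned here.
                    NodeList url = ((Element) nodes.item(i)).getElementsByTagName("url");
                    if (url.getLength() > 0) {
                        urls.add(url.item(0).getTextContent().trim());
                    }
                }
            }
            return urls;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse settings XML", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed location: the user-level settings file.
        Path settings = Paths.get(System.getProperty("user.home"), ".m2", "settings.xml");
        if (!Files.exists(settings)) {
            System.out.println("No settings.xml found at " + settings);
            return;
        }
        String xml = new String(Files.readAllBytes(settings), StandardCharsets.UTF_8);
        System.out.println(repositoryUrls(xml));
    }
}
```

If the adobe-public profile is in place, the printed list should include the Adobe repository URL once for the repository entry and once for the plug-in repository entry.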

Create an Experience Manager archetype project for 6.1 

You can create an Experience Manager archetype project by using the Maven archetype plugin. In this example, assume that the working directory is C:\AdobeCQ.

plugin1

To create an Experience Manager archetype project, perform these steps:

  1. Open the command prompt and go to your working directory (for example, C:\AdobeCQ).

  2. Run the following Maven command:

    mvn archetype:generate -DarchetypeRepository=https://repo.adobe.com/nexus/content/groups/public/ -DarchetypeGroupId=com.day.jcr.vault -DarchetypeArtifactId=multimodule-content-package-archetype -DarchetypeVersion=1.0.2 -DgroupId=com.mycompany.myproject.components -DartifactId=domparser -Dversion=1.0-SNAPSHOT -Dpackage=com.mycompany.myproject.components -DappsFolderName=myproject -DartifactName="My Project" -DcqVersion="5.6.1" -DpackageGroup="My Company"

  3. When prompted for additional information, specify Y.

  4. Once done, you will see a message like:
    [INFO] Finished at: Wed Mar 27 13:38:58 EDT 2013
    [INFO] Final Memory: 10M/184M

  5. In the command prompt, change to the generated project directory. For example: C:\AdobeCQ\domparser. Run the following Maven command:
    mvn eclipse:eclipse

After you run this command, you can import the project into Eclipse as discussed in the next section.

Add Java files to the Maven project using Eclipse 

To make it easier to work with the Maven generated project, import it into the Eclipse development environment, as shown in the following illustration.

Figure: Eclipse Project dialog

The next step is to add these Java files to the com.mycompany.myproject.components package.

  • Parser - uses the org.apache.sling.models API to define a Sling Model.
  • DataParser - uses the org.jsoup.Jsoup API to parse web pages.

Note:

Delete all other Java files and packages from the Maven generated project. Make sure that these two Java files are the only two files in the project. 

Parser class

The Parser class uses these Sling Model annotations:

  • @Model
  • @Inject
  • @Named

For information about these Sling Model annotations, see Sling Models.

The following Java code represents the Parser class.

package com.mycompany.myproject.components;

import com.adobe.cq.address.api.AddressException;
import com.adobe.cq.address.api.location.Coordinates;
import com.adobe.cq.address.api.location.GeocodeProvider;
import java.util.ArrayList;
import java.util.List;
import javax.annotation.PostConstruct;
import javax.inject.Inject;
import javax.inject.Named;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.models.annotations.Default;
import org.apache.sling.models.annotations.Model;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sling Model that is adapted from the component's resource.
@Model(adaptables={Resource.class})
public class Parser
{
  private static final Logger logger = LoggerFactory.getLogger(Parser.class);
  public static final String DEFAULT = "http://jsoup.org/cookbook/input/load-document-from-url";
  public static final String TRUE = "true";
  public static final String FALSE = "";

  // URL of the web page to parse, read from the "address" property.
  @Inject
  @Named("address")
  @Default(values={DEFAULT})
  protected String addressDescription;

  // Check box values from the dialog: "true" when selected, otherwise empty.
  @Inject
  @Named("imp")
  @Default(values={FALSE})
  protected String imp;
  @Inject
  @Named("link")
  @Default(values={FALSE})
  protected String link;
  @Inject
  @Named("img")
  @Default(values={FALSE})
  protected String img;

  // Geocode provider that is injected as an OSGi service.
  @Inject
  private GeocodeProvider geocode;
  public Coordinates coordinates;

  private List<String> file;
  private List<String> imports;
  private List<String> images;

  // Runs after injection and parses the page for each selected option.
  @PostConstruct
  public void activate()
    throws AddressException
  {
    this.file = new ArrayList<String>();
    this.imports = new ArrayList<String>();
    this.images = new ArrayList<String>();

    logger.info("URL is {}", this.addressDescription);
    if (TRUE.equals(this.link))
    {
      this.file = new DataParser().parseLinks(this.addressDescription);
      logger.info("file size {}", this.file.size());
    }
    if (TRUE.equals(this.img))
    {
      this.images = new DataParser().parseImages(this.addressDescription);
      logger.info("Images size {}", this.images.size());
    }
    if (TRUE.equals(this.imp))
    {
      this.imports = new DataParser().parseImports(this.addressDescription);
      logger.info("Imports size {}", this.imports.size());
    }
    // Look up coordinates for the configured address value.
    this.coordinates = this.geocode.geocode(this.addressDescription);
  }

  public List<String> getLinks()
  {
    return this.file;
  }

  public List<String> getImages()
  {
    return this.images;
  }

  public List<String> getImports()
  {
    return this.imports;
  }
}

DataParser class

The DataParser class uses JSOUP APIs to parse web pages referenced by a given URL. This class contains a method named parseLinks that parses the links that appear on the web page specified in the URL.

// Returns the absolute URL of every hyperlink (a[href]) on the page.
public List<String> parseLinks(String url)
{
  List<String> hyperLinks = new ArrayList<String>();
  try
  {
    Elements links = docParse(url).select("a[href]");
    for (Element link : links)
    {
      // The abs: prefix resolves the href against the page URL.
      hyperLinks.add(link.attr("abs:href"));
      logger.info(link.attr("abs:href"));
    }
  }
  catch (Exception e)
  {
    logger.error("Something went wrong while parsing links", e);
  }
  return hyperLinks;
}

In addition to this method, this class also contains these methods: 

  • parseImports - parses link elements (link[href]), such as style sheet references, that appear in the web page. The resolved URLs are placed into a Java Collection object.
  • parseImages - parses images (img elements that have a src attribute) that appear in the web page. The image URLs are placed into a Java Collection object.
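These methods read attributes with the abs: prefix (for example, abs:href), which makes JSOUP return an absolute URL even when the page uses a relative one. The following sketch approximates that resolution step with java.net.URI from the JDK; it illustrates the concept and is not JSOUP's own implementation. The class and method names are hypothetical.

```java
import java.net.URI;

final class AbsoluteUrlSketch {

    /**
     * Resolves an href or src value against the URL of the page it appears on.
     * Values that are already absolute are returned unchanged.
     * Throws IllegalArgumentException if either value is not a valid URI.
     */
    public static String toAbsolute(String pageUrl, String attributeValue) {
        return URI.create(pageUrl).resolve(attributeValue).toString();
    }

    public static void main(String[] args) {
        String page = "http://example.com/a/b.html";
        System.out.println(toAbsolute(page, "c.html"));
        System.out.println(toAbsolute(page, "../img/x.png"));
        System.out.println(toAbsolute(page, "/img/logo.png"));
        System.out.println(toAbsolute(page, "https://other.org/x"));
    }
}
```

Storing absolute URLs matters here because the component writes the results into an AEM page, where a relative path would resolve against the AEM host instead of the parsed site.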

The following Java code represents the entire DataParser class. 

 

package com.mycompany.myproject.components;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DataParser
{
  private final Logger logger = LoggerFactory.getLogger(DataParser.class);
  
  // Fetches and parses the page at the given URL; returns null if the request fails.
  public Document docParse(String url)
  {
    try
    {
      return Jsoup.connect(url).get();
    }
    catch (IOException e)
    {
      this.logger.error("Unable to fetch " + url, e);
    }
    return null;
  }
  
  // Collects the absolute URL of every hyperlink (a[href]) on the page.
  public List<String> parseLinks(String url)
  {
    List<String> hyperLinks = new ArrayList<String>();
    try
    {
      Elements links = docParse(url).select("a[href]");
      for (Element link : links)
      {
        hyperLinks.add(link.attr("abs:href"));
        this.logger.info(link.attr("abs:href"));
      }
    }
    catch (Exception e)
    {
      // Also reached when docParse returns null because the page could not be fetched.
      this.logger.error("Something went wrong parsing links", e);
    }
    return hyperLinks;
  }
  
  // Collects the absolute URL of every import (link[href]) on the page.
  public List<String> parseImports(String url)
  {
    List<String> imports = new ArrayList<String>();
    try
    {
      Elements imp = docParse(url).select("link[href]");
      for (Element i : imp) {
        imports.add(i.attr("abs:href"));
      }
    }
    catch (Exception e)
    {
      this.logger.error("Something went wrong parsing imports", e);
    }
    return imports;
  }
  
  // Collects the absolute URL of every image (img[src]) on the page.
  public List<String> parseImages(String url)
  {
    List<String> images = new ArrayList<String>();
    try
    {
      Elements img = docParse(url).select("img[src]");
      for (Element i : img) {
        images.add(i.attr("abs:src"));
        this.logger.info(i.attr("abs:src"));
      }
    }
    catch (Exception e)
    {
      this.logger.error("Something went wrong parsing images", e);
    }
    return images;
  }
}
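
The DataParser methods read attributes with the abs: prefix (for example, abs:href), which asks JSOUP for an absolute URL rather than the raw attribute value. The following minimal sketch illustrates the idea of resolving a link against the page URL by using only java.net.URI. It is not how JSOUP implements the feature, and the AbsUrlSketch class and the example.com URLs are assumptions made for illustration.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of the article's DataParser class.
class AbsUrlSketch {

    // Resolves a possibly relative link against the URL of the page it came from,
    // similar in spirit to the value that attr("abs:href") returns.
    static String resolve(String pageUrl, String link) {
        return URI.create(pageUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        String page = "http://example.com/docs/index.html";
        System.out.println(resolve(page, "guide.html"));        // relative to the page folder
        System.out.println(resolve(page, "/images/logo.png"));  // relative to the site root
        System.out.println(resolve(page, "http://other.example.org/a.css")); // already absolute
    }
}
```

Storing absolute URLs means that the values placed in the collections can be displayed or followed without knowing which page they came from.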

Add the JSOUP JAR to AEM

Add the JSOUP JAR file to AEM within an OSGi bundle fragment. This is required because the DataParser class uses the JSOUP API to parse the HTML located at the specified URL. If you do not add this library to AEM, the OSGi bundle that contains the DataParser class cannot reach the Active state.

To add the JSOUP JAR to AEM, add it to a bundle fragment and then deploy the bundle fragment to AEM, as discussed in this section. First, download the JSOUP JAR from the URL shown at the beginning of this development article.

To create an OSGi bundle fragment that contains the JSOUP API, perform these tasks:

  1. Start Eclipse (Indigo). The steps below have been tested on Eclipse Java EE IDE for Web Developers version Indigo Service Release 1.

  2. Select File, New, Other.

  3. Under the Plug-in Development folder, choose Plug-in from Existing JAR Archives. Name your project jsoupBundle.

  4. In the JAR selection dialog, click the Add external button, and browse to the JSOUP JAR file that you downloaded.

  5. Click Next.

  6. In the Plug-in Project properties dialog, ensure that you check the checkbox for Analyze library contents and add dependencies.

  7. Make sure that the Target Platform is the standard OSGi framework.

  8. Ensure the checkboxes for Unzip the JAR archives into the project and Update references to the JAR files are both checked.

  9. Click Next, and then Finish.

  10. Click the Runtime tab.

  11. Make sure that the Exported Packages list is populated.

  12. Make sure these packages have been added under the Export-Package header in MANIFEST.MF. Remove the version information in the MANIFEST.MF file. Version numbers can cause conflicts when you upload the OSGi bundle.

  13. Make sure that the Import-Package header in MANIFEST.MF is also populated. The manifest should resemble the following (notice that Export-Package begins with org.jsoup).

    Bundle-ManifestVersion: 2
    Bundle-Name: JSoupOSGi
    Bundle-SymbolicName: JSoupOSGi
    Bundle-Version: 1.0.0
    Export-Package: org.jsoup,
     org.jsoup.examples,
     org.jsoup.helper,
     org.jsoup.nodes,
     org.jsoup.parser,
     org.jsoup.safety,
     org.jsoup.select
    Bundle-RequiredExecutionEnvironment: JavaSE-1.6

  14. Save the project.

  15. Build the OSGi bundle by right-clicking the project in the left pane, choose Export, Plug-in Development, Deployable plug-ins and fragments, and click Next.

  16. Select a location for the export (C:\TEMP) and click Finish. (Ignore any error messages).

  17. In C:\TEMP\plugins, you should now find the OSGi bundle.

  18. Log in to the Apache Felix Web Console at http://server:port/system/console/bundles (the default user is admin with the password admin).

  19. Sort the bundle list by Id and note the Id of the last bundle.

  20. Click the Install/Update button.

  21. Check the Start Bundle checkbox.

  22. Browse to the bundle JAR file you just built. (C:\TEMP\plugins).

  23. Click Install.

  24. Click the Refresh Packages button.

  25. Check the bundle with the highest Id.

  26. Your new bundle should now be listed with the status Active.

  27. If the status is not Active, check the error.log for exceptions. If you get "org.osgi.framework.BundleException: Unresolved constraint" errors, check the MANIFEST.MF for strict version requirements, which might look like this: javax.xml.namespace; version="3.1.0"

  28. If the version requirement causes problems, remove it so that the entry looks like this: javax.xml.namespace.

  29. If the entry is not required, remove it entirely.

  30. Rebuild the bundle.

  31. Delete the previous bundle and deploy the new one.

Modify the Maven POM file 

Modify the POM files to successfully build the OSGi bundle. In the POM file located at C:\AdobeCQ\domparser\bundle, add the following dependencies.

  • org.apache.felix.scr
  • org.apache.felix.scr.annotations
  • org.apache.jackrabbit
  • org.apache.sling

The complete POM file appears later in this section. First, notice this configuration element of the maven-bundle-plugin:

<configuration>
<instructions>
<Bundle-SymbolicName>com.mycompany.myproject.components.domparser-bundle</Bundle-SymbolicName>
<Sling-Model-Packages>com.mycompany.myproject.components</Sling-Model-Packages>
</instructions>
</configuration>

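The Sling-Model-Packages instruction is required to ensure that the model is adaptable. It names the package that Sling scans for model classes. In this example, the Parser class is located in the com.mycompany.myproject.components package. Without this instruction, the model class is not registered and adapting to it returns null.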

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd ">
    <modelVersion>4.0.0</modelVersion>
    <!-- ====================================================================== -->
    <!-- P A R E N T P R O J E C T D E S C R I P T I O N -->
    <!-- ====================================================================== -->
    <parent>
        <groupId>com.mycompany.myproject.components</groupId>
        <artifactId>domparser</artifactId>
        <version>1.0-SNAPSHOT</version>
    </parent>

    <!-- ====================================================================== -->
    <!-- P R O J E C T D E S C R I P T I O N -->
    <!-- ====================================================================== -->

    <artifactId>domparser-bundle</artifactId>
    <packaging>bundle</packaging>
    <name>My Project Bundle</name>

    <dependencies>
        <dependency>
            <groupId>org.osgi</groupId>
            <artifactId>org.osgi.compendium</artifactId>
        </dependency>
        <dependency>
            <groupId>org.osgi</groupId>
            <artifactId>org.osgi.core</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.felix</groupId>
            <artifactId>org.apache.felix.scr.annotations</artifactId>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
        </dependency>
             
        <dependency>
            <groupId>org.apache.felix</groupId>
            <artifactId>org.osgi.core</artifactId>
            <version>1.4.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.sling</groupId>
            <artifactId>org.apache.sling.commons.osgi</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.jackrabbit</groupId>
            <artifactId>jackrabbit-core</artifactId>
            <version>2.4.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.jackrabbit</groupId>
            <artifactId>jackrabbit-jcr-commons</artifactId>
            <version>2.4.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.sling</groupId>
            <artifactId>org.apache.sling.jcr.api</artifactId>
            <version>2.0.4</version>
        </dependency>
        <dependency>
            <groupId>com.adobe.aem</groupId>
            <artifactId>uber-jar</artifactId>
            <version>6.1.0</version>
            <classifier>obfuscated-apis</classifier>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>javax.jcr</groupId>
            <artifactId>jcr</artifactId>
            <version>2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.sling</groupId>
            <artifactId>org.apache.sling.models.api</artifactId>
            <version>1.0.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.sling</groupId>
            <artifactId>org.apache.sling.api</artifactId>
            <version>2.7.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>javax.servlet</groupId>
            <artifactId>servlet-api</artifactId>
            <version>2.5</version>
        </dependency>
        <dependency>
            <!-- jsoup HTML parser library @ http://jsoup.org/ -->
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.7.3</version>
        </dependency>
                  
    </dependencies>

    <!-- ====================================================================== -->
    <!-- B U I L D D E F I N I T I O N -->
    <!-- ====================================================================== -->
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.felix</groupId>
                <artifactId>maven-scr-plugin</artifactId>
                <executions>
                    <execution>
                        <id>generate-scr-descriptor</id>
                        <goals>
                            <goal>scr</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.felix</groupId>
                <artifactId>maven-bundle-plugin</artifactId>
                <extensions>true</extensions>
                <configuration>
                    <instructions>
                        <Bundle-SymbolicName>com.mycompany.myproject.components.domparser-bundle</Bundle-SymbolicName>
                        <Sling-Model-Packages>com.mycompany.myproject.components</Sling-Model-Packages>
                    </instructions>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.sling</groupId>
                <artifactId>maven-sling-plugin</artifactId>
                <configuration>
                    <slingUrl>http://${crx.host}:${crx.port}/apps/myproject/install</slingUrl>
                    <usePut>true</usePut>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-javadoc-plugin</artifactId>
                 <configuration>
                    <excludePackageNames>
                        *.impl
                    </excludePackageNames>
                 </configuration>
            </plugin>
        </plugins>
    </build>
</project>

Build the OSGi bundle using Maven 

To build the OSGi bundle by using Maven, perform these steps:

  1. Open the command prompt and go to the C:\AdobeCQ\domparser folder.

  2. Run the following Maven command: mvn clean install.

  3. The OSGi bundle can be found in the following folder: C:\AdobeCQ\domparser\bundle\target. The file name of the OSGi bundle is domparser-bundle-1.0-SNAPSHOT.jar.
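
Because the Sling-Model-Packages instruction in the POM file ends up as a header in the bundle manifest, one way to confirm that the build picked it up is to read the manifest of the built JAR. The following is a minimal sketch that uses only the java.util.jar classes from the JDK. The ManifestCheckSketch class and its method names are hypothetical and are not part of the article's project.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper for inspecting a built bundle; not part of the article's project.
class ManifestCheckSketch {

    // Returns the value of the Sling-Model-Packages header, or null if it is absent.
    static String modelPackages(Manifest manifest) {
        return manifest.getMainAttributes().getValue("Sling-Model-Packages");
    }

    // Parses manifest text held in memory, which is convenient for trying the check without a JAR.
    static Manifest parse(String text) {
        try {
            return new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Pass the path of the built JAR as the first argument.
            try (JarFile jar = new JarFile(args[0])) {
                Manifest mf = jar.getManifest();
                System.out.println(mf == null ? null : modelPackages(mf));
            }
        } else {
            // No JAR given: fall back to a small in-memory manifest.
            Manifest mf = parse("Manifest-Version: 1.0\n"
                    + "Sling-Model-Packages: com.mycompany.myproject.components\n\n");
            System.out.println(modelPackages(mf));
        }
    }
}
```

If the check prints null for the built JAR, the instruction is probably missing from the maven-bundle-plugin configuration.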

Deploy the bundle to AEM 

After you deploy the OSGi bundle, you can create an HTL component that interacts with it, and you can see the bundle in the Apache Felix Web Console.


Deploy the OSGi bundle to AEM by performing these steps:

  1. Log in to the Apache Felix Web Console at http://server:port/system/console/bundles (the default user is admin with the password admin).

  2. Click the Bundles tab, sort the bundle list by Id, and note the Id of the last bundle.

  3. Click the Install/Update button.

  4. Browse to the bundle JAR file you just built using Maven. (C:\AdobeCQ\domparser\bundle\target).

  5. Click Install.

  6. Click the Refresh Packages button.

  7. Check the bundle with the highest Id.

  8. Click Active. Your new bundle should now be listed with the status Active.

  9. If the status is not Active, check the error.log for exceptions.

Note:

If you have an issue starting the OSGi bundle because there is a JSOUP version issue, open the OSGi bundle JAR file and locate the Manifest file (located here: domparser-bundle-1.0-SNAPSHOT.jar\META-INF). Then remove JSOUP version information so it appears as:

Import-Package: com.adobe.cq.address.api;version="[1.0,2)",com.adobe.cq.address.api.location;version="[1.0,2)",javax.annotation,javax.inject;version="[0.0,1)",org.apache.sling.api.resource;version="[2.5,3)",org.apache.sling.models.annotations,org.jsoup,org.jsoup.nodes,org.jsoup.select,org.slf4j;version="[1.5,2)"
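
The edit described in this note can be sketched as a string transformation: split the Import-Package value into clauses, and drop the version attribute only from the org.jsoup clauses. The following minimal sketch assumes a header in the form shown above. The ImportPackageSketch class is hypothetical, and the "[1.7,2)" version range in the sample input is an assumed value used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the article's project.
class ImportPackageSketch {

    // Splits an Import-Package value on commas that are outside double quotes,
    // because version ranges such as "[1.0,2)" contain commas themselves.
    static List<String> splitClauses(String header) {
        List<String> clauses = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : header.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            if (c == ',' && !inQuotes) {
                clauses.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            clauses.add(current.toString());
        }
        return clauses;
    }

    // Drops the version attribute from every clause that imports org.jsoup or one of its subpackages.
    static String stripJsoupVersions(String header) {
        StringBuilder out = new StringBuilder();
        for (String clause : splitClauses(header)) {
            String fixed = clause;
            if (clause.equals("org.jsoup") || clause.startsWith("org.jsoup.") || clause.startsWith("org.jsoup;")) {
                fixed = clause.replaceAll(";version=\"[^\"]*\"", "");
            }
            if (out.length() > 0) {
                out.append(',');
            }
            out.append(fixed);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The jsoup version range below is an assumed sample value.
        String header = "javax.inject;version=\"[0.0,1)\",org.jsoup;version=\"[1.7,2)\","
                + "org.jsoup.nodes;version=\"[1.7,2)\",org.slf4j;version=\"[1.5,2)\"";
        System.out.println(stripJsoupVersions(header));
    }
}
```

Only the org.jsoup clauses lose their version attribute; the other imports keep their version ranges.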

Create an AEM 6 HTL component

Perform these tasks using CRXDE Lite:

  1.  Right click on /apps/myproject/components and then select New, Component.

  2.  Enter the following information into the Create Component dialog box:

    • Label: The name of the component to create. Enter parser.
    • Title: The title that is assigned to the component. Enter parser.
    • Description: The description that is assigned to the component. Enter parser.
    • Super Resource Type: Enter foundation/components/parbase.
    • Group: The group in the side rail or side kick where the component appears. Enter General. (The parser component is located under the General heading in the Touch UI side rail. Also appears in General in the classic view sidekick.)
    • Allowed parents: Enter */*parsys.
  3.  Add the following properties to this node:

    • cq:isContainer (Boolean) - false
    • cq:noDecoration (Boolean) - false
  4. Click Ok.

Add a dialog to the HTL component

A dialog lets an author click on the component in the Touch UI (or Classic UI) view during design time and enter values that are used by the component. The component created in this development article lets the AEM author specify the URL to the HTML page to parse. 

Dialog for the HTL DOM Parser component

The following illustration shows the JCR nodes for this component. 

JCR nodes that create the dialog

Build the dialog by performing these tasks:

  1. Select /apps/myproject/components/parser.

  2. Right click and select Create, Create Dialog.

  3. Enter the following values:

    • Label: DOM Parser
    • Title: DOM Parser
  4. Add the following property to the dialog node.

    • xtype (String) - panel
  5. Delete all nodes under /apps/myproject/components/parser/dialog.

  6. Select /apps/myproject/components/parser/dialog.

  7. Right click and select Create, Create Node. Enter the following values:

    • Name: items
    • Type: cq:WidgetCollection
  8. Click on the following node: /apps/myproject/components/parser/dialog/items.

  9. Right click and select Create, Create Node. Enter the following values:

    • Name: address
    • Type: cq:Widget
  10. Add the following properties to the address node.

    • fieldLabel (String) - URL
    • name (String) - ./address
    • xtype (String) - textarea
  11. Click on the following node: /apps/myproject/components/parser/dialog/items.

  12. Right click and select Create, Create Node. Enter the following values:

    • Name: link
    • Type: cq:Widget
  13. Add the following properties to the link node.

    • fieldLabel (String) - Hyper Links
    • name (String) - ./link
    • xtype (String) - selection
    • type (String) - checkbox
  14. Click on the following node: /apps/myproject/components/parser/dialog/items.

  15. Right click and select Create, Create Node. Enter the following values:

    • Name: img
    • Type: cq:Widget
  16. Add the following properties to the img node.

    • fieldLabel (String) - Images
    • name (String) - ./img
    • xtype (String) - selection
    • type (String) - checkbox
  17. Click on the following node: /apps/myproject/components/parser/dialog/items.

  18. Right click and select Create, Create Node. Enter the following values:

    • Name: imp
    • Type: cq:Widget
  19. Add the following properties to the imp node.

    • fieldLabel (String) - imp
    • name (String) - ./imp
    • xtype (String) - selection
    • type (String) - checkbox

Modify the parser.html file 

Add an HTML file, named parser.html, to the following JCR location:

/apps/myproject/components/parser/

Enter the markup portion of the HTL component in this file. This code interacts with the Parser class that was created earlier.

Add the following HTML to this file.

Note:

You can delete the parser.jsp file from /apps/myproject/components/parser/.

Next notice these lines of HTL code:

<b>Here are the Images</b>
<ul data-sly-list="${img.images}">
<li>${item}</li>
</ul>
</div>

Here, img is an instance of the Parser class developed earlier in this article. The code

data-sly-list="${img.images}"

is how you handle a collection. In this example, img.images maps to this data member defined in the Parser class:

private List images;

The code:

<li><img src="${item}"></li>

displays each data item located in the Java collection that is returned by the parseImages method defined in the DataParser class. 
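
HTL cannot read a private field directly; an expression such as img.images is resolved through a public accessor that follows the Java getter convention. The following minimal sketch shows that naming convention by using java.beans.Introspector. It is not HTL's own lookup code, and it assumes that the Parser class exposes the list through a public getImages() method. The GetterMappingSketch and ParserSketch classes are hypothetical stand-ins.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Arrays;
import java.util.List;

// Hypothetical classes for illustration; not part of the article's project.
class GetterMappingSketch {

    // Stand-in for the article's Parser model; the real class is a Sling Model.
    public static class ParserSketch {
        private final List<String> images = Arrays.asList("http://example.com/a.png");

        // Assumed public getter that would back the HTL expression img.images.
        public List<String> getImages() {
            return images;
        }
    }

    // Returns the name of the getter that backs a bean property, or null if there is none.
    static String getterFor(Class<?> type, String property) {
        try {
            BeanInfo info = Introspector.getBeanInfo(type);
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getName().equals(property) && pd.getReadMethod() != null) {
                    return pd.getReadMethod().getName();
                }
            }
            return null;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Shows which method the property name "images" corresponds to.
        System.out.println(getterFor(ParserSketch.class, "images"));
    }
}
```

If a list does not appear on the page, a missing or misnamed getter in the model class is one thing worth checking.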

Create an AEM web page based on the contentpage template 

The final task is to create a page that is based on the contentpage template (the template created earlier in this development article). When you open this web page, you can enter a URL to a web page that the component parses using the JSOUP API.

Create an AEM web page based on the contentpage template by performing these tasks:

  1. Go to the CQ welcome page at http://[host name]:[port]; for example, http://localhost:4502.

  2. Select Websites. (If you are using AEM 5.6, click Tools from the menu on the left.)

  3. From the left-hand pane, select Websites.

  4. Select New Page.

  5. Specify the title of the page in the Title field.

  6. Specify the name of the page in the Name field.

  7. Select contentpage from the template list that appears. This value represents the template that is created in this development article. If you do not see it, then repeat the steps in this development article. For example, if you made a typing mistake when entering in path information, the template will not show up in the New Page dialog box.

  8. Open the new page that you created by double-clicking it in the right pane. The new page opens in a web browser.  

See also

Congratulations, you have just created a Sightly component that uses the JSOUP API to parse HTML documents. Please refer to the AEM community page for other articles that discuss how to build AEM services/applications.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License  Twitter™ and Facebook posts are not covered under the terms of Creative Commons.
