In Adobe Experience Manager (AEM), binary data can be stored independently of the content nodes. The binary data is stored in a data store, whereas content nodes are stored in a node store.
Both data stores and node stores can be configured using OSGi configuration. Each OSGi configuration is referenced using a persistent identifier (PID).
- First, configure the node store by creating a configuration file with the name of the node store option you want to use in the crx-quickstart/install directory. For example, the Document node store (which is the basis for AEM's MongoMK implementation) uses the file org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService.config.
- Then, create a configuration file with the PID of the data store you want to use. Edit the file to set the configuration options.
Note:
See Node Store Configurations and Data Store Configurations for configuration options.
Caution:
Newer versions of Oak employ a new naming scheme and format for OSGi configuration files. The new naming scheme requires that the configuration file use the .config extension, and the new format requires that values be typed, as shown in the samples below.
If you upgrade from an older version of Oak, ensure that you make a backup of the crx-quickstart/install folder first. After the upgrade, restore the contents of the folder to the upgraded installation and modify the extension of the configuration files from .cfg to .config.
In case you are reading this article in preparation for an upgrade from an AEM 5.x installation, ensure that you consult the upgrade documentation first.
The segment node store is the basis of Adobe's TarMK implementation in AEM 6. It uses the org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService PID for configuration.
You can configure the following options:
- repository.home: Path to repository home under which repository-related data is stored. By default, segment files are stored under the crx-quickstart/segmentstore directory.
- tarmk.size: Maximum size of a segment in MB. The default maximum is 256MB.
- customBlobStore: Boolean value indicating that a custom data store is used. The default value is false.
The following is a sample org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService.config file:
#Path to repo
repository.home="crx-quickstart/repository"

#Max segment size
tarmk.size=I"256"

#Custom data store
customBlobStore=B"false"
The document node store is the basis of AEM's MongoMK implementation. It uses the org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService PID. The following configuration options are available:
- mongouri: The Mongo URI required to connect to the Mongo database. The default is mongodb://localhost:27017.
- db: Name of the Mongo database. The default is Oak. However, new AEM 6 installations use aem-author as the default database name.
- cache: The cache size in MB. This is distributed among the various caches used in the DocumentNodeStore. The default is 256.
- changesSize: Size in MB of the capped collection used in Mongo for caching the diff output. The default is 256.
- customBlobStore: Boolean value indicating that a custom data store is used. The default is false.
The following is a sample org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService.config file:
#Mongo server details
mongouri="mongodb://localhost:27017"

#Name of Mongo database to use
db="aem-author"

#Store binaries in custom BlobStore
customBlobStore=B"false"
When dealing with a large number of binaries, it is recommended to use an external data store instead of the default node stores in order to maximize performance.
For example, if your project requires a large number of media assets, storing them in the File or S3 data store makes accessing them faster than storing them directly inside MongoDB.
The File Data Store provides better performance than MongoDB, and Mongo backup and restore operations are also slower with a large number of assets.
Details on the different data stores and configurations are described below.
Note:
To enable a custom data store, make sure that customBlobStore is set to true in the respective node store configuration file (segment node store or document node store).
This is the implementation of FileDataStore present in Jackrabbit 2. It provides a way to store the binary data as normal files on the file system. It uses the org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStore PID.
These configuration options are available; a sample configuration file follows the list:
- repository.home: Path to the repository home under which repository-related data is stored. By default, binary files are stored under the crx-quickstart/repository/datastore directory.
- path: Path to the directory under which the files are stored. If specified, it takes precedence over the repository.home value.
- minRecordLength: The minimum size, in bytes, of a file stored in the data store. Binary content smaller than this value is inlined.
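Following the pattern of the node store samples above, a minimal sketch of an org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStore.config file could look like this (the path and minRecordLength values are illustrative assumptions, not required defaults):

#Path to the directory where binaries are stored (illustrative)
path="./crx-quickstart/repository/datastore"

#Inline binaries smaller than 100 bytes (illustrative)
minRecordLength=I"100"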
Note:
When using a NAS to host a shared file data store, make sure you use only high-performing devices in order to avoid performance issues.
AEM can be configured to store data in Amazon's Simple Storage Service (S3). It uses the org.apache.jackrabbit.oak.plugins.blob.datastore.S3DataStore PID for configuration.
To enable the S3 data store functionality, a feature pack containing the S3 Datastore Connector needs to be downloaded and installed. Go to the Adobe Repository and download the latest 1.4.x version of the feature pack (for example, com.adobe.granite.oak.s3connector-1.4.x.zip).
Once downloaded, extract the feature pack to a temporary location. You can then install and configure the S3 Connector as follows:
- If AEM is already configured to work with the Tar or MongoDB storage, remove any existing node store configuration files from the <aem-install>/crx-quickstart/install folder before proceeding. The files that need to be removed are:
- For TarMK: org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService.config
- For MongoMK: org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService.config
- Return to the temporary location where the feature pack has been extracted, and copy the contents of the jcr_root/libs/system/config folder to <aem-install>/crx-quickstart/install.
Make sure you copy only the configuration files needed by your current configuration. See the details below; an example copy command follows the lists:
Node Store
- If AEM is configured to use SegmentNodeStore, copy the org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService.config file
- If AEM is configured to use DocumentNodeStore, copy the org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService.config file.
Data Store
- For a dedicated data store setup, copy the org.apache.jackrabbit.oak.plugins.blob.datastore.S3DataStore.config file.
- For a shared data store setup, copy the org.apache.jackrabbit.oak.plugins.blob.datastore.SharedS3DataStore.config file.
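For example, on a TarMK instance with a dedicated S3 data store, the copy step could look like the following sketch (the temporary folder and install path are placeholders; adjust them to your setup):

# Copy the node store and data store configurations into the install folder
cp <temp-folder>/jcr_root/libs/system/config/org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService.config <aem-install>/crx-quickstart/install/
cp <temp-folder>/jcr_root/libs/system/config/org.apache.jackrabbit.oak.plugins.blob.datastore.S3DataStore.config <aem-install>/crx-quickstart/install/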
Note:
In a clustered setup, perform the above steps on all nodes of the cluster, one by one. Also, make sure to use the same S3 settings for all nodes.
You can use the configuration file with the following options (a sample file follows the list):
- accessKey: The AWS access key.
- secretKey: The AWS secret key.
- s3Bucket: The bucket name.
- s3Region: The bucket region.
- path: The path of the data store. The default is <AEM install folder>/repository/datastore.
- minRecordLength: The minimum size of an object that should be stored in the data store. The default is 16 KB.
- maxCachedBinarySize: Binaries with a size less than or equal to this value are stored in the memory cache. The size is in bytes. The default is 17408 (17 KB).
- cacheSize: The size of the local cache, in bytes. The default is 64 GB.
- concurrentUploadsThreads: The number of parallel threads used to migrate binary files from the filesystem data store to Amazon S3 concurrently. The default value is 10.
- continueOnAsyncUploadFailure: Use this parameter if an error or exception occurs in the upload process but you still wish to proceed. It ignores the error, logs all missing files, and resumes all incomplete asynchronous uploads concurrently in multiple threads.
- cachePurgeTrigFactor: The trigger factor that decides when the local cache is purged. A purge is triggered when the current size of the cache exceeds cachePurgeTrigFactor multiplied by cacheSize.
- asyncUploadLimit: Limits the number of asynchronous upload slots to the backend. The default is 100.
- uploadRetries: The number of retries for a failed upload. The default value is 3.
- secret: Only to be used when binaryless replication is enabled for a shared data store setup.
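Putting several of these options together, a dedicated S3 data store configuration file might look like this sketch (credentials, bucket, and region are placeholders; the type markers follow the typed .config format used above, with I for integer, L for long, and D for double; with the values shown, purging of the local cache would start at roughly 95% of 64 GB, about 61 GB):

#AWS credentials (placeholders; replace with your own)
accessKey="<accessKey>"
secretKey="<secretKey>"

#Bucket name and region (placeholders)
s3Bucket="<bucket-name>"
s3Region="us-east-1"

#Inline objects smaller than 16 KB
minRecordLength=I"16384"

#Local cache of 64 GB, purged when it exceeds 95% of that size
cacheSize=L"68719476736"
cachePurgeTrigFactor=D"0.95"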
Hinweis:
The same configuration options are available for the shared S3 data store configuration file under the name of org.apache.jackrabbit.oak.plugins.blob.datastore.SharedS3DataStore.config. For more information, see Configuring a Shared Data Store.
The s3Region parameter accepts the following values:

Region | s3Region value
US East (N. Virginia) | us-east-1
US West (Oregon) | us-west-2
US West (Northern California) | us-west-1
EU (Frankfurt) | eu-central-1
EU (Ireland) | eu-west-1
Asia Pacific (Mumbai) | ap-south-1
Asia Pacific (Seoul) | ap-northeast-2
Asia Pacific (Singapore) | ap-southeast-1
Asia Pacific (Sydney) | ap-southeast-2
Asia Pacific (Tokyo) | ap-northeast-1
South America (Sao Paulo) | sa-east-1
AWS partition-global endpoint | aws-global
To upgrade to a new version of the S3 connector, follow these steps:
- Download the latest 1.4.x version of the feature pack from the Adobe Repository.
- If AEM is already configured to work with the Tar or MongoDB storage, remove any existing node store configuration files from the <aem-install>/crx-quickstart/install folder before proceeding. The files that need to be removed are:
- For TarMK: org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService.config
- For MongoMK: org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService.config
- Return to the temporary location where the feature pack has been extracted, and copy the contents of the jcr_root/libs/system/config folder to <aem-install>/crx-quickstart/install.
Make sure you only copy the configuration files needed by your current configuration. See the details below:
Node Store
- If AEM is configured to use SegmentNodeStore, copy the org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService.config file
- If AEM is configured to use DocumentNodeStore, copy the org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService.config file.
Data Store
- For a dedicated data store setup, copy the org.apache.jackrabbit.oak.plugins.blob.datastore.S3DataStore.config file.
- For a shared data store setup, copy the org.apache.jackrabbit.oak.plugins.blob.datastore.SharedS3DataStore.config file.
- Start the AEM instance:
java -jar aem-quickstart-6.2.0.jar -v -x crx2oak
Local Cache
After installing the connector for the first time, all the data is uploaded from the local data store to S3. After this first run, the local data store is used as a local cache. All operations (except writes) first check for the item in the local cache; if no record is found, the item is accessed from S3.
The local cache has a size limit, specified by the cacheSize parameter. When the size of the cache exceeds the limit, it is automatically purged to clear older items and reclaim space. During purging, the local cache makes sure that it does not delete any in-progress asynchronous uploads.
Things to note about the local cache mechanism:
- It can be disabled by setting the cacheSize parameter to 0 (see the sketch after this list). In this case, all operations are performed directly against S3 and the local cache is ignored completely.
- If the size of a file exceeds the size of the local cache, the file is served directly from S3.
- While the cache is being purged, it is not available to the S3 data store. Files from the local cache that have pending uploads are still available.
- Files deleted from the S3 data store are also deleted from the local cache.
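As a sketch, disabling the local cache in the S3 configuration file would look like this (assuming the property is typed as a long, consistent with the sample above):

#Disable the local cache; all operations go directly to S3
cacheSize=L"0"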
Multi-Threaded Content Migration from FileSystem DataStore to S3
Multi-threading can be configured in order to speed up file operations to or from the S3 data store. This can be particularly useful for initial migrations from a local datastore where large amounts of data need to be uploaded.
Asynchronous Upload to S3
The asyncUploadLimit parameter limits the number of asynchronous uploads to the S3 data store. Once this limit is reached, the next upload is synchronous until one of the asynchronous uploads completes. To disable this feature, set the asyncUploadLimit parameter to 0. The default value is 100.
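For example, to disable asynchronous uploads entirely, the S3 configuration file could contain the following entry (a sketch using the typed-value format shown earlier):

#Disable asynchronous uploads; every upload becomes synchronous
asyncUploadLimit=I"0"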
Asynchronous Upload Cache
The connector also uses an upload cache for asynchronous uploads. It tracks their status and removes finished uploads or adds new ones to the cache as necessary.
- First, create the data store configuration file on each instance that is required to share the data store:
- If you are using a FileDataStore, create a file named org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStore.config and place it in the <aem-install>/crx-quickstart/install folder.
- If using S3 as the data store, create a file named org.apache.jackrabbit.oak.plugins.blob.datastore.SharedS3DataStore.config in the <aem-install>/crx-quickstart/install folder as above.
- Modify the data store configuration files on each instance to point to the same data store. For more information, see this article.
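As an illustration, a shared FileDataStore setup could point every instance at the same network mount by using an identical org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStore.config on each instance (the mount path and minRecordLength value are hypothetical):

#Shared location; must be identical on every instance
path="/mnt/shared/aem/datastore"

#Inline binaries smaller than 100 bytes (illustrative)
minRecordLength=I"100"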
- If the instance has been cloned from an existing server, remove the clusterId of the new instance by using the latest oak-run tool while the repository is offline. The command you need to run is:
java -jar oak-run.jar resetclusterid < repository path | Mongo URI >
Note:
If a Segment node store is configured, the repository path needs to be specified. By default, the path is <aem-install-folder>/crx-quickstart/repository/segmentstore. If a Document node store is configured, you can use a Mongo connection string URI.
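For example, on a TarMK instance using the default segment store location, the invocation might look like this sketch (the oak-run version shown is an assumption; match it to your Oak version as described in the note below):

java -jar oak-run-1.2.12.jar resetclusterid crx-quickstart/repository/segmentstore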
Note:
The Oak-run tool can be downloaded from this location:
http://mvnrepository.com/artifact/org.apache.jackrabbit/oak-run/
Be aware that different versions of the tool need to be used depending on the Oak version of your AEM installation. Check the version requirements list below before using the tool:
- For Oak versions 1.2.x, use Oak-run 1.2.12 or newer.
- For Oak versions newer than the above, use the version of Oak-run that matches the Oak core of your AEM installation.
- Lastly, validate the configuration. To do this, look for a unique file added to the data store by each repository that shares it. The format of the files is repository-[UUID], where the UUID is a unique identifier of each individual repository.
Therefore, a proper configuration should have as many unique files as there are repositories sharing the data store.
The files are stored differently, depending on the data store:
- For the FileDataStore the files are created under the root path of the data store folder.
- For the S3DataStore the files are created in the configured S3 bucket under the META folder.
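A quick way to check this for a FileDataStore is a sketch along these lines (the data store path is a placeholder):

# Count the repository-[UUID] marker files; the result should equal
# the number of repositories sharing the data store
ls <shared-datastore-path>/repository-* | wc -l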
Note:
This feature is only available in AEM 6.1 running on Apache Oak versions 1.2.x and higher.
The data store garbage collection process is used to remove any unused files in the data store, thus freeing up valuable disk space in the process.
You can run data store garbage collection from the JMX console via the Repository Manager MBean's startDataStoreGC method, as described in the steps below.
With AEM 6.2, data store garbage collection can also be run on data stores shared by more than one repository. To run data store garbage collection on a shared data store, take the following steps:
- Run the steps mentioned in Binary Garbage Collection individually on all repository instances sharing the data store. However, make sure to enter true for the markOnly parameter before clicking the Invoke button.
- After completing the above procedure on all instances, run the data store garbage collection again from any one of the instances:
- Go to the JMX console and select the Repository Manager MBean.
- Click the startDataStoreGC(boolean markOnly) link.
- In the following dialog, enter false for the markOnly parameter this time.
This collates all the files found during the earlier mark phase and deletes the rest, which are unused, from the data store.