CachingContentStore


resplin

This page is obsolete.

The official documentation is at: http://docs.alfresco.com







What is the CachingContentStore?


The CachingContentStore (CCS) is a decorator that adds transparent caching to any ContentStore implementation. Wrapping a slow ContentStore (the 'backing store') in a CachingContentStore can improve access speeds in many use cases. Example use cases include document storage using a XAM appliance, or cloud-based storage such as Amazon's S3.

The diagram below shows the architecture of the CCS:

[Figure: ccs_architecture.png, architecture of the CachingContentStore]

The major classes and interfaces that form the caching content store are:

CachingContentStore: the main class. It implements the ContentStore interface and can therefore be used anywhere a ContentStore can be used. The CachingContentStore handles all the high-level logic of interaction between the cache and the backing store; the caching itself is provided by a collaborating ContentCache object.

ContentCache: responsible for putting items into, and getting items from, the cache. The single supplied implementation, ContentCacheImpl, uses a lookup table to keep track of which files are managed by the cache and a directory on the local file system to store the cached content files. The lookup table itself is an EhCacheAdapter instance.

QuotaManagerStrategy: quota managers implement this interface and control how much disk space cached content may consume. Two implementations are supplied with Alfresco: UnlimitedQuotaStrategy (does not restrict disk usage, effectively disabling the quota function) and StandardQuotaStrategy (attempts to keep usage below a specified maximum, in bytes or MB).
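To make the decorator idea concrete, here is a minimal Java sketch of read-through caching with an optional write-through path (the behaviour controlled by cacheOnInbound, described under the configuration properties). It is not Alfresco code: SimpleStore, CountingStore and CachingStoreSketch are hypothetical names, the cache is a plain HashMap rather than a ContentCache, and quota handling is left out entirely.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified content-store interface (NOT Alfresco's ContentStore API).
interface SimpleStore {
    byte[] read(String contentUrl);
    void write(String contentUrl, byte[] data);
}

// Stand-in for a slow backing store: keeps content in memory and counts reads.
class CountingStore implements SimpleStore {
    private final Map<String, byte[]> content = new HashMap<>();
    int reads = 0;

    @Override
    public byte[] read(String contentUrl) {
        reads++;
        return content.get(contentUrl);
    }

    @Override
    public void write(String contentUrl, byte[] data) {
        content.put(contentUrl, data);
    }
}

// Decorator: implements the same interface as the store it wraps.
class CachingStoreSketch implements SimpleStore {
    private final SimpleStore backingStore;
    private final Map<String, byte[]> cache = new HashMap<>(); // stands in for a ContentCache
    private final boolean cacheOnInbound;

    CachingStoreSketch(SimpleStore backingStore, boolean cacheOnInbound) {
        this.backingStore = backingStore;
        this.cacheOnInbound = cacheOnInbound;
    }

    @Override
    public byte[] read(String contentUrl) {
        byte[] cached = cache.get(contentUrl);
        if (cached != null) {
            return cached; // cache hit: the backing store is not touched
        }
        byte[] fetched = backingStore.read(contentUrl); // cache miss: go to the slow store
        if (fetched != null) {
            cache.put(contentUrl, fetched); // keep a copy for subsequent reads
        }
        return fetched;
    }

    @Override
    public void write(String contentUrl, byte[] data) {
        backingStore.write(contentUrl, data); // content always reaches the backing store
        if (cacheOnInbound) {
            cache.put(contentUrl, data); // write-through: the first read is already a hit
        }
    }
}
```

Because CachingStoreSketch implements the same interface it wraps, callers cannot tell whether they hold the decorator or the raw backing store, which is the property that lets a CachingContentStore stand in for any ContentStore.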

The CachingContentStore is highly configurable, and many of these components could be swapped out for other implementations. For example, the Ehcache-based lookup table could easily be replaced with a different implementation, or ContentCacheImpl could be replaced with an implementation that stores cached content somewhere other than the local file system.

The cached content cleaner (CachedContentCleaner) periodically traverses the directory structure containing the cached content files and deletes content files that are not in use by the cache. A file is not in use by the cache if it has no entry in the lookup table managed by ContentCacheImpl. The cleaner is not strictly part of the architecture but is a helper object for ContentCacheImpl that allows it to operate more efficiently. A different ContentCache implementation would not necessarily require a cleaner, but as there is only the one implementation, the cleaner is shown here for completeness.
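The cleaner's core rule (delete any cached file that has no lookup-table entry) can be sketched with java.nio as follows. This is an illustration under assumptions, not CachedContentCleaner itself: OrphanSweepSketch is a hypothetical name, the lookup table is modelled as a plain set of absolute file paths, and the safety checks described later (minFileAgeMillis and maxDeleteWatchCount) are deliberately omitted, so this version deletes orphans on the first pass.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class OrphanSweepSketch {
    // Walks cacheRoot and returns every regular file whose absolute path
    // is not known to the (simplified) lookup table.
    static List<Path> findOrphans(Path cacheRoot, Set<String> filesInLookupTable) throws IOException {
        try (Stream<Path> paths = Files.walk(cacheRoot)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> !filesInLookupTable.contains(p.toAbsolutePath().toString()))
                    .collect(Collectors.toList());
        }
    }

    // Deletes the orphans found above and returns how many were removed.
    // Unlike the real cleaner, there is no minimum age or watch-count grace period here.
    static int sweep(Path cacheRoot, Set<String> filesInLookupTable) throws IOException {
        int deleted = 0;
        for (Path orphan : findOrphans(cacheRoot, filesInLookupTable)) {
            if (Files.deleteIfExists(orphan)) {
                deleted++;
            }
        }
        return deleted;
    }
}
```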


Configuration


Sample XML file


The supplied example Spring context file caching-content-store-context.xml.sample can be used as a starting point for adding caching to a slow content store. Activate it by removing the '.sample' file extension and placing the file in the Alfresco extension directory. The backing store defined in the file is not a good candidate for caching, as it already uses the local file system; edit this definition to describe a genuinely slow content store such as XAMContentStore or S3ContentStore. It is, however, possible (though not ideal) to leave the definition in place when first trying out the cache mechanism.


Step by step configuration


The top level bean that ties together the CCS as a whole defines an instance of the CachingContentStore class:

<bean id='fileContentStore' class='org.alfresco.repo.content.caching.CachingContentStore' init-method='init'>
  <property name='backingStore' ref='backingStore'/>
  <property name='cache' ref='contentCache'/>
  <property name='cacheOnInbound' value='${system.content.caching.cacheOnInbound}'/>
  <property name='quota' ref='standardQuotaManager'/>
</bean>

In this case the fileContentStore bean is overridden. Because the ContentService bean uses fileContentStore, the CCS will be used automatically. Alternatively, the CCS bean could be given a different name and an overridden contentService bean supplied; this is the approach taken in the sample file. Note the main collaborators backingStore, cache and quota, which refer to the beans for the backing store, content cache and quota manager shown in the diagram. Each CachingContentStore should have its own dedicated instances of these collaborators; they should not be shared with any other CachingContentStore beans you may have defined.

Next, define a backing store. This is the ContentStore that will be decorated by the CCS, so to provide caching for a TenantRoutingS3ContentStore, the following XML could be used:

<bean id='tenantRoutingContentStore'
      class='org.alfresco.module.org_alfresco_module_cloud.repo.content.s3store.TenantRoutingS3ContentStore'
      parent='baseTenantRoutingContentStore'>
  <property name='defaultRootDir'   value='${dir.contentstore}' />
  <property name='s3AccessKey'      value='${s3.accessKey}' />
  <property name='s3SecretKey'      value='${s3.secretKey}' />
  <property name='s3BucketName'     value='${s3.bucketName}' />
  <property name='s3BucketLocation' value='${s3.bucketLocation}' />
  <property name='s3FlatRoot'       value='${s3.flatRoot}' />
  <property name='globalProperties'>
    <ref bean='global-properties' />
  </property>
</bean>

Note that this bean's ID would need to be changed to backingStore for use with the preceding XML snippet; alternatively, change the ref attribute in the fileContentStore bean definition to refer to the correct ID (tenantRoutingContentStore).

Now define a ContentCache, the object responsible for placing content into (and retrieving content from) the cache:

<bean id='contentCache' class='org.alfresco.repo.content.caching.ContentCacheImpl'>
  <property name='memoryStore' ref='cachingContentStoreCache'/>
  <property name='cacheRoot' value='${dir.cachedcontent}'/>
</bean>

ContentCacheImpl relies on two stores. The first, specified by the memoryStore property, is a fast lookup table provided by Ehcache; it is used to determine whether an item is currently cached by the CCS and to control the maximum number of items in the cache, their TTL and so on. The second, specified by the cacheRoot property, is a directory on the local file system that holds the binary content data (the actual content being cached).
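A sketch of how a single lookup table can answer both questions the CCS needs: "is this URL cached, and where?" and "is this cache file still in use?". It assumes, as a hypothesis consistent with the note under maxElementsInMemory that each content URL takes two lookup entries, that one entry maps URL to file and the other maps file back to URL. The class name, key prefixes and random file naming are all hypothetical, and a HashMap stands in for the Ehcache-backed memoryStore.

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class LookupTableSketch {
    // Stands in for the Ehcache-backed memoryStore.
    private final Map<String, String> memoryStore = new HashMap<>();
    private final Path cacheRoot; // stands in for ${dir.cachedcontent}

    LookupTableSketch(Path cacheRoot) {
        this.cacheRoot = cacheRoot;
    }

    // Hypothetical key prefixes so both directions can share one table.
    private static String urlKey(String contentUrl) { return "url:" + contentUrl; }
    private static String fileKey(String path) { return "file:" + path; }

    // Records that contentUrl is cached and returns the local file path chosen for it.
    String register(String contentUrl) {
        String path = cacheRoot.resolve(UUID.randomUUID() + ".bin").toString(); // hypothetical naming
        memoryStore.put(urlKey(contentUrl), path);  // entry 1: URL -> cache file
        memoryStore.put(fileKey(path), contentUrl); // entry 2: cache file -> URL
        return path;
    }

    // Read path: is this content URL currently cached?
    boolean isCached(String contentUrl) {
        return memoryStore.containsKey(urlKey(contentUrl));
    }

    // Cleaner path: does any cache entry still refer to this file?
    boolean fileInUse(String path) {
        return memoryStore.containsKey(fileKey(path));
    }

    int entryCount() {
        return memoryStore.size();
    }
}
```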

Shown below is the bean satisfying the memoryStore reference above:

   <bean id='cachingContentStoreCache' class='org.alfresco.repo.cache.EhCacheAdapter'>
       <property name='cache'>
           <bean class='org.springframework.cache.ehcache.EhCacheFactoryBean'>
               <property name='cacheManager'>
                   <ref bean='internalEHCacheManager' />
               </property>
               <property name='cacheName'>
                   <value>org.alfresco.cache.cachingContentStoreCache</value>
               </property>
               <property name='eternal' value='false'/>
               <property name='timeToLive' value='${system.content.caching.timeToLiveSeconds}'/>
               <property name='timeToIdle' value='${system.content.caching.timeToIdleSeconds}'/>
               <property name='maxElementsInMemory' value='${system.content.caching.maxElementsInMemory}'/>
               <property name='maxElementsOnDisk' value='${system.content.caching.maxElementsOnDisk}'/>
               <property name='overflowToDisk' value='true'/>
               <property name='diskPersistent' value='true'/>
           </bean>
       </property>
   </bean>


With the key components (the CachingContentStore, the backing store and the ContentCache) configured, a quota manager can optionally be specified; if this step is skipped, UnlimitedQuotaStrategy is used. The example CCS bean above expects this bean to be defined:

<bean id='standardQuotaManager'
      class='org.alfresco.repo.content.caching.quota.StandardQuotaStrategy'
      init-method='init'
      destroy-method='shutdown'>
  <property name='maxUsageMB' value='4096'/>
  <property name='maxFileSizeMB' value='0'/>
  <property name='cache' ref='contentCache'/>
  <property name='cleaner' ref='cachedContentCleaner'/>
</bean>

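Lastly, to prevent disk space from being consumed uncontrollably, configure a CachedContentCleaner to clean up cached content files that are no longer used by the cache. The following definitions configure the cleaner together with the Quartz job and cron trigger that run it periodically: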

   <bean id='cachingContentStoreCleanerJobDetail' class='org.springframework.scheduling.quartz.JobDetailBean'>
       <property name='jobClass'>
           <value>org.alfresco.repo.content.caching.cleanup.CachedContentCleanupJob</value>
       </property>
       <property name='jobDataAsMap'>
           <map>
               <entry key='cachedContentCleaner'>
                   <ref bean='cachedContentCleaner' />
               </entry>
           </map>
       </property>
   </bean>
  
   <bean id='cachedContentCleaner'
       class='org.alfresco.repo.content.caching.cleanup.CachedContentCleaner'
       init-method='init'>
       <property name='minFileAgeMillis' value='${system.content.caching.minFileAgeMillis}'/>
       <property name='maxDeleteWatchCount' value='${system.content.caching.maxDeleteWatchCount}'/>
       <property name='cache' ref='contentCache'/>
       <property name='usageTracker' ref='standardQuotaManager'/>
   </bean>
  
   <bean id='cachingContentStoreCleanerTrigger' class='org.alfresco.util.CronTriggerBean'>
       <property name='jobDetail'>
           <ref bean='cachingContentStoreCleanerJobDetail' />
       </property>
       <property name='scheduler'>
           <ref bean='schedulerFactory' />
       </property>
       <property name='cronExpression'>
           <value>${system.content.caching.contentCleanup.cronExpression}</value>
       </property>
   </bean>


Note that the cleaner and the quota manager, although both exist to limit disk space usage, do not perform the same function. The cleaner's job is to remove files that have fallen out of use by the cache because of some parameter such as Time To Live (TTL, the maximum time an item should be used by the CCS). The quota manager, on the other hand, exists to enforce specific limits on the disk space allowed. The cleaner is similar to the other content cleaners that asynchronously remove orphaned content.

A number of property placeholders are used in the above definitions. You can replace them directly in your configuration with the required values, or keep the placeholders and set the values in alfresco-global.properties; sensible defaults exist in the embedded repository.properties file. An advantage of using the placeholders is that the sample file can be used with very few changes, and the CCS can be tuned with little effort by modifying the appropriate properties.


Spring configuration properties


The following properties are used in the sample context file caching-content-store-context.xml.sample and may be set in alfresco-global.properties. Default values are provided in repository.properties and are shown with each property below:

system.content.caching.cacheOnInbound=true
Enables write-through caching. If true, writing content to the backing store also caches the item, so the first time the item is read (provided it has not been removed from the cache in the meantime) the file is already cached locally for faster access. It is recommended that this property is set to true for most usage scenarios.

system.content.caching.maxDeleteWatchCount=1
This property defines a safety period that protects against the following sequence of events:

1. A process obtains a reader R for a cached content file F.
2. The item is removed from the CCS lookup table because some condition is met (e.g. the maximum number of items in the cache is reached).
3. The cached content cleanup job observes that the item is no longer in the cache and deletes file F.
4. The original process attempts to read data from reader R but fails.

This property defines the number of times a file must have been observed as available for deletion by previous cleanup runs before it is actually deleted. For example, if this is set to 2 and a content file has expired from the cache's lookup table before the cleanup job runs, the following will happen:

Cleaner run # | What happens | Action taken
1 | Cleaner discovers that the file is no longer in use by the cache (it has expired, perhaps because the Time To Idle was exceeded) | Deletion watch count set to 1
2 | Cleaner observes that the deletion watch count is set but has not reached the maximum | Deletion watch count incremented to 2
3 | Cleaner observes that the maximum allowed deletion watch count has been reached | File is deleted

It is very unlikely that this setting needs to be anything other than 1, but it could be increased if readers obtained from the cache fail because the underlying file has been deleted.
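The deletion watch count behaviour in the table above can be sketched as a small state machine. This is a hypothetical simplification, not the cleaner's source: DeleteWatchSketch and onCleanerRun are illustrative names, and the per-file count is kept in a HashMap here, whereas the real cleaner presumably records it alongside each cached file (the "associated properties file" mentioned under minFileAgeMillis).

```java
import java.util.HashMap;
import java.util.Map;

class DeleteWatchSketch {
    private final int maxDeleteWatchCount;
    // Per-file deletion watch counts (hypothetical in-memory stand-in).
    private final Map<String, Integer> watchCounts = new HashMap<>();

    DeleteWatchSketch(int maxDeleteWatchCount) {
        this.maxDeleteWatchCount = maxDeleteWatchCount;
    }

    // Called once per cleaner run for a file that has no lookup-table entry.
    // Returns true if the file should be deleted on this run.
    boolean onCleanerRun(String file) {
        int count = watchCounts.getOrDefault(file, 0);
        if (count < maxDeleteWatchCount) {
            watchCounts.put(file, count + 1); // not yet: mark (or re-mark) as a candidate
            return false;
        }
        watchCounts.remove(file); // maximum reached: delete now and forget the count
        return true;
    }
}
```

With maxDeleteWatchCount set to 1 (the default), a file that has dropped out of the lookup table survives one cleaner run and is deleted on the next; with 2 it is deleted on the third run, matching the table.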

system.content.caching.contentCleanup.cronExpression=0 0 3 * * ?
This property governs how often the cached content cleanup job runs. The example above runs the cleaner at 3am every day. The value is a Quartz cron expression, which is similar to a Unix cron expression but has an additional leading seconds field.

system.content.caching.timeToLiveSeconds=0
The maximum time in seconds that an item can exist in the cache - after this time elapses the item will no longer be cached and a request for the content URL will result in the item being fetched from the backing store and cached afresh. A value of 0 means that items will not have a TTL applied to them.

system.content.caching.timeToIdleSeconds=60
The maximum time in seconds that an item can remain in the cache without being requested. Each time the item is accessed, its Time To Idle is reset and the item remains in the cache.
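The interaction between timeToLiveSeconds and timeToIdleSeconds can be sketched as a single expiry check. This is an assumption-laden illustration of the usual Ehcache semantics (a value of 0 disables that limit, and either limit on its own is enough to expire an entry), not Ehcache's own code; ExpirySketch is a hypothetical name.

```java
class ExpirySketch {
    // Returns true if a lookup entry should be treated as expired.
    // A limit of 0 disables that check, as described for timeToLiveSeconds above.
    static boolean isExpired(long createdMillis, long lastAccessMillis, long nowMillis,
                             long timeToLiveSeconds, long timeToIdleSeconds) {
        if (timeToLiveSeconds > 0 && nowMillis - createdMillis > timeToLiveSeconds * 1000L) {
            return true; // lived too long, however often it was read
        }
        if (timeToIdleSeconds > 0 && nowMillis - lastAccessMillis > timeToIdleSeconds * 1000L) {
            return true; // not requested for too long
        }
        return false;
    }
}
```

With the defaults (TTL 0, TTI 60) only idleness matters: an item read at least once a minute stays cached indefinitely, while setting a non-zero TTL caps its lifetime regardless of access.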

system.content.caching.maxElementsInMemory=5000
This applies to the lookup table in the ContentCache. Each content URL requires two entries in the lookup table - so a setting of 5000 can allow 2500 content items to be held in-memory for the lookup table.

system.content.caching.maxElementsOnDisk=10000
This applies to the lookup table in the ContentCache. As mentioned above, two elements are used per content URL, so a setting of 10000 will allow 5000 items to be held on disk (not to be confused with the actual content data).
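Because each content URL uses two lookup-table entries, sizing these limits is simple arithmetic. A hypothetical helper using the factor of two stated above:

```java
class LookupCapacitySketch {
    // Each cached content URL uses two lookup-table entries (per the property descriptions).
    static final int ENTRIES_PER_CONTENT_URL = 2;

    // Number of content items whose lookup entries fit within an element limit.
    static int contentItemsFor(int maxElements) {
        return maxElements / ENTRIES_PER_CONTENT_URL;
    }

    // Element limit needed to hold lookup entries for a given number of content items.
    static int elementsNeededFor(int contentItems) {
        return contentItems * ENTRIES_PER_CONTENT_URL;
    }
}
```

So the defaults of 5000 and 10000 correspond to 2500 and 5000 content items, and keeping the lookup entries for 20000 items in memory would need maxElementsInMemory of about 40000.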

system.content.caching.minFileAgeMillis=2000
Files must be at least this old (in milliseconds) before they are evaluated for deletion. This avoids unnecessary checks such as loading and examining the associated properties file.

system.content.caching.maxUsageMB=4096
Used by the StandardQuotaStrategy as configured in the caching-content-store-context.xml.sample file and specifies the maximum disk usage in MB that cached content should consume - in other words, this property defines the disk space quota allocated to the ${dir.cachedcontent} directory.

system.content.caching.maxFileSizeMB=0
Used by the StandardQuotaStrategy as configured in the caching-content-store-context.xml.sample file; specifies the maximum size in MB of any individual file of cached content. Content larger than this can still be retrieved using the CachingContentStore, but it will not be cached. If this property is set to zero, no size limit applies to individual files (the maxUsageMB property still applies to the cache as a whole).
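A minimal sketch of the two limits applied by StandardQuotaStrategy: maxUsageMB for the cache as a whole and maxFileSizeMB per file (0 meaning no per-file limit). QuotaSketch and tryReserve are hypothetical names; the sketch assumes 1 MB = 1,048,576 bytes and simply refuses content that would break a limit. The real strategy is also wired to the cleaner (see the standardQuotaManager bean), so it may free space rather than only refusing; that behaviour is not modelled here.

```java
class QuotaSketch {
    private static final long BYTES_PER_MB = 1024L * 1024L; // assumption: binary megabytes
    private final long maxUsageBytes;
    private final long maxFileSizeBytes; // 0 = no per-file limit
    private long currentUsageBytes = 0;

    QuotaSketch(long maxUsageMB, long maxFileSizeMB) {
        this.maxUsageBytes = maxUsageMB * BYTES_PER_MB;
        this.maxFileSizeBytes = maxFileSizeMB * BYTES_PER_MB;
    }

    // Returns true if content of this size may be cached, recording the usage if so.
    // A false result means "serve from the backing store without caching".
    boolean tryReserve(long sizeBytes) {
        if (maxFileSizeBytes > 0 && sizeBytes > maxFileSizeBytes) {
            return false; // individual file too large to cache
        }
        if (currentUsageBytes + sizeBytes > maxUsageBytes) {
            return false; // would exceed the overall quota for the cache directory
        }
        currentUsageBytes += sizeBytes;
        return true;
    }

    long usageBytes() {
        return currentUsageBytes;
    }
}
```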