Alfresco Content Services - Blog

shazada
Partner

So you want to create an ACA extension and your own libraries?

We'll show you how with an awesome Community Repo

Old article: https://hub.alfresco.com/t5/alfresco-content-services-blog/alfresco-community-admin-tools/bc-p/31552...

abhinavmishra14
Advanced

Alfresco repository performance tuning checklist

asirika
Partner

Overview

Alfresco Content Services supports the Elasticsearch (ES) platform for searching within the repository using Alfresco Search Enterprise 3.x. While Elasticsearch is a robust technology, it has a number of limitations that, in some situations, require creating a new index and re-indexing.

  • ES does not allow dynamic sharding, so new shards cannot be added to an existing index when required.
  • ES does not allow modifying existing field mappings, so modifications to existing content model fields are not picked up in the ES mapping. Examples are changing tokenisation from FALSE to TRUE, or changing a field type.

However, given the re-indexing speed, this can be achieved within hours and without downtime.

This document describes the steps to follow to build the indexes offline by connecting to the same metadata store (database).

Steps

Elasticsearch

  1. Create the Elasticsearch index with a new name (for example ‘alfresco-new’) and the desired number of shards, replicas, total fields, etc.

curl -XPUT 'http://ESHOST:ESPORT/alfresco-new?pretty' -H 'Content-Type: application/json' -d'
{
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 0,
        "index.mapping.total_fields.limit": 2000
    }
}'

Alfresco Content Services (ACS)

  1. Start up a new ACS instance in read-only mode, connected to the same database and pointing to the new index created in the previous step.
        elasticsearch.indexName=alfresco-new
        server.allowWrite=false

Note: If content model changes are being made, make sure to start the repository with the new changes deployed.

2. Perform a search to start up the search subsystem, which will then automatically create the relevant mappings in the newly created index.

Verification steps

a. Query the Elasticsearch index with curl and validate a few of the property mappings, and check that "dynamic" : "false" is set.

curl -XGET 'http://ESHOST:ESPORT/alfresco-new?pretty'

b. The following log entries will appear in catalina.out:

2023-06-23 12:21:47,403 INFO [elasticsearch.contentmodelsync.ContentModelSynchronizer] [elasticsearch-initializer] Successfully loaded analysers.

2023-06-23 12:21:47,543 INFO [elasticsearch.contentmodelsync.ContentModelSynchronizer] [elasticsearch-initializer] Successfully loaded basic mappings.

2023-06-23 12:21:47,553 INFO [elasticsearch.contentmodelsync.ElasticsearchInitialiser] [elasticsearch-initializer] Successfully connected to Elasticsearch index.
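The log check in step b can be scripted. The following is only a minimal sketch: the function name check_init_logs is hypothetical, and it assumes the three messages appear in the log file exactly as quoted above.

```shell
#!/bin/sh
# Sketch: confirm that the three initialisation messages quoted in the post
# are present in a log file. check_init_logs is a hypothetical helper name.
check_init_logs() {
    log_file="$1"
    for msg in \
        "Successfully loaded analysers" \
        "Successfully loaded basic mappings" \
        "Successfully connected to Elasticsearch index"
    do
        # Stop at the first message that cannot be found.
        grep -q "$msg" "$log_file" || { echo "missing: $msg"; return 1; }
    done
    echo "all initialisation messages found"
}
```

For example, check_init_logs /path/to/catalina.out, where the path depends on your Tomcat installation.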

3. Generate the reindex.prefixes-file.json

4. Execute the re-indexing command, passing the additional parameter ‘elasticsearch.indexName’ set to the new index. By default, this is set to alfresco.

java -Xmx4G -jar alfresco-elasticsearch-reindexing-3.3.0.1-app.jar \
     --server.port=9090 \
     --alfresco.reindex.jobName=reindexByIds \
     --spring.elasticsearch.rest.uris=http://localhost:9200/alfresco-new \
     --spring.elasticsearch.rest.username=username \
     --spring.elasticsearch.rest.password=password \
     --alfresco.accepted-content-media-types-cache.enabled=false \
     --spring.activemq.broker-url=nio://localhost:61616 \
     --alfresco.reindex.fromId=0 \
     --alfresco.reindex.toId=5000000 \
     --alfresco.reindex.multithreadedStepEnabled=true \
     --alfresco.reindex.concurrentProcessors=30 \
     --alfresco.reindex.metadataIndexingEnabled=true \
     --alfresco.reindex.contentIndexingEnabled=false \
     --alfresco.reindex.pathIndexingEnabled=false \
     --alfresco.reindex.prefixes-file=file:reindex.prefixes-file.json \
     --alfresco.reindex.pagesize=10000 \
     --alfresco.reindex.batchSize=1000 \
     --elasticsearch.indexName=alfresco-new \
     > reindexing.log &

 5. Validate the document count.

curl -XGET 'http://ESHOST:ESPORT/alfresco-new/_count?pretty'
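One possible way to validate the count is to compare it with the count of the existing index. The sketch below assumes that comparison is what you want (the post does not say what to compare against); extract_count and compare_counts are hypothetical helper names, and the old index may still be receiving live updates, so small differences are possible.

```shell
#!/bin/sh
# Sketch: pull the integer "count" field out of an Elasticsearch _count
# response read from stdin. Whitespace is stripped first so that both the
# compact and the ?pretty form of the response are handled.
extract_count() {
    tr -d ' \n' | sed -n 's/.*"count":\([0-9][0-9]*\).*/\1/p'
}

# Sketch: compare the document counts of two indexes.
# Arguments: Elasticsearch base URL, old index name, new index name.
compare_counts() {
    es_url="$1"; old_index="$2"; new_index="$3"
    old=$(curl -s "$es_url/$old_index/_count" | extract_count)
    new=$(curl -s "$es_url/$new_index/_count" | extract_count)
    echo "old=$old new=$new"
    # Succeed only when a count was returned and both counts match.
    [ -n "$old" ] && [ "$old" = "$new" ]
}
```

For example, compare_counts http://ESHOST:ESPORT alfresco alfresco-new.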

6. Point ACS cluster nodes to new index: alfresco-new

7. Optional: Destroy the newly created ACS instance and the old Elasticsearch index.

8. Happy searching.

asirika
Partner

1. Overview

Alfresco Search Enterprise 3.2 consists of Alfresco Content Services, Elasticsearch Server and the Elasticsearch connectors. According to the official documentation, there are also a number of prerequisites, such as ActiveMQ, a PostgreSQL database and Transform Service. Please note that Transform Service does not need to be running to extract general metadata.

In this post I will cover how we can scale ES during re-indexing and live indexing, and when to use the different ES connector jars.

2. Alfresco Search Enterprise (ASE)

Alfresco Content Services supports the Elasticsearch platform for searching within the repository using Alfresco Search Enterprise 3.2. The Alfresco Search Enterprise module consists of 6 jar files.

Figure: ASE Jar List

2.1. Re-Indexing

alfresco-elasticsearch-reindexing-3.2.0-app.jar: This is an all-in-one jar file which indexes content, metadata and path for an existing content store.

Picture 2.png

However, this particular jar comes with 3 parameters which we can configure according to the business requirement.

# Reindexing services execution

alfresco.reindex.metadataIndexingEnabled = true

alfresco.reindex.contentIndexingEnabled = true

alfresco.reindex.pathIndexingEnabled = true

Therefore, if you want to reindex metadata only, pass the parameters to the above command accordingly, as below.

Picture 4.png
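As a rough sketch of how these three switches combine, the helper below prints them for a chosen mode. The function name reindex_flags and the mode labels (metadata, metadata-path, all) are hypothetical and exist only in this sketch; the property names are the ones listed above.

```shell
#!/bin/sh
# Sketch: print the three indexing switches for a given reindex mode.
# The mode labels are hypothetical; the property names come from the post.
reindex_flags() {
    case "$1" in
        metadata)      content=false; path=false ;;
        metadata-path) content=false; path=true ;;
        all)           content=true;  path=true ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
    printf '%s %s %s\n' \
        "--alfresco.reindex.metadataIndexingEnabled=true" \
        "--alfresco.reindex.contentIndexingEnabled=$content" \
        "--alfresco.reindex.pathIndexingEnabled=$path"
}
```

The output can be appended to the reindexing command, for example: java -jar alfresco-elasticsearch-reindexing-3.2.0-app.jar $(reindex_flags metadata), together with whatever connection parameters your environment needs.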

Sample Search Queries to try Out:

For Metadata Search: cm:name:'test', cm:author:admin, cm:title:'test'

For Path Search: PATH:"/app:company_home/st:sites/cm:test/cm:documentLibrary/*"

For Content Search: cm:content:'test'

2.2. Live-Indexing

There are 5 live indexing jars available in the ES connector distribution zip.

alfresco-elasticsearch-live-indexing-3.2.0-app.jar: This is an all-in-one jar file which indexes content, metadata and path for real-time data. It combines the 4 live-indexing jar files specific to mediation, metadata, content, and path. Unlike with the all-in-one reindexing jar, we do not have control over what is indexed.

Picture 3.png

When to use other live indexing jars?

When the business has no requirement for full-text indexing (content indexing), and when deploying at scale.

To start alfresco-elasticsearch-live-indexing-mediation-3.2.0-app.jar, run the command below.

Picture 6.png

 

alfresco-elasticsearch-live-indexing-metadata-3.2.0-app.jar: Indexes metadata only. To start it, run the command below.

Picture 7.png

alfresco-elasticsearch-live-indexing-path-3.2.0-app.jar: Indexes path only.

Picture 8.png

alfresco-elasticsearch-live-indexing-content-3.2.0-app.jar: Indexes content only.

3. Deploying at Scale

3.1. Live-Indexing

When designing highly available systems, deploying at scale is essential. The diagram below shows the most optimized way of designing a highly available architecture.

Figure: Live-Indexing: Deploying at Scale

The Mediation component is a single point of failure, as it cannot be scaled up. Therefore, we must monitor the Mediation component and, in case of a failure, run the reindexing app for the affected period.

3.2. Re-Indexing

Re-indexing a large repository with a single re-index process can take a large amount of time. With the two approaches below, you can scale the reindexing process vertically as well as horizontally.

3.2.1. Approach 1

In this approach we use multiple EC2 instances for horizontal scaling, and inside each instance we run multiple reindexing threads.

Figure: Re-Indexing: Approach 1

Setting Up Re-Indexer Instance

  • Copy alfresco-elasticsearch-connector-distribution-3.2 onto each instance.
  • We were running 6 threads on one instance and 5 threads on a second instance. This can be changed as needed.
  • Run the command below with unique port numbers and reindex.fromId and reindex.toId values, to run as many threads as needed on an instance.
  • To fetch by IDs, use alfresco.reindex.jobName=reindexByIds, which indexes nodes in an interval of the database ALF_NODE.id column.

Picture 10.png
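Working out the port and ID range for each thread by hand is error-prone, so it can help to generate them. The sketch below is only an illustration: split_ranges and print_commands are hypothetical helper names, and it assumes equal-sized chunks are acceptable. Adjacent ranges share a boundary ID; the post does not say whether toId is inclusive, so a boundary node may be indexed twice.

```shell
#!/bin/sh
# Sketch: split the ALF_NODE.id interval from_id..to_id into one chunk per
# reindexing process and print "port fromId toId" for each process.
# Arguments: from_id, to_id, number of processes, first port number.
split_ranges() {
    from_id="$1"; to_id="$2"; procs="$3"; base_port="$4"
    total=$((to_id - from_id))
    chunk=$(( (total + procs - 1) / procs ))   # ceiling division
    i=0
    while [ "$i" -lt "$procs" ]; do
        start=$((from_id + i * chunk))
        end=$((start + chunk))
        # Clamp the last chunk to the requested upper bound.
        [ "$end" -gt "$to_id" ] && end="$to_id"
        # Skip chunks that would start beyond the upper bound.
        [ "$start" -lt "$to_id" ] && echo "$((base_port + i)) $start $end"
        i=$((i + 1))
    done
}

# Sketch: print (not run) one reindexing command per chunk. Only flags that
# appear in the post are used; add your own connection parameters.
print_commands() {
    split_ranges "$@" | while read -r port start end; do
        echo "java -jar alfresco-elasticsearch-reindexing-3.2.0-app.jar" \
             "--server.port=$port" \
             "--alfresco.reindex.jobName=reindexByIds" \
             "--alfresco.reindex.fromId=$start" \
             "--alfresco.reindex.toId=$end"
    done
}
```

For example, print_commands 0 5000000 5 9090 prints five commands on ports 9090 to 9094, each covering one million IDs.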

3.2.2. Approach 2

Re-indexing using remote partitioning. More details can be found in the Alfresco docs: https://docs.alfresco.com/search-enterprise/latest/admin/#alfresco-elasticsearch-connector

Picture 11.png

To start the Manager, execute the command below.

java -jar alfresco-elasticsearch-reindexing-3.2.0-app.jar \
     --alfresco.reindex.jobName=reindexByIds \
     --alfresco.reindex.partitioning.type=manager \
     --alfresco.reindex.pagesize=100 \
     --alfresco.reindex.batchSize=100 \
     --alfresco.reindex.fromId=0 \
     --alfresco.reindex.toId=10000 \
     --spring.batch.datasource.url=jdbc:postgresql://localhost:5432/alfresco \
     --spring.batch.datasource.username=alfresco \
     --spring.batch.datasource.password=alfresco \
     --spring.batch.datasource.driver-class-name=org.postgresql.Driver \
     --spring.datasource.url=jdbc:postgresql://localhost:5432/alfresco \
     --spring.datasource.username=alfresco \
     --spring.datasource.password=alfresco \
     --alfresco.reindex.partitioning.grid-size=20 \
     --spring.batch.drop.script=classpath:/org/springframework/batch/core/schema-drop-postgresql.sql \
     --spring.batch.schema.script=classpath:/org/springframework/batch/core/schema-postgresql.sql

To start a Worker, execute the command below.

java -jar alfresco-elasticsearch-reindexing-3.2.0-app.jar \
     --alfresco.reindex.partitioning.type=worker \
     --alfresco.reindex.pagesize=100 \
     --alfresco.reindex.batchSize=100 \
     --alfresco.reindex.concurrentProcessors=2 \
     --spring.batch.datasource.url=jdbc:postgresql://localhost:5432/alfresco \
     --spring.batch.datasource.username=alfresco \
     --spring.batch.datasource.password=alfresco \
     --spring.batch.datasource.driver-class-name=org.postgresql.Driver \
     --spring.datasource.url=jdbc:postgresql://localhost:5432/alfresco \
     --spring.datasource.username=alfresco \
     --spring.datasource.password=alfresco \
     --spring.batch.drop.script=classpath:/org/springframework/batch/core/schema-drop-postgresql.sql \
     --spring.batch.schema.script=classpath:/org/springframework/batch/core/schema-postgresql.sql \
     --server.port=9091

Note: If you are re-indexing only metadata and/or path with the remote partitioning approach, make sure to set the related properties when executing the Worker command.

4. Comparison of re-indexing approaches

Approach 1: Multi-threading

  • Pros: Less time-consuming; best suited to customers with larger repositories.
  • Cons: Considerable manual work is involved in setting up the threads. However, as re-indexing is a one-time process, this can largely be disregarded.

Approach 2: Remote Partitioning

  • Pros: Easy to manage. The number of workers/partitions can be easily managed by setting alfresco.reindex.partitioning.grid-size. The Manager automatically assigns fromId and toId values to the worker nodes.
  • Cons: Slower, and therefore suited to customers with smaller repositories.


LeoMattioli
Partner

A folder with many linked nodes takes a long time to display in Share.

This is because the metadata of all the linked nodes is retrieved in the JSON response.
