Hi Community:
Sometimes I face a painful requirement when dealing with large repositories (let's say more than 10M documents).
I have to apply an aspect (cm:indexControl) and some properties (cm:isIndexed=true and cm:isContentIndexed=false) to every document in the repository. What strategies would you use in a very large repository? Is there a safer or more controlled way of doing this?
In the past I did it in smaller repos with a basic script. It was useful, but I don't think it is enough for this case.
- I used the REST API to obtain the full set of nodeRefs to update. Basically, I ran TYPE-based paginated searches for every document type.
- Then I iterated over the collected nodeRefs with a simple custom webscript that applied the aspect and properties to each node.
Surely this is not the most effective or fastest way of doing it. What do you think? Is there a way to avoid doing this one by one? How would you improve each part?
I use Alfresco 5.2 EE and Alfresco Search Services 1.3.
Kind regards and thanks in advance.
--C.
P.S.: Yes, the idea is to reindex SOLR afterwards, to get a smaller SOLR contentstore and smaller indices.
I guess the safer way is to create something on the Repo side, using the Java API.
Developing a Scheduled Job that applies the aspect to the nodes using a paginated search will be faster than using the external API.
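A minimal sketch of the loop such a job could run, written against plain Java so it can be tried outside Alfresco. `PagedBatchRunner` and `PageSource` are hypothetical names, not Alfresco classes. In a real job I would expect the page source to wrap a paginated search (skipCount / maxItems) and the worker to call `NodeService.addAspect` with `ContentModel.ASPECT_INDEX_CONTROL` and the two properties, one transaction per page, but treat that wiring as an assumption to check against your version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Pages through a result set and hands each page to a worker, one page at a time. */
class PagedBatchRunner {

    /** Hypothetical stand-in for a paginated repository search (skipCount / maxItems). */
    interface PageSource<T> {
        List<T> fetch(int skipCount, int maxItems);
    }

    /**
     * Fetches pages until one comes back empty or short, passing each page to batchWorker.
     * Returns the number of items handed to the worker.
     */
    static <T> long run(PageSource<T> source, int pageSize, Consumer<List<T>> batchWorker) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        long processed = 0;
        int skipCount = 0;
        while (true) {
            List<T> page = source.fetch(skipCount, pageSize);
            if (page.isEmpty()) {
                break;
            }
            // Assumption: in the repository this call would run in its own transaction,
            // so a failure only rolls back the current page.
            batchWorker.accept(page);
            processed += page.size();
            skipCount += page.size();
            if (page.size() < pageSize) {
                break; // short page: nothing left to fetch
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        // Fake data standing in for the nodeRefs a search would return.
        List<String> nodes = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            nodes.add("node-" + i);
        }
        PageSource<String> source = (skip, max) ->
                nodes.subList(Math.min(skip, nodes.size()), Math.min(skip + max, nodes.size()));
        Consumer<List<String>> worker = page -> System.out.println("batch of " + page.size());
        long total = run(source, 10, worker);
        System.out.println("processed " + total);
    }
}
```

Keeping the page size small keeps each transaction short, and logging the last skipCount gives a point to restart from if the job is interrupted.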
Thanks for the idea, Angel:
It seems reasonable to develop a scheduled job. It reminds me a little of the SOLR cron job strategy (but in this case it would be on the repo side).
But do you know how you would query all live and relevant nodes in an effective way?
Regards.
--C.
You may use the DB or the Search Service to get each batch of nodes to update. Using the DB will be more efficient, but the right choice may depend on your requirements.
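For the DB route, one possible approach is to walk the node ID range in fixed windows rather than paging a search. This is only a sketch: `NodeIdWindows` is a hypothetical helper, and the idea that each window maps to a range query on `alf_node.id` (filtered further by store and type) is an assumption to verify against your own schema.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a numeric ID range into fixed-size windows that can be processed one by one. */
class NodeIdWindows {

    /** Half-open window [fromId, toId). */
    static final class Window {
        final long fromId;
        final long toId;

        Window(long fromId, long toId) {
            this.fromId = fromId;
            this.toId = toId;
        }

        @Override
        public String toString() {
            return "[" + fromId + ", " + toId + ")";
        }
    }

    /**
     * Returns windows covering minId..maxId inclusive.
     * Assumes IDs are far below Long.MAX_VALUE, so the additions cannot overflow.
     */
    static List<Window> split(long minId, long maxId, long windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        List<Window> windows = new ArrayList<>();
        for (long from = minId; from <= maxId; from += windowSize) {
            long to = Math.min(from + windowSize, maxId + 1);
            windows.add(new Window(from, to));
        }
        return windows;
    }

    public static void main(String[] args) {
        // Assumption: minId and maxId would come from the lowest and highest node ID in the DB.
        for (Window w : split(1, 25, 10)) {
            // Each window would drive one range query (id >= fromId AND id < toId)
            // and one transaction that applies the aspect to the matching nodes.
            System.out.println("process " + w);
        }
        // To resume after an interruption, split again from the last completed toId.
        System.out.println("resume: " + split(11, 25, 10));
    }
}
```

Windows over IDs stay stable while nodes are being modified, and the last completed `toId` works as a simple checkpoint.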
If you need some inspiration, take a look at the implementation of the TrashcanCleaner addon: