Hello everyone,
I have installed Alfresco Community Edition 5.2 on Windows (using the .exe installer). According to my log file, when I upload a PDF file larger than 10 MB, Alfresco (Solr) does not extract its text, so the file content cannot be searched. The log file says:
Metadata extraction rejected, Extracter: org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter@39882d66 Reason: Max doc size exceeded 10 MB.
I would appreciate it if someone could tell me how I can increase this limit. I have already tried some solutions, for example increasing alfresco.contentStreamLimit in alfresco-community/solr4/archive-SpaceStore/conf/solrcore and alfresco-community/solr4/workspace-SpaceStore/conf/solrcore.
Thanks a lot in advance.
The limit is defined in your Alfresco repository, which converts the PDF to text. Please check your transformer configuration, which by default is defined in alfresco-ce-repository/transformers.properties at 5.2.g-patched · ecm4u/alfresco-ce-repository · Git... (sorry, I didn't find a valid tag in the Alfresco Git repo for 5.2).
Depending on which transformer handles the task, you should increase its maxSourceSizeKBytes value.
For example:
content.transformer.PdfBox.extensions.pdf.txt.maxSourceSizeKBytes=25600
and enable debug logging in your log4j properties:
log4j.logger.org.alfresco.repo.content.transform.TransformerDebug=DEBUG
to find out which transformer is actually running for your documents, and/or install the OOTBee Support Tools addon (GitHub - OrderOfTheBee/ootbee-support-tools: OOTBee Support Tools addon to extend set of administrat...) to debug and modify the transformation config from your browser.
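As a rough sketch, the two settings above could be laid out as follows. The file locations are assumptions on my part (repository overrides usually go into alfresco-global.properties, and logging overrides into your log4j properties file), so adjust them to your installation. The value 25600 is simply 25 × 1024 KB, i.e. 25 MB, and the PdfBox transformer name only applies if the TransformerDebug output shows that PdfBox is the transformer handling your PDFs.

```properties
# --- alfresco-global.properties (assumed location for repository overrides) ---
# Raise the source size limit for PDF -> text conversion by the PdfBox transformer.
# 25600 KB = 25 * 1024 KB = 25 MB; choose a value above your largest PDF.
content.transformer.PdfBox.extensions.pdf.txt.maxSourceSizeKBytes=25600

# --- log4j properties (assumed: the log4j file your installation already uses) ---
# Log which transformer is selected for each document.
log4j.logger.org.alfresco.repo.content.transform.TransformerDebug=DEBUG
```

Changes to alfresco-global.properties normally only take effect after restarting Alfresco, so restart and upload a test PDF again to see whether the limit has moved.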