Hello,
I'm running Alfresco 7 on a CentOS 7 VM. Since upgrading from Alfresco 6 to Alfresco 7, I have noticed that metadata extraction from Office documents doesn't seem to work at all.
In Alfresco 6, on a webdav upload, I had logs like this:
Get extractors for application/vnd.openxmlformats-officedocument.wordprocessingml.document
Get supported: extracter.TikaAuto
Get supported: extracter.Poi
Get returning: extracter.Poi
Starting metadata extraction:
extracter: org.alfresco.repo.content.metadata.PoiMetadataExtracter@b772bfe
Concurrent extractions : 0
New extraction accepted. Concurrent extractions : 1
Now, in Alfresco 7, I get this instead:
Get extractors for application/vnd.openxmlformats-officedocument.wordprocessingml.document
Finding extractors for application/vnd.openxmlformats-officedocument.wordprocessingml.document
Find unsupported: extracter.RFC822
Find returning: []
Get returning: null
Get extractors for application/vnd.openxmlformats-officedocument.wordprocessingml.document
Get returning: null
All the classes are present in the lib folder, which is on the classpath.
They are not the same ones as in Alfresco 6 (tika-parsers-alfresco-patched and tika-core-alfresco-patched), but I guess that is related to the upgrade to Tika 2.x.
I tried copying the jars into the Tomcat folder to rule out a classpath issue, and I tried creating a content-services-context.xml to override the configuration. Just in case, I also added lines like extracter.TikaAuto.enabled = true to alfresco-global.properties. So far I have not found the cause of the issue.
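For anyone who wants to repeat the classpath check, here is a minimal sketch of how it could be done. It lists the jars in a folder that contain a given class. The function name find_class_in_jars and the lib path in the usage note are only illustrative assumptions. The class name is the one from my Alfresco 6 log, and I don't know whether Alfresco 7 still ships it under that name.

```python
import zipfile
from pathlib import Path


def find_class_in_jars(lib_dir, class_name):
    """Return the names of the jars in lib_dir that contain class_name.

    find_class_in_jars is a hypothetical helper, not part of Alfresco.
    class_name is a fully qualified Java class name.
    """
    # A class a.b.C is stored inside a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that have a .jar suffix but are not valid archives
            continue
    return hits
```

Usage would look like find_class_in_jars("/path/to/tomcat/webapps/alfresco/WEB-INF/lib", "org.alfresco.repo.content.metadata.PoiMetadataExtracter"), where the path is a placeholder for the actual lib folder. An empty list means no jar in that folder contains the class.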
Does anyone have a clue about this?
Thanks,
Raphaël.