Hi!
On creation and even on deletion of any document I get a duplicate key exception in the postgresql-database:
### Error updating database. Cause: org.postgresql.util.PSQLException: ERROR: duplicate key violated
Unique-Constraint „idx_alf_conturl_cr“
2 calls (apparently retrying):
insert into alf_content_url (id, content_url, content_url_short, content_url_crc, content_size, orphan_time)
values (?, ?, ?, ?, ?, ?)
The documents are being created without errors in Alfresco (due to retrying transaction helper) but where do these exceptions come from?
Thank you very much in advance for some hints.
This is a new feature added to Alfresco 5.1. It has been raised, but I don't know if it's going to be solved.
I was looking for a public issue, but it's only at support level. Should be useful to raise also a public one?
I found it: https://issues.alfresco.com/jira/browse/MNT-17060
Finally the classified this as "not a bug"...
For me it is a bug, specially when I was importing million of rows and the database log filled my disk
Interestingly I have been seeing a similar issue with alf_qname table on Alfresco 5.0.d before. Probably they did not check the cache before attempting to create the QName back then...
Totally agreed that this is a bug. There already is a getOrCreateContentUrl in the AbstractContentDataDAOImpl, so I see no reason why they'd need to do a createOrGetByValue on the contentUrlCache in createContentDataEntity directly instead of calling that getOrCreateContentUrl to reuse existing code. This looks like a poor job on part of ACE-3948 - they seem to have completely misunderstood the root cause. Content URL re-use is extremely common, e.g. when copying around nodes referencing the same content. Uploading a file via Share involves creating a new version which involves copying, so in that case a conflict is guaranteed using a createOrGet approach. The same can happen when copying a node to a different location - which may cause dozens or hundreds of key conflicts depending on how many children get recursively copied. Having one SELECT more (which they complain about in ACE-3948) is always preferable to consciously causing and not caring about a key conflict.
If they really wanted to avoid the SELECT, they should have carried the content URL ID along inside the ContentData object, so they could check that against null when trying to determine if they can re-use an existing entity or have to create a new one. So when a new node is uploaded, they'd see content URL ID as null and create a new enttiy, but when it is copied as part of versioning, they'd see the content URL ID and just do the lookup in the cache (which should contain it, since it was created in that transaction).
Thank you very much, Axel, for your detailed analysis.
I totally agree in your stating one additional select is always preferable to provoking a duplicate key exception. Furthermore an insert in any case binds more ressources in the database than an additional select which can be cached by the database if it has already similarly been executed some seconds before.
Thanks for the conversation. I shared it with the architects for this area of the product.
As described in MNT-17060, we rely on the database to enforce the key constraint because it is the most performant way of doing bulk inserts of new data. When the database sees the violation, we pick that up and respond appropriately. This is a common pattern with our current architecture, and is a convenient optimization.
We selected this architecture as a result of performance profiling in order to optimize new data creation because bulk import is a critical use case. As a result, there is one fewer select statement for every file. This is documented in ACE-3948.
I got the same issue with duplicate key error on postgres DB, on each document creation.
As the ticket is not accessible anymore, how can we solved this problem?
UTC [962] ERROR: duplicate key value violates unique constraint "idx_alf_conturl_cr" CDT STATEMENT: insert into alf_content_url (id, content_url, content_url_short, content_url_crc, content_size, orphan_time) values ($1, $2, $3, $4, $5, $6) select * from alf_content_url order by id desc;
We are running Alfresco on docker with this versions from docker-compose :
postgres:13.3 alfresco/alfresco-acs-nginx:3.1.1 alfresco/alfresco-content-repository-community:7.1.1 alfresco/alfresco-search-services:2.0.2.1 alfresco/alfresco-transform-core-aio:2.5.6 alfresco/alfresco-share:7.1.1 alfresco/alfresco-activemq:5.16.1
Thanks in advance
Cordially
Ask for and offer help to other Alfresco Content Services Users and members of the Alfresco team.
Related links:
By using this site, you are agreeing to allow us to collect and use cookies as outlined in Alfresco’s Cookie Statement and Terms of Use (and you have a legitimate interest in Alfresco and our products, authorizing us to contact you in such methods). If you are not ok with these terms, please do not use this website.