This blog compares query support provided by transactional metadata query (TMDQ) and the Index Engine. The two differ in a number of respects which are described here. This blog is an evolution of material previously presented at the Alfresco Summit in 2013.
TMDQ delivers support for transactional queries that used to be provided by Lucene in Alfresco Content Services (ACS) prior to version 5.0. In ACS 4.1 SOLR was introduced with an eventually consistency query model. In 5.0, Lucene support was removed in favour of TMDQ. As TMDQ replaced Lucene, some restrictions on its use are similar. For example, both post processes results for permissions in a similar way and there are restrictions on the scale of the result sets it can cope with as a result. The Index Engine has no such restrictions. It also seems from use that if a query can be run against the database the scope of the query is such that it will probably have no issue with the number of results returned.
Some queries can be executed both transactionally against the database or with eventual consistency against the Index Engine. Only queries using the AFTS or CMIS query languages can be executed against the database. The Lucene query language can not be used against the database while selectNodes (XPATH) on the Java API always goes against the database, walking and fetching nodes as required.
In general, TMDQ does not support: structural queries, full text search, special fields like SITE that are derived from structure and long strings (> 1024 characters). Text fields support exact(ish) and pattern based matching subject to the database collation. Filter queries are rewritten along with the main query to create one large query. Ordering is fine, but again subject to database collation for text.
TMDQ does not support faceting. It does not support any aggregation: this includes counting the total number of matches for the query. FINGERPRINT support is only on the Index Server.
AFTS and CMIS queries are parsed to an abstract form. This is then sent to an execution engine. Today, there are two execution engines: the database and the Index Engine. The default is to try the DB first and fall back to the Index Engine - if the query is not supported against the DB. This is configurable for a search sub-system and per query using the Java API. Requesting consistency should appear in the public API "some time soon".
Migrations from Alfresco Content Service prior to 5.0 will require two optional patches to be applied to support TMDQ. Migrations to 5.0 require one patch: 5.0 to 5.1 a second. New installations will support TMDQ by default. The patches add supporting indexes that make the database ~25% larger.
From the public API, anything that is not a simple query, a filter query, an option that affects these, or an option that affects what is returned for each node in the results, is not supported by TMDQ. The next two sections consider what each query language supports.
Explicitly, TMDQ supports:
The default limits for permission evaluation will restrict the results returned from TMDQ based on both the number of results processed and time taken. These can be increased if required.
The public API does not support TMDQ for:
Some of these will be ignored and produce transactional results; others will fail and be eventual.
The public API will ignore the SQL select part of a CMIS query and decorate the results as it would do for AFTS.
For the CMIS query language all expressions except for CONTAINS(), SCORE() and IN_TREE() can now be executed against the database. Most data types are supported except for the CMIS uri and html types. Strings are supported but only if 1024 characters or less in length. In Alfresco Content Services 5.0, OR, decimal and boolean types were not supported; they are from 5.1 on. Primary and secondary types are supported and require inner joins to link them together - they can be somewhat tedious to write and use.
You can skip joins to secondary types from the fetch in CMIS using the public API. You would need an explicit SELECT list and supporting joins from a CMIS client. You still need joins to secondary types for predicates and ordering. As CMIS SQL supports ordering as part of the query language you have to do it there and not via the public API sort.
Post 5.2, left outer join from primary and secondary types to secondary types will also be supported. This covers queries to find documents that do not have an aspect applied - which is currently best implemented using something like
CONTAINS('-ASPECT:hidden')
today.
For multi-valued properties, the CMIS query language supports ANY semantics from SQL 92. A query against a multi-lingual property like title or description is treated as multi-valued and may match in any language. In the results you will see the best value for your locale - which may not match the query. Ordering will consider any value.
UPPER() and LOWER() functions were in early drafts for the CMIS 1.0 specification and sunsequently dropped. These are not part of the CMIS 1.0 or 1.1 specifications. They are currently supported in the CMIS query language for TMDQ only as ways to address specific database collation issues and case sensitivity. Only equality is supported. LIKE is not currently supported. For example:
{
"query": {
"language": "cmis",
"query" : "select * from cmis:document where LOWER(cmis:name) = 'project contract.pdf'"
}
}
It is more difficult to write AFTS queries that use TMDQ as the default behaviour is to use full text queries for text: these can not go against the database. Again, special fields like SITE and TAG that are derived from structure will not go to the database. TYPE, ASPECT and the related exact matches are OK. All property data types are fine but strings again have to be less than 1024 characters in length. Text queries have to be prefixed with = to avoid full text search. PARENT is supported. OR is supported in 5.1 and later.
Ranges are not currently supported - there is no good reason for this - it needs to generate a composite constraint which we have not done. PATH is not supported nor is ANCESTOR.
Transactional Metadata Query and the Index Engine are intended to support different use case. They differ in queries and options that they support and subtly in the results with respect to collation and scoring. We default to trying transactional support first for historical reasons and it seems to be what most people prefer if they can have it.
Ask for and offer help to other Alfresco Content Services Users and members of the Alfresco team.
Related links:
By using this site, you are agreeing to allow us to collect and use cookies as outlined in Alfresco’s Cookie Statement and Terms of Use (and you have a legitimate interest in Alfresco and our products, authorizing us to contact you in such methods). If you are not ok with these terms, please do not use this website.