Hello,
Sorry if the question is too basic, but I searched for hours for an answer.
I don't understand what the docs mean in the section on exact phrase search:
"The whole phrase will be tokenized"
Thanks in advance for explaining "tokenized". I would also like to understand how this differs from the exact term search, which is not clear to me:
https://docs.alfresco.com/search-services/latest/using/#search-for-an-exact-term
Thanks for any help,
Thorsten
Phrases are enclosed in double quotes. Any embedded quotes can be escaped using `\`. If no field is specified then the default TEXT field will be used, as with searches for a single term.
The whole phrase will be tokenized before the search according to the appropriate data dictionary definition(s).
Solr applies tokenization when searching: https://solr.apache.org/guide/6_6/tokenizers.html
That means the terms actually searched for are not exactly what you typed, but meaningful parts of the sentence.
When searching for "Running is a sport", the real query is expanded to "run, run_is, is, is_a, a, a_sport, sport", so you get all results that include those tokens.
However, when using ="Running is a sport", the query returns the fields that contain exactly those terms in the specified order: "Running, is, a, sport".
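To make the difference concrete, here is a small Python sketch that reproduces the two term lists from the example. It is only a toy illustration: `toy_stem`, `tokenized_terms` and `exact_terms` are hypothetical helpers, the stemming rule is deliberately crude, and the real analysis chain in Solr depends on the data dictionary and locale configuration.

```python
def toy_stem(word: str) -> str:
    """Very rough stand-in for a real stemmer (assumption, not Solr's algorithm)."""
    w = word.lower()
    if w.endswith("ing") and len(w) > 5:
        w = w[:-3]
        # Collapse a doubled final consonant left behind by "-ing" ("runn" -> "run").
        if len(w) >= 2 and w[-1] == w[-2]:
            w = w[:-1]
    return w


def tokenized_terms(phrase: str) -> list[str]:
    """Stemmed words interleaved with two-word combinations, in phrase order."""
    words = [toy_stem(w) for w in phrase.split()]
    terms = []
    for i, word in enumerate(words):
        terms.append(word)
        if i + 1 < len(words):
            # The underscore joins two adjacent words into one combined token.
            terms.append(f"{word}_{words[i + 1]}")
    return terms


def exact_terms(phrase: str) -> list[str]:
    """For the "=" form: the words as typed, in order, without stemming."""
    return phrase.split()


print(tokenized_terms("Running is a sport"))
print(exact_terms("Running is a sport"))
```

The first call prints the expanded list (run, run_is, is, ...), the second the four words exactly as typed.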
Thank you very much for the clarification of "tokenization"!
@angelborroy wrote: When searching for "Running is a sport", the real query is expanded to "run, run_is, is, is_a, a, a_sport, sport".
I did not find anything in the Solr 6 tokenization doc saying that "is_a" or "a_sport" also count as tokens. I expected only the individual words to be tokens, not every pair of adjacent words. (Just to be sure: the underscore in your example stands for a single space, doesn't it?)
@angelborroy wrote: So you get all results that include those tokens.
Does this mean that every token you mentioned has to appear in every result document, but that the order of the tokens does not matter? In that case, documents with the content 'Is sport a running game' would be found, while documents with the content "Is this game a sport" would not. Is this correct?
BTW, if this is true, I don't understand why this search is called a "phrase" search. Normally a phrase search implies a certain order. This is more like a "set search"...
@angelborroy wrote: However, when using ="Running is a sport", the query returns the fields that contain exactly those terms in the specified order: "Running, is, a, sport".
I am glad that I interpreted this syntax correctly. Is it possible to use it as a JSON query without problems? I could not integrate the equal sign immediately into the following syntax:
{ "query": { "query":"cm:content:('*Running is a sport*')" } }
IMO the equal sign does not harmonize with cm:content. But perhaps I should omit cm:content and replace it with TEXT?
Thorsten
When using "=" with content (TEXT) fields, the query does not have to match the whole field value. It also returns content that merely contains that sentence.
@angelborroy wrote: When using "=" with content (TEXT) fields, the query does not have to match the whole field value. It also returns content that merely contains that sentence.
I am not sure if I understand you. Do you refer to my wildcards in the example above?
Regarding the field type TEXT: Is the following definition of TEXT correct?
TEXT virtual field (I ask because the link refers to Alfresco Search Enterprise; I did not find any other doc.)
BTW The syntax for an exact term search with JSON is clear now. The following works:
{ "query": { "query":"=cm:content:'Running is a sport'" } }
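In case it helps someone else, the same JSON body can be built in Python instead of by hand. This is only a sketch: `build_exact_query` is a hypothetical helper name, and escaping an embedded single quote with a backslash is my assumption, based on the docs' remark that embedded quotes can be escaped.

```python
import json


def build_exact_query(field: str, phrase: str) -> dict:
    """Build the JSON body for an exact ("=") search on one field."""
    # Assumption: a backslash escapes a quote embedded in the phrase.
    escaped = phrase.replace("'", "\\'")
    return {"query": {"query": f"={field}:'{escaped}'"}}


body = build_exact_query("cm:content", "Running is a sport")
print(json.dumps(body))
```

Building the body with `json.dumps` avoids stray characters when the query is pasted into a request.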
Thanks,
Thorsten
The uses of TEXT are described at https://docs.alfresco.com/search-services/latest/using/#search-in-fields
When using the "=" operator you have two different behaviours: