Need for Query and Index in AEM

For functionality such as full-text search, retrieving content based on a property value or a condition on a property, or avoiding iteration over a huge volume of content under a root path, we write query-based logic in AEM.

Languages supported:
  • XPath
  • JCR-SQL2
XPath:
  • XPath queries are created using the AEM QueryBuilder API (com.day.cq.search.*).
  • From a development point of view, we need to be aware of the standard OOB predicates in order to arrive at the XPath query.
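As a rough illustration, QueryBuilder predicates are plain key/value pairs. The sketch below only assembles such a map with the JDK, reusing the predicates from the query debugger example later in this post. The class name PagePredicates and the method byTitle are hypothetical. The actual hand-off to the AEM API is mentioned only in a comment, since it needs the AEM libraries on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: assembles QueryBuilder-style predicates as key/value pairs.
final class PagePredicates {

    // Builds predicates that look for cq:Page nodes under rootPath whose
    // jcr:content/jcr:title equals the given title.
    static Map<String, String> byTitle(String rootPath, String title) {
        Map<String, String> predicates = new LinkedHashMap<>();
        predicates.put("path", rootPath);
        predicates.put("type", "cq:Page");
        // Property name copied from the debugger example in this post.
        predicates.put("1_property", "@jcr:content/jcr:title");
        predicates.put("1_property.value", title);
        predicates.put("p.limit", "-1"); // -1 asks for all results; use with care on large trees
        return predicates;
    }

    public static void main(String[] args) {
        // In an AEM bundle this map would typically be handed to
        // com.day.cq.search.PredicateGroup.create(map) and then to
        // QueryBuilder.createQuery(predicateGroup, session).
        byTitle("/content/we-retail", "English")
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```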
JCR-SQL2:
  • JCR-SQL2 queries are created using QueryManager (javax.jcr.query.QueryManager).
  • QueryManager is acquired through the JCR Session: session.getWorkspace().getQueryManager()
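A minimal sketch of what such a JCR-SQL2 statement could look like, assuming we want the same "pages by title" lookup used elsewhere in this post. The helper class Sql2Statements and its methods are hypothetical and only build the statement string; running it against a repository needs a live javax.jcr.Session, so those calls appear only as a comment.

```java
// Hypothetical helper: builds a JCR-SQL2 statement as a plain string.
final class Sql2Statements {

    // Escapes a value for use inside a single-quoted JCR-SQL2 string literal.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Selects cq:Page nodes below rootPath whose jcr:content/jcr:title equals title.
    static String pagesByTitle(String rootPath, String title) {
        return "SELECT * FROM [cq:Page] AS p"
                + " WHERE ISDESCENDANTNODE(p, " + quote(rootPath) + ")"
                + " AND p.[jcr:content/jcr:title] = " + quote(title);
    }

    public static void main(String[] args) {
        String statement = pagesByTitle("/content/we-retail", "English");
        System.out.println(statement);
        // With a javax.jcr.Session at hand, the statement would be executed roughly as:
        //   QueryManager qm = session.getWorkspace().getQueryManager();
        //   Query query = qm.createQuery(statement, Query.JCR_SQL2);
        //   QueryResult result = query.execute();
    }
}
```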
Query Processing:
  • Before AEM 6.0 (Jackrabbit 2), everything in AEM was indexed by default.
  • With Jackrabbit Oak, content is not indexed by default; we create custom indexes based on need. OOB indexes are available under the /oak:index node in the repository.
  • Oak Query Engine
    • Processes queries in the form of JCR-SQL2.
      • This means that a query written with the QueryBuilder API and its predicates first results in an XPath query.
      • QueryEngineImpl (org.apache.jackrabbit.oak.query.QueryEngineImpl) then parses the XPath query and converts it to SQL2.
    • Uses a cost-based query optimizer to find out what it would cost each available index to process the query.
    • Every available index under /oak:index estimates its cost.
    • The cost of traversal is also calculated.
    • The cost value of an index can be "Infinity". This implies that the index cannot handle the query's conditions and so cannot be used to query the data.
    • The query engine then picks the index with the lowest estimated cost.
    • Note:
      • The cost value is an estimated worst-case value and hence need not be accurate.
      • The process described above happens every time a query is executed.
  • Example: 
    • If a query is written to get all pages with a specific "jcr:title" value, the query engine need not traverse the entire repository for jcr:title. Instead it uses the selected index (based on cost) and fetches/filters results from the indexed content.
    • The oak:index definition for the jcr:title of a page is available OOB at /oak:index/cqPageLucene/indexRules/cq:Page/properties/jcrTitle
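The cost-based selection described above can be pictured with a toy model. This is only a sketch of the idea, not Oak's actual implementation: the class IndexCostDemo, the candidate name "someOtherIndex" and all cost numbers are made up for illustration, and traversal is simply treated as one more candidate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of cost-based index selection; not Oak's actual implementation.
final class IndexCostDemo {

    // Returns the name of the candidate with the lowest estimated cost.
    // A cost of Double.POSITIVE_INFINITY marks an index that cannot serve the query.
    static String cheapest(Map<String, Double> estimatedCosts) {
        String best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> entry : estimatedCosts.entrySet()) {
            if (entry.getValue() < bestCost) {
                best = entry.getKey();
                bestCost = entry.getValue();
            }
        }
        return best; // null means no candidate could handle the query
    }

    public static void main(String[] args) {
        Map<String, Double> costs = new LinkedHashMap<>();
        costs.put("cqPageLucene", 120.0);                      // made-up estimate
        costs.put("someOtherIndex", Double.POSITIVE_INFINITY); // cannot answer this query
        costs.put("traverse", 250000.0);                       // made-up traversal estimate
        System.out.println(cheapest(costs)); // prints "cqPageLucene"
    }
}
```

If every candidate reports an infinite cost, the sketch returns null, which corresponds to the case where no index can serve the query.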
Need for creating custom index:
  • When the query engine has to traverse the entire repository, or read more nodes than allowed (per configuration, it is 100000: Apache Jackrabbit Query Engine Settings, "In memory read limit") for what we have queried, the query becomes slow and eventually throws an UnsupportedOperationException to stop further processing. Observing the logs, we would notice a WARN-level message suggesting that we create an index, as shown below:
*DEBUG* [0:0:0:0:0:0:0:1 [1588022611960] GET /libs/cq/search/content/querydebug.html HTTP/1.1] org.apache.jackrabbit.oak.query.QueryImpl no proper index was found for filter Filter(query=select [jcr:path], [jcr:score], * from [nt:unstructured] as a where isdescendantnode(a, '/') /* xpath: /jcr:root//element(*, nt:unstructured) */, path=//*)
*WARN* [0:0:0:0:0:0:0:1 [1588022611960] GET /libs/cq/search/content/querydebug.html HTTP/1.1] org.apache.jackrabbit.oak.query.QueryImpl Traversal query (query without index): select [jcr:path], [jcr:score], * from [nt:unstructured] as a where isdescendantnode(a, '/') /* xpath: /jcr:root//element(*, nt:unstructured) */; consider creating an index
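To spot such warnings in a larger log file, something along these lines could be used. It is a small sketch that relies only on the wording of the WARN message shown above; the class TraversalLogScanner is hypothetical, and the sample line in main is shortened for readability.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: pulls traversal-query statements out of log lines.
final class TraversalLogScanner {

    private static final String MARKER = "Traversal query (query without index): ";
    private static final String SUFFIX = "; consider creating an index";

    // Returns the query statement of every line that carries the traversal warning.
    static List<String> traversalQueries(List<String> logLines) {
        List<String> queries = new ArrayList<>();
        for (String line : logLines) {
            int start = line.indexOf(MARKER);
            if (start < 0) {
                continue; // not a traversal warning
            }
            String statement = line.substring(start + MARKER.length());
            if (statement.endsWith(SUFFIX)) {
                statement = statement.substring(0, statement.length() - SUFFIX.length());
            }
            queries.add(statement.trim());
        }
        return queries;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "*WARN* org.apache.jackrabbit.oak.query.QueryImpl " + MARKER
                        + "select [jcr:path] from [nt:unstructured] as a"
                        + " where isdescendantnode(a, '/')" + SUFFIX,
                "*INFO* some unrelated line");
        traversalQueries(lines).forEach(System.out::println);
    }
}
```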

[Screenshot of the OSGi config for reference: query engine settings]

Play around in your local instance (to visualize the flow mentioned above)
  • Create a new logger entry in the Sling log for the APIs highlighted below
  • Navigate to http://localhost:4502/libs/cq/search/content/querydebug.html, frame sample query predicates such as the following, and execute them. (You can also intentionally induce a traversal query locally and observe the logs.)
    path=/content/we-retail
    type=cq:Page
    1_property=@jcr:content/jcr:title
    1_property.value=English
    p.limit=-1
  • Observe the logs either in the log file or directly in /system/console/slinglog. (The points mentioned in the flow are highlighted.)
