Indexing in AEM - Indexing modes and Index types

To make queries perform well, Oak supports indexing: content stored in the repository is indexed according to the index definition, index type, and indexing mode.
  • Indexing works by comparing node states (the difference between the base state and the modified state), where NodeState (org.apache.jackrabbit.oak.spi.state.NodeState) represents a specific immutable state of a node.
  • The indexing modes below differ in how this comparison is made and when the index content gets updated.
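The state comparison described above can be pictured with a small toy model. This is only a sketch: plain Python dicts stand in for Oak's NodeState objects, and the function name and the value-to-paths index layout are assumptions made for illustration, not Oak's actual implementation.

```python
# Toy model: dicts stand in for immutable node states (not Oak's real API).
def update_property_index(index, path, prop, base, modified):
    """Update a value -> set-of-paths index from the diff of two node states."""
    old, new = base.get(prop), modified.get(prop)
    if old == new:
        return  # the indexed property did not change, so the index is untouched
    if old is not None and old in index:
        index[old].discard(path)      # remove the stale entry
        if not index[old]:
            del index[old]            # drop values that no longer have any path
    if new is not None:
        index.setdefault(new, set()).add(path)  # add the new entry


index = {}
# Node created: base state is empty, modified state carries the property.
update_property_index(index, "/content/a", "jcr:title", {}, {"jcr:title": "Home"})
# Property changed: only the difference between the two states is applied.
update_property_index(index, "/content/a", "jcr:title",
                      {"jcr:title": "Home"}, {"jcr:title": "Start"})
```

Only the difference between the base and the modified state drives the index update, which is why an unchanged property costs nothing in this model.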
Indexing modes:
  • Synchronous Indexing
    • This mode updates the index content as part of the commit of the actual content. In other words, the content update and the corresponding index update happen together (hence the name synchronous).
    • Supported Index Type: Property Index
  • Asynchronous Indexing
    • Index updates are done by a scheduled job (AsyncIndexUpdate) that runs at a fixed interval (5 seconds OOB).
    • Because indexing in this mode happens asynchronously, independent of the content updates, the index can lag slightly behind the latest repository state and is eventually consistent.
    • Supported Index Type: Lucene Index and Solr Index
    • Example: /oak:index/cqTagLucene
  • Near Real Time (NRT) Indexing
    • Indexing happens in two modes/at two places
      • Persisted Index: the index updated by the async job mentioned above (AsyncIndexUpdate).
      • Local Index: in addition to the persisted index, indexes are created locally with the help of copy-on-read support (configured as part of org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexProviderService). It holds the data between two async index job runs.
        • Note: We can choose to provide the local index path as part of the config. If it is not set, the index is stored in the "index" directory of the instance's repository home.
    • In other words, content updated after the last async job run and before the next scheduled run shows up in query results quickly (with both the persisted and the local index in place).
    • Supported Index Type: Lucene Index
    • Usage:
      • NRT indexing mode has two variations - nrt and sync.
      • NRT Indexing mode - nrt
        • async property on oak:index is set as ['async', 'nrt']
        • Local index is updated asynchronously
      • NRT Indexing mode - sync
        • async property on oak:index is set as ['async', 'sync']
        • Local index is updated synchronously. This mode indexes more slowly than nrt mode.
    • Example: A few OOB Lucene indexes work in "NRT indexing mode - nrt". One such index is /oak:index/cqPageLucene
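As a quick way to read the mode off an index definition, the rules above can be sketched as a small helper. It assumes the definition is available as a plain dict of its properties; the function name and the returned labels are hypothetical and only mirror the descriptions in this post.

```python
def indexing_mode(definition):
    """Classify an index definition by its 'async' property (illustrative only)."""
    async_value = definition.get("async")
    if async_value is None:
        return "synchronous"            # no async property: updated with the commit
    # The async property may be a single string or a multi-valued array.
    values = [async_value] if isinstance(async_value, str) else list(async_value)
    if "nrt" in values:
        return "nrt"                    # e.g. ['async', 'nrt']
    if "sync" in values:
        return "sync"                   # e.g. ['async', 'sync']
    return "asynchronous"               # e.g. 'async' or 'fulltext-async'


print(indexing_mode({"type": "property"}))
print(indexing_mode({"type": "lucene", "async": ["async", "nrt"]}))
```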
Types of Oak Index:
  • Property Index
    • Useful for queries that have property constraints that are not full text.
    • Identified by the following properties:
      • type -> property 
      • propertyNames -> property name for which index is to be created [Name array]
  • Lucene Index
    • Lucene Fulltext Index
      • Useful for queries involving full-text conditions.
      • Identified by the following properties:
        • type -> lucene
        • async -> async
    • Lucene Property Index
      • Serves the same purpose as the property index mentioned above. Because it is a Lucene property index, it is updated in async mode.
      • Identified by the following properties:
        • type -> lucene
        • async -> async
        • fullTextEnabled -> false [Boolean]
        • includePropertyNames -> property names for which index is to be created [String array]
  • Solr Index 
    • Used when Apache Solr is used for search functionality.
    • Identified by the following properties:
      • type -> solr
      • async -> async
  • Ordered Index (deprecated)
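The identifying properties listed above can be turned into a small classifier. This is a simplified sketch: it looks only at the properties named in this post (type, fullTextEnabled), takes the definition as a plain dict, and treats any Lucene definition without fullTextEnabled=false as a full-text index, which is an assumption made for illustration. Real definitions carry more properties and child nodes.

```python
def index_type(definition):
    """Name the index type from the key properties listed in this post."""
    kind = definition.get("type")
    if kind == "property":
        return "Property Index"
    if kind == "lucene":
        # Assumption for this sketch: only an explicit False marks a property index.
        if definition.get("fullTextEnabled") is False:
            return "Lucene Property Index"
        return "Lucene Fulltext Index"
    if kind == "solr":
        return "Solr Index"
    return "Unknown"


# Hypothetical definitions, written as dicts purely for illustration.
samples = [
    {"type": "property", "propertyNames": ["jcr:uuid"]},
    {"type": "lucene", "async": "async"},
    {"type": "lucene", "async": "async", "fullTextEnabled": False,
     "includePropertyNames": ["cq:template"]},
    {"type": "solr", "async": "async"},
]
for sample in samples:
    print(index_type(sample))
```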
Note:
  • The index that gets used depends on the query predicates (the difference is explained with an example query below).
  • Apart from the key properties highlighted above, an index definition has other supporting properties/nodes. These will be covered separately for better clarity.
Index Manager, Admin UI:
AEM provides an OOB Admin UI for displaying the indexes available in our instance.
  • Navigate to Tools -> Operations -> Diagnosis -> Index Manager
  • We have filter options to filter based on the index name, type, path. 
  • On selecting specific index, we have two options - Index Info, Consistency check.

For node and property level details of oak:index by type, use query predicates like below in querydebug.html
(From the result set, we can navigate to CRXDE of the respective node and observe the nodes, properties)
  • Get all Property Indexes:
  • path=/oak:index
    type=oak:QueryIndexDefinition
    1_property=type
    1_property.value=property
    p.limit=-1
  • Get all Lucene Indexes:
  • path=/oak:index
    type=oak:QueryIndexDefinition
    1_property=type
    1_property.value=lucene
    p.limit=-1
  • Get all indexes of mode NRT, with scheme nrt:
  • path=/oak:index
    type=oak:QueryIndexDefinition
    1_property=type
    1_property.value=lucene
    2_property=async
    2_property.value=%nrt
    2_property.operation=like
    p.limit=-1
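The same predicates can also be sent to the QueryBuilder JSON servlet (/bin/querybuilder.json) instead of being pasted into querydebug.html. The sketch below only builds the request URL and makes no network call. The host http://localhost:4502 and both helper names are assumptions for a local author instance.

```python
from urllib.parse import urlencode


def parse_predicates(text):
    """Turn a block of key=value predicate lines into a dict."""
    predicates = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        predicates[key.strip()] = value.strip()
    return predicates


def querybuilder_url(host, predicates):
    """Build a QueryBuilder JSON servlet URL (host is an assumed local instance)."""
    return f"{host}/bin/querybuilder.json?{urlencode(predicates)}"


lucene_indexes = """
path=/oak:index
type=oak:QueryIndexDefinition
1_property=type
1_property.value=lucene
p.limit=-1
"""
print(querybuilder_url("http://localhost:4502", parse_predicates(lucene_indexes)))
```

Keeping the predicates as text in the same key=value form makes it easy to copy them between this script and the query debugger.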
Play around in local (to understand the index modes and types used for your queries)
  • Take query predicates used in your project (or create one) and execute them in the "Explain Query" console.
  • Observe the indexes used. 
  • Sample full text query to search for the text "we-retail" in pages:
  • fulltext=we-retail
    path=/content/we-retail
    type=cq:Page
    p.limit=-1
  • Run the above query in the Explain Query console. (Screenshot: Explain Query)
  • Clicking Explain displays the query explanation. (Screenshot: Query Explanation)
  • The same query without the "type" predicate results in a different index being used. (Screenshot: Query Explanation full text)
Observation:
  • As evident from the above example, if type is used, the Lucene index whose definition covers that specific type is picked (in this case, type: cq:Page and index: cqPageLucene).
  • If type is removed, the full-text Lucene index /oak:index/lucene is used (async mode -> fulltext-async).
Note
  • Each of the bullet points explained above is a vast topic by itself and has more details to it. I will try to cover them in upcoming posts.
