Posts

Showing posts from June, 2020

Unit testing in AEM - Introduction

This post introduces unit testing for the Java classes of an AEM application. It starts with a quick recap of the JUnit framework, followed by mocking and the AEM-specific APIs available for testing. The Java classes that we write as part of AEM involve the Sling API, the JCR API and AEM-related APIs, and all of them ultimately target the content in our repository. In other words, the logic revolves around the content, which in the AEM context is a Resource/Node and its related properties (be it a Sling Model, WCMUsePojo, Sling Servlet, OSGi component or anything related). Quick recap of basics: JUnit is the testing framework for Java and is available under the package org.junit.* It provides a test fixture, a test runner and test classes. A test fixture is the fixed state of objects in which tests are run. One of the methods it includes that is relevant for us is the setUp method that we write in the test class (annotated with @BeforeEach in JUnit 5 / @Before in JUnit 4). Test class has methods to be t

Apache Tika config in Lucene Index and Query Flow Summary

This post is about the Apache Tika config on the Lucene full-text index, plus a summary of the queries/indexing that we discussed in the past few posts. Apache Tika is used to detect and extract text from varying file formats. It consists of a Detector and a Parser, where the Detector detects the file format and the Parser parses the contents of the file. In the Lucene index, Oak uses the default config, which uses the following.
TypeDetector - org.apache.tika.detect.TypeDetector. This detector uses the content type available in the input metadata to arrive at the content type/mimeType.
DefaultParser - org.apache.tika.parser.DefaultParser. A composite parser based on all available specific parser implementations, e.g. PDFParser, MP4Parser and all other parser implementations available in Apache Tika.
EmptyParser - org.apache.tika.parser.EmptyParser. As the name suggests, it is a dummy parser that does not parse anything. Hence, defining mime types within the EmptyParser is equivalent to excluding them from text extraction.
In Def

Lucene Index in AEM - Part 3

This post illustrates the use of analyzers in full-text search with a sample use case. Apache Lucene Analyzers: an analyzer, as the name suggests, is used to analyze text both at indexing time and at search time (via query execution). An analyzer examines the text of fields and generates a token stream. It can be either a single Java class or composed of a series of Tokenizer and Filter Java classes. A Tokenizer breaks the data into lexical units or tokens. Filters then examine these tokens and amend, discard or create new ones based on the configuration. Series of Tokenizer + Filters => Analyzer. There are direct Analyzer classes, Tokenizers and Filters available OOB. Based on our requirement, we can choose to use either a direct Analyzer or a Tokenizer + Filter combination (Analyzer via composition). Examples: Analyzer - StandardAnalyzer (org.apache.lucene.analysis.standard.StandardAnalyzer): removes stop words, converts to lowercase, recognizes URLs and emails - most commonly u
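
The "Tokenizer + Filters => Analyzer" composition described in this excerpt can be sketched in plain Java. This is only a conceptual illustration and does not use the Lucene API: the class name AnalyzerSketch, its method names and the stop-word list are all hypothetical, and a real Lucene analyzer handles far more cases (punctuation, URLs, emails and so on).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Conceptual sketch only: NOT the Lucene API. All names here are hypothetical.
public class AnalyzerSketch {

    // Assumed stop-word list, chosen purely for illustration.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "is", "of");

    // "Tokenizer" step: break the text into tokens on whitespace.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // "Filter" step 1: amend each token by lowercasing it.
    public static List<String> lowerCaseFilter(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            result.add(token.toLowerCase(Locale.ROOT));
        }
        return result;
    }

    // "Filter" step 2: discard tokens that are stop words.
    public static List<String> stopFilter(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            if (!STOP_WORDS.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }

    // "Analyzer" = tokenizer followed by the chain of filters.
    public static List<String> analyze(String text) {
        return stopFilter(lowerCaseFilter(tokenize(text)));
    }

    public static void main(String[] args) {
        // Prints [quick, brown, fox, friend, aem]
        System.out.println(analyze("The Quick Brown Fox is a Friend of AEM"));
    }
}
```

Running the same chain over both the indexed text and the query text is what lets a search for "fox" match content containing "Fox".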

Lucene Index in AEM - Part 2

This post illustrates the full-text search scenario and the steps to create a custom Lucene full-text index. At a high level, for full-text search we need to index all nodes and properties (these are the two major means by which our content is held in the repository).

For property: to make a specific property indexed for both the full-text and the property-constraint scenarios, add the properties below to the property definition of the respective indexRule.

For full text:

Name            Type     Value
nodeScopeIndex  Boolean  true

For property constraint (name and isRegexp are interrelated, as already explained in the previous post):

Name           Type     Value
propertyIndex  Boolean  true
name           String   property name or regex pattern
isRegexp       Boolean  false or true

Example: the jcr:title property of a page might be used in queries with property constraints, or we might need to get the results of full-text/contains queries based on jcr:title, or both.

For node: indexRules specific to