Monday, May 13, 2013

SharePoint 2013 Search Architecture and Components


I was reading through some articles on SharePoint 2010 Search and trying to get the concepts embedded in my brain. I will write more about those in a different post. And here comes SharePoint 2013 Search with an even more improved architecture. FAST Search has been integrated into SharePoint 2013, and SharePoint Search is now further componentized.

There are 6 Search Components and 4 Search Databases.

Before you read this article, I would suggest completely wiping the previous Search concepts from your mind. Though the underlying Search architecture is the same, it is better to start fresh.

In a nutshell the six Search Components are:
1) Crawl Component
2) Content Processing Component
3) Index Component
4) Query Processing Component
5) Administration Component
6) Analytics Processing Component

The databases associated with Search are:
1) Crawl DB
2) Search Administration DB
3) Link DB
4) Analytics Reporting DB

Crawl and Content Processing
1)  Crawl Component [CC]
The Crawl Component crawls the content sources and delivers the crawled items [the content and the metadata] to the Content Processing Component. The Crawl Component uses the Crawl DB to store information about crawled items [like the last crawl time, historical information about crawled items etc.].
2) Content Processing Component [CPC]
The Content Processing Component sits between the Crawl Component and the Index Component. The crawled items are processed by the Content Processing Component and fed into the Index Component. The processing includes document parsing and property mapping.
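
To make the crawl-to-content-processing hand-off a bit more concrete, here is a minimal Python sketch. It is a toy model, not SharePoint code: the class and function names (CrawledItem, CrawlComponent, process_content), the sample URL and the property names are all made up for illustration, and a plain dictionary stands in for the Crawl DB.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CrawledItem:
    """A crawled item: the content plus its metadata (hypothetical shape)."""
    url: str
    content: str
    metadata: dict


@dataclass
class CrawlComponent:
    # Stand-in for the Crawl DB: remembers when each URL was last crawled.
    crawl_db: dict = field(default_factory=dict)

    def crawl(self, source: dict) -> list:
        """Walk a content source and return the crawled items."""
        items = []
        for url, (content, metadata) in source.items():
            self.crawl_db[url] = datetime.now(timezone.utc)
            items.append(CrawledItem(url, content, metadata))
        return items


def process_content(item: CrawledItem, property_map: dict) -> dict:
    """Toy content processing: parse the text and map crawled properties."""
    mapped = {
        property_map[name]: value
        for name, value in item.metadata.items()
        if name in property_map
    }
    return {
        "url": item.url,
        "tokens": item.content.lower().split(),  # very naive document parsing
        "properties": mapped,
    }


# Illustrative data only.
source = {"http://intranet/doc1": ("Quarterly Sales Report", {"ows_Author": "Alice"})}
crawler = CrawlComponent()
processed = [
    process_content(item, {"ows_Author": "Author"})
    for item in crawler.crawl(source)
]
```

The processed dictionaries are what this toy model would feed into the Index Component.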

Index and Query Process
3) Index Component [IC]
The Index Component gets the processed items from the Content Processing Component and writes them to an index file. The Index Component also receives the queries from the Query Processing Component and returns the result sets.
4) Query Processing Component [QPC]
The Query Processing Component sits between the front end and the Index Component. It analyses and processes search queries and results. The processed search query is submitted to the Index Component. The Index Component returns the result set based on the processed query to the Query Processing Component. This result is then sent to the front end by the Query Processing Component.
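
The round trip between the front end, the Query Processing Component and the Index Component can be sketched in Python as below. Again this is only a toy model under my own assumptions: the class names and methods are hypothetical, the "index file" is an in-memory inverted index, and query processing is reduced to lowercasing and splitting the query.

```python
from collections import defaultdict


class IndexComponent:
    """Toy index: maps each term to the set of URLs that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def write(self, url, tokens):
        """Store a processed item in the index."""
        for token in tokens:
            self.postings[token].add(url)

    def execute(self, terms):
        """Return the URLs that contain every term of the processed query."""
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


class QueryProcessingComponent:
    """Sits between the front end and the index."""

    def __init__(self, index):
        self.index = index

    def search(self, raw_query):
        terms = raw_query.lower().split()   # process the incoming query
        hits = self.index.execute(terms)    # submit it to the index
        return sorted(hits)                 # process the results for the front end


# Illustrative data only.
index = IndexComponent()
index.write("http://intranet/doc1", ["quarterly", "sales", "report"])
index.write("http://intranet/doc2", ["sales", "forecast"])

qpc = QueryProcessingComponent(index)
results = qpc.search("Sales Report")
```

The front end only ever talks to the Query Processing Component here, which matches the flow described above.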

Search Administration
5) Administration Component [AC]
The Search Administration Component is responsible for running the system processes essential to search. It does the provisioning part [adding and initializing search components]. The Administration Component uses the Search Administration DB. We can have multiple Administration Components for a Search Service application, but only one is active at a time.
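
The "many components, one active" idea can be pictured with a small Python sketch. This is purely illustrative: the AdminComponent class, the server names and the rule of picking the first healthy component are my assumptions, not how SharePoint actually selects the active Administration Component.

```python
class AdminComponent:
    """Hypothetical stand-in for an Administration Component on one server."""

    def __init__(self, server):
        self.server = server
        self.healthy = True


def active_admin(components):
    """Return the single active component: the first healthy one in the list."""
    for component in components:
        if component.healthy:
            return component
    return None


# Two components are provisioned, but only one is active at any moment.
admins = [AdminComponent("app1"), AdminComponent("app2")]
first_active = active_admin(admins).server

# If the active one becomes unavailable, another one takes over.
admins[0].healthy = False
second_active = active_admin(admins).server
```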

Analytics Process
6) Analytics Processing Component [APC]
The Analytics Processing Component analyzes the crawled items (search analytics) and how users interact with the search results (usage analytics). It uses this information to improve search relevance and to create search reports. The results from this analysis are sent to the Content Processing Component to be included in the Search Index. The information on usage analytics is stored in the Analytics Reporting DB. The Link DB stores information extracted by the Content Processing Component, along with data about search clicks and search results. This information is stored unprocessed; the Analytics Processing Component does the analysis.
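
As a rough picture of "stored unprocessed, analyzed later", here is a small Python sketch. It is a toy model only: the list of tuples stands in for the raw click data in the Link DB, the resulting counter stands in for what a report might show, and the function name and sample data are made up.

```python
from collections import Counter

# Raw, unprocessed click events: (query, clicked URL). Illustrative data only.
raw_clicks = [
    ("sales report", "http://intranet/doc1"),
    ("sales", "http://intranet/doc1"),
    ("sales", "http://intranet/doc2"),
]


def analyze_clicks(events):
    """Count how often each result was clicked, across all queries."""
    return Counter(url for _query, url in events)


click_counts = analyze_clicks(raw_clicks)
most_clicked = click_counts.most_common(1)[0][0]
```

In this sketch a result that is clicked more often gets a higher count, which is the kind of signal that could be fed back to improve relevance.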

