Limit Page Crawler Content

The only way for Kentico to index content not in the Page Form / Text webparts is to use the Page Crawler method.  This method will load the page then scan for any text on the page from all sources, including repeaters.

The downside is that, by default, it scans the entire page, including the header and footer.  So if the text "Blog" appears in the header or footer, every single page will show up in the search results for "Blog."

The Concept

What we need is a way to tell Kentico "The content is between A and B, index only that."  That's accomplished by adding "start" and "end" keywords to the pages, then handling the global event "DocumentEvents.GetContent.Execute".

This global event is fired when a page's content is requested by the Smart Search.  At this point the Content of the document is available, including our keywords.  Let's go through this step by step.
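
Before wiring anything into Kentico, the trimming idea can be sketched on its own as a plain string function.  This is only a minimal sketch: the class name CrawlerContentTrimmer and the sample page text are made up for illustration, and it removes each exclude block by position rather than with Replace as the handler further down does.

```csharp
using System;

// Hypothetical helper for illustration only; it is not part of the Kentico API.
public static class CrawlerContentTrimmer
{
    private const string ContentStart = "CONTENTSTART";
    private const string ContentEnd = "CONTENTEND";
    private const string ExcludeStart = "EXCLUDESTART";
    private const string ExcludeEnd = "EXCLUDEEND";

    public static string Trim(string content)
    {
        if (string.IsNullOrEmpty(content))
        {
            return string.Empty;
        }

        // Keep only what follows the first start keyword, if present.
        int start = content.IndexOf(ContentStart, StringComparison.Ordinal);
        if (start >= 0)
        {
            content = content.Substring(start + ContentStart.Length);
        }

        // Drop the first end keyword and everything after it, if present.
        int end = content.IndexOf(ContentEnd, StringComparison.Ordinal);
        if (end >= 0)
        {
            content = content.Substring(0, end);
        }

        // Remove each EXCLUDESTART ... EXCLUDEEND block in turn.
        int excludeFrom;
        while ((excludeFrom = content.IndexOf(ExcludeStart, StringComparison.Ordinal)) >= 0)
        {
            int excludeTo = content.IndexOf(ExcludeEnd, excludeFrom, StringComparison.Ordinal);
            // An unclosed block runs to the end of the content.
            int length = excludeTo >= 0
                ? excludeTo + ExcludeEnd.Length - excludeFrom
                : content.Length - excludeFrom;
            content = content.Remove(excludeFrom, length);
        }

        return content;
    }

    public static void Main()
    {
        // Made-up crawler output: header, content with an excluded sidebar, footer.
        string page = "Header Blog CONTENTSTART Welcome EXCLUDESTART sidebar EXCLUDEEND to the article CONTENTEND Footer Blog";
        Console.WriteLine("[" + Trim(page) + "]");
    }
}
```

With the sample text, the header, footer, and sidebar words are dropped, so a search for "Blog" would no longer match this page.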

The Code

Create a C# class file in your App_Code folder (something like CustomSmartSearchContentLoader.cs), and add this code:

using CMS.Base;
using CMS.DataEngine;
using CMS.DocumentEngine;
using CMS.Helpers;
using CMS.Membership;
using System;
using System.Collections.Generic;
using System.Linq;

[SmartSearchContentLoader]
public partial class CMSModuleLoader
{
    /// <summary>
    /// Attribute class for assigning event handlers.
    /// </summary>
    private class SmartSearchContentLoaderAttribute : CMSLoaderAttribute
    {
        /// <summary>
        /// Called automatically when the application starts.
        /// </summary>
        public override void Init()
        {
            // Assigns a handler to the GetContent event for pages
            DocumentEvents.GetContent.Execute += OnGetPageContent;
        }

        private void OnGetPageContent(object sender, DocumentSearchEventArgs e)
        {
            // For the page crawler, keep only the text between CONTENTSTART and CONTENTEND, and skip anything between EXCLUDESTART and EXCLUDEEND
            if (e.IsCrawler)
            {
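                string content = e.Content;

                // Keep only the text after the first CONTENTSTART keyword
                if (content.Contains("CONTENTSTART"))
                {
                    content = content.Substring(content.IndexOf("CONTENTSTART", StringComparison.Ordinal) + "CONTENTSTART".Length);
                }

                // Drop CONTENTEND and everything after it
                if (content.Contains("CONTENTEND"))
                {
                    content = content.Substring(0, content.IndexOf("CONTENTEND", StringComparison.Ordinal));
                }

                // Remove each EXCLUDESTART ... EXCLUDEEND block; an unclosed block runs to the end of the content
                while (content.Contains("EXCLUDESTART"))
                {
                    string excludePortion = content.Substring(content.IndexOf("EXCLUDESTART", StringComparison.Ordinal));
                    if (excludePortion.Contains("EXCLUDEEND"))
                    {
                        excludePortion = excludePortion.Substring(0, excludePortion.IndexOf("EXCLUDEEND", StringComparison.Ordinal) + "EXCLUDEEND".Length);
                    }
                    content = content.Replace(excludePortion, "");
                }
                e.Content = content;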
				
                // For page crawler and custom indexes, the modified Content is what gets searched, but it is not what Eval("Content") returns in the results, so store it in a custom general field as well.
                e.SearchDocument.AddGeneralField("UseCustomContent", true, true, false);
                e.SearchDocument.AddGeneralField("CustomContent", e.Content, true, false);
            }
            else
            {
                e.SearchDocument.AddGeneralField("UseCustomContent", false, true, false);
            }
        }
    }
}

Next, you need to add the keywords to your page to define where the content starts and ends (CONTENTSTART and CONTENTEND).  Additionally, you can add the exclude keywords (EXCLUDESTART and EXCLUDEEND) to exclude chunks within the searchable content.

Since we don't want this text visible, but it must be actual text to be picked up by the crawler (you can't use HTML comments, since those are ignored), I add the following code to the Master Template's layout:

<div id="content">
  <div class="container">
    <span style="display:none;">CONTENTSTART</span>
    	<cms:CMSWebPartZone ZoneID="PagePlaceholder" runat="server" />
    <span style="display:none;">CONTENTEND</span>
  </div>
</div>

Note that I used <span style="display:none;"> so the keywords are not visible, but are still scannable by the content grabber.

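Likewise, you can create a WebPart Container that wraps web parts in the exclude keywords (the same kind of hidden span, with EXCLUDESTART before the web part and EXCLUDEEND after it) if you don't want a web part's output included in the page's indexed content.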

Final Notes

One thing to note is that for some reason, although we are modifying the Content (and indeed the smart search uses that modified content to search), the Smart Search description that's rendered on the search results still uses the original Content (the entire page).

So in this logic I have added two fields, "UseCustomContent" and "CustomContent."

If you want the search results to show the trimmed content, add a transformation like the one below:

<!-- For Kentico 8->8.2 -->
<div>
    <a href="<%# SearchResultUrl() %>"><%# Eval("Title") %></a> (<%# Eval("Created") %>) <br>
    <%#
        IfCompare(Eval("UseCustomContent"), true,
          SearchHighlight(LimitLength(HTMLHelper.StripTags(Eval<string>("Content")), 200), "<strong>", "</strong>"),
          SearchHighlight(LimitLength(HTMLHelper.StripTags(Eval<string>("CustomContent")), 200), "<strong>", "</strong>")
       )
    %>
</div>

<!-- For Kentico 9+ -->
<div>
    <a href="<%# SearchResultUrl() %>"><%# Eval("Title") %></a> (<%# Eval("Created") %>) <br>
    <%#
        IfCompare(GetSearchValue("UseCustomContent"), true,
          SearchHighlight(LimitLength(HTMLHelper.StripTags(Eval<string>("Content")), 200), "<strong>", "</strong>"),
          SearchHighlight(LimitLength(HTMLHelper.StripTags(GetSearchValue("CustomContent").ToString()), 200), "<strong>", "</strong>")
   )
    %>
</div>

Comments
Trevor Fayas
The method should still be there, try SearchHelper.AddGeneralField, but I don't see the original method being removed.
12/2/2018 5:27:22 PM

Gra
Nice idea for filtering content. I tried to use the searchDocument.AddGeneralField method, but it is not found in Kentico 11. Any advice?
12/2/2018 5:16:17 PM
