UmbracoExamine

Adds an index field to the collection

Default property for accessing an IndexField definition

Field Name

Initializes a new instance of the class.

The search. The occurrence.

Query on the id

The id. A new with the clause appended

Query on the NodeName

Name of the node. A new with the clause appended

Query on the NodeTypeAlias

The node type alias. A new with the clause appended

Query on the Parent ID

The id of the parent. A new with the clause appended

Query on the specified field

Name of the field. The field value. A new with the clause appended

Ranges the specified field name.

Name of the field. The start. The end. A new with the clause appended

Ranges the specified field name.

Name of the field. The start. The end. if set to true [include lower]. if set to true [include upper]. A new with the clause appended

Ranges the specified field name.

Name of the field. The start. The end. A new with the clause appended

Ranges the specified field name.

Name of the field. The start. The end. if set to true [include lower]. if set to true [include upper]. A new with the clause appended

Ranges the specified field name.

Name of the field. The start. The end. A new with the clause appended

Ranges the specified field name.

Name of the field. The start. The end. if set to true [include lower]. if set to true [include upper]. A new with the clause appended
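
The Range overloads differ only in whether the bounds are included. The following is a minimal sketch of how such a clause could be rendered in Lucene's bracket notation (square brackets for inclusive, braces for exclusive). It is written in Python purely for illustration; `range_clause` is a hypothetical helper, not part of UmbracoExamine, and treating both bounds as inclusive by default is an assumption.

```python
def range_clause(field, start, end, include_lower=True, include_upper=True):
    """Render a range clause in Lucene-style bracket notation.

    Hypothetical helper: inclusive bounds use [ ], exclusive bounds use { }.
    """
    left = "[" if include_lower else "{"
    right = "]" if include_upper else "}"
    return f"{field}:{left}{start} TO {end}{right}"


# Inclusive on both ends (assumed default).
print(range_clause("createDate", "20100101", "20101231"))
# Exclusive on both ends.
print(range_clause("level", 1, 5, include_lower=False, include_upper=False))
```

Older Lucene query parsers may not accept mixed brackets in a query string, so a mixed result from this helper should be read as notation only.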

Query on the NodeName

Name of the node. A new with the clause appended

Query on the NodeTypeAlias

The node type alias. A new with the clause appended

Query on the specified field

Name of the field. The field value. A new with the clause appended

Queries multiple fields with each being an And boolean operation

The fields. The query. A new with the clause appended

Queries multiple fields with each being an And boolean operation

The fields. The query. A new with the clause appended

Queries multiple fields with each being an Or boolean operation

The fields. The query. A new with the clause appended

Queries multiple fields with each being an Or boolean operation

The fields. The query. A new with the clause appended

Queries multiple fields with each being a Not boolean operation

The fields. The query. A new with the clause appended

Queries multiple fields with each being a Not boolean operation

The fields. The query. A new with the clause appended

Queries on multiple fields with custom-defined inclusions

The fields. The operations. The query. A new with the clause appended

Queries on multiple fields with custom-defined inclusions

The fields. The operations. The query. A new with the clause appended
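
The grouped operations apply one query to several fields at once. As a rough sketch of the idea, the Python below renders each field/query pair with the Lucene prefix for its boolean operation (`+` for And, `-` for Not, nothing for Or). The helper names and the flat string output are assumptions made for illustration; the real implementation builds Lucene query objects and may nest the clauses differently.

```python
# Lucene query-syntax prefixes for each boolean operation (illustrative mapping).
PREFIX = {"and": "+", "or": "", "not": "-"}


def grouped(fields, query, operation):
    """Apply the same operation to every field (hypothetical helper)."""
    return " ".join(f"{PREFIX[operation]}{field}:{query}" for field in fields)


def grouped_flexible(fields, operations, query):
    """Apply a separately chosen operation to each field (hypothetical helper)."""
    if len(fields) != len(operations):
        raise ValueError("fields and operations must have the same length")
    return " ".join(
        f"{PREFIX[op]}{field}:{query}" for field, op in zip(fields, operations)
    )


print(grouped(["nodeName", "bodyText"], "umbraco", "and"))
print(grouped_flexible(["nodeName", "bodyText"], ["and", "not"], "umbraco"))
```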

Orders the results by the specified fields

The field names. A new with the clause appended

Orders the results by the specified fields in a descending order

The field names. A new with the clause appended

Gets the boolean operation which this query method will be added as

The boolean operation.

An Examine searcher which uses Lucene.Net as the

Default constructor

Constructor to allow for creating an indexer at runtime

Used as a singleton instance

Initializes the provider.

The friendly name of the provider. A collection of the name/value pairs representing the provider-specific attributes specified in the configuration for this provider. The name of the provider is null. The name of the provider has a length of zero. An attempt is made to call on a provider after the provider has already been initialized.

Do not access this object directly. The public property ensures that the folder state is always up to date

Simple search method which defaults to searching content nodes

Searches the data source using the Examine Fluent API

The fluent API search.

Creates search criteria that defaults to IndexType.Any and BooleanOperation.And

Creates an instance of SearchCriteria for the provider

Creates an instance of SearchCriteria for the provider

The type of data in the index. The default operation. A blank SearchCriteria

Gets the searcher for this instance

Checks if the reader is current, closed or not up to date

The reader status. Performs error checking, as the reader may be closed.

This checks if the singleton IndexSearcher is initialized and up to date.

Used to specify if leading wildcards are allowed. WARNING SLOWS PERFORMANCE WHEN ENABLED!

Directory where the Lucene.NET Index resides

The analyzer to use when searching content, by default, this is set to StandardAnalyzer

Name of the Lucene.NET index set

An implementation of the fluent API boolean operations

Sets the next operation to be AND

Sets the next operation to be OR

Sets the next operation to be NOT

Compiles this instance for fluent API conclusion

Static methods to help query umbraco xml

Converts a content node to XDocument

true if data is going to be returned from cache. If the type of node is not a Document, cacheOnly has no effect; the API is used to return the XML.

Converts an to a

Node to convert. Converted node.

Creates an from the collection of

Elements to create document from. Document containing elements.

Converts an umbraco library call to an XDocument

Checks if the XElement is an umbraco property based on an alias. This works for both types of schemas

Returns true if the XElement is recognized as an umbraco xml NODE (doc type)

This takes into account both schemas and returns the node type alias. If this isn't recognized as an element node, this returns an empty string

Returns the property value for the doc type element (such as id, path, etc...) If the element is not an umbraco doc type node, or the property name isn't found, it returns String.Empty

Returns umbraco value for a data element with the specified alias.

Extension methods for IndexSet

Convert the indexset to indexerdata. This detects if there are no user/system fields specified and if not, uses the data service to look them up and update the in memory IndexSet.

Returns a string array of all fields that are indexed including Umbraco fields

Returns a list of ALL property names for all nodes defined in the data source

Returns a list of ALL system property names for all nodes defined in the data source

Removes HTML markup from a string
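
As a minimal sketch of what stripping markup can look like, the Python below removes anything that looks like a tag with a regular expression. This is an assumption about the general approach, not the library's actual code, and `strip_html` is a hypothetical name. A regex like this is naive: it does not decode entities or handle a `>` inside an attribute value.

```python
import re

# Matches anything that looks like an opening, closing, or self-closing tag.
_TAG = re.compile(r"<[^>]*>")


def strip_html(text):
    """Remove tag-like markup from a string (naive, illustrative)."""
    return _TAG.sub("", text)


print(strip_html("<p>Hello <b>world</b></p>"))
```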

Gets published content by xpath

This is quite an intensive operation... get all root content, then get the XML structure for all children, then run xpath against the navigator that's created

Unfortunately, we need to implement our own IsProtected method since the Umbraco core code requires an HttpContext for this method and when we're running async, there is no context

Unfortunately, we need to implement our own IsProtected method since the Umbraco core code requires an HttpContext for this method and when we're running async, there is no context

Returns a list of all of the user defined property names in Umbraco

Returns a list of all system field names in Umbraco

Adds a single character wildcard to the string for Lucene wildcard matching

The string to wildcard. An IExamineValue for the required operation. Thrown when the string is null or empty.

Adds a multi-character wildcard to a string for Lucene wildcard matching

The string to wildcard. An IExamineValue for the required operation. Thrown when the string is null or empty.

Configures the string for fuzzy matching in Lucene using the default fuzziness level

The string to configure fuzzy matching on. An IExamineValue for the required operation. Thrown when the string is null or empty.

Configures the string for fuzzy matching in Lucene using the supplied fuzziness level

The string to configure fuzzy matching on. The fuzziness level. An IExamineValue for the required operation. Thrown when the string is null or empty.
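
In Lucene's classic query syntax, `?` matches a single character, `*` matches any run of characters, and a trailing `~` (optionally followed by a number) requests fuzzy matching. The Python sketch below shows how the wildcard and fuzzy helpers could decorate a term. The function names are hypothetical, appending the wildcard at the end of the term is an assumption, and the default fuzziness of 0.5 is assumed from classic Lucene rather than taken from this document.

```python
def _require(text):
    """Mirror the documented behaviour of rejecting null or empty strings."""
    if not text:
        raise ValueError("the string must not be null or empty")
    return text


def single_char_wildcard(text):
    return _require(text) + "?"  # matches exactly one extra character


def multi_char_wildcard(text):
    return _require(text) + "*"  # matches zero or more extra characters


def fuzzy(text, fuzziness=0.5):
    # 0.5 is an assumed default, not confirmed by the documentation above.
    return f"{_require(text)}~{fuzziness}"


print(single_char_wildcard("te"), multi_char_wildcard("umb"), fuzzy("roam", 0.8))
```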

Configures the string for boosting in Lucene

The string to boost. The boost level. An IExamineValue for the required operation. Thrown when the string is null or empty.

Configures the string for proximity matching

The string to match by proximity. The proximity level. An IExamineValue for the required operation. Thrown when the string is null or empty.

Escapes the string within Lucene

The string to escape. An IExamineValue for the required operation. Thrown when the string is null or empty.
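
Boosting, proximity matching and escaping also map onto Lucene's classic query syntax: `term^n` boosts a term, `"a b"~n` matches the words within n positions of each other, and a backslash escapes a special character. The Python sketch below models that syntax only. The helper names are hypothetical and the character set follows the classic Lucene query parser; how UmbracoExamine produces these values internally is not shown here.

```python
# Characters treated as special by the classic Lucene query parser.
SPECIAL = set('+-&|!(){}[]^"~*?:\\')


def escape(text):
    """Prefix every special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


def boost(text, level):
    """Raise the relevance of a term, e.g. umbraco^2."""
    return f"{text}^{level}"


def proximity(text, distance):
    """Match the words of a phrase within `distance` positions of each other."""
    return f'"{text}"~{distance}'


print(escape("(1+1):2"))
print(boost("umbraco", 2))
print(proximity("umbraco examine", 3))
```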

Sets up an for an additional Examiness

The IExamineValue to continue working with. The string to postfix. Combined strings

Converts an Examine boolean operation to a Lucene representation

The operation. The translated Boolean operation

Converts a Lucene boolean occurrence to an Examine representation

The occurrence to translate. The translated boolean occurrence

This is a Lucene.Net Examine indexer for Umbraco

Some links picked up along the way:
A matrix of concurrent Lucene operations: http://www.jguru.com/faq/view.jsp?EID=913302
Based on the info here, it is best to call optimize only when there is no activity, so we optimize only after the queue has been processed and at start up:
http://www.gossamer-threads.com/lists/lucene/java-dev/47895
http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/index/IndexWriter.html

The prefix added to a field when it is included in the index for sorting

Specifies how many index commits are performed before running an optimization

Used to store a non-tokenized key for the document

Used to store a non-tokenized type for the document

Used to store the path of a content object

Default constructor

Constructor to allow for creating an indexer at runtime

Sets up all properties for the indexer based on the configuration information specified. This will ensure that all of the folders required by the indexer are created and exist. This will also create an instruction file declaring the name of the computer that is taking part in the indexing. This file will then be used to determine the master indexer machine in a load balanced environment (if one exists).

The friendly name of the provider. A collection of the name/value pairs representing the provider-specific attributes specified in the configuration for this provider. The name of the provider is null. The name of the provider has a length of zero. An attempt is made to call on a provider after the provider has already been initialized.

Used to perform thread locking

used to thread lock calls for creating and verifying folders

Used for double check locking during an index operation

We need an internal searcher used to search against our own index. This is used for finding all descendant nodes of a current node when deleting indexes.

Forces a particular XML node to be reindexed

XML node to reindex Type of index to use

Rebuilds the entire index from scratch for all index types

This will completely delete the index and recreate it

Deletes a node from the index.

When a content node is deleted, we also need to delete its children from the index, so we need to perform a custom Lucene search to find all descendants and create Delete item queues for them too. ID of the node to delete.

Re-indexes all data for the index type specified

Adds a single node to the index. If the node already exists, a duplicate will probably be created. To re-index, use the ReIndexNode method.

The node to index. The type to store the node as.

This will optimize the index for searching. It is executed when this class instance is instantiated.

This can be an expensive operation and should only be called when there is no indexing activity

Removes the specified term from the index

True if the term was deleted successfully or there were no errors

Ensures that the node being indexed is of a correct type and is a descendant of the parent id specified.

Collects all of the data that needs to be indexed as defined in the index set.

A dictionary representing the data which will be indexed

Collects the data for the fields and adds the document which is then committed into Lucene.Net's index

The fields and their associated data. The writer that will be used to update the Lucene index. The node id. The type to index the node as. The path of the content node. This will normalize (lowercase) all text before it goes into the index.

Processes all of the queue items. This checks if this machine is the Executive and if it's in a load balanced environment. It then acts accordingly: if not the Executive, it doesn't index; if in async mode, it uses the file watcher timer.

Loops through all files in the queue item folder and indexes them. Regardless of whether this machine is the executive indexer or not, or is in a load balanced environment or not, this WILL attempt to process the queue items into the index.

The number of queue items processed. Inheritors should be very careful when using this method; SafelyProcessQueueItems will ensure that the correct machine processes the items into the index. SafelyQueueItems calls this method if it confirms that this machine is the one to process the queue.

Returns an XDocument for the entire tree stored for the IndexType specified.

The xpath to the node. The type of data to request from the data service. Either the Content or Media xml. If the type is not one of those specified, null is returned.

Saves a file indicating that the executive indexer should remove from the index those items that match the term saved in this file. This will save a file prefixed with the current machine name with an extension of .del

Writes the information for the fields to a file named with the name of the computer that is running the index and a GUID value. The indexer will then index the values stored in the files in another thread so that processing may continue. This will save a file prefixed with the current machine name with an extension of .add

The fields. The node id. The type. The path of the content node

This makes sure that the folders exist, that the executive indexer is set up and that the index is optimized. This is called at app startup when the providers are initialized, and called again if folders are missing during an indexing operation.

Handles the file watcher timer poll elapsed event. This will:
- Disable the FileSystemWatcher
- Recursively process all queue items in the folder and check after processing if any more files have been added
- Once there are no more files to be processed, re-enable the watcher
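
The steps above can be sketched as a drain loop. The Python below is an illustration under stated assumptions: the watcher is modelled as a plain dictionary flag, `list_queue_files` and `process_file` are hypothetical callbacks, and the recursion described above is replaced with an equivalent loop.

```python
def on_timer_elapsed(watcher, list_queue_files, process_file):
    """Drain the queue folder while the watcher is disabled.

    Returns the number of queue items processed.
    """
    watcher["enabled"] = False  # stop reacting to new files while draining
    processed = 0
    try:
        while True:
            batch = list_queue_files()
            if not batch:  # nothing was added during the last pass
                break
            for name in batch:
                process_file(name)
                processed += 1
    finally:
        watcher["enabled"] = True  # always re-enable, even after an error
    return processed


# Usage with an in-memory queue standing in for the queue folder.
queue = ["1.add", "2.del"]
watcher = {"enabled": True}
count = on_timer_elapsed(watcher, lambda: list(queue), queue.remove)
print(count, watcher["enabled"])
```

Re-enabling the watcher in a `finally` block is a design choice of this sketch, so that a failed queue item does not leave the watcher switched off.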

Checks the writer passed in to see if it is active; if not, checks if the index is locked. If it is locked, checks if the reader is not null and tries to close it. If it's still locked, returns null; otherwise creates a new writer.

Checks the reader passed in to see if it is active; if not, checks if the index is locked. If it is locked, checks if the writer is not null and tries to close it. If it's still locked, returns null; otherwise creates a new reader.

Reads the FileInfo passed in into a dictionary object and deletes it from the index

Reads the FileInfo passed in into a dictionary object and adds it to the index

All field data will be stored in Lucene as is, except for dates, which can be stored in the standard format yyyyMMdd. Any standard text will be put in lower case.
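
A small Python sketch of this normalisation, for illustration only: dates become yyyyMMdd strings and other values are lowercased. `normalise` is a hypothetical name. One useful property of the yyyyMMdd format is that plain string ordering matches chronological ordering, which suits range queries and sorting on text terms.

```python
from datetime import date


def normalise(value):
    """Format dates as yyyyMMdd and lowercase everything else."""
    if isinstance(value, date):  # datetime is a subclass of date
        return value.strftime("%Y%m%d")
    return str(value).lower()


print(normalise(date(2010, 3, 7)))
print(normalise("Home Page"))

# String order equals date order for this format.
dates = [date(2011, 1, 2), date(2010, 12, 31), date(2010, 3, 7)]
print(sorted(normalise(d) for d in dates))
```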

Adds all nodes with the given xPath root.

The x path. The type.

Creates the folder if it does not exist.

Checks if the index is ready to open/write to.

Check if there is an index in the index folder

Checks the disposal state of the objects

When the object is disposed, all data should be written

Releases unmanaged and - optionally - managed resources

true to release both managed and unmanaged resources; false to release only unmanaged resources.

The data service used for retrieving data from and submitting data to the CMS

The analyzer to use when indexing content, by default, this is set to StandardAnalyzer

Used to keep track of how many index commits have been performed. This is used to determine when index optimization needs to occur.
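
As a rough sketch of how a commit counter can drive optimization, the Python below runs an optimize callback once a threshold of commits is reached. `CommitTracker`, the threshold value and the reset of the counter after optimizing are all assumptions made for illustration; the documentation above only states that the count is used to decide when to optimize.

```python
class CommitTracker:
    """Counts commits and triggers optimization at a threshold (illustrative)."""

    def __init__(self, optimize_every, optimize):
        self.optimize_every = optimize_every  # hypothetical threshold
        self.optimize = optimize              # callback that optimizes the index
        self.commits = 0

    def record_commit(self):
        self.commits += 1
        if self.commits >= self.optimize_every:
            self.optimize()
            self.commits = 0  # assumed: counting restarts after optimizing


runs = []
tracker = CommitTracker(optimize_every=3, optimize=lambda: runs.append("optimized"))
for _ in range(7):
    tracker.record_commit()
print(len(runs), tracker.commits)
```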

Indicates whether or not this system will process the queue items asynchronously. Default is true.

The interval (in seconds) specified for the timer to process index queue items. This is only relevant if is true.

The folder that stores the Lucene Index files

The folder that stores the index queue files

The Executive to determine if this is the master indexer

The index set name which references an Examine

By default this is false, if set to true then the indexer will include indexing content that is flagged as publicly protected. This property is ignored if SupportUnpublishedContent is set to true.

Occurs when [index optimizing].

Occurs when [document writing].

Determines if the manager will call the indexing methods when content is saved or deleted as opposed to cache being updated.

Data service used to query for media

This is quite an intensive operation... get all root media, then get the XML structure for all children, then run xpath against the navigator that's created

Deletes all files in the folder and returns the number deleted.

An implementation of the search results returned from Lucene.Net

Internal cache of search results

Creates the search result from a

The doc to convert. The score. A populated search result object

Skips to a particular point in the search results.

This allows for lazy loading of the results paging. We don't go into Lucene until we have to. The number of items in the results to skip. A collection of the search results
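
A minimal Python sketch of this kind of lazy paging, assuming the search has already produced a list of document ids and that loading a full document is the expensive step. `LazyResults` and `load_doc` are hypothetical names. Documents before the skip point are never loaded, and loaded documents are kept in an internal cache.

```python
from itertools import islice


class LazyResults:
    """Loads documents only when they are actually enumerated (illustrative)."""

    def __init__(self, doc_ids, load_doc):
        self._doc_ids = doc_ids
        self._load_doc = load_doc  # expensive lookup, called on demand
        self._cache = {}           # position -> loaded document

    def skip(self, count):
        for position in range(count, len(self._doc_ids)):
            if position not in self._cache:
                self._cache[position] = self._load_doc(self._doc_ids[position])
            yield self._cache[position]

    def __iter__(self):
        return self.skip(0)  # enumeration starts at position 0


results = LazyResults([10, 11, 12, 13], lambda doc_id: f"doc-{doc_id}")
page = list(islice(results.skip(2), 1))  # page of size 1 starting at item 2
print(page)
```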

Gets the enumerator starting at position 0

A collection of the search results

Returns an enumerator that iterates through a collection.

An object that can be used to iterate through the collection.

Gets the total number of results for the search

The total items from the search.

An instance for wiring up Examine to the Umbraco events system

Creates a new instance of the class

Only index using providers that SupportUnpublishedContent

Only remove indexes using providers that SupportUnpublishedContent

Only update indexes for providers that don't SupportUnpublishedContent

Only update indexes for providers that don't SupportUnpublishedContent

Defines XPath statements that map to specific umbraco nodes

The path of the folder where the Lucene index is stored

The index path. This can be set at runtime but will not be persisted to the configuration file

Returns the DirectoryInfo object for the index path.

The index directory.

When this property is set, the indexing will only index documents that are children of this node.

The collection of node types to index, if not specified, all node types will be indexed (apart from the ones specified in the ExcludeNodeTypes collection).

The collection of node types to not index. If specified, these node types will not be indexed.
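
The parent node and node type settings act together as a filter on what gets indexed. The Python sketch below illustrates one plausible reading. It is an assumption that exclusions win over inclusions, that a node's path is a comma-separated list of ancestor ids ending with its own id, and that the parent node itself does not count as its own descendant. `should_index` is a hypothetical helper.

```python
def should_index(alias, path, include_types=(), exclude_types=(), parent_id=None):
    """Decide whether a node passes the index set filters (illustrative)."""
    if alias in exclude_types:
        return False
    # An empty include list means every node type is allowed.
    if include_types and alias not in include_types:
        return False
    if parent_id is not None:
        ancestors = path.split(",")[:-1]  # drop the node's own id
        return str(parent_id) in ancestors
    return True


print(should_index("newsItem", "-1,1050,1062", parent_id=1050))
print(should_index("newsItem", "-1,1050,1062", exclude_types=["newsItem"]))
```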

A collection of user defined umbraco fields to index

If this property is not specified, or if it's an empty collection, the default user fields will be all user fields defined in Umbraco

The Umbraco field values that will be indexed, e.g. id, nodeTypeAlias, writer, etc.

If this is not specified, or if it's an empty collection, the default options will be used:
- id
- version
- parentID
- level
- writerID
- creatorID
- nodeType
- template
- sortOrder
- createDate
- updateDate
- nodeName
- urlName
- writerName
- creatorName
- nodeTypeAlias
- path

Event arguments for a Document Writing event

Lucene.NET Document, including all previously added fields

Fields of the indexer

NodeId of the document being written

A class that defines the type of index for each Umbraco field (non user defined fields). A lot of standard Umbraco fields shouldn't be tokenized or even indexed, just stored in Lucene for retrieval after searching.

Returns the index policy for the field name passed in; if not found, returns normal

This class is used to query against Lucene.Net

Returns a that represents this instance.

A that represents this instance.

Query on the id

The id. A new with the clause appended

Query on the NodeName

Name of the node. A new with the clause appended

Query on the NodeName

Name of the node. A new with the clause appended

Query on the NodeTypeAlias

The node type alias. A new with the clause appended

Query on the NodeTypeAlias

The node type alias. A new with the clause appended

Query on the Parent ID

The id of the parent. A new with the clause appended

Query on the specified field

Name of the field. The field value. A new with the clause appended

Query on the specified field

Name of the field. The field value. A new with the clause appended

Returns the Lucene query object for a field given an IExamineValue

A new with the clause appended

Creates our own style 'multi field query' used internally for the grouped operations

A new with the clause appended

Passes a raw search query to the provider to handle

The query. A new with the clause appended

Orders the results by the specified fields

The field names. A new with the clause appended

Orders the results by the specified fields in a descending order

The field names. A new with the clause appended

Internal operation for adding the ordered results

if set to true [descending]. The field names. A new with the clause appended

Gets the boolean operation which this query method will be added as

The boolean operation.

Default property for accessing Index Sets

Manages the delegation of authority over which machine in a load balanced environment will perform the indexing. This is done by an IO race on initialization of the LuceneExamineIndexer. If a server's app pool is recycled at a separate time from the rest of the servers in the cluster, it will generally take over the executive role (this is dependent on the time that the most recent server's app pool was restarted). The Executive is determined by a file lock (.lck) file; theoretically there should only be one of these. If there is only 1 server in the cluster, then it is the Executive.

Determines if the executive has been initialized. This is useful for checking if files have been deleted during website operations.

Fired every 10 minutes by the timer object. This timestamps the EXA file to ensure the system knows that this server is active. This is to ensure that all systems in a load balanced environment are aware of exactly how many other servers are taking part in the load balancing and who they are.

Creates an xml file to declare that this machine is taking part in the index writing. This is used to determine the master indexer if this app exists in a load balanced environment.

Creates a lock file for this machine if there aren't other ones.

Returns true if a lock file was successfully created for this machine.

Deletes all old lck files (any that are more than cutoffTime old)

Deletes all old exa files (any that are more than cutoffTime old)

Gets all lck files that are not named after this machine. If there are any, this means that another machine has won the race and created the lck file for itself. If there is a lck file with the current machine's name, then this must mean it was previously the master indexer and the app pool has recycled in less than the hour.

Updates the timestamp for lck file if it exists

Updates the timestamp for the exa file

Reads the machine's EXA file

Reads the machine's LCK file

This will check for any lock files not created by the current machine. If there are any, then this machine will flag its exa file as not being the master indexer; otherwise, it will try to create its own lock file to let others know it is the race winner and therefore the master indexer. If this succeeds, it will update its exa file to flag it as the master indexer.
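
The race described above can be sketched with ordinary file operations. The Python below is a simplified illustration, not the library's implementation: naming the lock file `<machine>.lck` is an assumption, `try_become_executive` is a hypothetical helper, and the exa file bookkeeping is left out. Note that checking for other lock files and then creating one is not atomic across machines, so a real implementation needs some way to resolve a tie.

```python
import os
import tempfile


def try_become_executive(folder, machine_name):
    """Return True if this machine holds the lock file, False if another does."""
    own = f"{machine_name}.lck"
    others = [n for n in os.listdir(folder) if n.endswith(".lck") and n != own]
    if others:
        return False  # another machine already won the race
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(os.path.join(folder, own), os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return True  # our own lock survived an earlier run
    os.close(fd)
    return True


with tempfile.TemporaryDirectory() as folder:
    print(try_become_executive(folder, "WEB1"))
    print(try_become_executive(folder, "WEB2"))
```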

When the object is disposed, all data should be written

Ensures there is an elected Executive, otherwise starts the race. Returns a bool as to whether or not this is the Executive machine.

Returns a boolean indicating whether or not this server is involved in a load balanced environment with Umbraco Examine.

Returns the machine name of the executive indexer

The number of servers active in indexing

Gets the doc id at a specified index

The index.

Gets the doc score for a doc at a specified index

The index.