UmbracoExamine
Adds an index field to the collection
Default property for accessing an IndexField definition
Field Name
Initializes a new instance of the class.
The search.
The occurrence.
Query on the id
The id.
A new with the clause appended
Query on the NodeName
Name of the node.
A new with the clause appended
Query on the NodeTypeAlias
The node type alias.
A new with the clause appended
Query on the Parent ID
The id of the parent.
A new with the clause appended
Query on the specified field
Name of the field.
The field value.
A new with the clause appended
Queries a range of values on the specified field.
Name of the field.
The start.
The end.
A new with the clause appended
Queries a range of values on the specified field.
Name of the field.
The start.
The end.
If set to true, the lower bound is included in the range.
If set to true, the upper bound is included in the range.
A new with the clause appended
Queries a range of values on the specified field.
Name of the field.
The start.
The end.
A new with the clause appended
Queries a range of values on the specified field.
Name of the field.
The start.
The end.
If set to true, the lower bound is included in the range.
If set to true, the upper bound is included in the range.
A new with the clause appended
Queries a range of values on the specified field.
Name of the field.
The start.
The end.
A new with the clause appended
Queries a range of values on the specified field.
Name of the field.
The start.
The end.
If set to true, the lower bound is included in the range.
If set to true, the upper bound is included in the range.
A new with the clause appended
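The two flags on the range overloads decide whether each bound is itself part of the range. The following is a minimal Python sketch of that semantics only. The helper name `in_range` and its parameter names are hypothetical and are not part of Examine.

```python
def in_range(value, start, end, include_lower, include_upper):
    """Return True if value lies between start and end under the given bound flags."""
    # An inclusive bound accepts equality; an exclusive bound does not.
    above = value >= start if include_lower else value > start
    below = value <= end if include_upper else value < end
    return above and below


# With both flags set, the end points match; with both cleared, they do not.
print(in_range(5, 5, 10, True, True))
print(in_range(5, 5, 10, False, False))
```

The same comparison works for any ordered type (numbers, dates, strings), which is presumably why several overloads share one description.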
Query on the NodeName
Name of the node.
A new with the clause appended
Query on the NodeTypeAlias
The node type alias.
A new with the clause appended
Query on the specified field
Name of the field.
The field value.
A new with the clause appended
Queries multiple fields with each being an And boolean operation
The fields.
The query.
A new with the clause appended
Queries multiple fields with each being an And boolean operation
The fields.
The query.
A new with the clause appended
Queries multiple fields with each being an Or boolean operation
The fields.
The query.
A new with the clause appended
Queries multiple fields with each being an Or boolean operation
The fields.
The query.
A new with the clause appended
Queries multiple fields with each being a Not boolean operation
The fields.
The query.
A new with the clause appended
Queries multiple fields with each being a Not boolean operation
The fields.
The query.
A new with the clause appended
Queries on multiple fields, with the inclusion of each field custom defined
The fields.
The operations.
The query.
A new with the clause appended
Queries on multiple fields, with the inclusion of each field custom defined
The fields.
The operations.
The query.
A new with the clause appended
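As a rough illustration of a grouped query where each field carries its own operation, here is a Python sketch that renders the classic Lucene query syntax: `+` for a required clause, `-` for a prohibited clause, and no prefix for an optional one. The function `grouped_query` is hypothetical, it assumes a single query term shared by all fields, and it is not how Examine builds its query objects internally.

```python
# Classic Lucene syntax prefixes for each kind of boolean operation.
PREFIX = {"and": "+", "or": "", "not": "-"}


def grouped_query(fields, operations, query):
    """Build one clause per field, prefixed according to that field's operation."""
    if len(fields) != len(operations):
        raise ValueError("each field needs exactly one operation")
    clauses = [
        f"{PREFIX[op]}{field}:{query}"
        for field, op in zip(fields, operations)
    ]
    return " ".join(clauses)


# "bodyText" is an illustrative user field name, not one defined by the source.
print(grouped_query(["nodeName", "bodyText"], ["and", "not"], "umbraco"))
```

The GroupedAnd, GroupedOr and GroupedNot style methods correspond to passing the same operation for every field.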
Orders the results by the specified fields
The field names.
A new with the clause appended
Orders the results by the specified fields in a descending order
The field names.
A new with the clause appended
Gets the boolean operation which this query method will be added as
The boolean operation.
An Examine searcher which uses Lucene.Net as the
Default constructor
Constructor to allow for creating an indexer at runtime
Used as a singleton instance
Initializes the provider.
The friendly name of the provider.
A collection of the name/value pairs representing the provider-specific attributes specified in the configuration for this provider.
The name of the provider is null.
The name of the provider has a length of zero.
An attempt is made to call on a provider after the provider has already been initialized.
Do not access this object directly. The public property ensures that the folder state is always up to date
Simple search method which defaults to searching content nodes
Searches the data source using the Examine Fluent API
The fluent API search.
Creates search criteria that defaults to IndexType.Any and BooleanOperation.And
Creates an instance of SearchCriteria for the provider
Creates an instance of SearchCriteria for the provider
The type of data in the index.
The default operation.
A blank SearchCriteria
Gets the searcher for this instance
Checks if the reader is current, closed or not up to date
The reader status
Performs error checking as the reader may be closed
This checks if the singleton IndexSearcher is initialized and up to date.
Used to specify if leading wildcards are allowed. WARNING SLOWS PERFORMANCE WHEN ENABLED!
Directory where the Lucene.NET Index resides
The analyzer to use when searching content, by default, this is set to StandardAnalyzer
Name of the Lucene.NET index set
An implementation of the fluent API boolean operations
Sets the next operation to be AND
Sets the next operation to be OR
Sets the next operation to be NOT
Compiles this instance for fluent API conclusion
Static methods to help query umbraco xml
Converts a content node to XDocument
true if data is going to be returned from cache
If the type of node is not a Document, the cacheOnly has no effect, it will use the API to return
the xml.
Converts an to a
Node to convert
Converted node
Creates an from the collection of
Elements to create document from
Document containing elements
Converts an umbraco library call to an XDocument
Checks if the XElement is an umbraco property based on an alias.
This works for both types of schemas
Returns true if the XElement is recognized as an umbraco xml NODE (doc type)
This takes into account both schemas and returns the node type alias.
If this isn't recognized as an element node, this returns an empty string
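A sketch of how one lookup can serve both schemas, written in Python. It assumes the legacy schema uses `<node nodeTypeAlias="...">` elements and the newer schema names the element after the document type and marks it with an `isDoc` attribute. Those schema details and the function name `node_type_alias` are assumptions for illustration and are not stated in this file.

```python
import xml.etree.ElementTree as ET


def node_type_alias(element):
    """Return the node type alias of a doc type element, or "" if it is not one."""
    # Assumed legacy schema: <node nodeTypeAlias="Home" ... />
    if element.tag == "node":
        return element.get("nodeTypeAlias", "")
    # Assumed newer schema: <Home isDoc="" ... />
    if element.get("isDoc") is not None:
        return element.tag
    # Anything else (for example a property element) is not a doc type node.
    return ""


legacy = ET.fromstring('<node id="1050" nodeTypeAlias="Home" />')
newer = ET.fromstring('<Home id="1050" isDoc="" />')
print(node_type_alias(legacy), node_type_alias(newer))
```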
Returns the property value for the doc type element (such as id, path, etc...)
If the element is not an umbraco doc type node, or the property name isn't found, it returns String.Empty
Returns umbraco value for a data element with the specified alias.
Extension methods for IndexSet
Convert the indexset to indexerdata.
This detects if there are no user/system fields specified and if not, uses the data service to look them
up and update the in memory IndexSet.
Returns a string array of all fields that are indexed including Umbraco fields
Returns a list of ALL properties names for all nodes defined in the data source
Returns a list of ALL system property names for all nodes defined in the data source
Removes HTML markup from a string
Gets published content by xpath
This is quite an intensive operation...
get all root content, then get the XML structure for all children,
then run xpath against the navigator that's created
Unfortunately, we need to implement our own IsProtected method since
the Umbraco core code requires an HttpContext for this method and when we're running
async, there is no context
Unfortunately, we need to implement our own IsProtected method since
the Umbraco core code requires an HttpContext for this method and when we're running
async, there is no context
Returns a list of all of the user defined property names in Umbraco
Returns a list of all system field names in Umbraco
Adds a single character wildcard to the string for Lucene wildcard matching
The string to wildcard.
An IExamineValue for the required operation
Thrown when the string is null or empty
Adds a multi-character wildcard to a string for Lucene wildcard matching
The string to wildcard.
An IExamineValue for the required operation
Thrown when the string is null or empty
Configures the string for fuzzy matching in Lucene using the default fuzziness level
The string to configure fuzzy matching on.
An IExamineValue for the required operation
Thrown when the string is null or empty
Configures the string for fuzzy matching in Lucene using the supplied fuzziness level
The string to configure fuzzy matching on.
The fuzziness level.
An IExamineValue for the required operation
Thrown when the string is null or empty
Configures the string for boosting in Lucene
The string to boost.
The boost level.
An IExamineValue for the required operation
Thrown when the string is null or empty
Configures the string for proximity matching
The string to configure proximity matching on.
The proximity level.
An IExamineValue for the required operation
Thrown when the string is null or empty
Escapes the string within Lucene
The string to escape.
An IExamineValue for the required operation
Thrown when the string is null or empty
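For orientation, the string helpers above map onto operators of the classic Lucene query syntax: `?` and `*` for wildcards, `~` for fuzziness and proximity, `^` for boosting, and a backslash for escaping. The Python sketch below renders those operators as plain strings. The function names are hypothetical, the fuzziness level is always passed explicitly because the default value is not stated here, and the real extension methods return an IExamineValue rather than a string.

```python
# Characters that the classic Lucene query syntax treats as operators.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\')


def _require(value):
    """Mirror the documented behaviour of rejecting null or empty strings."""
    if not value:
        raise ValueError("value must not be null or empty")
    return value


def single_char_wildcard(value):
    return _require(value) + "?"


def multi_char_wildcard(value):
    return _require(value) + "*"


def fuzzy(value, level):
    return f"{_require(value)}~{level}"


def boost(value, level):
    return f"{_require(value)}^{level}"


def proximity(phrase, distance):
    # Proximity applies to a quoted phrase rather than a single term.
    return f'"{_require(phrase)}"~{distance}'


def escape(value):
    # Prefix every operator character with a backslash.
    return "".join("\\" + c if c in LUCENE_SPECIAL else c for c in _require(value))


print(multi_char_wildcard("umb"), fuzzy("roam", 0.8), boost("umbraco", 2))
```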
Sets up an for an additional Examiness
The IExamineValue to continue working with.
The string to postfix.
Combined strings
Converts an Examine boolean operation to a Lucene representation
The operation.
The translated Boolean operation
Converts a Lucene boolean occurrence to an Examine representation
The occurrence to translate.
The translated boolean occurrence
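The two conversions above are inverses of each other. Lucene's boolean clauses use the occurrences MUST, SHOULD and MUST_NOT, so a plausible mapping is And to MUST, Or to SHOULD and Not to MUST_NOT. The Python sketch below shows that round trip. The enum and function names are illustrative stand-ins, and the exact mapping is an assumption rather than something this file spells out.

```python
from enum import Enum


class BooleanOperation(Enum):
    """Stand-in for the Examine boolean operations."""
    AND = "And"
    OR = "Or"
    NOT = "Not"


class Occur(Enum):
    """Stand-in for the Lucene clause occurrences."""
    MUST = "MUST"
    SHOULD = "SHOULD"
    MUST_NOT = "MUST_NOT"


# Assumed mapping: required, optional and prohibited clauses.
_TO_LUCENE = {
    BooleanOperation.AND: Occur.MUST,
    BooleanOperation.OR: Occur.SHOULD,
    BooleanOperation.NOT: Occur.MUST_NOT,
}
_TO_EXAMINE = {occur: op for op, occur in _TO_LUCENE.items()}


def to_lucene_occurrence(operation):
    return _TO_LUCENE[operation]


def to_boolean_operation(occurrence):
    return _TO_EXAMINE[occurrence]


print(to_lucene_occurrence(BooleanOperation.NOT).name)
```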
This is a Lucene.Net Examine indexer for Umbraco
Some links picked up along the way:
A matrix of concurrent lucene operations:
http://www.jguru.com/faq/view.jsp?EID=913302.
Based on the info here, it is best to only call optimize when there is no activity,
so we only optimize after the queue has been processed and at start up:
http://www.gossamer-threads.com/lists/lucene/java-dev/47895
http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/index/IndexWriter.html
The prefix added to a field when it is included in the index for sorting
Specifies how many index commits are performed before running an optimization
Used to store a non-tokenized key for the document
Used to store a non-tokenized type for the document
Used to store the path of a content object
Default constructor
Constructor to allow for creating an indexer at runtime
Set up all properties for the indexer based on configuration information specified. This will ensure that
all of the folders required by the indexer are created and exist. This will also create an instruction
file declaring the name of the computer that is taking part in the indexing. This file will then be used to
determine the master indexer machine in a load balanced environment (if one exists).
The friendly name of the provider.
A collection of the name/value pairs representing the provider-specific attributes specified in the configuration for this provider.
The name of the provider is null.
The name of the provider has a length of zero.
An attempt is made to call on a provider after the provider has already been initialized.
Used to perform thread locking
Used to thread lock calls for creating and verifying folders
Used for double check locking during an index operation
We need an internal searcher used to search against our own index.
This is used for finding all descendant nodes of a current node when deleting indexes.
Forces a particular XML node to be reindexed
XML node to reindex
Type of index to use
Rebuilds the entire index from scratch for all index types
This will completely delete the index and recreate it
Deletes a node from the index.
When a content node is deleted, we also need to delete its children from the index, so we need to perform a
custom Lucene search to find all descendants and create Delete item queues for them too.
ID of the node to delete
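The descendant lookup can be pictured as a search over the stored path field. The Python sketch below assumes the path is a comma-separated list of ancestor ids ending in the node's own id (for example `-1,1050,1060`). That format, the in-memory dictionary standing in for the index, and the name `descendant_ids` are assumptions for illustration.

```python
def descendant_ids(paths_by_id, node_id):
    """Return the ids of all nodes whose path passes through node_id."""
    wanted = str(node_id)
    return [
        doc_id
        for doc_id, path in paths_by_id.items()
        # Compare whole path segments so that 1050 does not match 10500.
        if doc_id != node_id and wanted in path.split(",")
    ]


index = {
    1050: "-1,1050",
    1060: "-1,1050,1060",
    1070: "-1,1050,1060,1070",
    2000: "-1,2000",
    10500: "-1,10500",
}
print(descendant_ids(index, 1050))
```

Each id returned would then get its own delete queue item alongside the node that was originally deleted.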
Re-indexes all data for the index type specified
Adds a single node to the index. If the node already exists, a duplicate will probably be created.
To re-index, use the ReIndexNode method.
The node to index.
The type to store the node as.
This will optimize the index for searching. It is executed when this class instance is instantiated.
This can be an expensive operation and should only be called when there is no indexing activity
Removes the specified term from the index
True if the term was successfully deleted or there were no errors
Ensures that the node being indexed is of a correct type and is a descendant of the parent id specified.
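A sketch of such a validation check in Python, combining the include and exclude node type collections with the parent id restriction. The function name `should_index`, the comma-separated path format, and the rule that an exclusion wins over an inclusion are assumptions made for this example.

```python
def should_index(node_type, path, include_types=(), exclude_types=(), parent_id=None):
    """Decide whether a node passes the type filters and the parent restriction."""
    # Assumption: an excluded type is never indexed, even if it is also included.
    if node_type in exclude_types:
        return False
    # An empty include collection means every remaining type is allowed.
    if include_types and node_type not in include_types:
        return False
    if parent_id is not None:
        # The last path segment is the node itself, so only ancestors are checked.
        ancestors = path.split(",")[:-1]
        if str(parent_id) not in ancestors:
            return False
    return True


print(should_index("NewsItem", "-1,1050,1060", parent_id=1050))
```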
Collects all of the data that needs to be indexed as defined in the index set.
A dictionary representing the data which will be indexed
Collects the data for the fields and adds the document which is then committed into Lucene.Net's index
The fields and their associated data.
The writer that will be used to update the Lucene index.
The node id.
The type to index the node as.
The path of the content node
This will normalize (lowercase) all text before it goes in to the index.
Processes all of the queue items. This checks if this machine is the Executive and if it's in a load balanced
environment. It then acts accordingly:
Not the executive = doesn't index, i
In async mode = use file watcher timer
Loop through all files in the queue item folder and index them.
Regardless of whether this machine is the executive indexer or not or is in a load balanced environment
or not, this WILL attempt to process the queue items into the index.
The number of queue items processed
Inheritors should be very careful using this method, SafelyProcessQueueItems will ensure
that the correct machine processes the items into the index. SafelyQueueItems calls this method
if it confirms that this machine is the one to process the queue.
Returns an XDocument for the entire tree stored for the IndexType specified.
The xpath to the node.
The type of data to request from the data service.
Either the Content or Media xml. If the type is not of those specified null is returned
Saves a file indicating that the executive indexer should remove from the index those items that match
the term saved in this file.
This will save a file prefixed with the current machine name with an extension of .del
Writes the information for the fields to a file named with the name of the computer that is running the index and
a GUID value. The indexer will then index the values stored in the files in another thread so that processing may continue.
This will save a file prefixed with the current machine name with an extension of .add
The fields.
The node id.
The type.
The path of the content node
This makes sure that the folders exist, that the executive indexer is setup and that the index is optimized.
This is called at app startup when the providers are initialized, and called again if folders are missing during
an indexing operation.
Handles the file watcher timer poll elapsed event
This will:
- Disable the FileSystemWatcher
- Recursively process all queue items in the folder and check after processing if any more files have been added
- Once there are no more files to be processed, re-enables the watcher
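The polling steps above amount to draining the queue folder until a full pass finds nothing new. A minimal Python sketch follows, using a loop instead of recursion. It assumes queue files end in `.add` or `.del` as described for the queue item files, takes the per-file work as a callback, and leaves out the disabling and re-enabling of the watcher. The name `drain_queue` is hypothetical.

```python
import os


def drain_queue(folder, process_file):
    """Process queue files until a pass finds none; return how many were processed."""
    processed = 0
    while True:
        # Re-list the folder on every pass to pick up files added meanwhile.
        pending = sorted(
            name for name in os.listdir(folder)
            if name.endswith((".add", ".del"))
        )
        if not pending:
            return processed
        for name in pending:
            path = os.path.join(folder, name)
            process_file(path)
            os.remove(path)  # a handled queue item is removed from the folder
            processed += 1
```

A caller would pause its watcher, call `drain_queue`, and resume the watcher once the call returns.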
Checks the writer passed in to see if it is active. If not, checks if the index is locked. If it is locked,
checks if the reader is not null and tries to close it. If the index is still locked, returns null; otherwise
creates a new writer.
Checks the reader passed in to see if it is active. If not, checks if the index is locked. If it is locked,
checks if the writer is not null and tries to close it. If the index is still locked, returns null; otherwise
creates a new reader.
Reads the FileInfo passed in into a dictionary object and deletes it from the index
Reads the FileInfo passed in into a dictionary object and adds it to the index
All field data will be stored into Lucene as is except for dates, these can be stored as standard: yyyyMMdd
Any standard text will be put in lower case format.
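The normalization described here can be sketched in a few lines of Python. Dates become `yyyyMMdd` strings, which sort in chronological order when compared as text, and other values are lower-cased. The function name `normalize_field` is hypothetical.

```python
from datetime import date


def normalize_field(value):
    """Format dates as yyyyMMdd and lower-case any other value."""
    # datetime is a subclass of date, so both are handled by this check.
    if isinstance(value, date):
        return value.strftime("%Y%m%d")
    return str(value).lower()


print(normalize_field(date(2010, 3, 7)), normalize_field("Home Page"))
```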
Adds all nodes with the given xPath root.
The XPath.
The type.
Creates the folder if it does not exist.
Checks if the index is ready to open/write to.
Check if there is an index in the index folder
Checks the disposal state of the objects
When the object is disposed, all data should be written
Releases unmanaged and - optionally - managed resources
true to release both managed and unmanaged resources; false to release only unmanaged resources.
The data service used for retrieving and submitting data to the CMS
The analyzer to use when indexing content, by default, this is set to StandardAnalyzer
Used to keep track of how many index commits have been performed.
This is used to determine when index optimization needs to occur.
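One way to picture the commit counter is as a small object that reports when the configured number of commits has been reached. The Python sketch below is illustrative only: the class name `CommitTracker` is hypothetical, and resetting the count after an optimization is an assumption.

```python
class CommitTracker:
    """Counts index commits and signals when an optimization is due."""

    def __init__(self, commits_before_optimize):
        if commits_before_optimize < 1:
            raise ValueError("threshold must be at least 1")
        self.commits_before_optimize = commits_before_optimize
        self.commit_count = 0

    def record_commit(self):
        """Record one commit; return True when the threshold has been reached."""
        self.commit_count += 1
        if self.commit_count >= self.commits_before_optimize:
            self.commit_count = 0  # assumption: the count restarts after optimizing
            return True
        return False


tracker = CommitTracker(3)
print([tracker.record_commit() for _ in range(6)])
```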
Indicates whether or not this system will process the queue items asynchronously. Default is true.
The interval (in seconds) specified for the timer to process index queue items.
This is only relevant if is true.
The folder that stores the Lucene Index files
The folder that stores the index queue files
The Executive to determine if this is the master indexer
The index set name which references an Examine
By default this is false, if set to true then the indexer will include indexing content that is flagged as publicly protected.
This property is ignored if SupportUnpublishedContent is set to true.
Occurs when the index is being optimized.
Occurs when a document is being written.
Determines if the manager will call the indexing methods when content is saved or deleted as
opposed to cache being updated.
Data service used to query for media
This is quite an intensive operation...
get all root media, then get the XML structure for all children,
then run xpath against the navigator that's created
Deletes all files in the folder and returns the number deleted.
An implementation of the search results returned from Lucene.Net
Internal cache of search results
Creates the search result from a
The doc to convert.
The score.
A populated search result object
Skips to a particular point in the search results.
This allows for lazy loading of the results paging. We don't go into Lucene until we have to.
The number of items in the results to skip.
A collection of the search results
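To illustrate the lazy paging idea, here is a Python sketch in which documents are only loaded when the caller actually iterates to them, and each loaded result is kept in an internal cache. The class `LazyResults`, its `load` callback and the list of document ids are hypothetical stand-ins for the Lucene-backed implementation.

```python
class LazyResults:
    """Search results that load each document only when it is first requested."""

    def __init__(self, doc_ids, load):
        self._doc_ids = doc_ids  # ids of matching documents, in result order
        self._load = load        # callback that fetches a document by id
        self._cache = {}         # position -> loaded result

    def skip(self, count):
        """Yield results starting at position count, loading them on demand."""
        for position in range(count, len(self._doc_ids)):
            if position not in self._cache:
                self._cache[position] = self._load(self._doc_ids[position])
            yield self._cache[position]

    def __iter__(self):
        # Plain enumeration starts at position 0.
        return self.skip(0)

    @property
    def total_item_count(self):
        return len(self._doc_ids)
```

Skipping two results of four loads only the last two documents, and nothing is loaded until iteration begins.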
Gets the enumerator starting at position 0
A collection of the search results
Returns an enumerator that iterates through a collection.
An object that can be used to iterate through the collection.
Gets the total number of results for the search
The total items from the search.
An instance for wiring up Examine to the Umbraco events system
Creates a new instance of the class
Only index using providers that SupportUnpublishedContent
Only remove indexes using providers that SupportUnpublishedContent
Only update indexes for providers that don't SupportUnpublishedContent
Only update indexes for providers that don't SupportUnpublishedContent
Defines XPath statements that map to specific umbraco nodes
The folder path of where the lucene index is stored
The index path.
This can be set at runtime but will not be persisted to the configuration file
Returns the DirectoryInfo object for the index path.
The index directory.
When this property is set, the indexing will only index documents that are children of this node.
The collection of node types to index, if not specified, all node types will be indexed (apart from the ones specified in the ExcludeNodeTypes collection).
The collection of node types to not index. If specified, these node types will not be indexed.
A collection of user defined umbraco fields to index
If this property is not specified, or if it's an empty collection, the default user fields will be all user fields defined in Umbraco
The Umbraco field values that will be indexed, e.g. id, nodeTypeAlias, writer, etc.
If this is not specified, or if it's an empty collection, the following default options will be used:
- id
- version
- parentID
- level
- writerID
- creatorID
- nodeType
- template
- sortOrder
- createDate
- updateDate
- nodeName
- urlName
- writerName
- creatorName
- nodeTypeAlias
- path
Event arguments for a Document Writing event
Lucene.NET Document, including all previously added fields
Fields of the indexer
NodeId of the document being written
A class that defines the type of index for each Umbraco field (non user defined fields)
A lot of standard Umbraco fields shouldn't be tokenized or even indexed, just stored in Lucene
for retrieval after searching.
Returns the index policy for the field name passed in. If not found, returns normal.
This class is used to query against Lucene.Net
Returns a that represents this instance.
A that represents this instance.
Query on the id
The id.
A new with the clause appended
Query on the NodeName
Name of the node.
A new with the clause appended
Query on the NodeName
Name of the node.
A new with the clause appended
Query on the NodeTypeAlias
The node type alias.
A new with the clause appended
Query on the NodeTypeAlias
The node type alias.
A new with the clause appended
Query on the Parent ID
The id of the parent.
A new with the clause appended
Query on the specified field
Name of the field.
The field value.
A new with the clause appended
Query on the specified field
Name of the field.
The field value.
A new with the clause appended
Returns the Lucene query object for a field given an IExamineValue
A new with the clause appended
Creates our own style 'multi field query' used internal for the grouped operations
A new with the clause appended
Passes a raw search query to the provider to handle
The query.
A new with the clause appended
Orders the results by the specified fields
The field names.
A new with the clause appended
Orders the results by the specified fields in a descending order
The field names.
A new with the clause appended
Internal operation for adding the ordered results
If set to true, the results are ordered descending.
The field names.
A new with the clause appended
Gets the boolean operation which this query method will be added as
The boolean operation.
Default property for accessing Image Sets
Manages the delegation of authority over which machine in a load balanced environment will perform the indexing.
This is done by an IO race on initialization of the LuceneExamineIndexer.
If a server's app pool is recycled at a separate time from the rest of the servers in the cluster, it will generally
take over the executive role (this is dependent on the time that the latest server's app pool was restarted).
The Executive is determined by file lock (.lck) file, theoretically there should only be one of these.
If there is only 1 server in the cluster, then obviously it is the Executive.
Determines if the executive has been initialized.
This is useful for checking if files have been deleted during website operations.
Fired every 10 minutes by the timer object. This timestamps the EXA file to
ensure the system knows that this server is active.
This is to ensure that all systems in a Load Balanced environment are aware of exactly how
many other servers are taking part in the load balancing and who they are.
Creates an xml file to declare that this machine is taking part in the index writing.
This is used to determine the master indexer if this app exists in a load balanced environment.
Creates a lock file for this machine if there aren't other ones.
returns true if a lock file was successfully created for this machine.
delete all old lck files (any that are more than cutoffTime old)
delete all old exa files (any that are more than cutoffTime old)
Get all lck files that are not named by this machine's name. If there are any, this means that another machine
has won the race and created the lck file for itself. If there is a lck file with the current machine's name, then this
must mean it was previously the master indexer and the app pool has recycled in less than the hour.
Updates the timestamp for lck file if it exists
Updates the timestamp for the exa file
Read the machine's EXA file
Read the machine's LCK file
This will check for any lock files not created by the current machine. If there are any, then this machine will flag its
exa file as not being the master indexer. Otherwise, it will try to create its own lock file to let others know it is the race
winner and therefore the master indexer. If this succeeds, it will update its exa file to flag it as the master indexer.
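The race can be sketched in Python as follows. A machine backs off if it finds a `.lck` file with another machine's name, and otherwise tries to create its own. The function name `try_become_executive` and the shared folder argument are hypothetical, the exa file bookkeeping is omitted, and the check followed by the create is not atomic across machines, so this only illustrates the decision logic.

```python
import os


def try_become_executive(folder, machine_name):
    """Return True if this machine holds the lock file once the race is over."""
    own_lock = machine_name + ".lck"
    others = [
        name for name in os.listdir(folder)
        if name.endswith(".lck") and name != own_lock
    ]
    if others:
        return False  # another machine has already won the race
    try:
        # O_EXCL makes the creation fail if the file already exists.
        fd = os.open(os.path.join(folder, own_lock),
                     os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
    except FileExistsError:
        pass  # our own lock from before an app pool recycle; the role is kept
    return True
```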
When the object is disposed, all data should be written
Ensures there is an elected Executive, otherwise starts the race.
Returns a bool as to whether or not this is the Executive machine.
Returns a boolean indicating whether or not this server is involved in a load balanced
environment with Umbraco Examine.
Returns the machine name of the executive indexer
The number of servers active in indexing
Gets the doc id at a specified index
The index.
Gets the doc score for a doc at a specified index
The index.