Knowledgebase: Sonian Archiving
Search - File - Custom Search
Posted by Rob Chidester on 02 July 2014 06:57 PM

Sonian

Custom Search For File

OVERVIEW

The custom search feature is intended to provide advanced users with the flexibility to create searches that are more complex than those that are available via the Simple, Advanced, or Wizard-based search features.  Examples of capabilities that are accessible via Custom Search include:

Advanced Boolean Queries - Combining Boolean logic (i.e. AND, OR, NOT) with grouping to locate files based on a combination of nested conditions.

Fuzzy Queries - Including common misspellings and alternate spellings of search terms in results.

Proximity Queries - Searching for terms that exist within a certain number of terms of one another in a file.

Advanced Date/Time Queries - Combining date/time ranges with Boolean logic and/or wildcard capabilities to search for files based on a variety of time-related criteria.

NOTE: Custom search may return different results than the other search types. A custom search query matches exactly what you enter, which makes the search more precise. The Simple, Advanced, and Wizard searches also include other back-end fields, so they may return more items than a Custom search and give you a wider range of data to review.


CREATING A CUSTOM SEARCH

The process of creating a custom search for files is as follows:

        1. Enter a name for the new custom search.
  
        2. Define the search scope. Choose to search in "E-mail", "Files", or a combination of both for integrated search. 

        3. Create a Custom Query.  For assistance creating a custom query, read below.

        4. (Optional) Select the start date for the search using the “Begins On” date picker.        

        5. (Optional) Enter the end date for the search using the “Ends On” date picker. If you leave the "Ends on" field blank, the results will start from the date chosen in the "Begins on" field and go to the most recent collected messages (most of the time, today's date). 

        6. (Optional) Enter Tags and/or Notes for the Search.

        7. (Optional) Select whether to display 1) all results, 2) only results on legal hold, or 3) only results that are not on legal hold. Note that this optional field can be used as the sole search criterion.

        8. (Optional) Select Permissions to display the drop-down menu for granting a search user access to this specific search. You can allow an archive user to 1) access this search, 2) edit this search, 3) export this search, or any combination thereof.

        9. Click “Save and Display Results” to save the search and go directly to the search results, or “Save and return to list” to save the search and go back to the SAVED SEARCHES page, which displays all previously created searches.

*Note that constructing Custom Queries is typically an iterative process--particularly against large data sets.  After reviewing the results of the initial query, users can refine the search by clicking “Edit Search to Generate New Results”.

Screenshot - Custom search


CUSTOM QUERY SYNTAX

Custom query terms are the “what and where” components of the search--they enable you to specify what information to search for, and in which index fields to look for that information.

NOTE: If the term or phrase you are searching for contains a colon (":"), you need to enclose it in quotation marks ("..."). For instance, to search for the phrase the broker did the following: you would enter "the broker did the following:" in the search terms field.


Custom Query Fields

To specify a field to query, you would type the name of the field, followed by a colon (with no space in between).  The available fields for files search are as follows:

            filename: - Refers to the contents of the "name" field of a file.

            extension: - Refers to the contents of the "extension" (file format) at the end of a file name.

            content-type: - Refers to the file's format. It is similar to the extension: field, but the content-type: field pulls its information from the file's metadata.

            file-path: - Refers to the contents of the file path or folder structure that is retained at the time of import.

            body: - Refers to the content in the body of a file. ***Note that the body field is also the default field, so specifying the “body:” field for a search term is optional.

            retention-date: - Refers to the date at which a file was imported in the archive.

To search for information within a given field, you would include that information directly after the name of the field (with no space in between).  For example:

            extension:doc - Queries the file's extension field to locate all files with the extension “doc” after the file name.

            retention-date:[2011-01-01T00:00:00Z TO 2011-12-31T00:00:00Z] (time is in GMT) - Queries the file's metadata to locate files that were imported into the archive between January 1st and December 31st, 2011.
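
If you generate queries from a script, a retention-date range like the one above can be built from date values rather than typed by hand. The following is a minimal sketch in Python; the function name retention_date_range is hypothetical and is not part of the product. It assumes the timestamp layout shown in the example (YYYY-MM-DDTHH:MM:SSZ, in GMT) and timezone-aware input values.

```python
from datetime import datetime, timezone


def retention_date_range(start: datetime, end: datetime) -> str:
    """Build a retention-date range clause from two timezone-aware datetimes.

    Hypothetical helper: it only formats text in the layout used by the
    example in this article and does not contact the archive.
    """
    if start > end:
        raise ValueError("start must not be later than end")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    # Convert both ends to GMT/UTC before formatting.
    start_text = start.astimezone(timezone.utc).strftime(fmt)
    end_text = end.astimezone(timezone.utc).strftime(fmt)
    return f"retention-date:[{start_text} TO {end_text}]"


query = retention_date_range(
    datetime(2011, 1, 1, tzinfo=timezone.utc),
    datetime(2011, 12, 31, tzinfo=timezone.utc),
)
print(query)
```

The printed text can then be pasted into the Custom Query field.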
 
You can also search for phrases within a particular field.  For example:

            filename:"Earnings Results.pdf" - Queries the file's name field to locate all files with the phrase “Earnings Results.pdf” in their names. The file's extension must be mentioned with the file name to allow the search engine to retrieve the exact file, based on its name.
   
            body:"Please do not share" - Queries the file's body field to locate all files with the phrase “Please do not share” in the body.

            file-path:Root/Level1/Level2/Level3* - Queries the file path to locate all files that are located in the folder called Level3.
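
A misspelled field name is an easy mistake to make when typing queries by hand. As a rough illustration, the sketch below checks a field name against the six file search fields listed in this article and adds quotation marks around phrases. The names FILE_SEARCH_FIELDS and field_query are hypothetical, and the quoting rule (quote when the value contains a space or a colon) is an assumption based on the notes above.

```python
# Field names taken from the list of file search fields in this article.
FILE_SEARCH_FIELDS = {
    "filename",
    "extension",
    "content-type",
    "file-path",
    "body",
    "retention-date",
}


def field_query(field: str, value: str) -> str:
    """Return 'field:value', quoting the value when it is a phrase.

    Hypothetical helper for illustration only.
    """
    if field not in FILE_SEARCH_FIELDS:
        raise ValueError(f"unknown file search field: {field!r}")
    # Range values such as [.. TO ..] are passed through unquoted.
    is_range = value.startswith("[")
    if not is_range and (" " in value or ":" in value):
        value = f'"{value}"'
    return f"{field}:{value}"


print(field_query("extension", "doc"))
print(field_query("filename", "Earnings Results.pdf"))
print(field_query("body", "Please do not share"))
```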

   
     Wildcard Query Operators

You can use wildcard operators to locate files based on partial terms.  You can use the asterisk ( * ) operator to locate files that contain specified partial terms. For example:
   
             contr* - Denotes any term that begins with “contr” (such as “contract”, “contribution", "control" or “contracted”).
   
            sara@* - Denotes any term that starts with “sara@” (such as “sara@gmail.com", "sara@acme.com" or "sara@acme.co.uk").
   
            43931* - Denotes any term that starts with the sequence “43931” (such as “43931.00” or "43931226").
   
You can use the question mark (?) operator to locate files that contain a specified term, with one given character replaced.  For example:
   
            “???? ???? ???? ????” - Denotes any term that contains four sets of four characters each, with a single space between sets (such as 4417 1234 5678 9012, a common format for credit/debit card numbers).

            “???\-??\-????” - Denotes any term that contains a set of three characters, followed by a dash, followed by two characters, followed by a dash, followed by four characters (e.g. 123-45-6789, a common format for U.S. social security numbers). See the section below, “Searching for Terms that Contain Reserved Characters”, for an explanation of the “\” use.
   
            “forecast?” - Denotes any term beginning with the string “forecast”, and followed by any character (such as “forecast” and “forecasts”).


NOTE ON WILDCARDS (*): Wildcards can be used with any word or number, as long as the term contains at least 5 characters or digits. Using wildcards on words with fewer than 5 characters, or on strings with fewer than 5 digits, will return an error message.

NOTE ON LEADING WILDCARDS (*): With file search, you cannot use leading wildcards (at the beginning of a search term). 
For example: 
"*irsty" would not return any result.
"*17" would not return any result. 

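The two wildcard notes above can be checked before a query is submitted. The following Python sketch is illustrative only: wildcard_problems is a hypothetical name, and it assumes that the 5-character minimum counts the characters other than the asterisk (which is consistent with the examples contr*, sara@* and 43931*).

```python
MIN_WILDCARD_LENGTH = 5  # minimum stated in the note on wildcards


def wildcard_problems(term: str) -> list[str]:
    """Return a list of problems found in a term that uses the * wildcard.

    An empty list means the term passes both checks described above.
    Hypothetical helper; the archive itself performs the real validation.
    """
    problems = []
    if "*" not in term:
        return problems
    if term.startswith("*"):
        problems.append("leading wildcards are not supported in file search")
    # Assumption: the asterisk itself does not count toward the minimum.
    literal_length = len(term.replace("*", ""))
    if literal_length < MIN_WILDCARD_LENGTH:
        problems.append(
            f"wildcard terms need at least {MIN_WILDCARD_LENGTH} characters"
        )
    return problems


for candidate in ("contr*", "43931*", "*irsty", "doc*"):
    print(candidate, wildcard_problems(candidate))
```
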

Searching for Terms that Contain Reserved Characters

The following characters are reserved for use in the syntax of queries and, as such, require special handling when they are contained within query terms:

            + - & | ! ( ) { } [ ] ^ " ~ * ? : \

In order to search for terms that contain any of these characters, you are required to “escape” each reserved character by inserting a backslash (\) before it.  For example:
   
            body:"1\(555\)555\-1212" - Queries for any file with a body that contains the phrase “1 (555) 555-1212”.
   
            filename:promotion\? - Queries for any file with a file name that contains the term “promotion?”.
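
When search terms come from another system (a spreadsheet of phone numbers, for example), escaping by hand is error-prone. The sketch below shows one way to insert the backslashes automatically. The name escape_term is hypothetical, and the character set is copied from the list of reserved characters above. Apply it only to literal text: it would also escape a * or ? that you intended as a wildcard.

```python
# Reserved characters as listed in this article.
RESERVED_CHARACTERS = set('+-&|!(){}[]^"~*?:\\')


def escape_term(term: str) -> str:
    """Insert a backslash before every reserved character in a literal term."""
    return "".join(
        "\\" + char if char in RESERVED_CHARACTERS else char for char in term
    )


print(escape_term("1(555)555-1212"))
print(escape_term("promotion?"))
```

The two printed values correspond to the escaped terms used in the examples above.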


Boolean Query Operators

You can use Boolean operators such as AND, OR, NOT, +, and - to search for files based on multiple terms. 

Use the AND operator to locate files that satisfy two or more criteria.  For example:
   
            body:demotion AND filename:contract* - Queries for any files with a body that contains the term “demotion” and a name that contains a term that starts with “contract”.
   
            filename:"gartner cool vendor.pdf" AND file-path:"analyst reports" AND retention-date:[2011-01-01T00:00:00Z TO 2011-12-31T00:00:00Z] - Queries for any file with a name that includes the phrase “gartner cool vendor.pdf”, whose folder path includes "analyst reports", and that was imported into the archive between January 1st and December 31st, 2011.

To locate files where a specific field satisfies two or more criteria, you can use the ‘+’ operator rather than a more elaborate ‘AND’ query.  For example:
   
            body:(+confidential +IPO +bank) - Queries for any file that contains the terms “confidential”, “IPO”, and “bank” in the body. Parentheses are always needed when using this format with “+” signs; they indicate that the terms are grouped together. An alternative approach would be to write the query body:confidential AND body:IPO AND body:bank.
   
            filename:(+agreement* +CEO) AND extension:doc - Queries for any file whose name contains a term beginning with the string “agreement” and the term “CEO”, and that has a “.doc” file extension.

Use the OR operator to locate files that satisfy any of two or more criteria.  For example:
   
            file-path:contract* OR filename:contract* - Queries for any file that is in a folder whose path contains a term starting with “contract”, or whose file name contains a term starting with “contract”.

            guarantee* OR filename:guarantee* - Queries for files that contain a term that begins with “guarantee” in the file body (default field) or in the file name field.
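
The Boolean forms described in this section can also be assembled programmatically. The Python sketch below is for illustration only and uses hypothetical helper names (all_of, all_of_expanded, any_of). It shows the grouped ‘+’ form next to the equivalent expanded ‘AND’ form, plus a simple ‘OR’ join.

```python
def all_of(field: str, terms: list[str]) -> str:
    """Grouped form: field:(+term1 +term2 ...). Parentheses are required."""
    grouped = " ".join(f"+{term}" for term in terms)
    return f"{field}:({grouped})"


def all_of_expanded(field: str, terms: list[str]) -> str:
    """Equivalent expanded form: field:term1 AND field:term2 ..."""
    return " AND ".join(f"{field}:{term}" for term in terms)


def any_of(clauses: list[str]) -> str:
    """Join complete clauses with OR."""
    return " OR ".join(clauses)


terms = ["confidential", "IPO", "bank"]
print(all_of("body", terms))
print(all_of_expanded("body", terms))
print(any_of(["file-path:contract*", "filename:contract*"]))
```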


Fuzzy Logic Operators

You can use fuzzy logic operators to query for terms that are close to, but not precisely the same as, a particular term by inserting a tilde (~) after the term.  Fuzzy logic often comes in handy when you want to include misspellings of key terms in the search results.  For example:
   
            extension:doc~ - Queries for files whose extension is close to the term “doc”, such as “doc” and “docx”.

            body:VIN4422331~ - Queries for files that contain terms that are close to the term “VIN4422331”, such as “BIN4422331” and “VIN2222331”.

You can also adjust the tolerance of the fuzzy logic (i.e. specify how close to the search term a file's terms must be in order to be included in the search results).  The tolerance is measured on a 0 to 1 scale, with 1 indicating an exact match to the search term.  The default tolerance is 0.5.  You can adjust the tolerance by inserting the tolerance value after the tilde.  For example:

            filename:lawyer~0.9 - Queries for files with a name that contains terms that are very close to the term “lawyer”, such as "lawyer" and “lawyers”, but not “player”. 

            body:lawyer~0.2 - Queries for files with a body that contains terms that are even remotely similar to the term “lawyer”, including "lawyers”, “player”, “players”, “lawed”.

Note that most customers will need to try a few different tolerance levels before locating the level that works best for a given search.
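
Because finding the right tolerance usually takes several attempts, it can help to generate a few candidate queries at once and try them in turn. The sketch below is a hypothetical helper (fuzzy_term is not a product function). It only formats the query text and checks that the tolerance falls on the 0 to 1 scale described above; it does not reproduce how the archive scores similarity.

```python
from typing import Optional


def fuzzy_term(term: str, tolerance: Optional[float] = None) -> str:
    """Append the fuzzy operator (~) and an optional tolerance to a term."""
    if tolerance is None:
        # No value given: the archive applies its default tolerance.
        return f"{term}~"
    if not 0 <= tolerance <= 1:
        raise ValueError("tolerance must be between 0 and 1")
    return f"{term}~{tolerance:g}"


# Candidate queries to try from loosest to strictest.
for level in (0.2, 0.5, 0.9):
    print("filename:" + fuzzy_term("lawyer", level))
```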


Proximity Logic Operators

You can use proximity logic operators to query for two terms that occur within a specified number of words of one another within a field.  In order to query using proximity logic, surround the search terms in quotations, separated by a space, followed by a tilde (~) and one or more digits to indicate the number of words.  For example:

            "policies enterprise"~10 - Queries for files with a body that contains the term “policies” followed by the term “enterprise”, separated by up to 10 additional terms.
   
            filename:"gartner archiving"~4 - Queries for files with a name that contains the term “gartner” followed by the term “archiving”, separated by up to 4 additional terms in the file name field.

Note that the proximity search terms are order-dependent (e.g. “pay raise*”~4 would locate “pay needs to be raised” but not “raise my pay”).
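
To make the order dependence concrete, the following Python sketch imitates the behavior described in this section on a plain string. It is an approximation written for illustration: within_proximity is a hypothetical function, the word splitting is simplified, and the archive's own matching may differ in detail.

```python
import re


def _matches(token: str, term: str) -> bool:
    """Compare one lowercase token with a term; a trailing * means 'starts with'."""
    term = term.lower()
    if term.endswith("*"):
        return token.startswith(term[:-1])
    return token == term


def within_proximity(text: str, first: str, second: str, max_gap: int) -> bool:
    """True if `first` is followed by `second` with at most `max_gap` terms between."""
    tokens = re.findall(r"\w+", text.lower())
    for index, token in enumerate(tokens):
        if not _matches(token, first):
            continue
        # Only look forward: the terms must appear in the given order.
        window = tokens[index + 1 : index + 2 + max_gap]
        if any(_matches(later, second) for later in window):
            return True
    return False


print(within_proximity("pay needs to be raised", "pay", "raise*", 4))
print(within_proximity("raise my pay", "pay", "raise*", 4))
```

The first call finds a match because “raised” follows “pay” with three terms in between; the second does not, because the terms appear in the opposite order.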

 



Attachments 
 
 search - file - custom search.docx (72.95 KB)