Making Effective Searches

General Information

If your browser does not support forms, you can still do a search, but you will not be able to change any of the search form options described below.

Search Results

The results page returned after a search shows all files containing matches, followed by each of the matched lines. The file names are all links, so you can go quickly to any matched file to view it. The matched lines (usually) show the matching words highlighted, and each line is also a link; following one takes you to the chosen file at the very line selected.

The total number of matched lines and files will appear at the bottom of the list.

Speed

Most searches should begin to send back results within 30 seconds. If many items were found, it could take a long time to send the results back. In this case it may be better to make a more specific search that will find fewer matches, or limit the amount of data sent back by lowering the maximum values at the bottom of the form. In particular, use the misspellings option sparingly, and avoid such obvious search words as "juggling" or "club".

In unusual cases a search may take a very long time. If a minute has passed and no results have begun to be returned, it may be best to abort the search and attempt a more specific search.

Keep in mind that the computer performing your searches has real work to do. Do not tie it up with repeated lengthy searches. A search runs faster with longer words, whole word searching, no spelling correction, and, when all phrases are required to match, more phrases.

Bugs

There are several known bugs and limitations in the underlying search engine that I hope can be fixed soon. I am working on cleaning up what minor bugs I am aware of in this interface. Please let me know, via jis@juggling.org, of anything that seems broken, and provide as much information as possible that would allow me to duplicate the problem.


Forming Searches

A search may be made for several different search phrases at once. Simply separate the phrases with commas. You can choose whether to search for all of the phrases together on a line or in a file, or to search for matches to any of the phrases. Each phrase consists of one or more words.

When using forms to prepare searches, there are several ways to control the type of search performed, and the amount of information to be returned. In general, you can choose between faster searches or finding more matches.

Phrases

A phrase is a list of words separated by spaces. It may include punctuation, but not a comma, which is reserved for separating phrases. There are no other characters with special meaning. All of the characters within a single phrase must match the text of a single line exactly, subject to the options described below.
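The comma rule can be sketched in a few lines of Python. This is only an illustration, not the code behind the search form: the function name split_phrases is hypothetical, and trimming the spaces around each phrase and dropping empty phrases are assumptions.

```python
def split_phrases(query: str) -> list[str]:
    """Split a query into phrases at commas (the only special character).

    Assumption: spaces around a phrase are trimmed and empty phrases
    (for example from a trailing comma) are ignored.
    """
    return [part.strip() for part in query.split(",") if part.strip()]


if __name__ == "__main__":
    print(split_phrases("Rastelli,Truzzi"))
    print(split_phrases("three ball cascade, Mills Mess"))
```

Spaces inside a phrase are kept, so "three ball cascade" remains a single three-word phrase.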

Words

The search is word-based, where a word is a string of letters or digits. Normally, search phrases will be single words, but if multiple words are given within a single phrase, they will only match the same words, in order, on a single line of text. Some words that appear very often should obviously be avoided when forming searches, such as "the", "and", or "juggling".

Scope

Normally, all files in the entire JIS are searched for the given phrases. However, it is possible to limit the scope of the search to one of several specific areas, if desired. Select the area you wish to search. Note that it is generally nearly as fast to search everything as it is to search a small section, although it can be much slower in some cases. If you are looking for something specific in a known section, limiting the scope can make it easier to locate what you want.

Multiple Phrases

More than one search phrase may be entered by separating phrases with commas. Normally this will cause the search to look for all lines containing all of the given phrases. For example, the search query "Rastelli,Truzzi" will find all lines containing both "Rastelli" and "Truzzi", in any order.

Setting the option to find all "files with all" of the phrases changes the search to look for lines containing any of the search phrases, but only in files that contain all of them. Thus this query would find all lines containing "Rastelli" or "Truzzi", but only in files that contain both words. This can find many more matches, at almost no additional cost in search time.

Setting the option to find all "lines with any" of the phrases changes the search to look for lines containing any one of the search phrases, which normally will result in many more matches than either of the above two options, and will be much slower.
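The difference between the three settings may be easier to see in a small Python sketch. It is a simplified model, not the actual search engine: the function name search, the mode strings, and the in-memory dictionary of files are all hypothetical, and matching is reduced to a case-insensitive substring test.

```python
def search(files: dict[str, list[str]], phrases: list[str],
           mode: str = "lines_with_all") -> list[tuple[str, int, str]]:
    """Return (file name, line number, line) for every matching line.

    mode is one of "lines_with_all" (the default), "files_with_all",
    or "lines_with_any"; these names are made up for this sketch.
    """
    if mode not in ("lines_with_all", "files_with_all", "lines_with_any"):
        raise ValueError(f"unknown mode: {mode}")

    def has(text: str, phrase: str) -> bool:
        return phrase.lower() in text.lower()

    hits = []
    for name, lines in files.items():
        if mode == "files_with_all":
            # Skip files that do not contain every phrase somewhere.
            if not all(any(has(line, p) for line in lines) for p in phrases):
                continue
        for number, line in enumerate(lines, start=1):
            if mode == "lines_with_all":
                matched = all(has(line, p) for p in phrases)
            else:
                # Both remaining modes accept a line with any one phrase.
                matched = any(has(line, p) for p in phrases)
            if matched:
                hits.append((name, number, line))
    return hits


if __name__ == "__main__":
    archive = {
        "a.txt": ["Rastelli met Truzzi", "Rastelli alone"],
        "b.txt": ["Truzzi only"],
    }
    for mode in ("lines_with_all", "files_with_all", "lines_with_any"):
        print(mode, search(archive, ["Rastelli", "Truzzi"], mode))
```

With this sample data the default finds one line, "files with all" finds both lines of a.txt, and "lines with any" finds all three lines.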

Case Matching

Normally, alphabetic characters in the search phrases are matched regardless of case. Thus "a" and "A" are equivalent, and each would match either character in a file. Setting the option "case sensitive" restricts matching so that "a" in a search phrase would not match "A". Case-sensitive searches will be somewhat faster, but will find fewer matches. The word "Rastelli" would match "rastelli" or "RASTELLI" only in a case-insensitive search.
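As a rough Python illustration of the two settings (the function name phrase_matches and its case_sensitive parameter are hypothetical, and the real engine does not necessarily work this way):

```python
def phrase_matches(line: str, phrase: str, case_sensitive: bool = False) -> bool:
    """Report whether phrase occurs in line, ignoring case unless asked not to."""
    if case_sensitive:
        return phrase in line
    return phrase.lower() in line.lower()


if __name__ == "__main__":
    line = "The great RASTELLI juggled ten balls"
    print(phrase_matches(line, "Rastelli"))                       # default: case ignored
    print(phrase_matches(line, "Rastelli", case_sensitive=True))  # exact case required
```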

Word Matching

A word in a search phrase will normally match a part of a word, as well as the whole word. The word "ring" would thus match the words "syringe" and "herringbone". Setting this option to "only whole words" will limit searches to find only words that completely match the words in a phrase. This can make the search faster, but will reduce the number of matches found.

Partial word matching is useful for a root such as "possibilit", which matches the root word with its various suffixes or prefixes. Whole word searching is handy to limit the search to the specific word you want.
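The two word matching settings can be approximated in Python with a regular expression. This is a sketch under assumptions: the function name word_match is hypothetical, case is ignored, and a word boundary is taken to be any character that is not an ASCII letter or digit, following the definition of a word given above.

```python
import re


def word_match(line: str, word: str, whole_words: bool = False) -> bool:
    """Report whether word occurs in line, optionally only as a whole word."""
    pattern = re.escape(word)
    if whole_words:
        # The match must not touch a letter or digit on either side.
        pattern = r"(?<![A-Za-z0-9])" + pattern + r"(?![A-Za-z0-9])"
    return re.search(pattern, line, re.IGNORECASE) is not None


if __name__ == "__main__":
    print(word_match("a syringe", "ring"))                    # partial match allowed
    print(word_match("a syringe", "ring", whole_words=True))  # whole words only
    print(word_match("endless possibilities", "possibilit"))  # root with a suffix
```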

Misspellings

Normally, words in a search phrase must match words in the text exactly. Setting the misspellings option to an integer from 1 through 4 specifies the maximum number of errors permitted in finding approximate matches. Each insertion, deletion, or substitution of a character counts as one error. This is useful if you are not sure of the spelling of a word you are searching for, or if you wish to discover references to your search phrases that may appear misspelled in the archive.

Thus a search for "Rasteli" with misspellings set to 1 will also find all references to "Rastelli". A search for "diabolo" with misspellings set to 1 will find references to the common error of "diablo" as well. A search for "Karamazov" with misspellings set to 2 will find most variant spellings of this name.

This feature can slow down searches considerably, but is very effective when used as described above. Do not use it on short words, or the search will take a very long time and generate a large number of meaningless false matches. Usually the error limit should be kept to 1 or 2, except for some very long words.
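To make the error counting concrete, here is a minimal Python sketch that counts the fewest insertions, deletions, or substitutions needed to find a phrase somewhere in a line. It uses a standard dynamic programming method for approximate substring matching, which is not necessarily how the underlying search engine does it; the function names min_errors and approx_match are hypothetical, and case is ignored as an assumption.

```python
def min_errors(pattern: str, text: str) -> int:
    """Fewest single-character insertions, deletions, or substitutions
    needed to make pattern equal to some substring of text."""
    # column[i] is the cost of matching pattern[:i] against a substring
    # that ends at the current position in text.
    column = list(range(len(pattern) + 1))
    best = column[-1]
    for ch in text:
        diagonal = column[0]  # column[0] stays 0: a match may start anywhere
        for i in range(1, len(pattern) + 1):
            above = column[i]
            cost = 0 if pattern[i - 1] == ch else 1
            column[i] = min(diagonal + cost,    # match or substitution
                            above + 1,          # extra character in text
                            column[i - 1] + 1)  # pattern character missing
            diagonal = above
        best = min(best, column[-1])
    return best


def approx_match(line: str, phrase: str, max_errors: int) -> bool:
    """Report whether phrase occurs in line with at most max_errors errors."""
    return min_errors(phrase.lower(), line.lower()) <= max_errors


if __name__ == "__main__":
    print(approx_match("Enrico Rastelli juggled", "Rasteli", 1))
    print(approx_match("a diablo trick", "diabolo", 1))
    print(approx_match("the cat sat", "cot", 1))  # short words match too easily
```

The last call shows why short words are a poor fit: with one error allowed, "cot" already matches "cat".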

When "the best matches" is specified and no exact matches are found, the search will continue until the closest matches, with the minimum number of errors, are found. In general, this may be slower than specifying an error count, but not by very much. However, this sometimes misses matches. It is safer to specify the number of errors explicitly.

Maximum Lines

At most 2000 matched lines are returned. This is also the default value, and setting it higher has no effect. Setting it lower may speed up your search, since only the specified number of matches is returned.

Maximum Files

Normally only the first 500 files with matches are shown. This may be set higher or lower, which can affect search time, but it will always be subject to the above maximum line limit.

Maximum Lines Per File

Normally only the first 10 lines matched in any file are displayed. This may be set higher or lower, which can affect search time. If set to 0, no matched lines will be shown, and only the file names containing matches will be presented.
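One plausible way the three maximum values combine is sketched below in Python. The exact interaction inside the search engine is not documented here, so treat this as an assumption: the function name limit_results and its parameter names are hypothetical, and only the defaults (2000, 500, 10) and the cap of 2000 come from the descriptions above.

```python
MAX_LINES_CAP = 2000  # values above this have no effect


def limit_results(matches: list[tuple[str, list[str]]],
                  max_lines: int = 2000,
                  max_files: int = 500,
                  max_lines_per_file: int = 10) -> list[tuple[str, list[str]]]:
    """Trim (file name, matched lines) pairs to the three maximum values."""
    max_lines = min(max_lines, MAX_LINES_CAP)
    shown = []
    total = 0
    for name, lines in matches:
        if len(shown) >= max_files or total >= max_lines:
            break
        # Apply the per-file limit first, then the overall line limit.
        keep = lines[:max_lines_per_file][:max_lines - total]
        shown.append((name, keep))
        total += len(keep)
    return shown


if __name__ == "__main__":
    found = [("a.txt", ["l1", "l2", "l3"]), ("b.txt", ["m1"]), ("c.txt", ["n1", "n2"])]
    print(limit_results(found, max_lines_per_file=2))
    print(limit_results(found, max_lines_per_file=0))  # file names only
```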

Acknowledgements

Many thanks to Udi Manber, Sun Wu, Burra Gopal, and Paul Klark at the University of Arizona, for developing the Glimpse search tools upon which this is based.
© 1996 Juggling Information Service. All Rights Reserved.