Attention:

To view or download from this site you must first register and then log in each time you visit.

In order to view full size images on this site you will need to have Adobe® Reader® 7 or later installed.

Please click the link below to download and install the latest version of Adobe Reader.

Keyword Search

Search tips:

If you type:	This is what will happen
apple banana	Finds pages that contain at least one of these words
"apple banana"	Finds pages that contain this exact phrase
+apple +banana	Finds pages with both words
+apple pie	Finds pages with the word apple, but the relevance rating will be higher if the page also contains pie
+apple -pie	Finds pages with the word apple but not the word pie
+apple ~pie	Finds pages with the word apple, including pages that also contain pie. Pages that contain just the word apple will have a higher relevance rating than those that have both words.
+apple +(>pie <strudel)	Finds pages with the words apple and pie, or apple and strudel, in any order, but the relevance rating for apple pie will be higher than for apple strudel
apple*	Finds pages with words that begin with apple, such as apple, apples, applesauce and applet

The boolean full-text search capability supports the following operators:

": A phrase enclosed in double quotes (") matches only pages that contain the phrase literally, as it was typed.
+: A leading plus sign indicates that this word must be present in every page returned.
-: A leading minus sign indicates that this word must not be present in any page returned.

< >: These two operators are used to change a word's contribution to the relevance value that is assigned to a page. The < operator decreases the contribution and the > operator increases it. See the +apple +(>pie <strudel) example in the search tips above.
( ): Parentheses are used to group words into subexpressions.
~: A leading tilde acts as a negation operator, causing the word's contribution to the page relevance to be negative. It's useful for marking noise words. A page that contains such a word will be rated lower than others, but will not be excluded altogether, as it would be with the - operator.
*: An asterisk is the truncation operator. Unlike the other operators, it should be appended to the word, not prepended.
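
The operator rules above can be sketched in a few lines of Python. This is only an illustration of how a search string breaks down into terms, not the search engine this site actually uses. The names TOKEN, MEANINGS and describe_query are hypothetical, and for simplicity the sketch ignores parentheses rather than grouping words into subexpressions.

```python
import re

# One token: an optional leading operator, then either a quoted phrase
# or a bare word (which may end in the * truncation operator).
TOKEN = re.compile(r'([+\-~<>]?)(?:"([^"]+)"|([^\s()"+\-~<>][^\s()"]*))')

# Plain-language meaning of each leading operator ("" means no operator).
MEANINGS = {
    "+": "must be present",
    "-": "must not be present",
    "~": "lowers the relevance rating but does not exclude the page",
    ">": "contributes more to the relevance rating",
    "<": "contributes less to the relevance rating",
    "": "optional; pages that contain it are rated higher",
}

def describe_query(query):
    """Return (term, meaning) pairs for each word or phrase in a query."""
    described = []
    for operator, phrase, word in TOKEN.findall(query):
        meaning = MEANINGS[operator]
        if phrase:
            described.append(('"' + phrase + '"', meaning + " (exact phrase)"))
        elif word.endswith("*") and len(word) > 1:
            stem = word[:-1]
            described.append((word, meaning + " (any word beginning with " + stem + ")"))
        else:
            described.append((word, meaning))
    return described

# Example: print how each part of a search string would be interpreted.
for term, meaning in describe_query('+apple ~pie "fruit cake" app*'):
    print(term, "->", meaning)
```

Running the example prints one line per term, which can be a quick way to check that a search string says what you intended before you submit it.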

Optical Character Recognition Basics

When the microfilm was scanned we obtained a digital image. The image can be manipulated as a whole, but its text cannot be manipulated separately. To do that, we need to "tell" the computer to recognise the text as such and to let us work with it as if it were text in a word-processing document. The Optical Character Recognition (OCR) software used does that: it recognises the characters and makes the text searchable.

The prime measure of OCR performance, and its main limitation, is accuracy. Character accuracy, the most important aspect of text recognition, varies widely with the quality and nature of the image, particularly the type and size of the fonts used and the complexity of the page layouts. Generally, the better the image quality, the higher the accuracy. Accuracy is usually measured for each page during the OCR process and expressed as a percentage: 90% accuracy implies ten errors in every 100 characters.

Due to the poor quality of many of the newspaper pages that had been microfilmed and subsequently scanned, OCR accuracy ranged from about 60% up to about 85%. To obtain higher accuracy it would have been necessary to "correct" the OCR results. That means that after the usual OCR, which is done by software, the output would be proofread and corrected by humans. Doing this was outside the budget constraints of this project.

The overall result is therefore text-searchable pages, but with less than 100% accuracy.
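
The accuracy arithmetic described above can be sketched in Python. The function names are hypothetical. The last function rests on a simplifying assumption that is not part of the project's measurements: that each character is misread independently of its neighbours. Treat its output as a rough guide to why a keyword may fail to match on a poorly scanned page, not as a measured figure.

```python
def character_accuracy(total_chars, errors):
    """Percentage of characters on a page that were recognised correctly."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    return 100.0 * (total_chars - errors) / total_chars

def expected_errors(total_chars, accuracy_percent):
    """Approximate number of misread characters at a given accuracy."""
    return round(total_chars * (100.0 - accuracy_percent) / 100.0)

def chance_word_is_intact(word_length, accuracy_percent):
    """Rough chance that every character of a word was read correctly.

    ASSUMPTION: errors are independent per character, which real OCR
    errors are not; this is only an order-of-magnitude estimate.
    """
    return (accuracy_percent / 100.0) ** word_length

# 10 errors in 100 characters is 90% accuracy, as in the text above.
print(character_accuracy(100, 10))

# Misread characters per 100 at the two ends of the reported range.
print(expected_errors(100, 60), expected_errors(100, 85))

# Estimated chance that a five-letter word survives intact on each kind of page.
print(chance_word_is_intact(5, 60), chance_word_is_intact(5, 85))
```

Under that independence assumption, longer search words are less likely to have been recognised intact on low-accuracy pages, so searching with the * truncation operator or trying several shorter words may find pages that an exact long word misses.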