Saturday, October 18, 2008

Searching the Web with Index Server

Microsoft Index Server is a service that automatically indexes static files on local hard drives and makes them searchable. It indexes Office documents, HTML files, and more. The service can be used locally, over an intranet to find files, or on the web to search a website. Index Server has been around since at least Windows NT 4 and was simplified in Windows 2000 Server.

Setting Indexed Directories (Catalogs)

With Windows NT 4, you had to manually add the directories to index, then restart the service to begin indexing the data. With Windows 2000, once you have turned on Index Server (use the "Services" app under "Administrative Tools" to start it, then select "Automatic" as the startup type), the directories on your local hard drive are already set up.


Index Server on Windows 2000 comes with two preconfigured catalogs: "System" and "Web." From these names I am sure you can guess what each is for. The "System" catalog indexes the directories "C:\" and "C:\Documents and Settings\," and excludes the local user temp and cache folders. The "Web" catalog indexes the web root, the IIS help files, the IIS admin files, the sample files, and the database and printer folders.

If the directory you want to search is not in any of the catalogs, or is on another drive (for example, a local drive on the intranet), you can easily add it to the index. In "Computer Management" under "Administrative Tools," select "Indexing Service" under "Services and Applications" in the tree. By expanding "Indexing Service" or looking in the right pane, you can see the catalog names. You can open these catalogs to see the directories they index. To add your directory, you can either place it in an existing catalog (where appropriate) or create your own catalog.




To add a directory to an existing catalog, expand the catalog name and choose "Directories". In the right pane, right-click in an empty area and choose "New>Directory". In the window that appears, enter or browse for a path, provide user authentication details if needed, and choose whether the directory should be indexed.

To create a new catalog, select "Indexing Service" in the left pane. In the right pane, right-click in an empty area and choose "New>Catalog". In the window that comes up, name your catalog and choose a directory in which to save your index. A folder called "catalog.wci" will be created there, and it will contain all of your index files. Follow the steps above for adding directories to existing catalogs to add and exclude the directories you want to index.

Once your catalogs and indexes are set up, it may take a little time for Index Server to actually index the data. Just wait a while, possibly leaving your computer idle for half an hour, and when you come back your index should be complete (depending on how big it is).

Searching Over the Data

Now that you have these indexes, you need some way of searching over them. There are three options: in "Computer Management," a choice under your catalog name lets you search that catalog; Windows 2000's "Find Files or Folders" will use your indexes; or you can whip up your own searching app using standard database methods. In our example we will use ASP 3.0 to create our own simple searching app.

Index Server exposes an ADO-compatible provider called MSIDXS that performs the searches using standard SQL statements. The code below opens an ADO connection through MSIDXS, executes a SQL statement, and loops through the results.

Dim objConn, objRS, sqlString, strDocTitle

sqlString = "SELECT DocTitle FROM CatalogName WHERE CONTAINS('cats or dogs')"

' Open a connection through the MSIDXS provider
Set objConn = Server.CreateObject("ADODB.Connection")
objConn.Open "provider=MSIDXS"

' Run the query
Set objRS = Server.CreateObject("ADODB.Recordset")
objRS.Open sqlString, objConn

' Write out each title, advancing the cursor so the loop can end
Do While Not objRS.EOF
    strDocTitle = objRS("DocTitle")
    Response.Write(strDocTitle)
    objRS.MoveNext
Loop

objRS.Close
objConn.Close
Set objRS = Nothing
Set objConn = Nothing

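
In a real search page the terms usually come from the visitor rather than being hard-coded. Here is a minimal sketch of building the statement from a query-string value. The parameter name "q" and the helper BuildSearchSql are hypothetical, and doubling single quotes is the usual SQL escaping convention, assumed here to apply to Index Server's SQL as well.

```vbscript
' Hypothetical helper: wraps user-supplied terms in a CONTAINS query.
' "CatalogName" is the same placeholder used in the example above.
Function BuildSearchSql(strTerms)
    Dim strSafe
    ' Double each single quote so the terms cannot close the SQL literal early.
    strSafe = Replace(strTerms, "'", "''")
    BuildSearchSql = "SELECT DocTitle FROM CatalogName " & _
                     "WHERE CONTAINS('" & strSafe & "')"
End Function

Dim strTerms, sqlString
strTerms = Trim(Request.QueryString("q"))   ' "q" is an assumed parameter name

If Len(strTerms) > 0 Then
    sqlString = BuildSearchSql(strTerms)
    ' sqlString can now be passed to objRS.Open as in the example above.
Else
    Response.Write "Please enter something to search for."
End If
```

Checking for an empty value first avoids sending an empty CONTAINS clause to the provider. Terms that contain characters special to the query syntax may still need extra handling.
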
There are many options to use in your SQL statements. They include the fields to retrieve, the way to choose indexes/catalogs, the ordering, and the search itself. The table below lists several useful fields to SELECT, ORDER BY, or match in WHERE.



Field            Description
Path             The file path
DocTitle         The title of the file
DocAuthor        The author of the file
DocAppName       The application associated with the file
HitCount         The number of times the WHERE terms are matched
Rank             The rank of the file in the search results
DocKeywords      Keywords in the file
DocSubject       The file's subject
FileName         The name of the file
ShortFileName    The shortened file name
Size             The file size
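
Several of these fields can be requested in one statement and read back by name. The sketch below assumes the preconfigured "Web" catalog and reuses the sample search terms from earlier. Appending an empty string to each text field is only a guard in case a document has no value for it, which is an assumption about how missing properties come back.

```vbscript
' Sketch: fetch several fields at once and print one line per match.
Dim objConn, objRS, sqlString

sqlString = "SELECT Path, DocTitle, Size FROM Web " & _
            "WHERE CONTAINS('cats or dogs')"

Set objConn = Server.CreateObject("ADODB.Connection")
objConn.Open "provider=MSIDXS"
Set objRS = Server.CreateObject("ADODB.Recordset")
objRS.Open sqlString, objConn

Do While Not objRS.EOF
    ' HTMLEncode keeps characters such as < or & in a title from
    ' being interpreted as markup by the browser.
    Response.Write Server.HTMLEncode(objRS("Path") & "") & " | " & _
                   Server.HTMLEncode(objRS("DocTitle") & "") & " | " & _
                   objRS("Size") & " bytes<br>"
    objRS.MoveNext
Loop

objRS.Close
objConn.Close
Set objRS = Nothing
Set objConn = Nothing
```
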



For the FROM clause, you can choose either an indexed directory or an entire catalog. For a catalog, simply use the catalog's name, as in the example SQL statement above. For a directory, use SCOPE('"c:\directoryname\"'). The SQL statement below uses SCOPE.
"SELECT DocTitle FROM SCOPE('"c:\inetpub\wwwroot\site" OR "c:\inetpub\wwwroot\site2\"')"
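
One practical detail when a SCOPE statement is written as a VBScript string: the double quotes around the path sit inside a string literal that is itself delimited by double quotes, so each inner quote has to be doubled. A minimal sketch using one of the paths from the statement above:

```vbscript
Dim sqlString

' Each "" inside the literal becomes a single " in the finished SQL text.
sqlString = "SELECT DocTitle FROM SCOPE('""c:\inetpub\wwwroot\site""')"

' sqlString now holds:
'   SELECT DocTitle FROM SCOPE('"c:\inetpub\wwwroot\site"')
```
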
For the WHERE clause, you can either use CONTAINS, as in the example above, or do more precise matching by using field names with comparison operators such as equals, not equals, less than, and LIKE. The example SQL statements below show this.

"SELECT Path FROM Web WHERE (DocTitle LIKE '%Index Server%')"

"SELECT Path FROM Web WHERE (Size < 5000)"

Finally, the ORDER BY clause allows you to sort your results. For a search, you would commonly sort by Rank or HitCount, but you can use any of the common fields above or any of the other fields. You can also append DESC or ASC for descending or ascending order, respectively. The SQL statements below show the ORDER BY clause in use.

"SELECT Path FROM Web WHERE CONTAINS('c++') ORDER BY HitCount DESC"

"SELECT Path FROM Web WHERE CONTAINS('c++') ORDER BY HitCount ASC"

Now that you are familiar with the way to communicate with Index Server, you can implement it on your own intranet, or on your website to offer search capabilities to your users.

Special Notes:

Whether or not Index Server is enabled, every unpatched IIS machine is vulnerable to a buffer overflow in an Index Server file. Make sure to keep on top of patches at http://www.windowsupdate.com
