GRUB Client Version 0.3.0 - Updated March 22, 2002
====================================================

Concept
=======
This software crawls websites and, optionally, your own website for 
content.  When it finds something other than what was found last time 
(detected via CRCs), it compresses the contents and sends the data back 
to a centralized server located at grub.org.  Grub takes that data and 
rebroadcasts it to the Internet via multicasting or other methods,
like FreeNet or plain old FTP.  This process ultimately allows changes 
in website content to be broadcast to the entire Internet, search 
engines included, in near real time, while staying bandwidth 
friendly in the process.
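The change-detection idea can be sketched in a few lines of shell.  This is a
hypothetical illustration only - the real client does this internally in C++
with its own CRC code - but it shows the crawl/compare/compress cycle:

```shell
# Hypothetical sketch of CRC-based change detection: compare a page's
# checksum against the one recorded on the previous run, and compress
# the page for upload only when it has changed.
printf 'some page content\n' > page.html      # stands in for a fetched page
old_crc=$(cat page.crc 2>/dev/null)
new_crc=$(cksum page.html | cut -d' ' -f1)
if [ "$new_crc" != "$old_crc" ]; then
    echo "$new_crc" > page.crc                # remember it for next run
    gzip -cf page.html > page.html.gz         # compressed copy to send upstream
fi
```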

In addition to providing feeds to companies, we are planning on 
broadcasting the changed pages to the net for the general public to 
see and use.  More on this concept will be forthcoming.

Notice
======
The main executable has been renamed to "grubclient" out of respect for 
the GNU Grub bootloader, whose executable is named "grub".  They were out 
first, so we decided to pick another name.  If you have a catchy suggestion 
for a new name, please let us know. 

This directory and its subdirectories contain the sources necessary to 
compile the client side of Grub.  To get better acquainted with Grub, look 
at our website at http://www.grub.org.  The server side can be downloaded 
from SourceForge.
 
This build has been compiled and run on:

RedHat 7.2 2.4.7-10 by Martin
RedHat 7.1 - 2.4.2-2, 2.4.10 by Kord Campbell
RedHat 6.2 - by Scripty
Slackware 8.0 by Steve Breen
SuSE 7.3 by Dr. Riede
Debian 3.0 2.4.16-17 by Tobias and Dirk
Cygwin for Windows by Kord and Ozra

Note:  There are currently known problems with running the client on BSD 
       systems because of the way we are handling threads.  We need someone 
       to fix this if possible - any takers?

If you compile it on something else and it works, send us your O/S and version
info and we'll put it in the list, and give you credit for it.  You can get the
version info by typing "uname -a" at the prompt.

We are currently working on porting the software to Windows, and should be
ready to start beta testing it soon.  A huge portion of the pre-port process 
has been done, so the port shouldn't take us very long to complete.  Please 
hang tight - we know a Windows release is important to a lot of you.

Read the ChangeLog file for changes in the releases.

The TODO file has been created.  If you have any suggestions to add to the mix,
please let us know.

To Extract
==========
To extract the files, type:

tar -xvzf grub-client-0.3.0.tar.gz

To Compile
==========
To compile the new Grub client you will need the cURL, curses and
Metakit libraries installed on your system.  The development version of 
cURL (headers and libraries) is required to compile Grub.  All of these 
packages are located on our ftp site for your convenience.  Most newer 
distributions of Linux or BSD will have curses already, but most do not 
have cURL or Metakit installed.  We suggest that you do NOT run any beta 
versions of cURL, as they have not been fully tested with the grub client.

This version of the client has been tested with metakit-2.4.3 and curl-7.9.5.

If you download the cURL source, or the Metakit source, you are left to 
compile and install them on your own - sorry.

If you downloaded the cURL RPMs then simply type:

rpm -i curl-7.9.3-1.i386.rpm
rpm -i curl-devel-7.9.3-1.i386.rpm

Note:  This assumes that you are using version 7.9.3 of cURL; adjust the 
       file names if your version differs.

Remember, you'll need both the libraries and the developers RPM to be able to 
compile Grub correctly.  If you are just running the executable, all you need
are the libraries.  After you get done installing the RPMs, you will need to 
tell ldconfig where the cURL libraries are.  Do this by editing the file 
"/etc/ld.so.conf" and adding the library directory to the bottom of the file:

	/usr/local/lib

(Only library directories belong in ld.so.conf; header paths such as 
/usr/local/include/curl are used by the compiler, not the runtime linker.)

Run ldconfig after saving:

/sbin/ldconfig 

Your ldconfig may be kept somewhere other than /sbin, so look around for it by
doing a "locate ldconfig".  If that doesn't work, read the cURL docs again, and 
then email us. 
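One quick way to check that the runtime linker can now see cURL (standard
ldconfig usage; the exact output format varies by distribution):

```shell
# List the shared libraries the runtime linker knows about and look for
# libcurl; if the fallback message prints, recheck /etc/ld.so.conf and
# rerun ldconfig.
/sbin/ldconfig -p | grep libcurl || echo "libcurl not found"
```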

Note:  We installed the client on FreeBSD and it was a little squirrely to get
configure to see the cURL libraries.  If someone gets this to work and 
has a patch, please email us.  (We realize that the threads are still broken
on BSD as well.)

Now, on to the install.  Change to the working directory:

cd grub-client-0.3.0

Run the configuration script:

./configure

After that type:

make

For the next step, you need to be root, then type:

make install

This places the executable in /usr/local/bin, the configuration file named 
grub.conf in /usr/local/etc, and the logs and other runtime files in 
/usr/local/var/grub.   You can also do a "./configure --prefix=/some/dir"
if you want to install the client somewhere else, or don't have root access.

To Configure
============
If you did a 'make install' as root, and used the default install directory, you 
will need to change the ownership of the grub directory in /usr/local/var so that 
it is owned by whatever user is going to be running the client.  For example, with 
the username grubuser:

	cd /usr/local/var
	chown grubuser grub

Now go and edit the file /usr/local/etc/grub.conf and change the client ID number to
the one that was assigned to you when you signed up via our sign-up form.  You did 
sign up on our site, right?  It's http://www.grub.org/signup.php if you didn't.

You can also edit the number of crawlers that the client runs at a time.  Crawlers are
forked off when a batch starts.  Once the run is complete, the crawlers will die 
off one by one, the client will send the results back to the server, and then the process 
will repeat itself.  Specifying a high number of crawlers will cause your machine to crawl 
more sites per unit time, and thus use more bandwidth.  We suggest using between 10 and 20 
crawlers for your crawls.

The amount of bandwidth should be set to the upper limit of what you want the client
to be able to access.  The bandwidth setting should be set in bits per second.  For
instance, if you had a 256Kbps connection, you might want to give the client 64000
bits per second, or 1/4 of your total bandwidth.  If you have a T1 or greater, then
just leave it at the defaults or pump it up if you like.   Client to server (grub
server that is) bandwidth is NOT limited at this time, so please be aware that this
may cause you problems when it uploads data to our server.  Bandwidth limiting on
the return is TOP PRIORITY for the next incremental release.
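The quarter-of-your-link rule of thumb above is easy to compute.  Using the
256 Kbps example from the text, in shell arithmetic:

```shell
# Give the client a quarter of a 256 Kbps link (substitute your own
# link speed in bits per second).
link_bps=256000
echo $((link_bps / 4))      # prints 64000 - the bandwidth setting in bits/sec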

Some of the settings in your conf file are subject to change based on what our config
server sends to you when the client first starts up.  This process is detailed further
in the grub.conf file.  Suffice it to say that our settings will NEVER be greater than
what you define in the configuration file, and are only used if we really need the
network to crawl less or to limit bandwidth to a certain amount.  
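Putting the settings above together, a grub.conf might contain entries along
these lines.  The key names here are illustrative, not the real ones; consult
the comments in your installed /usr/local/etc/grub.conf for the actual syntax:

```
client_id    12345       # the ID assigned on the sign-up form
crawlers     15          # 10-20 suggested
bandwidth    64000       # upper limit in bits per second
```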

To Run
======
Grub has four modes that it can be run in:
	
	"grubclient"		- GUI Information Mode
	"grubclient -v"		- Semi Verbose Mode (only errors and crawls shown)
	"grubclient -vv"	- Full Verbose Mode (errors, crawls and pickups shown) 
	"grubclient -q"		- Completely Quiet (good for running in background)
	
If you installed as root, grub should be in /usr/local/bin, which should be in your 
path. In that case the client can most likely be run by typing "grubclient" at any
place in your directory structure.  

GUI Mode
========
To start the client in GUI mode, just type "grubclient" anywhere in your path.
The GUI will fork off and take control of the console, and the rest of the program
will run in quiet mode, so as not to screw the GUI display up.  The GUI communicates
with the client through a shared structure, so it is possible to write your own display
method, if you so choose.  The following info is currently available through the GUI:

	Set Completion %  - bar graph of completion of current crawl

	Connection Status - shows what the crawler is doing - may lag behind 
                            a bit at times, due to the animation sequences.

	Client Status     - shows uptime, max crawlers, number of crawlers
                            currently crawling (engaged), bandwidth limit,
                            current bandwidth usage, and the host protect
                            indicator, which shows whether or not the client 
                            is being "nice" to remote hosts, by only crawling
                            a few of their pages at a time.
	
	URL Stats         - stats on the URLs that have been crawled.

	Client Stats      - pulled from the server via http, a view of the stats 
                            that are being kept about your client.  Shows number
                            crawled (which is locally incremented for effect), 
                            number crawled in the last day, average per hour for 
                            the last day, overall ranking and daily ranking.

Verbose Modes
=============
If you run the client in either verbose mode, "grubclient -v" or "grubclient -vv",
you will receive a blow-by-blow account of what the crawler is doing.  The "-vv" 
option shows both the crawled URLs and the URLs being picked up/submitted by the 
coordinator.

If you want verbose mode output, but still want the GUI up, you can tail the logfile
to see a similar output as that of the verbose modes:

	tail -f grubclient.log

The logfile is normally kept in the /usr/local/var/grub directory.

Quiet Mode
==========
If you want grub to run in the background, just type "grubclient -q &".  If you want to 
run grub only at certain times, you can create a cron job:

0 23 * * * /usr/local/bin/grubclient -c 1>/dev/null 2>/dev/null
0 6 * * * /usr/local/bin/killall -USR1 grubclient 1>/dev/null 2>/dev/null

Put the above in a file and then issue a 'crontab <filename>' to load it.

This cron job specifies starting the grub client at 11pm and then killing it at
6am.  "killall" will send a USR1 signal to grub, who should exit gracefully after 
finishing the crawl and cleaning up after itself.  Do NOT run grub from root's 
cronjob!  We haven't tested grub enough to declare it safe to run as root.

Grub can also be run with a URL or byte limit.  For example, if you want your
client to crawl 10,000 URLs per run, you would use "grubclient -q -n 10000".
If you wanted to limit it to 100MBytes of data crawled, you would enter the
following, "grubclient -q -N 100000000".  You can also use a combination of
the two, and run it in verbose or GUI modes as well.

Troubleshooting:
================
If you receive an error about grub not being able to make a directory or permission
denied on a particular file, it is likely that the user under which you are running
the client does not have full permissions to the archive directory.  The directory
is, by default, located under /usr/local/var/grub.  To change the permissions on
the directory use:  chown -R <user> /usr/local/var/grub

If the client still does not run, you can safely delete anything in the archive 
directory.  You can do this by issuing a "rm -rf arch" in the /usr/local/var/grub 
directory.  Be careful with "rm -rf" - it can do some serious damage if misused.

File and Directory Structure
============================
/archive -- 	Provides temporary storage for the client in what we call the ClientDB (CDB).
		Version 0.3.0 of the client implements a new CDB, written using 
		API calls from the Metakit database library.  It proves to be 
		less buggy and much simpler to use and maintain.  A big change 
		from the previous CDB is that it can be used for storing binary 
		data.

/com --     	Communications modules and classes which provide an easy communication 
	   	mechanism with the grub server. The communication is compressed to provide
	    	better bandwidth efficiency for the grub server.

/cstat --	The modules in cstat provide logging and verbose output capabilities.   	 
	
/crawler --	Module that controls the handing out of URLs and processes the results.

/protocol --	This directory contains all the protocol implementations.  We recommend 
		first looking through the protocol specifications located in the /docs 
		directory.

/src -- 	Contains the main function in the grub.cpp file. 

/util --	Utility classes such as CRC computation class and a linked list class
		are held in this directory.  The GUI interface is in here as well.

/docs -- 	Documents regarding the project, the protocol, or structure are held here.

/guiface --	Library to allow communication between the crawler and a GUI.


Only the most essential files are described above.  Browse through the
subdirectories to learn more.  Some GPL'd and LGPL'd libraries are used in 
the code.  The biggest contributions have been cURL, which we use to retrieve 
the web pages, and Metakit, which is used to store the data from the crawls. 

Developers Needed
=================
If you can code in C++, then get to work.  Grub is available via CVS on SourceForge.net.
You can contact us at support@grub.org.  Check out the TODO for a list of things that
we need done.

Copyright Notice
================
For the terms and conditions of use of this software and source code, read the 
LICENSE file.   
If we have failed to give you credit for code that you wrote, then give us a good blasting
via email, and we promise that we will add your name(s) to the credits.

Rock on, Open Source!

Team Grub
support@grub.org
Copyright, 2001, 2002

