First off, I had never had any experience with Linux before trying clx.
Once we figured out how to get Linux installed and running, it was
on to clx. I think it was 3.02 that we started out on. We got that going
and took the machine up to the remote site where our internet connection is.
We got it all set up there and found out that we had the echo with the
telnet connection. I was kind of burnt out on all the setup at the time,
so we just made the link passive to limit the effect of the echo and
lived with it, since the only cure for it then was to get jnos running
and I was not up to the task of another piece of software at the time.
So our uptime was anywhere from 1 to 5 days because of the echo while
receiving mail. Or at least that's what I thought.
Some time back we noticed that we had a lot of RPC errors along with
<async-timeout> errors. They had been there all along; we had just
never looked at the file, and I attributed them to the echo and
the fact that we were also running in swap space.
OK, well, I finally decided that I was going to get that echo problem
solved, which we did, as I showed in the earlier message. OK, we are
ready to roll now. After 2 days the same mail crash happens. OK, we will
add more memory and take care of running in swap space. Now we were
really confident, but that only lasted about 2 hours before the same
crash again, with the errors still in the log. OK, well, it has got to be
that the databases are messed up, so we destroyed the databases and
reindexed. But the errors were still in the log.
So I ended up looking in the /proc files for anything that
might be showing errors there. In /proc/cpuinfo I noticed this:
    bogomips : 3.04
On the one here at the house it showed 39.73.
Both are 486 DX/4 100 MHz; the cluster machine has an AMD chip and I
think the one here is a Texas Instruments. I remembered seeing a HOWTO
about it. Hmmm, I gathered from it that I had not got the motherboard
configured correctly or that the turbo switch was off. But I knew for
a fact that it was not the turbo switch. I downloaded the kernel
from the cluster machine and booted it up here, and it showed 3.09
here at the house also. So I uploaded the kernel from the house,
crossed my fingers and rebooted. Wow, 49.66.
It has been up now almost 24 hours with no errors in the log.
It really threw me, as both were compiled with the same options
in the .config, and I had no idea that one kernel could run that
much slower than another. It was easy to see the difference when I
booted it here on the home machine. But since the other machine is
at a remote location I never see it boot. In fact it has not had a
monitor or keyboard hooked to it since we installed it.
Now granted, the echo was a big problem which masked, at least in
my mind, the bigger problem, which was the slow kernel. Now I can
tell you the first thing that I am going to look at when I boot
a new kernel up :-)
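For anyone else with a machine they never see boot, the bogomips check
could be scripted along these lines. This is only a rough Python sketch,
not anything that comes with clx or the kernel: the function names and
the 30.0 cutoff are my own assumptions, picked because a healthy
DX/4 100 showed 39.73 to 49.66 here and the slow kernel showed about 3.

```python
# Rough sketch: flag a suspiciously low bogomips value in /proc/cpuinfo.
# MIN_EXPECTED_BOGOMIPS is an assumed cutoff for a 486 DX/4 100, based only
# on the numbers in this message (about 3 when slow, 39 to 50 when healthy).
MIN_EXPECTED_BOGOMIPS = 30.0


def parse_bogomips(cpuinfo_text):
    """Return every bogomips value found in /proc/cpuinfo-style text."""
    values = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # The key is spelled "bogomips" or "BogoMIPS" depending on the CPU.
        if sep and key.strip().lower() == "bogomips":
            try:
                values.append(float(value.strip()))
            except ValueError:
                pass  # ignore a value that is not a plain number
    return values


def looks_slow(values, minimum=MIN_EXPECTED_BOGOMIPS):
    """True if any reported value is below the assumed minimum."""
    return any(v < minimum for v in values)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            found = parse_bogomips(f.read())
    except OSError as err:
        print("could not read /proc/cpuinfo:", err)
    else:
        if not found:
            print("no bogomips line found")
        elif looks_slow(found):
            print("WARNING: low bogomips:", found)
        else:
            print("bogomips looks reasonable:", found)
```

Run from a telnet session right after a reboot, something like this
would have shown the 3.04 long before I went chasing the databases.
The cutoff would need adjusting for any other CPU.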
Sorry if I took a little long to tell that.
-- Stan Rongey - KJ5SF e-mail: ztyuhe@citicorp.com