My old stats machine
I guess my old stats machine is about 5 years old now and has enjoyed its retirement from serving the stats for about a year now. During that time it's been a dedicated BOINC machine running both CPU and GPU apps under Fedora 9 Linux.
Late last week it started to lock up after 4 or 5 hours, and sometimes wouldn't even boot.
I'd been thinking of upgrading its Fedora from v9 to something a bit more modern for quite a while but never got round to it. All this locking persuaded me to give it a go and maybe it would solve the locking.
On went Fedora 14 and the locks continued
hmmn :evil:
ok, I'll borrow some memory sticks from work and see if it's my memory sticks at fault.
Nope, it still locks up.
So yesterday I removed the CPU and heatsink, cleaned them both up and refitted them with new paste and so far so good, it's been up for almost 24hrs now :thumbup:
Strange thing is that the CPU temperature monitors weren't showing anything unusual and it didn't seem to make any difference if boinc was running or not.
Maybe CPU paste just gets old
-
- Posts: 5602
- Joined: Sat Jun 23, 2007 1:00 am
Very odd, but I'm glad that you've fixed the problem.
Out of curiosity, how do you monitor your temps John? I tend to use lm-sensors to monitor such things on my Ubuntu machine, but I've noticed that there are several different outputs for the CPU temps, and there is often quite a variation in the readings that they give, which is kind of weird. :shock:
James.
Joshrandom wrote: Out of curiosity, how do you monitor your temps John? I tend to use lm-sensors to monitor such things on my Ubuntu machine
Yep, lm-sensors for me too.
I've not compared different sensor types or the same machine in different OSs but my 2 quads have always shown similar temps.
1 core has always run significantly hotter than the other 3 on both machines (like 10C hotter)
On my quad, the variation between cores is rarely more than around 6C (with the hottest of the cores reaching around 70C when crunching). However, in my lm-sensors setup there are two check boxes for CPU temperature and then a further four boxes (one for each core). Right now the cores are reading 62C, 56C, 58C and 60C respectively, while the two CPU temps are showing as 52C and 40C, but then I guess that I probably just set the app up wrong when I installed it. :lol:
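For anyone who wants to compare per-core readings like these without eyeballing them, here is a minimal Python sketch that pulls the "Core N" lines out of the text printed by the lm-sensors `sensors` command and reports the gap between the hottest and coolest core. The sample text is an assumption: it is laid out the way the coretemp driver usually prints, and filled in with the four core readings quoted in this thread. The "high" and "crit" limits and the function names are made up for illustration.

```python
import re

# Matches coretemp lines as `sensors` typically prints them, e.g.
# "Core 0:        +62.0°C  (high = +80.0°C, crit = +100.0°C)"
CORE_LINE = re.compile(r"^Core\s+(\d+):\s+([+-]?\d+(?:\.\d+)?)\s*°C", re.MULTILINE)


def core_temps(sensors_text):
    """Return {core_index: temperature_in_C} parsed from `sensors` output."""
    return {int(idx): float(temp) for idx, temp in CORE_LINE.findall(sensors_text)}


def core_spread(temps):
    """Difference between the hottest and the coolest core, in degrees C."""
    return max(temps.values()) - min(temps.values())


# Hypothetical sample: core readings taken from the post above,
# limits in brackets invented for the example.
SAMPLE = """\
coretemp-isa-0000
Adapter: ISA adapter
Core 0:        +62.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:        +56.0°C  (high = +80.0°C, crit = +100.0°C)
Core 2:        +58.0°C  (high = +80.0°C, crit = +100.0°C)
Core 3:        +60.0°C  (high = +80.0°C, crit = +100.0°C)
"""

if __name__ == "__main__":
    temps = core_temps(SAMPLE)
    print(temps, "spread:", core_spread(temps))
```

To try it on a live machine, the sample could be swapped for `subprocess.run(["sensors"], capture_output=True, text=True).stdout`, assuming lm-sensors is installed and the chip reports its cores with "Core N" labels. Other sensor lines (motherboard, the extra "CPU temp" entries) are deliberately ignored here, since they come from different chips and are not directly comparable.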
Joshrandom wrote: On my quad, the variation between cores is rarely any more than around 6C (with the hottest of the cores reaching around 70C when crunching).
When mine get up to 70C I give the fans a damned good clean
Joshrandom wrote: right now the cores are reading 62C 56C 58C and 60C respectively,
Mine are at 55, 56, 52 & 63C
Intel Q6600 BTW with a Zalman heatpipe cooler and honking great fan
Joshrandom wrote: while the two CPU temps are showing as 52C and 40C, but then I guess that I probably just set the app up wrong when I installed it. :lol:
I've got 2 other sensors that report anything and they both show 35C, one of them is labelled MB temp.
It's 20C in my computer room right now, which is fairly cool
Temujin wrote: When mine get up to 70C I give the fans a damned good clean
That was my thought back during the hot weather. I cleaned out the fans etc, and even added a new 120mm chassis fan to blow cool air in, but the CPU temps on my Q6600 obstinately remained at 70C. The only thing that changed was that the CPU fan ran slower, only spinning up when the temps started to climb beyond 70C. :?
Hmm, it looks like I'm going to need to take another look at the airflow in my system. :(
Temujin wrote: After being up for over 3 days, it locked up again this afternoon and now won't boot :evil: Ho hum
When you say it won't boot, do you mean that it's totally dead or just that it crashes while loading the OS?
If it's the latter, have you tried booting from CD?
I've had a couple of systems that failed at boot due to hardware issues: the first was caused by a failing PSU, the second by a failing HD5970. In both cases the only way that I could work out where the problem lay was to bite the bullet and start swapping suspect components between systems, a process that can be quite scary. :shock:
Whatever the problem, I'm sure that you'll track it down, good luck.
Good point there Mr Random.
It actually POSTs ok, then locks just after displaying "Verifying DMI Pool Data........"
Last thing I tried was swapping the boot order and then I got
Verifying DMI Pool Data........
Boot from cd......
I'm at home tomorrow afternoon so I'll work on it a bit then.
It could well be a faulty PSU or GPU, they'll be the things I swap out tomorrow having already tested the memory & CPU/heatsink last weekend.
The GPU is reasonably new (6 months or so) but I think the PSU may be the original Akasa I bought for it 5 years ago, so that'll be where I start.
It is nice and quiet in my computer room now though
Worth running ....
sfc /scannow
.... inside safe mode from a command window when you can get it that far in the boot sequence. The beast has been running fine - as such - from what you have said here, then "randomly" locks up/crashes. That sounds like a software system fault slowly building up. SFC is a very good utility for checking system file integrity, and worth running periodically, let alone during present troubles.
Regards
Zy
hmmn this is doing my head in
I've now replaced the motherboard with one borrowed from a guy at work.
Another fresh install of Fedora 14.
It is at least staying up but has highlighted a possible cause, or at least a "funny"
I have 3 computers in my computer room, 2x linux & 1x winxp
I only have 1 monitor, keyboard & mouse, so use a 4 port KVM switch to swap between them as required.
This new motherboard & Fedora refuse to recognise the keyboard & mouse through the KVM, so I've plugged an old iMac USB keyboard & mouse directly into it and it then works fine.
The original keyboard & mouse still work fine with the other 2 machines through the KVM.
The KVM port I usually use for this machine has a USB connection for keyboard & mouse and I wondered if the cable was at fault. I changed to the spare one which has the old fashioned PS/2 style plugs. The machine then refused to boot, just continually rebooted. Remove the PS/2 connectors and plug in the iMac USB and it boots.
So it looks like there's something strange happening with the USB system somewhere.
But as it's still there with the new motherboard it looks like the problem is with the keyboard, mouse or cables
Clearly you are a lot more capable with this sort of thing than I am John, but after reading your post there were a few things that occurred to me.
Temujin wrote: I've now replaced the motherboard with one borrowed from a guy at work.
How confident are you that this borrowed motherboard is in full working order, and that the problems it seems to have with the keyboard and mouse aren't just a red herring?
Temujin wrote: I have 3 computers in my computer room, 2x linux & 1x winxp. I only have 1 monitor, keyboard & mouse, so use a 4 port KVM switch to swap between them as required. This new motherboard & fedora refuses to recognise the keyboard & mouse through the KVM, so i've plugged in an old imac usb keyboard & mouse directly into it and it then works fine. The original keyboard & mouse still work fine with the other 2 machines through the KVM.
You say that the KVM switch works fine with the other 2 machines, but have you tried swapping the KVM connections from one of these working systems to the problem machine?
Oh, and what happens if you connect the keyboard and mouse from the KVM directly to the PC using the same ports that you use for the KVM leads?
Temujin wrote: So it looks like there's something strange happening with the usb system somewhere. But as it's still there with the new motherboard it looks like the problem is with the keyboard, mouse or cables
I once tried to add a USB card to one of my older systems, with the result that the PC would regularly crash or else fail to boot properly. When I removed the card the system became stable again. After thinking it through I came to the realisation that the new USB card was drawing more power than the PSU could comfortably supply, causing the crashing and other problems. With this in mind I have to say that the evidence still seems to suggest that your PSU might be behind all of your system's current issues. Have you tried swapping it out yet? :?
Temujin wrote: I've now replaced the motherboard with one borrowed from a guy at work.
Joshrandom wrote: How confident are you that this borrowed motherboard is in full working order, and that the problems it seems to have with the keyboard and mouse aren't just a red herring?
haha, not confident at all cos he didn't use it cos he couldn't get it to boot
But.. I reckon that's down to the PS/2 connections. As soon as I plug something into those the thing refuses to boot
Joshrandom wrote: You say that the KVM switch works fine with the other 2 machines, but have you tried swapping the KVM connections from one of these working systems to the problem machine?
that's next on the list, it's just that F1 qualifying and football have got in the way this afternoon
Joshrandom wrote: Oh, and what happens if you connect the keyboard and mouse from the KVM directly to the PC using the same ports that you use for the KVM leads?
good idea
.....................
Yep, they work, nice one :thumbup:
Joshrandom wrote: I once tried to add a USB card to one of my older systems, with the result that the PC would regularly crash or else fail to boot properly, when I removed the card the system became stable again. After thinking it through I came to the realisation that the new USB card was drawing more power than the PSU could comfortably supply, causing the crashing and other problems. With this in mind I have to say that the evidence still seems to suggest that your PSU might be behind all of your system's current issues, have you tried swapping it out yet? :?
Yep, it could well be down to the PSU and I still haven't got around to checking that.
I don't have a spare PSU so it'll mean shutting down another cruncher for the duration but I guess I'll just have to do it
good help there Josh, thanks :thumbup:
Right then, I've reverted back to my motherboard, using the one from work just complicated things.
I've swapped PSUs, so the misbehaving machine now has the OCZ 700W PSU from the other Linux machine (which now has the possibly bad Akasa 850W).
Keyboard & mouse through the KVM still don't work but the iMac keyboard & mouse do.
I'll try to get another KVM cable tomorrow.
Having thought about events over this last week, I've come to the conclusion that the locking was caused by a failing hard disk in its 2-disk striped RAID. Any time the machine was up using the raidset it would lock. Since I split the RAID and installed onto a single disk it hasn't locked, it's just been this weird KVM behaviour
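If a failing disk is the suspect, one quick cross-check is the drive's own SMART health verdict. Below is a rough Python sketch, assuming smartmontools is installed and the script is run as root. It calls `smartctl -H` on a device and picks out the health line. The device names are placeholders, and disks sitting behind a RAID controller may not be reachable this way at all. A "PASSED" result also doesn't prove a disk is healthy; it only means the drive hasn't flagged itself as failing.

```python
import subprocess


def smart_health(smartctl_text):
    """Extract the health verdict from `smartctl -H` output, or None if absent.

    ATA disks print "SMART overall-health self-assessment test result: PASSED";
    SCSI/SAS disks print "SMART Health Status: OK".
    """
    for line in smartctl_text.splitlines():
        if ("overall-health self-assessment test result:" in line
                or "SMART Health Status:" in line):
            return line.split(":", 1)[1].strip()
    return None


def check_disk(device):
    """Run `smartctl -H` on one device (needs root and smartmontools)."""
    result = subprocess.run(["smartctl", "-H", device],
                            capture_output=True, text=True)
    return smart_health(result.stdout)


if __name__ == "__main__":
    # Illustrative output only; on a real box call check_disk() with your
    # own device names, e.g. "/dev/sda" and "/dev/sdb" (placeholders here).
    sample = (
        "=== START OF READ SMART DATA SECTION ===\n"
        "SMART overall-health self-assessment test result: PASSED\n"
    )
    print(smart_health(sample))
```

Running it against each former RAID member in turn would at least show whether one of the two disks reports a failure while the other doesn't.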