GreekChat.com Forums

GreekChat.com Forums (https://greekchat.com/gcforums/index.php)
-   Greek Life (https://greekchat.com/gcforums/forumdisplay.php?f=24)
-   -   (Resolved) GreekChat Recent Outages & Server Issues (https://greekchat.com/gcforums/showthread.php?t=240332)

John 02-20-2018 07:32 PM

All fixed up again. Things might be a little slow for a while, though, as I back up and download various files from the server.

carnation 02-20-2018 08:03 PM

John, thank you for doing all this work! I must admit that I don't know what all a forum entails.

John 02-20-2018 08:50 PM

Quote:

Originally Posted by carnation (Post 2454070)
John, thank you for doing all this work! I must admit that I don't know what all a forum entails.

It's not so much a forum issue as it is a server hardware issue. More specifically, the database corruption is being caused by sporadic power cycling / reboots of the server.

I'm not yet sure what's causing that to happen. Maybe I'll get lucky and it will be something on the datacenter's side of things, such as a faulty power distribution unit or similar. Otherwise, if it's the server then it could be the server's power supply unit causing reboots when it hits certain limits or maybe capacitors on the motherboard starting to go bad and causing reboots in specific circumstances. Might even be a failed/failing RAM module.

At this point my plan of action is to determine if it's on the server side, or datacenter, then proceed from there.

If it's definitely the server going haywire I'll most likely look into renting a server elsewhere and move things there for a while, rather than building a new server for GC. I'll likely build a new server for GC again at some point, but now is not the time for that.

GC's current server has been in 24/7 operation since late 2013, so we're getting close to 5 years on this hardware.

NinjaPoodle 02-20-2018 10:35 PM

Many Thanks John!

John 02-24-2018 08:06 PM

Following are details from the server log files showing what's been going on:

Code:


root    pts/1        pool-###-##-##-# Sat Feb 24 13:18:40 2018  still logged in
root    pts/0        pool-###-##-##-# Sat Feb 24 13:16:20 2018  still logged in
root    pts/1        pool-###-##-##-# Fri Feb 23 20:49:48 2018 - Sat Feb 24 03:37:44 2018  (06:47)
root    pts/0        pool-###-##-##-# Fri Feb 23 18:58:05 2018 - Fri Feb 23 20:50:05 2018  (01:52)
root    pts/1        pool-###-##-##-# Fri Feb 23 14:42:58 2018 - Fri Feb 23 18:24:10 2018  (03:41)
root    pts/0        pool-###-##-##-# Fri Feb 23 14:41:41 2018 - Fri Feb 23 18:22:58 2018  (03:41)
root    pts/1        pool-###-##-##-# Thu Feb 22 23:54:23 2018 - Fri Feb 23 00:52:59 2018  (00:58)
root    pts/0        pool-###-##-##-# Thu Feb 22 23:52:58 2018 - Fri Feb 23 00:52:45 2018  (00:59)
root    pts/0        pool-###-##-##-# Wed Feb 21 19:22:58 2018 - Wed Feb 21 19:23:10 2018  (00:00)
root    pts/0        pool-###-##-##-# Wed Feb 21 12:23:52 2018 - Wed Feb 21 12:23:59 2018  (00:00)
root    pts/1        pool-###-##-##-# Tue Feb 20 17:06:43 2018 - Wed Feb 21 01:21:08 2018  (08:14)
root    pts/0        pool-###-##-##-# Tue Feb 20 16:43:54 2018 - Wed Feb 21 01:23:16 2018  (08:39)
runlevel (to lvl 3)  2.6.32-696.20.1. Mon Feb 19 10:15:08 2018 - Sat Feb 24 13:27:03 2018 (5+03:11)
reboot  system boot  2.6.32-696.20.1. Mon Feb 19 10:15:08 2018 - Sat Feb 24 13:27:03 2018 (5+03:11)

root    pts/1        ##-###-###-###.d Sat Feb 17 23:24:29 2018 - Sun Feb 18 00:48:24 2018  (01:23)
root    pts/0        ##-###-###-###.d Sat Feb 17 23:21:06 2018 - Sun Feb 18 00:48:03 2018  (01:26)
root    pts/0        ##-###-###-###.d Sat Feb 17 22:48:24 2018 - Sat Feb 17 23:19:25 2018  (00:31)
runlevel (to lvl 3)  2.6.32-696.20.1. Sat Feb 17 02:11:54 2018 - Mon Feb 19 10:15:08 2018 (2+08:03)
reboot  system boot  2.6.32-696.20.1. Sat Feb 17 02:11:54 2018 - Sat Feb 24 13:27:03 2018 (7+11:15)

runlevel (to lvl 3)  2.6.32-696.20.1. Fri Feb 16 22:07:56 2018 - Sat Feb 17 02:11:54 2018  (04:03)
reboot  system boot  2.6.32-696.20.1. Fri Feb 16 22:07:56 2018 - Sat Feb 24 13:27:03 2018 (7+15:19)

root    pts/1        pool-###-##-##-# Wed Feb 14 18:04:21 2018 - Wed Feb 14 18:15:28 2018  (00:11)
root    pts/0        pool-###-##-##-# Wed Feb 14 17:45:05 2018 - Wed Feb 14 18:15:34 2018  (00:30)
root    pts/0        pool-###-##-##-# Thu Feb  8 05:26:42 2018 - Thu Feb  8 05:51:09 2018  (00:24)
runlevel (to lvl 3)  2.6.32-696.20.1. Thu Feb  8 05:23:45 2018 - Fri Feb 16 22:07:56 2018 (8+16:44)
reboot  system boot  2.6.32-696.20.1. Thu Feb  8 05:23:45 2018 - Sat Feb 24 13:27:03 2018 (16+08:03)
shutdown system down  2.6.32-696.16.1. Thu Feb  8 05:22:31 2018 - Thu Feb  8 05:23:45 2018  (00:01)
runlevel (to lvl 6)  2.6.32-696.16.1. Thu Feb  8 05:22:19 2018 - Thu Feb  8 05:22:31 2018  (00:00)

root    pts/0        pool-###-##-##-# Thu Feb  8 03:45:56 2018 - Thu Feb  8 05:09:05 2018  (01:23)
root    pts/5        pool-###-##-##-# Thu Feb  8 02:05:21 2018 - Thu Feb  8 05:05:51 2018  (03:00)
root    pts/4        pool-###-##-##-# Thu Feb  8 02:04:40 2018 - Thu Feb  8 03:36:24 2018  (01:31)
root    pts/3        pool-###-##-##-# Thu Feb  8 02:03:54 2018 - down                      (03:18)
root    pts/2        pool-###-##-##-# Thu Feb  8 01:14:14 2018 - Thu Feb  8 02:32:39 2018  (01:18)
root    pts/1        pool-###-##-##-# Thu Feb  8 01:10:49 2018 - Thu Feb  8 02:33:22 2018  (01:22)
root    pts/0        pool-###-##-##-# Thu Feb  8 00:10:15 2018 - Thu Feb  8 02:09:24 2018  (01:59)
root    pts/0        pool-###-##-##-# Wed Feb  7 22:40:47 2018 - Wed Feb  7 23:02:51 2018  (00:22)
runlevel (to lvl 3)  2.6.32-696.16.1. Wed Feb  7 03:24:10 2018 - Thu Feb  8 05:22:19 2018 (1+01:58)
reboot  system boot  2.6.32-696.16.1. Wed Feb  7 03:24:10 2018 - Thu Feb  8 05:22:19 2018 (1+01:58)

runlevel (to lvl 3)  2.6.32-696.16.1. Tue Feb  6 21:20:12 2018 - Wed Feb  7 03:24:10 2018  (06:03)
reboot  system boot  2.6.32-696.16.1. Tue Feb  6 21:20:12 2018 - Thu Feb  8 05:22:19 2018 (1+08:02)

root    pts/0        pool-###-##-##-# Sat Feb  3 01:21:42 2018 - Sat Feb  3 01:22:13 2018  (00:00)
runlevel (to lvl 3)  2.6.32-696.16.1. Thu Feb  1 20:16:02 2018 - Tue Feb  6 21:20:12 2018 (5+01:04)
reboot  system boot  2.6.32-696.16.1. Thu Feb  1 20:16:02 2018 - Thu Feb  8 05:22:19 2018 (6+09:06)

root    pts/1        pool-###-##-##-# Thu Dec 28 23:43:22 2017 - Fri Dec 29 01:02:03 2017  (01:18)
root    pts/0        pool-###-##-##-# Thu Dec 28 23:36:30 2017 - Fri Dec 29 01:01:55 2017  (01:25)
root    pts/0        pool-###-##-##-# Thu Dec 28 22:24:56 2017 - Thu Dec 28 23:36:17 2017  (01:11)
runlevel (to lvl 3)  2.6.32-696.16.1. Thu Dec 28 22:16:53 2017 - Thu Feb  1 20:16:02 2018 (34+21:59)
reboot  system boot  2.6.32-696.16.1. Thu Dec 28 22:16:53 2017 - Thu Feb  8 05:22:19 2018 (41+07:05)
shutdown system down  2.6.32-696.1.1.e Thu Dec 28 22:15:39 2017 - Thu Dec 28 22:16:53 2017  (00:01)
runlevel (to lvl 6)  2.6.32-696.1.1.e Thu Dec 28 22:15:18 2017 - Thu Dec 28 22:15:39 2017  (00:00)

root    pts/3        pool-###-##-##-# Thu Dec 28 19:50:37 2017 - down                      (02:24)
root    pts/2        pool-###-##-##-# Thu Dec 28 15:59:54 2017 - Thu Dec 28 22:15:10 2017  (06:15)
root    pts/1        pool-###-##-##-# Thu Dec 28 15:49:58 2017 - Thu Dec 28 22:05:46 2017  (06:15)
root    pts/0        pool-###-##-##-# Thu Dec 28 15:30:31 2017 - Thu Dec 28 22:05:52 2017  (06:35)
runlevel (to lvl 3)  2.6.32-696.1.1.e Wed Dec 27 11:31:10 2017 - Thu Dec 28 22:15:18 2017 (1+10:44)
reboot  system boot  2.6.32-696.1.1.e Wed Dec 27 11:31:10 2017 - Thu Dec 28 22:15:18 2017 (1+10:44)

root    pts/0        pool-###-##-##-# Mon Dec 18 20:41:14 2017 - Mon Dec 18 20:45:12 2017  (00:03)
root    pts/0        pool-###-##-##-# Mon Dec 18 19:46:20 2017 - Mon Dec 18 20:27:35 2017  (00:41)
root    pts/0        pool-###-##-##-# Tue Dec 12 20:46:45 2017 - Tue Dec 12 21:01:20 2017  (00:14)
root    pts/0        pool-###-##-##-# Tue Dec 12 16:34:05 2017 - Tue Dec 12 16:43:43 2017  (00:09)
root    pts/2        pool-###-##-##-# Mon Dec 11 14:53:00 2017 - Mon Dec 11 20:18:45 2017  (05:25)
root    pts/1        pool-###-##-##-# Mon Dec 11 14:52:09 2017 - Mon Dec 11 15:32:06 2017  (00:39)
root    pts/0        pool-###-##-##-# Mon Dec 11 14:50:27 2017 - Mon Dec 11 20:18:29 2017  (05:28)

Seven server hard reboots (power disruptions?) on Dec 27, Feb 1, Feb 6, Feb 7, Feb 16, Feb 17 & Feb 19.

In the quote above, the problematic reboots are in red. The reboots highlighted in blue are what they are supposed to look like. Reboots are supposed to be preceded by shutdown messages, showing a clean/graceful reboot of the server.

With the preceding shutdown log messages missing, that pretty much means the server either crashed or had a power disruption that caused the server to immediately power off and reboot.
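
The check described above (a "reboot" record with no "shutdown" record right before it) can be sketched in a few lines of Python. This is only a rough illustration, assuming the log is listed newest-first in the same layout as the excerpt above. The command that produced the log isn't stated in the post, and the names `find_unclean_boots` and `SAMPLE` are made up for this example.

```python
import re

# Matches timestamps in the "Fri Feb 16 22:07:56 2018" form used in the log.
TIMESTAMP = re.compile(
    r"[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}"
)
SYSTEM_RECORDS = ("reboot", "shutdown", "runlevel")


def find_unclean_boots(log_text):
    """Return boot timestamps of reboots with no shutdown record before them.

    Assumes the log is ordered newest-first, as in the excerpt above.
    """
    # Keep only system records; login sessions don't matter for this check.
    records = [line for line in log_text.splitlines()
               if line.startswith(SYSTEM_RECORDS)]
    unclean = []
    for i, line in enumerate(records):
        if not line.startswith("reboot"):
            continue
        # The following list entry is the next-older system event.
        older = records[i + 1] if i + 1 < len(records) else ""
        if not older.startswith("shutdown"):
            match = TIMESTAMP.search(line)
            unclean.append(match.group(0) if match else line)
    return unclean


# A few lines copied from the log above: one hard reboot, one clean reboot.
SAMPLE = """\
runlevel (to lvl 3)  2.6.32-696.20.1. Fri Feb 16 22:07:56 2018 - Sat Feb 17 02:11:54 2018  (04:03)
reboot  system boot  2.6.32-696.20.1. Fri Feb 16 22:07:56 2018 - Sat Feb 24 13:27:03 2018 (7+15:19)
root    pts/0        pool-###-##-##-# Thu Feb  8 05:26:42 2018 - Thu Feb  8 05:51:09 2018  (00:24)
runlevel (to lvl 3)  2.6.32-696.20.1. Thu Feb  8 05:23:45 2018 - Fri Feb 16 22:07:56 2018 (8+16:44)
reboot  system boot  2.6.32-696.20.1. Thu Feb  8 05:23:45 2018 - Sat Feb 24 13:27:03 2018 (16+08:03)
shutdown system down  2.6.32-696.16.1. Thu Feb  8 05:22:31 2018 - Thu Feb  8 05:23:45 2018  (00:01)
runlevel (to lvl 6)  2.6.32-696.16.1. Thu Feb  8 05:22:19 2018 - Thu Feb  8 05:22:31 2018  (00:00)
"""

print(find_unclean_boots(SAMPLE))
```

On this sample only the Feb 16 boot is flagged, since the Feb 8 boot has a shutdown record right before it. Note that the oldest reboot in a log gets flagged too if the log simply starts there, so that one needs a manual look.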

Now for the possibly good news: The datacenter staff where I colocate the server checked on the power distribution unit after I requested it. Below is one of the replies I received about it:

Quote:

The failure indicators on the PDU that I'm seeing are: one of the banks has failed completely and is unusable, and the display on the unit is reading error rather than displaying its current power usage. These are typically signs that the PDU is heading towards complete failure.

The unit does have a management port but I'm unsure if it actually logs these types of issues. I'm also hesitant to console the unit or attempt to reset it as I have witnessed that cause complete failure once they start going bad.

Quote:

In regards to the failing PDU, I will contact management about replacing it since we would need to schedule a maintenance window with multiple clients in order to swap it out.

With that, I think it's very likely the reboots were caused by the failing power distribution unit. We should be in the clear once the datacenter PDU is replaced, but I'll do some additional hardware diagnostic testing afterwards to make sure.

NinjaPoodle 02-24-2018 10:59 PM

And I say again, Thank you John for keeping the ship sailing.

FSUZeta 02-25-2018 07:27 AM

Ditto! Many thanks to you John.

PGD-GRAD 02-25-2018 10:55 AM

THANK YOU for your dedication to the good in Greek Life and your work to allow us to voice our opinions and—hopefully—offer some positive information and assistance to those who are seeking to join our ranks.

naraht 02-26-2018 10:47 AM

Thank you very muchly!

John 02-26-2018 09:00 PM

Happened again...

Code:

runlevel (to lvl 3)  2.6.32-696.20.1. Mon Feb 26 08:15:42 2018 - Mon Feb 26 19:56:46 2018  (11:41)
reboot  system boot  2.6.32-696.20.1. Mon Feb 26 08:15:42 2018 - Mon Feb 26 19:56:46 2018  (11:41)

Taking GC offline for maybe 30 minutes or so to check on things.

John 02-26-2018 09:43 PM

Whenever the spontaneous reboots occur, they usually leave some tables crashed and others not closed properly. Fortunately, as far as I'm aware, none of this has caused any major database problems so far.

Anyhow, everything should be back in order and functioning properly again.

Sciencewoman 02-26-2018 11:27 PM

Thanks for everything, John!

John 02-27-2018 09:19 PM

Had another spontaneous server hard reboot today...

John 03-02-2018 03:24 AM

2 spontaneous reboots on March 1st. One around 9:30 AM and the other around 10:00 PM. That's what took GC offline for a while last night. Some database issues just slow things down a bunch but others take the site completely offline.

Just finished fixing things up.

With the reboots happening more often, I might move GC to a different server temporarily until things with the current server or datacenter equipment are sorted out.
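
For a rough sense of "happening more often", the gaps between the hard reboots can be worked out from the boot times in the logs posted earlier in the thread. This is just a sketch: the list below only includes reboots with an exact logged timestamp, so the Feb 27 and March 1 reboots are left out, and the names `HARD_REBOOTS` and `gaps_in_days` are made up for this example.

```python
from datetime import datetime

# Boot times of the hard reboots, copied from the log excerpts in this thread.
HARD_REBOOTS = [
    datetime(2017, 12, 27, 11, 31, 10),
    datetime(2018, 2, 1, 20, 16, 2),
    datetime(2018, 2, 6, 21, 20, 12),
    datetime(2018, 2, 7, 3, 24, 10),
    datetime(2018, 2, 16, 22, 7, 56),
    datetime(2018, 2, 17, 2, 11, 54),
    datetime(2018, 2, 19, 10, 15, 8),
    datetime(2018, 2, 26, 8, 15, 42),
]


def gaps_in_days(times):
    """Return the time between consecutive events, in days (one decimal)."""
    return [round((later - earlier).total_seconds() / 86400, 1)
            for earlier, later in zip(times, times[1:])]


for start, gap in zip(HARD_REBOOTS, gaps_in_days(HARD_REBOOTS)):
    print(f"{start:%b %d %Y}: next hard reboot after {gap} days")
```

The first gap is a bit over 36 days, and every gap after that is under 10 days.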

AOIIalum 03-02-2018 07:33 PM

Thanks for keeping us updated, John. Gotta love hardware!


All times are GMT -4.