Happened again 8pm tonight. All sorted out now.
Yesterday I set things up so that I receive text and email notifications immediately after an unexpected server reboot, which will speed up how quickly I can correct any resulting issues if it happens again. I also discovered yesterday that all these hard reboots are causing plenty of other problems that I'll probably need to sort out soon as well. I'll post details about those other issues either later tonight or tomorrow. |
You're the best John.
|
Another hard reboot around an hour ago... All fixed up again.
|
What you do is totally voodoo to me.
I do appreciate it! |
The forum software we use here at GC, similar to most forum type software, uses the MySQL database software. MySQL, at least when this version of the forum software we are on was developed, defaulted to the MyISAM database storage engine. And it turns out that the MyISAM database storage engine is not particularly resilient to sudden power loss as has happened with GC's server quite a few times in the past month.

Essentially, if the database server was in the process of saving any pertinent information when the power was disrupted, only part of the data may have saved and the other part lost/corrupted. Which may or may not cause corruption to various important data in the database.

Up until March 1st this, as far as I can tell, wasn't a big issue since problems seemed to always impact non essential areas of the database. But, on March 1st the two reboots crashed the user database table. After checking with the forum software developer, this sort of crash (despite being "repaired" using MySQL's repair functions) may have corrupted some GCer account records which may then not be recoverable and for impacted accounts, they would need to start a new account. I'm definitely not okay with that, so will be doing everything I can to ensure GC data is minimally impacted once all the server issues are sorted out.

Nobody has emailed me so far about problems accessing their GC account, so maybe no account corruptions so far. Also, I don't know for certain that the MySQL repair functions leave data without issues untouched. So maybe there is data corruption that is currently undetected. This is something that I'll be looking into.

---

What I'll be doing:

1. Stabilizing the GC hosting environment. Currently I'm waiting for the datacenter to replace a faulty/failing power strip/distribution unit. After that I'll test the server hardware to determine if these problems are due to the server going bonkers or if it's the datacenter's PDU that caused the problems.
2. I've been researching what changes to make and I will either reinstall the current server or setup a new server in such a way where GC's database will be resilient (or at least significantly more resilient) to future power disruptions.
3. Possible data corruption. I'll try to determine if there is data corruption. If not, then we should be good from that point. However, if there is data corruption I might restore the last trusted database backup (which is from just before the first hard reboot back in December) and will merge all of the new stuff from then to current back into that known good copy of the database. What that will do is limit any potential resulting data corruption issues to only the past 3 months rather than the entire history of GC. Unsure about that part but it's something I'm considering.

---

And one last piece of info in this extra long message:

Code:
# ls -f | wc -l

All those emails also aren't likely unique errors. There may just be a few dozen errors each repeated thousands of times each. If it becomes necessary for me to look through the errors I'll write a software program to sort through all that and return just one message for each unique error.

---

That's it for now. Thanks for staying tuned in to GC! |
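For anyone curious what a program that returns one message per unique error might look like, here is a minimal sketch in Python. It is not John's actual tool. It assumes each error email has already been read into a string, and it assumes the parts that change between repeats (timestamps, ids) are runs of digits, which is a crude guess: it would also lump together errors that differ only by an error number. The names `normalize` and `unique_errors` are made up for illustration.

```python
import re
from collections import Counter

# Assumption: the variable parts of an error (timestamps, ids, line numbers)
# are runs of digits. Masking them lets repeats of one error compare equal.
_DIGITS = re.compile(r"\d+")


def normalize(message):
    """Collapse whitespace and mask digit runs so repeated errors match."""
    return _DIGITS.sub("#", " ".join(message.split()))


def unique_errors(messages):
    """Return (first-seen example, count) pairs, most frequent error first."""
    counts = Counter()
    examples = {}
    for message in messages:
        key = normalize(message)
        counts[key] += 1
        # Keep the first message seen for each distinct error as its example.
        examples.setdefault(key, message.strip())
    return [(examples[key], count) for key, count in counts.most_common()]
```

Fed a list of message bodies, it returns one representative message per distinct error along with how many times that error repeated, so a few dozen lines stand in for thousands of emails.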
^^^ I'm very glad it makes sense to you. Still voodoo to me. I don't speak that language!
Thanks again for everything you do, it is appreciated. |
John, I do not understand even a little bit of what you said, but I am very thankful for all you do!!
|
Then, while writing, the paper is abruptly yanked away. Now your message is only half written with part of it not legible and that's how it must remain. That's sort of what happens when there is a power outage with the web server. Anything in the process of being saved when the power is cut might end up a mess / corrupted and only partially saved to the database. Corrupted data could result in some things not working correctly on the website or maybe not at all. Although I'm not certain, so far it seems that we may be in the clear regarding any data corruption. |
^ I like that explanation, John.
I've worked with MySQL, and my personal preference for engine is InnoDB, not MyISAM. Mainly because InnoDB supports foreign keys and transactions. I'm guessing you don't have control over which engine is used. Do you have any tools available to you to analyze DB performance? (e.g. NewRelic) Thank you again for everything you do for us. |
Once the power issues are sorted out hopefully there won't be any related problems again for a long time. But, if there are problems at least I'll know InnoDB may be able to handle it much better. In addition, I'm going to test ZFS with my setup and if it works out well I'll place the MySQL data folder on a zpool for the additional data integrity benefits.
Back when this version of the forum software was developed they decided to go with MyISAM. InnoDB was available then, but MyISAM was the default for MySQL at the time so maybe that's why they went with it. I recall back then that InnoDB wasn't necessarily recommended for vBulletin. I'm unsure exactly why, but one of the issues mentioned was that full text search was available in MyISAM but not in InnoDB (I read this week that InnoDB does now have that feature). Apparently, though, full text in MyISAM only impacted the search engine of the forum software, but instead of fixing the search to work with InnoDB they just went with MyISAM tables. Anyhow, there is a path to switching over to InnoDB which works with the current software that I'll be looking more into. (Although, I'm not planning to keep GC on this forum software much longer, but that is an entirely different topic that I'll be starting a new thread about soon.)
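As a rough illustration of what switching tables over to InnoDB involves at the SQL level, here is a small Python sketch that only builds the statements and does not touch a database. It is an assumption about the general approach, not the specific path mentioned above for this forum software. The table names in the usage note are placeholders, and `quote_identifier` and `conversion_statements` are hypothetical helper names.

```python
# Query listing the MyISAM tables of one schema. The %s is a placeholder for
# the schema name, to be filled in by whichever MySQL client runs the query.
LIST_MYISAM_TABLES = (
    "SELECT TABLE_NAME FROM information_schema.TABLES "
    "WHERE TABLE_SCHEMA = %s AND ENGINE = 'MyISAM'"
)


def quote_identifier(name):
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def conversion_statements(table_names):
    """Build one ALTER TABLE ... ENGINE=InnoDB statement per table name."""
    return [
        f"ALTER TABLE {quote_identifier(name)} ENGINE=InnoDB;"
        for name in table_names
    ]
```

For example, `conversion_statements(["user", "post"])` yields two `ALTER TABLE` statements to review before running anything. A full backup should come first, and tables that carry FULLTEXT indexes would need a MySQL version whose InnoDB supports them.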
|
We're still not in the clear yet with the server reboot problems, though. The datacenter is taking excessively long with replacing the power distribution unit. GC's server is still powered through that failing PDU but they did reduce the load on it by moving any servers off of it that they could. It was a month ago when I notified their staff of the issue and they were able to confirm the PDU is failing. Until they replace the PDU it causes uncertainty as to whether any of the reboot issues are due to GC's server hardware or if it's solely due to their PDU being on its way out. I have a temporary server that I'll be moving GC to soon. Then after the datacenter PDU is taken care of I'll get things set back up on the current web server again. |
On the blue header bar of each message, above the join date and next to the "report spam" icon, is the computer icon. When you move the cursor over it, it shows the IP addy.
|
GC was offline for a while Saturday evening. This time the culprit was not a power disruption and reboot directly, although those were the indirect cause.
Each time the server reboots due to the power issues, it saves some messages into the system error log, which is part of the BIOS. There's not much space there for logs, so this one could only hold 512 messages. And it turns out that once that log fills up, the system waits at a specific startup screen simply to notify about the full system error log. I had to press the F1 key to get it going again. Not quite what I was expecting when I saw that the server was completely unresponsive. But glad it was a relatively easy fix. |
Thanks for the update!
|
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2025, vBulletin Solutions Inc.