We currently have 15 hosts offline, most of which have been offline the entire
weekend. This is simply disastrous for our gaming communities!
There hasn't been a single response in this thread from anyone from Mammoth
in close to two months now, which makes me really worried. Frankly, I
see no alternative but to start looking for other options, and will do so
starting this coming week. GameCreate is just too unstable for us to even
remotely rely on, and with no fix on the horizon I see no other option than to
drop it. We've been with you guys for about three years now, I think, and
it makes me sad to see that the wonderful solution that GC is has deteriorated
into an unusable buggy mess.
I am willing to do anything it takes to help you guys out with fixing these problems,
but in order to do so you really have to answer my emails or this thread. Just
tell me if there's anything you'd like me to try, and I'll try it. Please, help us
help you - this buggy platform is a really horrible advertisement for your
colocation option.
Just to reiterate: The current problem which is killing us is that the GC client
connects, but gets no response to its PONGs. It tries three times, then gives
up and reconnects:
Code:
LOG: Sun Nov 09 14:15:58 2008: Starting GameCreate Client 5.2.4
LOG: Sun Nov 09 14:15:58 2008: Hostname: gabriella. IP: XXX.XXX.XXX.XXX.
LOG: Sun Nov 09 14:15:58 2008: Master: eu1.master.gamecreate.com,eu2.master.gamecreate.com
LOG: Sun Nov 09 14:15:58 2008: Starting FTP server thread
LOG: Sun Nov 09 14:15:58 2008: SSL pipe not connected, attempting to connect
LOG: Sun Nov 09 14:15:58 2008: Master address(es): eu1.master.gamecreate.com,eu2.master.gamecreate.com
LOG: Sun Nov 09 14:15:59 2008: Got ip of '85.17.42.116' for eu1.master.gamecreate.com
LOG: Sun Nov 09 14:15:59 2008: Begin connection attempt
LOG: Sun Nov 09 14:15:59 2008: Connect TCP socket
LOG: Sun Nov 09 14:15:59 2008: TCP connected, create new SSL instance
LOG: Sun Nov 09 14:15:59 2008: Bind SSL to TCP socket
LOG: Sun Nov 09 14:15:59 2008: Begin SSL negotiation
LOG: Sun Nov 09 14:16:02 2008: SSL negotiation complete
LOG: Sun Nov 09 14:16:02 2008: Attempting domain login: XXX/XXX
LOG: Sun Nov 09 14:16:06 2008: FTP connections are permitted
LOG: Sun Nov 09 14:16:29 2008: Received PING request
LOG: Sun Nov 09 14:17:02 2008: Sending PONG, waiting for response
LOG: Sun Nov 09 14:18:05 2008: Sending PONG, waiting for response
LOG: Sun Nov 09 14:19:08 2008: Sending PONG, waiting for response
LOG: Sun Nov 09 14:19:20 2008: PONG response not received, closing connection
LOG: Sun Nov 09 14:19:21 2008: SSL pipe not connected, attempting to connect
LOG: Sun Nov 09 14:19:21 2008: Master address(es): eu1.master.gamecreate.com,eu2.master.gamecreate.com
LOG: Sun Nov 09 14:19:22 2008: FTP connections are not permitted
LOG: Sun Nov 09 14:19:22 2008: Got ip of '85.17.42.116' for eu2.master.gamecreate.com
LOG: Sun Nov 09 14:19:22 2008: Begin connection attempt
LOG: Sun Nov 09 14:19:22 2008: Connect TCP socket
LOG: Sun Nov 09 14:19:22 2008: TCP connected, create new SSL instance
LOG: Sun Nov 09 14:19:22 2008: Bind SSL to TCP socket
LOG: Sun Nov 09 14:19:22 2008: Begin SSL negotiation
LOG: Sun Nov 09 14:19:24 2008: SSL negotiation complete
LOG: Sun Nov 09 14:19:24 2008: Attempting domain login: XXX/XXX
LOG: Sun Nov 09 14:19:28 2008: FTP connections are permitted
LOG: Sun Nov 09 14:19:30 2008: Received PING request
LOG: Sun Nov 09 14:20:24 2008: Sending PONG, waiting for response
...etc
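For anyone who wants to check their own client logs for the same pattern, here is a rough Python sketch that splits a log into connection attempts and counts the PONGs sent before each timeout. It assumes the line format shown in the excerpt above; the function name `summarize_sessions` and the dictionary keys are made up for illustration and are not part of GameCreate.

```python
import re
from datetime import datetime

# Matches lines such as:
# "LOG: Sun Nov 09 14:19:20 2008: PONG response not received, closing connection"
LINE_RE = re.compile(r"^LOG: (\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}): (.*)$")


def summarize_sessions(lines):
    """Group log lines into connection attempts and record how each one went.

    A new attempt is assumed to start at every "SSL pipe not connected" line.
    """
    sessions = []
    current = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a LOG line
        stamp = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
        message = match.group(2)
        if message.startswith("SSL pipe not connected"):
            current = {"start": stamp, "pongs_sent": 0,
                       "timed_out": False, "end": None}
            sessions.append(current)
        elif current is None:
            continue  # lines before the first connection attempt
        elif message.startswith("Sending PONG"):
            current["pongs_sent"] += 1
        elif message.startswith("PONG response not received"):
            current["timed_out"] = True
            current["end"] = stamp
    return sessions
```

Feeding it the lines of a saved log file (for example `summarize_sessions(open(path))`, with `path` being wherever your client writes its log) gives one entry per connection attempt, which makes it easy to see how often a host hits the PONG timeout.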
I am sure it's Monday morning in Australia now, and someone just checked in.
So obviously there is some way to "reset" the problem.
Any possibility you can automatically run the "reset-procedure" by cron every
6 hours or so until you are able to isolate and fix the issue? That would help
us out a lot since it would put a cap on how long hosts are unable to communicate
with the master.
Just to be clear: It is not the master that goes offline. Or at least it doesn't
look like that from where I am sitting. What happens is that some clients
lose their connection and are unable to connect for a prolonged period of
time. Other clients are able to maintain their connection, and sometimes
even reconnect if I manually stop and restart the client while this is happening
(though most often a disconnected client will not be able to reconnect).
So the master is obviously there, it is just refusing to acknowledge a number
of hosts. And as time goes by, more and more hosts fall into the group of
connectionless hosts until the problem is magically resolved and they all
pop back within a couple of minutes.
Every now and then a disconnected host will be able to connect for just
enough time to fire up a dead server (or stop a running server), but that
happens like once in 100 connection attempts. And a few seconds later it
is disconnected again, as shown above.
The problems with disconnects that last for a few hours, not to mention a few
days, are, in roughly descending order:
- Stopped/crashed servers can't be restarted.
- Servers can't be reconfigured.
- Software can't be upgraded.
- New servers can't be created.
We don't use temporary servers in our domains, but I am sure that would
have been a problem as well.
Please let me know if there's anything I can do to help.
Restarting the client does not make any difference at all when this happens.
There are cases where restarting the client helps, such as
when there's an SSL-error ( link ). But when the lack of pong response
happens ( link ), I don't think I've ever experienced a fix by restarting the
client. Usually what happens is that suddenly all hosts are well again.
FYI: No hosts have so far become stuck since you did whatever you did
Monday morning. But it's just a matter of time before it happens again, and
I believe it is because of something you guys do, such as this Monday
morning Australian time.
PM has been sent.
Last edited by Kybber on Fri Nov 14, 2008 5:21 pm; edited 1 time in total
Sometime earlier this week I shut off report generation to try to stabilize the website, for which we were receiving timeout alerts. I've left it shut off since then; this perhaps coincides with when you saw things come good.
I'm going to get an alert put on your domain so I can see if you drop below 80% hosts online and leave that in place over the weekend.
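The check behind such an alert is just a threshold comparison. A minimal sketch in Python, assuming the online and total host counts are already available from somewhere; the function name `should_alert` and its parameters are hypothetical and do not describe how the GameCreate monitoring is actually implemented:

```python
def should_alert(hosts_online, hosts_total, threshold_percent=80):
    """Return True when the share of online hosts is below the threshold.

    Integer arithmetic is used so that exactly 80% does not trigger the alert.
    """
    if hosts_total <= 0:
        raise ValueError("hosts_total must be positive")
    return hosts_online * 100 < threshold_percent * hosts_total
```

With 20 hosts in a domain, 15 online would raise the alert, while 16 online (exactly 80%) would not.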
If it still seems OK we'll re-enable reporting for a bit and keep monitoring it. There's some discussion internally that perhaps we just need to upgrade to a beefier server.
We understand it's a bit frustrating to have these problems, and we obviously want GameCreate to be rock-solid stable, as it has been for Australia, so we are looking into getting this fixed for Europe.
Our hosts are still trouble-free since last Monday. So your problem is most
likely unrelated to the problems described in this thread.
How far does the log-on process go before you presumably get an error?
What is printed in the GC window (if you use Windows) or the GC log (if you
use Linux)?