Server issues May 18, May 22
Author Message
mdrain

Posted: May 22, 2017
Score: 0 Reference
Posted by Jake:
We have no evidence that Toodledo has not been hacked.


:)
Jake

Toodledo Admin
Posted: May 22, 2017
Score: 0 Reference
Typing too fast without thinking :) The "not" should not be there.
strativourakis

Posted: May 22, 2017
Score: 0 Reference
Posted by fireworks:
Jake, could I put in a suggestion for background syncing in iOS? I don't use Toodledo very often on my iPhone, but as soon as I couldn't access Toodledo yesterday I went to my iPhone to do some work and realized that since I hadn't opened it in 3 weeks it was hopelessly out of date.
With background sync we would always have a current backup available for future situations.

Fred


I second this, x1,000,000!
:-)
strativourakis

Posted: May 22, 2017
Score: 1 Reference
Hacked?? You mean my grocery list might be out there in the wild???????

LOL....
Jake

Toodledo Admin
Posted: May 23, 2017
Score: 0 Reference
We have now deployed some better tools and have a better idea of what is going on. It shouldn't happen again, but if it does, it will be a much briefer event.
Jake

Toodledo Admin
Posted: May 23, 2017
Score: 0 Reference
The problem has been solved. And it turns out that it was entirely our fault. We were attacking ourselves. How embarrassing.

Technical Details:
************************
When Toodledo was experiencing problems on the 18th and 22nd, we saw two things. First, we saw that our load balancer had a huge spike in the number of active connections to our servers. We're talking a 100x increase. This isn't something you can fix by just adding another server or two. The second thing we saw was on the web servers. It was a similar thing, where each server was hitting its max connections limit and then slowing down and dropping connections. This is what was causing people to get broken pages and SSL errors. Our natural first assumption was that our site was under some sort of external attack. This was the only thing that we could think of that would cause a 100x increase in connections. So we went down this path and implemented some tools to help us with this scenario, and they worked. Within an hour of deploying our tools, we were able to restore the site to normal operation. And on the second occurrence, it was even faster.

Now that the immediate danger was taken care of, we started taking a look at the problem from different directions and noticed that all of the bad connections were to the same URL in the new tasks section. This seemed like an odd place for an attack, so we investigated. It turns out that the new tasks section had a bug that was causing a double infinite loop! When attempting to complete a repeating task with certain attributes, the server would get into an infinite loop processing that action and eventually time out after a minute and return an error. The web browser would see this error and immediately retry the action, again and again and again. Double infinite loop all the way!
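One common way to break the client half of a loop like this is to cap the number of retries and wait longer between each one. The following is only a minimal Python sketch of that idea, not the fix Toodledo actually deployed; `call_with_backoff`, its parameters, and the default values are all hypothetical names and numbers chosen for illustration.

```python
import random
import time


def call_with_backoff(action, max_attempts=4, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Run action(), retrying a bounded number of times on failure.

    All names and defaults here are illustrative assumptions.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error = None
    for attempt in range(max_attempts):
        try:
            return action()
        except Exception as error:
            last_error = error
            if attempt == max_attempts - 1:
                break  # out of attempts: give up instead of looping forever
            # Double the wait on each failure, up to max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Add random jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise last_error
```

With a cap like this, a request that always fails ties up at most `max_attempts` connections per user action, rather than one connection after another indefinitely.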

So essentially, many regular Toodledo users were making repeated connections to our website, each of which would tie up that connection for 30 seconds at a time. This was a pretty rare thing, so it had to build up over time to be a problem. And we have a bunch of extra capacity built into Toodledo, so it took about 4 days for enough users to experience the bug and for those stalled connections to build up to the tipping point where it caused trouble. At that point, there were so few connections left unused that all the normal connections to the website would back up at the load balancer waiting for their turn. The bug was born on Monday of last week. 4 days later the symptoms appeared on the 18th. We reset everything and then 4 days later on the 22nd it happened again. But we have just deployed the real fix. No more bug. So it won't happen again on the 26th. And if it does, I'll eat my hat.
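The slow build-up can be approximated with simple arithmetic. A client that retries immediately after every failure keeps roughly one connection occupied all the time, so the spare capacity shrinks by about one connection per stuck client. The Python sketch below assumes stuck clients stay stuck until a reset; the function name and every number in the example are made up for illustration and are not Toodledo's real figures.

```python
def days_until_saturation(max_connections, baseline_in_use,
                          new_stuck_clients_per_day):
    """Estimate days until stuck clients use up all spare connections.

    Assumes each stuck client holds about one connection continuously.
    Returns None if no new clients are getting stuck.
    """
    spare = max_connections - baseline_in_use
    if spare <= 0:
        return 0  # already saturated
    if new_stuck_clients_per_day <= 0:
        return None  # spare capacity is never consumed
    # Ceiling division: a partial day still counts as a day.
    return -(-spare // new_stuck_clients_per_day)


# Hypothetical numbers: 1000 connections total, 200 in normal use,
# 200 more clients hitting the bug each day.
print(days_until_saturation(1000, 200, 200))
```

Under these invented numbers the estimate comes out to 4 days, which is the kind of delay described above: the larger the spare capacity, the longer a rare bug can stay hidden.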
************************

On the plus side, we now have some really good tools to help us detect and circumvent an event where someone is attacking us. And we also made a bunch of small optimizations to make the site generally faster and more stable. So it's not all bad. For those of you with a Platinum Subscription, we will be sending out credits in the next few days.

I am very sorry for any inconvenience that this may have caused to anyone. I really want Toodledo to be a reliable website that people trust with their important tasks, and for about 4 hours this month we failed. Thanks for everyone's patience as we worked through this.
pawelkaleta

Posted: May 24, 2017
Score: 0 Reference
Posted by Jake:
When attempting to complete a repeating task with certain attributes, the server would get into an infinite loop


Thank you for this kind of information, which we can always count on from you. Hopefully it wasn't any of the repeating tasks I had completed that triggered all of this! :-)
JD Osterman

Posted: May 24, 2017
Score: 0 Reference
I really appreciate that detailed explanation, and also your persistent investigation & resolution. You weren't alone in thinking that it was a DDoS attack. Toodledo is far too popular and important to avoid that kind of thing forever!

Just out of curiosity, what are you guys working on to confront that eventuality?

Of course you shouldn't share your 'secrets' but it would be nice to know, in kind of general terms -- it is mighty scary when your 'electronic brain' suddenly locks up on you :-)


This message was edited May 24, 2017.
Salgud

Posted: May 25, 2017
Score: 0 Reference
Thanks, Jake, I appreciate your candor. If you have to eat your hat next month, I'll offer some ketchup and some mustard to make it a little more palatable!