Author |
Message |
Jake
Toodledo Admin
|
Score: 1
-
Jake (Admin)
- Posted: Jun 12, 2009
-
Score: 1
Thanks again, everyone, for all the nice remarks.
I just wanted to give you guys an update on what steps we are taking to prevent this from happening again.
1) We have reconfigured our existing master and backup databases in such a way that a power outage is much less likely to cause this type of problem again.
2) We are now running the tape-recorder system (that I described in my first post) on both the master database and the backup database. This will protect us in the event that the master database has a hard drive failure at the same time that the backup database gets corrupted. It would take several simultaneous failures for this to happen, but you can't be too safe, right?
3) Our worst case scenario has always been losing 24 hours of data, since we run nightly offsite backups. We are now running these offsite backups every 12 hours.
4) We are exploring options to get a battery backup and we are thinking about different database configurations that might be more robust and rebuild faster in the event of another similar issue.
|
Jake
Toodledo Admin
|
Score: 0
-
Jake (Admin)
- Posted: Jun 12, 2009
-
Score: 0
Yes, we intend to make the "add task" box have the fields in the same order that you have set the columns. This is on our to-do list for a future update.
|
Jake
Toodledo Admin
|
Score: 0
-
Jake (Admin)
- Posted: Jun 12, 2009
-
Score: 0
Thanks for the suggestions.
|
Jake
Toodledo Admin
|
Score: 0
-
Jake (Admin)
- Posted: Jun 12, 2009
-
Score: 0
We definitely plan to export the subtask/parent information, and we are stepping up our plans for this in light of our recent server outage.
|
Jake
Toodledo Admin
|
Score: 1
-
Jake (Admin)
- Posted: Jun 11, 2009
-
Score: 1
Thanks everyone for the positive comments and encouragement. It really does help a ton.
Ok, I'm going to sleep now.
|
Jake
Toodledo Admin
|
Score: 0
-
Jake (Admin)
- Posted: Jun 11, 2009
-
Score: 0
Yes, more flexible reminders will be coming soon.
|
Jake
Toodledo Admin
|
Score: 0
-
Jake (Admin)
- Posted: Jun 11, 2009
-
Score: 0
A few replies:
So far, no refunds. I'm amazed. Thanks again, everyone, for all your support. It has made this much easier to deal with emotionally.
I forget who mentioned it, but yes, I did have a panicky thought about following in Ma.gnolia's footsteps last night.
Also, takizoo63kk and several others reported that UTF8 characters (like Japanese) were not saving properly after coming back online. This has been corrected.
This message was edited Jun 11, 2009.
|
Jake
Toodledo Admin
|
Score: 0
-
Jake (Admin)
- Posted: Jun 11, 2009
-
Score: 0
Thanks for the suggestion.
|
Jake
Toodledo Admin
|
Score: 0
-
Jake (Admin)
- Posted: Jun 11, 2009
-
Score: 0
Yes, we've meant to do this for a while, but now it's looking like this needs to be bumped up in priority.
|
Jake
Toodledo Admin
|
Score: 0
-
Jake (Admin)
- Posted: Jun 11, 2009
-
Score: 0
You are correct. We shouldn't list it if it is currently broken. I have temporarily removed it from the page until we can get it fixed.
|
Jake
Toodledo Admin
|
Score: 0
-
Jake (Admin)
- Posted: Jun 11, 2009
-
Score: 0
You'll need to create a regular email/password login for use with the Firefox addon. You can do this in your account settings.
|
Jake
Toodledo Admin
|
Score: 0
-
Jake (Admin)
- Posted: Jun 11, 2009
-
Score: 0
I'm not sure if we can fix that, since the calendar on the Firefox Addon is a widget built into Firefox, so they would have to fix it.
|
Jake
Toodledo Admin
|
Score: 1
-
Jake (Admin)
- Posted: Jun 11, 2009
-
Score: 1
Sorry, no update yet.
|
Jake
Toodledo Admin
|
Score: 3
-
Jake (Admin)
- Posted: Jun 11, 2009
-
Score: 3
I just wanted to let people who are reading page 3 know that I posted a lengthy explanation in my first post, so go back to page 1.
Also, to answer the one question I saw already: yes, if we had been unable to recover anything, you could have used your iPhone or any of our other backup options to restore your data. I believe in giving customers a choice, so I make it easy to export your data to serve as a backup, or to take elsewhere if you decide that Toodledo is not for you. I expect that our personal backup tools will be getting a lot of use today. Hopefully for the purposes of backup and not for going elsewhere :)
|
Jake
Toodledo Admin
|
Score: 25
-
Jake (Admin)
- Posted: Jun 11, 2009
-
Score: 25
I've been working so hard that I haven't had time to pre-type what I was going to announce, so I am doing that now, but wanted to let everyone know quickly that everything has been restored. I'll update this topic shortly with a longer explanation.
UPDATE:
So here is the long story of what has been happening over the last 16 hours. I've built Toodledo on the principle of being completely open and honest about everything, so I'm going to lay everything out there, skeletons and all.
Our servers are hosted by Rackspace, which is a great company with excellent support and top notch datacenters. At 7:15pm CDT yesterday, a severe storm was coming through and Rackspace decided to switch power to generators. During the switch there was a mechanical failure that caused some servers to lose power unexpectedly.
When the servers came back online, we found that our database had become corrupted. Apparently, this is because the database was configured to write data to the filesystem, but the filesystem was configured to flush this to disk every 1 second. During that 1 second, that data was only stored in memory. So when the power went off, that data was lost. When the power came back on, the database freaked out because of that missing second. During this freakout, unknown bad stuff happened and the main database got corrupted beyond repair.
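As a rough illustration of the buffering issue described above (not Toodledo's actual database configuration, which the post does not name), the sketch below shows the general difference between a write that is left in memory for the operating system to flush later and a write that is forced to disk before returning. The function name `append_record` and its `durable` parameter are hypothetical.

```python
import os


def append_record(path: str, record: str, durable: bool) -> None:
    """Append one record to a log file.

    With durable=False the data may sit in memory buffers until the
    operating system decides to flush it, so a sudden power loss can
    drop it. With durable=True we ask for it to be written to disk
    before returning.
    """
    with open(path, "a", encoding="utf-8") as f:
        f.write(record + "\n")
        if durable:
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to push its buffer to disk
```

Forcing every write to disk is slower than batching, which is the usual reason a delayed flush gets configured in the first place.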
Luckily, we have a live backup database (called a slave) where all the data is replicated in real time. The purpose of a slave is to act as a backup in the event that the master dies. Unfortunately, the slave is an exact copy of the master, so when the power went out, the slave had the exact same problem. So now our backup was toast too.
I should say here that this 1-second buffer was a mistake and I take full responsibility for this. It was this oversight that is likely the cause of the problems. The way that it was set up, it would have been easy to recover if the master or the slave failed independently. A simultaneous failure was unrecoverable. I admit that I did not anticipate a scenario where both the master and slave would fail simultaneously, and I did not understand the ramifications of the 1-second buffer. The database is now configured to flush to disk immediately, which should greatly help in the short term. We are also exploring other options for long term changes.
So, now we were in the sorry state of having to rely on our nightly offline backup, which is done at 4am every day. First, we had to transfer this huge file in from offsite, which took forever. Then we had to import all this data back into the database, which also took forever. This got us restored to 4am yesterday morning. Now, what we needed to do was replay the logs from 4am onward. The logs are like a big tape recorder. Every modification to the database gets logged in a linear fashion to the log. So, if we rewind the tape recorder and then play it back into the database, it won't know the difference from real user interaction and recorded interaction. This replay took forever. When it was done, we ran some tests and came back online.
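The "tape recorder" recovery described above can be sketched in a few lines. This is a toy model under stated assumptions, not the actual restore procedure: the database is a plain dictionary, and `LogEntry`, `restore`, and the integer timestamps are all hypothetical names invented for the illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class LogEntry:
    timestamp: int         # hypothetical: seconds since some fixed start
    key: str               # which record was modified
    value: Optional[str]   # new value, or None if the record was deleted


def restore(snapshot: Dict[str, str], snapshot_time: int,
            log: Iterable[LogEntry]) -> Dict[str, str]:
    """Rebuild current state from a backup plus the modification log.

    Start from a copy of the backup, then replay, in order, every
    logged change made after the backup was taken.
    """
    state = dict(snapshot)
    for entry in sorted(log, key=lambda e: e.timestamp):
        if entry.timestamp <= snapshot_time:
            continue  # assumed to be already reflected in the backup
        if entry.value is None:
            state.pop(entry.key, None)
        else:
            state[entry.key] = entry.value
    return state
```

In this sketch, changes logged at exactly the backup time are assumed to be in the backup already. That boundary is the ambiguous part of any such replay: an edit made right as the backup was taken may or may not have been captured.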
Fortunately, all of the data has been restored. When I say "all" I should qualify that by saying that we did lose that 1-second buffer. So, if you were using the website at 7:15pm CDT last night, there is a slight chance that you may have lost the last thing that you did. There is also a slight but unverifiable chance that people who were using the website at 4:00am CDT yesterday morning might have a few edits missing. This is due to the nature of switching from the offsite backup to the tape recorder playback. The data loss should be extremely minimal, and only for a handful of people using the website yesterday at exactly 4:00am or 7:15pm.
I would just like to say that there is nobody (nobody) more horrified by this than myself. I was sick to my stomach all night; still am a little. Even though no data was lost, 16 hours of downtime is completely inexcusable and unacceptable. I know how important it is to have your to-do list available at all times.
I fully expect to be issuing a lot of refunds and losing customers over this issue. The only thing that I can say is that I am deeply, deeply sorry and I am doing everything in my power to prevent this from happening again. Coincidentally, just last night Amazon had a similar weather-related outage that affected a huge number of customers, so it can affect even the largest companies. I know that that is no excuse; I just wanted to put things in perspective.
As a small token of appreciation for people who are willing to stick with Toodledo, I will be giving all existing Pro and Pro Plus subscribers a free month on the end of their subscriptions. Also, for the next thirty days, new Pro and Pro Plus subscribers will be getting 13 months instead of the usual 12 for their subscription payment.
I really appreciate all the positive remarks that I have received so far from users.
I am happy to answer questions below.
Thanks,
Jake
This message was edited Jun 11, 2009.
|
Jake
Toodledo Admin
|
Score: 0
-
Jake (Admin)
- Posted: Jun 10, 2009
-
Score: 0
We'll just have to wait and see what it looks like when 3.0 comes out ;)
|
Jake
Toodledo Admin
|
Score: 0
-
Jake (Admin)
- Posted: Jun 09, 2009
-
Score: 0
Which iPhone app are you using? Pocket Informant perhaps? If so, you'll need to contact the developer of that application for support, since it is a third-party product that may have bugs.
This message was edited Jun 09, 2009.
|
Jake
Toodledo Admin
|
Score: 2
-
Jake (Admin)
- Posted: Jun 09, 2009
-
Score: 2
If you use subtasks, you can easily duplicate an entire checklist. We announced this functionality recently.
If you need to use Folders, it will be a multi-step process. You would need to use our export tools to export a CSV file. Then open this in Excel and delete everything that you don't want to copy. Then change the folder name for all the tasks and import the resulting file back into Toodledo.
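For anyone who would rather script the spreadsheet step than do it by hand in Excel, here is a minimal sketch. It assumes the exported CSV has a header row with a folder column; the column name `FOLDER`, the function name `copy_folder`, and the sample data are assumptions for illustration, so check the header of your own export first.

```python
import csv
import io


def copy_folder(csv_text: str, source: str, target: str,
                folder_column: str = "FOLDER") -> str:
    """Keep only the tasks in `source` and relabel them as `target`.

    Returns new CSV text that could then be imported, leaving the
    original tasks untouched.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[folder_column] == source:
            row[folder_column] = target  # change the folder name
            writer.writerow(row)         # rows in other folders are dropped
    return out.getvalue()
```

For example, `copy_folder(exported_text, "Home", "Errands")` would return only the tasks from the "Home" folder, each relabeled as "Errands".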
|
Jake
Toodledo Admin
|
Score: 0
-
Jake (Admin)
- Posted: Jun 09, 2009
-
Score: 0
Thanks for the suggestion.
|