DevOps is not easy
I'm writing this post purely to inform anyone interested in what it takes to keep services on the web running. When I started my journey I wasn't a professional developer or any kind of systems administrator. I just had a bunch of skills gathered over 40 years of tinkering with computers and the ability to search the web and watch YouTube videos.
Today I find myself running a service which a growing number of people use regularly, and which the Hive DHF has compensated me for building. Here's a short story about a glitch which stopped @v4vapp (v4v.app) from working properly for a couple of hours last night.
Support Proposal 244 on PeakD
Support Proposal 244 with Hivesigner
Support Proposal 244 on Ecency
Vote for Brianoflondon's Witness KeyChain or HiveSigner
Sweden Network
My @v4vapp service runs mostly on 3 servers hosted by @privex, who I'm very happy with. They're small, independent and Hive focused. They take payment in HBD, which is a huge bonus. They should also accept Lightning, but they don't yet, and I'm working on that.
For some reason I woke up at 4am my time. I know I shouldn't have, but I couldn't help myself: I looked at the internal messaging system I use for monitoring my services and got a screen full of horror. My server which watches the Hive blockchain was unable to see Hive. That means that if someone sent me Hive to pay a Lightning invoice, nothing would happen, because the payment process would never start.
Panic
I knew this wasn't good, but I could see that the system I wrote to detect network problems was at least partly working: the fact that these error messages were arriving at all told me the servers themselves were still online.
Fortunately I had the presence of mind to check the @privex Discord, and there I found the likely cause of my problems. It also seems I woke up pretty soon after the most recent message there.
Less Panic
What followed was a period of waiting to see whether the error detection and restart code I wrote actually works. It is designed to just keep trying until it starts working again, and that is exactly what happened.
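"Keep trying until it works" is usually built as a retry loop with a growing delay between attempts. Below is a minimal Python sketch of that idea, not the actual @v4vapp code: `retry_forever` and its parameters are hypothetical names, and it assumes connection failures surface as Python's built-in `ConnectionError`. The delay doubles after each failure but is capped, so a long outage doesn't hammer the API nodes while still reconnecting reasonably quickly once they return.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_forever(
    operation: Callable[[], T],
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds (hypothetical helper, for illustration).

    After each ConnectionError the wait doubles, but never exceeds `max_delay`,
    so an outage of any length is eventually ridden out. `sleep` is injectable
    so the loop can be tested without actually waiting.
    """
    delay = base_delay
    while True:
        try:
            return operation()
        except ConnectionError as exc:
            print(f"Connection failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

Only `ConnectionError` is retried here; any other exception is treated as a real bug and propagates instead of being silently retried forever.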
Lightning Invoice expiry
My system sits between Hive and Lightning. Hive is a blockchain, as you may know, but Lightning isn't. My Lightning node runs all the time, and the idea is that if something like this goes wrong and my service can't read Hive blocks or connect to my Lightning node, it simply waits until it can. As soon as it can see Hive again, it picks up reading blocks where it stopped.
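Picking up where it stopped only requires the service to remember the number of the last block it fully processed. Here is a minimal sketch under stated assumptions: `BlockFollower` is a hypothetical class, and `fetch_block(n)` is a hypothetical stand-in for whatever API call returns block `n`, returning `None` for a block not produced yet and raising `ConnectionError` when no node is reachable. A real service would also persist `last_processed` to disk or a database so it survives a restart.

```python
from typing import Any, Callable, Optional


class BlockFollower:
    """Process blocks strictly in order, remembering the last one handled."""

    def __init__(
        self,
        fetch_block: Callable[[int], Optional[Any]],  # hypothetical API wrapper
        handle_block: Callable[[Any], None],
        last_processed: int,
    ) -> None:
        self.fetch_block = fetch_block
        self.handle_block = handle_block
        self.last_processed = last_processed

    def poll(self) -> int:
        """Handle every block currently available; return how many were handled."""
        processed = 0
        while True:
            try:
                block = self.fetch_block(self.last_processed + 1)
            except ConnectionError:
                break  # node unreachable: keep the checkpoint and try again later
            if block is None:
                break  # caught up with the head of the chain
            self.handle_block(block)
            # Advance the checkpoint only after the block is handled, so a crash
            # mid-block means it is retried rather than skipped.
            self.last_processed += 1
            processed += 1
        return processed
```

Calling `poll()` in a loop (with a pause between calls) gives the "wait until it can, then catch up" behaviour: during an outage it handles nothing, and afterwards it works through the backlog from the saved block number.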
That's exactly what happened: a few minutes after the data centre was back to normal, my system started picking up the Hive messages which had come in during the outage. One attempt to pay a Lightning invoice failed because the invoice had expired in the meantime. Exactly as designed, my system returned all the Hive.
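The refund decision itself can be small. BOLT 11 Lightning invoices carry a creation timestamp and an expiry in seconds (3600 by default when none is given), so a payment that is only noticed after an outage can be checked against that deadline before anything is sent. The sketch below is illustrative only: `Invoice`, `HiveTransfer`, `handle_payment` and the `pay`/`refund` callbacks are hypothetical stand-ins, not v4v.app's real code or any Lightning library's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Invoice:
    created_at: int  # invoice timestamp, unix seconds
    expiry: int = 3600  # seconds; the BOLT 11 default when no expiry is set

    def expired(self, now: int) -> bool:
        return now >= self.created_at + self.expiry


@dataclass
class HiveTransfer:
    sender: str
    amount: str  # e.g. "10.000 HIVE"


def handle_payment(
    transfer: HiveTransfer,
    invoice: Invoice,
    now: int,
    pay: Callable[[Invoice], None],  # hypothetical: pays the Lightning invoice
    refund: Callable[[str, str, str], None],  # hypothetical: sends Hive back
) -> str:
    """Pay the invoice if it is still valid, otherwise return the Hive."""
    if invoice.expired(now):
        refund(transfer.sender, transfer.amount, "Lightning invoice expired")
        return "refunded"
    pay(invoice)
    return "paid"
```

Checking expiry against the time the transfer is actually processed, rather than when it was sent, is what makes a late catch-up safe: an invoice that lapsed during the outage is refunded instead of being attempted.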
Everything Normal
In short, I did nothing and everything is working again. But that's only because of the large amount of work it took to get to this state. There are things I would change if I were doing this again, and things I will change as I push to open source the service and document it well enough that someone else can run it.
Show some love for the people who keep the web running!
All this is to say that the services we all use take a lot of work to keep going and to build in resilient ways. I have a lot of respect for the people who run systems, especially those much larger and more heavily used than my own!
Once again, the only reason I'm able to do this and keep this stuff going is the continued support of the DHF, so I thank you all again for your votes and look forward to seeing more of you use @v4vapp.