Fedia stability update
Fedia.io had a few issues over the past 24 hours: sometimes it worked fine until you clicked on certain posts, which resulted in an error 500, and other times it just returned an error 500 no matter what....
Admin
The server that fedia.io had been running on started developing stability problems overnight from Thursday, April 4 to Friday, April 5. By Saturday (today), the system was completely unbootable. After attempting to resolve the hardware issue with Hetzner (the ISP) for about 6 hours, I gave up and moved the site to a new server. All...
Fedia.io is now running on its own stand-alone server. This server is costing me $210/month, so any contributions are welcome. Please let me know of any issues.
I am going to be rehoming fedia.io to a new server. Today, fedia.io runs on a shared app server and a shared database server. I will be moving fedia.io to a dedicated 48-core/96-thread AMD EPYC server with 256GB of RAM, which will house both the app and the database. This is less about fedia.io and more about the other apps...
You have no doubt noticed that federation is breaking again. I am painfully aware of it. The issue is with the Symfony queue runner that processes incoming messages from other instances. Occasionally, the server receives a message that causes the queue runner to die. I have to manually remove the offending message out of...
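One general way to keep a single bad message from killing a queue consumer is to catch the failure and park the message somewhere else (often called a dead-letter queue) so the rest of the queue keeps moving. The sketch below is a generic Python illustration of that idea only; it is not kbin or Symfony Messenger code, and `handle`, `drain`, and the message shape are all hypothetical.

```python
from collections import deque


def handle(message: dict) -> str:
    """Hypothetical handler: raises KeyError on messages without a 'type' field."""
    return f"processed {message['type']}"


def drain(queue: deque, dead_letters: list) -> list:
    """Process every queued message; park failures instead of crashing the loop."""
    results = []
    while queue:
        message = queue.popleft()
        try:
            results.append(handle(message))
        except Exception as exc:  # broad on purpose: any bad message gets parked
            dead_letters.append((message, repr(exc)))
    return results


if __name__ == "__main__":
    inbox = deque([{"type": "Create"}, {"bogus": True}, {"type": "Like"}])
    parked = []
    print(drain(inbox, parked))
    print(f"{len(parked)} message(s) parked for manual review")
```

The parked messages still need a human to look at them, but the consumer no longer stops on the first one it cannot handle.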
Hi all. I was recently made aware that people have been getting error 429s and error 500s when visiting fedia.io. My hope/expectation is that those will no longer happen now that I've moved back to a bare metal install, but if you do experience that, please comment below, or if that doesn't work, send me an email to...
Hi all. It took a while for me to get to it, but I have just completed rehoming fedia.io from containers back to a bare metal install. This should fix the issue with federation stopping. I'll keep an eye on it, though.
Hello. I will be taking fedia.io offline on Saturday, Feb 3 to rehome it to a different server. Since moving fedia.io to a container-based platform, federation has been very unreliable and slow. I'll be moving to a non-container-based/bare metal install, which should address the federation issues.
Hi all. I am going to be taking fedia.io offline for a bit today to move to a new server, switch to a containerized version, and make some other enhancements. Thanks for your patience.
I was happily working on a different set of servers when apparently the database server for fedia.io became unresponsive. Once I realized it, I forced a reboot and everything is back to normal. Now that the code and database are stable(ish), I will be moving the site to actual server hardware in the coming days while I have...
Hello everyone. I will be taking fedia.io offline in a few hours (timing dependent on work meetings) to apply the latest mbin updates, which should, among other things, fix 2FA....
I spent an incredible amount of time over the past few days unf*cking the fedia.io database. It should be clean now, and that should both dramatically improve performance and fix all or nearly all of the remaining error 500’s.
First, I want to thank those who pointed me to mbin. I spent about 14 hours today with help from the mbin team on and off and found/fixed many problems....
While I’m skeptical there are many people still using fedia.io, I have continued trying to fix the error 500 problems. These errors are caused by kbin thinking it stored database records with pointers to images (like the avatar image, images attached to posts and the like) but not actually finishing. So when certain...
After really hosing things up yesterday, I seem to have solved at least some of the image problems. Not all of them. I rebuilt the mercure and messenger configs from the ground up using the most recent bare metal config templates and that seems to have helped. Also, I continue to live on the edge with the development channel...
The issue with broken images seems to be getting far worse, and that is the cause of the error 500's. kbin doesn't just skip a missing image; it dies when trying to render the page. If I don't null out the incorrect database entries often enough, darn near everything starts breaking. I have yet to figure out why this is...
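For readers wondering what "nulling out" a dangling image reference looks like, here is a minimal sketch using Python's built-in `sqlite3` module. It is an illustration only: the `post` table, the `image_path` column, and the `null_missing_images` helper are hypothetical and do not reflect the real kbin schema or the queries actually run on fedia.io.

```python
import sqlite3
from pathlib import Path


def null_missing_images(conn: sqlite3.Connection, media_root: Path) -> int:
    """Set image_path to NULL on rows whose file is absent; return rows changed."""
    rows = conn.execute(
        "SELECT id, image_path FROM post WHERE image_path IS NOT NULL"
    ).fetchall()
    # Keep only the ids whose referenced file does not exist on disk.
    missing = [(row_id,) for row_id, path in rows if not (media_root / path).is_file()]
    conn.executemany("UPDATE post SET image_path = NULL WHERE id = ?", missing)
    conn.commit()
    return len(missing)
```

Rows whose files are present are left alone, so a cleanup like this can be rerun safely whenever new dangling references appear.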
I appreciate your continued patience. Today I applied a few additional fixes, including to nginx, which was contributing to the error 500’s. The good news is that the only errors I see in the kbin app logs are related to inbound post federation from lemmy-based instances. I am sure there are many more issues too, but I’m...
I am hesitant to say that I have good news or that I have fixed anything......
Due to an unplanned sinkectomy today, I didn't have time to spend on fedia as I planned. Tomorrow, I will carve out some time to look through recent code commits to see if any would address the issues we're experiencing.
So here's the deal with kbin: kbin uses Symfony Messenger processes, which are roughly equivalent to Sidekiq in Mastodon....
The site is mostly functioning and respectably fast again....
https://lemmynsfw.com/pictrs/image/d4b63087-1f04-4ad7-b359-cccb8c66627b.webp...
It turns out the way I set up file sharing between the front end web server and the app server created immense slowdowns on fedia. I consolidated some things (web server on app server now), which isn't a long term solution, but will work in the very short term....