jerry avatar

jerry Admin

@jerry@fedia.io

Scheduled downtime for fedia.io

Hello. I will be taking fedia.io offline on Saturday, Feb 3 to rehome it to a different server. Since moving fedia.io to a container-based platform, federation has been very unreliable and slow. I'll be moving to a non-container-based/bare metal install, which should address the federation issues.

jerry Admin ,
jerry avatar

I'm going to rebuild fedia on a non-container server which should fix the problem

jerry Admin ,
jerry avatar

The problem is that federation keeps dying every few hours - well, it doesn't entirely die, but becomes incredibly slow.

jerry , to Random stuff
@jerry@infosec.exchange avatar

I feel like there have been more IT workers laid off in the past year than there were IT workers 10 years ago. Totally unscientific observation

jerry Admin ,
jerry avatar

@jerry x

jerry Admin ,
jerry avatar

I think you are unlucky enough to hit it while I am restarting the services to get the federation working again.

jerry Admin ,
jerry avatar

Hi all. I am working on it. I am trying to figure out what the problem is this time rather than just restarting the containers.

jerry Admin ,
jerry avatar

I am going to apply this upgrade this weekend. I spent a few hours looking through logs this morning but didn't see anything obvious (there is a breathtaking amount of logs), so I gave up and restarted the containers. Once I make log storage persistent, I will look again the next time it fails.

jerry Admin ,
jerry avatar

lemmy.world is having some issues

Flaky GIF uploads

Uploading GIFs on this site has been hit-or-miss for the few months I've been here, but where they used to fail with an error page a while back, the GIF is now quietly discarded from the thread. Posting GIFs to this instance is broken from kbin.social as well (where the whole post fails) so it's not a front-end issue.

jerry Admin ,
jerry avatar

thanks - I wasn't aware of the issue. I will try to figure out why.

jerry Admin ,
jerry avatar

How large are the gifs you're posting? So far, my test uploads have been successful and I'm not able to get it to fail...

jerry Admin ,
jerry avatar

That gives me something to go on. Thank you.

jerry Admin ,
jerry avatar

if/when you see things like this, please tag my @jerry account - the notifications on fedia are a bit lacking.

jerry Admin ,
jerry avatar

Kbin/mbin is still very much a work in progress and there are certain features that aren’t implemented yet, unfortunately

jerry Admin ,
jerry avatar

Most were suspended for child porn at various points - though I think it was not necessarily something they condone, but lemmy has damn near no ability to moderate or remove images once they land on a server

jerry OP Admin ,
jerry avatar

Ok. Maintenance completed. I've moved fedia.io to a container-based setup that is now running on my main infrastructure - a 48 core/96 thread AMD EPYC Zen 4 Genoa with 256GB of DDR5 ECC RAM and 2x4TB NVMe SSDs, backed by a dedicated database server with the exact same specs, on a 10Gbps network.

It's always something... [sorry for the downtime]

I was happily working on a different set of servers when apparently the database server for fedia.io became unresponsive. Once I realized it, I forced a reboot and everything is back to normal. Now that the code and database are stable(ish), I will be moving the site to actual server hardware in the coming days while I have...

jerry Admin ,
jerry avatar

I’ll take a look. That seems to coincide with when I applied the latest mbin updates

jerry Admin ,
jerry avatar

I am working with the most excellent mbin team trying to figure out why this is happening...

jerry Admin ,
jerry avatar

ok - it's fixed now. My apologies for the problems.

jerry Admin ,
jerry avatar

My best recommendation is to create issues in their github repo here: https://github.com/MbinOrg/mbin/issues

jerry Admin ,
jerry avatar

I think the issue now is with the containerized version of mbin. I tried saving money by running it in my container host, but it's clear to me that is not going to work out. So I will be rolling it back to a bare metal server :/

jerry Admin ,
jerry avatar

I am going to wait till it happens again and try to debug. I really don’t want another $100/month server bill if I can avoid it

jerry Admin ,
jerry avatar

That would be great. Thank you!

iamnomad , (edited ) to Random stuff
iamnomad avatar

@jerry My comments and boosts from before the downtime last week have disappeared from this instance. They still appear on other instances, and they are still counted in the comment section on magazines here but are invisible (the comment count on my profile only includes the ones I made after the server came back online).

jerry Admin ,
jerry avatar

@iamnomad some of the existing data was so malformed that I couldn’t easily fix it in a way that would allow the site to continue operating. I have been (slowly) attempting to piece it back together, though I am not convinced it’ll work in many cases.

jerry Admin ,
jerry avatar

@iamnomad do you get the error when attempting to submit the contact form or when visiting the page?

jerry OP Admin ,
jerry avatar

Maintenance completed

jerry OP Admin ,
jerry avatar

Is your account @socialspirit?

jerry OP Admin ,
jerry avatar

thanks. I am working on it

jerry OP Admin ,
jerry avatar

Can you give it another try? It looks to be correlated with PHP running out of memory (not sure why your session would require more than 512MB of RAM, but let's confirm that is the case first).

jerry OP Admin ,
jerry avatar

ok - I bumped up the memory limit in PHP to 1GB and that appears to have fixed the problem. I have no idea yet why it takes so much memory, though. But I'm glad it worked.

jerry OP Admin ,
jerry avatar

no - this is the first time - and as far as I can tell, the only time it's happened according to my logs...

jerry Admin ,
jerry avatar

I’ll take a look at that

jerry Admin ,
jerry avatar

Many thanks!

jerry Admin ,
jerry avatar

I will run an update tonight. There are some other updates I want to pull in, like removing noise from logs.

jerry Admin ,
jerry avatar

I did not update last night - there are many in-flight code changes and I’d prefer not to cause more problems, so I am letting that settle down for a bit before I update.

jerry Admin ,
jerry avatar

This should be working now.

jerry Admin ,
jerry avatar

Is it still happening?

jerry Admin ,
jerry avatar

Awesome! It took a lot to clean things up, and I am not going to be surprised if I missed something

jerry OP Admin ,
jerry avatar

I am aware that visiting the notifications link gives an error 500. I will work on it.

jerry OP Admin ,
jerry avatar

Ok, this problem is fixed now as well

jerry OP Admin ,
jerry avatar

Thanks. I feel bad that it took this long to get here, though

jerry OP Admin ,
jerry avatar

They should be. I’ve not seen any since I changed the nginx config to prevent that domain from pointing to the mbin instance.

jerry OP Admin ,
jerry avatar

As part of the database cleanup, I had to remove many corrupt records, so the answer is yes, there are missing posts and comments. However, I do have backups and I have been attempting to find ways to reinsert them without recreating the problems. I’ve not had much luck with that yet.
