[Box-Admins] Squeak Server state of the Union or Hold Your Horses but Not Too Long

Thu Oct 5 12:39:11 UTC 2017

Dear All

As the primary person responsible for the current server setup and for maintaining some contact with the SFC admins, I'd like to give a bit of insight into the situation.

I. The Past
===========

Around this time last year we started migrating our services from Gandi to Rackspace.

This was because (a) the Gandi service was not very responsive and (b) it was about to be shut down in December 2016. But also (c) our server setup had been "worn out" over time[1] and (d) the SFC had for a long time offered to host the services on Rackspace.

Seeing that time was about to run out, I stepped forward and, admittedly, decided on the new setup relatively single-handedly. With the disk shortage, the interference between services, and the effect of one dying service 'dragging down' the others in mind, I decided to "spread out" all services as far as seemed justifiable. This was possible mainly due to the _very_ generous offer that we were allowed to use Rackspace services up to an amount of about $2000.

Therefore, I created a little network of Squeak-related services. I separated the services into web/files, mail, squeaksource (source.squeak.org and squeaksource.com separately), mantis, others (wiki, map), and later planet.squeak and codespeed. Every server is backed up individually. We also make use of one Rackspace MySQL database.
 The ideas here were:
 1. Have only ONE ssh entrypoint to the squeak services (this is why we have the ssh gateway)
 2. Isolate the services as well as possible
 3. Avoid being offline when (a) a server crashes or (b) someone breaks in or (c) some admin makes a mistake.
 4. Have backups of the services.
 5. Have headroom on each server to be future-proof.

Additionally, I had hoped that this service would continue for a long time, so that only minor adjustments would be necessary by the admins. Also, I wanted to minimize attack surface with this setup, so that one compromised service wouldn't expose others. The logs suggest that this threat is not theoretical.

All that contributed to me setting up 8 servers with 1..2GB Memory, and 20..300GB HDD and 1 or 2 CPUs each (plus a Rackspace managed MySQL DB). They are all backed up by Rackspace.

And, oh, our DNS is managed there, too.

Since September 2016, we had one infrastructure outage: the database was down for one hour in the middle of the night.

So far so good.

II. The Present
===============

The setup as we use it would cost around $500–$600 per month if we had to pay for it ourselves[2].

But now we most probably have to leave[3].

To be clear: I _do not_ suggest that we continue in the very setup we have and try to replicate it elsewhere.

However, I think it is a good idea to _know_ what we are actually using. I tallied all services and servers and so on[4] and came to the conclusion that we currently use the equivalent of

 - 2-2.5GB RAM
 - 200 GB Disk
 - 4-6 CPUs	

[NOTE: THESE ARE NOT EQUIVALENT TO REQUIREMENTS FOR A NEW SETUP!]

These services should therefore be satisfiable with a setup costing less than $100 per month.

III. The Future
===============

I think that it is perfectly possible to have a setup with 2 (Two) or 3 (Three) servers.

I would not suggest consolidating everything; at least web/files and mail should be separated. But if there's no room for that, yes, we would be able to have everything on one server.

That being said, here are a few things to consider for the Future Squeak Infrastructure:

* We need at least a month to migrate.

That means we must start by the end of the month AT THE LATEST.

* We need a new DNS provider fast.

I strongly advise against running a DNS server ourselves again[5].
Also, Networksolutions has a very poor user experience.
I personally use INWX[6] and it works well, but any other decent registrar should work.
However, this is to be coordinated with Dan Ingalls and Goran Krampe, as Dan owns squeak.org.
The same needs to be done for squeakfoundation.org (and maybe squeakland, because of mail).

* We need a new hosting provider.

We have two or three main options: pay for it ourselves, ask people nicely (and: hope the SFC figures something out in time, but I won't bet on that).

  - Paying for ourselves could mean several things, like renting servers, webspace, or the like.
    We would probably try to reduce our service footprint even further by, e.g., hosting part of the website with github.com (via pages or the like) or moving from Mantis to GitHub issues, Trello, FogBugz or so, but I don't think that would do much in terms of footprint.

    NOTE: My estimates assume 2 servers with 2-4 CPUs, 8 or 16 GB RAM, and >200GB HDD each.

    There are several options, but I personally only know a few (Euro-centric, because, you know...)
    # hetzner.de has been our host for some time and we would probably end up at around 40-100€/month
    # netcup.de is what I use personally and they are a tad cheaper
    Those have the problem that we would have to take care of backups manually (they both offer space for that, at a fee, but without automation)
    I think there are tons of others, so
	======== Someone™ should survey that field ======
    Also, there are the usual suspects:
    # AWS (Amazon)
    # GCE (Google)
    # Heroku
    I think all of those would need a lot of learning and drastic changes to our setup, even when consolidated.
    However, they might be a lot cheaper.

    # Last but not least, Rackspace.
    We could ask them what a consolidated setup would cost us.

  - Asking People nicely

    We as the Squeak community could ask people or providers nicely whether they would host us or even sponsor us!

    # netcup.de has an explicit opportunity for non-profits to be sponsored[7]
    This is especially interesting, as we have a registered non-profit in Germany, Squeak e.V., which can accept tax-deductible donations, and that would fit nicely.
    == They need a written application ===
    I don't know whether they would sponsor wholly or partly.

    	=========== Please look around whether you know hosters that would sponsor us. ============

IV. The End
===========

TL;DR: No tl;dr, please read the thing. 

You may panic now.

Best regards
	-Tobias

[1]: Please recall the box2 incident, where, after a service outage, we had to resurrect box2 piggy-backed on box4. This setup kind-of worked, but as we know, isKindOf should be avoided.
[2]: Yes, this is a lot.
[3]: I don't want to elaborate on the whys and wherefores of the SFC/Rackspace thing, just on how we could deal with it.
[4]: Details:
IAN	MEM: 10M	HDD: 10M	LOG: 1.2G
ALAN	MEM: 100M	HDD: 53G	LOG: 1.8G	(nginx)
ADELE	MEM: 1G 	HDD: 10.5G	LOG: 1.5G	(postfix,amavis,clamav,mailman,nginx,fcgiwrap,postsrsd)
ANDREAS	MEM: 200M 	HDD: 55G	LOG: 300M	(source.squeak.org)
DAN	MEM: 120M	HDD: 25G	LOG: 1.1G	(squeaksource.com)
TED	MEM: 50M	HDD:  6G	LOG: 800M	(wiki,squeakmap)
DAVID	MEM: 20M	HDD:  8G	LOG: 1.2G	(nginx,planet)+(old stuff)
SCOTT	MEM: 400M	HDD:  2G	LOG: 1.2G	(nginx,php(mantis),codespeed)
(MYSQL)	MEM: 50M	HDD: 200M	LOG: 10M	GUESS
OVERHEAD
	MEM: +120M 	HDD: +5G	LOG: -300M each	(ssh,psad,fail2ban,system)
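
For anyone who wants to re-check the tally, here is a minimal Python sketch that sums the table above. It is only a rough cross-check, not the method actually used for the numbers in section II. The assumptions are mine: 1G is taken as 1000M, logs are counted as disk usage, the ssh/psad/fail2ban/system overhead is added once per host in the new setup, and the "-300M each" log adjustment is ignored. The names USAGE_MB and tally are made up for illustration. CPUs are not covered.

```python
# Per-server figures copied from the table above, converted to MB.
# Assumption: 1G = 1000M. Tuple order: (memory, disk, logs).
USAGE_MB = {
    "IAN":     (10,   10,    1200),
    "ALAN":    (100,  53000, 1800),
    "ADELE":   (1000, 10500, 1500),
    "ANDREAS": (200,  55000, 300),
    "DAN":     (120,  25000, 1100),
    "TED":     (50,   6000,  800),
    "DAVID":   (20,   8000,  1200),
    "SCOTT":   (400,  2000,  1200),
    "MYSQL":   (50,   200,   10),    # marked as a GUESS in the table
}

# Assumption: the overhead row is paid once per host in a consolidated
# setup; the "-300M each" log adjustment is deliberately left out.
OVERHEAD_MEM_MB = 120
OVERHEAD_HDD_MB = 5000


def tally(usage, hosts=1):
    """Return (memory_mb, disk_mb) for all services spread over `hosts` servers."""
    mem = sum(m for m, _, _ in usage.values()) + hosts * OVERHEAD_MEM_MB
    disk = sum(h + l for _, h, l in usage.values()) + hosts * OVERHEAD_HDD_MB
    return mem, disk


if __name__ == "__main__":
    for hosts in (1, 2, 3):
        mem, disk = tally(USAGE_MB, hosts)
        print(f"{hosts} host(s): {mem / 1000:.2f} GB RAM, {disk / 1000:.1f} GB disk")
```

Under these assumptions a single host comes out at roughly 2.1 GB of memory and about 174 GB of disk, which is in line with the 2-2.5GB RAM and 200 GB disk quoted in section II.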
[5]: I run my personal one, it is a nightmare.
[6]: https://www.inwx.com/en/
[7]: https://www.netcup.eu/ueber-netcup/public-relations.php