On 18/04/20 18:44, comrade meowski wrote:
> On 18/04/2020 18:20, Michael Everitt wrote:
>> Yeah, I can see all the signs point to something automatic that's supposed
>> to be "smart" being "dumb" somehow .. and this is where the KISS principle
>> always wins out, as I'm sure you know. Each layer of complexity *will* work
>> *flawlessly* in isolation, but as you add the layers up, you add potential
>> for a new "edge case" to emerge at each new 'stage'. I don't need to tell
>> you (if even I could) how to 'drill down' as you seem to have a good idea
>> where you're going (where many of us would struggle to know where to START)
>> .. so all I can do is wish you 'Good luck' on your investigations.... :p
>
> You're dead right of course - and the depth of complexity here (my "home"
> network is pretty full on as I use it for testing everything that goes
> into production) is the issue. I've sniffed through all layers so far
> looking for the smoking gun but can't isolate the fault to any one thing.
> Last week everything was the same and worked flawlessly: the major
> changes since then are the LACP, some VLAN and the entire switch is new.
> But all known quantities. I really want to blame that bit... but the
> evidence doesn't support it.
>
> I know I've said I _think_ it's about 3 different things by now but
> VBox's handling of bridged VMs has been historically flaky - there were
> unofficial patches for years to work around bridged VMs not getting DHCPs
> over a wifi host adaptor for example. There are outstanding VBox bugs
> from years ago specifically that look a lot like my issue - perhaps a
> regression (6.1.6 also came out in the last week)? The thing is I'm
> picking on VBox here because it's where the fault first showed itself -
> there are some similar, but subtly different, issues with my other
> hypervisors too. Gah!
>
>> What happens, out of interest, if you dump all the geo- and round-robin
>> crap, and hard-code in some known-good stuff? Is that gonna help/hinder
>> any stable configs?
>
> I can't - that's how the top level mirrors work. As I said, if I manually
> edit individual affected VMs to point to specific repos instead of the
> load balancers suddenly DNS stops timing out and they work. I tore my DNS
> to bits looking for the caching error but there isn't one. Thing is, I'm
> not doing that for obvious reasons - partly because it would suck as a
> sidestep and not an actual fix. Mostly however because VBox is endemic
> among my clients for work stuff, much as I'd like them to use a more
> grown up solution like KVM or even Xen. It's free, simple, cross-platform
> and "just works" - plus it's the normal hypervisor integration tool the
> more advanced ones are using for automation/orchestration/CI stuff
> (think Vagrant, Boxes, Jenkins, etc). 9 out of 10 subcontracted devs they
> have turning up to do specific jobs rock up with Mac or Linux laptops and
> everything prototyped in VBox. So sucky or not, it has to work. And it
> has to work in a standard "proper" managed network which automatically
> means a server with bonded interfaces, VLANs and split horizon DNS.
>
> Holy crap, I've just thought of something - if the load balancers are
> doing reverse lookups on me as they geolocate my VMs then my DNS server
> may feed them back different results depending on how they "see" my DNS.
>
> I know full well at this point tcpdump and setting up a switch port
> mirror or two to dump full PCAPs is going to be needed to get to the
> bottom of this. Might even make my first post to r/sysadmins at this
> rate, I'm not above asking for help by any means!
>

Again, rubber-ducking (and/or simply dissecting the problem to explain it)
can often lead to spotting an inconsistency, so feel free to continue
either via list or DM, I don't mind. Of course I don't really know enough
about the innards, but some of the explaining may shine light on a corner
that you've missed so far. Often the exercise is just enough to reveal the
cause/solution .. so all good here .. whatever you feel is useful ..
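
For anyone following along, here is a minimal sketch of one way to put
numbers on the "load balancer name times out, specific repo works"
observation before reaching for tcpdump. It is not taken from the thread:
the function names time_lookup and compare are made up for illustration,
and any hostnames passed in are placeholders to be replaced with the real
load-balanced mirror name and one specific repo. It only times forward
lookups through the system resolver, so it says nothing by itself about
what the far end sees on a reverse lookup.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.getaddrinfo, port=80):
    """Resolve hostname once; return (elapsed_seconds, addresses or None).

    The resolver argument exists only so the function can be exercised
    without a network; by default the system resolver is used.
    """
    start = time.monotonic()
    try:
        infos = resolver(hostname, port, proto=socket.IPPROTO_TCP)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the address string for both IPv4 and IPv6.
        addrs = sorted({info[4][0] for info in infos})
    except OSError:
        # socket.gaierror and socket.timeout are both OSError subclasses.
        addrs = None
    return time.monotonic() - start, addrs


def compare(hostnames, resolver=socket.getaddrinfo):
    """Time a lookup for each hostname and return {hostname: (secs, addrs)}."""
    return {name: time_lookup(name, resolver) for name in hostnames}
```

Run from inside an affected VM and again from the host, calling compare()
with the load-balanced name and a specific repo name would show whether
the slow or failing lookups follow the name, the machine, or both.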
-- The Mailing List for the Devon & Cornwall LUG https://mailman.dcglug.org.uk/listinfo/list FAQ: http://www.dcglug.org.uk/listfaq