IRC log for #evergreen, 2022-01-20

All times shown according to the server's local time.

Time	Nick	Message
00:22		jvwoolf joined #evergreen
06:00	pinesol	News from qatests: Testing Success <http://testing.evergreen-ils.org/~live>
07:30		rjackson_isl_hom joined #evergreen
08:13		mantis1 joined #evergreen
08:39		mmorgan joined #evergreen
08:55		Dyrcona joined #evergreen
09:12		Keith__isl joined #evergreen
09:24		Keith_isl joined #evergreen
09:26	csharp_	anyone have experience changing/setting the "timeout" listen option in ejabberd? (https://docs.ejabberd.im/admin/configuration/listen-options/#timeout)
09:27	csharp_	default is 5 seconds (per the current docs - having trouble finding docs for 18.01, the version on Ubuntu 18.04)
09:27	csharp_	there's also "send_timeout", which is set to 30 seconds
09:28	csharp_	sorry - 15 seconds
09:42		jvwoolf joined #evergreen
10:09	csharp_	2022-01-19 17:51:27 brick04-head open-ils.auth: [INFO:80766:transport_session.c:653:1642626979107748628] Received <error> message with type cancel and code 503 - is another one of my dead ends
10:26	JBoyer	csharp_, I've seen some references to that code also being sent for a login failure. Do all of your opensrf_core.xml files have the correct user / pass for all of the ejabberd users / instances? And with that, are all of the expected accounts registered everywhere they should be?
10:28	Dyrcona	JBoyer: I would think if that were the case, csharp_ would have more consistent issues with whichever brick it was misconfigured on, but I could be wrong.
10:30	JBoyer	Me too, but that is a weird error to see.
10:30	JBoyer	Or ejabberd is really unhappy about... something.
10:30	JBoyer	I suppose another potential avenue is did the OS get upgraded or are these new VMs with a fresh install?
10:31	csharp_	these were fresh installed back in October
10:33	csharp_	my current paths are 1) something is different with ejabberd 18.04 and we need to add/tweak a config option 2) something in perl 5.26 is breaking something deep in the guts of OpenSRF 3) some specific type of message coming from the client is formatted in a way that OpenSRF/ejabberd chokes on
10:33	csharp_	or something at the Linux kernel/resource level
10:35	csharp_	the "Received <error> message with type cancel and code 503" message may be a red herring - it's occurred 60 times this hour and while we've seen the open-ils.actor breakage a couple of times, we haven't seen it 60 times
10:36	csharp_	plus, looking through some of the ejabberd debug logs I gathered yesterday, that's happening for services aside from open-ils.actor
10:42	csharp_	TCP: request_sock_TCP: Possible SYN flooding on port 5222. Sending cookies. Check SNMP counters.
10:45	csharp_	I need to rule this^^ out as a cause before proceeding I think
10:48	Dyrcona	We're using Ubuntu 18.04 and occasionally have this problem. I suspect that jeffdavis was on the right track with file descriptor limits or something like that. You just have too much traffic for the machine.
10:48	Dyrcona	We're also still on Evergreen 3.5 in production.
10:49	Dyrcona	csharp_: You just upgraded production didn't you? Was it to 3.8 or 3.7?
10:53	Dyrcona	csharp_: You're running OpenSRF 3.2.2, right?
10:54	csharp_	Dyrcona: OpenSRF 3.2.2 - Evergreen 3.8.0
10:54	csharp_	we saw it occasionally pre-upgrade - now constant
10:56	Dyrcona	csharp_: My gut thinks that the issues with the web staff client making excess backend calls have increased with 3.8. My gut could be wrong. I am hungry at the moment.
10:57	Dyrcona	csharp_: Have you tried increasing any of the file descriptor limits as suggested in the article shared by jeffdavis yesterday?
10:59	Dyrcona	I guess it matters, too, if you have increased any of the max children settings when you upgraded. Having more children running would lead to more connections as requests are handled.
11:01	Dyrcona	FWIW, we're on OpenSRF 3.2.2 and Evergreen 3.5.3(ish) in production on Ubuntu 18.04 and we get these messages, but not so much that it interferes with production. (No ticket, no problem. Right?)
11:02	JBoyer	And yeah, if there's an open files limit issue that could potentially lead to those SYN flooding messages as things keep trying to connect. They can connect to port 5222, but when the port's file descriptor is dup'd for a new child to listen to it will fail at that point and won't look like a connection refused.
11:02	JBoyer	(Some paraphrasing from memory in there, but even if the specifics are off the end result would be the same)
11:02	Dyrcona	Yeahp.
11:07	Dyrcona	The places in the OpenSRF code where the errors are coming from look like they would be on first connection or getting the first response from ejabberd.
11:09	Dyrcona	RE Chrome issues from yesterday: I've also had to reload GMail more frequently to get it to autofill email recipients.
11:11	csharp_	ulimit shows 'unlimited' for every user we've tested: opensrf, root, ejabberd
11:11	csharp_	@quote add < Dyrcona> csharp_: My gut could be wrong. I am hungry at the moment.
11:11	pinesol	csharp_: The operation succeeded. Quote #221 added.
11:12	csharp_	oh - didn't mean to keep the csharp_ part :-)
11:17	Dyrcona	:)
11:18	Dyrcona	"unlimited" doesn't mean unlimited. It means use the system limits, which you have to change possibly in multiple places depending on if you're using PAM or not, which you probably are on Ubuntu 18.04.
11:20	Dyrcona	csharp_: That post from metajack that jeffdavis shared tells you what to do: https://metajack.im/2008/09/23/file-descriptors-are-yummy-or-common-pitfalls-of-ejabberd/
11:21	Dyrcona	I highly recommend trying those steps and seeing what happens.
11:25	Dyrcona	"unlimited" usually equals 1024.
11:39	csharp_	Dyrcona: thanks for that info
11:40	csharp_	I've applied the changes suggested in the article - nothing's broken immediately, so there's that :-)
11:41	csharp_	nope - still busted
11:42	Dyrcona	You rebooted?
11:46	csharp_	yes
11:46	csharp_	gonna have to walk away from this for a while - I'm despondent
11:47	Dyrcona	Yeah, I'm going to get some lunch.
12:03		jihpringle joined #evergreen
12:07	csharp_	at this point I'm tempted to revert the new cataloging UI stuff
12:07	csharp_	or move back to Ubuntu 16.04 or something
12:07	csharp_	anyway - haven't gotten to lunch yet, so walking away for real now
12:08	jvwoolf	@bartender csharp_
12:08	* pinesol	fills a pint glass with Samuel Adams Boston Ale (Stock Ale), and sends it sliding down the bar to csharp_ (http://beeradvocate.com/beer/profile/35/1193/)
12:17	* Dyrcona	intercepts the beer for csharp_ and sends him a sparkling water instead.
12:17	Dyrcona	Are any sites on 3.7 seeing this issue?
12:19	mmorgan	Dyrcona: We're on 3.7, not seeing the same issue as csharp_
12:24	jvwoolf	FWIW, we ARE seeing the same thing on 3.6.5 running OpenSRF 3.2.2 and Ubuntu 18.04
12:25	jvwoolf	We just upgraded to opensrf 3.2.2 as part of our 3.6.5 upgrade
12:25	* mmorgan	notes that we are running debian, not ubuntu
12:29	csharp_	mmorgan: what version of debian?
12:29	csharp_	Dyrcona++ # saving me from the beer :-)
12:29	jvwoolf	csharp_: Apologies!
12:30	csharp_	jvwoolf: nah - no worries
12:30	csharp_	I'm 10.5 years sober - it's no longer a struggle for me
12:31	jvwoolf	@tea csharp_
12:31	* pinesol	brews and pours a pot of Wild Snow Sprout Tea, and sends it sliding down the bar to csharp_ (http://ratetea.com/tea/wild-tea-qi/wild-snow-sprout-tea/6447/)
12:31	jvwoolf	Is that better?
12:31	csharp_	there you go!
12:31	Dyrcona	That sounds interesting. I may have to try Wild Snow Sprout Tea.
12:32	Dyrcona	csharp_: Have you checked how many files ejabberd has open? I just checked our brick 1, and it has 506 open files at the moment.
12:32	mmorgan	csharp_: debian 10
12:32		jihpringle joined #evergreen
12:33	Dyrcona	mmorgan: Thanks for letting me know. How painful is it for production?
12:33	csharp_	mmorgan: thanks
12:34	Dyrcona	Also, Debian 10 is about the same age as Ubuntu 18, right? It's the same ejabberd version more or less.
12:35	csharp_	822, 739, 640, 669, 879, 765 (bricks 1-6)
12:35	Dyrcona	OK. That's close to the default limit but not there. I wonder if it's just some other resource limit?
12:38	mmorgan	Dyrcona: Do you mean how painful is 3.7? If so, not too (knocks wood).
12:39	mmorgan	Disclaimer: I'm not the sysadmin, so don't have all the gory details of the upgrade/reingest/etc. :)
12:42	Dyrcona	mmorgan: OK. I think that answers my question. I was wondering how painful this issue is for your users on 3.7.
12:43	* Dyrcona	starts to wonder if it is some other erlang bug or ejabberd issue.
12:43	mmorgan	Dyrcona: We don't seem to be experiencing the same issue as csharp_
12:44	* Dyrcona	needs an upgrade.
12:44	Dyrcona	I misread your earlier statement.
12:50	mmorgan	@tea Dyrcona
12:50	* pinesol	brews and pours a pot of Earl Grey Decaffeinated Black Tea, and sends it sliding down the bar to Dyrcona (http://ratetea.com/tea/bigelow/earl-grey-decaf/87/)
12:51	csharp_	lager_file_backend dropped 1 messages in the last second that exceeded the limit of 100 messages/sec
12:53	Dyrcona	erlang logging framework.....
12:53	csharp_	eh - ok
12:53	csharp_	logger/lager - hilarious
12:55	Dyrcona	It looks like ejabberd uses it. Not sure that's the root of the problem but might be worth trying to rule it out.
12:59		mmorgan1 joined #evergreen
13:00	Dyrcona	I found a bunch of other erlang processes running on one of our bricks, but each only has 3 files open.
13:03		mmorgan joined #evergreen
13:09		Ohiojoe joined #evergreen
13:12	Dyrcona	y'know. That limit is per user and those 3 files each would add up to the total. Sure enough, they're all running as ejabberd.
13:20	Dyrcona	In my case, that's an additional 48 to 50 files open.
13:24		mantis1 joined #evergreen
13:29		ohiojoe joined #evergreen
13:31		terranm joined #evergreen
13:35		rjackson_isl_hom joined #evergreen
13:53		ohiojoe joined #evergreen
13:55	ohiojoe	hello out there
13:55	terranm	I just found out that since the upgrade to 3.8 the Notification Action Triggers we have that are creating messages for the Patron Message Center are setting Patron Visible to No
14:20	mmorgan	terranm: Maybe because the database table default is pub = false?
14:22	terranm	Probably
14:22	terranm	There's no way to control that in the Notification Action Triggers interface
14:22	terranm	Trying to find where the code for that is...
14:29	Dyrcona	It might be Open-ILS/src/perlmods/lib/OpenILS/Application/Trigger/Reactor/ProcessMessage.pm but that looks like it just processes a template.
14:32	csharp_	it's in Application/Trigger/Event.pm in the react sub
14:32	terranm	Looking at that one now
14:33	csharp_	it specifies there that pub should be "t", but I can see that it's coming through in the logs as undef and NULL
14:33	terranm	It looks like it's trying to set pub to t
14:33	terranm	Jinx
14:35	csharp_	ac037f5143b3 is the relevant commit
14:35	pinesol	csharp_: [evergreen|Jason Etheridge] lp1846354 toward consolidated patron notes - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=ac037f5>
14:35	terranm	Is it something like Perl wanting the boolean to be t instead of 't' ... or 1?
14:36	csharp_	I think 't' should work...
14:37	Dyrcona	What's there looks correct. You use 't' for a database boolean, not 0 or 1.
14:37	terranm	hmm
14:38	terranm	Oh, there's also EventGroup.pm and it looks like that one is neglecting to set the boolean
14:38	Dyrcona	Ah, there you gol!
14:38	Dyrcona	Go, even.
14:38	terranm	Fix pending...
14:39	csharp_	terranm++
14:40	Dyrcona	I don't like that it's redundant code. That ought to be refactored to a single method to set the user message values that could be called from either Event.pm or EventGroup.pm, but I'll leave that as an exercise for myself for later. :)
14:45	terranm	+1
14:45	Dyrcona	terranm++
14:46	terranm	https://bugs.launchpad.net/evergreen/+bug/1958573
14:46	pinesol	Launchpad bug 1958573 in Evergreen "Action triggers that create messages for Patron Message Center are setting visiblity to false" [High,New]
14:46	terranm	patch ready for testing
14:46	Dyrcona	Yeah, I got the email. I've not seen it in the wild, but I think looking at the code qualifies me to confirm it.
14:49	mmorgan	terranm++
14:54		terranm joined #evergreen
14:56		rjackson_isl_hom joined #evergreen
16:31	terranm	Perl change in place and verified that it's working in our production environment with grouped PMC messages
16:33		jihpringle joined #evergreen
17:03		mmorgan left #evergreen
17:52		mantis1 joined #evergreen
18:00	pinesol	News from qatests: Testing Success <http://testing.evergreen-ils.org/~live>
19:08		jihpringle joined #evergreen
19:44		jihpringle joined #evergreen
23:08		Keith_isl joined #evergreen