Evergreen ILS Website

IRC log for #evergreen, 2022-01-20

| Channels | #evergreen index | Today | | Search | Google Search | Plain-Text | summary | Join Webchat

All times shown according to the server's local time.

Time Nick Message
00:22 jvwoolf joined #evergreen
06:00 pinesol News from qatests: Testing Success <http://testing.evergreen-ils.org/~live>
07:30 rjackson_isl_hom joined #evergreen
08:13 mantis1 joined #evergreen
08:39 mmorgan joined #evergreen
08:55 Dyrcona joined #evergreen
09:12 Keith__isl joined #evergreen
09:24 Keith_isl joined #evergreen
09:26 csharp_ anyone have experience changing/setting the "timeout" listen option in ejabberd? (https://docs.ejabberd.im/admin/configuration/listen-options/#timeout)
09:27 csharp_ default is 5 seconds (per the current docs - having trouble finding docs for 18.01, the version on Ubuntu 18.04)
09:27 csharp_ there's also "send_timeout", which is set to 30 seconds
09:28 csharp_ sorry - 15 seconds
09:42 jvwoolf joined #evergreen
10:09 csharp_ 2022-01-19 17:51:27 brick04-head open-ils.auth: [INFO:80766:transport_session.c:653:1642626979107748628] Received <error> message with type cancel and code 503 - is another one of my dead ends
10:26 JBoyer csharp_, I've seen some references to that code also being sent for a login failure. Do all of your opensrf_core.xml files have the correct user / pass for all of the ejabberd users / instances? And with that, are all of the expected accounts registered everywhere they should be?
10:28 Dyrcona JBoyer: I would think if that were the case, csharp_ would have more consistent issues with whichever brick it was misconfigured on, but I could be wrong.
10:30 JBoyer Me too, but that is a weird error to see.
10:30 JBoyer Or ejabberd is *really* unhappy about... something.
10:30 JBoyer I suppose another potential avenue is did the OS get upgraded or are these new VMs with a fresh install?
10:31 csharp_ these were fresh installed back in October
10:33 csharp_ my current paths are 1) something is different with ejabberd 18.04 and we need to add/tweak a config option 2) something in perl 5.26 is breaking something deep in the guts of OpenSRF 3) some specific type of message coming from the client is formatted in a way that OpenSRF/ejabberd chokes on
10:33 csharp_ or something at the Linux kernel/resource level
10:35 csharp_ the "Received <error> message with type cancel and code 503" message may be a red herring - it's occurred 60 times this hour and while we've seen the open-ils.actor breakage a couple of times, we haven't seen it *60* times
10:36 csharp_ plus, looking through some of the ejabberd debug logs I gathered yesterday, that's happening for services aside from open-ils.actor
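The per-hour count csharp_ mentions (60 in one hour, spread across several services) can be reproduced directly from the logs. Below is a minimal Python sketch. It assumes every OpenSRF log line has the same shape as the one quoted at 10:09: date, time, host, service, a bracketed header, then the message. The regex and the name `count_503s` are illustrative and are not part of OpenSRF:

```python
import re
from collections import Counter

# Assumed layout, modelled on the single line quoted at 10:09:
# "YYYY-MM-DD HH:MM:SS <host> <service>: [<header>] <message>"
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2} "
    r"(?P<host>\S+) (?P<service>[\w.-]+): \[[^\]]*\] (?P<msg>.*)$"
)
NEEDLE = "Received <error> message with type cancel and code 503"


def count_503s(lines):
    """Count cancel/503 errors per ("YYYY-MM-DD HH", service) bucket."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and NEEDLE in match.group("msg"):
            bucket = f"{match.group('date')} {match.group('hour')}"
            counts[(bucket, match.group("service"))] += 1
    return counts
```

You can pass it an open log file, e.g. `count_503s(open(path))`, and sort the resulting `Counter` by hour. That shows whether the spikes coincide with the open-ils.actor failures or are just background noise.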
10:42 csharp_ TCP: request_sock_TCP: Possible SYN flooding on port 5222. Sending cookies.  Check SNMP counters.
10:45 csharp_ I need to rule this^^ out as a cause before proceeding I think
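The kernel message's "Check SNMP counters" hint points at the TCP extended statistics that Linux exposes in /proc/net/netstat. These are the same counters that `netstat -s` and `nstat` summarize. The sketch below parses that file and extracts three counters that are commonly checked alongside SYN cookies:

- `SyncookiesSent`
- `ListenOverflows` (a listening socket's accept queue was full)
- `ListenDrops`

It assumes a Linux host. The counters are system-wide, not specific to port 5222, so they are most useful when compared before and after a failure:

```python
def parse_netstat(text):
    """Parse /proc/net/netstat-style text into {section: {counter: value}}.

    Each section is a pair of lines sharing a "Section:" prefix: the first
    line lists counter names, the second the matching integer values.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    stats = {}
    for names, values in zip(rows[0::2], rows[1::2]):
        section = names[0].rstrip(":")
        stats[section] = dict(zip(names[1:], (int(v) for v in values[1:])))
    return stats


# Counters often checked when the kernel logs "Possible SYN flooding".
WATCHED = ("SyncookiesSent", "ListenOverflows", "ListenDrops")


def syn_flood_counters(path="/proc/net/netstat"):
    """Return the watched TcpExt counters (None if a counter is absent)."""
    with open(path) as f:
        tcpext = parse_netstat(f.read()).get("TcpExt", {})
    return {name: tcpext.get(name) for name in WATCHED}
```

A steadily rising `ListenOverflows` would suggest that the process behind the listener is not calling accept fast enough. That fits the resource-starvation theories discussed here, but it does not prove them.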
10:48 Dyrcona We're using Ubuntu 18.04 and occasionally have this problem. I suspect that jeffdavis was on the right track with file descriptor limits or something like that. You just have too much traffic for the machine.
10:48 Dyrcona We're also still on Evergreen 3.5 in production.
10:49 Dyrcona csharp_: You just upgraded production didn't you? Was it to 3.8 or 3.7?
10:53 Dyrcona csharp_: You're running OpenSRF 3.2.2, right?
10:54 csharp_ Dyrcona: OpenSRF 3.2.2 - Evergreen 3.8.0
10:54 csharp_ we saw it occasionally pre-upgrade - now constant
10:56 Dyrcona csharp_: My gut thinks that the issues with the web staff client making excess backend calls have increased with 3.8. My gut could be wrong. I am hungry at the moment.
10:57 Dyrcona csharp_: Have you tried increasing any of the file descriptor limits as suggested in the article shared by jeffdavis yesterday?
10:59 Dyrcona I guess it matters, too, if you have increased any of the max children settings when you upgraded. Having more children running would lead to more connections as requests are handled.
11:01 Dyrcona FWIW, we're on OpenSRF 3.2.2 and Evergreen 3.5.3(ish) in production on Ubuntu 18.04 and we get these messages, but not so much that it interferes with production. (No ticket, no problem. Right?)
11:02 JBoyer And yeah, if there's an open files limit issue that could potentially lead to those SYN flooding messages as things keep trying to connect. They can connect to port 5222, but when the port's file descriptor is dup'd for a new child to listen to it will fail at that point and won't look like a connection refused.
11:02 JBoyer (Some paraphrasing from memory in there, but even if the specifics are off the end result would be the same)
11:02 Dyrcona Yeahp.
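The failure mode JBoyer describes comes down to one kernel rule. Once a process holds as many descriptors as its RLIMIT_NOFILE soft limit allows, any call that needs a new one (open, socket, accept, dup) fails with EMFILE, "Too many open files". The small Python sketch below demonstrates this safely inside its own process by lowering the soft limit temporarily. It illustrates the rule only and does not model how ejabberd or OpenSRF react to the error:

```python
import errno
import os
import resource


def hit_fd_limit(soft_limit):
    """Lower RLIMIT_NOFILE to `soft_limit`, open /dev/null until refused.

    Returns (descriptors opened, symbolic errno of the failure).
    All descriptors are closed and the original limits restored afterwards.
    `soft_limit` must not exceed the current hard limit.
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    opened = []
    failure = None
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, old_hard))
        while True:
            try:
                opened.append(os.open(os.devnull, os.O_RDONLY))
            except OSError as exc:
                failure = errno.errorcode.get(exc.errno, str(exc.errno))
                break
    finally:
        for fd in opened:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))
    return len(opened), failure
```

Calling `hit_fd_limit(64)` returns two values. The first is how many extra descriptors fit under a soft limit of 64, given the ones the interpreter already has open. The second is the error name, which should be `EMFILE`.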
11:07 Dyrcona The places in the OpenSRF code where the errors are coming from look like it would be on first connection or getting the first response from ejabberd.
11:09 Dyrcona RE Chrome issues from yesterday: I've also had to reload GMail more frequently to get it to autofill email recipients.
11:11 csharp_ ulimit shows 'unlimited' for every user we've tested: opensrf, root, ejabberd
11:11 csharp_ @quote add < Dyrcona> csharp_: My gut could be wrong. I am hungry at the moment.
11:11 pinesol csharp_: The operation succeeded.  Quote #221 added.
11:12 csharp_ oh - didn't mean to keep the csharp_ part :-)
11:17 Dyrcona :)
11:18 Dyrcona "unlimited" doesn't mean unlimited. It means use the system limits, which you have to change possibly in multiple places depending on if you're using PAM or not, which you probably are on Ubuntu 18.04.
11:20 Dyrcona csharp_: That post from metajack that jeffdavis shared tells you what to do: https://metajack.im/2008/09/23/file-descriptors-are-yummy-or-common-pitfalls-of-ejabberd/
11:21 Dyrcona I highly recommend trying those steps and seeing what happens.
11:25 Dyrcona "unlimited" usually equals 1024.
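Two details are easy to trip over here.

First, in bash a bare `ulimit` reports the file-size limit (`-f`), which is normally "unlimited". The open-files limit is `ulimit -n` (soft) or `ulimit -Hn` (hard).

Second, a process keeps the limits it inherited at startup. Editing limits.conf or PAM settings does not change a daemon that is already running, so the authoritative value for a live ejabberd is in /proc/<pid>/limits.

The minimal Python sketch below reads that value. It is Linux only, and the function names are illustrative:

```python
import resource

LABEL = "Max open files"


def _limit_value(raw):
    """Convert a /proc limits field to int, mapping "unlimited" to None."""
    return None if raw == "unlimited" else int(raw)


def parse_open_files_limit(limits_text):
    """Return (soft, hard) from the "Max open files" row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith(LABEL):
            soft, hard = line[len(LABEL):].split()[:2]
            return _limit_value(soft), _limit_value(hard)
    raise ValueError("no 'Max open files' row found")


def open_files_limit(pid):
    """Open-files limit of a running process, e.g. ejabberd (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_open_files_limit(f.read())


def own_open_files_limit():
    """The same limit for this interpreter, via getrlimit(2)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return tuple(None if v == resource.RLIM_INFINITY else v for v in (soft, hard))
```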
11:39 csharp_ Dyrcona: thanks for that info
11:40 csharp_ I've applied the changes suggested in the article - nothing's broken immediately, so there's that :-)
11:41 csharp_ nope - still busted
11:42 Dyrcona You rebooted?
11:46 csharp_ yes
11:46 csharp_ gonna have to walk away from this for a while - I'm despondent
11:47 Dyrcona Yeah, I'm going to get some lunch.
12:03 jihpringle joined #evergreen
12:07 csharp_ at this point I'm tempted to revert the new cataloging UI stuff
12:07 csharp_ or move back to Ubuntu 16.04 or something
12:07 csharp_ anyway - haven't gotten to lunch yet, so walking away for real now
12:08 jvwoolf @bartender csharp_
12:08 * pinesol fills a pint glass with Samuel Adams Boston Ale (Stock Ale), and sends it sliding down the bar to csharp_ (http://beeradvocate.com/beer/profile/35/1193/)
12:17 * Dyrcona intercepts the beer for csharp_ and sends him a sparkling water instead.
12:17 Dyrcona Are any sites on 3.7 seeing this issue?
12:19 mmorgan Dyrcona: We're on 3.7, not seeing the same issue as csharp_
12:24 jvwoolf FWIW, we ARE seeing the same thing on 3.6.5 running OpenSRF 3.2.2 and Ubuntu 18.04
12:25 jvwoolf We just upgraded to opensrf 3.2.2 as part of our 3.6.5 upgrade
12:25 * mmorgan notes that we are running debian, not ubuntu
12:29 csharp_ mmorgan: what version of debian?
12:29 csharp_ Dyrcona++ # saving me from the beer :-)
12:29 jvwoolf csharp_: Apologies!
12:30 csharp_ jvwoolf: nah - no worries
12:30 csharp_ I'm 10.5 years sober - it's no longer a struggle for me
12:31 jvwoolf @tea csharp_
12:31 * pinesol brews and pours a pot of Wild Snow Sprout Tea, and sends it sliding down the bar to csharp_ (http://ratetea.com/tea/wild-tea-qi/wild-snow-sprout-tea/6447/)
12:31 jvwoolf Is that better?
12:31 csharp_ there you go!
12:31 Dyrcona That sounds interesting. I may have to try Wild Snow Sprout Tea.
12:32 Dyrcona csharp_: Have you checked how many files ejabberd has open? I just checked our brick 1, and it has 506 open files at the moment.
12:32 mmorgan csharp_: debian 10
12:32 jihpringle joined #evergreen
12:33 Dyrcona mmorgan: Thanks for letting me know. How painful is it for production?
12:33 csharp_ mmorgan: thanks
12:34 Dyrcona Also, Debian 10 is about the same age as Ubuntu 18, right? It's the same ejabberd version more or less.
12:35 csharp_ 822, 739, 640, 669, 879, 765 (bricks 1-6)
12:35 Dyrcona OK. That's close to the default limit but not there. I wonder if it's just some other resource limit?
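To repeat the per-brick count and see how close each process is to its limit, the sketch below finds processes by name through /proc and counts the entries in /proc/<pid>/fd. It assumes:

- the host runs Linux;
- you have enough privilege to read other users' /proc/<pid>/fd, which typically means root.

On many installs ejabberd's Erlang VM appears as `beam.smp`, but check `ps` on your own bricks rather than relying on that name:

```python
import os


def pids_named(comm):
    """PIDs whose /proc/<pid>/comm matches `comm` (names are cut to 15 chars)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == comm:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited (or became unreadable) mid-scan
    return pids


def open_fd_count(pid):
    """Number of descriptors a process holds, from /proc/<pid>/fd."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_headroom(pid, soft_limit):
    """Return (descriptors in use, descriptors left before `soft_limit`)."""
    used = open_fd_count(pid)
    return used, soft_limit - used
```

For example, `[fd_headroom(p, 1024) for p in pids_named("beam.smp")]` compares each VM against a soft limit of 1024, the figure Dyrcona quotes. Substitute the real value from /proc/<pid>/limits.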
12:38 mmorgan Dyrcona: Do you mean how painful is 3.7? If so, not too (knocks wood).
12:39 mmorgan Disclaimer: I'm not the sysadmin, so don't have all the gory details of the upgrade/reingest/etc. :)
12:42 Dyrcona mmorgan: OK. I think that answers my question. I was wondering how painful this issue is for your users on 3.7.
12:43 * Dyrcona starts to wonder if it is some other erlang bug or ejabberd issue.
12:43 mmorgan Dyrcona: We don't seem to be experiencing the same issue as csharp_
12:44 * Dyrcona needs an upgrade.
12:44 Dyrcona I misread your earlier statement.
12:50 mmorgan @tea Dyrcona
12:50 * pinesol brews and pours a pot of Earl Grey Decaffeinated Black Tea, and sends it sliding down the bar to Dyrcona (http://ratetea.com/tea/bigelow/earl-grey-decaf/87/)
12:51 csharp_ lager_file_backend dropped 1 messages in the last second that exceeded the limit of 100 messages/sec
12:53 Dyrcona erlang logging framework.....
12:53 csharp_ eh - ok
12:53 csharp_ logger/lager - hilarious
12:55 Dyrcona It looks like ejabberd uses it. Not sure that's the root of the problem but might be worth trying to rule it out.
12:59 mmorgan1 joined #evergreen
13:00 Dyrcona I found a bunch of other erlang processes running on one of our bricks, but each only has 3 files open.
13:03 mmorgan joined #evergreen
13:09 Ohiojoe joined #evergreen
13:12 Dyrcona y'know. That limit is per user and those 3 files each would add up to the total. Sure enough, they're all running as ejabberd.
13:20 Dyrcona In my case, that's an additional 48 to 50 files open.
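Dyrcona's total can be computed by summing descriptors across every process owned by one account. The sketch below is again Linux-only and assumes permission to inspect those processes. It takes a numeric uid, for example `pwd.getpwnam("ejabberd").pw_uid`.

One caveat when reading the result: Linux enforces RLIMIT_NOFILE on each process separately. The largest single count is what matters for that limit, while the total shows the overall load on the account.

```python
import os


def fd_counts_for_uid(uid):
    """Map pid -> open descriptor count for every process owned by `uid`."""
    counts = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            if os.stat(f"/proc/{entry}").st_uid != uid:
                continue
            counts[int(entry)] = len(os.listdir(f"/proc/{entry}/fd"))
        except OSError:
            continue  # process exited, or its fd table is not readable
    return counts


def summarize(counts):
    """Return (total descriptors, busiest pid, its count) for a pid->count map."""
    if not counts:
        return 0, None, 0
    busiest = max(counts, key=counts.get)
    return sum(counts.values()), busiest, counts[busiest]
```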
13:24 mantis1 joined #evergreen
13:29 ohiojoe joined #evergreen
13:31 terranm joined #evergreen
13:35 rjackson_isl_hom joined #evergreen
13:53 ohiojoe joined #evergreen
13:55 ohiojoe hello out there
13:55 terranm I just found out that since the upgrade to 3.8 the Notification Action Triggers we have that are creating messages for the Patron Message Center are setting Patron Visible to No
14:20 mmorgan terranm: Maybe because the database table default is pub = false?
14:22 terranm Probably
14:22 terranm There's no way to control that in the Notification Action Triggers interface
14:22 terranm Trying to find where the code for that is...
14:29 Dyrcona It might be Open-ILS/src/perlmods/lib/OpenILS/Application/Trigger/Reactor/ProcessMessage.pm but that looks like it just processes a template.
14:32 csharp_ it's in Application/Trigger/Event.pm in the react sub
14:32 terranm Looking at that one now
14:33 csharp_ it specifies there that pub should be "t", but I can see that it's coming through in the logs as undef and NULL
14:33 terranm It looks like it's trying to set pub to t
14:33 terranm Jinx
14:35 csharp_ ac037f5143b3 is the relevant commit
14:35 pinesol csharp_: [evergreen|Jason Etheridge] lp1846354 toward consolidated patron notes - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=ac037f5>
14:35 terranm Is it something like Perl wanting the boolean to be t instead of 't' ... or 1?
14:36 csharp_ I think 't' should work...
14:37 Dyrcona What's there looks correct. You use 't' for a database boolean, not 0 or 1.
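For reference, PostgreSQL's boolean input accepts more than 't'. Its documentation lists these spellings:

- true: true, yes, on, 1
- false: false, no, off, 0

Any unique prefix of those is also accepted, case-insensitively and ignoring surrounding whitespace. The Python sketch below emulates those documented rules; it is not PostgreSQL's own code. It shows that 't' is a safe spelling at the database level. Whatever the spelling, though, it only helps if the code path actually sets the field.

```python
TRUE_WORDS = ("true", "yes", "on", "1")
FALSE_WORDS = ("false", "no", "off", "0")


def pg_bool(text):
    """Emulate PostgreSQL's documented boolean input rules.

    Accepts true/yes/on/1 and false/no/off/0 or any unique prefix of them,
    ignoring case and surrounding whitespace; anything else is rejected.
    """
    value = text.strip().lower()
    matches = [w for w in TRUE_WORDS + FALSE_WORDS if value and w.startswith(value)]
    if len(matches) != 1:  # empty, unknown, or ambiguous (e.g. "o")
        raise ValueError(f"invalid input syntax for type boolean: {text!r}")
    return matches[0] in TRUE_WORDS
```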
14:37 terranm hmm
14:38 terranm Oh, there's also EventGroup.pm and it looks like that one is neglecting to set the boolean
14:38 Dyrcona Ah, there you gol!
14:38 Dyrcona Go, even.
14:38 terranm Fix pending...
14:39 csharp_ terranm++
14:40 Dyrcona I don't like that it's redundant code. That ought to be refactored to a single method to set the user message values that could be called from either Event.pm or EventGroup.pm, but I'll leave that as an exercise for myself for later. :)
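Dyrcona's refactoring idea fits in a few lines: one helper builds the patron message, so the single-event and grouped-event paths cannot drift apart on fields like `pub`. The sketch is Python rather than Evergreen's Perl. Every name in it (`build_user_message`, `react_single`, `react_group`, and the dictionary keys) is hypothetical. It shows the shape of the change, not Evergreen's actual API:

```python
# Hypothetical sketch: names and keys are illustrative, not Evergreen's API.


def build_user_message(usr, title, body, pub=True):
    """Single place that fills in patron-message fields, `pub` included."""
    return {"usr": usr, "title": title, "message": body, "pub": "t" if pub else "f"}


def react_single(event):
    """Path for one event (stands in for the Event.pm case)."""
    return build_user_message(event["usr"], event["title"], event["body"])


def react_group(events):
    """Path for grouped events (stands in for the EventGroup.pm case)."""
    return [build_user_message(e["usr"], e["title"], e["body"]) for e in events]
```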
14:45 terranm +1
14:45 Dyrcona terranm++
14:46 terranm https://bugs.launchpad.net/evergreen/+bug/1958573
14:46 pinesol Launchpad bug 1958573 in Evergreen "Action triggers that create messages for Patron Message Center are setting visiblity to false" [High,New]
14:46 terranm patch ready for testing
14:46 Dyrcona Yeah, I got the email. I've not seen it in the wild, but I think looking at the code qualifies me to confirm it.
14:49 mmorgan terranm++
14:54 terranm joined #evergreen
14:56 rjackson_isl_hom joined #evergreen
16:31 terranm Perl change in place and verified that it's working in our production environment with grouped PMC messages
16:33 jihpringle joined #evergreen
17:03 mmorgan left #evergreen
17:52 mantis1 joined #evergreen
18:00 pinesol News from qatests: Testing Success <http://testing.evergreen-ils.org/~live>
19:08 jihpringle joined #evergreen
19:44 jihpringle joined #evergreen
23:08 Keith_isl joined #evergreen
