IRC log for #evergreen, 2021-12-15

All times shown according to the server's local time.

Time Nick Message
06:01 pinesol News from qatests: Testing Success <http://testing.evergreen-ils.org/~live>
07:42 collum joined #evergreen
08:07 rjackson_isl_hom joined #evergreen
08:15 mantis joined #evergreen
08:36 mmorgan joined #evergreen
09:14 rfrasur joined #evergreen
09:20 Dyrcona joined #evergreen
09:21 Dyrcona Who else thinks the pivot option should be removed from the reporter?
09:26 csharp_ I think it doesn't offer a lot of value - Excel (or whatever) does it so much better
09:27 csharp_ most end users have no idea what it is, even if they wanted to use it
09:27 rhamby I can agree with that.
09:27 csharp_ we've tried to train our libraries to see the Evergreen reporter as just a source of raw data and not be depended on for end-user-friendly output
09:27 Dyrcona It also crashes on our reports server with only 32GB of RAM.
09:28 csharp_ oh - hmm
09:28 rhamby If you're going to use a pivot table it's better to do it in the spreadsheet software.
09:28 csharp_ huh - our reporter currently has 16GB of RAM and we don't see that trouble - it's running on blazing fast blades at ITS though
09:29 Dyrcona I'm looking at what it takes to rip it out. I'll open an LP bug after I do that.
09:29 csharp_ storage is basically RAM in that case
09:29 csharp_ Dyrcona: I'm for it
09:29 csharp_ as long as it doesn't have tendrils leading into a full redesign of the reporter
09:29 Dyrcona Someone tried to run two circ reports yesterday with the pivot table options, and both reports exhausted the RAM on the server and the reports were put down by the OOM killer.
09:30 Dyrcona Well, a full redesign of the reporter would not be a terrible idea.
09:30 csharp_ very much agreed - just didn't want the removal of an annoying value-add as a driver for that redesign :-)
09:31 Dyrcona Pretty much any time a report dies, two conditions have been met: 1) the reports server ran out of RAM and 2) the pivot options were used.
09:31 csharp_ how many concurrent reports, btw?
09:32 Dyrcona We allow up to 6 concurrent reports, usually never have more than two or three, except after several hours down time for an upgrade or whatever.
09:32 Dyrcona I'll check the schedule for yesterday.
09:32 csharp_ gotcha
09:32 csharp_ we do 6 at a time too
09:32 csharp_ at one point we were doing 12, but that was trouble
09:37 Dyrcona We only had these two reports running at the time. Same report, probably. Looks like user scheduled it again at 7:00 pm when the one started at 5:35 pm hadn't finished, yet.
09:38 Dyrcona Started getting memory warning emails around 6:30 PM, but I wasn't paying attention to work email at that time or after.
09:39 Dyrcona Oh, well. I have a meeting in 20 minutes. I should at least look at the agenda.
09:41 jvwoolf joined #evergreen
09:48 Keith-isl joined #evergreen
09:58 bshum joined #evergreen
10:35 miker fwiw, removing pivot should be a matter of hiding some html elements. making it smarter (at UI reimplementation time) and showing it as "compare [aggregate column] across [non-agg column]", and only when it's reasonable, would be possible, too
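
The "compare [aggregate column] across [non-agg column]" idea, done outside the reporter on raw output, might look roughly like the sketch below. This is only an illustration: the column names (`library`, `month`, `circs`) and the sample rows are hypothetical and are not taken from any real Evergreen report.

```python
from collections import defaultdict


def pivot(rows, row_key, col_key, value_key):
    """Sum value_key for each (row_key, col_key) pair found in flat rows."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_key]][row[col_key]] += row[value_key]
    # Convert the nested defaultdicts to plain dicts for easy comparison.
    return {r: dict(cols) for r, cols in table.items()}


# Hypothetical raw report output: one row per library and month.
raw_rows = [
    {"library": "BR1", "month": "2021-10", "circs": 120},
    {"library": "BR1", "month": "2021-11", "circs": 95},
    {"library": "BR2", "month": "2021-10", "circs": 40},
    {"library": "BR2", "month": "2021-10", "circs": 10},
]

circs_by_library = pivot(raw_rows, "library", "month", "circs")
```

Because the aggregation runs over rows that have already been exported, it happens on the user's machine rather than on the reports server.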
11:19 csharp_ following up on yesterday's ejabberd findings - I can't tie the occurrences I'm looking at with a specific OpenSRF call yet
11:19 csharp_ the message in the ejabberd log is 2021-12-15 10:20:42.390 [info] <0.31885.41>@ejabberd_c2s:process_terminated:271 (tcp|<0.31885.41>) Closing c2s session for opensrf@private.brick03-head.gapines.org/open-ils.actor_listener_brick03-head.gapines.org_32971: Connection failed: timeout
11:20 csharp_ when I've turned up ejabberd logging to debug it's apparently too much data and I end up with a truncated log where logging just stops midstream
11:23 Dyrcona csharp_: I only ever use debug logging for brief periods, and I typically truncate the log before.
11:24 Dyrcona It's not so useful in production. Better in a controlled environment where yours are the only requests happening.
11:25 jihpringle joined #evergreen
11:25 Dyrcona miker: The widgets look like their hidden to start with, and there is code to unhide them. I didn't get very far before my 10am meeting. I'm inclined to just yank all the pivot code. Let them use Excel (or LibreOffice)!
11:26 Dyrcona there they're their, choose wisely. :)
11:29 * Dyrcona sidles off to get lunch.
11:29 csharp_ Dyrcona: yeah, that's the problem - this happens only sporadically and I have no idea what's causing it
11:29 csharp_ so it's all or nothing :-/
11:29 * csharp_ digs around in the ejabberd source code for clues to where timeout is defined/handled
11:36 miker csharp_: did you recently upgrade your perl?
11:36 csharp_ yes - from 5.22? to 5.26
11:36 csharp_ (16.04 to 18.04 ubuntu)
11:36 csharp_ currently on v5.26.1
11:38 csharp_ yes, 16.04 was on 5.22
11:40 miker csharp_: can you scan your logs for instances of "server: died with error" ?  of particular interest are "Use of freed value" and "Can't kill a non-numeric process ID"
11:40 miker re https://bugs.launchpad.net/opensrf/+bug/1953047 and https://bugs.launchpad.net/opensrf/+bug/1953044
11:40 pinesol Launchpad bug 1953047 in OpenSRF "Perl services can crash with a "Can't kill a non-numeric process ID" error" [Medium,New]
11:40 pinesol Launchpad bug 1953044 in OpenSRF "Perl services can crash with a "Use of freed value in iteration" error" [Medium,Confirmed]
11:41 miker in a high-drone-turnover environment, '044 can happen when a drone exits after max-requests at an inopportune time
11:41 csharp_ I see the first two, but not 'Use of freed value'
11:42 csharp_ I would *gladly* update OpenSRF to fix this though
11:44 miker fwiw, perl 5.24 seems to be the boundary version where those two fixes are relevant.
11:44 csharp_ ok, then I will apply them, by god
11:45 csharp_ since they're perl, I can just hot patch them and rollback if necessary
11:46 miker and they're both restricted to 1 file, so, easy peasy
11:46 csharp_ yep
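
A log scan like the one miker asks for could be sketched as follows. The three search strings come from the conversation above; the sample lines and their layout are hypothetical, since the real OpenSRF log format is not shown here.

```python
import re
from collections import Counter

# Strings quoted in the discussion; the log line layout is an assumption.
PATTERNS = {
    "died": re.compile(r"server: died with error"),
    "freed_value": re.compile(r"Use of freed value"),
    "non_numeric_pid": re.compile(r"Can't kill a non-numeric process ID"),
}


def count_matches(lines):
    """Count how many lines contain each pattern of interest."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    "[hypothetical] open-ils.actor server: died with error "
    "Can't kill a non-numeric process ID",
    "[hypothetical] open-ils.storage server: died with error "
    "Use of freed value in iteration",
    "[hypothetical] ordinary log line",
]

counts = count_matches(sample)
```

For a real log, passing an open file object as `lines` should work the same way, since the function only iterates over its input.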
12:23 mmorgan Has anyone gathered some experience with 3.7+ in a production database with saving/loading bib records? We're trying to get an idea of the effect of the symspell dictionaries for did you mean.
12:26 csharp_ mmorgan: we're going to 3.8 next month - running it on a training/testing server that is production-ish, datawise
12:26 csharp_ what should we be looking out for?
12:26 jeff We're not relying on "did you mean" and did not populate the related tables as part of our upgrade. I've noticed a deadlock or two which have made me want to look into disabling the relevant triggers until I have time to look into it more.
12:28 jihpringle mmorgan: we're running 3.7.0 and do not have "did you mean" turned on.  In testing with it on we couldn't save any MARC records
12:29 jihpringle we haven't tested with any of the post 3.7.0 "did you mean" fixes yet
12:29 mmorgan csharp_: Our big concern is loading records with 856 links. We do a lot of that, but general saving and loading of bib records via vandelay is our concern, too.
12:31 mmorgan We've just begun testing with production data and so far are seeing the records with 856 links taking MUCH longer, also getting general.unknown error for some records that have failed to load.
12:31 mmorgan jeff: Not sure what you mean by a 'deadlock' ?
12:32 mmorgan jihpringle: We've loaded some of the post 3.7.0 fixes.
12:32 berick jeff: if you decide to disable, mind sharing what all you disable?
12:33 mmorgan We're very early in testing, and were just wondering if others were ahead of us and had experiences regarding saving/loading records.
12:38 mmorgan jihpringle: Have you done anything other than turning off "did you mean"? My thinking is that even with it off, the symspell dictionaries in the database would still be updated without disabling triggers like jeff mentions.
12:38 csharp_ mmorgan: a deadlock is a postgresql level problem where two processes are waiting on each other to release the lock on a particular tuple ("row")
12:39 mmorgan csharp_: Ah. Ok, thanks. I think I've seen log entries complaining about tuples.
12:39 jeff mmorgan: the postgresql logs will log a deadlock when the database detects that process X and process Y (and maybe process Z) are all waiting on things that each other are waiting on. It's one of the reasons that pingest doesn't do certain bib-related ingest things in parallel.
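
The situation csharp_ and jeff describe, where process X waits on Y, Y on Z, and Z on X, amounts to a cycle in a "waits for" relationship. The sketch below illustrates that idea only; it is not how PostgreSQL's own deadlock detector is implemented, and it assumes each process waits on at most one other process.

```python
def find_deadlock(waits_for):
    """Return the processes forming a wait cycle, or None if there is none.

    waits_for maps a process name to the single process it is waiting on.
    """
    for start in waits_for:
        path = [start]
        seen = {start}
        current = waits_for.get(start)
        while current is not None:
            if current == start:
                return path  # we followed the waits back to where we began
            if current in seen:
                break  # a cycle exists, but it does not include start
            path.append(current)
            seen.add(current)
            current = waits_for.get(current)
    return None


# X waits on Y, Y waits on Z, Z waits on X: nobody can make progress.
cycle = find_deadlock({"X": "Y", "Y": "Z", "Z": "X"})

# X waits on Y, but Y is not waiting on anything: no deadlock.
no_cycle = find_deadlock({"X": "Y"})
```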
12:39 collum joined #evergreen
12:40 jeff looks like we had one weird one on actor.usr_setting (which I don't have enough information to reproduce at this point), and the only other one was:
12:40 jeff Process 456530: INSERT INTO biblio.record_entry...
12:40 jeff Process 456523: SELECT  asset.merge_record_assets("bre"...
12:41 jeff that one's close enough that I'd look into symspell triggers being a contributing factor, but it might be something else also.
12:41 csharp_ fwiw, when addressing bug 1931737 we disabled all the maintain_symspell_entries_tgr triggers on the metabib.*_entry tables
12:41 pinesol Launchpad bug 1931737 in Evergreen "Did you mean breaks parallel reingest" [Undecided,Confirmed] https://launchpad.net/bugs/1931737
12:41 jeff If I improperly maligned symspell triggers in my deadlock mention earlier, I'm sorry. :-)
12:42 csharp_ @blame symspell triggers
12:42 pinesol csharp_: symspell triggers caused the white screen of death!
12:42 jeff berick: I hadn't looked into it yet, but the triggers mentioned in the bug csharp_ just linked are the ones I had first in mind to look at disabling. It doesn't currently look like we're having enough issues to warrant me looking at it immediately, though.
12:42 jeff now pinesol is improperly maligning the triggers!
12:43 csharp_ @blame pinesol for blaming the triggers
12:43 pinesol csharp_: itself wants the TRUTH?! itself CAN'T HANDLE THE TRUTH!! for blaming the triggers
12:43 jeff is "Improperly maligning triggers..." the new "Reticulating splines..."?
12:43 csharp_ @ana Improperly maligning triggers
12:43 pinesol csharp_: Gleamingly gripping terrorism
12:44 jeff Ah, that's the problem right there: your triggers are out of malignment!
12:45 csharp_ @quote add <jeff> Ah, that's the problem right there: your triggers are out of malignment!
12:45 pinesol csharp_: The operation succeeded.  Quote #219 added.
12:46 csharp_ when it comes down to it, aren't we all just "waiting for more data from parent"?
12:48 * csharp_ is referring to the server error mentioned yesterday http://irc.evergreen-ils.org/evergreen/2021-12-14#i_497049
13:00 mmorgan testing a file of 1000 records with 856 links on our 3.7 test system is not looking good. 6% progress after an hour, and most failed.
13:03 JBoyer There are a couple dym-related patches that you may not have depending on your 3.7.X version. Since they're all just function updates they can be applied anytime and absolutely should be if you don't have them.
13:03 JBoyer Sadly, I don't actually have a list handy to refer to...
13:05 jeff ...and Launchpad search returns 101 open tickets on a search for "did you mean"... :-P
13:05 jeff bug 1931626
13:05 pinesol Launchpad bug 1931626 in Evergreen "Did you mean: search suggestions exist for deleted records and can result in no hits" [Medium,Confirmed] https://launchpad.net/bugs/1931626
13:05 jeff bug 1931625
13:05 pinesol Launchpad bug 1931625 in Evergreen "Did you mean: diacritics cause erroneous search suggestions, resulting in no hits" [Undecided,New] https://launchpad.net/bugs/1931625
13:06 JBoyer The big ones are lp 1931162 (3.7.2) and lp 1947173 (3.7.3)
13:06 pinesol Launchpad bug 1931162 in Evergreen "Did You Mean optimization fails for some data sets" [High,Fix released] https://launchpad.net/bugs/1931162
13:06 pinesol Launchpad bug 1947173 in Evergreen 3.7 "Did You Mean Symspell dictionary updates can significant slow record ingest" [High,Fix committed] https://launchpad.net/bugs/1947173
13:07 jeff bug 1947173
13:07 JBoyer sometimes Launchpad *can* search if you only want to look at fix committed and fix released. :D
13:07 jeff looks like bug 1931162 made it into 3.7.2
13:07 pinesol Launchpad bug 1931162 in Evergreen "Did You Mean optimization fails for some data sets" [High,Fix released] https://launchpad.net/bugs/1931162
13:07 mmorgan 1947173 we don't have in place, so that's one place to start
13:08 JBoyer Yeah, the () was the version they were released in. I couldn't remember if they both made it out yet.
13:08 mmorgan We should already have 1931162, but will double check
13:11 * mmorgan confirms we do have 1931162
13:14 JBoyer taking another look at the search results it looks like the released patches are the only ones I was really worried about, but if you don't have both you really, really want to get both asap.
13:15 JBoyer The two jeff posted aren't fixed yet but that appears to be it for now.
13:16 mmorgan So we can apply 1947173, and next try disabling the maintain_symspell_entries_tgr triggers to see if those steps affect the behavior we're seeing. That should shed more light.
13:16 mmorgan jeff++
13:16 mmorgan JBoyer++
13:16 mmorgan csharp_++
13:16 mmorgan jihpringle++
13:16 mmorgan If anyone discovers anything more, I'd be interested!
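
The trigger-disabling step mentioned above could be prepared with a small helper like this sketch, which only builds the SQL text and does not connect to a database. The trigger name is the one csharp_ gave; the table list is an assumption (only metabib.keyword_field_entry is named in this log), so it should be checked against the actual schema before anything is run.

```python
TRIGGER = "maintain_symspell_entries_tgr"

# Assumed set of metabib.*_entry tables; verify against your own database.
ASSUMED_TABLES = [
    "metabib.title_field_entry",
    "metabib.author_field_entry",
    "metabib.subject_field_entry",
    "metabib.series_field_entry",
    "metabib.keyword_field_entry",
    "metabib.identifier_field_entry",
]


def trigger_statements(tables, action="DISABLE"):
    """Build ALTER TABLE statements that disable or re-enable the trigger."""
    if action not in ("DISABLE", "ENABLE"):
        raise ValueError("action must be DISABLE or ENABLE")
    return [f"ALTER TABLE {table} {action} TRIGGER {TRIGGER};" for table in tables]


disable_sql = trigger_statements(ASSUMED_TABLES)
enable_sql = trigger_statements(ASSUMED_TABLES, action="ENABLE")
```

Generating the matching ENABLE statements at the same time makes it easier to put things back once testing is done.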
13:19 JBoyer Fun fact about 1931162: it was initially found when accidentally searching for the word "metarecord" but it turns out that pretty much every ISBN search will trigger it, leaving you with Pg processes spinning for ages as they gather ISBNs in your system to recommend...
13:24 jeff Well that's interesting... Chrome tells me that my https://host.foo.example.org/eg/opac/home and https://host.foo.example.org/eg/staff/home connections are secure and have valid certs, but if I dig down into [lock icon] -> Connection is secure -> Certificate is valid, for the /eg/staff/home tab it shows an expired certificate.
13:25 jeff The certificate has not changed in the lifetime of my browser session.
13:25 jeff The hostname in question points at a single IP fronted by a single nginx instance pointing at a single backend server.
13:26 JBoyer I've seen that before also, but not tracked it down. How's the cert that apache's using look?
13:26 jeff If I had to guess, I'd say that the expired certificate in question might be on the backend host... I'm just wondering how it's making its way to the browser, yet in a non-fatal way.
13:27 jeff Hrm. Also need to determine if the paths in question are both going to the same backend. I think they are, but I should check. Again though, same question: why is my browser even getting a whiff of the backend cert?
13:30 jeff co-worker reproduced, then did a shift-reload on the staff url, and the expired cert was no longer there.
13:32 JBoyer Fun caching interactions, perhaps. I think it can take a shift-reload to force certs to be redownloaded
13:32 jeff nope, backend apache instance is using a self-signed cert.
13:33 jeff the cert in question would have been replaced in... February.
13:33 jeff which predates this laptop (but not the account that Chrome is currently using for sync, so...)
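
One way to check what the front end is actually serving, independent of any browser cache, is to ask the server directly. The sketch below uses Python's standard `ssl` module; the dates in the usage lines are hypothetical. Note that `fetch_not_after` keeps normal verification on, so an expired certificate would be expected to make the handshake fail with an `ssl.SSLCertVerificationError` rather than return a date, and that it only sees the certificate presented by whatever answers on that host and port (nginx in jeff's setup), not a backend behind it.

```python
import socket
import ssl
import time


def fetch_not_after(host, port=443):
    """Return the notAfter string of the certificate served at host:port."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def is_expired(not_after, now=None):
    """Report whether a notAfter string such as 'Jan  5 09:34:43 2018 GMT' has passed."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return now >= expires_at


# Hypothetical dates: a cert that expired in February, checked in December.
check_time = ssl.cert_time_to_seconds("Dec 15 13:24:00 2021 GMT")
old_cert_expired = is_expired("Feb  1 12:00:00 2021 GMT", now=check_time)
new_cert_expired = is_expired("Feb  1 12:00:00 2022 GMT", now=check_time)
```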
14:44 jeff Interesting. Record doesn't appear in keyword search but does show in series search for same terms. metabib.keyword_field_entry appears to have the keywords in the search, which is: who would win
15:03 jeff somewhere along the process of looking at this, I found a special number of metabib.keyword_field_entry rows that matched one of my debug queries: 404.
15:03 jeff :-P
15:05 jeff also, I'm not sure I noticed before today that there's a reference in PostgreSQL documentation to The Sudbury Neutrino Detector.
15:07 mmorgan :)
15:17 * mmorgan is having another Weird Wednesday issue. Has anyone else had issues emailing bib records from the opac?
15:17 mmorgan I get the preview, but when I click Email Now, no email, ever.
15:19 mmorgan I can see the preview action trigger event in the database, that is complete, but that preview trigger is the only one I see.
15:26 mmorgan Can anyone else confirm that they can successfully email bib records from their opac?
15:38 jihpringle100 joined #evergreen
15:54 collum mmorgan:  I can confirm.  I tried several times in our opac and did not receive an email. The preview event is in the database.
15:56 mmorgan collum++
15:56 mmorgan Thanks, I will open a LP bug.
15:56 * mmorgan was hoping it was just me :-(
16:36 csharp_ @decide everyone or just me
16:36 pinesol csharp_: go with just me
16:36 jvwoolf left #evergreen
16:38 berick almost sounds like a Prince song
16:39 csharp_ and log4j sounds like a Prince logging utility :-)
16:39 csharp_ Nothing Compares 2 Log4J
16:49 JBoyer I quite enjoyed this log4j take: https://twitter.com/leanrum/status/1470954707120181253
17:01 mmorgan left #evergreen
18:00 jihpringle joined #evergreen
18:00 pinesol News from qatests: Testing Success <http://testing.evergreen-ils.org/~live>
19:01 eglogbot joined #evergreen
19:01 Topic for #evergreen is now Welcome to #evergreen (https://evergreen-ils.org). This channel is publicly logged.
19:32 jihpringle joined #evergreen
