[01:56] *** awitter joined #evergreen
[06:00] <pinesol> News from qatests: Testing Success <http://testing.evergreen-ils.org/~live>
[07:18] *** rjackson_isl_hom joined #evergreen
[07:51] *** mantis1 joined #evergreen
[08:00] *** mantis1 joined #evergreen
[08:04] *** rfrasur joined #evergreen
[08:39] *** jvwoolf joined #evergreen
[08:40] *** mmorgan joined #evergreen
[08:50] *** collum joined #evergreen
[09:03] *** Dyrcona joined #evergreen
[09:57] <Dyrcona> Truncate action_trigger.event and a test of the daily a/t runner goes really fast.
[10:00] *** Christineb joined #evergreen
[10:08] <Bmagic> Dyrcona++ # burn it all down
[11:14] <Dyrcona> Well, it still takes a while to churn through 14,286 events. But collecting them into the table was really fast. :)
[11:48] <csharp> berick: new fix applied to PINES production - so far so good, drone-wise - I'll let you know if we hear complaints
[11:53] <berick> csharp: cool
[12:03] *** jihpringle joined #evergreen
[13:01] <jeffdavis> csharp++ # sometimes you gotta test in production
[13:06] <csharp> jeffdavis: yep, it's the only way to see if something that only emerges in production actually works!
[13:36] <jeffdavis> updated bug 1896285 - I think the 3.5 version is OK to go in rel_3_5
[13:36] <pinesol> Launchpad bug 1896285 in Evergreen 3.5 "Use batch methods for multi-row grid actions" [Medium,Confirmed] https://launchpad.net/bugs/1896285
[13:46] <csharp> jeffdavis: +1
[14:17] *** alynn26 joined #evergreen
[14:33] *** sandbergja joined #evergreen
[14:53] <Bmagic> jeffdavis: I rolled that patch into our production 3.5 system last night. It's been more stable today
[14:58] <Bmagic> csharp: did you put opensrf_bug 1912834 on production as well?
[14:59] *** mantis1 left #evergreen
[15:06] *** pinesol` joined #evergreen
[15:11] *** sandbergja joined #evergreen
[15:20] <csharp> wow - something appears very broken here - something sent apparently hundreds of null ids to a staff search and it went for 6.5 minutes with the accompanying NOT CONNECTED errors
[15:21] <csharp> this is the activity.log call: https://pastebin.com/VnNexDjf
[15:23] <csharp> Bmagic: yes, we've applied that too
[15:24] <Dyrcona> @blame tsbere
[15:24] <pinesol> Dyrcona: tsbere stole bshum's tux doll!
[15:24] <csharp> (well, that's the "new fix" I was telling berick about earlier)
[15:24] <csharp> @seen tsbere
[15:24] <pinesol> csharp: tsbere was last seen in #evergreen 3 years, 36 weeks, 4 days, 1 hour, 3 minutes, and 17 seconds ago: <tsbere> er, not anoning
[15:25] <berick> csharp: that api is from the staff catalog. should be a quick fix
[15:25] <Dyrcona> It's nothing major, just you have to restart open-ils.circ after changing the circ.opac_renewal.use_original_circ_lib global flag.
[15:26] <Dyrcona> According to git blame, tsbere added the code that caches the setting.
[15:26] <csharp> Dyrcona: UNBELIEVEABLE
[15:26] <csharp> berick: awesome
[15:28] <Dyrcona> The code has been there since 2011, so you'd think I would have known....
[15:28] <Dyrcona> @blame Dyrcona
[15:28] <pinesol> Dyrcona: Dyrcona is probably integrated with systemd
[15:28] <Bmagic> csharp: did you determine the reason for the cataloging (bucket?) issue?
[15:31] <csharp> Bmagic: berick did the work, I just applied it this morning
[15:31] <berick> the issue was the patch needed fixing
[15:31] <Bmagic> right, so, are you saying the bucket issue was related?
[15:31] <csharp> the bucket issue? sorry I may not be following
[15:32] <csharp> what the patch does is throttle concurrent OpenSRF requests so as not to overwhelm services like open-ils.actor
[15:33] <csharp> so a cataloger adding 50+ items to a bucket was creating 50+ actor drones and if you multiply that by $pines_catalogers, it gets hairy real fast
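For context on the throttle csharp describes above: the general pattern is to cap how many requests are in flight at once and queue the rest. The sketch below illustrates that pattern in TypeScript; it is not the actual OpenSRF/eg2 patch, and the limit of 5 and the addItemToBucket helper are assumptions for the example (the 5 echoes Bmagic's comment later in the log).

```typescript
// Minimal concurrency limiter, for illustration only (not the Evergreen patch):
// run at most `limit` tasks at once and queue the rest.
function limitConcurrency<T>(limit: number) {
    let active = 0;
    const queue: Array<() => void> = [];

    const finish = () => {
        active--;
        const next = queue.shift();
        if (next) { next(); }
    };

    return (task: () => Promise<T>): Promise<T> =>
        new Promise<T>((resolve, reject) => {
            const run = () => {
                active++;
                task().then(resolve, reject).finally(finish);
            };
            if (active < limit) { run(); } else { queue.push(run); }
        });
}

// Hypothetical stand-in for a bucket-add request; not a real Evergreen API wrapper.
const addItemToBucket = (copyId: number): Promise<number> =>
    new Promise(resolve => setTimeout(() => resolve(copyId), 50));

// A cataloger adding 50 items still fires 50 logical requests,
// but only 5 are in flight against the server at any one time.
const limited = limitConcurrency<number>(5);
const copyIds = Array.from({length: 50}, (_, i) => i + 1);
Promise.all(copyIds.map(id => limited(() => addItemToBucket(id))))
    .then(ids => console.log(`done adding ${ids.length} items`));
```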
[15:34] <berick> csharp: you said something about buckets yesterday when you were finding misc. bugs from the patch
[15:34] <berick> (though none of the bugs were really bucket related)
[15:37] <csharp> berick: ah - thanks
[15:37] <Bmagic> right on - well, I'm going to merge that opensrf branch into production this evening as well. Having bug 1896285 - we noticed an improvement, but today, we still saw a couple of machines get out of control and die
[15:37] <pinesol> Launchpad bug 1896285 in Evergreen 3.5 "Use batch methods for multi-row grid actions" [Medium,Confirmed] https://launchpad.net/bugs/1896285
[15:37] <csharp> Bmagic: yeah, I think the bucket thing was just shrinking resources affecting all kinds of things
[15:37] <csharp> the complaints were varied but all dealt with batch actions
[15:37] *** sandbergja joined #evergreen
[15:38] <csharp> and today has been relatively quiet after applying the latest version of the fix
[15:38] <Bmagic> throttling the number of requests to 5 at a time should help ALOT
[15:38] <csharp> yes, system-side has been back to what I think of as "normal"
[15:38] <Bmagic> that's encouraging, patching now...
[15:39] <berick> csharp: mind checking something? that null-blast issue, do you see vandelay api calls around the same time (just prior)?
[15:39] <Bmagic> Thinking back - I noticed a huge impact on the servers moving from XUL to the web based client
[15:39] <berick> just found that vandelay also uses that api, and may be more likely the culprit
[15:40] <csharp> berick: looking now
[15:40] <Bmagic> at that time, we upped the hardware on all of the bricks, fleet-wide, to cope with the onslaught of requests that the web client seemed to be doing. It seems this "shot-gun" blast from the Evergreen web client is not new
[15:40] <berick> Bmagic: IIRC, the XUL client natively limited the number of XHR requests to something like 8 at a time, so it always had a baked in limit
[15:41] <Bmagic> berick: I was wondering
[15:44] <csharp> berick: this looks like the culprit: open-ils.search open-ils.search.biblio.multiclass.query.staff {"limit":1001,"offset":0}, "(keyword:the prophets) site(PINES)
[15:45] <berick> csharp: huh, was trying stuff like that couldn't make it happen. looks like also protected in the code, but i'm clearly missing something..
[15:45] <csharp> I'll try that call from srfsh to see what happens
[15:45] <csharp> or maybe it won't act the same from srfsh?
[15:46] <csharp> that came back super fast, so I guess not
[15:49] <berick> yeah, i can't get the catalog to send nulls, strange
[15:50] <csharp> open-ils.acq open-ils.acq.purchase_order.retrieve "<REDACTED>",, 7030, {"flesh_price_sum
[15:50] <csharp> mary":true,"flesh_provider":true,"flesh_lineitem_count":true}
[15:50] <csharp> that's right before as well
[15:51] <csharp> well, 2 seconds before
[15:52] <berick> csharp: does this return 0? select count(*) from vandelay.bib_match where eg_record is null;
[15:53] <berick> 0 is exptected, but just to rule that out
[15:53] <Bmagic> zero for me
[15:53] <csharp> 0 here too
[15:53] <berick> k
[15:54] <csharp> I can confirm that the record IDs before all the nulls all lead to records with "the prophet" somewhere in the record
[15:54] <berick> csharp: ok, good, that settles that
[15:56] <berick> csharp: any local diffs to origin/master for Open-ILS/src/eg2/src/app/share/catalog/catalog.service.ts or search-context.ts ?
[15:56] <csharp> lemme look
[15:56] <csharp> 2021-01-26 14:59:41 brick01-head gateway: [ACT:60271:osrf-websocket-stdio.c:559:1611691160602718] [127.0.0.1] [] open-ils.search open-ils.search.biblio.record.catalog_summary.staff 191, [null,null,null,null,null,null,null,null,null,null]
[15:56] *** jonadab joined #evergreen
[15:56] <csharp> another example of fewer
[15:56] <csharp> it's only happened seven times today
[15:57] <berick> interesting
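The gateway line above shows open-ils.search.biblio.record.catalog_summary.staff called with what appears to be an org unit id (191) and a list of record ids that are all null. Purely as a hypothetical sketch of the kind of client-side guard that would block such a call (not the fix berick and csharp are converging on here), the id list could be filtered before the batch request goes out:

```typescript
// Hypothetical guard, illustration only. `fetchCatalogSummaries` stands in for
// whatever client code issues the catalog_summary.staff request; it is not a
// real Evergreen function name.
function fetchCatalogSummaries(orgId: number, recordIds: number[]): Promise<unknown[]> {
    // stub so the sketch is self-contained
    return Promise.resolve(recordIds.map(id => ({id})));
}

function safeFetchSummaries(
    orgId: number,
    ids: Array<number | null | undefined>
): Promise<unknown[]> {
    // drop null/undefined ids; skip the server call entirely if nothing valid remains
    const validIds = ids.filter((id): id is number => id !== null && id !== undefined);
    return validIds.length ? fetchCatalogSummaries(orgId, validIds) : Promise.resolve([]);
}

// The all-null call from the gateway log would short-circuit client-side:
safeFetchSummaries(191, [null, null, null, null]).then(res => console.log(res)); // []
```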
[15:58] <berick> or bib-record.service.ts
[15:59] <csharp> ah...
[15:59] <csharp> I think I found the issue
[16:00] <csharp> to allow searching in the (goddamned) search box on the staff page, I changed "let query = ts.query[idx];" to "let query = decodeURIComponent(ts.query[idx]);" around line 510 or so
[16:01] <csharp> I have a feeling that's causing an unexpected side effect
[16:01] <csharp> we may just have to remove that box until we can safely redirect to the eg2 version
[16:02] <csharp> that's in Open-ILS/src/eg2/src/app/share/catalog/search-context.ts btw
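For reference, the local change csharp describes is reconstructed below. Only the before/after line comes from the chat; the rest is an illustrative sketch of how a second decode can misbehave if the string is already decoded at that point (an assumption, not verified here).

```typescript
// Open-ILS/src/eg2/src/app/share/catalog/search-context.ts, local PINES change
// as described above ("around line 510 or so"):
//
//   before:  let query = ts.query[idx];
//   after:   let query = decodeURIComponent(ts.query[idx]);
//
// If ts.query[idx] already holds a decoded string (assumption for this sketch),
// decoding it again is harmless for plain text but can mangle or throw on
// terms that contain '%' sequences:
const alreadyDecoded = 'keyword: 100% wool';
try {
    console.log(decodeURIComponent(alreadyDecoded));
} catch (e) {
    // URIError: '%' followed by non-hex characters is a malformed escape
    console.log('double-decode failed:', e);
}
```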
[16:05] <csharp> apparently we can't win on the OpenSRF throttling either - getting complaints of things not loading in acq - I'll try to get more specifics
[16:06] <csharp> argh! it was such a smooth upgrade server-side - these piddly issues are going to be the death of me
[16:07] *** sandbergja joined #evergreen
[16:56] *** sandbergja joined #evergreen
[17:00] <Bmagic> csharp: JS cache client-side might account for the patch not "taking hold" on some workstations
[17:26] *** mmorgan left #evergreen
[17:49] *** Cocopuff2018 joined #evergreen
[18:00] <pinesol> News from qatests: Testing Success <http://testing.evergreen-ils.org/~live>
[18:02] *** book` joined #evergreen
[19:04] *** Dyrcona joined #evergreen
[19:09] *** sandbergja joined #evergreen
[20:21] *** jonadab joined #evergreen
[20:41] *** Dyrcona joined #evergreen
[21:10] *** sandbergja joined #evergreen
[21:20] *** sandbergja joined #evergreen
[23:14] *** jamesrf joined #evergreen
[23:38] *** Cocopuff2018 joined #evergreen