Time |
Nick |
Message |
07:07 |
|
rjackson_isl joined #evergreen |
07:11 |
csharp |
Happy SysAdmin Day to all of my sisters and brothers |
07:12 |
csharp |
and others |
07:27 |
|
Dyrcona joined #evergreen |
08:37 |
|
collum joined #evergreen |
08:37 |
|
mmorgan joined #evergreen |
08:43 |
Dyrcona |
mmorgan: Setting old PO JEDI events to complete is a bad idea. It will break the pusher in at least two ways: 1) It will blow at line 142 if the purchase order in the target field no longer exists and 2) it will blow at line 161 if there are no acq.edi_message entries and the template output doesn't exist. |
08:43 |
* Dyrcona |
follows up on a conversation from a few days ago in a different channel. |
08:44 |
Dyrcona |
In fact, messing with the state of old events seems like a bad idea, now that I've done it.
08:47 |
mmorgan |
Dyrcona: Hmm. Good to know. I'm pretty sure I've only messed with the state of notice triggers, have not needed to do anything with PO JEDI events. |
08:49 |
|
yboston joined #evergreen |
08:50 |
Dyrcona |
I plan to start purging old events when I'm back from vacation in August. |
08:50 |
Dyrcona |
I set up the retention intervals and all that last night. |
09:06 |
|
bwillis joined #evergreen |
09:08 |
csharp |
purging events was taking too long for us so we stopped doing it (though maybe berick had a fix for that? not remembering clearly) |
09:09 |
csharp |
but the purge query was running for days |
09:15 |
|
jvwoolf joined #evergreen |
09:18 |
Dyrcona |
Whee, fun with EDI continues: Can't call method "message" on an undefined value at /usr/local/share/perl/5.22.1/OpenILS/Utils/RemoteAccount.pm line 586. |
09:20 |
Dyrcona |
OK, that's a line that I modified to try getting the actual server messages, looks like a certain vendor wouldn't let us connect and the pusher just plows ahead trying to login and PUT files despite there being no FTP connection. |
09:21 |
Dyrcona |
Dumb as a box of chocolates... |
09:21 |
|
nfBurton joined #evergreen |
09:21 |
csharp |
yeah - that mechanism isn't really robust :-/ |
09:22 |
|
yboston joined #evergreen |
09:24 |
Dyrcona |
I know. I have opened Lp bugs on it and plan to fix it if I ever have time after cleaning up the mess and doing other things that are apparently a higher priority, though I can't get to work on them, because I spend half my day resetting PO JEDI events and ORDERS messages. |
09:28 |
Dyrcona |
And, yes, another log line says the vendor is not allowing our "user" with 70-some odd accounts to login because of the brain deadness of the pusher. |
09:28 |
Dyrcona |
Well, it doesn't say why, but we get "User X cannot login." before we get to sending them the password. |
09:39 |
csharp |
Dyrcona: another thing to consider is moving to EDI attributes - we just recently got to a point where all of our acq libraries are using that mechanism |
09:40 |
Dyrcona |
csharp: We're getting there, but that ain't my department. |
09:41 |
csharp |
yeah - I started pushing for that really hard after doing the PO JEDI resets to the point of insanity |
09:41 |
Dyrcona |
Our real problem is the fetcher. I think it makes so many connections that it causes this vendor to block us for a while. |
09:42 |
|
khuckins joined #evergreen |
09:49 |
* Dyrcona |
wrote a Perl script to reset the events and messages for purchase orders. I put it on my pastebin. |
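(Dyrcona's actual script is Perl and lives on his pastebin; as a rough SQL sketch of the same idea — column names per the stock action_trigger.event table, the event definition name and purchase order id are placeholders:)

```sql
-- Hypothetical reset of a purchase order's JEDI event so the EDI
-- pusher will pick it up and retry it.  12345 is a placeholder PO id.
UPDATE action_trigger.event
   SET state = 'pending',
       start_time = NULL,
       update_time = NOW(),
       complete_time = NULL
 WHERE target = 12345
   AND event_def IN (SELECT id FROM action_trigger.event_definition
                      WHERE name = 'PO JEDI');
```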
09:51 |
|
sandbergja joined #evergreen |
09:54 |
berick |
csharp: did you do an initial purge to get things sort of under control? |
09:56 |
csharp |
berick: yeah |
09:56 |
berick |
csharp: also, for reasons I don't recall -- likely speed fixes in the heat of battle -- the SQL I ended up using locally is slightly different. note the join and temp table. https://gist.github.com/berick/16a20b3da23c73fabe97575d50d601ae |
09:57 |
csharp |
I was running the purge nightly on a cron, but it started piling up because each query was taking > 72 hours |
09:57 |
berick |
ours only runs a few minutes each night |
09:57 |
berick |
omg |
09:57 |
csharp |
huh |
09:57 |
berick |
do you have the output indexes? |
09:57 |
berick |
on action_trigger.event |
09:58 |
berick |
e.g. "atev_async_output" btree (async_output) |
09:58 |
* berick |
looks for index discrepancies
09:58 |
csharp |
oh - no I don't have that index |
09:59 |
* mmorgan |
doesn't have that index either |
09:59 |
Dyrcona |
I don't have any indexes on any of the output columns, apparently. |
09:59 |
berick |
huh, those should be in stock |
10:00 |
berick |
yeah, they're in the stock sql |
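(For reference, a sketch of the three stock output indexes being discussed — atev_async_output is named verbatim above; the other two names are assumed to follow the same pattern. CONCURRENTLY avoids locking a busy events table while the index builds:)

```sql
-- Hypothetical restatement of the stock action_trigger.event output
-- indexes that were missing on the pre-3.2.6 databases in question.
CREATE INDEX CONCURRENTLY atev_template_output
    ON action_trigger.event (template_output);
CREATE INDEX CONCURRENTLY atev_error_output
    ON action_trigger.event (error_output);
CREATE INDEX CONCURRENTLY atev_async_output
    ON action_trigger.event (async_output);
```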
10:00 |
Dyrcona |
I'm looking with PgAdmin at the moment and it's not showing them. |
10:00 |
berick |
could have been missed in an upgrade |
10:00 |
mmorgan |
Looks like they're new in 3.2.6? |
10:01 |
csharp |
yeah, I see them in the stock def |
10:01 |
berick |
mmorgan: yep |
10:01 |
csharp |
haven't looked at outputs |
10:01 |
csharp |
er.. upgrades, I mean |
10:01 |
mmorgan |
They're in version upgrade 3.2.5-3.2.6, we're on 3.2.4 |
10:01 |
csharp |
oh - ok |
10:01 |
csharp |
yeah we're on 3.2.3 |
10:02 |
csharp |
with selected backports |
10:02 |
Dyrcona |
We're on 3.2.4 with selected backports |
10:02 |
berick |
those indexes will help /a lot/ |
10:02 |
mmorgan |
Ditto on the selected backports :) |
10:02 |
berick |
apparently I also added an index to speed up the purging... |
10:02 |
berick |
"atev_def_state_update_time_idx" btree (event_def, state, update_time) |
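(Written out as a statement — the index definition is quoted verbatim above; CONCURRENTLY is an addition for applying it to a live system:)

```sql
CREATE INDEX CONCURRENTLY atev_def_state_update_time_idx
    ON action_trigger.event (event_def, state, update_time);
```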
10:03 |
Dyrcona |
I also have 3 odd tables in action_trigger: new_environment, new_event_def, and new_params. |
10:03 |
Dyrcona |
berick: That index should probably get bugged on Lp. |
10:04 |
berick |
Dyrcona: agreed |
10:04 |
* berick |
will add the lp shortly |
10:04 |
* csharp |
adds the index |
10:04 |
csharp |
(es) |
10:05 |
csharp |
berick++ |
10:05 |
Dyrcona |
Looks like those odd tables come from something someone did locally a while ago and I can probably drop them. |
10:07 |
Dyrcona |
They are also quite empty. I will drop them on Tuesday night.
10:14 |
berick |
huh, the (event_def, state, update_time) index was in our DB before my time. |
10:15 |
berick |
though it does look like it would help. maybe i'll wait and see if those indexes solve everyone's slowness first. |
10:32 |
Dyrcona |
I have (event_def, state), but not with update_time. |
10:43 |
|
khuckins_ joined #evergreen |
10:53 |
|
stephengwills joined #evergreen |
11:02 |
pinesol |
News from qatests: Testing Success <http://testing.evergreen-ils.org/~live> |
11:15 |
|
jvwoolf joined #evergreen |
11:22 |
rjackson_isl |
just discovered that Cricket (phone plan) service provider changed their sms address of preference. Will changes to config.sms_carrier be "seen" without any type of autogen or restarting of services? |
11:34 |
Dyrcona |
I think so. I don't recall anywhere that gets cached by autogen.sh. |
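(A sketch of the kind of update being described — the replacement address, the `$number` template convention, and the carrier match are all assumptions; check the vendor's announcement for the real gateway address:)

```sql
-- Hypothetical fix for a carrier that changed its SMS email gateway.
UPDATE config.sms_carrier
   SET email = '$number@new-gateway.example.com'
 WHERE name ~* 'cricket';
```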
11:35 |
Dyrcona |
Also, adding those a/t event output indexes made my PO JEDI reset script noticeably faster on my test VM.
11:35 |
csharp |
cricket-- |
11:36 |
Dyrcona |
Well, cheap wireless in general..... |
11:36 |
csharp |
they've done that several times over the last few years - at one point we just removed them from the public listing |
11:36 |
csharp |
rjackson_isl: no autogen.sh required, but it may be cached by the browser |
11:37 |
jeff |
cricket++ (even though they're just AT&T by another name) |
11:37 |
rjackson_isl |
csharp++ Dyrcona++ - now waiting to see if the change worked :) |
11:38 |
csharp |
jeff: my only experience of them is from complaining patrons regarding Evergreen, thus my decrementing :-) |
11:38 |
Dyrcona |
AT&T isn't AT&T any more... :) |
11:38 |
jeff |
Yeah. I can't fault them for us doing it wrong. :-) |
11:40 |
Dyrcona |
And, I think my a/t runner is faster, too! |
11:40 |
Dyrcona |
Could be that there were only a handful of events to process, though. |
11:57 |
|
sandbergja joined #evergreen |
12:10 |
Dyrcona |
Looking at the action_trigger.purge_events function, it's probably the linked_outputs query that's taking so long. It might be faster to do the whole delete in a loop on the output of a select statement.
12:11 |
Dyrcona |
Better than deleting "where not in ..." |
12:14 |
khuckins_ |
It looks like the latest version of Chrome might be breaking OPAC auto-suggest functionality, is anyone else able to confirm? |
12:25 |
jeff |
is there a literal WHERE NOT IN? |
12:25 |
* jeff |
looks |
12:30 |
jeff |
yeah. i'd try moving the union queries from being in a CTE to being in a subselect and changing the NOT IN to NOT EXISTS. |
12:30 |
* jeff |
thinks |
12:32 |
jeff |
or just three not exists, even. |
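(A sketch of the three-NOT-EXISTS form jeff describes, replacing the DISTINCT + CTE + NOT IN approach in the stock purge — treat the exact stock query shape as an assumption:)

```sql
-- Hypothetical rewrite of the orphaned-output cleanup: an output row
-- is deletable only when no event references it in any of the three
-- output columns.  NOT EXISTS lets the planner use the per-column
-- indexes instead of materializing one big NOT IN list.
DELETE FROM action_trigger.event_output o
 WHERE NOT EXISTS (SELECT 1 FROM action_trigger.event e
                    WHERE e.template_output = o.id)
   AND NOT EXISTS (SELECT 1 FROM action_trigger.event e
                    WHERE e.error_output = o.id)
   AND NOT EXISTS (SELECT 1 FROM action_trigger.event e
                    WHERE e.async_output = o.id);
```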
12:36 |
|
collum joined #evergreen |
12:37 |
jeff |
well, it makes an improvement selecting from a small sample where there are 5515 orphaned outputs. from 5832.446 to 3600.746 ms. |
12:40 |
phasefx_ |
bshum++ |
12:43 |
jeff |
(adding the missing indexes would also be required to speed up the delete on event_output) |
12:43 |
jeff |
but the DISTINCT + CTE + NOT IN still has potential to slow things down, especially in a larger-than-our db. |
12:44 |
Dyrcona |
I was away for a bit, but yeah. |
12:44 |
nfBurton |
Hey. So I have been trying to add my live data to my development server and can't seem to get opensrf to cooperate. If I create the default DB as part of the Evergreen installation, srfsh checks work fine. Restoring from my pg_dump and trying the same thing with srfsh gives me "received Data: 'x'". The request completes successfully but isn't actually working; even if I reset the password in the database, the srfsh login response is "Login_failed". Is there something I may be missing that I need to modify?
12:44 |
Dyrcona |
I think the lack of index on update_time is also a big factor. |
12:45 |
Dyrcona |
nfBurton: How did you reset the password in the database? Also, I'm not sure, exactly, what your problem is. |
12:45 |
Dyrcona |
jeff: I'll play with it some more next week, but after adding the output indexes, the db function finished in 31 minutes for me. |
12:46 |
Dyrcona |
I've been deleting individual events and outputs in functions lately, so I've been getting the output id into a variable, then deleting the event, then deleting the output. |
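(The per-event approach Dyrcona describes — capture the output ids, delete the event, then delete any outputs no longer referenced — can be sketched as a PL/pgSQL loop; the state and interval filters here are placeholders:)

```sql
-- Hypothetical per-row purge: delete each old event, then its outputs
-- if no surviving event still points at them.
DO $$
DECLARE
    ev action_trigger.event%ROWTYPE;
BEGIN
    FOR ev IN
        SELECT * FROM action_trigger.event
         WHERE state = 'complete'
           AND update_time < NOW() - INTERVAL '1 year'
    LOOP
        DELETE FROM action_trigger.event WHERE id = ev.id;
        DELETE FROM action_trigger.event_output o
         WHERE o.id IN (ev.template_output, ev.error_output, ev.async_output)
           AND NOT EXISTS (
                 SELECT 1 FROM action_trigger.event e
                  WHERE o.id IN (e.template_output, e.error_output,
                                 e.async_output));
    END LOOP;
END $$;
```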
12:47 |
Dyrcona |
It also deleted 13.6 million ate rows. |
12:52 |
nfBurton |
I just did an update on the row and figured the trigger would take care of it. Or do I need to use a specific function? |
12:52 |
nfBurton |
I've narrowed it down to something to do with the DB |
12:52 |
bshum |
phasefx: It took longer than I thought to work out all the kinks, but I'm glad it's mostly working fine again :) |
12:53 |
Dyrcona |
nfBurton: You need to use a couple of functions, but I've got it into one. |
12:53 |
nfBurton |
Because the default loaded database works fine. It's just when I Drop it, run the Create Database line and reload from the pg_dump that it doesn't work. I can see my OUs but the searches/logins don't seem to work |
12:54 |
Dyrcona |
nfBurton: https://pastebin.com/YTLh7pC9 |
12:54 |
Dyrcona |
nfBurton: Is the dump from the same version of Evergreen? |
12:55 |
nfBurton |
I'm dumping from 3.2.4 to HEAD. I've tried doing the upgrade scripts to match version too |
12:56 |
Dyrcona |
nfBurton: You're doing it wrong. |
12:56 |
nfBurton |
srfsh logs have no errors either |
12:56 |
nfBurton |
Oh? That's why I'm here lol |
12:56 |
Dyrcona |
If you have a complete dump of 3.2.4, then just pg_restore that as a new database, then run the upgrade scripts.
12:57 |
Dyrcona |
Don't create the database beforehand. Chances are the restore will either drop it or things'll be fubar.
12:57 |
nfBurton |
oh okay. I've been using psql -d -f |
12:57 |
nfBurton |
with the database and filename of course |
12:58 |
Dyrcona |
That's not likely to work. |
12:58 |
nfBurton |
Okay. Good to know |
12:58 |
jeff |
Dyrcona: "Don't create the database before hand" is contrary to how I usually do it. What experience leads you to make that recommendation? |
13:00 |
Dyrcona |
jeff: I do the following on a weekly basis: /usr/lib/postgresql/9.5/bin/pg_restore -U evergreen -h localhost -C -c -d postgres -j 8 ${dumpfile} |
13:00 |
nfBurton |
su root |
13:00 |
nfBurton |
oops |
13:01 |
Dyrcona |
Note the -C and -c options, to create and drop the database. |
13:02 |
Dyrcona |
I have had problems in the past trying to load dumps into existing databases, particularly when the versions mismatched. It has been several years, so I don't remember all of the details. |
13:03 |
nfBurton |
Thanks. I'm going to try that |
13:03 |
nfBurton |
Also, thanks for the password update function |
13:03 |
nfBurton |
Dyrcona++ |
13:05 |
jeff |
nfBurton: Be aware that that command will overwrite any existing database that has the same name as the database name contained within the dump file. If you're on a cluster with no other databases, that probably isn't a concern, but it's worth a warning. |
13:06 |
Dyrcona |
Also note the specific path to the versioned pg_restore. I'm on a server with multiple clusters of different Pg versions. :)
13:06 |
nfBurton |
Yeah, this is a stand alone dev server. NBD if I wreck it lol |
13:06 |
Dyrcona |
But, if your database is named evergreen, it's probably what you want to do. |
13:07 |
bshum |
"I'm gonna wreck it!" |
13:07 |
Dyrcona |
I also figured you were preparing for an upgrade. |
13:07 |
nfBurton |
haha preparing for contribution. I just need oodles more data |
13:07 |
Dyrcona |
I need to make time to practice upgrading to 9.6, then to 10, and then to 11. I guess I'll install 12 when it comes out, too. |
13:08 |
Dyrcona |
yeahp. I'm using an old mail server with 8TB of disk space for a development database server. |
13:08 |
Dyrcona |
Multiple copies of production data hanging around.
13:09 |
Dyrcona |
BTW, if you want two or more copies of a database, I find it's faster to pg_restore the main one and then make the clones using CREATE DATABASE with the first one as a template.
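(The template trick in one statement — database names are placeholders, and PostgreSQL requires that nothing be connected to the template while it copies:)

```sql
-- Clone a restored database without running a second pg_restore.
CREATE DATABASE evergreen_clone TEMPLATE evergreen;
```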
13:09 |
jeff |
this was my minor speed-up of the orphaned action_trigger.event_output query: https://gist.github.com/jeff/50e287ddb80b416c15cac6775617884c |
13:10 |
jeff |
by itself that does not speed up the actual delete. |
13:10 |
Dyrcona |
I think it would be faster to pull the whole ate row in a loop, delete from the output table with a coalesce on the output fields, then delete from ate. |
13:10 |
jeff |
but especially in a larger db or with a >5k orphaned outputs situation, it could help further. |
13:11 |
Dyrcona |
But orphaned outputs is a slightly different problem. :) |
13:11 |
Dyrcona |
Though the purge events function looks to solve that problem, too. :) |
13:12 |
jeff |
well, the current purge is a two step delete where step 1 creates a bunch of orphaned output and then step 2 deletes that. :-) |
13:12 |
Dyrcona |
What's fun is I've found ate rows pointing to nonexistent output rows. |
13:12 |
jeff |
(but yes, it has a side effect of also removing outputs that were previously orphaned before step 1 ran) |
13:12 |
Dyrcona |
jeff: Yeahp. |
13:14 |
* Dyrcona |
tries jeff's second query on the reports db to see how many orphaned rows it turns up. |
13:15 |
Dyrcona |
I wonder if a single left join would be faster? |
13:15 |
jeff |
the indexes on action_trigger.event.{template_output,error_output,async_output} should be what speed up the delete most. |
13:16 |
jeff |
Dyrcona: left join where the output id is equal to template_output OR error_output OR async_output then have the delete WHERE restrict to where the joined table row is null? dunno, i'll try! |
13:18 |
Dyrcona |
I'll have to let this go for now. I've got something else that I should make sure I finished this morning. Been jumping from thing to thing. |
13:19 |
jeff |
I am not at all familiar with that feeling. |
13:19 |
* jeff |
ducks |
13:19 |
jeff |
the left join (at least, as I wrote it) is pretty terrible, it turns out. |
13:21 |
Dyrcona |
Yeah, I wondered. The ORs probably slow it down considerably.
13:22 |
jeff |
surprisingly terrible! |
13:28 |
jeff |
cancelled after 584388.216 |
13:53 |
|
sandbergja joined #evergreen |
13:59 |
|
yboston joined #evergreen |
14:10 |
|
khuckins joined #evergreen |
14:58 |
Bmagic |
Does SIP for Evergreen allow for "item limits" to be conveyed in the 64 message? |
14:59 |
nfBurton |
Don't believe so. It just blocks when you hit max. But there is no limit communicated |
15:00 |
jeff |
Bmagic: what field are you asking about? |
15:00 |
|
mmorgan left #evergreen |
15:00 |
Bmagic |
bibliotheca seems to think that we should respond in the 64 message the item limit |
15:01 |
Bmagic |
"between the permanent location and valid patron are the listed limits for patron account." |
15:01 |
nfBurton |
I don't believe that is a standard SIP field |
15:01 |
jeff |
Bmagic: for what purpose, and do you know what field you're talking about? |
15:01 |
Bmagic |
"BZ" (if I'm reading this email right) |
15:02 |
Bmagic |
FID_HOLD_ITEMS_LMT is what our code says |
15:02 |
jeff |
okay, hold items limit. |
15:03 |
nfBurton |
Actually, it does exist in the 3M documentation |
15:03 |
nfBurton |
http://multimedia.3m.com/mws/media/355361O/sip2-protocol.pdf |
15:03 |
nfBurton |
But no, our 64 response doesn't include it |
15:03 |
Bmagic |
"not supported" by Evergreen? |
15:05 |
jeff |
SIPServer doesn't populate that field. |
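(For anyone testing from the client side, pulling a variable-length field like BZ, the hold items limit per the 3M SIP2 document linked above, out of a 64 Patron Information response can be sketched as below; the sample message layout is invented and simplified, and a real parser would consume the fixed-length header first:)

```python
from typing import Optional

def sip2_field(message: str, code: str) -> Optional[str]:
    """Return the value of the first pipe-delimited variable field whose
    two-character code matches, or None if the field is absent.
    Scanning every chunk is good enough for a sketch; production code
    should skip the fixed-length header before splitting."""
    for chunk in message.split("|"):
        if chunk.startswith(code):
            return chunk[len(code):]
    return None

# Hypothetical, abbreviated 64 response for demonstration only.
msg = "64              00120190726    103015AOBR1|AA12345|BZ0003|AY1AZE1C8"
print(sip2_field(msg, "BZ"))  # → 0003
```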
15:05 |
jeff |
How does bibliotheca plan to use it? |
15:05 |
nfBurton |
Also, this tool is amazing for testing SIP if you don't have it https://clcohio.org/sip-testing-tool/ |
15:05 |
Bmagic |
cool, (I sorta knew that because they are complaining about it) |
15:06 |
Bmagic |
awesome tool! I will get that for sure |
15:07 |
Bmagic |
I think they would use this information to artificially block patrons at their limit |
15:11 |
Bmagic |
Evergreen has this issue when using self-check machines, where the patron is allowed to go over their limit by 1. This is because the self-check machine will check the item out and then the patron will be blocked (once the penalties are calculated)
15:11 |
Bmagic |
it needs to "digest" the patron account once |
15:14 |
Bmagic |
Luckily we don't have many libraries using self-check as primary. We've been getting around it by running a sql update every minute to basically "sync" this specific patron penalty for this specific branch |
15:14 |
jeff |
oh? |
15:14 |
Dyrcona |
It would be difficult to guesstimate a holds limit from Evergreen since it varies depending on all kinds of factors.
15:14 |
Bmagic |
Dyrcona: I agree, I think the problem is complicated |
15:14 |
* Dyrcona |
can't type today. |
15:14 |
Dyrcona |
Why is anyone wanting to use it? |
15:15 |
Bmagic |
Dyrcona: I think they would use this information to artificially block patrons at their limit |
15:15 |
jeff |
Bmagic: are you talking about holds or circs? |
15:15 |
Dyrcona |
That's fine, except we don't know what that limit is because it varies. |
15:15 |
Bmagic |
circs (I know the vendor is asking about the BZ field which is confusing things I think)
15:16 |
* jeff |
goes to test something |
15:17 |
Dyrcona |
Send them -1 and see what their software does. :) |
15:19 |
Bmagic |
Dyrcona++ |
15:23 |
jeff |
My testing doesn't reproduce the issue. |
15:25 |
jeff |
I had an account with an item checked out. I changed the account to a profile which is limited to a very small number of items (3). I then checked out another item, bringing it to two. In a new session, I then tried to check out two more items, and I was blocked from checking out the fourth item. |
15:27 |
Bmagic |
hmmm |
15:27 |
Bmagic |
jeff++ |
15:28 |
Bmagic |
maybe we are fixing an issue that isn't there (anymore) |
15:28 |
jeff |
what events are you configured to override in the SIPServer config? |
15:28 |
Bmagic |
COPY_ALERT_MESSAGE COPY_BAD_STATUS COPY_STATUS_MISSING |
15:29 |
jeff |
interesting. and are you able to reproduce the issue, or are you going on hearsay? |
15:30 |
Bmagic |
I reproduced it like 6 years ago |
15:30 |
Bmagic |
never thought about it again until now
15:30 |
Bmagic |
might be time to disable that cronjob :) |
15:35 |
|
sandbergja joined #evergreen |
16:08 |
|
jvwoolf left #evergreen |
16:37 |
|
khuckins joined #evergreen |
23:02 |
pinesol |
News from qatests: Testing Success <http://testing.evergreen-ils.org/~live> |