Evergreen ILS Website

IRC log for #evergreen, 2022-12-09

| Channels | #evergreen index | Today | | Search | Google Search | Plain-Text | summary | Join Webchat

All times shown according to the server's local time.

Time Nick Message
07:07 kworstell-isl joined #evergreen
08:23 JBoyer joined #evergreen
08:31 mantis1 joined #evergreen
08:42 mmorgan joined #evergreen
08:52 Stompro joined #evergreen
09:06 Bmagic I can't seem to figure out where this index comes from: CREATE INDEX mrca_vlist_idx ON metabib.record_attr_vector_list USING gin (vlist evergreen.gin__int_ops);
09:07 Bmagic pg_restore throws that as an error. evergreen.gin_int_ops and evergreen.gin__int_ops doesn't appear in the Evergreen source
09:08 Dyrcona joined #evergreen
09:08 Bmagic wb Dyrcona
09:21 csharp_ Bmagic: I see it in three files: https://pastebin.com/4aAnhmWg
09:22 Bmagic csharp_++ # git grep ftw
09:22 csharp_ Bmagic: just learned that from Dyrcona
09:22 Bmagic it must have been dropped
09:22 csharp_ or maybe gmcharlt - not sure which
09:23 Bmagic I know there was a big switch from one type of index to another a couple versions ago
09:29 Bmagic so, trying to track down more information. I used: git log -p -- Open-ILS/src/sql/Pg/030.schema.metabib.sql
09:29 Bmagic I see the line: CREATE INDEX mrca_vlist_idx ON metabib.record_attr_vector_list USING gin ( vlist gin__int_ops );
09:30 collum joined #evergreen
09:30 Bmagic but nowhere in that history do I see CREATE INDEX mrca_vlist_idx ON metabib.record_attr_vector_list USING gin ( vlist evergreen.gin__int_ops );
09:30 Bmagic the difference being evergreen.gin__int_ops  vs.   gin__int_ops
09:32 Dyrcona Bmagic: gin__int_ops should not be in the evergreen schema.
09:33 Dyrcona It's part of the intarray extension.
09:33 Dyrcona And, this may help with our search issues on pg 12+, but I still have to look into that more today.
09:35 Dyrcona Bmagic: So, try to alter the index or drop it and recreate it with the line from the schema files. If I had to guess, I suspect something got botched during an upgrade.
09:39 Bmagic After the restore finished, I manually added the index without the evergreen qualifier, and it worked. That's not a problem. I was just diving deeper into the git history to try and figure out when* we updated our code to drop that qualifier off the CREATE INDEX statement
09:41 Bmagic so far, I've not been able to track it down. I assume it would appear in 030.schema.metabib.sql as evergreen.gin__int_ops at some point in history
09:42 Bmagic maybe postgres stuck that on there without us telling it to
09:43 Dyrcona That's because the evergreen qualifier was never there.
09:44 Bmagic I'm slowly realizing that.
09:44 Bmagic So, why in the world would PG have it like that?
09:45 Dyrcona grep -r intarray Open-ILS/src/sql/Pg might help provide a clue.
09:46 Dyrcona It could be that when one of the upgrade scripts was run to add intarray, the extension was added to the evergreen schema rather than the public schema.
09:46 Bmagic I'll buy that
09:47 dguarrac joined #evergreen
09:55 Dyrcona Talking to a read-only database: ERROR:  canceling statement due to conflict with recovery
09:57 Dyrcona OK, fine, but then try again in a couple of minutes: SSL connection has been closed unexpectedly. The connection to the server was lost. Attempting reset: Succeeded.
09:57 Dyrcona Guess that's not catastrophic after all.
10:03 Dyrcona miker: To answer your question from yesterday about how many bibs we have: deleted='t' 1,985,205; deleted='f' 2,237,018. This is the database used to generate the explain output on the different Pg releases. It's a dump of our production database from 12/04 and upgraded to 3.10.0.
10:18 Dyrcona It looks like it is the where on bre.vis_attr_vector. I just tried 'select id from biblio.record_entry where not ( int4range(0,268435455,'[]') @> ANY(vis_attr_vector);' and it is still running.
10:19 Dyrcona I'm going to add in the NOT NULL and run it again with explain after it returns.
10:20 Dyrcona Sorry, IS NULL rather than NOT NULL.
10:35 Dyrcona Nope....
10:36 Dyrcona Well, that took 3.7 seconds.
10:38 Dyrcona Not bad though, 3.7 seconds for a seq scan on 3.7 million records.
10:39 Bmagic better than a human
10:43 Dyrcona Adding an index on bre.vis_attr_vector doesn't make a difference in the explain output for the simple query.
10:44 Dyrcona Also, there's an error in the SQL I pasted before. I'm actually running this to test: select id from biblio.record_entry where vis_attr_vector IS NULL OR NOT int4range(0,268435455,'[]') @> ANY(vis_attr_vector);
10:44 Dyrcona I'm going to add an index on acvac.vis_attr_vector and try the big query again.
10:47 Dyrcona That doesn't seem to make a difference either, so I'll try the procedural index that miker suggested yesterday next.
10:54 Dyrcona "create index bre_not_deleted_idx ON biblio.record_entry (id) where deleted = 'f';" doesn't seem to help, either.
10:59 miker Dyrcona: since the EXPLAIN-reported predicate is (IIRC) "NOT deleted" you may want to try the index with that instead. but it may not help because an index is only used when it's expected to match less than a small part of the table
11:04 Dyrcona miker: All right. I'll remake the index. I just dropped it and the other two to run the original again. I noticed a 300,000ms difference, but I want to rule out caching.
11:05 Dyrcona Yeahp. Cache... The original with added indexes is finishing in the same time range.
11:06 Dyrcona create index bre_not_deleted_idx ON biblio.record_entry (id) where not deleted;
11:09 miker Bmagic: extensions can be created in specific schemas, and pre-extension contribs can be carried forward in upgrades. if you're restoring from a dump of a db where intarray (or the pre-extension contrib) was installed in the evergreen schema, but the db you're restoring into has it in public, you'll see what you're seeing
11:10 miker ah! I see Dyrcona came to the same conclusion/suggestion :)
11:12 Dyrcona miker: Performance is roughly the same with the index  (Execution Time: 114266.883 ms) as without (Execution Time: 114872.265 ms).
11:14 miker Dyrcona: re your issue, the other question I had was about the ingest.metarecord_mapping.preserve_on_delete internal flag. if that's true then the m.metarecord_source_map entries stick around on bib "delete", which is good if you want to be able to search for "deleted" bibs, but bad for this plan it would seem. is that on? (NTS: maybe we should check the flag and only use AND NOT deleted when it could have an effect?)
11:14 miker select * from config.internal_flag where name like '%delete%'; if you're in psql
11:16 Dyrcona I'm pretty sure that's on, but I'll double check.
11:17 Dyrcona Nope. enabled = 'f'.
11:17 miker kk
11:20 miker Dyrcona: a big change in pg 12 is the default for non-materialized CTEs. we can't just do it across the board yet, but do you have the full original query handy to test by hand? if so, in front of each CTE (WITH-clause), after the AS keyword, what happens if you add MATERIALIZED between AS and the open-( ?
11:21 Dyrcona I'll give that a shot. I have it open in my editor.
11:23 miker if that makes things happy -- and I can imagine it will because estimates of the number rows coming out of a tsearch CTE are ... usually way off -- then we /can/ add a PG version check and add it when it's available. there's precedent for that in the record attribute testing infrastructure.
11:27 Dyrcona It's worse, but I may have missed a couple: Execution Time: 115367.485 ms
11:31 Dyrcona OK! I missed c_attr and b_attr before. after adding materialized to those two, it was much better: Execution Time: 3123.997 ms
11:32 Dyrcona My guess is c_attr as materialized does the magic.
11:33 Dyrcona For the sake of satisfying my curiosity, I'm going to try it with materialized just on c_attr.
11:33 miker oh, interesting! but, not entirely surprising given what I /just/ saw in the docs at the end of this section:  https://www.postgresql.org/do​cs/14/queries-with.html#id-1.5.6.12.7 ("very_expensive_function()")
11:34 Dyrcona "That's a bingo!"
11:37 miker limit and offset are usually optimization fences ... Dyrcona, would you mind trying this? remove the MATERIALIZED on c_attr and add "OFFSET 0" just before the closing ) on that CTE
11:38 Dyrcona OK. I was about to say that we'd have to bump the minimum Pg version to 12 if we use materialized CTEs.
11:38 miker if that has the same effect, we should be able to (for now, until PG learns how to ignore useless limit/offset) do that.
11:39 miker as for versioning, we already test for the version in one other place and change query structure based on that. we could do it here if MATERIALIZED is the answer
11:40 miker we could also mark the function volatile to force materialization
11:40 Dyrcona Execution Time: 2675.806 ms
11:40 miker oh hey, look at that
11:40 Dyrcona So, the offset works, too.
11:40 miker and, just to clarify, right now that offset is the /only/ change to the query?
11:41 Dyrcona Yes.
11:41 miker right on
11:42 Dyrcona Which function were you thinking of marking volatile?
11:43 miker search.calculate_visibility_attribute_test
11:44 Dyrcona OK. I wasn't sure because asset.patron_default_visibility_mask is also used.
11:44 Dyrcona Should I try that, too, or are we happy with "offset 0"?
11:44 miker it is, but only in OPAC searches. staff search does use search.calculate_visibility_attribute_test thought
11:45 miker though, even
11:45 Dyrcona FYI, I'm testing on Pg 13, but it probably doesn't matter.
11:47 miker offset 0 is the more self-documenting variant that works for all pg version, for now. though, again, we can version-test and use materialized in 12+
11:49 miker testing offset 0 on an older version, just to be sure
11:49 Dyrcona I was talking about marking search.calculate_visibility_attribute_test as volatile.
11:50 miker right
11:50 Dyrcona I'm going to try it just because I can.
11:50 miker I'm saying "offset 0" is more self-documenting, because the query has a marker of "something odd here" ... that function isn't /actually/ volatile
11:50 miker +1, try away!
11:52 Dyrcona Volatile without offset 0 works, too, but I'll defer to your opinion on offset 0.
11:53 Dyrcona Execution time is about the same, 4ms difference which is probably not related to the change.
12:17 miker I got curious and checked the difference between ts_rank and ts_rank_cd ... it's bigger than I thought :(
12:18 miker in my test query, 3ms vs 90ms, respectively
12:22 jihpringle joined #evergreen
12:28 miker Dyrcona: do you want to branchify that, or shall I?
12:28 Dyrcona miker: ts_rank makes a roughly 67ms difference in execution time.
12:28 Dyrcona in my test environment.
12:29 Dyrcona If you want to branch it, that's fine with me. I was going to open a Lp bug, but I should actually look into yet another OOM on the db server.
12:29 Dyrcona It happened just before noon.
12:29 miker oh, well, that's not enough to warrant further investigation this moment. my test case may be unusual
12:29 miker I'll put together the patch and await your LP, then?
12:30 Dyrcona OK with me.
12:36 Dyrcona Is it possible to POST an OPAC search?
12:42 miker it is
12:43 miker (and, note, i was incorrect about staff search not using the patron vis test function -- it does, it just ignores that the the WHERE clause)
12:43 miker Dyrcona: https://git.evergreen-ils.org/?p=working/Ever​green.git;a=shortlog;h=refs/heads/user/miker/​optimization-fence-for-pg-12-CTEs-in-search for your eyeballs
12:45 Dyrcona miker++
12:46 Dyrcona Re POST: I guess we'll need to work out a mitigation for POSTS with "bad" queries in them.
13:33 kworstell-isl joined #evergreen
13:44 mschell joined #evergreen
14:01 jvwoolf joined #evergreen
14:24 kworstell-isl joined #evergreen
14:37 mschell Hi All,  I'm hoping someone might have some insight into a problem I'm working on.  If this isn't the proper forum just let me know (it's my first time on IRC).
14:37 mschell I'm trying to get the OPAC author display to show 'FirstName LastName' instead of 'LastName, FirstName'. There is a variable attr.author which pulls from MARC 100, which is what is displaying now but I wanted to see if using the MARC 245 $c would give me mostly what I want.  The detailed view of the title in the OPAC (attrs.title.extended) shows
14:37 mschell the 245 $a and $c so I could maybe get it from there but how to only show the $c info and how would it affect the search link that is generated.
14:37 mschell Does this approach make sense?  Is there a better way?
14:52 Bmagic mschell: you might be able to get away with a little regular expression in the OPAC to tease out the pieces that you need
15:02 Dyrcona miker | csharp_ : bug 1999274   I didn't link miker's branch because I thought you might like to add the bug # to the commit message.
15:02 pinesol Launchpad bug 1999274 in Evergreen "Performance of Search on PostgreSQL Versions 12+" [Medium,Confirmed] https://launchpad.net/bugs/1999274
15:08 Dyrcona mschell: You could use the 245 $c if you prefer, just alter the XPath expression. I'd like to add that some think that using MARC in the OPAC is a mistake and a future version of the OPAC may switch to using display fields.
15:08 sleary joined #evergreen
15:16 mschell Bmagic, thanks I'll try that.
15:17 mschell Dyrcona, that is a future version I look forward too :)
15:18 Dyrcona mschell: I wouldn't hold my breath. :)
15:19 Stompro Is the eg2 splash page configurable somehow?  It seems to not just be a template I can edit?
15:20 Stompro Ohh, staff portal page. see it now.
15:20 Bmagic Stompro: I don't think so.
15:20 Bmagic you'd need to edit the source, then run the npm build steps
15:21 jihpringle Bmagic: depends on the version, there's an admin interface starting in 3.8 or 3.9 I think (we definitely have it in 3.9)
15:21 Bmagic jihpringle++
15:22 mmorgan It's new in 3.9
15:22 Stompro I testing on 3.9, so I get to try that out.
15:25 jeffdavis In addition to the new staff portal config options, we add an iframe to the page so we can display news updates from another source: http://git.sitka.bclibraries.ca/gitweb/​?p=sitka/evergreen.git;a=commitdiff;h=4​5bfec6b5b1a7a8d0dfd898aa68926d17bb8f1ca
15:25 jeffdavis I should probably turn that into a feature request.
15:28 Stompro jeffdavis, thanks, that is interesting.
15:28 Dyrcona So, we've got the storage listener running on our utility server, but no drones. It happened while the fine generator was running. I've seen something like this before, but IIRC I think the listener was what got shut down that time. Does this sound like a case of the listener somehow crashing and turning into a drone?
15:29 Dyrcona Nifty! The listener is immortal.
15:30 Dyrcona kill -9 works.
15:31 kworstell-isl joined #evergreen
15:34 Stompro Shoot, "it won’t try to merge branch, system, and consortial-level entries" I cannot have System portal with the odd entry for a specific branch.  But I get that would be more complex to figure out.
15:53 jihpringle ya, if you want different portals for different org units with slight differences you have to re-create the portal for each org unit you want (but there is a clone button)
15:58 miker Dyrcona: I'll clean up the commit message and push another branch. if you're in a position to test the patch in both opac and staff search, it'd be a much appreciated cross check.
16:00 Dyrcona miker: Sure thing!
16:59 mmorgan left #evergreen
17:00 Stompro Who do I contact to add something to the tabular release notes?
17:19 jihpringle Stompro I believe jweston
17:37 stompro_home joined #evergreen
17:38 stompro__ joined #evergreen
18:45 jonadab joined #evergreen
19:58 jvwoolf left #evergreen
20:37 stompro__ joined #evergreen
20:38 stompro_home joined #evergreen

| Channels | #evergreen index | Today | | Search | Google Search | Plain-Text | summary | Join Webchat