IRC log for #evergreen, 2023-07-19


All times shown according to the server's local time.

Time Nick Message
01:37 kworstell_isl joined #evergreen
01:40 Bmagic joined #evergreen
01:40 dluch joined #evergreen
01:40 scottangel joined #evergreen
07:36 BDorsey joined #evergreen
08:22 collum joined #evergreen
08:27 mantis joined #evergreen
08:29 mmorgan joined #evergreen
08:53 kworstell-isl joined #evergreen
08:55 sandbergja joined #evergreen
08:59 mantis Morning.  Tried to start a reingest on our test server with 3.11 installed and got back an error.  https://pastebin.com/Kv6tHz4m
08:59 mantis any advice appreciated
09:00 mantis ERROR:  index row size 3136 exceeds maximum 2712 for index "browse_entry_sort_value_value_key" HINT:  Values larger than 1/3 of a buffer page cannot be indexed. Consider a function index of an MD5 hash of the value, or use full text indexing.
09:08 sandbergja joined #evergreen
09:16 rfrasur joined #evergreen
09:22 sharpsie mantis: something in the record is too big - have you been able to identify which record it's dying on?
09:23 sharpsie potentially lots of garbage in MARC records between human error and machine corruption
09:44 mantis is the insert query in 1344.function.trim_trailing... able to locate those records?
09:44 mantis if so I didn't get a return on that
09:44 mantis when I ran the select of that rather
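(For context: one way to hunt for the offending record is to measure what has already been ingested. A hedged sketch follows; the row limit is arbitrary, and metabib.browse_entry_def_map is assumed here to map entries back to their source bibs.)

    -- list the widest browse entries already in place; index rows max out
    -- around 2712 bytes, per the error above
    SELECT id, octet_length(sort_value) AS sort_bytes, octet_length(value) AS value_bytes
      FROM metabib.browse_entry
     ORDER BY GREATEST(octet_length(sort_value), octet_length(value)) DESC
     LIMIT 20;

    -- map a suspect entry back to its source record(s); 123 is a
    -- hypothetical id taken from the query above
    SELECT DISTINCT source FROM metabib.browse_entry_def_map WHERE entry = 123;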
10:03 jvwoolf joined #evergreen
10:04 kworstell-isl joined #evergreen
10:19 eeevil mantis: 1344 isn't related to overall length, just adjusting trailing punctuation.  the problem is some config.metabib_field where browse=true is pulling a really long value out of the MARC.  you could register the substr function as an indexing normalizer, if it's not already, and have whichever browse field definition is providing that super long string (some title, probably?) get normalized with a negative position normalizer mapping for substr
10:19 eeevil with params of, say, '[1,1000]'.
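(A sketch of the normalizer approach eeevil describes, assuming the stock config.index_normalizer and config.metabib_field_index_norm_map tables; the normalizer name/description and the choice to cover every browse field are illustrative, not tested.)

    -- register substr() as an indexing normalizer, if it isn't already
    INSERT INTO config.index_normalizer (name, description, func, param_count)
    SELECT 'Substring', 'Truncate the value to a substring', 'substr', 2
     WHERE NOT EXISTS (SELECT 1 FROM config.index_normalizer WHERE func = 'substr');

    -- map it to browse fields with a negative position so it runs after the
    -- other normalizers, capping values at 1000 characters
    INSERT INTO config.metabib_field_index_norm_map (field, norm, params, pos)
    SELECT f.id, n.id, '[1,1000]', -1
      FROM config.metabib_field f
      JOIN config.index_normalizer n ON (n.func = 'substr')
     WHERE f.browse_field;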
10:30 mantis eeevil: this is all new to me; I do know that we sometimes have very large MARC records or dangling subfields when I validate the exports via MarcRecord
10:30 mantis do you have an example of a command that can be run via terminal or in pgadmin?
10:32 Dyrcona joined #evergreen
10:35 sharpsie mantis: the index it's failing on is named "browse_entry_sort_value_value_key", which doesn't appear in the stock code - possibly a previous admin added it
10:36 sharpsie mantis: there is an index browse_entry_sort_value_idx though
10:36 sharpsie current definition is CREATE INDEX browse_entry_sort_value_idx ON metabib.browse_entry USING BTREE (sort_value);
10:38 sharpsie what eeevil appears to be suggesting is to re-create the index (drop index concurrently/create index concurrently if running on a live server) with (substring(sort_value from 1 for 1000)) or similar
10:39 sharpsie mantis: in psql, \d+ metabib.browse_entry to see the indexes - maybe pastebin it?
10:46 BrianK joined #evergreen
10:48 mantis https://pastebin.com/M8PkZJUA
10:49 sharpsie oh - hmm - we have that index too
10:50 sharpsie ours for reference: https://pastebin.com/7ue0YRh7
10:51 sharpsie so, yeah, the same
10:54 sharpsie argh - I hate when there's no record of changes - apparently whatever that index is was added out-of-band
10:54 mantis alright at least we know that
10:55 sharpsie mantis: nevertheless, eeevil's suggestion should help - basically it's saying "however long the actual value is, truncate it after 1000 chars"
10:56 sharpsie mantis: is this in a test environment or live?
10:56 mantis test environment
10:56 sharpsie good :-)
10:58 mantis haha
10:58 sharpsie in a test environment that no one else depends on, you can be cavalier
10:59 sharpsie the "safe" way to create/drop indexes on a live server is usually to create the new index with "concurrently" which doesn't lock the table and make everyone miserable
11:00 sharpsie then drop the "old" index (again, with "concurrently")
11:02 sharpsie in this case though, you can DROP INDEX CONCURRENTLY browse_entry_sort_value_value_key; since it is apparently unnecessary?
11:04 sharpsie then DROP INDEX CONCURRENTLY browse_entry_sort_value_idx; CREATE INDEX CONCURRENTLY browse_entry_sort_value_idx ON metabib.browse_entry USING BTREE (substring(sort_value from 1 for 1000)); (completely untested, just following substr docs)
11:04 mantis we're thinking of something like this
11:04 mantis -- ALTER TABLE IF EXISTS metabib.browse_entry DROP CONSTRAINT IF EXISTS browse_entry_sort_value_value_key;  ALTER TABLE IF EXISTS metabib.browse_entry     ADD CONSTRAINT browse_entry_sort_value_value_key UNIQUE (sort_value, value);
11:05 sharpsie oh, right - it's a constraint - hmm
11:05 sharpsie mantis: in that case, you'd still want to wrap both sort_value and value in a substr() probably
11:06 sharpsie because the current issue is that without that safety feature, PG will stupidly just try to import huge fields without question
11:13 mantis How's this?
11:13 mantis ALTER TABLE IF EXISTS metabib.browse_entry DROP CONSTRAINT IF EXISTS browse_entry_sort_value_value_key;  ALTER TABLE IF EXISTS metabib.browse_entry     ADD CONSTRAINT browse_entry_sort_value_value_key UNIQUE (subsring(sort_value from 1 for 1000), substring(value from 1 for 1000));
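(Two things likely break that statement: "subsring" is a typo for "substring", and, more fundamentally, PostgreSQL only accepts plain column lists in a UNIQUE table constraint; expressions require a unique index instead. A sketch of the index form, untested:)

    -- ADD CONSTRAINT ... UNIQUE cannot take expressions, but a unique
    -- expression index enforces the same rule
    CREATE UNIQUE INDEX CONCURRENTLY browse_entry_sort_value_value_key
        ON metabib.browse_entry (substr(sort_value, 1, 1000), substr(value, 1, 1000));

(Note the slight semantic change: two rows differing only beyond the 1000th character would now collide.)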
11:15 sharpsie try it within BEGIN;/COMMIT; and see if it works?
11:15 sharpsie oh wait you can't
11:16 sharpsie create index has its own transactional nature - sorry
11:16 sharpsie so yeah, give it a go
11:16 sharpsie an index won't hurt anything (except space)
11:18 sharpsie I have a feeling the issue is "value" rather than "sort_value", but I'm not sure
11:18 sharpsie this is the year I'm going to learn bib/auth stuff top to bottom :-)
11:19 Christineb joined #evergreen
11:20 jeffdavis bug 1695911
11:20 pinesol Launchpad bug 1695911 in Evergreen "Long browse entries cause index row size error" [Undecided,New] https://launchpad.net/bugs/1695911
11:21 sharpsie jeffdavis++
11:22 sharpsie mantis: this looks right^^ and even mentions the phantom unique index!
11:24 collum joined #evergreen
11:24 sharpsie yay for the EG community of 2017/2018! (except for the follow through part, but I'm certainly more guilty than anyone for that)
11:26 sharpsie dbs++ # https://bugs.launchpad.net/evergreen/+bug/1695911/comments/3
11:26 pinesol Launchpad bug 1695911 in Evergreen "Long browse entries cause index row size error" [Undecided,New]
11:26 sharpsie so not always garbage, just title verbosity dialed to 11
11:27 sharpsie https://i.pinimg.com/600x315/e7/8b/94/e78b945c23a4ada6b7812cf57e9df6d1.jpg
11:33 sharpsie still very confused about no results from "ack browse_entry_sort_value_value_key"/"grep -r browse_entry_sort_value_value_key"
11:36 Dyrcona sharpsie: Open-ILS/src/sql/Pg/upgrade/0816.schema.bib-auth-browse.sql:7129
11:37 Dyrcona The name is generated by PostgreSQL. It's not in the code.
11:37 * jeff nods
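(That is, an unnamed UNIQUE clause gets an auto-generated name of the form <table>_<columns>_key. A toy illustration, not the stock schema:)

    CREATE TABLE t (sort_value TEXT, value TEXT, UNIQUE (sort_value, value));
    -- \d t then shows: "t_sort_value_value_key" UNIQUE CONSTRAINT, btree (sort_value, value)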
11:37 sharpsie postgresql: why tho?
11:38 jeff also, I may be imagining it, but I think the logic that postgresql uses to generate the name has changed at least once.
11:38 Dyrcona Also, this rings a bell. We had a similar issue with some other index once, and the solution was to truncate the index to looking at 1,000 or so characters.
11:38 sharpsie was about to quote the "explicit is better than implicit" python thing
11:39 Dyrcona jeff: That's probably true. We have a number of indexes whose names don't match a new installation, i.e. our production index names are different from what they get in a recent concerto db.
11:39 sharpsie programming challenge #2: naming things
11:40 Dyrcona sharpsie: 1 is naming things, 2 is off by one errors. :)
11:40 Dyrcona 0 is cache invalidation.
11:40 sharpsie Dyrcona: I knew you would help
11:41 Dyrcona :)
11:42 Dyrcona mantis: If you come up with a fix, would you mind adding a branch to the Lp bug shared by jeffdavis above? We should probably fix this for everyone.
11:42 sharpsie Dyrcona: I assigned the bug to myself, but mantis if you want it, it's all yours
11:45 mantis sharpsie: I'll add what I can if we figure out why the syntax of the query Jessica wrote is giving us an error
11:45 mantis sharpsie++
11:45 mantis Dyrcona++
11:45 mantis jeff++
11:46 jihpringle joined #evergreen
11:48 jvwoolf I really have no idea what I'm doing, so this is in better hands with sharpsie
11:49 jvwoolf sharpsie++
11:49 jvwoolf Dyrcona++
11:49 jvwoolf jeffdavis++
11:49 jvwoolf jeff++
11:49 jvwoolf mantis++
11:58 Dyrcona sharpsie: I agree about being explicit. It's hard to write a db upgrade script when an index or constraint could have different names.
11:59 sharpsie Dyrcona: might be a different bug, but worth looking at for future generations' sakes
12:30 collum joined #evergreen
12:52 kworstell-isl joined #evergreen
13:05 eeevil mantis / sharpsie / Dyrcona / jeff / jeffdavis: sorry, meetings and then lunch! glad you found the old bug, a bell was ringing... berick has a reasonable plan in that older bug, but one key to keep in mind is making sure that we can find duplicates quickly, as that's core to the browse phase of ingest. so, large dataset testing is necessary for this one, and ingest function changes might be needed. the configuration-offered fix of using substr() to cap the length can retain the existing infrastructure (that we're confident works) and can be automated in the upgrade script by just applying the normalizer to extant browse fields, but will require thought when adding new browse fields locally, as Dyrcona implies via "fix it for everyone". I'd personally prefer avoiding encoding low-level implementation restrictions in the indexing, but display-field truncation is something to consider as a trade-off there.
13:26 sharpsie eeevil: 10-4
13:29 Dyrcona joined #evergreen
13:30 jihpringle joined #evergreen
13:31 jeffdavis I had some recent trouble with that browse value/sort value index, the fixes proposed in the bug weren't working for me, need to find some time later today to collect details
13:58 Dyrcona Y'know what would be nice? LIMIT on a delete. I have 3 duplicate rows in a table and I'd like to delete 2 of them. That would be easier than deleting all of them and inserting one back.
13:58 jeffdavis yeah, our existing browse_entry_sort_value_value_idx (not _key) was causing issues during our upgrade to Postgres 14, and creating new equivalents using substr or md5 as suggested in the bug wasn't working (seemed to result in browse becoming unusably slow for some unidentified reason)
13:58 jeffdavis ... so we just dropped that index locally altogether
13:59 jeffdavis haven't tried the index normalizer approach
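(For reference, the md5 variant floated in the bug looks roughly like this; a sketch only, with an illustrative index name, and as jeffdavis notes above, real-world performance may differ.)

    -- unique on hashes keeps index rows small, but a btree over md5() can no
    -- longer serve ordering/range scans on sort_value, which may be one
    -- reason browse slowed down
    CREATE UNIQUE INDEX CONCURRENTLY browse_entry_md5_key
        ON metabib.browse_entry (md5(sort_value), md5(value));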
14:00 * Dyrcona should search Lp for related bugs. I'm pretty sure I opened one.
14:00 Dyrcona Different index/constraint though.
14:01 jvwoolf joined #evergreen
14:02 Dyrcona Whee. What a name: barcode_prefix_org_unit_prefix_prefix_stub_key
14:48 jeff Dyrcona: there are various tricks for doing that, especially if you don't care which row you keep. Most involve using ctid. :-)
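(One common shape of the ctid trick, sketched with a hypothetical table and key columns:)

    -- keep one physical copy of each duplicate group, delete the rest
    DELETE FROM some_table a
     USING some_table b
     WHERE a.ctid > b.ctid
       AND a.key_col1 = b.key_col1
       AND a.key_col2 = b.key_col2;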
14:52 kworstell_isl joined #evergreen
14:54 sharpsie like "Circ report (clone) (clone) (clone) (clone) (clone)"
15:09 jeffdavis wondering if there should be a way for certain users to be able to bypass MFA - like consortial support staff logging in with a Circ Desk Staff account at BR1 for support/testing purposes without having to use whatever second auth factor that would normally require
15:10 jeff arguably that should be a different account.
15:11 jeff but yes, that's just one of a pile of questions that remain to be addressed. :-)
15:12 jeffdavis yeah I'm sure we'll come up with more headaches^Wuse cases to think about
15:14 jeff including determining some hard guidelines around what requires a user to use MFA to auth (home_ou, any working location requiring MFA, workstation OU? other?), and given a user that does not require MFA, are they prevented from logging in in any scenarios, like with a workstation tied to an org unit that does require MFA, etc?
15:28 Dyrcona MFA for everyone.
15:30 Dyrcona jeff: On my delete thing from earlier, I'm just going to delete the 3 rows and then insert the data back once. (That's easy enough.) I'll then add a unique constraint. I've tested it, and it works.
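(Spelled out as a transaction, again with a hypothetical table and columns:)

    BEGIN;
    DELETE FROM some_table WHERE key_col = 'dup-value';   -- drops all three copies
    INSERT INTO some_table (key_col) VALUES ('dup-value'); -- put one back
    ALTER TABLE some_table ADD CONSTRAINT some_table_key_col_key UNIQUE (key_col);
    COMMIT;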
15:48 mantis left #evergreen
16:23 jvwoolf joined #evergreen
16:23 jihpringle joined #evergreen
16:25 Guest20 joined #evergreen
17:16 mmorgan sharpsie++ # newdevs
17:16 mmorgan left #evergreen
17:52 jeffdavis MFA-- # for reasons unrelated to any pending Evergreen development, but may as well get an early start on the karma
18:50 jihpringle joined #evergreen
