Time | Nick | Message
01:37 |
|
kworstell_isl joined #evergreen |
01:40 |
|
Bmagic joined #evergreen |
01:40 |
|
dluch joined #evergreen |
01:40 |
|
scottangel joined #evergreen |
07:36 |
|
BDorsey joined #evergreen |
08:22 |
|
collum joined #evergreen |
08:27 |
|
mantis joined #evergreen |
08:29 |
|
mmorgan joined #evergreen |
08:53 |
|
kworstell-isl joined #evergreen |
08:55 |
|
sandbergja joined #evergreen |
08:59 |
mantis |
Morning. Tried to start a reingest on our test server with 3.11 installed and got back an error. https://pastebin.com/Kv6tHz4m |
08:59 |
mantis |
any advice appreciated |
09:00 |
mantis |
ERROR: index row size 3136 exceeds maximum 2712 for index "browse_entry_sort_value_value_key" HINT: Values larger than 1/3 of a buffer page cannot be indexed. Consider a function index of an MD5 hash of the value, or use full text indexing. |
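[A rough way to spot existing oversized browse values, assuming the stock metabib.browse_entry layout; the 2712-byte limit applies to the indexed value, so compare byte lengths:

    SELECT id, octet_length(value) AS value_bytes,
           octet_length(sort_value) AS sort_value_bytes
      FROM metabib.browse_entry
     WHERE octet_length(value) > 2712
        OR octet_length(sort_value) > 2712
     ORDER BY value_bytes DESC;

The failing value may belong to a record still mid-ingest, in which case it won't be in the table yet.]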
09:08 |
|
sandbergja joined #evergreen |
09:16 |
|
rfrasur joined #evergreen |
09:22 |
sharpsie |
mantis: something in the record is too big - have you been able to identify which record it's dying on? |
09:23 |
sharpsie |
potentially lots of garbage in MARC records between human error and machine corruption |
09:44 |
mantis |
is the insert query in 1344.function.trim_trailing... able to locate those records? |
09:44 |
mantis |
if so, I didn't get any rows back when I ran the SELECT from it |
10:03 |
|
jvwoolf joined #evergreen |
10:04 |
|
kworstell-isl joined #evergreen |
10:19 |
eeevil |
mantis: 1344 isn't related to overall length; it just adjusts trailing punctuation. The problem is that some config.metabib_field where browse=true is pulling a really long value out of the MARC. You could register the substr function as an indexing normalizer, if it's not already, and have whichever browse field definition is providing that super-long string (some title, probably?) get normalized with a negative-position normalizer mapping for substr |
10:19 |
eeevil |
with params of, say, '[1,1000]'. |
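[A sketch of the normalizer approach eeevil describes, assuming the stock config.index_normalizer and config.metabib_field_index_norm_map tables; column names are from memory, so verify them against your schema before running:

    -- register substr() as an indexing normalizer, if it isn't already
    INSERT INTO config.index_normalizer (name, description, func, param_count)
    SELECT 'Substring', 'Truncate a value with substr()', 'substr', 2
     WHERE NOT EXISTS (SELECT 1 FROM config.index_normalizer WHERE func = 'substr');

    -- map it, with a negative position, onto the browse fields
    INSERT INTO config.metabib_field_index_norm_map (field, norm, params, pos)
    SELECT f.id, n.id, '[1,1000]', -1
      FROM config.metabib_field f
      JOIN config.index_normalizer n ON (n.func = 'substr')
     WHERE f.browse_field; -- or restrict this to the one offending field
]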
10:30 |
mantis |
eeevil: this is all new to me; I do know that we sometimes have very large MARC records or dangling subfields when I validate the exports via MarcRecord |
10:30 |
mantis |
do you have an example of a command that can be run via terminal or in pgadmin? |
10:32 |
|
Dyrcona joined #evergreen |
10:35 |
sharpsie |
mantis: the index it's failing on is named "browse_entry_sort_value_value_key", which doesn't appear in the stock code - possibly a previous admin added it |
10:36 |
sharpsie |
mantis: there is an index browse_entry_sort_value_idx though |
10:36 |
sharpsie |
current definition is CREATE INDEX browse_entry_sort_value_idx ON metabib.browse_entry USING BTREE (sort_value); |
10:38 |
sharpsie |
what eeevil appears to be suggesting is to re-create the index (drop index concurrently/create index concurrently if running on a live server) with (substring(sort_value from 1 for 1000)) or similar |
10:39 |
sharpsie |
mantis: in psql, \d+ metabib.browse_entry to see the indexes - maybe pastebin it? |
10:46 |
|
BrianK joined #evergreen |
10:48 |
mantis |
https://pastebin.com/M8PkZJUA |
10:49 |
sharpsie |
oh - hmm - we have that index too |
10:50 |
sharpsie |
ours for reference: https://pastebin.com/7ue0YRh7 |
10:51 |
sharpsie |
so, yeah, the same |
10:54 |
sharpsie |
argh - I hate when there's no record of changes - apparently that index, whatever it is, was added out-of-band |
10:54 |
mantis |
alright at least we know that |
10:55 |
sharpsie |
mantis: nevertheless, eeevil's suggestion should help - basically it's saying "however long the actual value is, truncate it after 1000 chars" |
10:56 |
sharpsie |
mantis: is this in a test environment or live? |
10:56 |
mantis |
test environment |
10:56 |
sharpsie |
good :-) |
10:58 |
mantis |
haha |
10:58 |
sharpsie |
in a test environment that no one else depends on, you can be cavalier |
10:59 |
sharpsie |
the "safe" way to create/drop indexes on a live server is usually to create the new index with "concurrently" which doesn't lock the table and make everyone miserable |
11:00 |
sharpsie |
then drop the "old" index (again, with "concurrently") |
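[The generic safe-swap pattern sharpsie describes, with hypothetical index and table names; note the CONCURRENTLY forms cannot run inside a transaction block:

    CREATE INDEX CONCURRENTLY my_idx_new ON my_table (my_col);
    DROP INDEX CONCURRENTLY my_idx;
    ALTER INDEX my_idx_new RENAME TO my_idx;
]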
11:02 |
sharpsie |
in this case though, you can DROP INDEX CONCURRENTLY browse_entry_sort_value_value_key; since it is apparently unnecessary? |
11:04 |
sharpsie |
then DROP INDEX CONCURRENTLY browse_entry_sort_value_idx; CREATE INDEX CONCURRENTLY browse_entry_sort_value_idx ON metabib.browse_entry USING BTREE (substring(sort_value from 1 for 1000)); (completely untested, just following substr docs) |
11:04 |
mantis |
we're thinking of something like this |
11:04 |
mantis |
ALTER TABLE IF EXISTS metabib.browse_entry DROP CONSTRAINT IF EXISTS browse_entry_sort_value_value_key; ALTER TABLE IF EXISTS metabib.browse_entry ADD CONSTRAINT browse_entry_sort_value_value_key UNIQUE (sort_value, value); |
11:05 |
sharpsie |
oh, right - it's a constraint - hmm |
11:05 |
sharpsie |
mantis: in that case, you'd still want to wrap both sort_value and value in a substr() probably |
11:06 |
sharpsie |
because the current issue is that without that safety feature, PG will stupidly just try to import huge fields without question |
11:13 |
mantis |
How's this? |
11:13 |
mantis |
ALTER TABLE IF EXISTS metabib.browse_entry DROP CONSTRAINT IF EXISTS browse_entry_sort_value_value_key; ALTER TABLE IF EXISTS metabib.browse_entry ADD CONSTRAINT browse_entry_sort_value_value_key UNIQUE (substring(sort_value from 1 for 1000), substring(value from 1 for 1000)); |
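[That ALTER TABLE will fail regardless: PostgreSQL UNIQUE constraints accept only plain column lists, not expressions. The usual workaround is a unique expression index, which enforces the same rule; a sketch, untested:

    ALTER TABLE metabib.browse_entry
        DROP CONSTRAINT IF EXISTS browse_entry_sort_value_value_key;
    CREATE UNIQUE INDEX browse_entry_sort_value_value_key
        ON metabib.browse_entry (substring(sort_value FROM 1 FOR 1000),
                                 substring(value FROM 1 FOR 1000));
]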
11:15 |
sharpsie |
try it within BEGIN;/COMMIT; and see if it works? |
11:15 |
sharpsie |
oh wait you can't |
11:16 |
sharpsie |
create index has its own transactional nature - sorry |
11:16 |
sharpsie |
so yeah, give it a go |
11:16 |
sharpsie |
an index won't hurt anything (except space) |
11:18 |
sharpsie |
I have a feeling the issue is "value" rather than "sort_value", but I'm not sure |
11:18 |
sharpsie |
this is the year I'm going to learn bib/auth stuff top to bottom :-) |
11:19 |
|
Christineb joined #evergreen |
11:20 |
jeffdavis |
bug 1695911 |
11:20 |
pinesol |
Launchpad bug 1695911 in Evergreen "Long browse entries cause index row size error" [Undecided,New] https://launchpad.net/bugs/1695911 |
11:21 |
sharpsie |
jeffdavis++ |
11:22 |
sharpsie |
mantis: this looks right^^ and even mentions the phantom unique index! |
11:24 |
|
collum joined #evergreen |
11:24 |
sharpsie |
yay for the EG community of 2017/2018! (except for the follow-through part, but I'm certainly more guilty than anyone for that) |
11:26 |
sharpsie |
dbs++ # https://bugs.launchpad.net/evergreen/+bug/1695911/comments/3 |
11:26 |
pinesol |
Launchpad bug 1695911 in Evergreen "Long browse entries cause index row size error" [Undecided,New] |
11:26 |
sharpsie |
so not always garbage, just title verbosity dialed to 11 |
11:27 |
sharpsie |
https://i.pinimg.com/600x315/e7/8b/94/e78b945c23a4ada6b7812cf57e9df6d1.jpg |
11:33 |
sharpsie |
still very confused about no results from "ack browse_entry_sort_value_value_key"/"grep -r browse_entry_sort_value_value_key" |
11:36 |
Dyrcona |
sharpsie: Open-ILS/src/sql/Pg/upgrade/0816.schema.bib-auth-browse.sql:7129 |
11:37 |
Dyrcona |
The name is generated by PostgreSQL. It's not in the code. |
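[When grep turns up nothing, the system catalogs will say what a generated name belongs to, e.g.:

    SELECT conname, conrelid::regclass AS table_name,
           pg_get_constraintdef(oid)
      FROM pg_constraint
     WHERE conname = 'browse_entry_sort_value_value_key';
]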
11:37 |
* jeff |
nods |
11:37 |
sharpsie |
postgresql: why tho? |
11:38 |
jeff |
also, I may be imagining it, but I think the logic that postgresql uses to generate the name has changed at least once. |
11:38 |
Dyrcona |
Also, this rings a bell. We had a similar issue with some other index once, and the solution was to truncate the index to the first 1,000 or so characters. |
11:38 |
sharpsie |
was about to quote the "explicit is better than implicit" python thing |
11:39 |
Dyrcona |
jeff: That's probably true. We have a number of indexes whose names don't match a new installation, i.e. our production index names are different from what you get in a recent concerto db. |
11:39 |
sharpsie |
programming challenge #2: naming things |
11:40 |
Dyrcona |
sharpsie: 1 is naming things, 2 is off by one errors. :) |
11:40 |
Dyrcona |
0 is cache invalidation. |
11:40 |
sharpsie |
Dyrcona: I knew you would help |
11:41 |
Dyrcona |
:) |
11:42 |
Dyrcona |
mantis: If you come up with a fix, would you mind adding a branch to the Lp bug shared by jeffdavis above? We should probably fix this for everyone. |
11:42 |
sharpsie |
Dyrcona: I assigned the bug to myself, but mantis if you want it, it's all yours |
11:45 |
mantis |
sharpsie: I'll add what I can if we figure out why the syntax of the query Jessica wrote is giving us an error |
11:45 |
mantis |
sharpsie++ |
11:45 |
mantis |
Dyrcona++ |
11:45 |
mantis |
jeff++ |
11:46 |
|
jihpringle joined #evergreen |
11:48 |
jvwoolf |
I really have no idea what I'm doing, so this is in better hands with sharpsie |
11:49 |
jvwoolf |
sharpsie++ |
11:49 |
jvwoolf |
Dyrcona++ |
11:49 |
jvwoolf |
jeffdavis++ |
11:49 |
jvwoolf |
jeff++ |
11:49 |
jvwoolf |
mantis++ |
11:58 |
Dyrcona |
sharpsie: I agree about being explicit. It's hard to write a db upgrade script when an index or constraint could have different names. |
11:59 |
sharpsie |
Dyrcona: might be a different bug, but worth looking at for future generations' sakes |
12:30 |
|
collum joined #evergreen |
12:52 |
|
kworstell-isl joined #evergreen |
13:05 |
eeevil |
mantis / sharpsie / Dyrcona / jeff / jeffdavis: sorry, meetings and then lunch! glad you found the old bug, a bell was ringing... berick has a reasonable plan in that older bug, but one key to keep in mind is making sure that we can find duplicates quickly, as that's core to the browse phase of ingest. so, large dataset testing is necessary for this one, and ingest function changes might be needed. the configuration-offered fix of using substr() to cap the length can retain the existing infrastructure (which we're confident works) and can be automated in the upgrade script by just applying the normalizer to extant browse fields, but will require thought when adding new browse fields locally, as Dyrcona implies via "fix it for everyone". I'd personally prefer avoiding encoding low-level implementation restrictions in the indexing, but display-field truncation is something to consider as a trade-off there. |
13:26 |
sharpsie |
eeevil: 10-4 |
13:29 |
|
Dyrcona joined #evergreen |
13:30 |
|
jihpringle joined #evergreen |
13:31 |
jeffdavis |
I had some recent trouble with that browse value/sort value index; the fixes proposed in the bug weren't working for me. I need to find some time later today to collect details |
13:58 |
Dyrcona |
Y'know what would be nice? LIMIT on a delete. I have 3 duplicate rows in a table and I'd like to delete 2 of them. That would be easier than deleting all of them and inserting one back. |
13:58 |
jeffdavis |
yeah, our existing browse_entry_sort_value_value_idx (not _key) was causing issues during our upgrade to Postgres 14, and creating new equivalents using substr or md5 as suggested in the bug wasn't working (seemed to result in browse becoming unusably slow for some unidentified reason) |
13:58 |
jeffdavis |
... so we just dropped that index locally altogether |
13:59 |
jeffdavis |
haven't tried the index normalizer approach |
14:00 |
* Dyrcona |
should search Lp for related bugs. I'm pretty sure I opened one. |
14:00 |
Dyrcona |
Different index/constraint though. |
14:01 |
|
jvwoolf joined #evergreen |
14:02 |
Dyrcona |
Whee. What a name: barcode_prefix_org_unit_prefix_prefix_stub_key |
14:48 |
jeff |
Dyrcona: there are various tricks for doing that, especially if you don't care which row you keep. Most involve using ctid. :-) |
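[One common ctid trick, with hypothetical table and column names; it keeps one physical row per duplicate group:

    DELETE FROM my_table a
     USING my_table b
     WHERE a.ctid > b.ctid
       AND a.col1 = b.col1
       AND a.col2 = b.col2;

ctid is just a row's current physical location, so don't rely on it across transactions.]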
14:52 |
|
kworstell_isl joined #evergreen |
14:54 |
sharpsie |
like "Circ report (clone) (clone) (clone) (clone) (clone)" |
15:09 |
jeffdavis |
wondering if there should be a way for certain users to be able to bypass MFA - like consortial support staff logging in with a Circ Desk Staff account at BR1 for support/testing purposes without having to use whatever second auth factor that would normally require |
15:10 |
jeff |
arguably that should be a different account. |
15:11 |
jeff |
but yes, that's just one of a pile of questions that remain to be addressed. :-) |
15:12 |
jeffdavis |
yeah I'm sure we'll come up with more headaches^Wuse cases to think about |
15:14 |
jeff |
including determining some hard guidelines around what requires a user to use MFA to auth (home_ou, any working location requiring MFA, workstation OU? other?), and given a user that does not require MFA, are they prevented from logging in in any scenarios, like with a workstation tied to an org unit that does require MFA, etc? |
15:28 |
Dyrcona |
MFA for everyone. |
15:30 |
Dyrcona |
jeff: On my delete thing from earlier, I'm just going to delete the 3 rows and then insert the data back once. (That's easy enough.) I'll then add a unique constraint. I've tested it, and it works. |
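[The delete-and-reinsert approach Dyrcona describes, sketched with hypothetical names:

    BEGIN;
    CREATE TEMP TABLE keep AS
        SELECT DISTINCT col1, col2 FROM my_table WHERE col1 = 'dup-value';
    DELETE FROM my_table WHERE col1 = 'dup-value';
    INSERT INTO my_table (col1, col2) SELECT col1, col2 FROM keep;
    ALTER TABLE my_table ADD CONSTRAINT my_table_col1_col2_key UNIQUE (col1, col2);
    COMMIT;
]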
15:48 |
|
mantis left #evergreen |
16:23 |
|
jvwoolf joined #evergreen |
16:23 |
|
jihpringle joined #evergreen |
16:25 |
|
Guest20 joined #evergreen |
17:16 |
mmorgan |
sharpsie++ # newdevs |
17:16 |
|
mmorgan left #evergreen |
17:52 |
jeffdavis |
MFA-- # for reasons unrelated to any pending Evergreen development, but may as well get an early start on the karma |
18:50 |
|
jihpringle joined #evergreen |