| 10:27 |
pinesol |
News from commits: DOCS: LP#1871211 Follow-up eg_vhost.conf <https://git.evergreen-ils.org/?p=Evergreen.git;a=commitdiff;h=ded2dd7815a9d3bcf0305c1b55dd53ee3f7ae4f4> |
| 10:27 |
Dyrcona |
It probably should. |
| 10:29 |
|
sandbergja joined #evergreen |
| 10:30 |
sandbergja |
Dyrcona++ # thanks for taking a look at that live test |
| 10:32 |
berick |
sandbergja++ # branches, redis, and more! |
| 10:32 |
Bmagic |
sandbergja, abneiman, terranm, colum : Would you mind if I back ported this Docs commit to rel_3_12? ded2dd7815a9d3bcf0305c1b55dd53ee3f7ae4f4 |
| 10:33 |
sandbergja |
+1 from me |
| 11:18 |
berick |
hm, i thought we did decide to make it default for post-3.12 main. |
| 11:18 |
berick |
i could be misremembering |
| 11:19 |
berick |
in part, i think, since it would only affect new installs |
| 11:20 |
eeevil |
I thought we'd agreed that we should see what things look like after some more testing and review. |
| 11:20 |
Dyrcona |
I'm not sure what we agreed at this point. Maybe we should have that discussion on the list? |
| 11:20 |
eeevil |
where will there be more new installs than for the purpose of dev, though? |
| 11:21 |
Dyrcona |
The dev meeting is also today. |
| 11:37 |
eeevil |
since I came to complain ;) , I'm happy to pick them into main. are there objections? |
| 11:38 |
eeevil |
man... my fingers are not moving in the right order today. rel_3_12 |
| 11:41 |
eeevil |
it's close to lunch time, so I'll leave the question open for a while. but, ping Dyrcona and sandbergja (in particular) |
| 11:52 |
csharp_ |
eeevil: I'm good for what my opinion is worth |
| 11:55 |
csharp_ |
I've lightly tested it on my puny server |
| 11:55 |
csharp_ |
@band add Irregular Expressions |
| 11:55 |
pinesol |
csharp_: Band 'Irregular Expressions' added to list |
| 12:02 |
sleary |
can I still target bug fixes to 3.12 or is that bad manners at this point? |
| 12:03 |
eeevil |
csharp_: it's worth One Full Commit Bit, as it happens, sir! |
| 15:04 |
Bmagic |
no problem, carrying forward |
| 15:04 |
Bmagic |
#action mmorgan will explore moving LP stats to community site and automating same |
| 15:04 |
Bmagic |
#info sandbergja will write tutorial: "Do a database call (Galen’s cat counter)" |
| 15:04 |
Bmagic |
#info sandbergja will go over the Nightwatch test reorg with folks at the Monday at 2pm ET meeting or another time as available |
| 15:05 |
Bmagic |
go ahead sandbergja |
| 15:05 |
sandbergja |
kinda fiddling with my partial draft for the tutorial |
| 15:05 |
sandbergja |
probably need to check back with me next time :-) |
| 15:05 |
Bmagic |
no problem |
| 15:05 |
sleary |
sandbergja I looked at what you sent me a while back and it's looking great |
| 15:05 |
sandbergja |
we didn't get around to moving the nightwatch tests, but we got them working! |
| 15:05 |
sandbergja |
sleary++ # thanks for the review! |
| 15:05 |
terranm |
sandbergja++ |
| 15:05 |
Bmagic |
#action sandbergja will write tutorial: "Do a database call (Galen’s cat counter)" |
| 15:05 |
smayo |
sandbergja++ |
| 15:08 |
Bmagic |
I refreshed and I see new stuff |
| 15:08 |
abneiman |
and terranm++ sandbergja++ and mmorgan++ for many lovely merges |
| 15:08 |
Bmagic |
#link https://wiki.evergreen-ils.org/doku.php?id=dev:bug_squashing:2023-11 |
| 15:09 |
abneiman |
though there is a test conflict to talk about below in the agenda |
| 15:09 |
Bmagic |
terranm++ sandbergja++ mmorgan++ |
| 15:09 |
terranm |
3.12 (currently) has an even 100 patches committed |
| 15:09 |
sleary |
terranm++ sandbergja++ mmorgan++ |
| 15:33 |
Bmagic |
sandbergja++ sleary++ |
| 15:33 |
sleary |
sandbergja++ # thanks for staying on top of linting! |
| 15:33 |
Bmagic |
thanks again sandbergja! next.... |
| 15:33 |
Bmagic |
#topic New Business - Test failures, including at least one critical regression (bug 2043437) - Jane |
| 15:33 |
pinesol |
Launchpad bug 2043437 in Evergreen "Three test failures on rel_3_12 and main" [Critical,New] https://launchpad.net/bugs/2043437 |
| 15:34 |
Bmagic |
#link https://bugs.launchpad.net/evergreen/+bug/2043437 |
| 15:34 |
sandbergja |
oh god, I have a bunch in a row :-D |
| 15:34 |
Bmagic |
yes, yes you do |
| 15:34 |
sandbergja |
we have 3 failing tests, one of which points to a major problem |
| 15:34 |
sandbergja |
tests++ # catching that before we released it to users! |
| 15:35 |
sandbergja |
specifically, the holdings view doesn't load (maybe just a missing import or something) |
| 15:35 |
terranm |
sandbergja++ tests++ |
| 15:35 |
sandbergja |
I feel pretty strongly we should take care of those before building a beta. |
| 15:35 |
mmorgan |
+1 |
| 15:35 |
sandbergja |
But I don't know that I'll have much time to look into them |
| 15:36 |
sandbergja |
Dyrcona already started looking at the perl one, and posted some notes |
| 15:37 |
Dyrcona |
When I said that I don't know how to fix the syntax error, I should have said that it's not obvious to me what's wrong. |
| 15:37 |
Bmagic |
the course reserves issue is fine because the test is bad, so we're looking at the holdingsView.spec.ts issue, and the query issue |
| 15:37 |
sleary |
we should fix the bad test since the problem is obvious, but yes |
| 15:38 |
Bmagic |
agreed on fixing the test. Should each of the three things be its own bug so folks can claim them? |
| 15:38 |
Dyrcona |
Well, we could use a collab branch to avoid separate bugs, but I'll let the consensus decide. |
| 15:40 |
Bmagic |
I assume this pause is because everyone is reading the bug |
| 15:40 |
sandbergja |
:-) |
| 15:41 |
Bmagic |
Dyrcona: that query works on two of my test systems |
| 15:42 |
Dyrcona |
Bmagic: It blew up for me on a 3.12 vm with stock concerto and Pg 15. I tried the function by itself with different parameters. |
| 15:43 |
Bmagic |
Just the query? Not integrated in Evergreen? |
| 15:43 |
Dyrcona |
Different parameters, I mean integer array and string that should have matched. |
| 15:43 |
Dyrcona |
Just the function by itself, as well as the query I pasted. |
| 15:43 |
Dyrcona |
The query comes from the error output of the test. |
| 15:44 |
JBoyer |
Fun thing: it worked for me on eg3.11 / pg15 and broke on 3.11 / pg14. |
| 15:44 |
Bmagic |
ok, yes, it breaks on newer versions of the database |
| 15:44 |
Bmagic |
bugsquash machine throws the error |
| 15:46 |
Bmagic |
let's move this to post-meeting |
| 15:46 |
Dyrcona |
JBoyer: Gotcha. |
| 15:46 |
JBoyer |
Bmagic, +1 |
| 15:47 |
Bmagic |
#topic New Business - How can we get computers running our tests regularly again? - Jane |
| 15:47 |
eeevil |
I'll also look at the search one, later |
| 15:47 |
sandbergja |
#info for anybody wanting to run the tests, or try it out: https://wiki.evergreen-ils.org/doku.php?id=dev:contributing:qa#types_of_tests |
| 15:47 |
Bmagic |
sandbergja++ |
| 15:47 |
mmorgan |
sandbergja++ |
| 15:48 |
sandbergja |
I am just feeling fired up about tests, and wanted to see if there's capacity for getting buildbot running them automatically for us, or some new solution |
| 15:48 |
Bmagic |
sandbergja: where's the buildbot now? (I've never known where that lives and who's in charge of it) |
| 15:49 |
shulabear |
sandbergja++ |
| 15:49 |
sandbergja |
no clue. Was it an EOLI server? |
| 15:52 |
sandbergja |
the ng lint always passes, for reasons mentioned above hahaha |
| 15:52 |
Bmagic |
haha |
| 15:52 |
Bmagic |
not sure if we've arrived at anything I can put down as action |
| 15:53 |
sandbergja |
I can investigate getting more tests into gh actions, if there aren't concerns with tying ourselves more to that platform |
| 15:53 |
Bmagic |
#action sandbergja will investigate getting more tests into gh actions |
| 15:53 |
JBoyer |
It doesn't have to be the project's definitive home to provide a useful function, even if temporarily, |
| 15:54 |
Bmagic |
almost out of time |
| 15:54 |
Bmagic |
#topic Keep an eye out for Angular 17 / Bootstrap 5.3 upgrade blockers and note them on bug 2043490 - Stephanie |
| 15:54 |
pinesol |
Launchpad bug 2043490 in Evergreen "Angular 17 + Bootstrap 5.3 Upgrade" [Wishlist,New] https://launchpad.net/bugs/2043490 |
| 15:54 |
kmlussier |
sandbergja++ tests++ |
| 15:54 |
JBoyer |
(so long as it's still easy to run tests locally) |
| 15:54 |
Bmagic |
#link https://bugs.launchpad.net/evergreen/+bug/2043490 |
| 15:54 |
Bmagic |
sleary: go for it |
| 15:54 |
sleary |
ah! So, I went ahead and opened a bug for the next big Angular/Bootstrap upgrade, which should be less painful than the last one |
| 15:55 |
sleary |
I haven't looked too closely into what's involved on the Angular side, so I wanted to ask you all to keep an eye on that as you skim your news, and add comments to that bug if you find any potential blockers other than the ng-bootstrap accordion issue listed |
| 15:55 |
sandbergja |
sleary++ |
| 15:55 |
|
smayo joined #evergreen |
| 15:55 |
eeevil |
#info I've requested we keep XMPP as the default OpenSRF transport in EG main for the time being. There's no redis release of OpenSRF yet, so support is only in a side branch, and having redis be the default will make dev (especially backport-focused dev, like bug fixes) more painful because you can't just switch branches and test the code. Also, I'm not convinced that it's deeply understood by more folks than berick, and that puts a lot of pressure |
| 15:55 |
eeevil |
on him to Fix All The Things if Things need Fixing. I'm asking here for any strong objections to applying the 2 existing commits that will make that so, as it is now for rel_3_12. Barring any, I'll pick those commits into main and life will be a little simpler for all the not-testing-redis cases, for now. |
| 15:56 |
eeevil |
(separately, I think I know where the search test failure is coming from, and I'll poke at it early tomorrow) |
| 15:57 |
Bmagic |
eeevil: and when we've all installed and tested redis, then make it default? |
| 15:58 |
eeevil |
Bmagic: well, and when more-than-Bill can help work on it, ideally, but yes. "when it and we are ready" |
| 15:58 |
eeevil |
it's not something we should force Right Now, IMNSHO. but, hopefully, soon |
| 15:58 |
eeevil |
for a definition of soon somewhere between "months" and "geologic time" |
| 16:06 |
|
jihpringle joined #evergreen |
| 16:06 |
shulabear |
bmagic++ |
| 16:10 |
Bmagic |
Dyrcona: that's strange. Worked for me on 3.9.1 and 3.11.1, but not bugsquash main |
| 16:11 |
Dyrcona |
Bmagic: Are you talking about the function itself or the search test? I suspect the function has been broken since 0940, but doesn't look like it is used much. |
| 16:12 |
Bmagic |
copy/paste the query I mean |
| 16:13 |
Dyrcona |
I get a syntax error any time/anywhere I run the function. |
| 16:14 |
JBoyer |
yeah, query_int_wrapper does so little I'm surprised we even have it (syntax escaping simplicity if I had to guess) but sometimes passing '()' as the second param is accepted and sometimes it's a syntax error, which is weird. |
| 16:20 |
Dyrcona |
select query_int_wrapper(vlist, '()') from metabib.record_attr_vector_list where source = 2; <- Blows up for me, even though record 2 is in metabib.record_attr_vector_list |
| 16:20 |
Dyrcona |
Can we just replace it with the code from the function? |
| 16:22 |
Dyrcona |
select vlist @@ '()'::query_int from metabib.record_attr_vector_list where source = 2; Also produces a syntax error at the text value '()' |
| 16:23 |
eeevil |
I wish I weren't orange right now ... TL;DR: no, we can't (without testing that we don't need it anymore -- and there were reasons) |
| 16:24 |
Dyrcona |
eeevil: It's used in only 1 place AFAICT. |
| 16:24 |
Dyrcona |
I've tried a few variations: '{}' also fails but null works. |
| 16:25 |
eeevil |
Dyrcona: yes, and that 1 place is important. more later, though |
| 16:45 |
eeevil |
I'm transforming back into pure eeevil, leaving my pumpkin state behind! |
| 16:45 |
eeevil |
and, https://bugs.launchpad.net/evergreen/+bug/1438136/comments/12 |
| 16:45 |
pinesol |
Launchpad bug 1438136 in Evergreen 2.8 "OPAC searching significantly slowed by adding format filters" [High,Fix released] |
| 16:45 |
eeevil |
that's why we have query_int_wrapper, and what we need to test before we stop trying to use it. |
| 16:45 |
eeevil |
HOWEVER |
| 16:46 |
eeevil |
that's not the problem here, regardless. the problem here is that we should never be sending an effectively-empty query_int (that is, an int array query "thing") to the database |
| 16:47 |
eeevil |
that clause is a question of the data, but the question being asked is empty in this case. we need to elide the clause altogether |
| 16:48 |
eeevil |
which we normally do (you don't always see a query_int clause either direct or via query_int_wrapper), but something is convincing us to generate an empty query_int at the perl level |
| 16:49 |
eeevil |
I suspect an interaction with the new on_reserve() filter, but mostly because it's new and not because I can point to a problem with it |
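To make the elision concrete, here is a minimal Perl sketch of the guard eeevil describes. The names ($filters, @where, @bind) are invented for illustration and are not actual Evergreen identifiers; the point is the pattern of skipping the clause entirely when the value list is empty.

    my $filters = [12, 15, 38];   # example input; may be empty or undef
    my @filter_ids = grep { defined($_) && /^\d+$/ } @{ $filters || [] };
    my (@where, @bind);
    if (@filter_ids) {
        push @where, 'query_int_wrapper(vlist, ?)';        # clause only when non-empty
        push @bind,  '(' . join('|', @filter_ids) . ')';   # e.g. '(12|15|38)'
    }
    # With an empty list, no clause is emitted, so '()' never reaches the database.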
| 16:54 |
Dyrcona |
Well, that's somewhere to start. I was thinking of resorting to git bisect. I can take a look tomorrow morning, probably. |
| 16:55 |
Dyrcona |
Anyway.... I'll turn into a pumpkin now..... |
| 17:11 |
|
kmlussier left #evergreen |
| 10:43 |
|
BAMkubasa joined #evergreen |
| 10:50 |
|
mantis1 joined #evergreen |
| 10:55 |
|
briank joined #evergreen |
| 11:15 |
Dyrcona |
OK. I have the latest Rust marc-export test running. I'll see how long this takes. |
| 11:27 |
|
jihpringle joined #evergreen |
| 12:14 |
|
Christineb joined #evergreen |
| 12:18 |
eeevil |
berick: drive by question (from immediately-post hack-a-way) incoming! meetings will probably keep me away from here for the rest of the day, but want to get it to you asap for RediSRF(?) goodness |
| 12:20 |
eeevil |
berick: I'm looking at opensrf/redis anew via the collab branches. I have two questions that jump out at me: is the --reset-message-bus documented beyond "you must do this at redis restart" somewhere? (the commit it comes in with isn't obvious to me; also, I want to help remove any instances of boilerplate stuff that we can automate or fold into other actions as early as possible -- ideally before merge -- and I know this is under discussion or at |
| 12:20 |
eeevil |
least being thought about); and, while the config and password list files don't require it themselves, it /looks/ like the usernames are pinned as "opensrf", "router", and "gateway" in the code that implements each part. is that correct and do you see any reason we /can't/ replace those strings with the value pulled from opensrf_core.xml as with ejabberd? the result being ACL-separated user sets that share a redis instance but can't talk to each |
| 12:20 |
eeevil |
other or guess each other's user and queue names. |
| 12:32 |
Dyrcona |
eeevil: berick said earlier that he modified osrf_control to do --reset-message-bus as required, so it's not necessary now. I'm testing that and other updates. That's a good question about the names. (I generally don't change them.) |
| 12:33 |
Bmagic |
I'm troubleshooting a database issue. explain analyze on an insert into biblio.record_entry results in 6+ seconds. The analyze shows me that it spends the majority of the time in reporter.simple_rec_trigger(). Which is odd, because if it's an INSERT, it should just skip down to RETURN. Weird right? |
| 12:33 |
Dyrcona |
Bmagic: No. It still builds the rmsr entry. |
| 12:34 |
Dyrcona |
Wait until you update an authority with hundreds of bibs "attached." :) |
| 12:35 |
Bmagic |
this is EG 3.11.1 with queued ingest. I updated the global flag to turn off queued ingest, and tested it, and vice versa. it's 13 seconds for an insert with it turned off, and 6 seconds with it turned on. But still, 6 seconds seems insane for an insert on biblio.record_entry, especially if we're deferring all of the metabib stuff for the ingester process to come later |
| 12:35 |
Dyrcona |
Updating/inserting bibs and auths can be slow because of all the triggers. |
| 12:36 |
Bmagic |
I got here because it's breaking the course reserves interface. Taking too long |
| 12:37 |
* Dyrcona |
colour me unsurprised. |
| 16:38 |
Bmagic |
Maybe I can change PG configuration to eliminate the secondary DB to prove it has something to do with that (or not) |
| 16:38 |
Dyrcona |
Nope. None of them checked in. |
| 16:40 |
Dyrcona |
Bmagic: Could be. I often have queries fail on the secondary DB, 'cause changed rows from replication are needed. I know that's not your issue, but you shouldn't be inserting into a replicated db, so I'm not sure what you're checking. |
| 16:40 |
Bmagic |
This is my test query: explain analyze select * from reporter.old_super_simple_record where id=671579 |
| 16:41 |
Bmagic |
6+ seconds on db1. and less than one second on db2 |
| 16:42 |
Dyrcona |
OK. That's right. |
| 16:43 |
Dyrcona |
Hm... I need to find the circulations. The copies are all Lost status.... They're also deleted, and I thought using the copy_id would resolve the latter. |
| 16:51 |
Bmagic |
Repeated for all tables in metabib |
| 16:52 |
Bmagic |
retested, and was disappointed. Restarted postgres, still disappointed |
| 16:55 |
jeffdavis |
Is it consistently ~6 seconds even for other IDs? Like, not 4 or 8 seconds? |
| 16:59 |
jeffdavis |
We were having an issue on a test server where calls to open-ils.pcrud.search.circ were consistently taking about 6.5s per circ. I never got to the bottom of it (fortunately we're not seeing that in production). It's a long shot but maybe there is a connection? Don't want to send you down a rabbit hole though! |
| 17:00 |
Dyrcona |
There are only 1 or 2 tables involved in that function. Everything else is 4 views that percolate up. |
| 17:00 |
jeffdavis |
But the flesh fields for the problematic call include wide_display_entry. |
| 17:01 |
Dyrcona |
Which is a view, IIRC. |
| 17:09 |
jeff |
out for now, back later! |
| 17:10 |
Bmagic |
https://explain.depesz.com/s/Df5A |
| 17:11 |
Bmagic |
so that seems to have revealed that it's sequential scanning display_field_map |
| 17:16 |
jeffdavis |
I can confirm we're seeing the same behavior on our test server with your example query (explain (analyze, buffers) select * from reporter.old_super_simple_record where id=x) |
| 17:16 |
Bmagic |
so weird, two databases, one a replicant of the other. I just synced up those two settings in postgresql.conf and restarted the primary database. The query is still sequentially scanning that table instead of using an index. I run the same query against the secondary database and it's fast |
| 17:17 |
Bmagic |
jeffdavis: yours is sequential scanning? |
| 17:17 |
jeffdavis |
test db (slow): https://explain.depesz.com/s/pkwY |
| 17:17 |
jeffdavis |
prod db (fast): https://explain.depesz.com/s/UWu5 |
| 17:19 |
Bmagic |
jeffdavis: interesting. In my case, it's prod that's slow and the secondary database is fast |
| 17:20 |
Bmagic |
what's even more interesting, is for your fast case, it's seq scanning display_field_map and apparently that's not taking much time at all |
| 17:24 |
Bmagic |
jeffdavis: PG14? |
| 17:24 |
jeffdavis |
yes |
| 17:25 |
Bmagic |
jeffdavis++ |
| 17:25 |
jeffdavis |
The servers are pretty different, the test db is the product of pg_dump/pg_restore rather than replication |
| 17:25 |
Bmagic |
yeah, I'm tempted to dump/restore as well |
| 17:26 |
Bmagic |
I'm out of ideas, so that one comes to mind :) |
| 17:27 |
jeffdavis |
well, we're seeing it after dump/restore, but who knows, maybe it would help! |
| 10:03 |
Dyrcona |
This could be a case where joins are slower than subqueries for .... reasons. |
| 10:04 |
Dyrcona |
It could also be completely different on a newer Pg release. |
| 10:04 |
Stompro |
I started using NYTProf now; it is much heavier weight, but the output is much more detailed. https://metacpan.org/pod/Devel::NYTProf |
| 10:08 |
Stompro |
Devel::Profile is much faster for quickly seeing results. NYTProf generates a 2G output file in my testing that then has to be processed into an HTML report. |
| 10:08 |
Dyrcona |
I think I'll go for faster/lighter weight. I'll read the docs. |
| 10:09 |
Dyrcona |
My code to log subroutine names with timestamps as they're called produces a largeish output and is likely less accurate since it spends time generating timestamps and printing them. |
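For flavor, a rough reconstruction of that kind of hand-rolled tracer (a sketch, not Dyrcona's actual patch), using Time::HiRes for the microsecond timestamps; the traced sub name is illustrative:

    use strict;
    use warnings;
    use Time::HiRes qw(gettimeofday);

    # Print a subroutine name with a seconds.microseconds timestamp.
    sub trace_sub {
        my ($name) = @_;
        my ($sec, $usec) = gettimeofday();
        printf STDERR "%d.%06d %s\n", $sec, $usec, $name;
    }

    # Called at the top of each sub under investigation, e.g.:
    sub acps_for_bre {
        trace_sub('acps_for_bre');
        # ... original body ...
    }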
| 10:10 |
Stompro |
All I had to do was run marc_export with "perl -d:Profile ./marc_export_testing" and it generates the profile log prof.out |
| 10:10 |
Dyrcona |
I'll make a patch and throw it on the LP. |
| 10:11 |
Dyrcona |
I mean a patch for my logging code. |
| 10:11 |
Dyrcona |
So, you think we should just switch from insert_grouped_field to insert_field? |
| 10:13 |
Stompro |
I put a diff of the changes I was testing with at https://gitlab.com/LARL/evergreen-larl/-/snippets/3615366 |
| 10:14 |
Stompro |
Put all the 852s in an array, then once they are all added, call insert_grouped_field for the first one to get the same ordering, and use insert_fields_after for the rest with one call. |
| 10:18 |
Dyrcona |
Stompro: There's a simpler way to do the insert: push the fields to an array, then do the first one with shift and the rest of the array after that. |
| 10:19 |
Stompro |
I figured, my perl array skills need work. :-) |
| 10:20 |
Dyrcona |
I wonder if the first one even needs to be grouped? |
| 10:20 |
Dyrcona |
I'm going to look at MARC::Record again. |
| 10:21 |
Dyrcona |
Stompro++ # For the notes in the snippets. |
| 10:21 |
Stompro |
In my test data, the 901 tag would be placed before the 852 without using the insert_grouped_field for the first. |
| 10:23 |
Stompro |
I don't think MARC::Record re-orders the fields. |
| 10:24 |
Stompro |
If I'm understanding where you are going with that. |
| 10:33 |
Dyrcona |
OK. LoC says the fields are supposed to be grouped by hundreds; they don't have to be in order. |
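Putting the thread together, a minimal MARC::Record sketch of the shift-based insert being discussed; the tags, indicators, and values are invented:

    use MARC::Record;
    use MARC::Field;

    my $record = MARC::Record->new();
    $record->append_fields(
        MARC::Field->new('245', '0', '0', a => 'Example title'),
        MARC::Field->new('901', ' ', ' ', a => 'example-system-id'),
    );

    # Collect every 852 first instead of grouped-inserting one at a time.
    my @holdings = map {
        MARC::Field->new('852', '4', ' ', a => 'EXAMPLE', p => "item-$_")
    } 1 .. 3;

    # Group-insert the first so it lands with the 8xx block, then add
    # the rest after it with a single call.
    my $first = shift @holdings;
    $record->insert_grouped_field($first);
    $record->insert_fields_after($first, @holdings) if @holdings;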
| 10:41 |
|
briank joined #evergreen |
| 10:44 |
Dyrcona |
Oh! That patch I threw on Lp is missing a local change to format the microsecond timestamps to %06d..... |
| 10:53 |
Dyrcona |
Heh. This branch is a mess.... |
| 10:54 |
Dyrcona |
So, I was testing with a dump of 1 library's records. It took about 1 hour 4 minutes to run. I'll make a change based on Stompro's bug description and see what happens. |
| 11:06 |
Dyrcona |
OK. Here goes.... |
| 11:11 |
Stompro |
Dyrcona, does this library have some of the bibs with large numbers of copies? |
| 11:20 |
Dyrcona |
I don't know. I doubt it. It doesn't seem to have made much difference so far. I'll try a larger library or the whole consortium next. |
| 11:20 |
Dyrcona |
It's one we do a weekly export for, so that's why I chose it to test. |
| 11:28 |
Dyrcona |
It does use slightly less (~3%) CPU |
| 11:30 |
Stompro |
In my testing, with our production data it had only a slight improvement. But it really improved the run that was stacked with bibs with lots of copies. |
| 11:36 |
Stompro |
acps_for_bre needs to be reworked to improve the --items performance in general. Maybe the first call just pulls in all call numbers and copies and caches them in a hash... |
| 11:37 |
Stompro |
Or go with the rust version that is already better :-) |
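One possible shape for that caching rework, sketched with DBI (assume $dbh is a connected handle; the column names follow the stock Evergreen schema, but this is an illustration, not the actual acps_for_bre replacement):

    # One up-front query, bucketed by bib id, instead of one query per record.
    my %copies_for_bib;
    my $sth = $dbh->prepare(q{
        SELECT cn.record, cn.label, cp.barcode
          FROM asset.call_number cn
          JOIN asset.copy cp ON (cp.call_number = cn.id)
         WHERE NOT cn.deleted AND NOT cp.deleted
    });
    $sth->execute;
    while (my $row = $sth->fetchrow_hashref) {
        push @{ $copies_for_bib{ $row->{record} } }, $row;
    }
    # Later, in the per-record loop:
    # my $items = $copies_for_bib{$bib_id} || [];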
| 11:43 |
Dyrcona |
When I ran the queries through EXPLAIN ANALYZE, none of them were particularly slow. The slowest was the acp_for_bres query. On one particular record, it spent 40ms on a seq scan of copy.cp_cn_idx. I'm not sure how to improve a seq scan on an index, unless it can be coerced to a heap scan somehow. |
| 12:34 |
|
collum joined #evergreen |
| 12:38 |
Dyrcona |
Heh. Almost 1 minute longer..... |
| 12:50 |
|
collum joined #evergreen |
| 13:02 |
Dyrcona |
I am testing this now: time marc_export --all -e UTF-8 --items > all.mrc 2>all.err |
| 13:14 |
Dyrcona |
The Rust marc export does batching of the queries by adding limit and offset. I wonder if we should do the same? I've noticed that the CPU usage goes up over time, which implies that something is struggling through the records. The memory use stays mostly constant once all of the records are retrieved from the database. |
| 13:20 |
Stompro |
Dyrcona, if you use gnu time, it gets you max memory usage also. /usr/bin/time -v... so you don't have to check that separately. |
| 13:25 |
Stompro |
Dyrcona, I'm surprised the execution time increased for you... hmm. |
| 14:00 |
csharp_ |
berick: after installing the default concerto set, notes work - everything is speedy under redis - no errors yet |
| 14:07 |
|
sleary joined #evergreen |
| 14:16 |
berick |
csharp_: woohoo |
| 14:35 |
Dyrcona |
I've been testing Redis with production data, but not much lately. I need to write an email to ask the relevant staff here if they'd like me to update the dev/training server to use Redis on the backend. |
| 14:37 |
Dyrcona |
Current calculation puts it at only 20% faster, i.e. -1 day. |
| 14:38 |
Dyrcona |
I'm going to add a --batch-size option. If it is specified the main query will retrieve that number of records per request. I don't know if I'll get that implemented today. |
| 14:51 |
Dyrcona |
Looks like adding tags isn't my bottleneck. My current estimate is minimal difference in performance. I'm going to let this run over the weekend to see if I'm wrong. On Monday, I'll add a batching/paging option to the main query and see if that helps. |
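A sketch of what such a --batch-size loop could look like with DBI LIMIT/OFFSET paging (assume $dbh is connected; process_record() and the column list are stand-ins, not marc_export's actual code):

    my $batch_size = 10_000;
    my $offset     = 0;
    my $sth = $dbh->prepare(q{
        SELECT id, marc FROM biblio.record_entry
         WHERE NOT deleted ORDER BY id LIMIT ? OFFSET ?
    });
    while (1) {
        $sth->execute($batch_size, $offset);
        my $rows = $sth->fetchall_arrayref({});   # fetch one page of records
        last unless @$rows;                       # no rows left -> done
        process_record($_) for @$rows;
        $offset += $batch_size;
    }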
| 10:40 |
mmorgan |
Stompro: extend_reporter.legacy_circ_count |
| 10:40 |
Stompro |
mmorgan++ thank you!! |
| 10:41 |
mmorgan |
YW! |
| 10:43 |
Stompro |
Dyrcona, I was conversing with Brian about his marc_export issues. And I did a test export of our DB. 200K bibs took 5 minutes, used 1.2GB of RAM, and created a 1GB uncompressed xml file, which compressed to 115MB. |
| 10:43 |
Dyrcona |
user_ingest_name_keywords_tgr BEFORE INSERT OR UPDATE ON actor.usr FOR EACH ROW EXECUTE PROCEDURE actor.user_ingest_name_keywords() <- I think that should maybe be an AFTER, but I'll play with it. |
| 10:45 |
Stompro |
That was without items... I should try it again with items. |
| 10:45 |
Dyrcona |
Stompro: I still have to do this in production, but it has always taken longer than that. Are you feeding IDs to marc_export, and what marc_export options are you using? |
| 10:48 |
|
briank joined #evergreen |
| 10:49 |
Dyrcona |
Also, I realize my comment about the user_ingest_name_keywords_tgr needing to be AFTER is bogus. I pulled an extra couple of fields in my query and discovered why I was seeing what I thought was anomalous. |
| 10:50 |
|
kworstell_isl joined #evergreen |
| 10:51 |
Dyrcona |
Some test accounts have weird names. :) |
| 10:52 |
Dyrcona |
The problem could be the extra query to grab item data combined with the massive amount of data. |
| 10:54 |
Stompro |
Dyrcona, --items didn't change the memory usage, still 1.2G for 194432 bibs.. run time seems longer.. I'll report back when done. |
| 10:56 |
Dyrcona |
I think running that select for items in a loop is the real issue. I should refactor this to grab the items at the time the query runs. That complicates the main loop though. |
| 11:13 |
Dyrcona |
I estimate it will export about 1.78 million records. |
| 11:14 |
Stompro |
Dyrcona, Nevermind about the cpu usage, that was me seeing the 9% memory used by mistake. |
| 11:15 |
Stompro |
Using your method I see 67% CPU |
| 11:16 |
Dyrcona |
I'm going to try smaller batches in production starting this afternoon to see if it helps. I may or may not stop the one running on a test vm. Maybe it is time to refactor export to speed things up? |
| 11:24 |
pinesol |
News from commits: Docs: LP1845957 Permissions List with Descriptions <https://git.evergreen-ils.org/?p=Evergreen.git;a=commitdiff;h=2680eca9e4dbaa79b2cd00c7fd3373311b85901c> |
| 11:24 |
pinesol |
News from commits: Docs: LP1845957 Part 1 - Update describing_your_people.adoc <https://git.evergreen-ils.org/?p=Evergreen.git;a=commitdiff;h=c7b205fb604e7827843c7a5ea6542ce02c2f72ef> |
| 11:25 |
Dyrcona |
I think I'll break for lunch early and start on that about noon. |
| 12:49 |
eeevil |
smayo: I do not recall ... it's been A While (TM) ;) ... I can look, though |
| 12:51 |
Dyrcona |
hmm... marc_export should exit when a bad command line option is passed in. |
| 12:56 |
Dyrcona |
It's taking a while to export the "large" bibs. It has to be the items query that is slowing things down. There are 90 bibs in our database with > 500 items. |
| 12:57 |
eeevil |
@later tell smayo I do not recall ... it's been A While (TM) ;) ... I can look, though. UPDATE: looks like it was a "just check a week" flag, basically, though the breakout variable is similar (if 15x larger). if skipping dow_count testing makes everything happy, I'm for it. |
| 12:57 |
pinesol |
eeevil: The operation succeeded. |
| 12:58 |
|
smayo joined #evergreen |
| 12:58 |
Dyrcona |
:) |
| 13:10 |
Dyrcona |
18 minutes and 54 seconds and only 13 records exported.... Well, I know where I need to look. |
| 13:34 |
|
smayo joined #evergreen |
| 13:48 |
|
smayo joined #evergreen |
| 13:50 |
Dyrcona |
58 minutes and it is a bit over halfway done with the 90 large records. I have a 'debug' version of marc_export that I'll use to dump the queries on a test system. |
| 13:56 |
|
jihpringle joined #evergreen |
| 14:24 |
pinesol |
News from commits: Docs: updates to Z39.50 documentation <https://git.evergreen-ils.org/?p=Evergreen.git;a=commitdiff;h=bb4d795eb16102c35d26f0d59d14da50b86605e4> |
| 14:54 |
Dyrcona |
Doing batches does not appear to have improved performance. If anything, it seems worse, but maybe my dev database is faster than production. |
| 15:11 |
berick |
./eg-marc-export --items --to-xml --out-file /tmp/recs.xml # --help also works / other options to limit the data set |
| 15:12 |
Dyrcona |
berick++ I'll give it a look. |
| 15:12 |
berick |
Dyrcona++ |
| 15:21 |
jeffdavis |
Interesting performance problem on our test servers - the Items Out tab is very slow to load. Looks like the call to open-ils.pcrud.search.circ is consistently taking 6.5 seconds per item for some reason. |
| 15:21 |
berick |
oof |
| 15:22 |
jeff |
do the items in question have extremely high circulation counts? |
| 15:23 |
jeff |
we have a test item that gets checked in and out multiple times a day via SIP2 for a Nagios/Zabbix style check. |
| 15:23 |
jeff |
I once made the mistake of using that same item to test something unrelated, and it took consistently 6 seconds or more to retrieve in item status. |
| 15:24 |
jeffdavis |
No, these are just randomly selected items - the one I checked has 8 total circs. |
| 15:24 |
jeff |
(It may not apply in this case. I don't think the pcrud search is going to be trying to get a total circ count for the items.) |
| 15:24 |
jeff |
ah, drat. |
| 15:25 |
jeffdavis |
Our production environment is not affected, but a test server running the same version of Evergreen (3.9) is, as is a different test server running 3.11. |
| 15:25 |
|
mmorgan1 joined #evergreen |
| 15:28 |
Stompro |
jeffdavis, are they all on the same Postgres version? |
| 15:29 |
jeffdavis |
Yes, all PG14. The test servers all share the same Postgres server, my guess is that's where the issue lies but not sure what the mechanism would be. |
| 15:30 |
Dyrcona |
jeffdavis: You have all of the latest patches for Pg installed? |
| 15:30 |
Dyrcona |
Meaning Evergreen patches. |
| 15:32 |
jeffdavis |
The affected servers are either 3.9.1-ish or 3.11.1-ish with some additional backports -- not fully up to date but pretty close. Are there specific recent patches you're thinking of? |
| 15:51 |
berick |
IOW, evergreen-universe-rs does a variety of stuff, but when it talks to EG, it assumes Redis is the communication layer |
| 15:51 |
berick |
ah, no, redis is required for those actions |
| 15:55 |
pinesol |
News from commits: Docs: Circulation Patron Record Page <https://git.evergreen-ils.org/?p=Evergreen.git;a=commitdiff;h=9d632b3589a263333a187fda59a708fe672f2813> |
| 15:56 |
Dyrcona |
If I'm going to start messing with Rust, I guess I should dust off the VM where I tested the RedisRF branches. |
| 15:56 |
berick |
muahaha i have successfully distracted you :) |
| 15:57 |
Dyrcona |
:) |
| 16:01 |
Dyrcona |
If the Rust export is faster, then I won't consider it a distraction. :) |
| 11:05 |
Dyrcona |
It might need more patches than just that one.... I'll leave it for now. |
| 11:14 |
|
kmlussier joined #evergreen |
| 11:18 |
Dyrcona |
So, going back to yesterday's conversation about MARC export, I wonder if that commit really was the problem. I reverted that one and two others, then started a new export. It has been running for almost 21 hours and only exported about 340,000 records. I estimate it should export about 1.7 million. |
| 11:20 |
Dyrcona |
At that rate, it will still take roughly 5 days to export them all. This is on a test system, but it's an old production database server and it's "configured." The hardware is no slouch. I guess I will have to dump queries and run them through EXPLAIN. |
| 11:28 |
Dyrcona |
Y'know what. I think I'll stop this export, back out the entire feature and go again. |
| 11:29 |
jeff |
if it's similar behavior as yesterday and most of the resource usage appears to be marc_export using CPU, I'd suspect inefficiency in the MARC record manipulation or in dealing with the relatively large amount of data in memory from the use of fetchall_ on such a large dataset. |
| 11:29 |
jeff |
even if it's not swapping, dealing with that large a data structure might be giving Perl / DBI a challenge. |
| 11:30 |
jeff |
(both of those are just guesses, though. I don't have time this week to experiment to test the theories.) :-) |
| 11:30 |
Dyrcona |
jeff: That might be it. Maybe it's actually the other patch that I pushed yesterday to manipulate the 852? |
| 11:30 |
jeff |
I like your idea of next step, though. Especially if you've exported a set this large before without issue. |
| 11:31 |
Dyrcona |
Well, "without issue" is up for debate.... |
| 12:07 |
pinesol |
Launchpad bug 1788680 in Evergreen 3.3 "Null statcats break copy templates" [High,Fix released] https://launchpad.net/bugs/1788680 |
| 12:07 |
|
jihpringle joined #evergreen |
| 12:08 |
Dyrcona |
We're on 3.7, and I think it stores fleshed JSON objects. The Angular client looks like it only stores stat cat ids. |
| 12:08 |
jeff |
I may resort to empirical testing. Copy affected JSON to a 3.10 system and re-test there. :-) |
| 12:08 |
Dyrcona |
I didn't make a note of the line numbers. I didn't think I'd want to look at it again. |
| 12:09 |
Dyrcona |
On 3.7, it looks like it includes everything in the template including alerts. I seem to recall in master, it ONLY looks at stat cats and only puts in the ID. This may come up again on Friday, so I think I'll have another look. |
| 12:14 |
Dyrcona |
I should have made notes. |
| 12:40 |
jihpringle |
the new alert messages have also caused some funky template issues - https://bugs.launchpad.net/evergreen/+bug/2022349 |
| 12:40 |
pinesol |
Launchpad bug 2022349 in Evergreen "Remove old-style copy/item alerts" [Medium,Confirmed] |
| 12:56 |
Dyrcona |
Everyone have a good rest of your day (or night)! I'm taking off early. |
| 13:11 |
mmorgan |
jeff: I'm familiar with 1788680, but have seen other issues with stat cats in templates, too. |
| 13:11 |
mmorgan |
Mostly templates trying to apply stat cat entries that no longer exist. |
| 13:11 |
|
collum joined #evergreen |
| 13:13 |
mmorgan |
I think I've seen the -1 once or twice, too. Maybe that was an attempt to remove the stat cat value? |
| 13:13 |
|
jihpringle joined #evergreen |
| 13:17 |
mmorgan |
We're not yet using the angular holdings editor, but in testing, are becoming aware of things lurking in templates that get applied and saved in items in Angular, but were ignored in angularjs. Like the copy alerts in 2022349 |
| 13:28 |
|
redavis joined #evergreen |
| 13:32 |
eby |
https://docs.paradedb.com/blog/introducing_bm25 |
| 13:34 |
jeff |
I tested one of the "statcats":{"24":-1} templates on a 3.10 system and while it did set the stat cat in question to "unset" (likely the original template's intent/origin), it did so in a way that was successful, and allowed the item to be saved. On 3.7, the behavior was that the stat cat was invalid and failed to save. |
| 13:36 |
jeff |
eby: saw that ("pg_bm25: Elastic-Quality Full Text Search Inside Postgres"). Also saw some comments at https://news.ycombinator.com/item?id=37809126 from a few days ago that I bookmarked for later. |
| 13:37 |
jeff |
(the "24" in my template fragment isn't significant, it's just the ID of a copy stat cat here) |
| 13:56 |
|
mantis1 left #evergreen |
| 14:24 |
berick |
eby: neat. |
| 14:24 |
berick |
also neat https://github.com/quickwit-oss/tantivy |