Time | Nick | Message
00:07 |
|
bgillap joined #evergreen |
07:01 |
|
collum joined #evergreen |
08:09 |
|
BDorsey joined #evergreen |
08:28 |
|
mmorgan joined #evergreen |
08:49 |
|
smayo joined #evergreen |
09:00 |
|
dguarrac joined #evergreen |
09:12 |
|
sleary joined #evergreen |
09:23 |
|
Dyrcona joined #evergreen |
09:25 |
Dyrcona |
Stompro++ # I'm going to have a look at your two ticket updates today. I think you're right on both counts. |
09:26 |
Stompro |
Would it help if I got working branches for both up? |
09:35 |
|
mantis1 joined #evergreen |
09:45 |
Stompro |
I fell down the profiling rabbit hole: an 8% reduction in marc_export runtime by optimizing MARC::File::XML escape(), since over 50% of the calls are for single-character strings for indicators and subfield codes.
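A minimal sketch of the kind of fast path being described, assuming escape() is a regex substitution over &, <, and >: cache results for single-character inputs (indicators and subfield codes) so repeated calls become hash lookups. The %ESCAPES map and the fallback body are assumptions, not MARC::File::XML's actual internals.

    my %ESCAPES = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
    my %single_char;    # cache for one-character inputs

    sub escape {
        my $string = shift;
        # Fast path: indicators and subfield codes are single characters.
        return $single_char{$string} //= ($ESCAPES{$string} // $string)
            if length($string) == 1;
        $string =~ s/([&<>])/$ESCAPES{$1}/g;
        return $string;
    }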
09:57 |
berick |
--force-ordered-holdings-fields (and not calling it) shaved about 10-15% from the Rust variant |
09:58 |
Dyrcona |
Stompro: RE working branches: I was going to mess about with my hacked copy. I added some code earlier this week to time the function calls. I would like to see what you did with Devel::Profile, though. I've never used it before. |
09:58 |
|
sleary joined #evergreen |
10:00 |
Dyrcona |
berick: The Rust variant had the main query running for about 5.5 seconds each time it was called when I used --pipe recently, at least according to our PostgreSQL logs that filled up the drive partition. I want to EXPLAIN ANALYZE it to see if we're missing an index or if it can be made to use one.
10:01 |
Dyrcona |
I'm still going to use the Rust program this Sunday at midnight, but I've jumped the --batch-size to 10000 to get 1/10 the log entries. Hopefully, the logs don't fill up. |
10:02 |
berick |
yeah i bet bigger batch size will help |
10:03 |
Dyrcona |
This could be a case where joins are slower than subqueries for .... reasons. |
10:04 |
Dyrcona |
It could also be completely different on a newer Pg release. |
10:04 |
Stompro |
I've started using NYTProf now; it is much heavier weight, but the output is much more detailed. https://metacpan.org/pod/Devel::NYTProf
10:08 |
Stompro |
Devel::Profile is much faster for quickly seeing results. NYTProf generates a 2G output file in my testing that then has to be processed into an HTML report. |
10:08 |
Dyrcona |
I think I'll go for faster/lighter weight. I'll read the docs.
10:09 |
Dyrcona |
My code to log subroutine names with timestamps as they're called produces a largeish output and is likely less accurate since it spends time generating timestamps and printing them. |
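A rough sketch (an assumption, not Dyrcona's actual code) of that style of ad-hoc logging with Time::HiRes; the timestamp formatting and printing on every call is exactly the overhead that makes it less accurate than a real profiler.

    use Time::HiRes qw(gettimeofday);

    sub log_call {
        my ($name) = @_;
        my ($sec, $usec) = gettimeofday();
        # %06d pads the microseconds (see the patch fix mentioned below).
        printf STDERR "%d.%06d %s\n", $sec, $usec, $name;
    }

    # Sprinkled at the top of each sub being timed, e.g.:
    # sub acps_for_bre { log_call('acps_for_bre'); ... }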
10:10 |
Stompro |
All I had to do was run marc_export with "perl -d:Profile ./marc_export_testing" and it generates the profile log, prof.out.
10:10 |
Dyrcona |
I'll make a patch and throw it on the LP. |
10:11 |
Dyrcona |
I mean a patch for my logging code. |
10:11 |
Dyrcona |
So, you think we should just switch from insert_grouped_field to insert_field? |
10:13 |
Stompro |
I put a diff of the changes I was testing with at https://gitlab.com/LARL/evergreen-larl/-/snippets/3615366 |
10:14 |
Stompro |
Put all the 852s in an array, then once they are all added, call insert_grouped_field for the first one to get the same ordering, and use insert_fields_after for the rest with one call. |
10:18 |
Dyrcona |
Stompro: There's a simpler way to do the insert: push the fields to an array, then do the first one with shift and the rest of the array after that. |
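Roughly, the combined pattern in Perl: a sketch against MARC::Record's documented insert_grouped_field and insert_fields_after; the record and subfield data here are invented for illustration.

    use MARC::Record;
    use MARC::Field;

    my $record = MARC::Record->new();
    $record->append_fields(
        MARC::Field->new('901', ' ', ' ', a => 'system data')
    );

    my @holdings = map {
        MARC::Field->new('852', ' ', ' ', b => "BRANCH$_", p => "BARCODE$_")
    } (1 .. 3);

    # Group the first 852 to keep the same ordering, then hang the
    # rest off it with a single insert_fields_after() call.
    my $first = shift @holdings;
    $record->insert_grouped_field($first);
    $record->insert_fields_after($first, @holdings) if @holdings;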
10:19 |
Stompro |
I figured; my Perl array skills need work. :-)
10:20 |
Dyrcona |
Maybe my suggestion requires more rearrangement of the code, though. Having a firsttag flag might fit better with the current code organization. |
10:20 |
Dyrcona |
I wonder if the first one even needs to be grouped? |
10:20 |
Dyrcona |
I'm going to look at MARC::Record again. |
10:21 |
Dyrcona |
Stompro++ # For the notes in the snippets. |
10:21 |
Stompro |
In my test data, the 901 tag would be placed before the 852 without using the insert_grouped_field for the first. |
10:23 |
Stompro |
I don't think MARC::Record re-orders the fields. |
10:24 |
Stompro |
If I'm understanding where you are going with that. |
10:33 |
Dyrcona |
OK. LoC says the records are supposed to be grouped by hundreds; they don't have to be in order.
10:41 |
|
briank joined #evergreen |
10:44 |
Dyrcona |
Oh! That patch I threw on Lp is missing a local change to format the microsecond timestamps to %06d..... |
10:53 |
Dyrcona |
Heh. This branch is a mess.... |
10:54 |
Dyrcona |
So, I was testing with a dump of 1 library's records. It took about 1 hour 4 minutes to run. I'll make a change based on Stompro's bug description and see what happens. |
11:06 |
Dyrcona |
OK. Here goes.... |
11:11 |
Stompro |
Dyrcona, does this library have some of the bibs with large numbers of copies? |
11:20 |
Dyrcona |
I don't know. I doubt it. It doesn't seem to have made much difference so far. I'll try a larger library or the whole consortium next. |
11:20 |
Dyrcona |
It's one we do a weekly export for, so that's why I chose it to test. |
11:28 |
Dyrcona |
It does use slightly less (~3%) CPU |
11:30 |
Stompro |
In my testing, with our production data it had only a slight improvement. But it really improved the run that was stacked with bibs with lots of copies. |
11:36 |
Stompro |
acps_for_bre needs to be reworked to improve the --items performance in general. Maybe the first call just pulls in all call numbers and copies and caches them in a hash... |
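A hedged sketch of that caching idea (the function shape and SQL are illustrative, not marc_export's actual code; $dbh is an assumed connected DBD::Pg handle): prefetch call numbers and copies for all the bib IDs once, keyed by record, so later calls are hash lookups instead of per-bib queries.

    my %acp_cache;    # record id => arrayref of copy rows

    sub acps_for_bre {
        my ($dbh, $record_id, $all_record_ids) = @_;
        unless (%acp_cache) {
            my $ids  = join ',', map { int } @$all_record_ids;
            my $rows = $dbh->selectall_arrayref(
                "SELECT acn.record, acp.*
                   FROM asset.copy acp
                   JOIN asset.call_number acn ON acn.id = acp.call_number
                  WHERE acn.record IN ($ids) AND NOT acp.deleted",
                { Slice => {} }
            );
            push @{ $acp_cache{ $_->{record} } }, $_ for @$rows;
        }
        return $acp_cache{$record_id} || [];
    }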
11:37 |
Stompro |
Or go with the rust version that is already better :-) |
11:43 |
Dyrcona |
When I ran the queries through EXPLAIN ANALYZE, none of them were particularly slow. The slowest was the acp_for_bres query. On one particular record, it spent 40ms on a seq scan of copy.cp_cn_idx. I'm not sure how to improve a seq scan on an index, unless it can be coerced to a heap scan somehow. |
11:45 |
Dyrcona |
CW MARS has always had issues with data because of our size. They ran 2 servers for a proprietary ILS (before Evergreen and before I arrived). We've also had modified versions of some database functions in the past because the standard ones would time out. |
11:46 |
Dyrcona |
I think our db functions are all back to stock at this point. |
11:49 |
|
jihpringle joined #evergreen |
12:34 |
|
collum joined #evergreen |
12:38 |
Dyrcona |
Heh. Almost 1 minute longer..... |
12:50 |
|
collum joined #evergreen |
13:02 |
Dyrcona |
I am testing this now: time marc_export --all -e UTF-8 --items > all.mrc 2>all.err |
13:14 |
Dyrcona |
The Rust marc export does batching of the queries by adding limit and offset. I wonder if we should do the same? I've noticed that the CPU usage goes up over time, which implies that something is struggling through the records. The memory use stays mostly constant once all of the records are retrieved from the database. |
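What that batching might look like on the Perl side (a sketch: the SQL stands in for marc_export's real main query, the connection details are invented, and process_batch() is a hypothetical per-batch handler).

    use DBI;
    my $dbh = DBI->connect('dbi:Pg:dbname=evergreen', 'evergreen', '',
        { RaiseError => 1, AutoCommit => 1 });

    my $batch_size = 10_000;
    my $offset     = 0;
    while (1) {
        my $rows = $dbh->selectall_arrayref(
            'SELECT id, marc FROM biblio.record_entry
              WHERE NOT deleted
              ORDER BY id
              LIMIT ? OFFSET ?',
            { Slice => {} }, $batch_size, $offset
        );
        last unless @$rows;
        process_batch($rows);    # hypothetical per-batch handler
        $offset += $batch_size;
    }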
13:20 |
Stompro |
Dyrcona, if you use gnu time, it gets you max memory usage also. /usr/bin/time -v... so you don't have to check that separately. |
13:25 |
Stompro |
Dyrcona, I'm surprised the execution time increased for you... hmm. |
13:28 |
Dyrcona |
Things are always weird here. |
13:29 |
Dyrcona |
So, I have a question about bills: Can I just delete money.billing entries in the database?
13:35 |
Dyrcona |
I guess one can delete the bills. |
13:43 |
Dyrcona |
It has been running for about 1 hour and has exported 16,175 records. It takes the first 15 to 20 minutes to gather all of the bibs.... |
13:44 |
Dyrcona |
It exported another 470 records in the last minute or so. |
13:44 |
Dyrcona |
Hmm.... That's still too slow, but I'm not sure if it is slower or faster than before. I'll let it run and see what happens. |
13:45 |
Dyrcona |
CPU usage keeps going up. |
13:45 |
Dyrcona |
I think paging would help. |
13:52 |
Dyrcona |
Rough estimate done in my head says that this is slower. :( |
13:53 |
Stompro |
Are you profiling this run? Seeing where the time is spent would be useful. |
13:53 |
Dyrcona |
I'm not profiling. |
13:53 |
Dyrcona |
Also, I might be off by a factor of 10 in my estimate. I'm going to get some fresh air and think about it. |
13:56 |
Dyrcona |
OK. My estimate was off. This looks like it will be around 40% faster at this point, but it might speed up or my estimate might still be off.
13:57 |
* Dyrcona |
goes out for a few minutes for some fresh air. |
13:58 |
Stompro |
Loading all the bibs into memory up front does seem like a bad idea; paging seems like a good next step.
14:00 |
csharp_ |
berick: after installing the default concerto set, notes work - everything is speedy under redis - no errors yet |
14:07 |
|
sleary joined #evergreen |
14:16 |
berick |
csharp_: woohoo |
14:35 |
Dyrcona |
I've been testing Redis with production data, but not much lately. I need to write an email to ask the relevant staff here if they'd like me to update the dev/training server to use Redis on the backend. |
14:37 |
Dyrcona |
Current calculation puts it at only 20% faster, i.e. -1 day. |
14:38 |
Dyrcona |
I'm going to add a --batch-size option. If it is specified the main query will retrieve that number of records per request. I don't know if I'll get that implemented today. |
14:51 |
Dyrcona |
Looks like adding tags isn't my bottleneck. My current estimate is minimal difference in performance. I'm going to let this run over the weekend to see if I'm wrong. On Monday, I'll add a batching/paging option to the main query and see if that helps.
14:52 |
Dyrcona |
I'm sure it will since it can dump 40,000 or so records per hour when that's all there are. It currently looks like it is doing about 20,000 per hour. |
15:02 |
Dyrcona |
It seems appropriate that "Days of Rust" by INXS is playing now. |
15:03 |
csharp_ |
Dyrcona: Rust Never Sleeps next? |
15:05 |
Dyrcona |
Should be, but no. I'm actually listening to an INXS playlist, so it's "The Gift" now. |
15:08 |
Dyrcona |
I should get some more Neil Young. I have the MTV unplugged set and things he did with Buffalo Springfield and CSNY. |
15:36 |
Dyrcona |
Stompro: https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/dyrcona/lp2041364-marc_export_improvements |
15:37 |
jeffdavis |
Fun Friday fact: Devo's Mark Mothersbaugh came up with the phrase "rust never sleeps" - it was an ad-lib while they were playing the song with Neil Young. |
15:43 |
Stompro |
Dyrcona, I'm working on the same thing, just reading your message. I'm going to give cursors a try. |
15:50 |
|
smayo joined #evergreen |
15:53 |
|
gmcharlt joined #evergreen |
15:54 |
|
mantis1 left #evergreen |
16:02 |
Stompro |
Dyrcona, using a cursor to process 10,000 at a time cuts the max memory usage from 1.2G to 152M, and the run time remains the same. |
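The cursor pattern, roughly, with DBD::Pg (a sketch along the lines described, not necessarily the branch's actual code; $dbh, the SQL, and process_batch() again stand in for the real handle, query, and handler): declare the cursor inside a transaction, then FETCH a batch at a time so only one batch is ever resident in memory.

    $dbh->begin_work;
    $dbh->do(
        'DECLARE bib_cur NO SCROLL CURSOR FOR
         SELECT id, marc FROM biblio.record_entry WHERE NOT deleted'
    );
    while (1) {
        my $rows = $dbh->selectall_arrayref(
            'FETCH FORWARD 10000 FROM bib_cur', { Slice => {} }
        );
        last unless @$rows;
        process_batch($rows);    # hypothetical per-batch handler
    }
    $dbh->do('CLOSE bib_cur');
    $dbh->commit;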
16:03 |
Stompro |
That is without --items... let me try with --items to see if it makes a difference.
16:07 |
eeevil |
you know, I was going to say, re berick's earlier noting of improvements, "if the rust PG bindings support cursors, that'd be fast to start /and/ low memory" ... but, alas, I did not. :)
16:08 |
eeevil |
Stompro: +1 to cursors, though. def a good option for big record sets when TTFB and concurrent mem use are important |
16:09 |
csharp_ |
Dyrcona: Harvest Moon is one of my favorite albums evar |
16:12 |
berick |
same here |
16:12 |
berick |
Harvest as well |
16:13 |
berick |
Stompro++ eeevil++ # looks like cursors are an option -- will give it a poke |
16:17 |
Dyrcona |
I'll give cursors a poke, too. |
16:17 |
Dyrcona |
I think I commented about "cursors and Sybase" and my early experience with them at The Jockey Club and with Horizon last week. |
16:19 |
Dyrcona |
BTW, my maximum memory usage is 9GB. I think that's my biggest issue with MARC export. |
16:19 |
eeevil |
rewindable and writable cursors, and with-hold cursors, are not as fast as not-those-types in PG, but we don't need those, generally. |
16:19 |
Stompro |
With --items, 1.3G vs 256M for max resident memory, 596s vs 473s run time. (That also compares the 852 insert changes.)
16:19 |
Dyrcona |
eeevil: It's the same in Sybase. I was told "don't use cursors," but experimentation showed that read-only cursors were fast. |
16:20 |
eeevil |
not only fast to complete, but hella-fast TTFB (barring super expensive sorting) so they feel even faster to human users |
16:21 |
Dyrcona |
Stompro: Would you mind adding a commit to my collab branch? Doesn't have to be right now. I'm finishing up my day.
16:21 |
Dyrcona |
The collab branch only has what's in main, so it doesn't include the --exclude-hidden option code. |
16:21 |
Stompro |
I'll give it a try, not sure I've ever collab'd before in that sense. |
16:22 |
Dyrcona |
As long as you don't force push, you can add to my branch. Just check it out like normal and push like normal. |
16:27 |
* Dyrcona |
is signing out. Have a great weekend, everyone! |
16:39 |
csharp_ |
@band add Systemctl Reboot |
16:39 |
pinesol |
csharp_: Band 'Systemctl Reboot' added to list |
16:40 |
Stompro |
Dyrcona, I've updated your collab branch with the cursor version. |
16:44 |
Stompro |
@later tell Dyrcona, I've updated your collab branch with my cursor version. |
16:44 |
pinesol |
Stompro: The operation succeeded. |
17:23 |
|
mmorgan left #evergreen |
18:24 |
|
jihpringle joined #evergreen |
20:11 |
|
sandbergja joined #evergreen |
22:32 |
|
sleary joined #evergreen |