Evergreen ILS Website

IRC log for #evergreen, 2023-10-27


All times shown according to the server's local time.

Time Nick Message
00:07 bgillap joined #evergreen
07:01 collum joined #evergreen
08:09 BDorsey joined #evergreen
08:28 mmorgan joined #evergreen
08:49 smayo joined #evergreen
09:00 dguarrac joined #evergreen
09:12 sleary joined #evergreen
09:23 Dyrcona joined #evergreen
09:25 Dyrcona Stompro++ # I'm going to have a look at your two ticket updates today. I think you're right on both counts.
09:26 Stompro Would it help if I got working branches for both up?
09:35 mantis1 joined #evergreen
09:45 Stompro I fell down the profiling rabbit hole, 8% reduction in marc_export runtime by trying to optimize MARC::File::XML escape() since over 50% of the calls are for single character strings for indicators and subfield codes.
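The single-character fast path Stompro describes — over half of the escape() calls are for one-character indicator and subfield-code strings drawn from a tiny alphabet — could be sketched as a memoized wrapper. This is a hypothetical Python analogue for illustration; the real MARC::File::XML escape() is Perl, and the function names here are invented:

```python
from functools import lru_cache

def _escape(s):
    # Minimal XML escaping of the kind escape() performs on field data.
    return (s.replace("&", "&amp;")
             .replace("<", "&lt;")
             .replace(">", "&gt;"))

@lru_cache(maxsize=256)
def _escape_char(c):
    # Indicators and subfield codes repeat constantly, so caching the
    # escaped form of each single character avoids millions of redundant calls.
    return _escape(c)

def escape(s):
    return _escape_char(s) if len(s) == 1 else _escape(s)
```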
09:57 berick --force-ordered-holdings-fields (and not calling it) shaved about 10-15% from the Rust variant
09:58 Dyrcona Stompro: RE working branches: I was going to mess about with my hacked copy. I added some code earlier this week to time the function calls. I would like to see what you did with Devel::Profile, though. I've never used it before.
09:58 sleary joined #evergreen
10:00 Dyrcona berick: The Rust variant had the main query running for about 5.5 seconds each time it was called when I used --pipe recently, at least according to our PostgreSQL logs that filled up the drive partition. I want to EXPLAIN ANALYZE it to see if we're missing an index or if it can be made to use one.
10:01 Dyrcona I'm still going to use the Rust program this Sunday at midnight, but I've jumped the --batch-size to 10000 to get 1/10 the log entries. Hopefully, the logs don't fill up.
10:02 berick yeah i bet bigger batch size will help
10:03 Dyrcona This could be a case where joins are slower than subqueries for .... reasons.
10:04 Dyrcona It could also be completely different on a newer Pg release.
10:04 Stompro I started using NYTProf now; it is much heavier weight, but the output is much more detailed.  https://metacpan.org/pod/Devel::NYTProf
10:08 Stompro Devel::Profile is much faster for quickly seeing results.  NYTProf generates a 2G output file in my testing that then has to be processed into an HTML report.
10:08 Dyrcona I think I'll go for faster/lighter weight. I'll read the docs.
10:09 Dyrcona My code to log subroutine names with timestamps as they're called produces a largeish output and is likely less accurate since it spends time generating timestamps and printing them.
10:10 Stompro All I had to do was run marc_export with "perl -d:Profile ./marc_export_testing" and it generates the profile log prof.out
10:10 Dyrcona I'll make a patch and throw it on the LP.
10:11 Dyrcona I mean a patch for my logging code.
10:11 Dyrcona So, you think we should just switch from insert_grouped_field to insert_field?
10:13 Stompro I put a diff of the changes I was testing with at https://gitlab.com/LARL/evergreen-larl/-/snippets/3615366
10:14 Stompro Put all the 852s in an array, then once they are all added, call insert_grouped_field for the first one to get the same ordering, and use insert_fields_after for the rest with one call.
10:18 Dyrcona Stompro: There's a simpler way to do the insert: push the fields to an array, then do the first one with shift and the rest of the array after that.
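The pattern Dyrcona suggests — collect the 852s in an array, insert the first one grouped so it lands in the right block, then append the rest after it in a single call — might look like this. A Python sketch with a hypothetical record object assumed to mirror MARC::Record's insert_grouped_field/insert_fields_after API:

```python
def insert_holdings(record, fields):
    """Insert collected holdings fields: the first grouped (to get the
    correct placement among the hundreds), the rest right after it in
    one bulk call instead of one grouped insert per field."""
    if not fields:
        return
    first, rest = fields[0], fields[1:]   # the shift-then-rest idea, in Python
    record.insert_grouped_field(first)
    if rest:
        record.insert_fields_after(first, rest)
```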
10:19 Stompro I figured, my perl array skills need work. :-)
10:20 Dyrcona Maybe my suggestion requires more rearrangement of the code, though. Having a firsttag flag might fit better with the current code organization.
10:20 Dyrcona I wonder if the first one even needs to be grouped?
10:20 Dyrcona I'm going to look at MARC::Record again.
10:21 Dyrcona Stompro++ # For the notes in the snippets.
10:21 Stompro In my test data, the 901 tag would be placed before the 852 without using the insert_grouped_field for the first.
10:23 Stompro I don't think MARC::Record re-orders the fields.
10:24 Stompro If I'm understanding where you are going with that.
10:33 Dyrcona OK. LoC says the records are supposed to be grouped by hundreds; they don't have to be in order.
10:41 briank joined #evergreen
10:44 Dyrcona Oh! That patch I threw on Lp is missing a local change to format the microsecond timestamps to %06d.....
10:53 Dyrcona Heh. This branch is a mess....
10:54 Dyrcona So, I was testing with a dump of 1 library's records. It took about 1 hour 4 minutes to run. I'll make a change based on Stompro's bug description and see what happens.
11:06 Dyrcona OK. Here goes....
11:11 Stompro Dyrcona, does this library have some of the bibs with large numbers of copies?
11:20 Dyrcona I don't know. I doubt it. It doesn't seem to have made much difference so far. I'll try a larger library or the whole consortium next.
11:20 Dyrcona It's one we do a weekly export for, so that's why I chose it to test.
11:28 Dyrcona It does use slightly less (~3%) CPU
11:30 Stompro In my testing, with our production data it had only a slight improvement.  But it really improved the run that was stacked with bibs with lots of copies.
11:36 Stompro acps_for_bre needs to be reworked to improve the --items performance in general.  Maybe the first call just pulls in all call numbers and copies and caches them in a hash...
11:37 Stompro Or go with the rust version that is already better :-)
11:43 Dyrcona When I ran the queries through EXPLAIN ANALYZE, none of them were particularly slow. The slowest was the acp_for_bres query. On one particular record, it spent 40ms on a seq scan of copy.cp_cn_idx. I'm not sure how to improve a seq scan on an index, unless it can be coerced to a heap scan somehow.
11:45 Dyrcona CW MARS has always had issues with data because of our size. They ran 2 servers for a proprietary ILS (before Evergreen and before I arrived). We've also had modified versions of some database functions in the past because the standard ones would time out.
11:46 Dyrcona I think our db functions are all back to stock at this point.
11:49 jihpringle joined #evergreen
12:34 collum joined #evergreen
12:38 Dyrcona Heh. Almost 1 minute longer.....
12:50 collum joined #evergreen
13:02 Dyrcona I am testing this now: time marc_export --all -e UTF-8 --items > all.mrc 2>all.err
13:14 Dyrcona The Rust marc export does batching of the queries by adding limit and offset. I wonder if we should do the same? I've noticed that the CPU usage goes up over time, which implies that something is struggling through the records. The memory use stays mostly constant once all of the records are retrieved from the database.
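The LIMIT/OFFSET batching Dyrcona describes could be sketched as a paging generator: fetch a fixed-size page at a time so the full result set never sits in memory. `fetch_page` here is a hypothetical callable standing in for the real query:

```python
def batched_records(fetch_page, batch_size=10000):
    """Yield rows page by page. fetch_page(limit, offset) is assumed to
    run the main query with LIMIT/OFFSET applied and return a list of
    rows; an empty page signals the end of the result set."""
    offset = 0
    while True:
        page = fetch_page(batch_size, offset)
        if not page:
            break
        yield from page
        offset += batch_size
```

Note that OFFSET paging re-scans skipped rows on each page, which is one reason the cursor approach discussed later in the log tends to scale better.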
13:20 Stompro Dyrcona, if you use gnu time, it gets you max memory usage also.  /usr/bin/time -v... so you don't have to check that separately.
13:25 Stompro Dyrcona, I'm surprised the execution time increased for you... hmm.
13:28 Dyrcona Things are always weird here.
13:29 Dyrcona So, I have a question about bills: Can I just delete money.billing entries in the database?
13:35 Dyrcona I guess one can delete the bills.
13:43 Dyrcona It has been running for about 1 hour and has exported 16,175 records. It takes the first 15 to 20 minutes to gather all of the bibs....
13:44 Dyrcona It exported another 470 records in the last minute or so.
13:44 Dyrcona Hmm.... That's still too slow, but I'm not sure if it is slower or faster than before. I'll let it run and see what happens.
13:45 Dyrcona CPU usage keeps going up.
13:45 Dyrcona I think paging would help.
13:52 Dyrcona Rough estimate done in my head says that this is slower. :(
13:53 Stompro Are you profiling this run?  Seeing where the time is spent would be useful.
13:53 Dyrcona I'm not profiling.
13:53 Dyrcona Also, I might be off by a factor of 10 in my estimate. I'm going to get some fresh air and think about it.
13:56 Dyrcona OK. My estimate was off. This looks like it will be around 40% faster at this point, but it might speed up or my estimate might still be off.
13:57 * Dyrcona goes out for a few minutes for some fresh air.
13:58 Stompro Loading all the bibs into memory up front does seem like a bad idea, paging seems like a good next step.
14:00 csharp_ berick: after installing the default concerto set, notes work - everything is speedy under redis - no errors yet
14:07 sleary joined #evergreen
14:16 berick csharp_: woohoo
14:35 Dyrcona I've been testing Redis with production data, but not much lately. I need to write an email to ask the relevant staff here if they'd like me to update the dev/training server to use Redis on the backend.
14:37 Dyrcona Current calculation puts it at only 20% faster, i.e. -1 day.
14:38 Dyrcona I'm going to add a --batch-size option. If it is specified the main query will retrieve that number of records per request. I don't know if I'll get that implemented today.
14:51 Dyrcona Looks like adding tags isn't my bottleneck. My current estimate is minimal difference in performance. I'm going to let this run over the weekend to see if I'm wrong. On Monday, I'll add a batching/paging option to the main query and see if that helps.
14:52 Dyrcona I'm sure it will since it can dump 40,000 or so records per hour when that's all there are. It currently looks like it is doing about 20,000 per hour.
15:02 Dyrcona It seems appropriate that "Days of Rust" by INXS is playing now.
15:03 csharp_ Dyrcona: Rust Never Sleeps next?
15:05 Dyrcona Should be, but no. I'm actually listening to an INXS playlist, so it's "The Gift" now.
15:08 Dyrcona I should get some more Neil Young. I have the MTV unplugged set and things he did with Buffalo Springfield and CSNY.
15:36 Dyrcona Stompro: https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/dyrcona/lp2041364-marc_export_improvements
15:37 jeffdavis Fun Friday fact: Devo's Mark Mothersbaugh came up with the phrase "rust never sleeps" - it was an ad-lib while they were playing the song with Neil Young.
15:43 Stompro Dyrcona, I'm working on the same thing, just reading your message.  I'm going to give cursors a try.
15:50 smayo joined #evergreen
15:53 gmcharlt joined #evergreen
15:54 mantis1 left #evergreen
16:02 Stompro Dyrcona, using a cursor to process 10,000 at a time cuts the max memory usage from 1.2G to 152M, and the run time remains the same.
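Stompro's cursor approach — DECLARE once, then FETCH fixed-size batches so the client holds at most one batch at a time — could be sketched like this. A hedged Python illustration: `execute` and `fetch_batch` are hypothetical thin wrappers over the DB driver, and the query is only a plausible stand-in for marc_export's main bib query:

```python
def fetch_with_cursor(execute, fetch_batch, batch_size=10000):
    """Stream rows through a server-side PostgreSQL cursor. Memory stays
    bounded by batch_size instead of the full result set."""
    execute("DECLARE bib_cur NO SCROLL CURSOR FOR "
            "SELECT id, marc FROM biblio.record_entry WHERE NOT deleted")
    try:
        while True:
            rows = fetch_batch("FETCH %d FROM bib_cur" % batch_size)
            if not rows:           # empty fetch means the cursor is exhausted
                break
            yield from rows
    finally:
        execute("CLOSE bib_cur")   # always release the cursor
```

This mirrors the numbers in the log: only one batch of rows is ever resident, which is why max memory drops from 1.2G to 152M while run time stays flat.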
16:03 Stompro That is without --items... let me try with --items to see if it makes a difference.
16:07 eeevil you know, I was going to say, re berick's earlier noting of improvements, "if the rust PG bindings support cursors, that'd be fast to start /and/ low memory" ... but, alas, I did not. :)
16:08 eeevil Stompro: +1 to cursors, though. def a good option for big record sets when TTFB and concurrent mem use are important
16:09 csharp_ Dyrcona: Harvest Moon is one of my favorite albums evar
16:12 berick same here
16:12 berick Harvest as well
16:13 berick Stompro++ eeevil++ # looks like cursors are an option -- will give it a poke
16:17 Dyrcona I'll give cursors a poke, too.
16:17 Dyrcona I think I commented about "cursors and Sybase" and my early experience with them at The Jockey Club and with Horizon last week.
16:19 Dyrcona BTW, my maximum memory usage is 9GB. I think that's my biggest issue with MARC export.
16:19 eeevil rewindable and writable cursors, and with-hold cursors, are not as fast as not-those-types in PG, but we don't need those, generally.
16:19 Stompro With --items, 1.3G vs 256M for max resident memory, 596s vs 473s run time.  (That also compares the 852 insert changes.)
16:19 Dyrcona eeevil: It's the same in Sybase. I was told "don't use cursors," but experimentation showed that read-only cursors were fast.
16:20 eeevil not only fast to complete, but hella-fast TTFB (barring super expensive sorting) so they feel even faster to human users
16:21 Dyrcona Stompro: Would you mind adding a commit to my collab branch? Doesn't have to right now. I'm finishing up my day.
16:21 Dyrcona The collab branch only has what's in main, so it doesn't include the --exclude-hidden option code.
16:21 Stompro I'll give it a try, not sure I've ever collab'd before in that sense.
16:22 Dyrcona As long as you don't force push, you can add to my branch. Just check it out like normal and push like normal.
16:27 * Dyrcona is signing out. Have a great weekend, everyone!
16:39 csharp_ @band add Systemctl Reboot
16:39 pinesol csharp_: Band 'Systemctl Reboot' added to list
16:40 Stompro Dyrcona, I've updated your collab branch with the cursor version.
16:44 Stompro @later tell Dyrcona, I've updated your collab branch with my cursor version.
16:44 pinesol Stompro: The operation succeeded.
17:23 mmorgan left #evergreen
18:24 jihpringle joined #evergreen
20:11 sandbergja joined #evergreen
22:32 sleary joined #evergreen
