Time | Nick | Message
00:07 |
|
bgillap joined #evergreen |
07:01 |
|
collum joined #evergreen |
08:09 |
|
BDorsey joined #evergreen |
08:28 |
|
mmorgan joined #evergreen |
08:49 |
|
smayo joined #evergreen |
09:00 |
|
dguarrac joined #evergreen |
09:12 |
|
sleary joined #evergreen |
09:23 |
|
Dyrcona joined #evergreen |
09:25 |
Dyrcona |
Stompro++ # I'm going to have a look at your two ticket updates today. I think you're right on both counts. |
09:26 |
Stompro |
Would it help if I got working branches for both up? |
09:35 |
|
mantis1 joined #evergreen |
09:45 |
Stompro |
I fell down the profiling rabbit hole: an 8% reduction in marc_export runtime by optimizing MARC::File::XML escape(), since over 50% of the calls are for single-character strings for indicators and subfield codes.
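A minimal sketch of the kind of fast path being described, assuming escape() is a regex substitution over &, <, and >: cache results for single-character inputs (indicators and subfield codes) so repeated calls become hash lookups. The %ESCAPES map and the fallback body are assumptions, not MARC::File::XML's actual internals.

    my %ESCAPES = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
    my %single_char;    # cache for one-character inputs

    sub escape {
        my $string = shift;
        # Fast path: indicators and subfield codes are single characters.
        return $single_char{$string} //= ($ESCAPES{$string} // $string)
            if length($string) == 1;
        $string =~ s/([&<>])/$ESCAPES{$1}/g;
        return $string;
    }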
09:57 |
berick |
--force-ordered-holdings-fields (and not calling it) shaved about 10-15% from the Rust variant |
09:58 |
Dyrcona |
Stompro: RE working branches: I was going to mess about with my hacked copy. I added some code earlier this week to time the function calls. I would like to see what you did with Devel::Profile, though. I've never used it before. |
09:58 |
|
sleary joined #evergreen |
10:00 |
Dyrcona |
berick: The Rust variant had the main query running for about 5.5 seconds each time it was called when I used --pipe recently, at least according to our PostgreSQL logs that filled up the drive partition. I want to EXPLAIN ANALYZE it to see if we're missing an index or if it can be made to use one.
10:01 |
Dyrcona |
I'm still going to use the Rust program this Sunday at midnight, but I've jumped the --batch-size to 10000 to get 1/10 the log entries. Hopefully, the logs don't fill up. |
10:02 |
berick |
yeah i bet bigger batch size will help |
10:03 |
Dyrcona |
This could be a case where joins are slower than subqueries for .... reasons. |
10:04 |
Dyrcona |
It could also be completely different on a newer Pg release. |
10:04 |
Stompro |
I've started using NYTProf now; it is much heavier weight, but the output is much more detailed. https://metacpan.org/pod/Devel::NYTProf
10:08 |
Stompro |
Devel::Profile is much faster for quickly seeing results. NYTProf generates a 2G output file in my testing that then has to be processed into an HTML report. |
10:08 |
Dyrcona |
I think I'll go for faster/lighter weight. I'll read the docs.
10:09 |
Dyrcona |
My code to log subroutine names with timestamps as they're called produces a largeish output and is likely less accurate since it spends time generating timestamps and printing them. |
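A rough sketch (an assumption, not Dyrcona's actual code) of that style of ad-hoc logging with Time::HiRes; the timestamp formatting and printing on every call is exactly the overhead that makes it less accurate than a real profiler.

    use Time::HiRes qw(gettimeofday);

    sub log_call {
        my ($name) = @_;
        my ($sec, $usec) = gettimeofday();
        # %06d pads the microseconds (see the patch fix mentioned below).
        printf STDERR "%d.%06d %s\n", $sec, $usec, $name;
    }

    # Sprinkled at the top of each sub being timed, e.g.:
    # sub acps_for_bre { log_call('acps_for_bre'); ... }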
10:10 |
Stompro |
All I had to do was run marc_export with "perl -d:Profile ./marc_export_testing" and it generates the profile log, prof.out.
10:10 |
Dyrcona |
I'll make a patch and throw it on the LP. |
10:11 |
Dyrcona |
I mean a patch for my logging code. |
10:11 |
Dyrcona |
So, you think we should just switch from insert_grouped_field to insert_field? |
10:13 |
Stompro |
I put a diff of the changes I was testing with at https://gitlab.com/LARL/evergreen-larl/-/snippets/3615366 |
10:14 |
Stompro |
Put all the 852s in an array, then once they are all added, call insert_grouped_field for the first one to get the same ordering, and use insert_fields_after for the rest with one call. |
10:18 |
Dyrcona |
Stompro: There's a simpler way to do the insert: push the fields to an array, then do the first one with shift and the rest of the array after that. |
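Roughly, the combined pattern in Perl: a sketch against MARC::Record's documented insert_grouped_field and insert_fields_after; the record and subfield data here are invented for illustration.

    use MARC::Record;
    use MARC::Field;

    my $record = MARC::Record->new();
    $record->append_fields(
        MARC::Field->new('901', ' ', ' ', a => 'system data')
    );

    my @holdings = map {
        MARC::Field->new('852', ' ', ' ', b => "BRANCH$_", p => "BARCODE$_")
    } (1 .. 3);

    # Group the first 852 to keep the same ordering, then hang the
    # rest off it with a single insert_fields_after() call.
    my $first = shift @holdings;
    $record->insert_grouped_field($first);
    $record->insert_fields_after($first, @holdings) if @holdings;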
10:19 |
Stompro |
I figured; my Perl array skills need work. :-)
10:20 |
Dyrcona |
Maybe my suggestion requires more rearrangement of the code, though. Having a firsttag flag might fit better with the current code organization. |
10:20 |
Dyrcona |
I wonder if the first one even needs to be grouped? |
10:20 |
Dyrcona |
I'm going to look at MARC::Record again. |
10:21 |
Dyrcona |
Stompro++ # For the notes in the snippets. |
10:21 |
Stompro |
In my test data, the 901 tag would be placed before the 852 without using the insert_grouped_field for the first. |
10:23 |
Stompro |
I don't think MARC::Record re-orders the fields. |
10:24 |
Stompro |
If I'm understanding where you are going with that. |
10:33 |
Dyrcona |
OK. LoC says the records are supposed to be grouped by hundreds; they don't have to be in order.
10:41 |
|
briank joined #evergreen |
10:44 |
Dyrcona |
Oh! That patch I threw on Lp is missing a local change to format the microsecond timestamps to %06d..... |
10:53 |
Dyrcona |
Heh. This branch is a mess.... |
10:54 |
Dyrcona |
So, I was testing with a dump of 1 library's records. It took about 1 hour 4 minutes to run. I'll make a change based on Stompro's bug description and see what happens. |
11:06 |
Dyrcona |
OK. Here goes.... |
11:11 |
Stompro |
Dyrcona, does this library have some of the bibs with large numbers of copies? |
11:20 |
Dyrcona |
I don't know. I doubt it. It doesn't seem to have made much difference so far. I'll try a larger library or the whole consortium next. |
11:20 |
Dyrcona |
It's one we do a weekly export for, so that's why I chose it to test. |
11:28 |
Dyrcona |
It does use slightly less (~3%) CPU |
11:30 |
Stompro |
In my testing, with our production data it had only a slight improvement. But it really improved the run that was stacked with bibs with lots of copies. |
11:36 |
Stompro |
acps_for_bre needs to be reworked to improve the --items performance in general. Maybe the first call just pulls in all call numbers and copies and caches them in a hash... |
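A hedged sketch of that caching idea (the function shape and SQL are illustrative, not marc_export's actual code; $dbh is an assumed connected DBD::Pg handle): prefetch call numbers and copies for all the bib IDs once, keyed by record, so later calls are hash lookups instead of per-bib queries.

    my %acp_cache;    # record id => arrayref of copy rows

    sub acps_for_bre {
        my ($dbh, $record_id, $all_record_ids) = @_;
        unless (%acp_cache) {
            my $ids  = join ',', map { int } @$all_record_ids;
            my $rows = $dbh->selectall_arrayref(
                "SELECT acn.record, acp.*
                   FROM asset.copy acp
                   JOIN asset.call_number acn ON acn.id = acp.call_number
                  WHERE acn.record IN ($ids) AND NOT acp.deleted",
                { Slice => {} }
            );
            push @{ $acp_cache{ $_->{record} } }, $_ for @$rows;
        }
        return $acp_cache{$record_id} || [];
    }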
11:37 |
Stompro |
Or go with the rust version that is already better :-) |
11:43 |
Dyrcona |
When I ran the queries through EXPLAIN ANALYZE, none of them were particularly slow. The slowest was the acp_for_bres query. On one particular record, it spent 40ms on a seq scan of copy.cp_cn_idx. I'm not sure how to improve a seq scan on an index, unless it can be coerced to a heap scan somehow. |
11:45 |
Dyrcona |
CW MARS has always had issues with data because of our size. They ran 2 servers for a proprietary ILS (before Evergreen and before I arrived). We've also had modified versions of some database functions in the past because the standard ones would time out. |
11:46 |
Dyrcona |
I think our db functions are all back to stock at this point. |
11:49 |
|
jihpringle joined #evergreen |
12:34 |
|
collum joined #evergreen |
12:38 |
Dyrcona |
Heh. Almost 1 minute longer..... |
12:50 |
|
collum joined #evergreen |
13:02 |
Dyrcona |
I am testing this now: time marc_export --all -e UTF-8 --items > all.mrc 2>all.err |
13:14 |
Dyrcona |
The Rust marc export does batching of the queries by adding limit and offset. I wonder if we should do the same? I've noticed that the CPU usage goes up over time, which implies that something is struggling through the records. The memory use stays mostly constant once all of the records are retrieved from the database. |
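What that batching might look like on the Perl side (a sketch: the SQL stands in for marc_export's real main query, the connection details are invented, and process_batch() is a hypothetical per-batch handler).

    use DBI;
    my $dbh = DBI->connect('dbi:Pg:dbname=evergreen', 'evergreen', '',
        { RaiseError => 1, AutoCommit => 1 });

    my $batch_size = 10_000;
    my $offset     = 0;
    while (1) {
        my $rows = $dbh->selectall_arrayref(
            'SELECT id, marc FROM biblio.record_entry
              WHERE NOT deleted
              ORDER BY id
              LIMIT ? OFFSET ?',
            { Slice => {} }, $batch_size, $offset
        );
        last unless @$rows;
        process_batch($rows);    # hypothetical per-batch handler
        $offset += $batch_size;
    }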
13:20 |
Stompro |
Dyrcona, if you use gnu time, it gets you max memory usage also. /usr/bin/time -v... so you don't have to check that separately. |
13:25 |
Stompro |
Dyrcona, I'm surprised the execution time increased for you... hmm. |
13:28 |
Dyrcona |
Things are always weird here. |
13:29 |
Dyrcona |
So, I have a question about bills: Can I just delete money.billing entries in the database?
13:35 |
Dyrcona |
I guess one can delete the bills. |
13:43 |
Dyrcona |
It has been running for about 1 hour and has exported 16,175 records. It takes the first 15 to 20 minutes to gather all of the bibs.... |
13:44 |
Dyrcona |
It exported another 470 records in the last minute or so. |
13:44 |
Dyrcona |
Hmm.... That's still too slow, but I'm not sure if it is slower or faster than before. I'll let it run and see what happens. |
13:45 |
Dyrcona |
CPU usage keeps going up. |
13:45 |
Dyrcona |
I think paging would help. |
13:52 |
Dyrcona |
Rough estimate done in my head says that this is slower. :( |
13:53 |
Stompro |
Are you profiling this run? Seeing where the time is spent would be useful. |
13:53 |
Dyrcona |
I'm not profiling. |
13:53 |
Dyrcona |
Also, I might be off by a factor of 10 in my estimate. I'm going to get some fresh air and think about it. |
13:56 |
Dyrcona |
OK. My estimate was off. This looks like it will be around 40% faster at this point, but it might speed up or my estimate might still be off.
13:57 |
* Dyrcona |
goes out for a few minutes for some fresh air. |
13:58 |
Stompro |
Loading all the bibs into memory up front does seem like a bad idea; paging seems like a good next step.
14:00 |
csharp_ |
berick: after installing the default concerto set, notes work - everything is speedy under redis - no errors yet |
14:07 |
|
sleary joined #evergreen |
14:16 |
berick |
csharp_: woohoo |
14:35 |
Dyrcona |
I've been testing Redis with production data, but not much lately. I need to write an email to ask the relevant staff here if they'd like me to update the dev/training server to use Redis on the backend. |
14:37 |
Dyrcona |
Current calculation puts it at only 20% faster, i.e. -1 day. |
14:38 |
Dyrcona |
I'm going to add a --batch-size option. If it is specified the main query will retrieve that number of records per request. I don't know if I'll get that implemented today. |
14:51 |
Dyrcona |
Looks like adding tags isn't my bottleneck. My current estimate is minimal difference in performance. I'm going to let this run over the weekend to see if I'm wrong. On Monday, I'll add a batching/paging option to the main query and see if that helps.
14:52 |
Dyrcona |
I'm sure it will since it can dump 40,000 or so records per hour when that's all there are. It currently looks like it is doing about 20,000 per hour. |
15:02 |
Dyrcona |
It seems appropriate that "Days of Rust" by INXS is playing now. |
15:03 |
csharp_ |
Dyrcona: Rust Never Sleeps next? |
15:05 |
Dyrcona |
Should be, but no. I'm actually listening to an INXS playlist, so it's "The Gift" now. |
15:08 |
Dyrcona |
I should get some more Neil Young. I have the MTV unplugged set and things he did with Buffalo Springfield and CSNY. |
15:36 |
Dyrcona |
Stompro: https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/dyrcona/lp2041364-marc_export_improvements |
15:37 |
jeffdavis |
Fun Friday fact: Devo's Mark Mothersbaugh came up with the phrase "rust never sleeps" - it was an ad-lib while they were playing the song with Neil Young. |
15:43 |
Stompro |
Dyrcona, I'm working on the same thing, just reading your message. I'm going to give cursors a try. |
15:50 |
|
smayo joined #evergreen |
15:53 |
|
gmcharlt joined #evergreen |
15:54 |
|
mantis1 left #evergreen |
16:02 |
Stompro |
Dyrcona, using a cursor to process 10,000 at a time cuts the max memory usage from 1.2G to 152M, and the run time remains the same. |
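The cursor pattern, roughly, with DBD::Pg (a sketch along the lines described, not necessarily the branch's actual code; $dbh, the SQL, and process_batch() again stand in for the real handle, query, and handler): declare the cursor inside a transaction, then FETCH a batch at a time so only one batch is ever resident in memory.

    $dbh->begin_work;
    $dbh->do(
        'DECLARE bib_cur NO SCROLL CURSOR FOR
         SELECT id, marc FROM biblio.record_entry WHERE NOT deleted'
    );
    while (1) {
        my $rows = $dbh->selectall_arrayref(
            'FETCH FORWARD 10000 FROM bib_cur', { Slice => {} }
        );
        last unless @$rows;
        process_batch($rows);    # hypothetical per-batch handler
    }
    $dbh->do('CLOSE bib_cur');
    $dbh->commit;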
16:03 |
Stompro |
That is without --items... let me try with --items to see if it makes a difference.
16:07 |
eeevil |
you know, I was going to say, re berick's earlier noting of improvements, "if the rust PG bindings support cursors, that'd be fast to start /and/ low memory" ... but, alas, I did not. :)
16:08 |
eeevil |
Stompro: +1 to cursors, though. def a good option for big record sets when TTFB and concurrent mem use are important |
16:09 |
csharp_ |
Dyrcona: Harvest Moon is one of my favorite albums evar |
16:12 |
berick |
same here |
16:12 |
berick |
Harvest as well |
16:13 |
berick |
Stompro++ eeevil++ # looks like cursors are an option -- will give it a poke |
16:17 |
Dyrcona |
I'll give cursors a poke, too. |
16:17 |
Dyrcona |
I think I commented about "cursors and Sybase" and my early experience with them at The Jockey Club and with Horizon last week. |
16:19 |
Dyrcona |
BTW, my maximum memory usage is 9GB. I think that's my biggest issue with MARC export. |
16:19 |
eeevil |
rewindable and writable cursors, and with-hold cursors, are not as fast as not-those-types in PG, but we don't need those, generally. |
16:19 |
Stompro |
With --items, 1.3G vs 256M for max resident memory, 596s vs 473s run time. (That also compares the 852 insert changes.)
16:19 |
Dyrcona |
eeevil: It's the same in Sybase. I was told "don't use cursors," but experimentation showed that read-only cursors were fast. |
16:20 |
eeevil |
not only fast to complete, but hella-fast TTFB (barring super expensive sorting) so they feel even faster to human users |
16:21 |
Dyrcona |
Stompro: Would you mind adding a commit to my collab branch? Doesn't have to be right now. I'm finishing up my day.
16:21 |
Dyrcona |
The collab branch only has what's in main, so it doesn't include the --exclude-hidden option code. |
16:21 |
Stompro |
I'll give it a try, not sure I've ever collab'd before in that sense. |
16:22 |
Dyrcona |
As long as you don't force push, you can add to my branch. Just check it out like normal and push like normal. |
16:27 |
* Dyrcona |
is signing out. Have a great weekend, everyone! |
16:39 |
csharp_ |
@band add Systemctl Reboot |
16:39 |
pinesol |
csharp_: Band 'Systemctl Reboot' added to list |
16:40 |
Stompro |
Dyrcona, I've updated your collab branch with the cursor version. |
16:44 |
Stompro |
@later tell Dyrcona, I've updated your collab branch with my cursor version. |
16:44 |
pinesol |
Stompro: The operation succeeded. |
17:23 |
|
mmorgan left #evergreen |
18:24 |
|
jihpringle joined #evergreen |
20:11 |
|
sandbergja joined #evergreen |
22:32 |
|
sleary joined #evergreen |