Search in #evergreen

Results

Results for 2022-08-08

10:03	Dyrcona	Lowercase n with tilde is showing up in my console as the two single byte characters represented by its component bytes. Something isn't handling multibyte UTF8 properly somewhere.
10:04	Dyrcona	So, It's probably the load program that I had so much trouble getting Unicode to work with in the past few months.
10:09		jvwoolf joined #evergreen
10:15	Dyrcona	Definitely coming from MARC::Record->new_from_usmarc() in the load program, regardless of whether I set the file handle to UTF-8.
10:22	Dyrcona	MARC::Charset is up to date (1.35).
10:28	Dyrcona	MARC::Charset shouldn't be involved and doesn't look like it is. The error is not coming directly from MARC::File::USMARC::decode() either.
10:29	Dyrcona	I tried installing the latest Encode.pm and no difference.
10:30	Dyrcona	Oh, I may stand corrected: https://metacpan.org/module/MARC::File::USMARC/source#L172
10:31	Dyrcona	Bingo! Source of my error, and it is Encode.pm or the edition of Unicode available to Perl on my system.
10:32	Dyrcona	https://metacpan.org/module/MARC::File::Encode/source#L35
10:39	Dyrcona	Or, maybe it's just "The Unicode Bug..." :(
10:45	Dyrcona	Looks like I might be able to avoid this by converting the records to MARCXML first.
10:53	Dyrcona	Outside of a MARC context, I can't make decode crash on those characters.
11:16		BDorsey joined #evergreen
11:20	Dyrcona	Well, it blows up elsewhere using MARC::File::XML: 2 :129: parser error : Input is not proper UTF-8, indicate encoding !
11:20	Dyrcona	Bytes: 0xA9 0x22 0x20 0x69
11:21		jihpringle joined #evergreen
11:25	Dyrcona	@monologue
11:25	pinesol	Dyrcona: Your current monologue is at least 23 lines long.
11:39	Dyrcona	Ha! I want a goto for a valid reason. I want a label outside my main loop that I can branch to when there is an error. Otherwise, I don't want to interfere with the flow.
11:42	Dyrcona	Maybe I just need to change my loop to a do while.
11:46	Dyrcona	Well, MARC::File::XML just makes it worse. More records get spit out that way.
11:52	Dyrcona	I think the increase in errors comes from the records with invalid lengths and indicators getting mangled when converted to XML.
12:31	csharp_	Dyrcona: fwiw, I usually learn from your monologues :-)
12:31	csharp_	Dyrcona++
12:36	mmorgan	Dyrcona++
12:37	Dyrcona	Thanks!
12:41	Dyrcona	I used yaz-marcdump to convert the binary MARC to XML, and that was after I had preprocessed the file from the vendor. So it could be that my preprocessor program is writing junk, but I can't find bad UTF-8 in it.
12:42	Dyrcona	It just looks like when going through the MARC modules, Encode suddenly doesn't like otherwise valid \xC2 and \xC3 sequences.
12:56	Dyrcona	Very interesting: If I use a program to split the binary file into records using \x1E\x1D as the input record separator and then run decode('UTF-8', $raw_record) on each one, I get no errors, so the issue is definitely coming from the MARC modules somehow.
12:57	Dyrcona	That's using the preprocessed file. I'll see what happens with the files directly from the vendor.
12:58	Dyrcona	Ditto... Zero errors.
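
The record-by-record check described above can be sketched in Python. This is only an illustration, not the Perl program used in the log: the function names are made up for the example, and it assumes the whole file fits in memory as one bytes object.

```python
RECORD_END = b"\x1e\x1d"  # field terminator followed by record terminator


def split_records(data):
    """Split a binary MARC blob into raw records, keeping each terminator."""
    parts = data.split(RECORD_END)
    # The last element is whatever follows the final terminator (normally empty).
    return [part + RECORD_END for part in parts[:-1]]


def find_bad_utf8(records):
    """Return (index, error) pairs for records that fail strict UTF-8 decoding."""
    problems = []
    for index, raw in enumerate(records):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as err:
            problems.append((index, err))
    return problems
```

If this reports nothing while the MARC modules still complain, the raw bytes are probably fine and the trouble lies in how they are decoded or re-encoded further along.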
13:06		jihpringle joined #evergreen
13:13	csharp_	Dyrcona: I know you've been working on this for days and have probably ruled this out, but I've seen stupid stuff where the \XC2 literal characters were themselves mis-encoded somehow
13:14	csharp_	as in literally "\XC2" where one of those was some unicode character that escaped notice
13:16	Dyrcona	csharp_: I originally thought that is what the problem was, or rather a MARC-8 \xC2 that got into the UTF-8 data. \xC2 in MARC-8 is the P with a circle sound copyright symbol, and \xC3 is the regular copyright symbol.
13:17	Dyrcona	However, the input actually has valid UTF-8 sequences and using Encode::decode on the raw data on a record by record basis does not output any errors. The errors come when MARC::File::USMARC::decode() is run on a record.
13:18	csharp_	ah
13:19	Dyrcona	Hmm... I have another idea.....
13:23	Dyrcona	I tried MARC::File::USMARC::decode() on the files from the vendor and there are no errors. When I run it on the preprocessed file, I get errors. So, my preprocessor must be doing something wrong, even though decode UTF-8 accepts its output just like the raw input....
13:25	Dyrcona	If I have to set the output stream to UTF-8, I'll be upset. I spent several days fiddling with that before and I swear that I got it right.....
13:27	Dyrcona	So, I'm already setting binmode on the output to :utf8. Maybe I should do :bytes or :raw?
13:32	Dyrcona	I guess :raw isn't a thing....
13:38	Dyrcona	csharp_++ again for suggesting an encoding issue. Looks like my preprocessor was double encoding some characters.
14:03	csharp_	Dyrcona: oh wow
14:11	Dyrcona	This line in the perlunicode documentation is misleading: Use the ":encoding(...)" layer to read from and write to filehandles using the specified encoding.
14:11	Dyrcona	I suspect it only applies if you're not manually decoding the data, which the MARC code does.
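
A minimal Python sketch of what double encoding looks like at the byte level. It assumes the mechanism was the usual one, where bytes that are already UTF-8 get treated as single-byte characters and encoded a second time; the log only says that some characters were double encoded. The helper name is hypothetical.

```python
original = "\u00f1"                     # lowercase n with tilde
utf8_bytes = original.encode("utf-8")   # two bytes: 0xC3 0xB1

# The mistake: treat each UTF-8 byte as a Latin-1 character, then encode again.
double_encoded = utf8_bytes.decode("latin-1").encode("utf-8")


def undo_double_encoding(data):
    """Reverse one round of UTF-8 -> Latin-1 -> UTF-8 double encoding."""
    return data.decode("utf-8").encode("latin-1")
```

The doubled bytes are still valid UTF-8, which would explain why a plain decode of the preprocessed file raised no errors while the text itself came out as two separate characters.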
14:32	jeffdavis	bug 1979345 adds a new permission to govern the hold pull list; is that an OK change to include in a point release, assuming there's a release note?
14:32	pinesol	Launchpad bug 1979345 in Evergreen "Angular Holds Pull List Doesn't Scope" [Medium,Confirmed] https://launchpad.net/bugs/1979345
14:34	csharp_	I would probably wait until the next release
14:46	jeffdavis	We're going live with the new perm when we upgrade to 3.9 this weekend, I'll reconcile myself to having to renumber our permissions at some point. :)
14:48		rfrasur joined #evergreen
14:59	Dyrcona	We've had to do things like that after upgrades before because we've backported things from future releases.
15:00	Dyrcona	On the subject of MARC and encoding, just to complicate things, if you're pulling records from Evergreen via DBI as MARCXML and then converting them to USMARC to write to a file, you have to set the output stream to utf8 encoding, or you get errors reading the output file.
15:02	Dyrcona	It's always fun to relearn things like this every 3 or so years. :)
15:10	Dyrcona	jeffdavis: One thing that I often do is wait for the code to make it into master, then I cherry-pick the commits into my local branch so I have the correct id numbers and db upgrade codes. This makes generating the db upgrade script for future upgrades easier.
15:11	jeffdavis	hm, and I guess there's nothing preventing that commit from going into master even if it doesn't get backported to 3.8/3.9

Results for 2022-08-05

08:42		mmorgan joined #evergreen
09:39		Dyrcona joined #evergreen
09:40	Dyrcona	So, it looks like a batch of records from Overdrive are in whatever character set, individually: utf8 "\xC2" does not map to Unicode at /usr/lib/x86_64-linux-gnu/perl/5.26/Encode.pm line 212
09:48	Dyrcona	Hmm.. If I'm reading the table correctly, that should be a "sound recording copyright," or a P with a circle around it in MARC-8.
09:48	Dyrcona	In UTF-8
09:49	Dyrcona	In UTF-8 \xC2 is often the first of a pair, \xC2A9 is the copyright symbol, for example.
09:50	Dyrcona	So, I guess some of the records are UTF-8 and some are MARC-8, or even have a mix of characters in different character sets. I guess the question is, do I load the "bad" records?
09:52	Dyrcona	Sound recording copyright is \xE28497 in UTF-8.
09:53	Dyrcona	I suppose I could try loading the records as MARC-8 and see which leads to fewer issues. I seem to recall the whole thing blowing up when I didn't force the encoding of the MARC to UTF-8, though.
10:01	Dyrcona	Typing in a web form and my palm hit the track pad on my laptop thereby selecting all of the text in the field so that the next character that I typed replaced it all. Ctrl-z didn't bring it back....
10:01	Dyrcona	It's going to be that kind of Friday....
10:05	* Dyrcona	searches for a Perl module like chardet for Python.
10:09	Dyrcona	So, maybe, I should try PyMarc and chardet for this project.
10:13	mmorgan	Ctrl-z must be taking a vacation day :-(
10:14	Dyrcona	I suppose.
10:16	Dyrcona	I'm going to try and solve my record issue by rewriting the prep script in Python using PyMarc and chardet to autodetect the character set of each record. I wonder what chardet will say about MARC-8 records? Maybe, it will call them ISO 2022?
10:23	Dyrcona	Maybe I can keep most of the Perl code and just throw the XML at a character set detection program?
10:30	Dyrcona	Running chardet on the input files says, "utf-8 with confidence 0.99."
10:30	* Dyrcona	sighs. Guess I'll just jam the bad records in, like my current test is doing.
14:00	Dyrcona	And, that zero-width assertion that I used earlier was working to not "swallow" the space. The phono copyright symbol just takes up extra width in my terminal's font.
14:02	Dyrcona	We have a winner! $string =~ s/\xC2(?![\x80-\xBF])/\xE2\x84\x97/gu; and $string =~ s/\xC3(?![\x80-\xBF])/\xC2\xA9/gu;
14:11	Dyrcona	It seems like that took too long to figure out. :)
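
The two substitutions above translate to Python roughly as follows, working on raw bytes instead of a Perl string. This is a sketch of the pattern only: the function name is invented, and the log goes on to report that the substitution left the prep script's output unchanged, so it is not a confirmed fix. Replacing one byte with two or three also changes field and record lengths, so the leader and directory would need to be rebuilt afterwards.

```python
import re

# A 0xC2 or 0xC3 byte that is NOT followed by a UTF-8 continuation byte
# (0x80-0xBF) cannot be the start of a valid two-byte UTF-8 sequence.
LONE_C2 = re.compile(rb"\xC2(?![\x80-\xBF])")
LONE_C3 = re.compile(rb"\xC3(?![\x80-\xBF])")

SOUND_RECORDING_COPYRIGHT = "\u2117".encode("utf-8")  # bytes E2 84 97
COPYRIGHT = "\u00a9".encode("utf-8")                  # bytes C2 A9


def repair_stray_marc8(data):
    """Replace stray MARC-8 copyright bytes with their UTF-8 equivalents."""
    data = LONE_C2.sub(SOUND_RECORDING_COPYRIGHT, data)
    return LONE_C3.sub(COPYRIGHT, data)
```

Valid sequences such as C2 A9 are left alone because the negative lookahead only matches a lead byte with no continuation byte after it.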
14:21	Dyrcona	Hm... Next Q: Is it possible to call update_leader on a MARC::Record...
14:25	Dyrcona	In my specific case, I suppose it won't matter. The program will make changes to several tags anyway before outputting the record, so the length should get updated.
14:26	Dyrcona	There is a bug related to that, and it probably affects these multibyte characters, too.
14:30		rfrasur joined #evergreen
14:37	Dyrcona	Ugh. When I modify my prep program with the substitution code, I get a ton of the "does not map to Unicode" errors that started this whole investigation.
14:39	Dyrcona	Oof.... Helps to run it on the correct files in the correct directory....
14:41	Dyrcona	OK. I'm suspicious that my output from this run is the same size as from the previous run. Seems like it should be larger.
14:45	Dyrcona	So the substitution doesn't work on raw MARC data, apparently.
14:47	Dyrcona	diff says the file I just generated with the modified prep script is the same as the old one.
14:47	Dyrcona	Yes, I'm sure I used the new script.....
14:47	Dyrcona	Nice day for ducks. Looks like we're about to get a thunderstorm.
15:10	Dyrcona	Multiline match doesn't help...
15:18	Dyrcona	Single line mode doesn't make a difference either. So, maybe the non-UTF-8 characters are coming from MARC::Record?
15:19	Dyrcona	Bleh. marc--
15:21	Dyrcona	perl-- while i'm at it.
15:31	Dyrcona	@monologue
15:31	pinesol	Dyrcona: Your current monologue is at least 28 lines long.

Results for 2022-07-11

10:47	pinesol	News from commits: LP#1981095: fix deletion of item tags in Angular item attributes editor <https://git.evergreen-ils.org/?p=Evergreen.git;a=commitdiff;h=cc9c27b8647b39c44a1270b74b65bbe1a044739a>
12:34		collum joined #evergreen
12:46		collum joined #evergreen
13:03	Dyrcona	I'm looking into a cleanup of Overdrive URIs, and I've found over 35,000 asset.uri entries that don't correspond to any 856 in the biblio.record_entry.marc for record on the call number. Many of the call numbers are deleted, but some aren't. I have the fixes for dangling asset.uris in the database.
13:05	Dyrcona	They have uri_call_number_maps, but I'm thinking of deleting these just the same.
13:07	Dyrcona	"Curiouser and curiouser," said Alice.
13:08	Dyrcona	My join criteria is wrong.... :(

Results for 2022-04-05

12:48	pinesol	Dyrcona: Band 'Integer Overflow' added to list
12:53	Dyrcona	@tag 000
12:53	pinesol	Dyrcona: Must be because I had the flu for Christmas.
12:53	Dyrcona	@marc 000
12:53	pinesol	Dyrcona: unknown tag 000
12:53	Dyrcona	That's what I thought...
12:56	Dyrcona	Oof. Looks like indicators are bonzo on that record with the 000 tag as well. The "catalog" must be broken.
12:57	Dyrcona	Heh: 520 C $a limb aboard....
13:00	Dyrcona	@marc lea
13:00	pinesol	Dyrcona: unknown tag lea
13:01	Dyrcona	Yeah, just for giggles. This one looks like the leader was "copied" to a tag "lea" with indicators e & r, followed by the leader.
13:02	JBoyer	"The best part is that you can take a plain text marc record from one system and paste it into another!"

Results for 2022-03-23

09:52	Dyrcona	derekz: Adding a patron group for this library is the better solution. It seems to me that your circulation rules are too specific if you have to do all that work. The hold and circ rules cascade much like CSS, so simpler is better.
09:53	Dyrcona	Philosophy: IMNSHO, it's better to have a bunch of generic rules that apply to the majority of cases at the largest number of org units, then you make specific exceptions from there.
09:54	Dyrcona	Avoid using permission groups if possible.
09:55	Dyrcona	Unrelated: "Vendor MARC records" seems to be a synonym for "garbage."
09:57	derekz	Dunking on "Vendor MARC records" is a perfect distraction and apropos for March
09:58	Dyrcona	Well, I just got a batch to load that produce character warnings regardless if I process them as UTF-8 or MARC-8, so they're likely in some other character set.
10:01	Dyrcona	derekz: Writing a script as a QND solution is ... viable(?). In the long run, you'll want to eventually do the work with the circ and hold matrices. Unfortunately, there's nothing more permanent than a temporary solution.
10:01	stephengwills	perfect segue. I have a new school library that just started importing vendor records and, during this same few days postgres just started getting goosed by oom-killer. Any chance that's not a coincidence? is vandelay known to consume lots of postgres memory?
10:02	Dyrcona	stephengwills: I don't use vandelay much. I'm just throwing garba... ahem.. MARC at the database via DBI.
10:03	stephengwills	I have a time/funding issue and am too paranoid to allow anyone else to touch the database directly. ;)
10:03	Dyrcona	Why do I suspect that this batch of records is a mix of UTF-8 and some Windows code page or 3?
10:05	stephengwills	they started using vandelay instead of having to wait for me. which, in theory, isn’t unreasonable.
10:40	Dyrcona	Thank you, chardet! Windows-1254 with confidence 0.51855302355
10:40	Dyrcona	Now, can yaz-iconv convert that to UTF-8?
10:42	Dyrcona	Argh! Something's wrong with my processor... I get "utf-8 with confidence 0.99" on the original. I swear I worked that all out a couple of weeks ago!
10:43	Dyrcona	marc--
10:43	Dyrcona	@blame MARC
10:43	pinesol	Dyrcona: It's all MARC's fault!
10:59	Dyrcona	The real issue isn't MARC so much. It's how Perl handles character sets and "binary" data.
11:00	Dyrcona	My other prep program doesn't seem to have this problem.
11:01	Dyrcona	Also, it would help if vendors would set the leader properly.
11:02	Dyrcona	Maybe it's time for lunch?
11:12	Dyrcona	Or even make the preprocessor part of the loader.
11:15	Dyrcona	Or, maybe just switch to Bmagic's loader that i've been meaning to test.
11:18	Dyrcona	This is still too "hands on" for my taste.
11:28	Dyrcona	@marc 022
11:28	pinesol	Dyrcona: The ISSN, a unique identification number assigned to a continuing resource. (Repeatable) [a,y,z,2,6,8]
11:28	Dyrcona	@marc 028
11:28	pinesol	Dyrcona: The formatted number used for sound recordings, printed music, and videorecordings. Publisher's numbers that are given in an unformatted form are recorded in field 500 (General Note). A print constant identifying the kind of publisher number may be generated based on the value in the first indicator position. (Repeatable) [a,b,6,8]
11:28	Dyrcona	@marc 024
11:28	pinesol	Dyrcona: A standard number or code published on an item which cannot be accommodated in another field (e.g., field 020 (International Standard Book Number), 022 (International Standard Serial Number) , and 027 (Standard Technical Report Number)). The type of standard number or code is identified in the first indicator position or in subfield $2 (Source of number or code). (Repeatable) [a,c,d,z,2,6,8]
11:29	Dyrcona	Anyone know where UPCs usually show up?
11:29	Dyrcona	@marc 026
11:29	pinesol	Dyrcona: Used to assist in the identification of antiquarian books by recording information comprising groups of characters taken from specified positions on specified pages of the book, in accordance with the principles laid down in various published guidelines. (Repeatable) []
11:29	Dyrcona	@marc 025
11:29	pinesol	Dyrcona: A number assigned by the Library of Congress to an item that was acquired through one of its overseas acquisition programs. (Repeatable) []
11:39	Dyrcona	Looks like 024.
11:46	rhamby	yeah, should be 024s though the indicator for it is rarely set in my experience

Results for 2022-03-18

09:06		Keith_isl joined #evergreen
09:30		jvwoolf joined #evergreen
10:15		terranm joined #evergreen
10:29	Dyrcona	If I want to find a record in the database that has a MARC tag with two particular subfield values, there doesn't seem to be a quick way to do that with a double join on metabib.real_full_rec and even then, I'm not guaranteed that they're in the same tag. Am I missing something?
10:36	Dyrcona	I could probably add a "search" index using xpath or something, but I always get lost in a maze of twisty config and metabib tables when I do that.
10:43		rjackson_isl_hom joined #evergreen
10:45	Dyrcona	Thought I had an example of that in my old code, but I can't find it. I know that I created such things in the database in the past.
10:54	Dyrcona	berick: Yeah, but I'm trying to avoid a join like that.
10:55	Dyrcona	I'd probably have to join mravl and crad anyway.
10:57	Dyrcona	I don't even need a join, yet. There won't be any other records that would match this tag and a particular subfield value, but I can't guarantee that will always be the case (though it very likely will be).
10:58	Dyrcona	Plus, the field will have dates in it, and I should also use those dates in my query.... This is MARC tag 583 for anyone who is curious.
11:02	Dyrcona	I'm trying to come up with a query to get bre ids to pipe into marc_export. If I end up having to deal with dates, then I may just have to write a custom export to pick the marc apart in Perl.
11:03	Dyrcona	I don't even need the query today. I'm trying to figure out something that might work, so I can estimate how long it will take to implement.
11:04	Dyrcona	@marc 583
11:04	pinesol	Dyrcona: Contains information about processing, reference, and preservation actions. (Repeatable) [a,b,c,d,e,f,h,i,j,k,l,n,o,u,x,z,2,3,5,6,8]
11:08	Dyrcona	I basically want to be able to find records where a single 583 has $f = some value $5 = someother value $c < today's date and $d > today's date (if $d is even there)
11:09	Dyrcona	Oh, and a bunch of other criteria, like a certain member library has a non-deleted asset.copy entry with a specific circ modifier...
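
One way to enforce the "same 583" requirement outside SQL is to test each field of the MARCXML on its own. The sketch below is hypothetical throughout: the function name and parameters are invented, it assumes one `record` element per document in the standard MARC21 slim namespace, and it assumes `$c` and `$d` hold dates as YYYYMMDD strings so that plain string comparison orders them correctly.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def has_active_583(marcxml, action_f, institution_5, today):
    """True if a single 583 field matches $f and $5 and today falls in its date range."""
    record = ET.fromstring(marcxml)
    for field in record.findall("marc:datafield[@tag='583']", NS):
        subfields = {}
        for sf in field.findall("marc:subfield", NS):
            # Keep the first occurrence of each subfield code.
            subfields.setdefault(sf.get("code"), (sf.text or "").strip())
        if subfields.get("f") != action_f or subfields.get("5") != institution_5:
            continue
        start = subfields.get("c")
        end = subfields.get("d")
        if start is None or not start < today:
            continue
        if end is not None and not end > today:
            continue  # $d is optional, but if present it must be in the future
        return True
    return False
```

Because every test is applied to one field at a time, a record whose `$f` and `$5` values sit in different 583s does not match.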

Results for 2022-03-14

12:43	Dyrcona	It could be an entirely different character set, too, but I got pretty much the same error as you.
12:43	Bmagic	right, we've all had this pain. I've posted here over the years about it. This project is slightly different, and my understanding of the issue has matured over the years
12:44	Bmagic	I've got the file open in MARCEdit. I'm seeing lots of &#xA0 and {acute}
12:45	Dyrcona	So, I'd recommend setting the leader to UTF-8 and seeing what happens. You can do that with $marc->encoding('UTF-8');
12:46	Bmagic	right, let's see
12:48	Bmagic	that did the trick. I would like to try it one way, catch an error, then handle it the other way. Can I "get the message" that MARC::Record bombed? I'm just seeing a screen dump, but my program goes on
12:50	Dyrcona	Bmagic: You're getting a warning, I think from MARC::Charset? You can do some stuff with signal in Perl to make the warnings fatal and then use an eval block to trap them.
12:52	Bmagic	ah! I'll look that up, thanks
12:53	Dyrcona	Here's an example using eval {...}; if ($@) { ... } to log errors: https://pastebin.com/g4RGDJLr
12:53	Bmagic	Dyrcona++
12:57	Dyrcona	In that example any fatal errors that happen inside the eval {} get handled by the code in the if ($@) {}.
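
For comparison, the same idea of making warnings fatal and trapping them per record looks like this in Python. It is an analogue of the approach, not a translation of the pastebin script, whose contents are not shown here; `load_record` is a hypothetical stand-in for whatever parser emits the warning.

```python
import warnings


def load_record(raw):
    """Hypothetical loader that warns, rather than raises, on a suspicious record."""
    if b"\x80" in raw:
        warnings.warn("no mapping found for [0x80]")
    return raw


def load_all(records):
    """Load every record, logging the ones that produced a warning."""
    loaded, failures = [], []
    for index, raw in enumerate(records):
        try:
            with warnings.catch_warnings():
                warnings.simplefilter("error")  # promote every warning to an exception
                loaded.append(load_record(raw))
        except Warning as exc:
            failures.append((index, str(exc)))
    return loaded, failures
```

The loop keeps going after a failure, so one bad record ends up in the failure list without stopping the batch.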
12:57	Bmagic	Also: I see that you're reading the file raw with a separator "\x1E\x1D". I suppose that's the best way? You find that reading the file yourself (instead of MARC::Batch) is better?
12:57	Dyrcona	eval does two different things depending on how you use it.
12:58	Dyrcona	Bmagic: MARC::Batch usually works. I do tend to read the file manually because I've had issues in the past with records containing "smart" quotes. One of them looks like the end of record character.
12:58	Bmagic	I'll see if this error (now that I've narrowed it down to a particular record) will cause "die" or not
12:59	Bmagic	your example forces a die when @warnings
12:59		jihpringle joined #evergreen
13:00	Dyrcona	Right. But not all warnings, just warnings from MARC::Record.
13:00	Dyrcona	I'm not sure the MARC::Charset warning shows up there, but you can try it.
13:00	Bmagic	This program I'm writing needs to be pretty hardy. Handling millions of records every year. So, I think I'll go with the manual reading of the files with raw, like what you've got there
13:02	Dyrcona	Suit yourself. MARC::Batch and MARC::File go to great lengths to read in otherwise garbage records, but they blow up sometimes on otherwise well-formed MARC.
13:05	Bmagic	haha, so, it's more* hardy to use MARC::File/Batch.... Maybe I go with MARC::File.... and if it breaks, catch it and go RAW manual. off to find a file that blows MARC::File
13:06	Dyrcona	The problem I'm fixing by reading the records the way that I do comes from copy/paste cataloging.
13:06	Bmagic	yeah, I bet I can copy/paste a smart quote into a record and produce this issue
13:10	Bmagic	Dyrcona++
13:12	Dyrcona	I thought tsbere submitted a patch for that one, but it didn't work. Then again, I also think tsbere opened a separate bug. I don't know where that bug has gone.
13:13	Dyrcona	I spent a little time working on tsbere's patch, but kind of gave up.
13:15	Dyrcona	Oh, right. tsbere made a PR on github: https://github.com/perl4lib/marc-perl/pull/4
13:16	Dyrcona	Too many bug trackers....
13:18		JBoyer joined #evergreen
13:18	Bmagic	It would be nice if MARC::File would just handle it

Results for 2022-03-04

09:55		tsadok_ joined #evergreen
10:01		stephengwills joined #evergreen
10:01		stephengwills left #evergreen
10:14	Dyrcona	Why is working with MARC so difficult? Now, I'm trying to dump some records from the database to a binary MARC file in UTF-8, and of course, it's mangling the "fancy" characters. You'd think I'd have this down pat by now.
10:16	Dyrcona	Now, if I can just keep my fat palms off of the touchpad.... (I keep "clicking" inadvertently in another window.)
10:17	* csharp_	removes offensive trolling from the IRC logs
10:18	Dyrcona	OK, when pulling records from the DB, use (BinaryEncoding => 'utf8') on the use MARC::File::XML line, and if using IO::File to write the output do $fh->binmode(':utf8');
10:19	Dyrcona	csharp_: I'm half curious to see the trolling because I missed it, but I hope you're not referring to my monologues. :)
10:21	csharp_	oh, I deleted those too :-)
10:21	csharp_	I KEED I KEED

Results for 2022-03-02

10:24	Dyrcona	I was trying to find an Emacs command or function to make a file: URL from a path. Doesn't look like such a thing exists, but after 15 minutes of searching the function help and online, I decided, "I could have implemented it by now."
10:29	Dyrcona	I still haven't implemented it, but it might be a handy thing to have.
10:35	Dyrcona	Of course, there's a Jabber/XMPP library for Emacs..... :)
10:44	Dyrcona	MARC::Charset apparently does not like EM DASH: \xE2\x80\x94.
10:50	Dyrcona	The message suggests that triplets beginning with \xE2\x80 are the problem: no mapping found for [0x80] at position 24
10:50	Dyrcona	gmcharlt ^^ Should I file a bug on MARC::Charset?
10:55	Dyrcona	FWIW: I'm using this program to load some binary MARC records: https://pastebin.com/g4RGDJLr
11:01	miker	Dyrcona: doesn't like going from MARC8 to UTF-8?
11:06	Dyrcona	The file is UTF-8.
11:08	miker	hrm... that's strange ... could the records be claiming MARC8? IIRC we do look at the leader, but I think there's a way to force the issue
11:20	Dyrcona	Now, my dump program is complaining about wide character in print. It didn't before I set the encoding....
11:29	miker	are you calling MARC::Charset->assume_unicode(1); before processing the records? (that's the "force the issue" option)
11:34	Dyrcona	miker: No. You can see the program that I'm using to load the records. I haven't shared the preprocessor, yet.
11:35	Dyrcona	I was going to ask if anyone in here has used pymarc much. (I've looked at it.) Python usually has better charset handling than does Perl, and I wonder if anyone has used chardet to detect the charsets used by MARC records.
11:35	Dyrcona	I've found all kinds of crap in MARC from 3rd parties.
11:36	miker	right on. fwiw, many of our scripts and db functions use both assume_unicode(1) and ignore_errors(1). see the top part of Open-ILS/src/extras/import/marc_add_ids for instance
11:38	Dyrcona	Well, I've set the 09 in the leader to 'a' for this batch. When I open the file in Emacs it looks like UTF-8.
11:39	Dyrcona	I'm running it again in another db with a fresh copy of production. I reloaded them after yesterday's tests.
11:44	Dyrcona	Looks like you're only expected to use it on output.
11:45	Dyrcona	The documentation needs to be updated or PyMarc does: "When I can require python 2.3, this will go away."
11:47	Dyrcona	I might play with this some time, but for now I'm sticking to Perl.
11:50	Dyrcona	It's funny that my preprocessor, using MARC::Record, has no problem with these records, but I guess I'm not asking it to do any charset conversion. Well, it has no problem since I fixed the double encoding bug. :)
11:58	Dyrcona	IIRC, I think I had to set the charset in these records yesterday, but I forgot when I ran the preprocessor after making changes to it this morning.
11:59	Dyrcona	Yeah, it has passed the author with the different characters for the first letter of the last name with no warnings or errors.
12:03		jihpringle joined #evergreen
14:33	pinesol	csharp_: go with explicit
14:35	Dyrcona	I only have those git add problems when I add things to git on the servers. :)
15:11	gmcharlt	Dyrcona: re MARC::Charset, yes
15:13	Dyrcona	gmcharlt: I'm not sure it's a bug in MARC::Charset, now. It's bad data combined with user error/forgetfulness.
15:13	gmcharlt	Dyrcona: ah, OK
15:13	Dyrcona	I thought the records were specifying UTF-8, but they weren't, even though there were in fact encoded in UTF-8.
15:14	Dyrcona	s/there/they/ # It's getting late and the fingers are tired... :)
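
The mismatch described above, where records are encoded in UTF-8 but do not say so, can be spotted from leader position 09, which is `a` for UCS/Unicode and blank for MARC-8. A rough Python sketch with invented function names, operating on one raw binary record at a time:

```python
def declared_encoding(raw_record):
    """Read leader position 09: 'a' means UCS/Unicode, blank means MARC-8."""
    flag = raw_record[9:10]
    if flag == b"a":
        return "UTF-8"
    if flag == b" ":
        return "MARC-8"
    return "unknown"


def leader_disagrees(raw_record):
    """True when the bytes are non-ASCII valid UTF-8 but the leader does not say so."""
    try:
        text = raw_record.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return not text.isascii() and declared_encoding(raw_record) != "UTF-8"


def mark_as_utf8(raw_record):
    """Return a copy of the record with leader position 09 set to 'a'."""
    return raw_record[:9] + b"a" + raw_record[10:]
```

Setting the flag swaps one byte for another, so the record length and directory offsets stay the same. It only declares the encoding; it does not convert any data.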

Results for 2022-03-01

09:24	* Dyrcona	mumbles "smart quotes...."
09:27	Dyrcona	Also, just plain junk in these records.
09:33	Dyrcona	I wonder if we're using an outdated Unicode standard?--That "we" is meant to be vague, i.e. not necessarily Evergreen.
09:39	Dyrcona	So, it looks like the preprocessing script does something to the records. Maybe I need to tell MARC::Record not to mangle the characters, somehow?
09:56	Dyrcona	Or, maybe it doesn't.... Comparing dumps of the processed versus raw records, the relevant bits of the busted records look the same.
10:08	Dyrcona	I wonder if converting the records to marcxml will make a difference?
10:10	Dyrcona	So, do I convert them with yaz-marcdump or with MARC::File::XML?
10:15	Dyrcona	Right, so when I use yaz-marcdump to convert the input records to marcxml, my editor shows the characters correctly. For some reason, I think my editor is treating the dumps as latin-1, even when I tell it they are UTF-8. What I suspect is MARC::Record and friends are mangling the characters because I'm working with binary MARC.
10:16	Dyrcona	I have little proof, other than what I see in the files through my editor, and that the records get mangled by my preprocessor Perl program.
10:17	Dyrcona	I will adapt my preprocessor and loader to work with marcxml and see what happens.
10:18	Dyrcona	BTW, the input records say they are UTF-8 in the leader.
10:18	* Dyrcona	quacks.
10:19	Dyrcona	Hmm... Should I use MARC::File::XML on these records, or should I use LibXML? I can do what I want with either.....
10:23	* Dyrcona	should write a MARC mode for Emacs. It couldn't be that hard... :)
10:26	Dyrcona	@monologue
10:26	pinesol	Dyrcona: Your current monologue is at least 15 lines long.
10:28	Dyrcona	So, yeah, the MARC::Record code that I'm using is mangling the characters.
10:30	Dyrcona	When I tell Emacs to use UTF-8 with one of the files, I get this: "...encountered characters it couldn’t encode..." followed by a list of characters that won't paste into my IRC client.
10:30	Dyrcona	The error message is much more detailed.
10:31	Dyrcona	I open the original MARC file and the mangled characters show up correctly in Emacs.
10:31	Dyrcona	Proof!
10:55		rjackson_isl_hom joined #evergreen
11:01	Dyrcona	I wonder if the problem is how I'm reading the binary files. I open it via IO::File with the record separator set to \x1e\x1d because records with smart quotes would break with MARC::Batch or MARC::File::USMARC. After I get the raw MARC, I feed that to MARC::Record. I suspect that is where the breakage occurs.
11:01	Dyrcona	Could be that I need to decode the data before passing it to MARC::Record?
11:01		rjackson_isl_hom joined #evergreen
11:01	Dyrcona	Or would that be encode?
11:03	Dyrcona	Well, I can try it and see.

Results for 2022-02-28

13:55	jeffdavis	I think PINES has the fixes for those bugs so I am assuming this is a 3.8-specific issue, probably with the new holdings editor?
13:59	jeffdavis	well not necessarily the holdings editor, forget that bit
14:59		collum joined #evergreen
15:01	Dyrcona	Anybody ever seen this one before: Use of uninitialized value $code_wanted in string eq at /usr/share/perl5/MARC/Field.pm line 314, <GEN1> chunk 1.
15:05	Dyrcona	Ok. Figured it out: Can't call method "subfield" on an undefined value at /home/opensrf/scripts/prep-od-advantage line 76, <GEN1> chunk 9.
15:06	Dyrcona	I tried getting something out of the record that doesn't exist, and didn't properly check for its existence before trying to use it.
15:52	jeffdavis	I spoke too soon! no children available for open-ils.actor (1442 warnings)

Results for 2022-02-25

09:34	JBoyer	Dyrcona, something else to consider from that message is whether or not you have any local MODS or other xslt transforms.
09:37	Dyrcona	JBoyer: I think we do, so I'll check that. Thanks for the suggestion.
09:47	jeff	chopPunctuation, chopPunctuation, chopPunctuation... heh.
09:49	Dyrcona	Yeah, I haven't looked but I suspect a busted field in the MARC. Maybe i should dump it now before someone changes it.
09:54	jvwoolf	Dyrcona: Before I forget again, I wanted to say that we tested the patch in 1482757 and it worked fine. We've got it running in production now.
09:54	jvwoolf	Let's see if I can get that to link correctly - lp1482757
09:54	Dyrcona	Lp 1482757
09:55	* JBoyer	shakes fist at anything case-sensitive that's not a password
09:55	Dyrcona	jvwoolf: It works fine for me, too. It just doesn't speed things up in a noticeable way. Also, this process is still really slow on Pg 12+.
09:56	jvwoolf	Dyrcona: It sped up importing eresources pretty significantly for us
09:56	Dyrcona	So, just dumping the MARC to the screen I think I see the problem. There's a field that ends with a lot of blank spaces.
09:56	Dyrcona	Well, subfield....
09:57	jvwoolf	Also, we removed the 30 million deleted URI call numbers and our call number reports work again. We also haven't had any drone timeouts since then, but that could be a coincidence.
09:57	Dyrcona	jvwoolf: I guess, but I was looking for improvement on later Pg versions, which that patch doesn't do. If you want to sign off, feel free.
11:16	Dyrcona	So, that would be 444 templates?
11:16	Dyrcona	jeff: A qualified yes on sharing it. A definite yes on I still have the record.
11:17	Dyrcona	I should ask before I share it for reasons.
11:21	Dyrcona	Apparently, it's an item that no one else is likely to have. It's MARC type a (a book?) about the construction of one of our libraries. Looks to be a gift from the construction company.
11:28	Dyrcona	Of course the spaces don't show up in the staff client.
11:29	Dyrcona	The 300 field also looks a little messed up in the editor. Subfield a is just ";"
11:36	Dyrcona	Page count is missing.
11:59	Dyrcona	Just to confirm: It blows up on any of the mods transforms. (mods3 gives a missing file error. I should look into that, but doesn't look like we use mods3.) marc21expand880 works, but it doesn't strip out the extra spaces.
12:00	Dyrcona	jeff: Yeah. that's what it looks like.
12:00	Dyrcona	It's obvious in the marcxml. Not so obvious elsewhere.
12:01	Dyrcona	I should probably use MARC::Record to fix it so that the length is updated correctly, but I should be able to just subtract 222 from the current value.
12:03	jeff	I have mixed feelings about trying to maintain size in a marcxml record. :-)
12:04	jeff	we mostly (and should always) ensure that it's updated/correct on conversion from marcxml to binary MARC format.
12:04	Dyrcona	Some of our vendors have strong opinions about it.
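The record length discussed above lives in leader positions 00-04 as a five-digit, zero-padded number, which is also why a MARC record cannot exceed 99999 bytes. A minimal Python sketch of the "just subtract 222" idea follows; the function name and the sample leader are hypothetical, and it only rewrites the leader. The directory entries also carry field lengths and offsets, so rebuilding the record with a MARC library remains the safer route.

```python
MAX_RECORD_LENGTH = 99999  # five digits in leader positions 00-04


def leader_with_length(leader: str, new_length: int) -> str:
    """Return a copy of a 24-character MARC leader with a new record length."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is exactly 24 characters")
    if not 0 <= new_length <= MAX_RECORD_LENGTH:
        raise ValueError("record length does not fit in five digits")
    # Positions 00-04 hold the length; everything else is left untouched.
    return f"{new_length:05d}" + leader[5:]


# Hypothetical leader for a record that is 1234 bytes long.
sample_leader = "01234nam a2200289 a 4500"
shorter = leader_with_length(sample_leader, 1234 - 222)
```

A length such as 2216371 is rejected by the range check, because it cannot be written in five digits.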

Results for 2022-02-15

08:36		mantis1 joined #evergreen
08:38		mmorgan joined #evergreen
09:12		Dyrcona joined #evergreen
09:19	Dyrcona	If I get this message while loading MARC records "no mapping found for [0x80]": a) is that a warning or an error (i.e. does the record load anyway) and b) anyone had to fix these before and got any tips that might save me an hour or so of fumbling around?
09:20	Dyrcona	The records in the file claim to be UTF-8, but we all know how that works out in reality.
09:26	Dyrcona	Well, I can say that those messages are not making it to MARC::Record->warnings. Because I would log those to my log file.
09:28	Dyrcona	Grr. Because I botched the shortname in 856$9, they're not showing up in my custom view....
09:29	Dyrcona	That means I can't get a quick count.
09:30	Dyrcona	I can also say that the messages didn't trigger my error handler in the eval because there's no error log.
09:31	pinesol	Dyrcona: Vendor records is probably integrated with systemd
09:35		jvwoolf joined #evergreen
09:39	* Dyrcona	reloads the database to give it another go.
09:49	Dyrcona	Think I'll redirect stderr to a file. I'm not sure where these messages are coming from, either when I create the MARC::Record from the raw marc data, or when doing the insert.
09:49	Dyrcona	Most likely the former.
09:58	JBoyer	Bmagic_, I'm told the cert for the MOBIUS bugsquash server could use a refresh sometime.
10:01	Bmagic_	oh! will do
10:41	csharp_	jvwoolf: re: slow reports, have you done any query analysis to see what it's doing? (e.g. EXPLAIN/EXPLAIN ANALYZE)
10:41	csharp_	note that EXPLAIN ANALYZE actually runs the query, so if it's timing out, that may not work
10:55		rfrasur joined #evergreen
10:59	Dyrcona	So, back to my MARC::Charset thing from earlier. I basically copied the example from the MARC::Lint manpage and ran that on my MARC input file, and it doesn't report any character set issues, though it does complain about punctuation and the subfield 9 in the 856.
11:06	Dyrcona	Oh, interesting. It looks like MARC::File::USMARC->next returns raw MARC and doesn't create a MARC::Record object.
11:09	Dyrcona	Oh, never mind. It does. I was looking at _next().
11:09	csharp_	Dyrcona: I've seen that kind of thing when processing records with garbage characters too
11:10	csharp_	pretty sure we were able to find them in the text and replace/remove them
11:11	Dyrcona	If it looks like single characters, I may just use sed or something like that on the file.
11:12	csharp_	yeah - pretty sure we had a cataloger load the file in MARCEdit at some point and do a simple find/replace
11:12	csharp_	that was back before I was more comfortable with regexes
11:13	Dyrcona	So, I don't see the bad character message from MARC::File::USMARC because it builds the MARC::Record differently. It adds the fields as it finds them, does some checking of its own. My program reads the raw MARC from the file and does MARC::Record->new_from_usmarc().
11:14	Dyrcona	Some of these look like "smart" quotes. Probably Windows-1252, again. :(
11:14	csharp_	yeppers
11:14	Dyrcona	I wish people would learn that you can't just copy and paste into a MARC record.
11:14	csharp_	edit in Word, paste into MARC
11:15	csharp_	WYSIWY(don't actually)G
11:15	Dyrcona	:)
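Before reaching for sed, it can help to know exactly where the stray bytes sit. The following Python sketch (not the Perl load program from the conversation) scans raw bytes and reports every offset that fails to decode as UTF-8. The function name and the sample data are hypothetical. Decoding a flagged byte as Windows-1252 is only a guess at its intended meaning.

```python
def find_invalid_utf8(data: bytes):
    """Return (offset, byte) pairs for bytes that cannot be decoded as UTF-8."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data is valid UTF-8
        except UnicodeDecodeError as err:
            start = pos + err.start
            bad.append((start, data[start]))
            pos = start + 1  # resume just past the offending byte
    return bad


# Valid UTF-8 text with two Windows-1252 "smart" quotes pasted in.
sample = b"caf\xc3\xa9 \x93quoted\x94"
problems = find_invalid_utf8(sample)
guesses = [bytes([b]).decode("cp1252") for _, b in problems]
```

The scan re-slices the data after each bad byte, which is fine for spot checks but slow on large files with many errors.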
12:51	Dyrcona	At Pg 11, most things start getting faster, but others seem to get hit.
12:52	Dyrcona	We do a dump and restore at least whenever we get new hardware.
12:52	Dyrcona	I do them weekly to keep some test databases up to date. Anyway, getting off topic.
12:54	Dyrcona	Going back to my record load/character issues, the message isn't coming from MARC::Record, either. I've got a program to dump the 856s that uses the same code to read the records and it doesn't peep.
12:56	Dyrcona	I wonder if I can just convert the raw MARC data or if I need to do it field by field....
13:04	Dyrcona	Now, it's just looking like some garbage and not necessarily Windows 1252... Probably a mix.... :(
13:07	Dyrcona	I think I know the source of the messages though. It looks like they're possibly coming from NFD() when the MARC is cleaned to go in the database. I can check that hypothesis real quick.
13:09	Dyrcona	And, no. That's not it, either.
13:11	Dyrcona	g0=ASCII_DEFAULT g1=EXTENDED_LATIN at /usr/local/share/perl/5.26.1/MARC/Charset.pm line 308, <GEN1> chunk 1752.
13:18	Dyrcona	Still don't know where that's coming from. I'm not using MARC::Charset, and doesn't look like the modules that I use do either. It must be the database throwing that at me....
13:57	JBoyer	Dyrcona, MARC::XML and friends are used in the ingest trigger functions, so depending on where you're actually seeing that message output the database is a likely source.
13:59	Dyrcona	JBoyer: Yeah. It doesn't seem to be coming from anything running in my Perl code outside of the database, but I didn't think warnings from the database would just show up in my output.
14:00	Dyrcona	Oh, never mind. They will because of the DBI options.
15:40	Dyrcona	Aight, so it is a mangled UTF-8 "smart quote." The sequence should probably be 3 bytes: \xe2\x80\x9d.
15:41	Dyrcona	Ah, this character appears before it: â
15:43	Dyrcona	Which is \xe2.....
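The "â" is what the first byte of that sequence looks like when it is read as a single-byte character instead of as the start of a three-byte UTF-8 character. A short Python illustration of the same three bytes read both ways (Python stands in here for whatever is doing the decoding in the pipeline, which the log does not identify):

```python
smart_quote = b"\xe2\x80\x9d"

# Read as UTF-8, the three bytes form one character: U+201D, a right
# double quotation mark.
as_utf8 = smart_quote.decode("utf-8")

# Read one byte at a time as Latin-1, 0xE2 becomes "â" and the other two
# bytes become invisible C1 control characters.
as_latin1 = smart_quote.decode("latin-1")

# The damage is reversible as long as the original bytes are still intact.
repaired = as_latin1.encode("latin-1").decode("utf-8")
```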
15:43	Dyrcona	MARC::Charset is apparently not dealing with it correctly, or I need to update my Unicode support....
15:54	Dyrcona	Ugh.... I somehow killed the program.... Not sure what I did.
16:00	Dyrcona	gmcharlt: It looks like MARC::Charset doesn't handle \xe2\x80\x9d correctly.
16:02	Dyrcona	I wonder what happens if I replace those in Perl with a "?
16:11	Dyrcona	Also, if the file is UTF-8, and that is a valid UTF-8 sequence, what's MARC::Charset got to do with it?
16:12	Dyrcona	I'ma just leave it alone and reload the file again tomorrow after a db reload.
16:13	mmorgan	Going home and coming back again is always a good approach :)
16:14	Dyrcona	Unfortunately, I am at home.
16:14	Dyrcona	And, I'll be tomorrow, too. :)
16:15	mmorgan	There's always turning it off and on again!
16:15	Dyrcona	I don't think this should be my problem. Things actually look good for a change with the data. I was blaming the vendor, but it looks like MARC::Charset and/or Evergreen's use of it is at fault. Unless this one record isn't set to UTF-8 or something when it should be.
16:16	Dyrcona	Also, I have little context for the warnings. I'd have to search the MARC for the offending text/codes.
16:16	Dyrcona	No record number, which tells me this isn't coming from my code because all errors and warnings are logged with the record number from the file.
16:23		abowling joined #evergreen
16:24	abowling	after a 3.7 update, links in 856 are no longer appearing in the opac. the library has custom templates, but i diffed the relevant ones and it seems nothing has changed. any ideas on what i might be missing?

Results for 2022-02-07

09:09		terranm joined #evergreen
09:14		jvwoolf joined #evergreen
09:45		Keith-isl joined #evergreen
11:16	Dyrcona	So, back to that MARC subfield lookup issue I had on Friday. If I use naco_normalize() on my input string and naco_normalize() on the subfield text, then I get the matches that I expected. Only the second naco_normalize() is necessary, but with both I don't have to make my input all lowercase, etc.
11:39	Dyrcona	My program now does what I expected, it normalizes 347$b values so that they are all the same for a given material type.
11:41	Dyrcona	All of our variations on "4K Ultra HD Blu-ray" (usually missing a word here or there) look the same.
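The idea of normalizing both sides before comparing, while writing one canonical value back, can be sketched in Python. The `simple_normalize` function below is a hypothetical, much-simplified stand-in and does not reproduce Evergreen's naco_normalize() rules. It only lowercases, strips diacritics, turns punctuation into spaces, and collapses whitespace.

```python
import re
import unicodedata


def simple_normalize(text: str) -> str:
    """Simplified comparison key; NOT the real naco_normalize()."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", no_marks.lower())
    return " ".join(cleaned.split())


canonical = "4K Ultra HD Blu-ray"
variants = ["4k ultra HD blu-ray.", "  4K Ultra HD Blu-Ray ", "4K Ultra Blu-ray"]

# Compare on the normalized key, but store the canonical (unnormalized) value.
wanted = simple_normalize(canonical)
fixed = [canonical if simple_normalize(v) == wanted else v for v in variants]
```

Variants that are missing a word, like the last one, do not match on the key and are left alone, so they would still need their own mapping.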
12:26		jihpringle joined #evergreen

Results for 2022-02-04

16:29	jeff	if the normalization is causing problems only with your change attempt, and the problem is that the normalization makes it difficult to find the relevant records, could you change your approach to use mrfr metabib.real_full_rec to find possibly relevant records, then parse their marcxml to determine the non-normalized value in 347$b?
16:29	jeff	(you probably already have that idea or better)
16:30	Dyrcona	jeff: It's pretty simple. I'm searching mrfr using the value string converted to a tsvector to find the records that I want to update.
16:31	Dyrcona	I'm using that string, preconversion to look for matching 347$b in the MARC::Record. It's the $subfield eq $str that it is failing, basically.
16:32	Dyrcona	If I normalize $subfield, then it will match, but then I'll be stuffing a different, normalized value into the MARC and our catalogers might not like that.
16:33	Dyrcona	Turns out, too, that it's only 49 records, so I may just ask the cataloging center to fix them.
16:34	Dyrcona	Now, I want to make t-shirts and wrist bands with "WWJSD" on them: What would Jon Skeet do?
16:36	Dyrcona	I'm tempted to stuff this code into my private scripts repo anyway.

Results for 2022-01-11

09:23	Dyrcona	So, I found out that a full ingest is pretty much required going from 3.5 to 3.7. Titles and authors were not showing in the Bootstrap OPAC on our training server until I ran pingest over the weekend.
09:36	miker	Dyrcona: that's ... strange. missing data in metabib.display_entry, somehow?
09:38		jvwoolf joined #evergreen
09:50	Dyrcona	miker: Yes. The TTOPAC pulls titles and authors from the MARC, IIRC. BooPAC uses display fields.
09:53	mmorgan	JBoyer++
10:50	mantis	We're on 3.6.5 using Angular for the staff but still using TPAC for the OPAC. When accessing the Patron View button on conjoined items, we get a server error. This however works when Boopac is enabled on our other test servers. Is this a known bug?
11:06	Dyrcona	mantis: IDK, but if it is not on Lp, then it's probably not widely known.

Results for 2021-10-13

11:03	rhamby	right
11:11	Dyrcona	Well, I like automating things because automated mistakes usually lend themselves to automated fixes.
11:12	Dyrcona	csharp_: I work with a project that includes a submodule. If you have questions, let me know. I might be able to help.
11:13	Dyrcona	So, I'm looking at MARC export, and I think it's a bug that it exports holdings in 852. I think it's supposed to be 952.
11:31	Dyrcona	Eh, maybe that isn't a bug. I should have looked at the full description again. ;)
11:33	Dyrcona	Hmm.. The query that I'm working on is going to be more complicated than I first thought...
11:34	Dyrcona	Or, maybe not. I probably don't have to include locations for deleted copies.
11:35	Dyrcona	Thanks, rubber ducky!
11:44	mmorgan	__(')<
12:02	Dyrcona	Wonder if I messed up, or if there are really that many copies: Record length of 2216371 is larger than the MARC spec allows (99999 bytes). at /usr/share/perl5/MARC/File/USMARC.pm line 314.
12:03	Dyrcona	mmorgan++
12:03	Dyrcona	Guess I'll convert to XML and have a look after it finishes.
12:05	Dyrcona	Heh. I messed up my query....
12:50	Dyrcona	Always fun when you get the submodule out of sync with the main code. :)
12:52	Dyrcona	Here's the project I'm talking about: https://github.com/Dyrcona/openfortigui
12:54	Dyrcona	Ugh. Looks like this code might be too slow to be useful.
12:57	Dyrcona	When it was just straight up dumping MARC, it took about 10 to 15 minutes to dump 48,000+ records. It has been running for about 45 minutes now and only dumped 1,296 records with holdings. I should probably modify the main query to return the marc and copy info.
13:05	Dyrcona	Wonder if I can array_agg over an array_agg?
13:22	Bmagic	Dyrcona: I've got some perl that dumps records in parallel, I've seen it hit 300 records/second
13:24	Bmagic	IIRC, 8 threads. Mind you, it's not using the perl "threads" module because Encode.pm. Instead it launches a system command and monitors a mutual file on the fs
13:48	Dyrcona	I don't bother with threads in Perl 5. I use fork.
13:49	Dyrcona	Anyway, I think I've got a solution. Rather than run this time consuming query once for each record, I'll do it once for all records and make a hash table of the information per record id.
13:49	Dyrcona	If I get the options right, I can probably have selectall_arrayref make the data structure for me.
13:53	Dyrcona	My program basically works like this: Get a list of bre.id using one of 3 queries. After that loop through the array of ids and grab the marc for each one. If this is a batch of deletes, set the leader 05 to d and write to the binary output file. Otherwise, delete the 852 tags in the marc, look up the copy location and org unit name for each copy and add an 852 to the marc for each.
13:53	Dyrcona	Then, write it to the output file.
13:54	Dyrcona	It got really slow when I added the copy location/org_unit query.
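The fix described above, running the expensive query once and indexing the rows by record id, looks roughly like this in Python. The row layout (record id, copy location, org unit name) and all names are assumptions for illustration; the actual program is Perl and its query is not shown in the log.

```python
from collections import defaultdict


def group_copies_by_record(rows):
    """Index (record_id, location, org_unit) rows by record id."""
    copies = defaultdict(list)
    for record_id, location, org_unit in rows:
        copies[record_id].append((location, org_unit))
    return dict(copies)


# Hypothetical rows, as they might come back from a single query.
rows = [
    (101, "Stacks", "Main Library"),
    (101, "New Books", "Branch A"),
    (202, "Reference", "Main Library"),
]
copies_by_record = group_copies_by_record(rows)

# Inside the export loop, each record is now a dictionary lookup, not a query.
holdings_for_101 = copies_by_record.get(101, [])
holdings_for_999 = copies_by_record.get(999, [])  # record with no copies
```

The trade-off is memory: the whole result set is held at once, which is usually acceptable for a few tens of thousands of records.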
14:02	Bmagic	I had some of the same challenges

Results for 2021-09-17

13:26	Bmagic	small ones tend to load completely. But this interface is still Dojo, so I don't imagine anyone is interested until it's Angular
13:41	pinesol	[opensrf|kenstir] Fix LP#1883169 by using growing_buffer - <http://git.evergreen-ils.org/?p=OpenSRF.git;a=commit;h=a3368f9>
13:42	jvwoolf	Interestingly, we had one with 6 items fail to load
13:44	Dyrcona	jvwoolf: How big is the MARC associated with those?
13:45	Dyrcona	I'm pretty sure there is some MARC pulled over as well, though I might be thinking of something else.
13:45	Dyrcona	Oops!
13:45		jvwoolf joined #evergreen
