Evergreen ILS Website


Results for 2022-11-28

08:39 mmorgan joined #evergreen
09:06 Dyrcona joined #evergreen
09:32 Dyrcona Is there an Angular MARCEdit component or does the client still use the AngularJS interface?
09:33 Dyrcona Ha! There is literally "marc-edit.component.{html,ts}" in eg2/src/app/cat/authority.
09:37 Dyrcona What I'm really looking for is in Open-ILS/src/eg2/src/app/staff/share/marc-edit/marc-edit.module.ts
09:39 mmorgan Dyrcona: There's a MARC Edit option in the angular staff catalog from the full record.
09:49 Dyrcona mmorgan: Thanks! I found the files that I was looking for.
09:51 Dyrcona CWMARS has a customization to make "Local System" the default source for new bib records. I'm looking at that for the Angular editor. I'm considering adding YAOUS and putting this on Lp, but first, I'm trying to determine if it would be necessary.

Results for 2022-10-28

11:47 berick updating ticket now
11:47 gmcharlt ah, OK.
11:56 Dyrcona miker: The duplicates may have come from the orphan ingest catching up with deletes, and I loaded a file that was meant to replace existing records, and yes, there very well could be duplicates in the input file. My program won't insert duplicate records or duplicate URIs. It would update the previously inserted record, however.
12:02 Dyrcona By "deletes" I actually mean updates. I've got a script to remove located URIs from the MARC. That gets the MARC into a MARC::Record, removes the matching 856, then updates the bre in the database.
12:02 Dyrcona Probably got caught up with an incoming record. This probably would not happen under normal circumstances.
12:13 Dyrcona miker: Yeah, that's what happened. I've confirmed that all of those records had a URI removed and then later a new one was added.
12:18 dmoore Howdy all, I'm new to Evergreen and will be hanging around for a bit as I learn it. Coming from an Alma/Primo setup, so I'm glad to be part of an open source community now

Results for 2022-10-24

13:12 miker Dyrcona: boo, lame. did anything show up in the PG logs? (and, I assume, this run includes the recent updates)
13:18 Dyrcona miker: I'm not finding anything.
13:52 Bmagic has anyone had issues with report templates not working post-3.9 upgrade? But only certain ones. Seems to have to do with shelving locations. Same for item templates?
13:54 Dyrcona I swear there was a bug for MARC::Record not calculating record lengths correctly with multibyte sequences. I also swear that was fixed, but I think I'm seeing it. I can't find a bug, however.
14:01 jihpringle Bmagic: we upgraded to 3.9 in August and we haven't had any reports of template issues
14:02 Bmagic jihpringle++
14:02 jihpringle I think the only existing templates that we re-did for the upgrade were the ones related to patron notes, alerts, blocks

Results for 2022-09-27

11:18 BAMkubasa thanks!
11:19 berick hm, our main metadata schema is MARC.  MODS is used in some places for extracting specific bits of information.
11:20 BAMkubasa ok
11:20 * Dyrcona assumed MARC is one of the main data schemes, not metadata, but suppose we could argue about that.
11:22 Dyrcona But, we do use MODS for indexing and a good bit of display in the OPAC. It would be easier to add a new schema if you can represent it with XSLT.
11:22 BAMkubasa so, xpath is a tool/language used to interrogate xml (or structured data?), and the xml would be the thing that would have the schema if I'm remembering how these things interact correctly?
11:24 Dyrcona So, we mostly use XSLT to convert from MARC to MODS. Some index and display fields forego the use of the MODS transforms and have a XPATH expression to extract the needed data from the MARC.
11:25 Dyrcona It's easy to add new fields with XPATH. For instance if there is some RDA field you want to extract from MARC.
11:25 Bmagic BAMkubasa: I end up referencing config.xml_transform which is a straight copy from LoC (right?)
11:26 Dyrcona I guess a more appropriate question is what is the student trying to do? Add Bibframe or something like that?
11:26 BAMkubasa Don't know, they asked their local librarian, who asked me
11:26 Dyrcona Bmagic: Almost a straight copy. We've modified one or two of the transforms in the past.
11:27 Bmagic I've resorted to editing those templates too. I'm not sure if I have a copy of an Evergreen database with tweaks anymore though
11:27 Dyrcona Adding a new schema for bib records would be a big deal, i.e. a lot of work. If you have some other format that you could convert from MARC via XSLT, that would be easier.
11:27 Bmagic I think it was to include more tags for the keyword index (before the feature was added to Evergreen, making that easier)
11:28 Dyrcona Bmagic: By "we," I meant the Evergreen community/developers. Not all of the transforms are strictly stock from LoC. I also think we sometimes fall behind LoC changes to the canonical set.
11:30 Bmagic right on

Results for 2022-09-01

09:10 Stompro joined #evergreen
09:34 Stompro joined #evergreen
09:57 Dyrcona "Curiouser and curiouser," said Alice.
09:58 Dyrcona So, I'm back to banging my head against MARC record encoding because when I process records from Overdrive and encode the output as UTF-8, many of the records get double encoded. If I do the same with records from Kanopy, they don't.
09:59 Dyrcona In fact, if I write the preprocessed Kanopy records out as "bytes" instead of "utf8," that's when some of them blow up....
09:59 Dyrcona They're both sending UTF-8 for the most part as far as I can (care) to tell.
10:04 Dyrcona chardet says: Kanopy_MARC_Records__additions__joneslibrary.mrc: utf-8 with confidence 0.99

Results for 2022-08-08

10:03 Dyrcona Lowercase n with tilde is showing up in my console as the two single byte characters represented by its component bytes. Something isn't handling multibyte UTF8 properly somewhere.
10:04 Dyrcona So, It's probably the load program that I had so much trouble getting Unicode to work with in the past few months.
10:09 jvwoolf joined #evergreen
10:15 Dyrcona Definitely coming from MARC::Record->new_from_usmarc() in the load program, regardless of whether I set the file handle to UTF-8.
10:22 Dyrcona MARC::Charset is up to date (1.35).
10:28 Dyrcona MARC::Charset shouldn't be involved and doesn't look like it is. The error is not coming directly from MARC::File::USMARC::decode() either.
10:29 Dyrcona I tried installing the latest Encode.pm and no difference.
10:30 Dyrcona Oh, I may stand corrected: https://metacpan.org/module/MARC::File::USMARC/source#L172
10:31 Dyrcona Bingo! Source of my error, and it is Encode.pm or the edition of Unicode available to Perl on my system.
10:32 Dyrcona https://metacpan.org/module/MARC::File::Encode/source#L35
10:39 Dyrcona Or, maybe it's just "The Unicode Bug..." :(
10:45 Dyrcona Looks like I might be able to avoid this by converting the records to MARCXML first.
10:53 Dyrcona Outside of a MARC context, I can't make decode crash on those characters.
11:16 BDorsey joined #evergreen
11:20 Dyrcona Well, it blows up elsewhere using MARC::File::XML: 2 :129: parser error : Input is not proper UTF-8, indicate encoding !
11:20 Dyrcona Bytes: 0xA9 0x22 0x20 0x69
11:21 jihpringle joined #evergreen
11:25 Dyrcona @monologue
11:25 pinesol Dyrcona: Your current monologue is at least 23 lines long.
11:39 Dyrcona Ha! I want a goto for a valid reason. I want a label outside my main loop that I can branch to when there is an error. Otherwise, I don't want to interfere with the flow.
11:42 Dyrcona Maybe I just need to change my loop to a do while.
11:46 Dyrcona Well, MARC::File::XML just makes it worse. More records get spit out that way.
11:52 Dyrcona I think the increase in errors comes from the records with invalid lengths and indicators getting mangled when converted to XML.
12:31 csharp_ Dyrcona: fwiw, I usually learn from your monologues :-)
12:31 csharp_ Dyrcona++
12:36 mmorgan Dyrcona++
12:37 Dyrcona Thanks!
12:41 Dyrcona I used yaz-marcdump to convert the binary MARC to XML, and that was after I had preprocessed the file from the vendor. So it could be that my preprocessor program is writing junk, but I can't find bad UTF-8 in it.
12:42 Dyrcona It just looks like when going through the MARC modules, Encode suddenly doesn't like otherwise valid \xC2 and \xC3 sequences.
12:56 Dyrcona Very interesting: If I use a program to split the binary file into records using \x1E\x1D as the input record separator and then run decode('UTF-8', $raw_record), I get no errors, so the issue is definitely coming from the MARC modules somehow.
12:57 Dyrcona That's using the preprocessed file. I'll see what happens with the files directly from the vendor.
12:58 Dyrcona Ditto... Zero errors.
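
The record-by-record check described above can be sketched in Python (the original test was done in Perl with Encode::decode; this is only an illustration, not the actual program). The function name and the sample data are hypothetical. It splits a binary MARC file on \x1E\x1D and reports which records are not valid UTF-8.

```python
RECORD_END = b"\x1e\x1d"  # field terminator followed by record terminator


def find_bad_records(data: bytes) -> list:
    """Return the 0-based positions of records that are not valid UTF-8."""
    bad = []
    # Drop the empty chunk left after the final record terminator.
    records = [chunk for chunk in data.split(RECORD_END) if chunk]
    for position, raw in enumerate(records):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(position)
    return bad


# Hypothetical sample: one clean record and one with a stray \xC2 byte.
sample = "año ©".encode("utf-8") + RECORD_END + b"stray \xc2 byte" + RECORD_END
print(find_bad_records(sample))
```

A check like this only says whether the bytes are well-formed UTF-8. It says nothing about whether the leader and directory of each record are consistent, which is what the MARC modules also care about.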
13:06 jihpringle joined #evergreen
13:13 csharp_ Dyrcona: I know you've been working on this for days and have probably ruled this out, but I've seen stupid stuff where the \XC2 literal characters were themselves mis-encoded somehow
13:14 csharp_ as in literally "\XC2" where one of those was some unicode character that escaped notice
13:16 Dyrcona csharp_: I originally thought that is what the problem was, or rather a MARC-8 \xC2 that got into the UTF-8 data. \xC2 in MARC-8 is the P with a circle sound copyright symbol, and \xC3 is the regular copyright symbol.
13:17 Dyrcona However, the input actually has valid UTF-8 sequences and using Encode::decode on the raw data on a record by record basis does not output any errors. The errors come when MARC::File::USMARC::decode() is run on a record.
13:18 csharp_ ah
13:19 Dyrcona Hmm... I have another idea.....
13:23 Dyrcona I tried MARC::File::USMARC::decode() on the files from the vendor and there are no errors. When I run it on the preprocessed file, I get errors. So, my preprocessor must be doing something wrong, even though its output decodes as UTF-8 just like the raw input....
13:25 Dyrcona If I have to set the output stream to UTF-8, I'll be upset. I spent several days fiddling with that before and I swear that I got it right.....
13:27 Dyrcona So, I'm already setting binmode on the output to :utf8. Maybe I should do :bytes or :raw?
13:32 Dyrcona I guess :raw isn't a thing....
13:38 Dyrcona csharp_++ again for suggesting an encoding issue. Looks like my preprocessor was double encoding some characters.
14:03 csharp_ Dyrcona: oh wow
14:11 Dyrcona This line in the perlunicode documentation is misleading: Use the ":encoding(...)" layer  to read from and write to filehandles using the specified encoding.
14:11 Dyrcona I suspect it only applies if you're not manually decoding the data, which the MARC code does.
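
For readers unfamiliar with double encoding, here is a small Python illustration of the general mechanism (not the actual preprocessor, which is Perl and was not shared). One common way it happens is that bytes which are already UTF-8 get treated as Latin-1 characters and are encoded a second time, for example when already-encoded data is written through an encoding output layer. The helper name undo_double_encoding is hypothetical.

```python
original = "ñ"
good = original.encode("utf-8")  # two bytes: C3 B1

# Simulate the mistake: treat the UTF-8 bytes as Latin-1 text, then encode again.
double = good.decode("latin-1").encode("utf-8")  # four bytes: C3 83 C2 B1


def undo_double_encoding(data: bytes) -> bytes:
    """Strip one layer of double encoding; return the input unchanged if that is not possible."""
    try:
        candidate = data.decode("utf-8").encode("latin-1")
        # Only accept the result if it is itself valid UTF-8.
        candidate.decode("utf-8")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return data
    return candidate


print(double)
print(undo_double_encoding(double) == good)
```

This is a heuristic: some byte sequences are valid both ways, so it is safer to fix the program that writes the data than to repair the output afterwards.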
14:32 jeffdavis bug 1979345 adds a new permission to govern the hold pull list; is that an OK change to include in a point release, assuming there's a release note?
14:32 pinesol Launchpad bug 1979345 in Evergreen "Angular Holds Pull List Doesn't Scope" [Medium,Confirmed] https://launchpad.net/bugs/1979345
14:34 csharp_ I would probably wait until the next release
14:46 jeffdavis We're going live with the new perm when we upgrade to 3.9 this weekend, I'll reconcile myself to having to renumber our permissions at some point. :)
14:48 rfrasur joined #evergreen
14:59 Dyrcona We've had to do things like that after upgrades before because we've backported things from future releases.
15:00 Dyrcona On the subject of MARC and encoding, just to complicate things: if you're pulling records from Evergreen via DBI as MARCXML and then converting them to USMARC to write to a file, you have to set the output stream to utf8 encoding, or you get errors reading the output file.
15:02 Dyrcona It's always fun to relearn things like this every 3 or so years. :)
15:10 Dyrcona jeffdavis: One thing that I often do is wait for the code to make it into master, then I cherry-pick the commits into my local branch so I have the correct id numbers and db upgrade codes. This makes generating the db upgrade script for future upgrades easier.
15:11 jeffdavis hm, and I guess there's nothing preventing that commit from going into master even if it doesn't get backported to 3.8/3.9

Results for 2022-08-05

08:42 mmorgan joined #evergreen
09:39 Dyrcona joined #evergreen
09:40 Dyrcona So, it looks like a batch of records from Overdrive are in whatever character set, individually: utf8 "\xC2" does not map to Unicode at /usr/lib/x86_64-linux-gnu/perl/5.26/Encode.pm line 212
09:48 Dyrcona Hmm.. If I'm reading the table correctly, that should be a "sound recording copyright," or a P with a circle around it in MARC-8.
09:48 Dyrcona In UTF-8
09:49 Dyrcona In UTF-8 \xC2 is often the first of a pair, \xC2\xA9 is the copyright symbol, for example.
09:50 Dyrcona So, I guess some of the records are UTF-8 and some are MARC-8, or even have a mix of characters in different character sets. I guess the question is, do I load the "bad" records?
09:52 Dyrcona Sound recording copyright is \xE2\x84\x97 in UTF-8.
09:53 Dyrcona I suppose I could try loading the records as MARC-8 and see which leads to fewer issues. I seem to recall the whole thing blowing up when I didn't force the encoding of the MARC to UTF-8, though.
10:01 Dyrcona Typing in a web form and my palm hit the track pad on my laptop thereby selecting all of the text in the field so that the next character that I typed replaced it all. Ctrl-z didn't bring it back....
10:01 Dyrcona It's going to be that kind of Friday....
10:05 * Dyrcona searches for a Perl module like chardet for Python.
10:09 Dyrcona So, maybe, I should try PyMarc and chardet for this project.
10:13 mmorgan Ctrl-z must be taking a vacation day :-(
10:14 Dyrcona I suppose.
10:16 Dyrcona I'm going to try and solve my record issue by rewriting the prep script in Python using PyMarc and chardet to autodetect the character set of each record. I wonder what chardet will say about MARC-8 records? Maybe, it will call them ISO 2022?
10:23 Dyrcona Maybe I can keep most of the Perl code and just throw the XML at a character set detection program?
10:30 Dyrcona Running chardet on the input files says, "utf-8 with confidence 0.99."
10:30 * Dyrcona sighs. Guess I'll just jam the bad records in, like my current test is doing.
14:00 Dyrcona And, that zero-width assertion that I used earlier was working to not "swallow" the space. The phono copyright symbol just takes up extra width in my terminal's font.
14:02 Dyrcona We have a winner! $string =~ s/\xC2(?![\x80-\xBF])/\xE2\x84\x97/gu; and $string =~ s/\xC3(?![\x80-\xBF])/\xC2\xA9/gu;
14:11 Dyrcona It seems like that took too long to figure out. :)
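
The same idea as the two Perl substitutions above, sketched in Python on raw bytes. It assumes, as the discussion does, that a \xC2 or \xC3 byte not followed by a UTF-8 continuation byte is a stray MARC-8 sound recording copyright or copyright symbol. The function name is hypothetical.

```python
import re

# A lead byte that is NOT followed by a UTF-8 continuation byte (0x80-0xBF).
LONE_C2 = re.compile(rb"\xC2(?![\x80-\xBF])")
LONE_C3 = re.compile(rb"\xC3(?![\x80-\xBF])")


def fix_marc8_copyright(raw: bytes) -> bytes:
    """Replace stray MARC-8 copyright bytes with their UTF-8 equivalents."""
    raw = LONE_C2.sub(b"\xE2\x84\x97", raw)  # sound recording copyright
    raw = LONE_C3.sub(b"\xC2\xA9", raw)      # copyright sign
    return raw


fixed = fix_marc8_copyright(b"\xc2 2020 \xc3 2021 ma\xc3\xb1ana")
print(fixed.decode("utf-8"))
```

Each replacement changes the byte length of the data. Binary MARC stores field lengths and offsets in the leader and directory, so applying this to a raw record without recalculating them would leave the record inconsistent.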
14:21 Dyrcona Hm... Next Q: Is it possible to call update_leader on a MARC::Record...
14:25 Dyrcona In my specific case, I suppose it won't matter. The program will make changes to several tags anyway before outputting the record, so the length should get updated.
14:26 Dyrcona There is a bug related to that, and it probably affects these multibyte characters, too.
14:30 rfrasur joined #evergreen
14:37 Dyrcona Ugh. When I modify my prep program with the substitution code, I get a ton of the "does not map to Unicode" errors that started this whole investigation.
14:39 Dyrcona Oof.... Helps to run it on the correct files in the correct directory....
14:41 Dyrcona OK. I'm suspicious that my output from this run is the same size as from the previous run. Seems like it should be larger.
14:45 Dyrcona So the substitution doesn't work on raw MARC data, apparently.
14:47 Dyrcona diff says the file I just generated with the modified prep script is the same as the old one.
14:47 Dyrcona Yes, I'm sure I used the new script.....
14:47 Dyrcona Nice day for ducks. Looks like we're about to get a thunderstorm.
15:10 Dyrcona Multiline match doesn't help...
15:18 Dyrcona Single line mode doesn't make a difference either. So, maybe the non-UTF-8 characters are coming from MARC::Record?
15:19 Dyrcona Bleh. marc--
15:21 Dyrcona perl-- while I'm at it.
15:31 Dyrcona @monologue
15:31 pinesol Dyrcona: Your current monologue is at least 28 lines long.

Results for 2022-07-11

10:47 pinesol News from commits: LP#1981095: fix deletion of item tags in Angular item attributes editor <https://git.evergreen-ils.org/?p=Evergreen.git;a=commitdiff;h=cc9c27b8647b39c44a1270b74b65bbe1a044739a>
12:34 collum joined #evergreen
12:46 collum joined #evergreen
13:03 Dyrcona I'm looking into a cleanup of Overdrive URIs, and I've found over 35,000 asset.uri entries that don't correspond to any 856 in the biblio.record_entry.marc for the record on the call number. Many of the call numbers are deleted, but some aren't. I have the fixes for dangling asset.uris in the database.
13:05 Dyrcona They have uri_call_number_maps, but I'm thinking of deleting these just the same.
13:07 Dyrcona "Curiouser and curiouser," said Alice.
13:08 Dyrcona My join criteria is wrong.... :(

Results for 2022-04-05

12:48 pinesol Dyrcona: Band 'Integer Overflow' added to list
12:53 Dyrcona @tag 000
12:53 pinesol Dyrcona: Must be because I had the flu for Christmas.
12:53 Dyrcona @marc 000
12:53 pinesol Dyrcona: unknown tag 000
12:53 Dyrcona That's what I thought...
12:56 Dyrcona Oof. Looks like indicators are bonzo on that record with the 000 tag as well. The "catalog" must be broken.
12:57 Dyrcona Heh: 520 C $a limb aboard....
13:00 Dyrcona @marc lea
13:00 pinesol Dyrcona: unknown tag lea
13:01 Dyrcona Yeah, just for giggles. This one looks like the leader was "copied" to a tag "lea" with indicators e & r, followed by the leader.
13:02 JBoyer "The best part is that you can take a plain text marc record from one system and paste it into another!"

Results for 2022-03-23

09:52 Dyrcona derekz: Adding a patron group for this library is the better solution. It seems to me that your circulation rules are too specific if you have to do all that work. The hold and circ rules cascade much like CSS, so simpler is better.
09:53 Dyrcona Philosophy: IMNSHO, it's better to have a bunch of generic rules that apply to the majority of cases at the largest number of org units, then you make specific exceptions from there.
09:54 Dyrcona Avoid using permission groups if possible.
09:55 Dyrcona Unrelated: "Vendor MARC records" seems to be a synonym for "garbage."
09:57 derekz Dunking on "Vendor MARC records" is a perfect distraction and apropos for March
09:58 Dyrcona Well, I just got a batch to load that produce character warnings regardless if I process them as UTF-8 or MARC-8, so they're likely in some other character set.
10:01 Dyrcona derekz: Writing a script as a QND solution is ... viable(?). In the long run, you'll want to eventually do the work with the circ and hold matrices. Unfortunately, there's nothing more permanent than a temporary solution.
10:01 stephengwills perfect segue.  I have a new school library that just started importing vendor records and, during this same few days postgres just started getting goosed by oom-killer.  Any chance that's not a coincidence?  is vandelay known to consume lots of postgres memory?
10:02 Dyrcona stephengwills: I don't use vandelay much. I'm just throwing garba... ahem.. MARC at the database via DBI.
10:03 stephengwills I have a time/funding issue and am too paranoid to allow anyone else to touch the database directly. ;)
10:03 Dyrcona Why do I suspect that this batch of records is a mix of UTF-8 and some Windows code page or 3?
10:05 stephengwills they started using vandelay instead of having to wait for me.  which, in theory, isn’t unreasonable.
10:40 Dyrcona Thank you, chardet! Windows-1254 with confidence 0.51855302355
10:40 Dyrcona Now, can yaz-iconv convert that to UTF-8?
10:42 Dyrcona Argh! Something's wrong with my processor... I get "utf-8 with confidence 0.99" on the original. I swear I worked that all out a couple of weeks ago!
10:43 Dyrcona marc--
10:43 Dyrcona @blame MARC
10:43 pinesol Dyrcona: It's all MARC's fault!
10:59 Dyrcona The real issue isn't MARC so much. It's how Perl handles character sets and "binary" data.
11:00 Dyrcona My other prep program doesn't seem to have this problem.
11:01 Dyrcona Also, it would help if vendors would set the leader properly.
11:02 Dyrcona Maybe it's time for lunch?
11:12 Dyrcona Or even make the preprocessor part of the loader.
11:15 Dyrcona Or, maybe just switch to Bmagic's loader that I've been meaning to test.
11:18 Dyrcona This is still too "hands on" for my taste.
11:28 Dyrcona @marc 022
11:28 pinesol Dyrcona: The ISSN, a unique identification number assigned to a continuing resource. (Repeatable) [a,y,z,2,6,8]
11:28 Dyrcona @marc 028
11:28 pinesol Dyrcona: The formatted number used for sound recordings, printed music, and videorecordings. Publisher's numbers that are given in an unformatted form are recorded in field 500 (General Note). A print constant identifying the kind of publisher number may be generated based on the value in the first indicator position. (Repeatable) [a,b,6,8]
11:28 Dyrcona @marc 024
11:28 pinesol Dyrcona: A standard number or code published on an item which cannot be accommodated in another field (e.g., field 020 (International Standard Book Number), 022 (International Standard Serial Number) , and 027 (Standard Technical Report Number)). The type of standard number or code is identified in the first indicator position or in subfield $2 (Source of number or code). (Repeatable) [a,c,d,z,2,6,8]
11:29 Dyrcona Anyone know where UPCs usually show up?
11:29 Dyrcona @marc 026
11:29 pinesol Dyrcona: Used to assist in the identification of antiquarian books by recording information comprising groups of characters taken from specified positions on specified pages of the book, in accordance with the principles laid down in various published guidelines. (Repeatable) []
11:29 Dyrcona @marc 025
11:29 pinesol Dyrcona: A number assigned by the Library of Congress to an item that was acquired through one of its overseas acquisition programs. (Repeatable) []
11:39 Dyrcona Looks like 024.
11:46 rhamby yeah, should be 024s though the indicator for it is rarely set in my experience

Results for 2022-03-18

09:06 Keith_isl joined #evergreen
09:30 jvwoolf joined #evergreen
10:15 terranm joined #evergreen
10:29 Dyrcona If I want to find a record in the database that has a MARC tag with two particular subfield values, there doesn't seem to be a quick way to do that with a double join on metabib.real_full_rec and even then, I'm not guaranteed that they're in the same tag. Am I missing something?
10:36 Dyrcona I could probably add a "search" index using xpath or something, but I always get lost in a maze of twisty config and metabib tables when I do that.
10:43 rjackson_isl_hom joined #evergreen
10:45 Dyrcona Thought I had an example of that in my old code, but I can't find it. I know that I created such things in the database in the past.
10:54 Dyrcona berick: Yeah, but I'm trying to avoid a join like that.
10:55 Dyrcona I'd probably have to join mravl and crad anyway.
10:57 Dyrcona I don't even need a join, yet. There won't be any other records that would match this tag and a particular subfield value, but I can't guarantee that will always be the case (though it very likely will be).
10:58 Dyrcona Plus, the field will have dates in it, and I should also use those dates in my query.... This is MARC tag 583 for anyone who is curious.
11:02 Dyrcona I'm trying to come up with a query to get bre ids to pipe into marc_export. If I end up having to deal with dates, then I may just have to write a custom export to pick the marc apart in Perl.
11:03 Dyrcona I don't even need the query today. I'm trying to figure out something that might work, so I can estimate how long it will take to implement.
11:04 Dyrcona @marc 583
11:04 pinesol Dyrcona: Contains information about processing, reference, and preservation actions. (Repeatable) [a,b,c,d,e,f,h,i,j,k,l,n,o,u,x,z,2,3,5,6,8]
11:08 Dyrcona I basically want to be able to find records where a single 583 has $f = some value $5 = someother value $c < today's date and $d > today's date (if $d is even there)
11:09 Dyrcona Oh, and a bunch of other criteria, like a certain member library has a non-deleted asset.copy entry with a specific circ modifier...
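
A minimal sketch of the "pick the MARC apart" approach for the 583 criteria above, written in Python against MARCXML rather than as the Perl export under discussion. It checks that a single 583 carries the wanted $f and $5, that $c is before today, and that $d is either absent or after today. The function name, the sample values, and the assumption that $c and $d hold YYYYMMDD strings are all hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def has_current_583(marcxml, f_value, five_value, today):
    """True if one 583 field matches $f and $5 and today falls inside its $c/$d range."""
    record = ET.fromstring(marcxml)
    for field in record.iterfind(".//marc:datafield[@tag='583']", NS):
        # Collect the first occurrence of each subfield code in THIS field only.
        subs = {}
        for sf in field.iterfind("marc:subfield", NS):
            subs.setdefault(sf.get("code"), (sf.text or "").strip())
        if subs.get("f") != f_value or subs.get("5") != five_value:
            continue
        # Assumes YYYYMMDD dates, which compare correctly as strings.
        started = "c" in subs and subs["c"] < today
        not_ended = "d" not in subs or subs["d"] > today
        if started and not_ended:
            return True
    return False


SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="583" ind1=" " ind2=" ">
    <subfield code="c">20200101</subfield>
    <subfield code="f">Example Program</subfield>
    <subfield code="5">OtherLib</subfield>
  </datafield>
  <datafield tag="583" ind1=" " ind2=" ">
    <subfield code="c">20200101</subfield>
    <subfield code="d">20350101</subfield>
    <subfield code="f">Other Program</subfield>
    <subfield code="5">ExLib</subfield>
  </datafield>
</record>"""

print(has_current_583(SAMPLE, "Other Program", "ExLib", "20220318"))
```

Because the subfields are collected per field, a record where $f and $5 match in two different 583s is rejected, which is the guarantee a double join on metabib.real_full_rec does not give.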

Results for 2022-03-14

12:43 Dyrcona It could be an entirely different character set, too, but I got pretty much the same error as you.
12:43 Bmagic right, we've all had this pain. I've posted here over the years about it. This project is slightly different, and my understanding of the issue has matured over the years
12:44 Bmagic I've got the file open in MARCEdit. I'm seeing lots of &#xA0 and {acute}
12:45 Dyrcona So, I'd recommend setting the leader to UTF-8 and seeing what happens. You can do that with $marc->encoding('UTF-8');
12:46 Bmagic right, let's see
12:48 Bmagic that did the trick. I would like to try it one way, catch an error, then handle it the other way. Can I "get the message" that MARC::Record bombed? I'm just seeing a screen dump, but my program goes on
12:50 Dyrcona Bmagic: You're getting a warning, I think from MARC::Charset? You can do some stuff with signal in Perl to make the warnings fatal and then use an eval block to trap them.
12:52 Bmagic ah! I'll look that up, thanks
12:53 Dyrcona Here's an example using eval {...}; if ($@) { ... } to log errors: https://pastebin.com/g4RGDJLr
12:53 Bmagic Dyrcona++
12:57 Dyrcona In that example any fatal errors that happen inside the eval {} get handled by the code in the if ($@) {}.
12:57 Bmagic Also: I see that you're reading the file raw with a separator "\x1E\x1D". I suppose that's the best way? You find that reading the file yourself (instead of MARC::Batch) is better?
12:57 Dyrcona eval does two different things depending on how you use it.
12:58 Dyrcona Bmagic: MARC::Batch usually works. I do tend to read the file manually because I've had issues in the past with records containing "smart" quotes. One of them looks like the end of record character.
12:58 Bmagic I'll see if this error (now that I've narrowed it down to a particular record) will cause "die" or not
12:59 Bmagic your example forces a die when @warnings
12:59 jihpringle joined #evergreen
13:00 Dyrcona Right. But not all warnings, just warnings from MARC::Record.
13:00 Dyrcona I'm not sure the MARC::Charset warning shows up there, but you can try it.
13:00 Bmagic This program I'm writing needs to be pretty hardy. Handling millions of records every year. So, I think I'll go with the manual reading of the files with raw, like what you've got there
13:02 Dyrcona Suit yourself. MARC::Batch and MARC::File go to great lengths to read in otherwise garbage records, but they blow up sometimes on otherwise well-formed MARC.
13:05 Bmagic haha, so, it's more* hardy to use MARC::File/Batch.... Maybe I go with MARC::File.... and if it breaks, catch it and go RAW manual. off to find a file that blows MARC::File
13:06 Dyrcona The problem I'm fixing by reading the records the way that I do comes from copy/paste cataloging.
13:06 Bmagic yeah, I bet I can copy/paste a smart quote into a record and produce this issue
13:10 Bmagic Dyrcona++
13:12 Dyrcona I thought tsbere submitted a patch for that one, but it didn't work. Then again, I also think tsbere opened a separate bug. I don't know where that bug has gone.
13:13 Dyrcona I spent a little time working on tsbere's patch, but kind of gave up.
13:15 Dyrcona Oh, right. tsbere made a PR on github: https://github.com/perl4lib/marc-perl/pull/4
13:16 Dyrcona Too many bug trackers....
13:18 JBoyer joined #evergreen
13:18 Bmagic It would be nice if MARC::File would just handle it

Results for 2022-03-04

09:55 tsadok_ joined #evergreen
10:01 stephengwills joined #evergreen
10:01 stephengwills left #evergreen
10:14 Dyrcona Why is working with MARC so difficult? Now, I'm trying to dump some records from the database to a binary MARC file in UTF-8, and of course, it's mangling the "fancy" characters. You'd think I'd have this down pat by now.
10:16 Dyrcona Now, if I can just keep my fat palms off of the touchpad.... (I keep "clicking" inadvertently in another window.)
10:17 * csharp_ removes offensive trolling from the IRC logs
10:18 Dyrcona OK, when pulling records from the DB, use (BinaryEncoding => 'utf8') on the use MARC::File::XML line, and if using IO::File to write the output do $fh->binmode(':utf8');
10:19 Dyrcona csharp_: I'm half curious to see the trolling because I missed it, but I hope you're not referring to my monologues. :)
10:21 csharp_ oh, I deleted those too :-)
10:21 csharp_ I KEED I KEED

Results for 2022-03-02

10:24 Dyrcona I was trying to find an Emacs command or function to make a file: URL from a path. Doesn't look like such a thing exists, but after 15 minutes of searching the function help and online, I decided, "I could have implemented it by now."
10:29 Dyrcona I still haven't implemented it, but it might be a handy thing to have.
10:35 Dyrcona Of course, there's a Jabber/XMPP library for Emacs..... :)
10:44 Dyrcona MARC::Charset apparently does not like EM DASH: \xE2\x80\x94.
10:50 Dyrcona The message suggests that triplets beginning with \xE2\x80 are the problem: no mapping found for [0x80] at position 24
10:50 Dyrcona gmcharlt ^^ Should I file a bug on MARC::Charset?
10:55 Dyrcona FWIW: I'm using this program to load some binary MARC records: https://pastebin.com/g4RGDJLr
11:01 miker Dyrcona: doesn't like going from MARC8 to UTF-8?
11:06 Dyrcona The file is UTF-8.
11:08 miker hrm... that's strange ... could the records be claiming MARC8? IIRC we do look at the leader, but I think there's a way to force the issue
11:20 Dyrcona Now, my dump program is complaining about wide character in print. It didn't before I set the encoding....
11:29 miker are you calling MARC::Charset->assume_unicode(1); before processing the records? (that's the "force the issue" option)
11:34 Dyrcona miker: No. You can see the program that I'm using to load the records. I haven't shared the preprocessor, yet.
11:35 Dyrcona I was going to ask if anyone in here has used pymarc much. (I've looked at it.) Python usually has better charset handling than Perl does, and I wonder if anyone has used chardet to detect the charsets used by MARC records.
11:35 Dyrcona I've found all kinds of crap in MARC from 3rd parties.
11:36 miker right on. fwiw, many of our scripts and db functions use both assume_unicode(1) and ignore_errors(1). see the top part of Open-ILS/src/extras/import/marc_add_ids for instance
11:38 Dyrcona Well, I've set the 09 in the leader to 'a' for this batch. When I open the file in Emacs it looks like UTF-8.
11:39 Dyrcona I'm running it again in another db with a fresh copy of production. I reloaded them after yesterday's tests.
11:44 Dyrcona Looks like you're only expected to use it on output.
11:45 Dyrcona The documentation needs to be updated or PyMarc does: "When I can require python 2.3, this will go away."
11:47 Dyrcona I might play with this some time, but for now I'm sticking to Perl.
11:50 Dyrcona It's funny that my preprocessor, using MARC::Record, has no problem with these records, but I guess I'm not asking it to do any charset conversion. Well, it has no problem since I fixed the double encoding bug. :)
11:58 Dyrcona IIRC, I think I had to set the charset in these records yesterday, but I forgot when I ran the preprocessor after making changes to it this morning.
11:59 Dyrcona Yeah, it has passed the author with the different characters for the first letter of the last name with no warnings or errors.
12:03 jihpringle joined #evergreen
14:33 pinesol csharp_: go with explicit
14:35 Dyrcona I only have those git add problems when I add things to git on the servers. :)
15:11 gmcharlt Dyrcona: re MARC::Charset, yes
15:13 Dyrcona gmcharlt: I'm not sure it's a bug in MARC::Charset, now. It's bad data combined with user error/forgetfulness.
15:13 gmcharlt Dyrcona: ah, OK
15:13 Dyrcona I thought the records were specifying UTF-8, but they weren't, even though there were in fact encoded in UTF-8.
15:14 Dyrcona s/there/they/ # It's getting late and the fingers are tired... :)
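
The mismatch described above (records that are UTF-8 on disk but do not say so in the leader) can be checked mechanically. A minimal sketch, assuming raw binary MARC records are already in hand as bytes; the function name and the sample record are hypothetical, and only the Python standard library is used:

```python
from typing import Tuple


def declared_vs_actual(raw: bytes) -> Tuple[bool, bool]:
    """Return (leader_says_unicode, bytes_are_valid_utf8) for one raw record."""
    # Leader position 09 is the character coding scheme:
    # 'a' means UCS/Unicode, a blank means MARC-8.
    leader_says_unicode = raw[9:10] == b"a"
    try:
        raw.decode("utf-8")
        bytes_are_valid_utf8 = True
    except UnicodeDecodeError:
        bytes_are_valid_utf8 = False
    return leader_says_unicode, bytes_are_valid_utf8


# Hypothetical record: blank in leader/09, but the data is really UTF-8.
sample = b"00000nam  2200000   4500" + "caf\u00e9 \u201cquoted\u201d".encode("utf-8")
print(declared_vs_actual(sample))  # (False, True)
```

A result of (False, True) is only a hint, since a pure-ASCII MARC-8 record also decodes cleanly as UTF-8; it is most telling when the record contains non-ASCII bytes.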

Results for 2022-03-01

09:24 * Dyrcona mumbles "smart quotes...."
09:27 Dyrcona Also, just plain junk in these records.
09:33 Dyrcona I wonder if we're using an outdated Unicode standard?--That "we" is meant to be vague, i.e. not necessarily Evergreen.
09:39 Dyrcona So, it looks like the preprocessing script does something to the records. Maybe I need to tell MARC::Record not to mangle the characters, somehow?
09:56 Dyrcona Or, maybe it doesn't.... Comparing dumps of the processed versus raw records, the relevant bits of the busted records look the same.
10:08 Dyrcona I wonder if converting the records to marcxml will make a difference?
10:10 Dyrcona So, do I convert them with yaz-marcdump or with MARC::File::XML?
10:15 Dyrcona Right, so when I use yaz-marcdump to convert the input records to marcxml, my editor shows the characters correctly. For some reason, I think my editor is treating the dumps as latin-1, even when I tell it they are UTF-8. What I suspect is MARC::Record and friends are mangling the characters because I'm working with binary MARC.
10:16 Dyrcona I have little proof, other than what I see in the files through my editor, and that the records get mangled by my preprocessor Perl program.
10:17 Dyrcona I will adapt my preprocessor and loader to work with marcxml and see what happens.
10:18 Dyrcona BTW, the input records say they are UTF-8 in the leader.
10:18 * Dyrcona quacks.
10:19 Dyrcona Hmm... Should I use MARC::File::XML on these records, or should I use LibXML? I can do what I want with either.....
10:23 * Dyrcona should write a MARC mode for Emacs. It couldn't be that hard... :)
10:26 Dyrcona @monologue
10:26 pinesol Dyrcona: Your current monologue is at least 15 lines long.
10:28 Dyrcona So, yeah, the MARC::Record code that I'm using is mangling the characters.
10:30 Dyrcona When I tell Emacs to use UTF-8 with one of the files, I get this: "...encountered characters it couldn’t encode..." followed by a list of characters that won't paste into my IRC client.
10:30 Dyrcona The error message is much more detailed.
10:31 Dyrcona I open the original MARC file and the mangled characters show up correctly in Emacs.
10:31 Dyrcona Proof!
10:55 rjackson_isl_hom joined #evergreen
11:01 Dyrcona I wonder if the problem is how I'm reading the binary files. I open each file via IO::File with the record separator set to \x1e\x1d because records with smart quotes would break with MARC::Batch or MARC::File::USMARC. After I get the raw MARC, I feed that to MARC::Record. I suspect that is where the breakage occurs.
11:01 Dyrcona Could be that I need to decode the data before passing it to MARC::Record?
11:01 rjackson_isl_hom joined #evergreen
11:01 Dyrcona Or would that be encode?
11:03 Dyrcona Well, I can try it and see.
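
On the decode-or-encode question: bytes read from a file are decoded into characters, and characters are encoded back into bytes on output. A minimal Python sketch of the reading approach described above, splitting on the \x1e\x1d terminator pair; the function name and sample data are hypothetical, and this says nothing about what MARC::Record itself expects as input:

```python
RECORD_END = b"\x1e\x1d"  # field terminator followed by record terminator


def split_records(blob: bytes) -> list:
    """Split a blob of concatenated binary MARC into raw records."""
    # bytes.split drops the separator, so put it back on each record.
    return [part + RECORD_END for part in blob.split(RECORD_END) if part]


# Hypothetical two-record blob; real records start with a 24-byte leader.
blob = b"one\x1e\x1d" + "two \u201d".encode("utf-8") + RECORD_END
records = split_records(blob)

# Reading: bytes -> str is *decode*. Writing: str -> bytes is *encode*.
text = records[1].decode("utf-8")
```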

Results for 2022-02-28

13:55 jeffdavis I think PINES has the fixes for those bugs so I am assuming this is a 3.8-specific issue, probably with the new holdings editor?
13:59 jeffdavis well not necessarily the holdings editor, forget that bit
14:59 collum joined #evergreen
15:01 Dyrcona Anybody ever seen this one before: Use of uninitialized value $code_wanted in string eq at /usr/share/perl5/MARC/Field.pm line 314, <GEN1> chunk 1.
15:05 Dyrcona Ok. Figured it out: Can't call method "subfield" on an undefined value at /home/opensrf/scripts/prep-od-advantage line 76, <GEN1> chunk 9.
15:06 Dyrcona I tried getting something out of the record that doesn't exist, and didn't properly check for its existence before trying to use it.
15:52 jeffdavis I spoke too soon! no children available for open-ils.actor (1442 warnings)

Results for 2022-02-25

09:34 JBoyer Dyrcona, something else to consider from that message is whether or not you have any local MODS or other xslt transforms.
09:37 Dyrcona JBoyer: I think we do, so I'll check that. Thanks for the suggestion.
09:47 jeff chopPunctuation, chopPunctuation, chopPunctuation... heh.
09:49 Dyrcona Yeah, I haven't looked but I suspect a busted field in the MARC. Maybe I should dump it now before someone changes it.
09:54 jvwoolf Dyrcona: Before I forget again, I wanted to say that we tested the patch in 1482757 and it worked fine. We've got it running in production now.
09:54 jvwoolf Let's see if I can get that to link correctly - lp1482757
09:54 Dyrcona Lp 1482757
09:55 * JBoyer shakes fist at anything case-sensitive that's not a password
09:55 Dyrcona jvwoolf: It works fine for me, too. It just doesn't speed things up in a noticeable way. Also, this process is still really slow on Pg 12+.
09:56 jvwoolf Dyrcona: It sped up importing eresources pretty significantly for us
09:56 Dyrcona So, just dumping the MARC to the screen I think I see the problem. There's a field that ends with a lot of blank spaces.
09:56 Dyrcona Well, subfield....
09:57 jvwoolf Also, we removed the 30 million deleted URI call numbers and our call number reports work again. We also haven't had any drone timeouts since then, but that could be a coincidence.
09:57 Dyrcona jvwoolf: I guess, but I was looking for improvement on later Pg versions, which that patch doesn't do. If you want to sign off, feel free.
11:16 Dyrcona So, that would be 444 templates?
11:16 Dyrcona jeff: A qualified yes on sharing it. A definite yes on I still have the record.
11:17 Dyrcona I should ask before I share it for reasons.
11:21 Dyrcona Apparently, it's an item that no one else is likely to have.  It's MARC type a (a book?) about the construction of one of our libraries. Looks to be a gift from the construction company.
11:28 Dyrcona Of course the spaces don't show up in the staff client.
11:29 Dyrcona The 300 field also looks a little messed up in the editor. Subfield a is just ";"
11:36 Dyrcona Page count is missing.
11:59 Dyrcona Just to confirm: It blows up on any of the mods transforms. (mods3 gives a missing file error. I should look into that, but doesn't look like we use mods3.) marc21expand880 works, but it doesn't strip out the extra spaces.
12:00 Dyrcona jeff: Yeah. that's what it looks like.
12:00 Dyrcona It's obvious in the marcxml. Not so obvious elsewhere.
12:01 Dyrcona I should probably use MARC::Record to fix it so that the length is updated correctly, but I should be able to just subtract 222 from the current value.
12:03 jeff I have mixed feelings about trying to maintain size in a marcxml record. :-)
12:04 jeff we mostly (and should always) ensure that it's updated/correct on conversion from marcxml to binary MARC format.
12:04 Dyrcona Some of our vendors have strong opinions about it.
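
The length under discussion lives in leader positions 00-04: a five-digit, zero-padded byte count for the whole binary record. A minimal sketch of recomputing it, assuming a raw binary record as bytes; the function name is hypothetical. In binary MARC the directory also stores a length and starting offset for every field, so trimming a subfield needs those updated as well, which is a good reason to let a MARC library do the rewrite:

```python
def fix_record_length(raw: bytes) -> bytes:
    """Rewrite leader positions 00-04 to match the record's real byte length."""
    if len(raw) > 99999:
        raise ValueError("record too long for a 5-digit MARC length")
    # The length counts every byte of the record, including the leader itself.
    return f"{len(raw):05d}".encode("ascii") + raw[5:]


# Hypothetical leader-only record whose stated length (99999) is wrong.
fixed = fix_record_length(b"99999nam a2200000   4500")
```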

Results for 2022-02-15

08:36 mantis1 joined #evergreen
08:38 mmorgan joined #evergreen
09:12 Dyrcona joined #evergreen
09:19 Dyrcona If I get this message while loading MARC records "no mapping found for [0x80]": a) is that a warning or an error (i.e. does the record load anyway) and b) anyone had to fix these before and got any tips that might save me an hour or so of fumbling around?
09:20 Dyrcona The records in the file claim to be UTF-8, but we all know how that works out in reality.
09:26 Dyrcona Well, I can say that those messages are not making it to MARC::Record->warnings. Because I would log those to my log file.
09:28 Dyrcona Grr. Because I botched the shortname in 856$9, they're not showing up in my custom view....
09:29 Dyrcona That means I can't get a quick count.
09:30 Dyrcona I can also say that the messages didn't trigger my error handler in the eval because there's no error log.
09:31 pinesol Dyrcona: Vendor records is probably integrated with systemd
09:35 jvwoolf joined #evergreen
09:39 * Dyrcona reloads the database to give it another go.
09:49 Dyrcona Think I'll redirect stderr to a file. I'm not sure where these messages are coming from, either when I create the MARC::Record from the raw marc data, or when doing the insert.
09:49 Dyrcona Most likely the former.
09:58 JBoyer Bmagic_, I'm told the cert for the MOBIUS bugsquash server could use a refresh sometime.
10:01 Bmagic_ oh! will do
10:41 csharp_ jvwoolf: re: slow reports, have you done any query analysis to see what it's doing? (e.g. EXPLAIN/EXPLAIN ANALYZE)
10:41 csharp_ note that EXPLAIN ANALYZE actually runs the query, so if it's timing out, that may not work
10:55 rfrasur joined #evergreen
10:59 Dyrcona So, back to my MARC::Charset thing from earlier. I basically copied the example from the MARC::Lint manpage and ran that on my MARC input file, and it doesn't report any character set issues, though it does complain about punctuation and the subfield 9 in the 856.
11:06 Dyrcona Oh, interesting. It looks like MARC::File::USMARC->next returns raw MARC and doesn't create a MARC::Record object.
11:09 Dyrcona Oh, never mind. It does. I was looking at _next().
11:09 csharp_ Dyrcona: I've seen that kind of thing when processing records with garbage characters too
11:10 csharp_ pretty sure we were able to find them in the text and replace/remove them
11:13 Dyrcona So, I don't see the bad character message from MARC::File::USMARC because it builds the MARC::Record differently. It adds the fields as it finds them, does some checking of its own. My program reads the raw MARC from the file and does MARC::Record->new_from_usmarc().
11:14 Dyrcona Some of these look like "smart" quotes. Probably Windows-1252, again. :(
11:14 csharp_ yeppers
11:14 Dyrcona I wish people would learn that you can't just copy and paste into a MARC record.
11:14 csharp_ edit in Word, paste into MARC
11:15 csharp_ WYSIWY(don't actually)G
11:15 Dyrcona :)
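
Finding the stray bytes before replacing them can be done without any MARC library. A minimal sketch, assuming the data is supposed to be UTF-8; the function name and sample are hypothetical. It reports the offset of every byte that cannot be part of valid UTF-8, which is where Windows-1252 "smart" quotes (0x93 and 0x94) would show up:

```python
def find_bad_bytes(raw: bytes) -> list:
    """Return (offset, byte) pairs for bytes that are not valid UTF-8."""
    bad = []
    pos = 0
    while pos < len(raw):
        try:
            raw[pos:].decode("utf-8")
            break  # the rest of the data is clean
        except UnicodeDecodeError as err:
            # err.start is relative to the slice, so add pos back on.
            offset = pos + err.start
            bad.append((offset, raw[offset]))
            pos = offset + 1  # skip the offending byte and keep scanning
    return bad


# Hypothetical field data with Windows-1252 curly quotes pasted in.
sample = b"say \x93hi\x94"
hits = find_bad_bytes(sample)          # [(4, 0x93), (7, 0x94)]
as_cp1252 = bytes(b for _, b in hits).decode("cp1252")
```

Decoding just the offending bytes as cp1252 is a quick test of the Windows-1252 theory; if that raises an error or yields nonsense, the data is more likely plain garbage.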
12:51 Dyrcona At Pg 11, most things start getting faster, but others seem to get hit.
12:52 Dyrcona We do a dump and restore at least whenever we get new hardware.
12:52 Dyrcona I do them weekly to keep some test databases up to date. Anyway, getting off topic.
12:54 Dyrcona Going back to my record load/character issues, the message isn't coming from MARC::Record, either. I've got a program to dump the 856s that uses the same code to read the records and it doesn't peep.
12:56 Dyrcona I wonder if I can just convert the raw MARC data or if I need to do it field by field....
13:04 Dyrcona Now, it's just looking like some garbage and not necessarily Windows 1252... Probably a mix.... :(
13:07 Dyrcona I think I know the source of the messages though. It looks like they're possibly coming from NFD() when the MARC is cleaned to go in the database. I can check that hypothesis real quick.
13:09 Dyrcona And, no. That's not it, either.
13:11 Dyrcona g0=ASCII_DEFAULT g1=EXTENDED_LATIN at /usr/local/share/perl/5.26.1/MARC/Charset.pm line 308, <GEN1> chunk 1752.
13:18 Dyrcona Still don't know where that's coming from. I'm not using MARC::Charset, and doesn't look like the modules that I use do either. It must be the database throwing that at me....
13:57 JBoyer Dyrcona, MARC::XML and friends are used in the ingest trigger functions, so depending on where you're actually seeing that message output the database is a likely source.
13:59 Dyrcona JBoyer: Yeah. It doesn't seem to be coming from anything running in my Perl code outside of the database, but I didn't think warnings from the database would just show up in my output.
14:00 Dyrcona Oh, never mind. They will because of the DBI options.
15:40 Dyrcona Aight, so it is a mangled UTF-8 "smart quote." The sequence should probably be 3 bytes: \xe2\x80\x9d.
15:41 Dyrcona Ah, this character appears before it: â
15:43 Dyrcona Which is \xe2.....
15:43 Dyrcona MARC::Charset is apparently not dealing with it correctly, or I need to update my Unicode support....
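
An "â" sitting in front of a quote is what a valid three-byte UTF-8 sequence looks like when it is read one byte per character. A minimal sketch of that effect using Latin-1 as the stand-in single-byte charset; it illustrates one possible cause and does not establish what happened to these records:

```python
quote = "\u201d"                      # RIGHT DOUBLE QUOTATION MARK
utf8_bytes = quote.encode("utf-8")    # b'\xe2\x80\x9d'

# Misreading those bytes as Latin-1 turns one character into three:
# 'â' (0xE2) followed by two invisible C1 control characters.
misread = utf8_bytes.decode("latin-1")

# Writing the misread text back out as UTF-8 produces the double-encoded form.
double_encoded = misread.encode("utf-8")

# The damage is reversible as long as no bytes were dropped along the way.
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")
```

The same bytes read as MARC-8 rather than Latin-1 would hit 0x80 as an unmapped byte, which would be consistent with the "no mapping found for [0x80]" warning if the record is being treated as MARC-8.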
15:54 Dyrcona Ugh.... I somehow killed the program.... Not sure what I did.
16:00 Dyrcona gmcharlt: It looks like MARC::Charset doesn't handle \xe2\x80\x9d correctly.
16:02 Dyrcona I wonder what happens if I replace those in Perl with a "?
16:11 Dyrcona Also, if the file is UTF-8, and that is a valid UTF-8 sequence, what's MARC::Charset got to do with it?
16:12 Dyrcona I'ma just leave it alone and reload the file again tomorrow after a db reload.
16:13 mmorgan Going home and coming back again is always a good approach :)
16:14 Dyrcona Unfortunately, I am at home.
16:14 Dyrcona And, I'll be tomorrow, too. :)
16:15 mmorgan There's always turning it off and on again!
16:15 Dyrcona I don't think this should be my problem. Things actually look good for a change with the data. I was blaming the vendor, but it looks like MARC::Charset and/or Evergreen's use of it is at fault. Unless this one record isn't set to UTF-8 or something when it should be.
16:16 Dyrcona Also, I have little context for the warnings. I'd have to search the MARC for the offending text/codes.
16:16 Dyrcona No record number, which tells me this isn't coming from my code because all errors and warnings are logged with the record number from the file.
16:23 abowling joined #evergreen
16:24 abowling after a 3.7 update, links in 856 are no longer appearing in the opac. the library has custom templates, but i diffed the relevant ones and it seems nothing has changed. any ideas on what i might be missing?
