Time | Nick | Message |
01:48 |
|
davidcandlestick joined #evergreen |
03:08 |
|
v101 joined #evergreen |
03:08 |
|
v101 left #evergreen |
03:11 |
|
mrpeters joined #evergreen |
07:04 |
|
timlaptop joined #evergreen |
07:55 |
|
akilsdonk joined #evergreen |
08:14 |
|
collum joined #evergreen |
08:21 |
|
kbeswick joined #evergreen |
08:38 |
|
Shae joined #evergreen |
08:56 |
|
berick joined #evergreen |
08:59 |
|
mmorgan joined #evergreen |
09:02 |
|
Dyrcona joined #evergreen |
09:06 |
|
rfrasur joined #evergreen |
09:09 |
rfrasur |
rjackson-isl, is there something up with reports? |
09:11 |
rjackson-isl |
reports are not back - it was decided to wait another day due to the response issues |
09:11 |
rfrasur |
Okay, gotcha. Then I'll stop querying the db ;) |
09:40 |
|
yboston joined #evergreen |
09:57 |
jl- |
I'm having some difficulties here,following this import example: http://docs.evergreen-ils.org/dev/_migrating_your_bibliographic_records.html -- 1) my marc records already are marcxml but not one record per line |
09:57 |
jl- |
and I can't use bash or anything to parse/format xml |
09:58 |
jl- |
what tool can I use to make it line format for a staging table? |
09:59 |
paxed |
there is that piece of python code there... |
10:00 |
jeff |
...which could probably be easily modified to consume marcxml rather than marc |
10:00 |
Dyrcona |
jl-: There is actually a Perl routine in the Evergreen utility code that you can use: OpenILS::Utils::Normalize::clean_marc. |
10:00 |
jeff |
hrm. pymarc has MARCReader and JSONReader, but not a marcxml variant. |
10:01 |
Dyrcona |
clean_marc is used by most of Evergreen to clean marcxml before insert into the database. |
10:03 |
Dyrcona |
This little bit of Perl will make the marcxml be all on one line: $xml =~ s/\n//sog; |
10:03 |
paxed |
or perhaps something (untested) like sed -i -e 's/\n//g' records.xml && sed -i -e 's#</record><#</record>\n<#g' records.xml |
10:03 |
Dyrcona |
TIMTOWTDI! |
10:03 |
jl- |
jeff correct |
10:04 |
|
dluch joined #evergreen |
10:04 |
Dyrcona |
blarg. python. :) |
10:04 |
jl- |
paxed I don't think bash should be used to parse xml? |
10:04 |
jl- |
Dyrcona please explain that routine |
10:04 |
* Dyrcona |
doesn't think xml should be used, but that's another debate. :) |
10:04 |
dbs |
yaz-marcdump -i marcxml -o marc filename.xml > filename.mrc :) |
10:05 |
* dbs |
suspects Dyrcona's regex would mash all of the records onto 1 line, instead of one per line |
10:05 |
* Dyrcona |
assumes you're processing 1 record at a time at the point you use the regex. |
10:05 |
dbs |
TIMTOWTDI and unfortunately It All Depends |
10:06 |
dbs |
Dyrcona: oh, but jl- doesn't have anything parsing the marcxml qua marcxml records yet. just a marcxml file |
10:06 |
Dyrcona |
jl-: The code is in Open-ILS/src/perlmod/lib/OpenILS/Utils/Normalize.pm if you want to look at it. |
10:07 |
dbs |
ergo easiest method might be to roll back to MARC binary via yaz-marcdump and just use that little bit of python |
10:07 |
jl- |
hmm |
10:07 |
jeff |
a well reasoned and tested "what to do if your data is already in marcxml format, but not one-record-per-line" (and that might be use yaz-marcdump to go marcxml to MARC) would probably be a useful addition to http://docs.evergreen-ils.org/dev/_migrating_your_bibliographic_records.html |
10:07 |
Dyrcona |
No. Write a little something with MARC::Batch. |
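[Editor's note: a minimal sketch of the "little something with MARC::Batch" suggested above, assuming MARC::File::XML is installed and the input is a single well-formed MARCXML collection file; the file names are placeholders and the script is untested against any particular data set.]

    #!/usr/bin/perl
    # Sketch: read a MARCXML collection and write one <record>...</record> per line,
    # ready for loading into a staging table as in the migration docs.
    use strict;
    use warnings;
    use MARC::Batch;
    use MARC::File::XML;

    my $batch = MARC::Batch->new('XML', 'records.xml');   # placeholder input file
    $batch->strict_off;                                    # skip, rather than die on, bad records

    open my $out, '>:utf8', 'records_one_per_line.xml' or die $!;
    while (my $record = $batch->next) {
        my $xml = $record->as_xml_record();    # single record as MARCXML
        $xml =~ s/^<\?xml[^>]*\?>\s*//;        # drop the XML declaration if present
        $xml =~ s/\n//sg;                      # Dyrcona's trick: collapse to one line
        print {$out} $xml, "\n";
    }
    close $out;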
10:08 |
jl- |
dbs so roll back to .mrc |
10:08 |
yboston |
jeff++ |
10:08 |
jeff |
but updating the documentation doesn't help jl-'s immediate question :-) |
10:08 |
yboston |
jeff: I am about to deal with that scenario, "what to do if your data is already in marcxml format, but not one-record-per-line", any tips? |
10:09 |
jeff |
none better than those already suggested here by Dyrcona and dbs and paxed |
10:10 |
jl- |
rolling back |
10:11 |
jeff |
paxed: your suggestion of sed is probably fast and could work for most scenarios, but i fear it may summon a hybrid marcthulu/xmlthulu, similar to http://stackoverflow.com/a/1732454/157515 |
10:12 |
dbs |
and there's always the possibility of xml namespaces ruining the generic sed fun |
10:12 |
jl- |
I noticed that the .mrc file is about 1/3 the file of the xml |
10:12 |
dbs |
jl-: sounds about right |
10:12 |
jl- |
size |
10:13 |
jl- |
now running the python script to parse xml |
10:13 |
paxed |
jeff: possibly. perhaps i should've put a slightly stronger disclaimer in my suggestion... "may summon any of the Elder Gods" |
10:14 |
jl- |
"may make you cry in the bathroom and reconsider your career" |
10:14 |
dbs |
"What have I gotten myself into?" |
10:15 |
|
Wyuli joined #evergreen |
10:15 |
dbs |
"What have libraries done to themselves?" |
10:15 |
jl- |
^ |
10:19 |
|
ericar joined #evergreen |
10:19 |
jeff |
every phone call for a patron password reset that i handle is another reminder to get moving on bug 1264746 |
10:19 |
pinesol_green |
jeff: Error: Could not gather data from Launchpad for bug #1264746 (https://launchpad.net/bugs/1264746). The error has been logged |
10:20 |
jeff |
huh. well, that's ``Add "email password reset" button to user editor'' |
10:25 |
berick |
if anyone wants to help me quickly fix this : http://testing.evergreen-ils.org/~live/test.20.html |
10:25 |
berick |
see working/user/berick/pgtap-db-debs-check-sort-repair |
10:27 |
jeff |
but after that is "phone rings, web browser brings up list of matching patrons in web staff client, options for usual things including triggering a suggested username and triggering an e-mailed password reset" :-) |
10:27 |
* jeff |
looks at berick's url |
10:28 |
jl- |
ergreen=# SELECT staging_importer(); |
10:28 |
jl- |
ERROR: could not extend file "base/16385/371863.7": wrote only 4096 of 8192 bytes at block 969998 |
10:28 |
jl- |
dbs |
10:28 |
jeff |
jl-: you may have run out of disk space. |
10:28 |
paxed |
out of free space |
10:29 |
jl- |
ergreen=# SELECT staging_importer(); |
10:29 |
jl- |
ERROR: could not extend file "base/16385/371863.7": wrote only 4096 of 8192 bytes at block 969998 |
10:29 |
jl- |
HINT: Check free disk space. |
10:29 |
jl- |
CONTEXT: SQL statement "UPDATE staging_records_import SET dest = currval('biblio.record_entry_id_seq')" |
10:29 |
jl- |
PL/pgSQL function "staging_importer" line 6 at SQL statement |
10:29 |
jl- |
yes that's what it said |
10:29 |
jl- |
sorry for the spam |
10:30 |
jeff |
berick: quick fix looks good (i have not run it through pgtap). any opinion on whether evergreen.upgrade_list_applied_deprecates should be updated to return sorted values itself? |
10:32 |
berick |
jeff: thought about that, but couldn't think of a case where it mattered outside of the test |
10:33 |
eeevil |
jeff / berick: and order may be important (but, of course, that's available from config.upgrade_log, I suppose) |
10:40 |
|
_bott_ joined #evergreen |
10:43 |
berick |
eeevil: as in, the order of entries in supersedes[] or deprecates[] may be important, so evergreen.upgrade_list_applied_deprecates (etc) should not apply asciibetical sorting of patches it returns? |
10:45 |
eeevil |
berick: it's not important today, but may be in the future. but yes, that's what I meant |
10:45 |
berick |
right |
10:45 |
berick |
thanks |
10:45 |
berick |
ok, so i stand by my quick fix, then |
10:46 |
jeff |
have a launchpad bug yet? |
10:46 |
jeff |
or would this just be on the original bug? |
10:46 |
berick |
no, seemed like a waste of pixels |
10:46 |
berick |
original bug |
10:47 |
berick |
https://bugs.launchpad.net/evergreen/+bug/1262722 |
10:47 |
pinesol_green |
Launchpad bug 1262722 in Evergreen "DB patch supersede / deprecate func repairs; unit test" (affected: 1, heat: 6) [Undecided,Fix committed] |
10:47 |
* jeff |
waits for new two-factor token |
10:49 |
senator |
paxed: congratulations to your national hockey team (if you're into that sort of thing) |
10:49 |
paxed |
senator: i'm not really into hockey, but it's always nice to beat russia... :P |
10:49 |
mrpeters |
osrfsys.10.log:2014-02-19 10:39:08 brick01-head osrf_http_translator: [ERR :2223:CStoreEditor.pm:114:1392803222222329] editor[0|0] request error open-ils.cstore.json_query.atomic : {"from":["metabib.suggest_browse_entries","\"","title","StartSel=\"<span class='oils_AS'>\", StopSel=</span>","-1","10","14"]} : Exception: OpenSRF::DomainObject::oilsMethodException 2014-02-19T10:39:08 OpenILS::Utils::CStoreEditor /usr/local/share/pe |
10:49 |
mrpeters |
osrfsys.10.log:2014-02-19 10:39:08 brick01-head osrf_http_translator: [ERR :2223:EX.pm:66:1392803222222329] Exception: OpenSRF::DomainObject::oilsMethodException 2014-02-19T10:39:08 OpenILS::Utils::CStoreEditor /usr/local/share/perl/5.14.2/OpenILS/Utils/CStoreEditor.pm:453 <500> Severe query error -- see error log for more details |
10:49 |
mrpeters |
osrfwarn.log:2014-02-19 10:39:08 brick01-head open-ils.cstore: [ERR :2228:oils_sql.c:5585:1392803222222329] open-ils.cstore: Error with query [SELECT * FROM metabib.suggest_browse_entries( '"', 'title', 'StartSel="<span class=''oils_AS''>", StopSel=</span>', '-1', '10', '14' ) AS "metabib.suggest_browse_entries" ;]: 0 ERROR: syntax error in tsquery: ":*"#012CONTEXT: PL/pgSQL function "suggest_browse_entries" line 12 at assignm |
10:49 |
mrpeters |
those ring any bells for anyone? newly upgraded 2.5.2 system... |
10:49 |
senator |
paxed: :-) |
10:50 |
mrpeters |
basically just stacking the system with queries from that and blocking normal activity from functioning |
10:50 |
* jeff |
hands mrpeters a paste.evergreen-ils.org or gist.github.com or something |
10:50 |
mrpeters |
argh yeah sorry, that was longer than i thought |
10:51 |
jeff |
mrpeters: "severe query error" is a good clue to consult your postgresql logs to see what the error was from postgresql's POV. |
10:51 |
senator |
mrpeters: that's an autosuggest query for a quotation mark |
10:51 |
senator |
short term, you may consider disabling autosuggest until you figure why there are so many of those or why they take a long time/error out/whatever |
10:51 |
berick |
@isitdown docs.evergreen-ils.org |
10:51 |
pinesol_green |
berick: http://cat.evergreen-ils.org.meowbify.com/ |
10:51 |
mrpeters |
senator: aha, interesting |
10:53 |
jeff |
of course, if the "severe query error" is that you killed the query because it was long running, your postgresql logs may not be helpful. i don't know for certain that severe query error is shown in the opensrf logs when you kill a query from pg_cancel_backend or similar |
10:54 |
mrpeters |
jeff: yeah, apparently they have some process that runs and kills things after 3 minutes |
10:54 |
mrpeters |
i want to knock that off for now |
10:55 |
jeff |
i'd probably go with senator's suggestion of disabling autosuggest as a stop-gap. |
10:55 |
mrpeters |
yeah, i have --- restarting bricks now |
10:58 |
* dbs |
suspects that killing pg backends leads to a cascade of other errors in cstore/storage thereafter, based on recent experience |
10:59 |
berick |
and anything above the cstore level (middle-layer, client) would just see a request timeout, since they usually give up after a minute or so |
11:00 |
jl- |
I'm assuming I can resolve the space issue by expanding the rootfs ? |
11:01 |
mrpeters |
dbs: yeah? I'd definitely like to get that little "sniper" stopped |
11:03 |
paxed |
jl-: or there's a way to tell postgres to save the database files on somewhere else. (but, iirc, that has to be done when you create the db) |
11:04 |
jl- |
paxed that's why I was wondering if it will even help postgres if I clean up rootfs space |
11:05 |
jl- |
or if it has allocated space set somewhere |
11:05 |
eeevil |
mrpeters: I'd suggest looking at the schema, make sure all the indexes supporting autosuggest (which also support browse) are in place. you can configure autosuggest to not kick in until you have more than a certain number of characters from the user, IIRC |
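[Editor's note: one quick way to do the "make sure all the indexes are in place" check eeevil suggests is to list the live indexes and diff them against a freshly created stock database of the same version; a hedged sketch using DBI and PostgreSQL's standard pg_indexes view (connection details are placeholders).]

    #!/usr/bin/perl
    # Sketch: list every index in the metabib schema so it can be compared
    # against a stock Evergreen schema of the same version.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=evergreen', 'evergreen', undef,
                           { RaiseError => 1 });

    my $rows = $dbh->selectall_arrayref(q{
        SELECT tablename, indexname
          FROM pg_indexes
         WHERE schemaname = 'metabib'
         ORDER BY tablename, indexname
    });

    printf "%-40s %s\n", @$_ for @$rows;
    $dbh->disconnect;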
11:05 |
mrpeters |
eeevil: thanks, great info. i hope autosuggest really is the culprit! |
11:05 |
paxed |
jl-: it just uses as much space as you've got available. |
11:07 |
eeevil |
mrpeters: and, short of disabling autosuggest, you can just disable visibility testing for autosuggest. I'd point you to the docs for that, but the community docs site seems broken at the moment... |
11:08 |
mrpeters |
no i know how, thanks mike |
11:10 |
mrpeters |
keyword searches are just bunk....title/author/etc. come back snappy....any particular index that might be corrupt/missing/etc. |
11:10 |
|
jboyer-laptaupe joined #evergreen |
11:16 |
dbs |
mrpeters: maybe your database just tipped over some point that triggers a different query optimizer path? That occasionally happens :/ |
11:18 |
mrpeters |
hmm |
11:19 |
dbs |
Unfortunately hard to diagnose without EXPLAINs from prior to that tipping point |
11:19 |
mrpeters |
sure |
11:19 |
mrpeters |
im wondering if the autosuggest stuff, and the process sniping deal WAS the tipping point |
11:24 |
|
mcooper joined #evergreen |
11:27 |
|
mmorgan left #evergreen |
11:28 |
jeff |
berick: tested your fix, pushed to master, commented in bug 1262722 |
11:28 |
pinesol_green |
Launchpad bug 1262722 in Evergreen "DB patch supersede / deprecate func repairs; unit test" (affected: 1, heat: 6) [Undecided,Fix committed] https://launchpad.net/bugs/1262722 |
11:28 |
jeff |
berick++ |
11:28 |
jeff |
tests++ |
11:29 |
berick |
jeff++ |
11:30 |
pinesol_green |
[evergreen|Bill Erickson] LP#1262722 Fix DB supersedes/deprecates pgtap test - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=6c16a44> |
11:44 |
phasefx |
if we can get this one too, then we'll have a clean QA test run: https://bugs.launchpad.net/evergreen/+bug/1279420 |
11:44 |
pinesol_green |
Launchpad bug 1279420 in Evergreen "need Test::Output prerequisite" (affected: 1, heat: 6) [Medium,Triaged] |
11:46 |
|
jwoodard joined #evergreen |
11:46 |
jeff |
heh. i was just testing the waters in preparation for arguing for that. :-) |
11:46 |
jeff |
phasefx: that system currently builds once nightly, around midnight? |
11:46 |
phasefx |
correct, though I can fire it off practically at will over here |
11:48 |
phasefx |
in #openils-evergreen, pinesol_green is following an RSS feed for it |
11:48 |
* dbs |
fears what adding a thin plperl wrapper around xml::libxslt is going to do to ingest times |
11:48 |
dbs |
all that I know is that solr is RAD |
11:50 |
jl- |
wow I ran out of space again |
11:50 |
jl- |
200,000 records, -- 15 GB weren't enough |
11:50 |
jl- |
any idea how much is necessary |
11:52 |
jeff |
"how much can you give it?" |
11:52 |
dbs |
seems like it would be a good research area for planning docs - rough memory / storage requirements |
11:53 |
* jeff |
nods |
11:53 |
dbs |
for various levels of records, etc |
11:53 |
mrpeters |
jl-: yeah, that sounds low for that many records |
11:53 |
mrpeters |
i have a customer about that size, let me tell you what their db size is |
11:53 |
jeff |
jl-: for your immediate purposes, if you can give it 30-60 GB, try that. :-) |
11:54 |
mrpeters |
jl-: is that items or bibs? |
11:56 |
mrpeters |
i have a customer with 121,070 bibs and 147,094 items and their DB hovers around 8GB |
11:56 |
mrpeters |
just to give you a little perspective |
11:57 |
jl- |
mrpeters bibs |
12:02 |
mrpeters |
how many patrons do you plan to load? |
12:02 |
mrpeters |
is 80GB feasible for you? i'm thinking about circulation, auditor, etc. buildup over time as well |
12:02 |
jl- |
possibly |
12:03 |
jl- |
testing on a VM right now |
12:03 |
jl- |
gonna try 60 |
12:03 |
mrpeters |
ah, i see |
12:03 |
mrpeters |
just don't run your production DB in a VM, right eeevil ;) |
12:03 |
mrpeters |
i think we've all seen the pain that can cause |
12:04 |
jl- |
everything is virtualized these days |
12:05 |
mrpeters |
i hear you, but we tried running a BIG production database in vmware and it just didn't perform well |
12:05 |
mrpeters |
but, maybe for a smaller size consortium or library it would perform better, honestly not sure |
12:05 |
mrpeters |
things just got WAY better for us when we went bare metal |
12:05 |
mrpeters |
(still virtualizing everything else) |
12:06 |
jl- |
all our library services are virtualized |
12:06 |
jl- |
so we'll have to go with that |
12:06 |
jl- |
;) |
12:09 |
mrpeters |
soooo, back to my keyword searching issues --- i'm the only one with access to the system --- keyword searches running over 7 minutes --- http://pastie.org/8749161 |
12:09 |
mrpeters |
ill ask again, is there a particular index that may need rebuilding? |
12:10 |
mrpeters |
title/author/subject/etc. work to perfection |
12:11 |
dbs |
mrpeters: 7 minutes for NASCAR? That seems extreme, unless "NASCAR" or a stemmed version thereof is in half of your records. What does EXPLAIN say for that query? |
12:12 |
mrpeters |
dbs: is that the full query there, or is that truncated? |
12:13 |
eeevil |
dbs: re plperl wrapper, we gain the benefit of being able to cache the xslt object ... that will likely pay for any perl parsing overhead, I'd bet |
12:13 |
dbs |
appears to be truncated |
12:13 |
mrpeters |
i think so too |
12:13 |
mrpeters |
let me find it in the pg logs |
12:13 |
mrpeters |
appreciate the input dan |
12:14 |
eeevil |
mrpeters: that's not a complex query. it suggests either tuning or schema deficiencies ... I'm guessing this is the same instance as the autosuggest one above? |
12:14 |
dbs |
mrpeters: as far as "tipping points" go, that's usually associated with a growth in the amount of data that pushes postgresql to suddenly choose a different plan (from reading pgsql-performance on and off over the years) |
12:15 |
mrpeters |
eeevil: yes, it is |
12:15 |
mrpeters |
no change in amount of data, just the upgrade from 2.2.2 to 2.5.2 |
12:15 |
dbs |
So as eeevil says, could be tuning (various cost estimates) or table / index bloat or (maybe?) corrupted / missing index |
12:16 |
mrpeters |
thats what im fearing -- a missing index |
12:16 |
eeevil |
mrpeters: heh ... that changes a LOT of data :) |
12:16 |
eeevil |
both size and shape |
12:16 |
dbs |
ah, I didn't realize the 2.2.2 -> 2.5.2 was in the mix |
12:16 |
mrpeters |
yeah, it is |
12:17 |
mrpeters |
eeevil: true....those ingests are pretty massive |
12:17 |
mrpeters |
i meant as far as new libraries, etc. |
12:17 |
mrpeters |
as far as tuning, it's using the same hardware as before, and the same pgtuned values that have been in place for many years on this cluster |
12:18 |
dbs |
seems very likely to be upgrade schema related. possibly reingest bloat? |
12:19 |
mrpeters |
we did run a VACUUM FULL ANALYZE VERBOSE after the reingest |
12:25 |
dbs |
mrpeters: if you can get an EXPLAIN ANALYZE of a keyword query, that's really going to help |
12:25 |
|
sunnysnowcat joined #evergreen |
12:26 |
mrpeters |
yep, im on it |
12:27 |
mrpeters |
thought this was the query, but it's not seeming to work -- http://pastie.org/8749204 |
12:31 |
dbs |
needs more $core_query_25686$ methinks |
12:31 |
mrpeters |
okk maybe i cut it off somewhere |
12:31 |
mrpeters |
its a big honkin select :P |
12:32 |
eeevil |
mrpeters: syslog mixes up incoming messages. that's what the [XXXX|YY] block is for before the incoming message itself, on each line. the XXXX part is the syslog message id, and the YY part orders the lines of the message |
12:34 |
eeevil |
find the XXXX for the line containing "-- bib search", then get all the lines for that XXXX, then stitch them together |
12:34 |
eeevil |
you might find that the query you thought you were tracing was not the one that had the long duration... |
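[Editor's note: a rough sketch of the stitching step described above, assuming the marker really does look like [XXXX|YY] immediately before the message text; adjust the regex to whatever your osrfsys logs actually contain. The message id in the usage line is hypothetical.]

    #!/usr/bin/perl
    # Sketch: reassemble one syslog-split OpenSRF log message by its message id.
    # Usage (hypothetical): stitch.pl 1234567 osrfsys.10.log
    use strict;
    use warnings;

    my $want_id = shift @ARGV or die "usage: $0 <message-id> <logfile>...\n";
    my %parts;    # line order => text fragment

    while (my $line = <>) {
        # Per eeevil: "[XXXX|YY]" -- XXXX is the syslog message id, YY orders the lines
        next unless $line =~ /\[(\d+)\|(\d+)\]\s*(.*)$/;
        my ($id, $order, $text) = ($1, $2, $3);
        $parts{$order} = $text if $id eq $want_id;
    }

    print join('', map { $parts{$_} } sort { $a <=> $b } keys %parts), "\n";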
12:35 |
mrpeters |
could be, im kind of curious to run it by hand now and see how it responds (and get you an explain) |
12:44 |
mrpeters |
ok, query is churning, then ill get you the explain |
12:46 |
mrpeters |
dbs: damn, sorry! |
12:46 |
mrpeters |
QUERY PLAN |
12:46 |
mrpeters |
------------------------------------------------------------------------------------------------------------------------------ |
12:46 |
mrpeters |
Function Scan on query_parser_fts (cost=0.25..10.25 rows=1000 width=64) (actual time=34352.543..34352.622 rows=949 loops=1) |
12:46 |
mrpeters |
Total runtime: 34352.730 ms |
12:46 |
mrpeters |
(2 rows) |
12:47 |
mrpeters |
34 seconds is pretty damn reasonable |
12:47 |
mrpeters |
but that seems to go up and up and up as load increases |
12:48 |
mrpeters |
but, i did find out the indexes weren't primed into RAM (actually was able to fit the entire DB in ram this time) so maybe that's helped things |
12:52 |
|
jihpringle joined #evergreen |
12:55 |
|
kbeswick joined #evergreen |
12:58 |
mrpeters |
and they've gone to crap again |
12:58 |
mrpeters |
i can't even run that explain again now that the system had load again |
12:59 |
|
stevenyvr joined #evergreen |
12:59 |
mrpeters |
is there a particular table/index we should focus in on and run a vacuum on for keyword searches? |
13:03 |
dbs |
metabib.keyword_field_entry would be the key table to focus on |
13:04 |
* dbs |
slipped away for lunch |
13:05 |
mrpeters |
no worries dan |
13:05 |
dbs |
reingests going on? other things that might be trashing cache? |
13:05 |
mrpeters |
no ingests |
13:05 |
mrpeters |
my explain analyze verbose has been running for about 7 minutes now |
13:06 |
mrpeters |
just getting no where.....before we let people back in it was returning within 30 seconds |
13:06 |
mrpeters |
i assume what i gave above isnt enough info (i probably needed verbose?) |
13:07 |
dbs |
IIRC you want everything from within the query_parser_fts() function; the function itself masks all of the real work going on |
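[Editor's note: one hedged way to get at "everything from within the query_parser_fts() function" is to save the full logged call (stitched back together as described earlier) and pull out the dollar-quoted core query so it can be EXPLAIN ANALYZEd on its own. This assumes the logged call quotes it with a $core_query_NNNN$ tag, as in the pastes referenced above.]

    #!/usr/bin/perl
    # Sketch: extract the $core_query_NNNN$-quoted SQL from a saved
    # query_parser_fts() call and wrap it in EXPLAIN ANALYZE.
    use strict;
    use warnings;

    local $/;          # slurp the whole saved query
    my $sql = <>;

    if ($sql =~ /\$(core_query_\d+)\$(.*?)\$\1\$/s) {
        print "EXPLAIN ANALYZE\n$2;\n";
    } else {
        die "no \$core_query_NNNN\$ block found; check how the query was logged\n";
    }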
13:08 |
mrpeters |
http://pastie.org/8749326 is what im trying to get you |
13:12 |
dbs |
mrpeters: try http://pastie.org/8749337 |
13:12 |
eeevil |
mrpeters: explain just the core query |
13:12 |
mrpeters |
can do |
13:12 |
eeevil |
(what dbs said) |
13:12 |
* mrpeters |
realizes how little he knows about QP :( |
13:13 |
mrpeters |
gonna be a few minutes on this...waiting until 1:20 to take the system down again from the public |
13:14 |
mrpeters |
it won't budge right now |
13:15 |
dbs |
:( |
13:16 |
mrpeters |
thank you for bearing with me |
13:16 |
mrpeters |
i sincerely appreciate the help....ive not run into anything like this before |
13:17 |
mrpeters |
can we stop sip/reports/utility |
13:17 |
mrpeters |
no reason for reports to be running |
13:17 |
mrpeters |
utility should be fine too |
13:19 |
dbs |
mrpeters: we had great pain last week or two due to very broad queries like "A" and "The" appearing to result in lock escalation |
13:19 |
mrpeters |
interesting |
13:19 |
dbs |
Pretty much a denial of service attack. Amusingly, due to our own IT dept running a 3rd party link checker service. |
13:19 |
dbs |
But could affect anyone :( |
13:20 |
mrpeters |
sounds like a great adventure haha |
13:24 |
mrpeters |
dbs: http://pastie.org/8749393 -- ah, finally |
13:25 |
* dbs |
has eyes pop at "Seq Scan on keyword_field_entry fe" - twice |
13:25 |
mrpeters |
drop some knowledge on me dbs! |
13:25 |
dbs |
That's visiting every row in metabib.keyword_field_entry |
13:25 |
dbs |
It's ignoring the index (if there is one) |
13:25 |
mrpeters |
interesting...... |
13:26 |
* dbs |
would expect to see "Index Scan using metabib_keyword_field_entry_index_vector_idx on keyword_field_entry fe" |
13:27 |
mrpeters |
http://pastie.org/8749400 |
13:27 |
dbs |
eeevil: you concur? |
13:27 |
mrpeters |
is that missing an index then? |
13:27 |
dbs |
it looks a lot like it |
13:28 |
dbs |
what does \d metabib.keyword_field_entry say? |
13:28 |
mrpeters |
thats in the paste, dbs |
13:28 |
mrpeters |
just above |
13:29 |
dbs |
yeah |
13:29 |
dbs |
you're missing metabib_keyword_field_entry_index_vector_idx |
13:29 |
mrpeters |
ok |
13:29 |
dbs |
CREATE INDEX metabib_identifier_field_entry_index_vector_idx ON metabib.identifier_field_entry USING GIST (index_vector); |
13:30 |
mrpeters |
dbs++ |
13:30 |
mrpeters |
eeevil++ |
13:30 |
mrpeters |
hrm |
13:30 |
dbs |
realllllly weird that's missing |
13:30 |
mrpeters |
evergreen=# CREATE INDEX metabib_identifier_field_entry_index_vector_idx ON metabib.identifier_field_entry USING GIST (index_vector); |
13:30 |
mrpeters |
ERROR: relation "metabib_identifier_field_entry_index_vector_idx" already exists |
13:30 |
jeffdavis |
s/identifier/keyword/g in dbs' command |
13:30 |
dbs |
jeffdavis++ # good catch, sorry! |
13:30 |
mrpeters |
ah, thanks jeffdavis |
13:30 |
dbs |
both uses of identifier there |
13:30 |
* eeevil |
reads up |
13:31 |
dbs |
CREATE INDEX metabib_keyword_field_entry_index_vector_idx ON metabib.keyword_field_entry USING GIST (index_vector); |
13:31 |
eeevil |
well, around 2.4/2.5, the indexes are dropped and recreated (IIRC) to speed up the upgrade |
13:31 |
dbs |
-- to be sure :) |
13:31 |
eeevil |
the recreate seems to have failed... ? |
13:31 |
dbs |
sounds like |
13:31 |
eeevil |
you might want to switch to GIN indexes while you're at it |
13:32 |
eeevil |
you can do that with CONCURRENTLY and drop the GIST ones after hours |
13:32 |
mrpeters |
you think? csharp mentioned that to me too |
13:32 |
* dbs |
thought about mentioning that but also thought about just keeping things as close to stock as possible until the firestorm dies |
13:32 |
mrpeters |
it was about 4AM and i wasn't up for tinkering |
13:32 |
|
jihpringle joined #evergreen |
13:32 |
mrpeters |
^^^ what dbs said :) |
13:33 |
mrpeters |
i know csharp has been happy with GIN indexes |
13:34 |
jeffdavis |
we've been using GIN for that index too quite happily, I believe it requires more disk space but is faster |
13:34 |
mrpeters |
just swap GIST for GIN in that query? |
13:35 |
jeffdavis |
I believe so |
13:35 |
dbs |
yep |
13:35 |
dbs |
we've been using GIN too, except when upgrades switch them back to GIST |
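[Editor's note: a sketch of the GiST-to-GIN swap eeevil describes, run through DBI with AutoCommit on because CREATE INDEX CONCURRENTLY refuses to run inside a transaction block. Index and column names follow the stock schema quoted earlier in this discussion; verify them against \d metabib.keyword_field_entry before running anything.]

    #!/usr/bin/perl
    # Sketch: build a GIN replacement for the keyword index_vector index without
    # blocking searches, then drop the old GiST index off-hours.
    use strict;
    use warnings;
    use DBI;

    # CREATE INDEX CONCURRENTLY cannot run inside a transaction, so keep AutoCommit on.
    my $dbh = DBI->connect('dbi:Pg:dbname=evergreen', 'evergreen', undef,
                           { AutoCommit => 1, RaiseError => 1 });

    $dbh->do(q{
        CREATE INDEX CONCURRENTLY metabib_keyword_field_entry_index_vector_gin_idx
            ON metabib.keyword_field_entry USING GIN (index_vector)
    });

    # Only after the new index is built and valid:
    $dbh->do(q{DROP INDEX IF EXISTS metabib.metabib_keyword_field_entry_index_vector_idx});

    $dbh->disconnect;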
13:35 |
mrpeters |
good to know, awesome. is there a list anywhere of which indexes are best to convert? |
13:36 |
mrpeters |
2.4.3 > 2.5 upgrade script maybe? where it drops/recreates them? |
13:57 |
|
kbeswick_ joined #evergreen |
14:12 |
Wyuli |
Is there a button I can press that ships beer & pizza to devs? Can we add that in the Evergreen interface? |
14:13 |
bshum |
We should add that to the "about" page link :) |
14:14 |
Wyuli |
I'm getting stressed just watching the chatter roll through here... |
14:15 |
Wyuli |
I wouldn't be opposed to shipping refreshments out if I could. :) |
14:17 |
jboyer-laptaupe |
dbs++ |
14:17 |
jboyer-laptaupe |
eeevil++ |
14:17 |
jboyer-laptaupe |
I suspect we've been missing that index for longer than we know. |
14:17 |
bshum |
It's alive! alive! |
14:17 |
bshum |
:) |
14:23 |
eeevil |
ha! "laptaupe" ... nice :) |
14:30 |
rjackson-isl |
system is currently smoking! - seconded on dbs++ and eeevil++ |
14:31 |
bshum |
Calling 0861 |
14:31 |
|
mllewellyn joined #evergreen |
14:32 |
mrpeters |
yeah, much love for the help guys |
14:32 |
mrpeters |
fast as can be! |
14:37 |
|
stevenyvr joined #evergreen |
14:37 |
|
smyers_ joined #evergreen |
14:39 |
pinesol_green |
[evergreen|Jason Stephenson] LP1223903 - Add from_bare_hash to Fieldmapper.pm. - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=b9edf30> |
14:39 |
pinesol_green |
[evergreen|Jason Stephenson] LP1223903 - Add indexes to authority.record_entry. - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=1108b8b> |
14:39 |
pinesol_green |
[evergreen|Jason Stephenson] LP1223903 - Rewrite marc_export.in in support-scripts. - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=2c28f9b> |
14:39 |
pinesol_green |
[evergreen|Jason Stephenson] LP1223903 - Add release notes for rewrite of marc_export. - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=e495dbb> |
14:39 |
pinesol_green |
[evergreen|Ben Shum] LP1223903 - Stamping upgrade script for new authority_record_entry indexes - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=c0a0fc6> |
14:41 |
pinesol_green |
[evergreen|Ben Shum] LP1223903 - Also bump the schema config - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=c0c1061> |
14:57 |
|
smyers_ joined #evergreen |
14:59 |
eeevil |
Dyrcona: since you've been testing http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/dyrcona/lp1234845_ranked_volumes would you mind taking a look at the SQL version in http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/miker/lp1234845_ranked_volumes ? |
14:59 |
* eeevil |
updates the LP bug too |
14:59 |
Dyrcona |
No problem. I'll load it in a few. |
15:00 |
Dyrcona |
It has mostly been kmlussier looking at it and she's on vacation at the moment. |
15:00 |
eeevil |
Dyrcona: might need to re-drop the intermediate one by hand on your test server, just a head's up |
15:00 |
eeevil |
thanks! |
15:00 |
Dyrcona |
ok |
15:01 |
rangi |
Dyrcona: when you get a chance (non urgent) you might like to do a pull of the ncip stuff too |
15:01 |
Dyrcona |
rangi: Cool. |
15:12 |
ktomita |
For browse search, does anyone know if the sample data has any "see" references? I have seen "See Also" but no "See". |
15:12 |
bshum |
I don't remember off hand what's the difference... |
15:13 |
|
mmorgan joined #evergreen |
15:13 |
ktomita |
bshum: "See" is 4XX for things like alternate names, "See Also" is 5XX for related things. |
15:14 |
ktomita |
bshum: that is my understanding |
15:15 |
bshum |
What's a record in the sample data with "see also"? Just curious. |
15:15 |
ktomita |
bshum: I did a subject browse for 'organic foods' |
15:15 |
* bshum |
wonders if this is some sort of authorities thing that he's not using yet |
15:16 |
phasefx |
in my concerto load, I see authority entries with 400 and 410 tags, but no 5XX |
15:16 |
ktomita |
bshum: yes, this is authorities related. |
15:16 |
|
kbeswick joined #evergreen |
15:16 |
bshum |
ktomita: Well, there isn't any organic foods in the concerto dataset so I'm assuming you must be using something else then. |
15:17 |
ktomita |
bshum: yeah sorry, I had imported some records to try and see it. I haven't been able to get that to work either |
15:21 |
ktomita |
I am going to go back and do some more research and ask again later. |
15:23 |
Dyrcona |
rangi; Good stuff. I need to do some research on what's been pushed where for the moment. |
15:23 |
Dyrcona |
eeevil: I loaded the new code. I did have to remove the intermediate one by hand. |
15:24 |
|
akilsdonk joined #evergreen |
15:24 |
* Dyrcona |
wonder if MNPALS is doing anything on the NCIP front. I've not seen anything of theirs go into the working repo in some time. |
15:30 |
eeevil |
ktomita: IIRC, from writing that with senator, in order to see "see" references you need to link your authority records together. see the 2.5 release notes, sections 2.2.3, 2.6.2 and 2.6.3 for more information |
15:30 |
ktomita |
eeevil: thanks, I will look into those areas |
15:32 |
Dyrcona |
authority_authority_linker.pl... If memory serves. |
15:32 |
eeevil |
Dyrcona: aye (2.2.3 in the release notes) |
15:33 |
eeevil |
though, as with authority_control_fields.pl, it is for priming, not perfect cataloging (because, as we know, MIGO (marc in garbage out)) |
15:33 |
ktomita |
I did run that script. |
15:33 |
eeevil |
ktomita: if you don't have authority records that would link to one another, it won't do anything for you |
15:34 |
ktomita |
eeevil: that brings me back to my original question about whether the concerto data has records that will link with "see" and "see also". |
15:34 |
eeevil |
I doubt it |
15:35 |
ktomita |
I have tried to setup my own records that would have "see" and "see also" linking but have only been able to see "see also" |
15:35 |
ktomita |
I was hoping to get an example of "see" linking. thanks for the responses. more research for me. |
15:46 |
ktomita |
Do you know how to setup the authority marc record so that it would have a "see" link? |
15:53 |
dbs |
SELECT * FROM authority.full_rec WHERE tag LIKE '5%'; says there are no auth records in the --load-all sample set with 5xx fields |
15:55 |
|
gsams joined #evergreen |
15:55 |
graced |
ktomita: I'm curious - are you implementing the authorities work that has been done in the community or writing your own? I heard a rumor at a library conference that you were writing one from scratch... |
15:58 |
ktomita |
graced: I am working with the community code |
15:58 |
graced |
ktomita: cool, thanks for dispelling my rumor :-) |
16:01 |
ktomita |
graced: have you dealt with see and see also much? |
16:09 |
|
Wyuli joined #evergreen |
16:10 |
jl- |
the staging table is still importing (been about 4 hours).. would it be wise to sneak in and see how many rows there are in bibs_records |
16:11 |
Dyrcona |
jl-: Probably not. |
16:11 |
Dyrcona |
Depends on locking. |
16:11 |
jl- |
k |
16:12 |
jl- |
the initial available 70 GB have shrunken to 25 GB |
16:12 |
jl- |
./s/initially/initial |
16:12 |
Dyrcona |
yep. |
16:13 |
jl- |
I'll start worrying once I'm in the single digits |
16:14 |
Dyrcona |
I don't know how much space you need to load via a staging table since I've never done it. |
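[Editor's note: when disk disappears during a load like this, it helps to see whether the space is going to the staging table, to biblio.record_entry and its indexes, or to WAL; a hedged sketch using standard PostgreSQL size functions. Table names are the ones used earlier in this log, and connection details are placeholders.]

    #!/usr/bin/perl
    # Sketch: report overall database size plus the total (table + index + TOAST)
    # size of the tables involved in the staged import.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=evergreen', 'evergreen', undef,
                           { RaiseError => 1 });

    my ($db_size) = $dbh->selectrow_array(
        q{SELECT pg_size_pretty(pg_database_size(current_database()))});
    print "database: $db_size\n";

    for my $table ('staging_records_import', 'biblio.record_entry') {
        my ($size) = $dbh->selectrow_array(
            qq{SELECT pg_size_pretty(pg_total_relation_size('$table'))});
        print "$table: $size\n";
    }

    $dbh->disconnect;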
16:18 |
jl- |
ils curveballs |
16:18 |
jl- |
wait, I don't even like baseball |
16:18 |
jl- |
let's call it a scissor tackle |
16:19 |
jl- |
http://i.minus.com/iL5juqoZUPgkW.gif |
16:20 |
jeffdavis |
jl-: how many MARC records are you importing? |
16:22 |
jl- |
jeff 230,000 |
16:23 |
jl- |
jeffdavis |
16:26 |
jeffdavis |
hm |
16:27 |
|
jboyer_laptaupe joined #evergreen |
16:27 |
jeffdavis |
I've got a test dataset with about 278,000 bib records, total size of the database after ingest is 10 GB |
16:27 |
|
smyers__ joined #evergreen |
16:27 |
jl- |
something's fishy then |
16:27 |
jl- |
well |
16:27 |
jl- |
after ingest |
16:27 |
jl- |
but during? |
16:28 |
jeffdavis |
not sure |
16:28 |
jl- |
btw these are not 'test datasets' |
16:28 |
jeffdavis |
I tried the staging_importer method that was discussed here recently and it didn't work for me, I ended up killing it after about 4 days and using a slightly different approach |
16:28 |
jl- |
this is real data |
16:28 |
jl- |
from a real university |
16:28 |
jl- |
what worked for you? |
16:29 |
jeffdavis |
getting a list of all the id's from the staging table and converting it into an SQL script with 278,000 individual 'INSERT INTO biblio.record_entry' statements |
16:29 |
jeffdavis |
very inelegant, but it worked for me |
16:29 |
jl- |
interesting |
16:30 |
jeffdavis |
it's the only time I've tried that staging_importer approach, so I can't say if it normally works better or if it just doesn't work for large numbers of bibs |
16:30 |
jl- |
what do you usually use? |
16:30 |
jl- |
staff client? |
16:30 |
jeffdavis |
fwiw these were real (public library) records too |
16:30 |
jl- |
kk |
16:30 |
jeffdavis |
I had been using the old method before (the perl scripts), wanted to try out the new approach |
16:32 |
jl- |
jeffdavis I thought the perl scripts (like pg loader) take forever? |
16:33 |
Dyrcona |
jl-: They're fast compared to the staff client. |
16:33 |
jeffdavis |
that's one reason I wanted to try the other approach :) |
16:33 |
Dyrcona |
If the staff client were even useful for that sort of thing. |
16:34 |
jl- |
Dyrcona what's your take on this, 70GB almost used up, ran for 4 hours |
16:34 |
jl- |
is it the method or the records |
16:34 |
jl- |
or do I need more space |
16:34 |
Dyrcona |
jl-: I'd say its the method more than the records. |
16:35 |
Dyrcona |
Unless your records are huge or you're loading millions. |
16:35 |
jl- |
the question is if it will work eventually or not.. |
16:35 |
jl- |
or if I should abort it |
16:35 |
jl- |
I could try with giving it 100 GB |
16:35 |
jl- |
but that seems bizarre |
16:36 |
phasefx |
there are pg tuning things you can do temporarily to make imports faster, but that bit is voodoo to me |
16:36 |
phasefx |
stuff with working mem and checkpoints |
16:37 |
phasefx |
disable archive mode :) |
16:37 |
bshum |
Well, tuning postgres is probably a good thing to do anyways. |
16:38 |
phasefx |
just, differently tuned for normal production use as compared to huge batch imports |
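[Editor's note: most of that "voodoo" lives in postgresql.conf (checkpoint settings, archive_mode) and needs a reload or restart, but a couple of things can be flipped per session from the loading script itself. A hedged sketch: synchronous_commit=off only risks the last moments of work on a crash, which is usually acceptable for a re-runnable import; connection details are placeholders.]

    #!/usr/bin/perl
    # Sketch: run the docs' staging_importer() with per-session settings that make
    # a big one-off load cheaper. checkpoint_segments/checkpoint_timeout and
    # archive_mode are postgresql.conf changes and are not touched here.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=evergreen', 'evergreen', undef,
                           { AutoCommit => 1, RaiseError => 1 });

    $dbh->do(q{SET synchronous_commit = off});      # don't wait on a WAL flush per commit
    $dbh->do(q{SET maintenance_work_mem = '1GB'});  # only matters for index builds done in this session

    $dbh->selectrow_array(q{SELECT staging_importer()});

    $dbh->disconnect;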
16:39 |
Dyrcona |
You are also probably having issues with work logs, 'cause your whole import is likely done as a single transaction. |
16:39 |
Dyrcona |
Our entire database with indexes and that sort of thing takes up about 146GB of disk space. |
16:40 |
* bshum |
keeps trying to shrink his |
16:40 |
bshum |
:) |
16:40 |
Dyrcona |
dumped, compressed and encrypted, our database is only 14GB. :) |
16:42 |
Dyrcona |
jl-: Searching the IRC and dev mailing list lost from three or four years ago might turn up some discussions of what I was doing during our migration. |
16:42 |
Dyrcona |
I got some tips on tuning postgres for large loads back then. |
16:43 |
Dyrcona |
s/lost/logs/ |
16:43 |
jl- |
Dyrcona I could batch delete all log files |
16:43 |
jl- |
from the sys? |
16:43 |
Dyrcona |
jl-: No. Postgresql keeps a work log around in its own data until a transaction is complete or a checkpoint is reached. |
16:44 |
|
hbrennan joined #evergreen |
16:44 |
Dyrcona |
this doesn't show up in /var/log/ or anywhere like that. |
16:44 |
Dyrcona |
I recommend getting a good book on PostgreSQL or do some online research about how it works. |
16:45 |
* dbs |
used basically what is in the docs, but modified it to import in batches - not as extreme as jeffdavis' 1 insert per, but 100 records or 1000 records per transaction |
16:46 |
* dbs |
checks -- batches of 5000 records, actually |
16:47 |
dbs |
this was from an import in the summer of 2012. |
16:48 |
hbrennan |
We're upgrading and missed Rose-URI perl module on the db server.... is it safe to rerun the whole 2.3-2.4.0-upgrade-db.sql a second time? |
16:48 |
dbs |
http://pastie.org/8749954 - ah, cute, I started with 1000, then 5000, then batches of 10000, to get more feedback early... |
16:48 |
jeffdavis |
http://pastebin.com/VXgUiVEj <- this is what I did, allows parallelization without dealing with locks |
16:49 |
Dyrcona |
http://git.mvlcstaff.org/?p=jason/hz2eg.git;a=summary <- the public version of what I did. :) |
16:50 |
* dbs |
likes the simplicity of jeffdavis's approach, about 70 fewer files :) |
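[Editor's note: since the pastie/pastebin links above may not survive, here is a rough sketch of the batched middle ground being described: insert from the staging table one record at a time, as the docs' staging_importer() does, but commit every few thousand rows so the whole load isn't one giant transaction. The staging table layout (id, marc, dest) is an assumption based on the migration docs and the UPDATE statement quoted earlier in this log.]

    #!/usr/bin/perl
    # Sketch: load staged MARCXML into biblio.record_entry in batches of 5000,
    # mirroring the docs' staging_importer() but with periodic commits.
    use strict;
    use warnings;
    use DBI;

    my $batch_size = 5000;
    my $dbh = DBI->connect('dbi:Pg:dbname=evergreen', 'evergreen', undef,
                           { AutoCommit => 0, RaiseError => 1 });

    my $ids = $dbh->selectcol_arrayref(
        q{SELECT id FROM staging_records_import WHERE dest IS NULL ORDER BY id});

    my $ins = $dbh->prepare(q{
        INSERT INTO biblio.record_entry (marc, last_xact_id)
        SELECT marc, 'staged-import' FROM staging_records_import WHERE id = ?
    });
    my $upd = $dbh->prepare(q{
        UPDATE staging_records_import
           SET dest = currval('biblio.record_entry_id_seq')
         WHERE id = ?
    });

    my $done = 0;
    for my $id (@$ids) {
        $ins->execute($id);
        $upd->execute($id);
        $dbh->commit unless ++$done % $batch_size;
    }
    $dbh->commit;
    print "loaded $done records\n";
    $dbh->disconnect;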
16:51 |
hbrennan |
and apologies for being rude and cutting in |
16:51 |
dbs |
hbrennan: not at all. I didn't answer you because I can't remember where Rose::URI fits in :/ |
16:51 |
* Dyrcona |
was moving a whole database from one ILS to another, not just bibs. |
16:52 |
hbrennan |
we fixed that part |
16:52 |
* dbs |
guesses that you probably don't want to rerun the whole upgrade script |
16:52 |
hbrennan |
but would it hurt? |
16:52 |
dbs |
Depends, actually; if the whole thing rolled back because Rose::URI was missing, then it wouldn't hurt. |
16:52 |
hbrennan |
I'm pretending to be smart by typing what the real admin is saying across the room |
16:53 |
dbs |
But without being there and seeing the logs... |
16:54 |
* dbs |
used basically what is in the docs for imports because he wrote the docs based on what he did for his last import :) |
16:54 |
hbrennan |
Ah. Just got a reply back from Equinox. They say it is a-okay to rerun |
16:55 |
hbrennan |
Hooray! We've been twiddling thumbs for an hour waiting |
16:55 |
hbrennan |
Best to be safe though |
16:55 |
dbs |
hbrennan++ |
16:55 |
hbrennan |
Thanks, Dan! dbs++ |
16:57 |
dbs |
But yes, once you get past a small set of demo records, you're not going to avoid PostgreSQL tuning and customized approaches to loading, which is what we've probably said about 10 times now :/ |
16:58 |
|
Wyuli joined #evergreen |
17:19 |
|
mmorgan left #evergreen |
17:36 |
|
dcook joined #evergreen |
17:37 |
|
smyers_ joined #evergreen |
17:43 |
jeff |
hrm. search errors in the logs on one of my test systems. will investigate further tonight. |
17:44 |
jeff |
pgtap being present shouldn't break searches, should it? |
17:44 |
jeff |
seems unlikely. |
17:47 |
jeff |
oh. i believe i have a disconnect between my database and my services. all is broken as expected. :-) |
19:13 |
|
nap joined #evergreen |
19:14 |
|
nap left #evergreen |
19:47 |
|
nap joined #evergreen |
22:26 |
|
rjackson_isl joined #evergreen |
23:38 |
|
zerick joined #evergreen |