I've been profiling my IkiWiki to try to improve speed (with many pages makes speed even more important) and I've written a patch to improve the speed of match_glob. This matcher is a good one to improve the speed of, because it gets called so many times. Here's my patch - please consider it! -- [[KathrynAndersen]] > It seems to me as though changing `glob2re` to return qr/$re/, and calling > `memoize(glob2re)` next to the other memoize calls, would be a less > verbose way to do this? --[[smcv]] >> I think so, yeah. Anyway, do you have any benchmark results handy, >> Kathryn? --[[Joey]] >>> See below. >>> Also, would it make more sense for glob2re to return qr/^$re$/i rather than qr/$re/? Everything that uses glob2re seems to use $foo =~ /^$re$/i >>> rather than /$re/ so I think that would make sense. >>> -- [[KathrynAndersen]] >>>> Git branch `smcv/ka-glob-cache` has Kathryn's patch. Git >>>> branch `smcv/memoize-glob2re` does as I suggested, which >>>> is less verbose than Kathryn's patch but also not as >>>> fast; I'm not sure why, tbh. --[[smcv]] >>>>> I think it's because my patch focuses on match_glob while the memoize patch focuses on `glob2re`, and `glob2re` is called in `filecheck`, `meta` and `po` as well as in `match_glob` and `match_user`; thus the memoized `glob2re` is dealing with a bigger set of globs to look up, and thus could be just that little bit slower. -- [[KathrynAndersen]] >>>>>> What may be going on is that glob2re is already a fairly fast >>>>>> function, so the overhead of memoizing it with the very generic >>>>>> `_memoizer` (see its source) swamps the memoization gain. Note >>>>>> that the few functions memoized with the Memoizer before were much >>>>>> more expensive, so that little overhead was acceptable then. >>>>>> >>>>>> It also may be that Kathryn's patch is slightly faster due to using >>>>>> the construct `$foo =~ $regexp` rather than `$foo =~ /$regexp/` >>>>>> (probably avoids a copy or something like that internally) -- >>>>>> this despite checking both `exists` and `defined` on the hash, which >>>>>> should be reundant AFAICS. >>>>>> >>>>>> My guess is that the best of both worlds would be to move >>>>>> the byhand memoization to glob2re and have it return a compiled >>>>>> `/^/i` regexp that can be used without further modifiction in most >>>>>> cases. --[[Joey]] -------------------------------------------------------------- Benchmarks done with Devel::Profile on the same testbed IkiWiki setup. I'm just showing the start of the profile output, since that's what's relevant. Before:
time elapsed (wall):   27.4173
time running program:  22.5909  (82.40%)
time profiling (est.): 4.8264  (17.60%)
number of calls:       1314729
number of exceptions:  65

%Time    Sec.     #calls   sec/call  F  name
11.05    2.4969    62333   0.000040     IkiWiki::PageSpec::match_glob
 4.10    0.9261      679   0.001364     Text::Balanced::_match_tagged
 2.72    0.6139    59812   0.000010     IkiWiki::SuccessReason::merge_influences
After:
time elapsed (wall):   26.1843
time running program:  21.5673  (82.37%)
time profiling (est.): 4.6170  (17.63%)
number of calls:       1252433
number of exceptions:  65

%Time    Sec.     #calls   sec/call  F  name
 7.66    1.6521    62333   0.000027     IkiWiki::PageSpec::match_glob
 4.33    0.9336      679   0.001375     Text::Balanced::_match_tagged
 2.81    0.6057    59812   0.000010     IkiWiki::SuccessReason::merge_influences
Note that the seconds per call for match_glob in the "after" case has gone down by about a third. K.A. -------------------------------------------------------------- A second set of benchmarks, done by rebuilding the docwiki at commit f942c2db05e4 like so: perl -Iblib/lib -d:Profile ikiwiki.in -setup docwiki.setup --no-verbose The docwiki appears to use fewer glob matches than Kathryn's wiki. With master: time elapsed (wall): 29.6970 time running program: 24.6930 (83.15%) time profiling (est.): 5.0041 (16.85%) number of calls: 1359180 number of exceptions: 13 %Time Sec. #calls sec/call F name 13.62 3.3629 3406 0.000987 Text::Balanced::_match_tagged 10.84 2.6773 79442 0.000034 IkiWiki::PageSpec::match_glob 3.08 0.7598 59454 0.000013 :IkiWiki/Plugin/inline.pm:223 3.07 0.7593 29830 0.000025 IkiWiki::bestlink 2.99 0.7378 10231 0.000072 IkiWiki::PageSpec::match_link With my `smcv/memoize-glob2re` branch: time elapsed (wall): 30.4931 time running program: 25.1248 (82.39%) time profiling (est.): 5.3683 (17.61%) number of calls: 1439943 number of exceptions: 13 %Time Sec. #calls sec/call F name 13.19 3.3146 3406 0.000973 Text::Balanced::_match_tagged 8.41 2.1123 79442 0.000027 IkiWiki::PageSpec::match_glob 3.97 0.9979 86905 0.000011 Memoize::_memoizer 3.05 0.7654 59454 0.000013 :IkiWiki/Plugin/inline.pm:223 3.02 0.7576 29830 0.000025 IkiWiki::bestlink and in a repeated run: 8.40 2.0905 79442 0.000026 IkiWiki::PageSpec::match_glob With Kathryn's patch as seen in my `smcv/ka-glob-cache` branch: time elapsed (wall): 27.7567 time running program: 22.9941 (82.84%) time profiling (est.): 4.7627 (17.16%) number of calls: 1279946 number of exceptions: 13 %Time Sec. #calls sec/call F name 14.29 3.2867 3406 0.000965 Text::Balanced::_match_tagged 7.89 1.8136 79442 0.000023 IkiWiki::PageSpec::match_glob 3.30 0.7577 59454 0.000013 :IkiWiki/Plugin/inline.pm:223 3.24 0.7461 29830 0.000025 IkiWiki::bestlink 3.19 0.7332 143 0.005127 ? IkiWiki::pagespec_match_list and in a repeated run: 7.84 1.8253 79442 0.000023 IkiWiki::PageSpec::match_glob --[[smcv]] --------------------------------------------------------------
diff --git a/IkiWiki.pm b/IkiWiki.pm
index 08a3d78..c187b98 100644
--- a/IkiWiki.pm
+++ b/IkiWiki.pm
@@ -2482,6 +2482,8 @@ sub derel ($$) {
 	return $path;
 }
 
+my %glob_cache;
+
 sub match_glob ($$;@) {
 	my $page=shift;
 	my $glob=shift;
@@ -2489,8 +2491,15 @@ sub match_glob ($$;@) {
 	
 	$glob=derel($glob, $params{location});
 
-	my $regexp=IkiWiki::glob2re($glob);
-	if ($page=~/^$regexp$/i) {
+	# Instead of converting the glob to a regex every time,
+	# cache the compiled regex to save time.
+	if (!exists $glob_cache{$glob}
+	    or !defined $glob_cache{$glob})
+	{
+	    my $re=IkiWiki::glob2re($glob);
+	    $glob_cache{$glob} = qr/^$re$/i;
+	}
+	if ($page =~ $glob_cache{$glob}) {
 		if (! IkiWiki::isinternal($page) || $params{internal}) {
 			return IkiWiki::SuccessReason->new("$glob matches $page");
 		}
--------------------------------------------------------------