I think there is a problem in my "dependency graph". As an example, [here](http://poivron.org/~nil/misc/ikiwiki_buggy_index) is the index ikiwiki generated for [my site](http://poivron.org/~nil/misc/ikiwiki_buggy_index) (note that the site changed since this index was generated). Some **HUGE** dependencies appear, clearly non optimal, like depends = A| B | A | C | A | D | A | E | A | F | A | G | .... or depends= A | B | C | D | A | B | C | D | A | B | C | D | .... Couldn't isolate the cause, but some sources for this problem may be: * related to the img module * easily observable in my sire because one of my pages includes 80 resized images Other special things in my templates and site: * a sidebar with \[[!include pages="notes/\*" template=foo]] while notes.mdwn has a \[[!include pages="notes/*"]] and uses the sidebar; removed it, doesn't change * a template (biblio.tmpl) calling the "img" plugin with a template parameter as the image filename; removed it, doesn't change * some strange games with tags whose page calls a "map" directive to show other tags shile tags are also used in tagclouds (in the sidebar and in the main pages) * ... I observed these problems (same *kind*, I didn't check in details) on * ikiwiki 2.00gpa1 + v5.8.4 + Debian 3.1 * ikiwiki 2.3 + v5.8.8 + Ubuntu 7.04 I can think about reducung the size of my wiki source and making it available online for analysis. -- NicolasLimare > As long as these dependencies don't grow over time (ie, when a page is > edited and nothing changed that should add a dependency), I wouldn't > worry about them. There are many things that can cause non-optimal > dependencies to be recorded. For one thing, if you inline something, ikiwiki > creates a dependency like: > > (PageSpec) or (file1 or file2 or file3 ...) > > Where fileN are all the files that the PageSpec currently matches. (This > is ncessary to detect when a currently inlined file is deleted, and know > the inlining page needs an update.) Now consider what it does if you have > a single page with two inline statements, that inline the same set of > stuff twice: > > ((PageSpec) or (file1 or file2 or file3 ...) or (PageSpec) or (file1 or file2 or file3 ...) > > Clearly non-optimal, indeed. > > Ikiwiki doesn't bother to simplify complex PageSpecs > because it's difficult to do, and because all they use is some disk > space. Consider what ikiwiki uses these dependencies for. > All it wants to know is: does the PageSpec for this page it's considering > rebuilding match any of the pages that have changed? Determining this is > a simple operation -- the PageSpec is converted to perl code. The perl > code is run. > > So the total impact of an ugly dependency like this is: > > 1. Some extra data read/written to disk. > 2. Some extra space in memory. > 3. A bit more data for the PageSpec translation code to handle. But that > code is quite fast. > 4. Typically one extra function call when the generated perl code is run. > Ie, when the expression on the left-hand side fails, which typically > happens after one (inexpensive) function call, it has to check > the identical expression on the right hand side. > > So this is at best a wishlist todo item, not a bug. A PageSpec simplifier > (or improved `pagespec_merge()` function) could be written and improve > ikiwiki's memory and disk usage, but would it actually speed it up any? > We'd have to see the code to the simplifier to know. > > --[[Joey]] [[!template id=gitbranch branch=smcv/ready/optimize-depends author="[[smcv]]"]] >> I've been looking at optimizing ikiwiki for a site using >> [[plugins/contrib/album]] (which produces a lot of pages) and it seems >> that checking which pages depend on which pages does take a significant >> amount of time. The optimize-depends branch in my git repository >> avoids using `pagespec_merge()` for this (indeed it's no longer used >> at all), and instead represents dependencies as a list of pagespecs >> rather than a single pagespec. This does turn out to be faster, although >> not as much as I'd like. --[[smcv]] >>> [[Merged|done]] --[[smcv]] >>> I just wanted to note that there is a whole long discussion of dependencies and pagespecs on the [[todo/tracking_bugs_with_dependencies]] page. -- [[Will]] >>>> Yeah, I had a look at that (as the only other mention of `pagespec_merge`). >>>> I think I might have solved some of the problems mentioned there, >>>> actually - `pagespec_merge` no longer needs to exist in my branch (although >>>> I haven't actually deleted it), because the "or" operation is now done in >>>> the Perl code, rather than by merging pagespecs and translating. --[[smcv]] [[!template id=gitbranch branch=smcv/ready/remove-pagespec-merge author="[[smcv]]"]] >>>>> I've now added a patch to the end of that branch that deletes >>>>> `pagespec_merge` almost entirely (we do need to keep a copy around, in >>>>> ikiwiki-transition, but that copy doesn't have to be optimal or support >>>>> future features like [[tracking_bugs_with_dependencies]]). --[[smcv]] --- Some questions on your optimize-depends branch. --[[Joey]] In saveindex it still or'd together the depends list, but the `{depends}` field seems only useful for backwards compatability (ie, ikiwiki-transition uses it still), and otherwise just bloats the index. > If it's acceptable to declare that downgrading IkiWiki requires a complete > rebuild, I'm happy with that. I'd prefer to keep the (simple form of the) > transition done automatically during a load/save cycle, rather than > requiring ikiwiki-transition to be run; we should probably say in NEWS > that the performance increase won't fully apply until the next > rebuild. --[[smcv]] >> It is acceptable not to support downgrades. >> I don't think we need a NEWS file update since any sort of refresh, >> not just a full rebuild, will cause the indexdb to be loaded and saved, >> enabling the optimisation. --[[Joey]] Is an array the right data structure? `add_depends` has to loop through the array to avoid dups, it would be better if a hash were used there. Since inline (and other plugins) explicitly add all linked pages, each as a separate item, the list can get rather long, and that single add_depends loop has suddenly become O(N^2) to the number of pages, which is something to avoid.. > I was also thinking about this (I've been playing with some stuff based on the > `remove-pagespec-merge` branch). A hash, by itself, is not optimal because > the dependency list holds two things: page names and page specs. The hash would > work well for the page names, but you'll still need to iterate through the page specs. > I was thinking of keeping a list and a hash. You use the list for pagespecs > and the hash for individual page names. To make this work you need to adjust the > API so it knows which you're adding. -- [[Will]] > I wasn't thinking about a lookup hash, just a dedup hash, FWIW. > --[[Joey]] >> I was under the impression from previous code review that you preferred >> to represent unordered sets as lists, rather than hashes with dummy >> values. If I was wrong, great, I'll fix that and it'll probably go >> a bit faster. --[[smcv]] >>> It depends, really. And it'd certianly make sense to benchmark such a >>> change. --[[Joey]] Also, since a lot of places are calling add_depends in a loop, it probably makes sense to just make it accept a list of dependencies to add. It'll be marginally faster, probably, and should allow for better optimisation when adding a lot of depends at once. > That'd be an API change; perhaps marginally faster, but I don't > see how it would allow better optimisation if we're de-duplicating > anyway? --[[smcv]] >> Well, I was thinking that it might be sufficient to build a `%seen` >> hash of dependencies inside `add_depends`, if the places that call >> it lots were changed to just call it once. Of course the only way to >> tell is benchmarking. --[[Joey]] In Render.pm, we now have a triply nested loop, which is a bit scary for efficiency. It seems there should be a way to rework this code so it can use the optimised `pagespec_match_list`, and/or hoist some of the inner loop calculations (like the `pagename`) out. > I don't think the complexity is any greater than it was: I've just > moved one level of "loop" out of the generated Perl, to be > in visible code. I'll see whether some of it can be hoisted, though. > --[[smcv]] >> The call to `pagename` is the only part I can see that's clearly >> run more often than before. That function is pretty inexpensive, but.. >> --[[Joey]] Very good catch on img/meta using the wrong dependency; verified in the wild! (I've cherry-picked those bug fixes.) [[!tag wishlist patch patch/core]]