Re: commit head: bidi related performance fix in fl_BlockLayout::_doInsertTextSpan()

From: Joaquín Cuenca Abela (cuenca@pacaterie.u-psud.fr)
Date: Sat Jul 27 2002 - 07:34:12 EDT

    On Fri, 2002-07-26 at 23:57, Tomas Frydrych wrote:
    >
    > Hi Joaquin,
    >
    > > Unfortunately nearly is the right word :)
    > > I'm still having some extra splits with the "strong type a -> weak ->
    > > strong type b" combination.
    >
    > There are two types of this combination where you can get an
    > unnecessary split. One case is where the strong/weak boundary is at
    > the end of the span of text for which we are to create the run.
    > Unfortunately in this case there is nothing we can do about it,
    > because the text after the weak character(s) has not yet been
    > inserted, and so we must anticipate the worst case and split. I have
    > done some tests with problemfonts.abw and this does not seem to
    > figure at all.
    >
    > The other is the case where the text for which we are inserting the
    > run is made up of several segments in the piecetable, i.e., when the
    > while loop runs more than once. In my tests I got about 37 times
    > when the while loop ran 2 times, about 7 when it ran 3 times,
    > and 99 times when it ran 1 time. This problem would seem to be
    > responsible for all the unnecessary splits. Now, this case could be
    > handled by the code, but it becomes quite a bit more complex and
    > will require a lot of debugging not to screw things up for genuine
    > bidirectional documents.

    OK, for now I'm still looking for simple mistakes. If that takes time
    to solve, we can leave it for later. I still have some other spots to
    look at...
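
    A minimal sketch of the end-of-span case described in the quoted text,
    under simplifying assumptions: only three direction classes, and a
    hypothetical countRuns helper that is not the fl_BlockLayout code. A
    weak stretch stays in the current run only if the next strong
    character inside the span has the same direction; if the span ends
    first, the worst case is assumed and the run is split.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified direction classes (an assumption; real bidi has many more).
enum class Dir { LTR, RTL, Weak };

// Hypothetical helper: counts the runs a span is cut into when a weak
// stretch is absorbed only if the next strong character in the span
// matches the direction of the run being built.
std::size_t countRuns(const std::vector<Dir>& span) {
    if (span.empty()) return 0;
    std::size_t runs = 1;
    Dir current = span[0];  // class of the run being built
    for (std::size_t i = 1; i < span.size(); ++i) {
        if (span[i] == current) continue;
        if (span[i] == Dir::Weak) {
            // Look ahead to the first non-weak character, if any.
            std::size_t j = i;
            while (j < span.size() && span[j] == Dir::Weak) ++j;
            if (j < span.size() && span[j] == current) {
                i = j;  // same direction on both sides: no split
                continue;
            }
            // Different direction, or the span ended: split here.
            ++runs;
            current = Dir::Weak;
            i = j - 1;  // the loop increment moves on to j
            continue;
        }
        // A strong character that differs from the current run.
        ++runs;
        current = span[i];
    }
    return runs;
}
```

    With this rule, strong/weak/strong of the same type stays one run,
    while a span that ends on a weak character is split even though the
    text that follows might later make the split unnecessary.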

    > I have spent several hours on it tonight,
    > and given up for now -- the reversed loading bug makes it very
    > difficult to test --

    Yeah, I'm having exactly the same difficulties testing long docs...

    > but I will keep it on my TODO list, and come back to
    > it at some point in the future.
    >
    > > With your code I'm getting 4556396 of ::find_slot, and with if (false)
    > > I'm getting 3884410.
    >
    > I think there is probably scope for optimization in the run
    > construction process and that we could cut this down even without
    > cutting the number of runs initially created. Any idea where the
    > ::find_slot calls originate? (incidentally, how big is your document?)

    It's this:

    Tata titi tutu
    (typeset as Heading 1, in Verdana bold 17pt)

    asdfj asdfj asdf jasdjf as djfasjdf /asdfasd asdf asdfasdf/ */asdfa asdf
    lasf jla dslf adsf/* asldkfjas d asdl jfdsfj saldfjlafd jlas djflasdjfa
    asdf asdf asdf af.
    (typeset as Normal, Times New Roman, 12pt. The parts between / and /
    are in italic, and the parts between */ and /* are in bold italic).

    That's repeated until it fills 205 pages. The document has a size of
    2.9M.

    As for the ::find_slot calls, yes, I see where they originate (where I
    wrote ::find_slot, you should really read "the top 20 functions of the
    profile", since they are all called so often because of the same
    problem).

    I will use here the data from the "if (false)" condition in
    _doInsertTextSpan. We have 3884410 calls to ::find_slot, and most of
    them come from here:

    0.00  0.09   113888/3611858  PP_AttrProp::setProperty(char const *, char const *) [69]
    0.01  0.18   232232/3611858  PP_AttrProp::getPropertyType(char const *, tProperty_type) const [100]
    0.02  0.41   536659/3611858  pt_PieceTable::getStyle(char const *, PD_Style **) const [67]
    0.05  0.98  1285367/3611858  PP_AttrProp::getProperty(char const *, char const *&) const [42]
    0.05  1.05  1375711/3611858  PP_AttrProp::getAttribute(char const *, char const *&) const [38]

    So the getProperty and getAttribute functions dominate by far the time
    that it takes to load the document.

    Now the question is why we have more than 1 million calls to each of
    these functions when we have only 20678 runs (that is if the bidi code
    doesn't create any additional runs; with the additional splits from
    the bidi code we have 31014 runs).

    These runs break down as follows:

    (with bidi)
    3879/31014 fp_EndOfParagraphRun
    3879/31014 fp_FmtMarkRun
    23256/31014 fp_TextRun

    (without bidi)
    3879/20678 fp_EndOfParagraphRun
    3879/20678 fp_FmtMarkRun
    12920/20678 fp_TextRun

    In both cases, the number of runs is dominated by the number of
    fp_TextRun runs.
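
    The counts above can be checked with a few lines. The struct and
    function names below are made up for illustration; only the numbers
    come from the profile. Text runs are roughly 75% of all runs with the
    bidi splits and about 62% without them, and the splits multiply the
    number of text runs by 1.8.

```cpp
#include <cassert>

// Run counts quoted in the breakdown (names are illustrative only).
struct RunCounts {
    long endOfParagraph;
    long fmtMark;
    long text;
    long total() const { return endOfParagraph + fmtMark + text; }
};

const RunCounts kWithBidi    = {3879, 3879, 23256};
const RunCounts kWithoutBidi = {3879, 3879, 12920};

// Fraction of all runs that are text runs.
double textShare(const RunCounts& c) {
    return static_cast<double>(c.text) / static_cast<double>(c.total());
}

// How many times more text runs exist with the bidi splits.
double textRunRatio() {
    return static_cast<double>(kWithBidi.text) /
           static_cast<double>(kWithoutBidi.text);
}
```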

    Coming back to PP_AttrProp::getAttribute and getProperty:

    getAttribute:
    ============
    64204/1388916 fl_BlockLayout::_lookupProperties(void)
    152655/1388916 PD_Style::getAttribute(char const *, char const *&) const
    1124974/1388916 _getStyle(PP_AttrProp const *, PD_Document *)

    getProperty:
    ===========
    524402/1319603 PD_Style::getProperty(char const *, char const *&) const [62]
    795182/1319603 PP_evalProperty(char const *, PP_AttrProp const *, PP_AttrProp const *, PP_AttrProp const *, PD_Document *, bool) [15]

    So the find_slot calls come mainly from:

    1124974/3884410 _getStyle
    524402/3884410 PD_Style::getProperty
    795182/3884410 PP_evalProperty

    _getStyle calls getAttribute twice, and it is called 562487 times,
    mainly from:
    509846/562487 PP_evalProperty

    and PP_evalProperty is called mainly from:
    30266/518512 fp_Run::lookupProperties(void) [24]
    51680/518512 fp_TextRun::_lookupProperties(PP_AttrProp const *, PP_AttrProp const *, PP_AttrProp const *) [43]
    62064/518512 FL_DocLayout::findFont(PP_AttrProp const *, PP_AttrProp const *, PP_AttrProp const *, int, bool) [78]
    112366/518512 fl_BlockLayout::getProperty(char const *, bool) const [53]
    117272/518512 FL_DocLayout::findFonts(PP_AttrProp const *, PP_AttrProp const *, PP_AttrProp const *, GR_Font **, GR_Font **, bool) [52]
    139947/518512 fp_Run::updateHighlightColor(void) [44]

    Note that this data already takes into account some optimizations that
    I've made (the findFonts function). For instance, getting the font to
    typeset a run calls PP_evalProperty 8 times, and we were getting the
    font twice (once for screen resolution and once for layout
    resolution). By adding a new findFonts function that gets both font
    versions with 8 PP_evalProperty calls instead of 16, I avoided
    "several" calls to ::find_slot (before your change to the bidi part it
    was several million calls).
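
    As a rough illustration of the findFonts idea (not the actual AbiWord
    code; PropStore, FontSpec and resolveSpec are hypothetical stand-ins,
    and only 4 properties are used instead of 8), the properties are
    evaluated once and both font versions are built from that single
    result:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical property store that counts lookups so the saving is
// visible. This is NOT the real PP_AttrProp interface.
struct PropStore {
    std::map<std::string, std::string> props;
    mutable long lookups = 0;

    std::string eval(const std::string& name) const {
        ++lookups;
        auto it = props.find(name);
        return it == props.end() ? std::string() : it->second;
    }
};

// Hypothetical font description resolved from the properties.
struct FontSpec {
    std::string family, style, weight, size;
    int resolution = 0;
};

// One full pass over the font properties (4 lookups in this sketch).
static FontSpec resolveSpec(const PropStore& p) {
    FontSpec s;
    s.family = p.eval("font-family");
    s.style  = p.eval("font-style");
    s.weight = p.eval("font-weight");
    s.size   = p.eval("font-size");
    return s;
}

// Old shape: every call repeats the full property evaluation.
FontSpec findFont(const PropStore& p, int resolution) {
    FontSpec s = resolveSpec(p);
    s.resolution = resolution;
    return s;
}

// New shape: evaluate once, then derive both font versions.
void findFonts(const PropStore& p, int screenRes, int layoutRes,
               FontSpec& screen, FontSpec& layout) {
    screen = resolveSpec(p);
    layout = screen;
    screen.resolution = screenRes;
    layout.resolution = layoutRes;
}
```

    In this sketch two findFont calls cost 8 lookups and one findFonts
    call costs 4; in the case described above the figures are 16 and 8.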

    As you can see, the number of runs times the number of PP_evalProperty
    calls per run dominates the time that it takes to load the doc (I've
    only followed one branch of the data here; the other branches also end
    in fp_Run and PP_evalProperty stuff, AFAIR).

    Your bidi patch reduces the number of runs; my local changes reduce
    the number of PP_evalProperty calls per run (there are other places
    where we have redundant PP_evalProperty calls, like the font case that
    I outlined above, for example calling updateHighlightColor several
    times without caching its result).
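
    A minimal sketch of that kind of caching, assuming a hypothetical Run
    class (this is not fp_Run, and the placeholder color is invented): the
    expensive lookup runs only when the cached value has been invalidated.

```cpp
#include <cassert>
#include <string>

// Hypothetical run that caches the result of an expensive lookup.
class Run {
public:
    // To be called whenever the run's attributes or properties change.
    void invalidate() { m_valid = false; }

    // Returns the cached color, recomputing it only when needed.
    const std::string& highlightColor() {
        if (!m_valid) {
            m_color = computeHighlightColor();
            m_valid = true;
        }
        return m_color;
    }

    // Number of times the expensive lookup actually ran.
    long computeCount() const { return m_computes; }

private:
    // Stand-in for the property evaluation; the value is a placeholder.
    std::string computeHighlightColor() {
        ++m_computes;
        return "transparent";
    }

    std::string m_color;
    bool m_valid = false;
    long m_computes = 0;
};
```

    The cost of this approach is that every code path that changes the
    underlying properties has to call invalidate(), otherwise a stale
    value is returned.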

    Once the reverse loading bug is fixed, I will start sending some of
    these optimizations.

    Cheers,

    -- 
    Joaquín Cuenca Abela
    cuenca@pacaterie.u-psud.fr
    

