[Fwd: Parsing @* (was Re: "always" block)]

From: Shalom Bresticker (Shalom.Bresticker@freescale.com)
Date: Sat Jul 10 2004 - 22:56:08 PDT

    --
    Shalom Bresticker                        Shalom.Bresticker @freescale.com
    Design & Reuse Methodology                           Tel: +972 9  9522268
    Freescale Semiconductor Israel, Ltd.                 Fax: +972 9  9522890
    POB 2208, Herzlia 46120, ISRAEL                     Cell: +972 50 5441478
    

    [ ]Freescale Internal Use Only [ ]Freescale Confidential Proprietary

    attached mail follows:


    "Stephen Williams" <spamtrap@icarus.com> wrote in message
    news:491e2$40edb595$40695902$2628@msgid.meganewsservers.com...
    > I'm one of those compiler writers who had a run in with the syntax,
    > but unlike some other things in Verilog, it is not ambiguous. I had
    > to detect "(*)" in the lexical analyzer, and if matched, convert it
    > to a single '*' token. This prevents "(*" from being interpreted as
    > the start-of-attribute token. It's a little quirky, but not difficult.

    Unfortunately it is ambiguous. Since it is not literally defined as a single
    token, it can be, and is, treated differently by different tools.
    Some tools allow @ ( * ) but complain about @ ( /*Alice has */ * /*a cat*/ ),
    which suggests that they treat the whole sequence ( * ), including optional
    whitespace, as one token. Other tools can deal with whitespace, comments,
    macros etc. between (, * and ), which suggests they deal with it at the
    syntax level: (, * and ) are separate tokens, and the attribute tokens (* and
    *) are treated in the grammar as valid replacements for the token pairs
    ( followed by * and * followed by ), respectively.
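
    To make the difference concrete, here is a minimal sketch of the two
    strategies in Python. It is not taken from any real tool: the function names,
    the token names and the regular expressions are all my own assumptions, and
    macros are not handled at all.

```python
import re

# Strategy A (assumed): the lexer recognises "(", "*", ")" separated only by
# whitespace as a single lexeme, so comments inside it are not tolerated.
STAR_LEXEME = r"\(\s*\*\s*\)"

def accepts_lexer_level(text):
    """True if text is '@' followed by one '( * )' lexeme."""
    return re.fullmatch(r"\s*@\s*" + STAR_LEXEME + r"\s*", text) is not None

# Strategy B (assumed): ordinary tokenisation. Comments and whitespace are
# skipped, "(*" and "*)" are lexed as attribute tokens, and the "grammar"
# accepts each attribute token as a stand-in for the corresponding token pair.
TOKEN = re.compile(r"""
    (?P<comment>/\*.*?\*/|//[^\n]*)
  | (?P<ws>\s+)
  | (?P<attr_open>\(\*)
  | (?P<attr_close>\*\))
  | (?P<punct>[@()*])
""", re.VERBOSE | re.DOTALL)

def tokens(text):
    """Tokenise text, expanding attribute tokens into their token pairs."""
    out, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError("unexpected character at offset %d" % pos)
        pos = m.end()
        kind = m.lastgroup
        if kind in ("comment", "ws"):
            continue                      # skipped between tokens
        if kind == "attr_open":
            out += ["(", "*"]             # "(*" stands for "(" then "*"
        elif kind == "attr_close":
            out += ["*", ")"]             # "*)" stands for "*" then ")"
        else:
            out.append(m.group())
    return out

def accepts_syntax_level(text):
    """True if the token stream is exactly @ ( * )."""
    return tokens(text) == ["@", "(", "*", ")"]
```

    With this sketch both functions accept "@(*)" and "@ ( * )", but only
    accepts_syntax_level accepts "@ ( /*Alice has */ * /*a cat*/ )", which is
    the kind of disagreement between tools described above.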

    You can stop reading here; below are just some things related to the lexical
    part of Verilog that I have always wanted to say :o)

    In a perfect world the "Lexical conventions" chapter of the LRM would not
    contain any information about the syntax or semantic meaning of the tokens it
    defines, but would have all the fixed tokens listed (operators, keywords...)
    and all constructible tokens defined by simple and unequivocal rules
    (strings, numbers, identifiers...). Later chapters would only refer to the
    defined tokens and would not define new ones. In order to do this, the lexical
    grammar would need to be at least context-free; unfortunately that is not the
    case for Verilog.

    Reality is much worse, because it's the vendors who define actual Verilog.
    Even if they implement something that is wrong, their customers and the
    backward compatibility rule will force them to keep the unwanted features (I
    am guilty myself :o( but I'm not a fan of backward compatibility as long as
    the lack of it gives something in return). One example would be this
    discussion:
    http://www.boyd.com/1364_btf/report/full_pr/350.html
    IMHO whoever allowed configurations to be part of regular Verilog source in
    their implementation simply shot themselves in the foot, in particular
    because the lexical rules of the library mapping file conflict with those for
    regular Verilog source (see the file_path_spec rule: if it were defined as a
    string there would be no issue, but with the current rules I find it more
    problematic than additional keywords conflicting with some existing
    designs). Personally I don't see the LRM allowing configs as part of regular
    source text (some statements in chapter 13 suggest it, but 13.2.1
    should make it clear that they are allowed only in special files, and annex
    A, which is normative, makes a clear distinction between library_text and
    source_text). It would be a nice feature to allow configs among modules, but
    first the rules of the config body would have to be adapted to the existing
    rules of regular source text.

    Everyone who has dealt with such "nice and useful features" of the lexical
    grammar of Verilog knows what I'm talking about, for example:
    - based numbers being composed of three separate tokens (instead of two: size
      and base+value; the value can't be parsed without the base),
    - edge_descriptor, which is defined as a new kind of token far outside
      chapter 2 and in some cases conflicts with unsized_number or identifier
      tokens,
    - the similar case of level_symbols being glued together, even in LRM
      examples (8.7),
    - not to mention the de facto standard `uselib directive (similar problems to
      those of file_path_spec in the library mapping file, but easy to work
      around as long as you don't consider Windows paths; besides, `uselib does
      not allow wildcards, which makes it much less problematic than configs), or
    - real numbers in a form starting with a dot (invalid according to the LRM but
      accepted by many tools)...
    The ambiguity of (*) is just one of many in Verilog; I agree that it is one
    of the easiest to deal with :o)
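
    As an illustration of the based-number point, here is a small Python sketch
    of why the value cannot be classified without knowing the base. The function
    name, the returned labels and the simplified digit sets are my own
    assumptions (they follow my reading of the number syntax, not any tool's
    actual lexer).

```python
import re

# Simplified token patterns (assumed, not copied from the LRM).
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")
UNSIGNED = re.compile(r"[0-9][0-9_]*")

# Which characters may form a value depends on the base letter that precedes it.
DIGITS_FOR_BASE = {
    "b": re.compile(r"[01xXzZ?][01xXzZ?_]*"),
    "o": re.compile(r"[0-7xXzZ?][0-7xXzZ?_]*"),
    "d": re.compile(r"(?:[0-9][0-9_]*|[xXzZ?]_*)"),
    "h": re.compile(r"[0-9a-fA-FxXzZ?][0-9a-fA-FxXzZ?_]*"),
}

def lex_value(text, base=None):
    """Classify text, optionally knowing the base letter that preceded it."""
    if base is not None:
        # Context-dependent mode: only the digit set of this base is valid.
        if DIGITS_FOR_BASE[base.lower()].fullmatch(text):
            return "based_value"
        return "error"
    # Context-free mode: the same characters look like ordinary tokens.
    if UNSIGNED.fullmatch(text):
        return "unsigned_number"
    if IDENT.fullmatch(text):
        return "identifier"
    return "error"
```

    Here lex_value("dead") comes back as an identifier while
    lex_value("dead", "h") is a based value, and "1x0z" is only meaningful
    after a binary (or hex) base. A lexer that emits the value as its own token
    therefore has to carry the base along as context.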

    That's all for now, thank you for reading my complaints :o)

    Regards,
    ABW


