Search-engine friendly clone of the ACL2 documentation.

Use-set

Typo-detection

We try to detect possible typos in wire names.

Verilog implementations allow the use of implicit wires. Because of this, a typo in a wire name might go undetected. As part of our use-set analysis, we now try to detect wires that might be typos.

How do we know whether a wire name is actually misspelled, and is not simply some implicit wire that a logic designer is using? It is not clear that there is any hard and fast way to tell. Our goal, then, is to develop a heuristic method to identify as many typos as possible (and as few non-typos as possible).

Here is our approach at a high level.

First, we will only consider wires that are not explicitly defined, since these wires seem to be the obvious starting place to look for typos. Note that it is quite easy to identify these wires, e.g., see make-implicit-wires, which adds declarations to the modules to make these wires explicit.
Next, we will only consider the subset of these wires that are either unused or unset, per our ordinary use-set analysis. The idea behind this restriction is that typos are probably relatively rare, and it is unlikely that someone would misspell the name in both contexts.
The names of the remaining wires are then analyzed and compared with the names of existing, defined wires. The idea here is that most typos are probably minor things: character transpositions or omissions, incorrect case, improper underscores, etc. Only if the wire's name is somewhat similar to another wire do we report it as a potential typo.

Analyzing Wire Names

How can we determine if one wire name is similar to another? This problem is reminiscent of spell-checking, where approaches such as Soundex are used to determine if say that two words sound the same.

The reason that Soundex codes work well for spell checking prose is that, even though people may not know how to spell the words they want to write, they usually know how to say them. So, if someone wants to write the word Cholera, they might at least write something like Colara instead. The two "words" are phonetically similar, so by assigning and comparing phonetic codes, we can easily determine which words to suggest.

Soundex codes do not seem particularly appropriate for our task. Actually, our job seems easier. First, let us look at some of the wire names used throughout the design. There are some all-uppercase signals such as:

STPCLKB
THERMDC
THERMTRIPB

And there are some all-lowercase names, such as:

in
tclk
vbpa

But mostly what we find are mixed-case names, often with underscores, such as:

mmSnoopDataValid_CX_P
msBusWriteLOCK_C1_T0_P
rrT0McTm1PdgClr_P
orvHi
x1I3_ReadGflags_X
rnRomEnSel_A
matchb39_6b
bcDWCBAEnt_C0_P

Wire Name Partitioning

These names are such nonphonetic garble that ordinary algorithms like Soundex would probably not produce anything meaningful. On the other hand, these names seem quite easy to split up into pieces. And, probably, as someone trying to type one of these names, I am at least going to get most of these pieces right. For instance, if I am trying to write rnRomEnSel_A, I might forget the abbreviations used and type something like rnRomEnableSel_A, or forget the underscore and type rnRomEnSelA, or miscapitalize and type something like rnRomEnsel_A.

So, our first move is to split up the wire names into lists of pieces. Then, we can compare signal names on a piece-by-piece basis. To carry out this partitioning, we adopt the following strategy:

Consider numbers as lowercase characters.
We split into pieces upon encountering any underscore or other non-alphanumeric characters, and these special characters are simply dropped. For instance, given a name like rnRomEnSel_A, we will split into rnRomEnSel and A.
We split on every transition from a lowercase character to an uppercase character. For instance, in rnRomEnSel_A, this rule leads us to split into rn, Rom, En, and Sel_A (which is further split into Sel and A by the previous rule).
If we see two uppercase characters followed by a lowercase character, we split between the uppercase characters. For instance, in bcDWCBAEnt_C0_P, this rule leads us to split between bcDWCBA and Ent_C0_P (which are further split by the previous rules into bc, DWCBA, Ent, C0, and P.

Comparing Partitioned Names

Now, suppose we have an implicit wire, a, and an explicit wire b, and we want to decide whether a might be a typo and the designer really meant to write b. We begin by partitioning both wire names; call the resulting partitionings x and y.

If x and y are equal, then a and b differ only in that one might have underscores where the other does not. We think this is very likely to be a typo.

Otherwise, we compare the pieces of x and y in order. If exactly one piece differs, then we may wish to consider whether one to be a typo of the other after some further analysis. If more than a single piece is different, we think x and y are sufficiently distinct from one another that we will not consider x to be a typo of y.

This further analysis is motivated by looking at examples of matches.

First, we require that the two pieces have the same (case insensitive) leading character. The motivation here is to prevent considering a wire like bcDWCBAEnt_C0_P to be a typo of bcDWCBAEnt_D0_P.
We additionally require that adding a trailing b or B to either name does not cause them to become identical.
Finally, if the pieces have the same length, and are identical except that they end with distinct numbers, we reject the match. This is intended to prevent matching between signals like bcDWCBAEnt_C0_P and bcDWCBAEnt_C1_P.

Subtopics

Typo-read-lowercase-part: Read as many lowercase characters as possible, but stop early if a special is encountered.
Typo-read-part: Read the first "part" of a wire name.
Typo-read-uppercase-part: Read as many uppercase characters as we can.
Typo-read-special
Typo-detect-aux: We build an alist that might associate some of the wires to the lists of wires we think they could be typos of.
Typo-detect: Build an alist associating any "bad" wires with any "good" wires that they may be typos for.
Typo-find-plausible-typos1: Walk down the partitioning alist and gather the names of all signals that Part might be a typo for.
Typo-mismatch-plausibly-typo-p: X and Y are single pieces that are mismatched. Do they satisfy our criteria for being considered "possibly a typo"?
Typo-partitioning-alist: Build an alist mapping strings to their partitionings.
Typo-partition: Fully partition a wire name.
Vl-typo-first-mismatch: Given two same-length partitionings, return their first mismatch as a pair.
Typo-partitions-plausibly-typo-p: X and Y are whole-partitionings. Do they satisfy our criteria for being considered "possibly a typo"?
Vl-typo-count-mismatches: Given two same-length partitionings, determine how many of their pieces are mismatched.
Typo-numbers
*typo-special-substrings-chars*
*typo-numbers*