Comments on the NADigest discussion regarding the use of MATLAB vs. Fortran in introductory numerical analysis courses.

 

 

 

 

Posting in NA Digest, V. 07, # 03

Of Fortran vs. Matlab, Notation, and Wikis by Robert van de Geijn:

 

As scientists, we are very familiar with the language of mathematics.  As educators, we carefully choose notation for introducing concepts in numerical analysis that is simultaneously precise and conveys the material at a level appropriate for the target audience. 

 

As computational scientists, we are familiar with one or more programming languages.  As educators we should choose a programming language that allows the mathematical notation to be naturally translated to code so that students can focus first on understanding the concepts without wasting time on the peculiarities of the language.

 

The problem is that the languages that are often/sometimes used for scientific computing, be it Fortran, C++, or Java, in their raw form do not express the mathematical notation well.  As a result, much time is wasted by our students as they struggle with the translation from mathematical notation to code.

 

The ideal solution is a domain-specific language for the particular problem at hand, be it numerical linear algebra algorithms, numerical integration, etc.  Unfortunately, domain-specific languages are cumbersome to develop and often lack the compilers that can deliver the performance that we demand. 

 

The practical solution is an Application Programming Interface (API) that allows one to code in an available language in a style that transforms that language into a domain-specific language.  The API should have the property that the translation from the mathematical notation to the API is simple.  An appropriate API can be easily defined for most programming languages, so that the choice of language becomes less important. 

 

An unfortunate consequence of history is the fact that languages like Fortran were developed in the distant past, and as a result have influenced the notation that we use to express algorithms in our courses and textbooks. For example, an algorithm for overwriting a matrix A with its Cholesky factorization will often appear in a book or on the white board as

 

for j=1:n

   A( j,j ) = sqrt( A(j,j) )

   for i=j+1:n

      A( i,j ) = A( i,j ) / A( j,j )

   end

   for k=j+1:n

      for i=k:n

         A( i,k ) = A( i,k ) – A( i,j ) * A( j,k )

      end

   end

end

 

(Here I purposely left a “bug” in the algorithm to illustrate what struggles students face.) 

It is stated in this way because it reflects how the naïve algorithm is typically presented in an effort to mimic languages like Fortran, more than that it resembles the derivation of the algorithm [1].  Notice that Matlab uses abstraction until it is used to implement algorithms like Cholesky factorization, when indices again rear their ugly heads.

 

After the 2000th student has come to one’s office hours with an indexing bug, it finally starts to become obvious that indices are the major obstacle when translating a mathematical description of the algorithm, which often involves submatrices in the above example, to code that exposes indices.  In other words, there is a disjoint between the abstraction at which the theory behind an algorithm is developed and the notation used to then express the algorithm and/or code.  This is not a problem with the chosen language.  Rather, it is a problem with the API chosen to represent the algorithm in code.

 

Experience tells us that it is much easier to translate code expressed at a high level of abstraction to code expressed in the more traditional fashion.  Thus, if students are first taught to use APIs that allow their code to be expressed at a high level of abstraction, there is not much lost.  Later in the course, or a subsequent course, key parts of the code can then be translated to straight loops, or calls to high-performance kernels.  Since this translation then happens in a context that makes sense to the student, having already understood the concepts, this step would be simultaneously less painful and more meaningful.

 

In the domain of dense linear algebra, which is often a topic in a beginning numerical analysis course, the proper choice of notation and API has had a startling consequence:  Algorithms and code can now be generated mechanically from problem specification.  Early evidence seems to indicate that notation that avoids indexing even allows numerical stability analyses to be made more systematic and possibly mechanical.  For details, I recommend the recent dissertation by Paolo Bientinesi [2].

 

We have started a “wiki” for linear algebra that builds on some of the above ideas: http://www.linearalgebrawiki.org.  Comments and participation from the community are welcome.  The benefit of a wiki is that the same topic can be described using multiple representation, in recognition that different people are more comfortable with different notation.

 

Bibliography (papers available from http://www.cs.utexas.edu/users/flame/Publications/)

 

[1] Paolo Bientinesi and Robert van de Geijn. " Representing Dense Linear Algebra Algorithms: A Farewell to Indices." FLAME Working Note #17. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2006-10.

 

[2] Paolo Bientinesi. “ Mechanical Derivation and Systematic Analysis of Correct Linear Algebra Algorithms.”  Ph.D. Dissertation.  Department of Computer Sciences, University of Texas at Austin, August 2006. 

 

 

Posting in NA Digest, V. 07, # 04

Those poor indices! by Louis Komzsik:

 

Dear Colleagues,

 

I believe that the lively FORTRAN vs. MATLAB discussion exposed another disconnect between the academia and the industry. I assumed that the

ultimate goal of this discussion was the education of industrially employable professionals and not a future generation of educators!  With all due respect to the importance of abstraction and the convenience of APIs, the comment of "the indices rearing their ugly heads" by an esteemed professor from a very highly regarded institution has just demonstrated aforementioned disconnect. 

 

Those indices are not just necessary evils of old fashioned languages like FORTRAN; they represent some physical or geometric quantities. For

the limited possibilities in the education environment, heavily but understandably focused on linear algebra, they at least represent the terms of matrices; and if there is something to do with certain terms of a matrix, then it must be done with indices. Even if they bring errors in the student algorithms or inconvenience the professors, they have significant educational and practical training value.

 

In the industrial work of a programmer, computer scientist or systems engineer, the indices may represent nodes and elements in FEA, geometric

entities in a CAD model, locations on a tool path in a CNC machine program, positions of a robotic arm movement in a CAM environment, or an instance in time of a simulation of dynamic behavior of a structure in CAE. Anybody working in such application environments must be able to use indices and should have experience in developing algorithms in conventional high level languages.

 

Louis Komzsik

UGS Corp.

 

 

Those poor physical and geometric quantities! by Robert van de Geijn:

 

This message is in reponse to the posting “Those poor indices!” by an esteemed practitioner from a highly regarded company in NA Digest, V07, #04 in response to my own posting on the Fortran vs. MATLAB debate. 

 

The misconception in “Those poor indices!” lies with the claim that “indices are not just necessary evils of old fashioned languages like FORTRAN; they represent some physical or geometric quantities.”  Replace the word “indices” by “vectors” or “entries in vectors” and this statement is of course absolutely correct.  But my original posting wasn’t about vectors or entries in vectors.  It was about notation, APIs, and the evils of indices. 

 

Consider the ith element of a vector x.  The index i is an artifact of having to provide a unique identifier so that data, that itself has physical and/or geometric meaning (expressed in some chosen basis), can be retrieved.  This data is x[ i ], not i.  Now, if our forefathers had chosen to store the data that describes a physical problem as a database, then a key would have been used instead of this index.  For example, the key could have consisted of the (relative or absolute) coordinates where the values with physical properties exist in space.  But our forefathers coded in languages like Fortran, and therefore these coordinates were mapped to the natural numbers, which were then used to indirectly address the data of interest.

 

Now, one could argue that indices are a necessary evil, since they provide possibly the only efficient method for accessing the required data.  Well, yes and no.  If you look at how truly efficient code is written, you will find that is uses pointers, which are directly incremented.  Even indexed Fortran code, when a high quality optimizing compiler is applied, is translated to code that does pointer manipulation rather than indexing into arrays, which demonstrates the artificial nature of indices.

 

What the posting “Those poor indices!” clearly shows is that the required emphasis on indices, which is an artifact of how we choose to code, has completely obscured the important quantities in our computations:  the physical and geometric quantities.  If even an expert like my esteemed colleague can lose track of that, clearly our students in beginning classes will be completely confused.    And this demonstrates the need for avoiding unnecessarily complex indexing in those classes.