Is Fortran faster than C?

From time to time I read that Fortran is or can be faster than C for heavy calculations. Is that really true? I must admit that I hardly know Fortran, but the Fortran code I have seen so far did not show that the language has features that C doesn't have.

If it is true, please tell me why. Please don't tell me what languages or libs are good for number crunching, I don't intend to write an app or lib to do that, I'm just curious.

Source: (StackOverflow)

Fortran: integer*4 vs integer(4) vs integer(kind=4)

I'm trying to learn Fortran and I'm seeing a lot of different definitions being passed around and I'm wondering if they're trying to accomplish the same thing. What is the difference between the following?

  • integer*4
  • integer(4)
  • integer(kind=4)


Source: (StackOverflow)


CMake tutorial [closed]

Can anyone provide link(s) to a good CMake tutorial, other than the very expensive and hard-to-get official one?

I'm especially interested in using CMake for Fortran projects, but I will be grateful for any good tutorial.


What I have already found are the CMake articles in the Kitware Public Wiki. The Fortran example is absolutely useless. =( Also, while waiting for answers, I'm playing with SCons. Looks nice. =)

Source: (StackOverflow)

Learning FORTRAN In the Modern Era

I've recently come to maintain a large amount of scientific, calculation-intensive FORTRAN code. I'm having difficulty getting a handle on all of the, say, nuances of a forty-year-old language, despite Google and two introductory-level books. The code is rife with "performance enhancing improvements". Does anyone have any guides or practical advice for de-optimizing FORTRAN into CS 101 levels? Does anyone have knowledge of how FORTRAN code optimization operated? Are there any typical FORTRAN 'gotchas' that might not occur to a Java/C++/.NET raised developer taking over a FORTRAN 77/90 codebase?

Source: (StackOverflow)

How to build i686-linux-android-gfortran for android-ndk8b (x86 arch Android)?

I tried building i686-linux-android-gfortran using build-gcc.sh following this (it's for android-ndk-7b), but I get an error about link.h. I added link.h from here, but that gives even more errors.

Has anyone tried enabling i686-linux-android-gfortran for x86 Android?

Source: (StackOverflow)

How can numpy be so much faster than my Fortran routine?

I get a 512^3 array representing a Temperature distribution from a simulation (written in Fortran). The array is stored in a binary file that's about 1/2G in size. I need to know the minimum, maximum and mean of this array and as I will soon need to understand Fortran code anyway, I decided to give it a go and came up with the following very easy routine.

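  integer gridsize,unit,j
  real mini,maxi,tmp
  double precision mean

  ! assumes gridsize and unit are set and the file is already open on unit
  read(unit=unit) tmp
  mini=tmp; maxi=tmp; mean=tmp
  do j=2,gridsize**3
      read(unit=unit) tmp
      if(tmp>maxi)then
          maxi=tmp
      elseif(tmp<mini)then
          mini=tmp
      end if
      mean=mean+tmp
  end do
  mean=mean/gridsize**3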

This takes about 25 seconds per file on the machine I use. That struck me as being rather long and so I went ahead and did the following in Python:

    import numpy


Now, I expected this to be faster of course, but I was really blown away. It takes less than a second under identical conditions. The mean deviates from the one my Fortran routine finds (which I also ran with 128-bit floats, so I somehow trust it more) but only on the 7th significant digit or so.

How can numpy be so fast? I mean you have to look at every entry of an array to find these values, right? Am I doing something very stupid in my Fortran routine for it to take so much longer?


To answer the questions in the comments:

  • Yes, I also ran the Fortran routine with 32-bit and 64-bit floats, but it had no impact on performance.
  • I used iso_fortran_env which provides 128-bit floats.
  • Using 32-bit floats my mean is off quite a bit though, so precision is really an issue.
  • I ran both routines on different files in different order, so the caching should have been fair in the comparison, I guess?
  • I actually tried OpenMP, but to read from the file at different positions at the same time. Having read your comments and answers, this sounds really stupid now, and it made the routine take a lot longer as well. I might give it a try on the array operations, but maybe that won't even be necessary.
  • The files are actually 1/2G in size; that was a typo. Thanks.
  • I will try the array implementation now.
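The difference between the element-by-element loop and the "array implementation" mentioned in the last bullet can be sketched as follows. This is only an illustration in Python's standard library, not the Fortran or numpy code from the question: the temporary file, the 1000 sample values, and the function names `stats_one_by_one` and `stats_bulk` are all assumptions made up for the example, and no timings are claimed.

```python
import array
import os
import struct
import tempfile

def stats_one_by_one(path, count):
    """Issue one read call per 4-byte float, like the loop in the question."""
    with open(path, "rb") as f:
        first = struct.unpack("f", f.read(4))[0]
        mini = maxi = total = first
        for _ in range(count - 1):
            value = struct.unpack("f", f.read(4))[0]
            mini = min(mini, value)
            maxi = max(maxi, value)
            total += value
    return mini, maxi, total / count

def stats_bulk(path, count):
    """Read the whole file with a single call, then reduce the in-memory array."""
    data = array.array("f")
    with open(path, "rb") as f:
        data.fromfile(f, count)
    return min(data), max(data), sum(data) / count

# Hypothetical demo data: 1000 single-precision values in a temporary file.
count = 1000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(array.array("f", [i * 0.5 for i in range(count)]).tobytes())
    path = tmp.name

slow = stats_one_by_one(path, count)
fast = stats_bulk(path, count)
os.remove(path)
print(slow, fast)
```

Both functions return the same statistics; the only difference is how many read calls are issued, which is the part of the original routine the answers pointed at.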


I implemented what @Alexander Vogt and @casey suggested in their answers, and it is as fast as numpy but now I have a precision problem as @Luaan pointed out I might get. Using a 32-bit float array the mean computed by sum is 20% off. Doing

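real,allocatable :: tmp (:,:,:)
double precision,allocatable :: tmp2(:,:,:)
tmp2=tmp                    ! assumed step: copy the singles read from the file into doubles
mean=sum(tmp2)/size(tmp2)   ! assumed step: accumulate in double precision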

solves the issue but increases computing time (not by very much, but noticeably). Is there a better way to get around this issue? I couldn't find a way to read singles from the file directly to doubles. And how does numpy avoid this?

Thanks for all the help so far.
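The precision problem described above can be reproduced on a small scale. The sketch below is an illustration under stated assumptions, not the code from the question: it emulates single precision in standard-library Python by rounding through `struct`, and the helper names `f32`, `naive_sum_f32` and `pairwise_sum_f32` are made up for the example. It shows that a single-precision running sum stops growing once it reaches 2^24, far below the 512^3 entries in the array. Pairwise summation is included as one commonly used remedy; numpy's documentation describes using a partial pairwise scheme for many float sums, but whether that applies to this exact case is not established here.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def naive_sum_f32(values):
    """Running sum whose accumulator is rounded to single precision at each step."""
    total = 0.0
    for v in values:
        total = f32(total + v)
    return total

def pairwise_sum_f32(values, block=8):
    """Single-precision sum that splits the data in halves and adds the partial sums."""
    if len(values) <= block:
        return naive_sum_f32(values)
    mid = len(values) // 2
    left = pairwise_sum_f32(values[:mid], block)
    right = pairwise_sum_f32(values[mid:], block)
    return f32(left + right)

# 2**24 = 16777216 is where single precision stops resolving steps of 1.0.
values = [16777216.0] + [1.0] * 1000
exact = sum(values)                  # double-precision accumulator, exact for these values
naive = naive_sum_f32(values)        # every +1.0 is rounded away
pairwise = pairwise_sum_f32(values)  # small partial sums survive

print(exact, naive, pairwise)
```

Accumulating in double precision, as in the `tmp2` workaround, avoids the problem at the cost of the extra conversion; the pairwise variant keeps a single-precision accumulator but bounds the error growth much more tightly.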

Source: (StackOverflow)

Writing robust and "modern" Fortran code

In some scientific environments, you often cannot go without FORTRAN, as most of the developers only know that idiom, and there is a lot of legacy code and related experience. And frankly, there are not many other cross-platform options for high-performance programming (C++ would do the task, but the syntax, zero-based arrays, and pointers are not compatible with some people).

So, let's assume a new project must use Fortran 90, but I want to build the most modern software architecture out of it, while being compatible with the most recent compilers (Intel ifort, but also including the Sun/HP/IBM compilers).

So I'm thinking of imposing stuff that is widely known as common good sense, but not yet a standard in my environment:

  • global variables forbidden, no gotos, no jump labels, implicit none, etc.
  • "object-oriented programming" (modules with datatypes and related subroutines)
  • modular/reusable functions, well documented, reusable libraries
  • assertions/preconditions/invariants (implemented using preprocessor statements)
  • unit tests for all (most) subroutines and "objects"
  • an intense "debug mode" (#ifdef DEBUG) with more checks and all available Intel compiler checks (array bounds, subroutine interfaces, etc.)
  • uniform and enforced legible coding style, using code processing tool helpers.

The goal with all that is to have trustworthy, maintainable and modular code, whereas in a lot of legacy code, reusability was not an important target.

I searched around for references about object-oriented Fortran, programming-by-contract (assertions/preconditions/etc.), and found only ugly and outdated documents, syntaxes and papers done by people with no large-scale project involvement, and dead projects.

Any good URLs, advice, reference paper/books on this subject?

Source: (StackOverflow)

How does BLAS get such extreme performance?

Out of curiosity I decided to benchmark my own matrix multiplication function versus the BLAS implementation. I was, to say the least, surprised at the result:

Custom Implementation, 10 trials of 1000x1000 matrix multiplication:

Took: 15.76542 seconds.

BLAS Implementation, 10 trials of 1000x1000 matrix multiplication:

Took: 1.32432 seconds.

This is using single precision floating point numbers.

My Implementation:

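#include <stdexcept>

// Column-major product C += A*B, where A is ADim1 x ADim2 and B is BDim1 x BDim2.
// C (ADim1 x BDim2) must be zero-initialized by the caller.
template<class ValT>
void mmult(const ValT* A, int ADim1, int ADim2, const ValT* B, int BDim1, int BDim2, ValT* C)
{
    if ( ADim2!=BDim1 )
    	throw std::runtime_error("Error sizes off");

    int cc2,cc1,cr1;
    for ( cc2=0 ; cc2<BDim2 ; ++cc2 )
    	for ( cc1=0 ; cc1<ADim2 ; ++cc1 )
    		for ( cr1=0 ; cr1<ADim1 ; ++cr1 )
    			// column stride of C is ADim1 (its row count), not ADim2
    			C[cc2*ADim1+cr1] += A[cc1*ADim1+cr1]*B[cc2*BDim1+cc1];
}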

I have two questions:

  1. Given that a matrix-matrix multiplication, say nxm * mxn, requires n*n*m multiplications, so in the case above 1000^3 or 1e9 operations, how is it possible on my 2.6 GHz processor for BLAS to do 10*1e9 operations in 1.32 seconds? Even if multiplications were a single operation and there was nothing else being done, it should take ~4 seconds.
  2. Why is my implementation so much slower?
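The arithmetic behind the first question can be worked through with the figures quoted above. This is a back-of-the-envelope sketch only: it counts multiplications the way the question does (ignoring the additions) and assumes the 2.6 GHz clock applies for the whole run.

```python
clock_hz = 2.6e9           # 2.6 GHz processor, from the question
n = 1000                   # matrix dimension
trials = 10
mults = trials * n ** 3    # 10 * 1e9 multiplications in total

blas_seconds = 1.32432     # measured BLAS time
custom_seconds = 15.76542  # measured time of the custom implementation

# Time needed if exactly one multiplication completed per clock cycle.
one_per_cycle_seconds = mults / clock_hz

# Multiplications completed per clock cycle in each measurement.
blas_mults_per_cycle = mults / blas_seconds / clock_hz
custom_mults_per_cycle = mults / custom_seconds / clock_hz

speedup = custom_seconds / blas_seconds

print(f"one multiplication per cycle would take {one_per_cycle_seconds:.2f} s")
print(f"BLAS:   {blas_mults_per_cycle:.2f} multiplications per cycle")
print(f"custom: {custom_mults_per_cycle:.2f} multiplications per cycle")
print(f"BLAS is {speedup:.1f}x faster")
```

The BLAS figure comes out at roughly 2.9 multiplications per cycle, which a single scalar multiply per cycle cannot explain. SIMD instructions and multiple cores are the usual candidates, though the question does not say which apply here. The custom loop manages about a quarter of a multiplication per cycle.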

Source: (StackOverflow)

The reading list for scientific programmer [closed]

I am working to become a scientific programmer. I have enough background in math and statistics but am rather lacking in programming background. I found it very hard to learn how to use a language for scientific programming because most of the references for SP are close to trivial.

My work involves statistical/financial modelling and no physics models. Currently, I use Python extensively with numpy and scipy. I have done some R/Mathematica. I know enough C/C++ to read code. I have no experience in Fortran.

I don't know if this is a good list of languages for a scientific programmer. If it is, what is a good reading list for learning the syntax and design patterns of these languages in scientific settings?

Source: (StackOverflow)

Should I learn Fortran or C++ to extend R?

I work with machine learning with fairly large datasets (they still fit in memory) and I have written some calculations in R which I find to be too slow. Thus I would like to replace the "critical parts" of the program with compiled code that I would call from R. An example problem that I have in hand is implementing the forward-backward algorithm.

My question is whether I should learn Fortran or C++ to do this? I only need to work with numeric vectors or matrices. I'm mainly interested in which language is easier to learn and interface from R and I don't really care which one looks better on my CV.

I have read the R extensions manual and played a bit with the inline package with some simple Fortran and C++ code. My current impression is that Fortran95 would be simpler to learn, although the Rcpp package also looks very interesting. I currently know R, Python and Matlab.

Source: (StackOverflow)

Fortran vs C++, does Fortran still hold any advantage in numerical analysis these days?

With the rapid development of C++ compilers, especially the Intel ones, and the ability to apply SIMD functions directly in your C/C++ code, does Fortran still hold any real advantage in the world of numerical computation?

I am from an applied maths background, my job involves a lot of numerical analysis, computations, optimisations and such, with a strictly defined performance-requirement.

I hardly know anything about Fortran. I have some experience in C/CUDA/Matlab (if you consider the latter a computer language to begin with), and my daily task involves analysis of very large data (e.g. a 10 GB matrix). The program seems to spend at least 2/3 of its time on memory access (that's why I send some of its work to the GPU). Do you think it may be worth the effort for me to try the Fortran route on at least some performance-critical parts of my code to improve the performance of my program?

Because of the complexity and the amount of work involved, I will only go that route if there is a significant performance benefit. Thanks in advance.

Source: (StackOverflow)

Why is fortran used for scientific computing? [closed]

I've read that Fortran is still heavily used for scientific computing. For code already heavily invested in Fortran this makes sense to me.

But is there a reason to use Fortran over other modern languages for a new project? Are there language design decisions in Fortran that make it much more suitable for scientific computing compared to, say, the more popular languages (C++, Java, Python, Ruby, etc.)? For example, are there specific language features of Fortran that allow numeric optimization in compilers to a much higher degree compared to the other languages I mentioned?

Source: (StackOverflow)

fortran SAVE statement

I've read its entry in the language reference (Intel's), but I cannot quite grasp what it does. Could someone explain to me in layman's terms what it means when it is included in a module?

Source: (StackOverflow)

Why is the gcc math library so inefficient?

When I was porting some Fortran code to C, it surprised me that most of the execution time discrepancy between the Fortran program compiled with ifort (the Intel Fortran compiler) and the C program compiled with gcc comes from the evaluation of trigonometric functions (sin, cos). It surprised me because I used to believe what this answer explains, that functions like sine and cosine are implemented in microcode inside microprocessors.

In order to spot the problem more explicitly, I made a small test program in Fortran:

program ftest
  implicit none
  real(8) :: x
  integer :: i
  x = 0d0
  do i = 1, 10000000
    x = cos (2d0 * x)
  end do
  write (*,*) x
end program ftest

On an Intel Q6600 processor running 3.6.9-1-ARCH x86_64 Linux, with ifort version 12.1.0 I get

$ ifort -o ftest ftest.f90 
$ time ./ftest

real    0m0.280s
user    0m0.273s
sys     0m0.003s

while with gcc version 4.7.2 I get

$ gfortran -o ftest ftest.f90 
$ time ./ftest

real    0m2.148s
user    0m2.090s
sys     0m0.003s

This is almost a factor of 10 difference! Can I still believe that the gcc implementation of cos is a wrapper around the microprocessor implementation, in a similar way as this is probably done in the Intel implementation? If this is true, where is the bottleneck?


According to the comments, enabling optimizations should improve the performance. My opinion was that optimizations do not affect the library functions ... which does not mean that I don't use them in nontrivial programs. However, here are two additional benchmarks (now on my home computer, an Intel Core 2):

$ gfortran -o ftest ftest.f90
$ time ./ftest

real    0m2.993s
user    0m2.986s
sys     0m0.000s


$ gfortran -Ofast -march=native -o ftest ftest.f90
$ time ./ftest

real    0m2.967s
user    0m2.960s
sys     0m0.003s

Which particular optimizations did you (commentators) have in mind? And how can compiler exploit a multi-core processor in this particular example, where each iteration depends on the result of the previous one?


The benchmark tests of Daniel Fisher and Ilmari Karonen made me think that the problem might be related to the particular version of gcc (4.7.2) and maybe to the particular build of it (Arch x86_64 Linux) that I am using on my computers. So I repeated the test on an Intel Core i7 box with Debian x86_64 Linux, gcc version 4.4.5 and ifort version 12.1.0:

$ gfortran -O3 -o ftest ftest.f90
$ time ./ftest

real    0m0.272s
user    0m0.268s
sys     0m0.004s


$ ifort -O3 -o ftest ftest.f90
$ time ./ftest

real    0m0.178s
user    0m0.176s
sys     0m0.004s

For me this is a very much acceptable performance difference, which would never make me ask this question. It seems that I will have to ask on Arch Linux forums about this issue.

However, the explanation of the whole story is still very welcome.

Source: (StackOverflow)

Calling 32bit Code from 64bit Process

I have an application that we're trying to migrate to 64bit from 32bit. It's .NET, compiled using the x64 flags. However, we have a large number of DLLs written in FORTRAN 90 compiled for 32bit. The functions in the FORTRAN DLLs are fairly simple: you put data in, you pull data out; no state of any sort. We also don't spend a lot of time there, a total of maybe 3%, but the calculation logic it performs is invaluable.

Can I somehow call the 32bit DLLs from 64bit code? MSDN suggests that I can't, period. I've done some simple hacking and verified this. Everything throws an invalid entry point exception. The only possible solution I've found so far is to create COM+ wrappers for all of the 32bit DLL functions and invoke COM from the 64bit process. This seems like quite a headache. We can also run the process in WoW emulation, but then the memory ceiling wouldn't be increased, capping at around 1.6 GB.

Is there any other way to call the 32bit DLLs from a 64bit CLR process?

Source: (StackOverflow)