MPAS | -fp-model precise
Overview
The -fp-model precise flag is a compiler option used to
control floating-point behavior in programs: it restricts the compiler
to value-safe optimizations when evaluating floating-point
calculations. The option is available in the Intel compilers on
Linux and macOS; on Windows the equivalent syntax is /fp:precise,
which is also used by Microsoft Visual C++.
Mathematical Behavior
Under this option the compiler strictly adheres to the IEEE-754 standard for
floating-point operations. It does not perform algebraic transformations
on floating-point expressions, such as reassociation or distribution,
unless it can guarantee that the transformation produces a bitwise-identical
result. For example, an expression like a * b + a * c will not
be rewritten as a * (b + c) under this mode.
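A minimal Fortran sketch (not from MPAS) of why such rewrites are not value-safe: floating-point addition is not associative, so changing the grouping of a sum can change the last bits of the result. Under -fp-model precise the compiler keeps the order written in the source.

```fortran
! Minimal sketch: floating-point addition is not associative, so a
! compiler that regroups (a + b) + c into a + (b + c) can change the
! result. -fp-model precise forbids such value-changing rewrites.
program reassociation_demo
  implicit none
  real(kind=8) :: a, b, c
  a =  1.0d0
  b =  1.0d-16
  c = -1.0d0
  ! (a + b) rounds to 1.0, so the first grouping loses b entirely;
  ! the second grouping keeps it. The two results differ.
  print *, '(a + b) + c = ', (a + b) + c   ! 0.0
  print *, 'a + (b + c) = ', a + (b + c)   ! ~1.1e-16
end program reassociation_demo
```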
Special values including NaN (Not a Number), positive infinity,
negative infinity, and negative zero are processed according to IEEE-754
specifications. For example, the expression x != x
evaluates to true if x is NaN.
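A small sketch of the NaN self-comparison property in Fortran (the C-style expression x != x corresponds to x /= x); the ieee_arithmetic intrinsic module is used here to generate a quiet NaN:

```fortran
! Minimal sketch: under IEEE-754 semantics a NaN compares unequal to
! everything, including itself, so (x /= x) is .true. exactly when x
! is NaN. Aggressive floating-point modes may assume NaNs never occur
! and fold such a test away; value-safe compilation preserves it.
program nan_self_compare
  use, intrinsic :: ieee_arithmetic
  implicit none
  real(kind=8) :: x
  x = ieee_value(x, ieee_quiet_nan)
  print *, 'x      = ', x
  print *, 'x /= x = ', x /= x   ! .true. for NaN
end program nan_self_compare
```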
MPAS
- MPAS Version 5.0
The ability to obtain bit-identical results for any MPI task count (though this may require the addition of compiler flags, e.g., '-fp-model precise' for the ifort compiler)
Experiment (without -fp-model precise)
- MPAS v8.2.2, built with the Intel compiler
- Mesh: MPAS-120km (x1.40962)
- Cores = 2, 4, 8, 16, 24, 32, 48, 68
- Initial datetime = 2023-04-02_00UTC
- IC source = ERA5
Elapsed time
| Step | Mesh | IC datetime | Elapsed time | Cores |
|---|---|---|---|---|
| init | MPAS-120km-MPASv822 | 2023040200 | <60s | 68 |
| atm | MPAS-120km-MPASv822 | 2023040200 | 2h8m | 68 |
| atm | MPAS-120km-MPASv822 | 2023040200 | 3h30m | 48 |
| atm | MPAS-120km-MPASv822 | 2023040200 | 4h51m | 32 |
| atm | MPAS-120km-MPASv822 | 2023040200 | 9h24m | 24 |
| atm | MPAS-120km-MPASv822 | 2023040200 | 16h50m | 16 |
| atm | MPAS-120km-MPASv822 | 2023040200 | 25h33m | 8 |
| atm | MPAS-120km-MPASv822 | 2023040200 | 49h20m | 4 |
| atm | MPAS-120km-MPASv822 | 2023040200 | 62h36m | 2 |

Plot
- diag.2023-04-14_00.00.00.nc (Core-68 minus Core-2)
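A hedged sketch of how the Core-68 minus Core-2 difference for a single field might be computed from the two diag files. The directory layout, the field name t2m, and the assumption that the field is stored as a one-dimensional nCells array (40962 cells on the x1.40962 mesh) are illustrative, not taken from the actual workflow; the netcdf-fortran library is required.

```fortran
! Hypothetical comparison of one diagnostic field between two runs that
! differ only in MPI task count. Paths and the field name 't2m' are
! assumptions; the x1.40962 mesh has 40962 horizontal cells.
program diag_diff
  use netcdf
  implicit none
  integer, parameter :: ncells = 40962
  real(kind=8) :: f68(ncells), f02(ncells)
  integer :: nc68, nc02, vid, ierr

  ierr = nf90_open('core68/diag.2023-04-14_00.00.00.nc', NF90_NOWRITE, nc68)
  ierr = nf90_open('core02/diag.2023-04-14_00.00.00.nc', NF90_NOWRITE, nc02)

  ierr = nf90_inq_varid(nc68, 't2m', vid)
  ierr = nf90_get_var(nc68, vid, f68)
  ierr = nf90_inq_varid(nc02, 't2m', vid)
  ierr = nf90_get_var(nc02, vid, f02)

  ! In this experiment (compiled without -fp-model precise) the
  ! differences are generally nonzero; with value-safe compilation the
  ! maximum absolute difference should be exactly zero.
  print *, 'max |Core-68 - Core-2| = ', maxval(abs(f68 - f02))

  ierr = nf90_close(nc68)
  ierr = nf90_close(nc02)
end program diag_diff
```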
WRF/MPAS Forum
- Different results from different number of processors (resolved) | Feb 9, 2021
- We have seen similar issues that appear to be the result of compiler optimizations of various sorts. Have you tried turning off optimizations in the make target for your compiler in the top-level Makefile to see whether that enables you to get bitwise identical results for different MPI task counts?
- For the Intel Fortran compiler (ifort) specifically, it can help to add the '-fp-model precise' flag, as in the ifort target below from the top-level Makefile:

```makefile
ifort:
	( $(MAKE) all \
	"FC_PARALLEL = mpif90" \
	"CC_PARALLEL = mpicc" \
	"CXX_PARALLEL = mpicxx" \
	"FC_SERIAL = ifort" \
	"CC_SERIAL = icc" \
	"CXX_SERIAL = icpc" \
	"FFLAGS_PROMOTION = -real-size 64" \
	"FFLAGS_OPT = -O0 -fp-model precise -convert big_endian -free -align array64byte" \
	"CFLAGS_OPT = -O0 -fp-model precise" \
	"CXXFLAGS_OPT = -O0" \
	"LDFLAGS_OPT = -O0" \
	"FFLAGS_DEBUG = -g -convert big_endian -free -CU -CB -check all -fpe0 -traceback" \
	"CFLAGS_DEBUG = -g -traceback" \
	"CXXFLAGS_DEBUG = -g -traceback" \
	"LDFLAGS_DEBUG = -g -fpe0 -traceback" \
	"FFLAGS_OMP = -qopenmp" \
	"CFLAGS_OMP = -qopenmp" \
	"CORE = $(CORE)" \
	"DEBUG = $(DEBUG)" \
	"USE_PAPI = $(USE_PAPI)" \
	"OPENMP = $(OPENMP)" \
	"CPPFLAGS = $(MODEL_FORMULATION) -D_MPI" )
```
- Another test that might be worth trying is to turn off all physics schemes and see whether you get bitwise identical results with just the dynamical core; that might help in tracking the issue down to either physics or dynamics. The easiest way to turn off physics in the v7.0 release of the model is to set config_physics_suite = 'none' in the &physics group of the namelist.atmosphere file (see the namelist sketch at the end of this section).
- Inconsistent WRF results when using different number of cores | Sep 7, 2024
- I am running a large-domain WRF v4.5.2 simulation on Derecho with varying core counts. The simulation setup is identical across runs, except for the number of cores used. However, when comparing the output at the same timestamps, I noticed significant differences in the results depending on the core count. For instance, after a five-day period, the 2-m air temperature between runs can differ by as much as ±10 K in some grid cells. After several tests, I found that this discrepancy is caused by the optimization settings during compilation. When I disable optimization during compilation, the results are identical regardless of the core count.
- So why does the optimization during compilation result in different outcomes when varying the number of cores? Could this be related to the size of the decomposed tiles and their communication across cores?
- We are well aware that a different number of processors will lead to slightly different model results. This is caused by higher levels of optimization.
- Running WRF with a low level of optimization can be expensive. As an alternative, we always suggest that users stay with the same number of processors. (A minimal illustration of how the decomposition changes the order of floating-point operations is sketched at the end of this section.)
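On the question of whether the decomposed tile size matters: below is a minimal, self-contained sketch (not WRF or MPAS code) of how splitting the same field into a different number of partial sums changes the floating-point result, because the order of additions changes with the decomposition. The "task counts" here are just serial chunk counts standing in for MPI ranks.

```fortran
! Minimal sketch: the same global sum computed with 2 and with 68
! partial sums. Because floating-point addition is not associative,
! the two decompositions generally disagree in the last bits; aggressive
! compiler optimization (e.g., vectorized or reassociated reductions)
! introduces the same kind of order dependence inside each tile.
program decomposition_sum
  implicit none
  integer, parameter :: n = 40962        ! cells on the x1.40962 mesh
  real(kind=8) :: field(n)

  call random_number(field)
  field = 250.0d0 + 50.0d0 * field       ! stand-in for a temperature field

  print *, 'sum with  2 chunks: ', chunked_sum(field, 2)
  print *, 'sum with 68 chunks: ', chunked_sum(field, 68)

contains

  ! Sum x using ntasks contiguous chunks, mimicking per-rank partial
  ! sums that are then combined.
  function chunked_sum(x, ntasks) result(total)
    real(kind=8), intent(in) :: x(:)
    integer, intent(in) :: ntasks
    real(kind=8) :: total, partial
    integer :: t, i, lo, hi, chunk

    chunk = (size(x) + ntasks - 1) / ntasks
    total = 0.0d0
    do t = 1, ntasks
      lo = (t - 1) * chunk + 1
      hi = min(t * chunk, size(x))
      partial = 0.0d0
      do i = lo, hi
        partial = partial + x(i)
      end do
      total = total + partial            ! combination order depends on ntasks
    end do
  end function chunked_sum

end program decomposition_sum
```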
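For reference, a minimal sketch of the namelist.atmosphere fragment mentioned in the first thread for disabling all physics; any other entries of the &physics group are omitted here.

```fortran
&physics
    config_physics_suite = 'none'
/
```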