These -m options are defined for the i386 and x86-64 family of computers:
-mtune=cpu-type
Tune to cpu-type everything applicable about the generated code, except for the ABI and the set of available instructions. The choices for cpu-type are:
As new processors are deployed in the marketplace, the behavior of this option will change. Therefore, if you upgrade to a newer version of GCC, the code generated by this option will change to reflect the processors that were most common when that version of GCC was released.
There is no -march=generic option because -march
indicates the instruction set the compiler can use, and there is no
generic instruction set applicable to all processors. In contrast,
-mtune indicates the processor (or, in this case, collection of
processors) for which the code is optimized.
Same as generic, but when used as the -march option, the PentiumPro instruction set will be used, so the code will run on all i686 family chips.
While picking a specific cpu-type will schedule things appropriately for that particular chip, the compiler will not generate any code that does not run on the i386 without the -march=cpu-type option being used.
-march=cpu-type
Generate instructions for the machine type cpu-type. The choices for cpu-type are the same as for -mtune. Moreover, specifying -march=cpu-type implies -mtune=cpu-type.
-mcpu=cpu-type
A deprecated synonym for -mtune.
-mfpmath=unit
Generate floating-point arithmetic for the selected unit unit. The choices for unit are:
This is the default choice for the i386 compiler.
For the i386 compiler, you need to use the -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default.
The resulting code should be considerably faster in the majority of cases and avoid the numerical instability problems of 387 code, but may break some existing code that expects temporaries to be 80 bits.
This is the default choice for the x86-64 compiler.
-masm=dialect
Output asm instructions using the selected dialect. Supported choices are intel or att (the default). Darwin does not support intel.
-mieee-fp
-mno-ieee-fp
Control whether or not the compiler uses IEEE floating point comparisons. These correctly handle the case where the result of a comparison is unordered.
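As a minimal sketch of what an unordered comparison looks like in C: when at least one operand is a NaN, the operands are unordered, so <, > and == all yield false and only != yields true. The helper names make_nan and is_unordered are illustrative and not part of GCC, and the sketch assumes IEEE semantics are in effect (for example, no -ffast-math).

```c
/* Illustrative helper: produce a quiet NaN at run time. The volatile
   qualifier keeps the compiler from folding the division. */
static double make_nan(void)
{
    volatile double zero = 0.0;
    return zero / zero;
}

/* Two values are unordered when at least one of them is a NaN.
   A NaN is the only value that compares unequal to itself. */
static int is_unordered(double a, double b)
{
    return (a != a) || (b != b);
}
```

Code that tests a condition such as !(a < b) and treats it as a >= b silently mishandles the unordered case, which is why correct handling of these comparisons matters.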
-msoft-float
Generate output containing library calls for floating point. Warning: the requisite libraries are not part of GCC. Normally the facilities of the machine's usual C compiler are used, but this can't be done directly in cross-compilation. You must make your own arrangements to provide suitable library functions for cross-compilation.
On machines where a function returns floating point results in the 80387
register stack, some floating point opcodes may be emitted even if
-msoft-float is used.
-mno-fp-ret-in-387
Do not use the FPU registers for return values of functions.
The usual calling convention has functions return values of types float and double in an FPU register, even if there is no FPU. The idea is that the operating system should emulate an FPU.
The option -mno-fp-ret-in-387 causes such values to be returned
in ordinary CPU registers instead.
-mno-fancy-math-387
Some 387 emulators do not support the sin, cos and sqrt instructions for the 387. Specify this option to avoid generating those instructions. This option is the default on FreeBSD, OpenBSD and NetBSD. This option is overridden when -march indicates that the target cpu will always have an FPU and so the instruction will not need emulation. As of revision 2.6.1, these instructions are not generated unless you also use the -funsafe-math-optimizations switch.
-malign-double
-mno-align-double
Control whether GCC aligns double, long double, and long long variables on a two word boundary or a one word boundary. Aligning double variables on a two word boundary will produce code that runs somewhat faster on a Pentium at the expense of more memory.
On x86-64, -malign-double is enabled by default.
Warning: if you use the -malign-double switch,
structures containing the above types will be aligned differently than
the published application binary interface specifications for the 386
and will not be binary compatible with structures in code compiled
without that switch.
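The layout difference described in the warning can be observed with a small probe. This is only a sketch: struct sample and the two helper functions are hypothetical names introduced for illustration. On a 32-bit x86 target one would expect the offset to be 4 without -malign-double and 8 with it; other targets use their own ABI rules.

```c
#include <stddef.h>

/* Hypothetical structure used only to observe how a double member
   is aligned inside a structure. */
struct sample {
    char tag;
    double value;
};

/* Offset of the double member: 4 when doubles are aligned on a one
   word boundary, 8 when they are aligned on a two word boundary. */
static size_t double_member_offset(void)
{
    return offsetof(struct sample, value);
}

/* Total size of the structure, including any padding. */
static size_t sample_size(void)
{
    return sizeof(struct sample);
}
```

Two object files that disagree on these values cannot safely exchange struct sample objects, which is the binary incompatibility the warning refers to.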
-m96bit-long-double
-m128bit-long-double
These switches control the size of the long double type. The i386 application binary interface specifies the size to be 96 bits, so -m96bit-long-double is the default in 32 bit mode.
Modern architectures (Pentium and newer) would prefer long double to be aligned to an 8 or 16 byte boundary. In arrays or structures conforming to the ABI, this would not be possible. So specifying -m128bit-long-double will align long double to a 16 byte boundary by padding the long double with an additional 32 bit zero.
In the x86-64 compiler, -m128bit-long-double is the default choice as its ABI specifies that long double is to be aligned on a 16 byte boundary.
Notice that neither of these options enables any extra precision over the x87 standard of 80 bits for a long double.
Warning: if you override the default value for your target ABI, structures and arrays containing long double variables will change their size, and the function calling convention for functions taking long double will be modified. Hence they will not be binary compatible with arrays or structures in code compiled without that switch.
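The distinction between storage size and precision can be checked with a short probe. This is a sketch with illustrative function names. On x86 one would expect the storage size to be 96 or 128 bits depending on the switch in effect, while the significand stays at 64 bits in both cases; other targets report their own values.

```c
#include <float.h>
#include <limits.h>
#include <stddef.h>

/* Storage occupied by a long double, in bits, padding included. */
static size_t long_double_storage_bits(void)
{
    return sizeof(long double) * CHAR_BIT;
}

/* Number of significand bits actually carried by a long double.
   The padding added by -m128bit-long-double does not change this. */
static int long_double_significand_bits(void)
{
    return LDBL_MANT_DIG;
}
```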
-mlarge-data-threshold=number
When -mcmodel=medium is specified, data objects larger than number are placed in the large data section. This value must be the same across all objects linked into the binary and defaults to 65535.
-mrtd
Use a different function-calling convention, in which functions that take a fixed number of arguments return with the ret num instruction, which pops their arguments while returning. This saves one instruction in the caller since there is no need to pop the arguments there.
You can specify that an individual function is called with this calling sequence with the function attribute stdcall. You can also override the -mrtd option by using the function attribute cdecl. See Function Attributes.
Warning: this calling convention is incompatible with the one normally used on Unix, so you cannot use it if you need to call libraries compiled with the Unix compiler.
Also, you must provide function prototypes for all functions that take variable numbers of arguments (including printf); otherwise incorrect code will be generated for calls to those functions.
In addition, seriously incorrect code will result if you call a
function with too many arguments. (Normally, extra arguments are
harmlessly ignored.)
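A minimal sketch of selecting the convention per function with the stdcall and cdecl attributes mentioned above. The macro names CALLEE_POPS and CALLER_POPS and the functions add3 and sub2 are hypothetical. Because these attributes are only meaningful on 32-bit x86, the sketch expands them to nothing on other targets.

```c
/* Only 32-bit x86 honors these attributes; elsewhere they would be
   ignored (typically with a warning), so define them away. */
#if defined(__i386__)
#define CALLEE_POPS __attribute__((stdcall))
#define CALLER_POPS __attribute__((cdecl))
#else
#define CALLEE_POPS
#define CALLER_POPS
#endif

/* Fixed argument count: the callee pops its own arguments on return,
   as every such function would under -mrtd. */
CALLEE_POPS int add3(int a, int b, int c);

/* Keeps the usual caller-pops convention even if -mrtd is in effect. */
CALLER_POPS int sub2(int a, int b);

CALLEE_POPS int add3(int a, int b, int c)
{
    return a + b + c;
}

CALLER_POPS int sub2(int a, int b)
{
    return a - b;
}
```

Putting the attribute on the prototype matters: every caller must see it, otherwise caller and callee disagree about who pops the arguments.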
-mregparm=num
Control how many registers are used to pass integer arguments. By default, no registers are used to pass arguments, and at most 3 registers can be used. You can control this behavior for a specific function by using the function attribute regparm. See Function Attributes.
Warning: if you use this switch, and
num is nonzero, then you must build all modules with the same
value, including any libraries. This includes the system libraries and
startup modules.
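Where changing the convention for a whole program is impractical, the regparm function attribute limits the change to individual functions. The following is a sketch under the assumption of a 32-bit x86 target; the macro REGPARM3 and the function weighted_sum are illustrative names, and the attribute is defined away on other targets.

```c
/* regparm is specific to 32-bit x86; expand to nothing elsewhere. */
#if defined(__i386__)
#define REGPARM3 __attribute__((regparm(3)))
#else
#define REGPARM3
#endif

/* With regparm(3), up to three integer arguments are passed in
   registers instead of on the stack. */
static REGPARM3 int weighted_sum(int a, int b, int c)
{
    return a * 1 + b * 2 + c * 3;
}
```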
-msseregparm
Use SSE register passing conventions for float and double arguments and return values. You can control this behavior for a specific function by using the function attribute sseregparm. See Function Attributes.
Warning: if you use this switch then you must build all
modules with the same value, including any libraries. This includes
the system libraries and startup modules.
-mpc32
-mpc64
-mpc80
Set 80387 floating-point precision to 32, 64 or 80 bits. When -mpc32 is specified, the significands of results of floating-point operations are rounded to 24 bits (single precision); -mpc64 rounds the significands of results of floating-point operations to 53 bits (double precision); and -mpc80 rounds the significands of results of floating-point operations to 64 bits (extended double precision), which is the default. When this option is used, floating-point operations in higher precisions are not available to the programmer without setting the FPU control word explicitly.
Setting the rounding of floating-point operations to less than the default
80 bits can speed some programs by 2% or more. Note that some mathematical
libraries assume that extended precision (80 bit) floating-point operations
are enabled by default; routines in such libraries could suffer significant
loss of accuracy, typically through so-called "catastrophic cancellation",
when this option is used to set the precision to less than extended precision.
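The effect of a narrower significand can be illustrated without touching the FPU control word, by comparing the same computation in float (24-bit significand) and double (53-bit significand). This is only an analogy for what reduced x87 precision does to intermediate results; the function names are illustrative.

```c
/* Adds a tiny value to 1 and subtracts 1 again, in single precision.
   The volatile stores force each result to be rounded to float. */
static float residual_single(void)
{
    volatile float one = 1.0f;
    volatile float tiny = 1.0e-8f;
    volatile float sum = one + tiny;
    return sum - one;
}

/* The same computation in double precision. */
static double residual_double(void)
{
    volatile double one = 1.0;
    volatile double tiny = 1.0e-8;
    volatile double sum = one + tiny;
    return sum - one;
}
```

With a 24-bit significand the tiny term is rounded away entirely, so the residual is zero, while the 53-bit computation keeps a nonzero residual. Library routines written on the assumption of extended precision can lose accuracy in the same way when the precision is lowered.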
-mstackrealign
Realign the stack at entry. On the Intel x86, the -mstackrealign option will generate an alternate prologue and epilogue that realigns the runtime stack. This supports mixing legacy codes that keep a 4-byte aligned stack with modern codes that keep a 16-byte stack for SSE compatibility. The alternate prologue and epilogue are slower and bigger than the regular ones, and the alternate prologue requires an extra scratch register; this lowers the number of registers available if used in conjunction with the regparm attribute. The -mstackrealign option is incompatible with the nested function prologue; this is considered a hard error. See also the attribute force_align_arg_pointer, applicable to individual functions.
-mpreferred-stack-boundary=num
Attempt to keep the stack boundary aligned to a 2 raised to num byte boundary. If -mpreferred-stack-boundary is not specified, the default is 4 (16 bytes or 128 bits).
On Pentium and PentiumPro, double and long double values should be aligned to an 8 byte boundary (see -malign-double) or suffer significant run time performance penalties. On Pentium III, the Streaming SIMD Extension (SSE) data type __m128 may not work properly if it is not 16 byte aligned.
To ensure proper alignment of these values on the stack, the stack boundary must be as aligned as that required by any value stored on the stack. Further, every function must be generated such that it keeps the stack aligned. Thus calling a function compiled with a higher preferred stack boundary from a function compiled with a lower preferred stack boundary will most likely misalign the stack. It is recommended that libraries that use callbacks always use the default setting.
This extra alignment does consume extra stack space, and generally
increases code size. Code that is sensitive to stack space usage, such
as embedded systems and operating system kernels, may want to reduce the
preferred alignment to -mpreferred-stack-boundary=2.
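As a short worked sketch of the arithmetic: the option value num selects a boundary of 2 raised to num bytes, so the default of 4 gives 16 bytes and a value of 2 gives 4 bytes. The second helper checks at run time that a local variable requesting 16-byte alignment really is placed on such a boundary. Both function names are illustrative.

```c
#include <stdint.h>

/* Byte boundary selected by -mpreferred-stack-boundary=num. */
static unsigned boundary_bytes(unsigned num)
{
    return 1u << num;
}

/* Returns nonzero if a local object that requests 16-byte alignment
   is placed at an address that is a multiple of 16. */
static int local_is_16_byte_aligned(void)
{
    char buffer[32] __attribute__((aligned(16)));
    buffer[0] = 0;
    return ((uintptr_t)buffer % 16u) == 0;
}
```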
-mmmx
-mno-mmx
-msse
-mno-sse
-msse2
-mno-sse2
-msse3
-mno-sse3
-mssse3
-mno-ssse3
-msse4.1
-mno-sse4.1
-msse4.2
-mno-sse4.2
-msse4
-mno-sse4
-msse4a
-mno-sse4a
-msse5
-mno-sse5
-m3dnow
-mno-3dnow
-mpopcnt
-mno-popcnt
-mabm
-mno-abm
These switches enable or disable the use of instructions in the MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4A, SSE5, ABM or 3DNow! extended instruction sets. These extensions are also available as built-in functions: see X86 Built-in Functions, for details of the functions enabled and disabled by these switches.
To have SSE/SSE2 instructions generated automatically from floating-point code (as opposed to 387 instructions), see -mfpmath=sse.
These options will enable GCC to use these extended instructions in
generated code, even without -mfpmath=sse. Applications which
perform runtime CPU detection must compile separate files for each
supported architecture, using the appropriate flags. In particular,
the file containing the CPU detection code should be compiled without
these options.
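One way to see which of these switches were in effect for a given source file is to test the macros GCC predefines when an extension is enabled, such as __MMX__, __SSE__ and __SSE2__. The following is a sketch with illustrative function names. It reports how the file was compiled, not what the CPU running it supports.

```c
/* Nonzero if this file was compiled with MMX enabled. */
static int built_with_mmx(void)
{
#ifdef __MMX__
    return 1;
#else
    return 0;
#endif
}

/* Nonzero if this file was compiled with SSE enabled. */
static int built_with_sse(void)
{
#ifdef __SSE__
    return 1;
#else
    return 0;
#endif
}

/* Nonzero if this file was compiled with SSE2 enabled. */
static int built_with_sse2(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}
```

A file compiled with such a switch may contain the extended instructions anywhere, so it must only be entered after the detection code, compiled without these options, has confirmed that the processor supports them.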
-mcld
This option instructs GCC to emit a cld instruction in the prologue of functions that use string instructions. String instructions depend on the DF flag to select between autoincrement and autodecrement mode. While the ABI specifies the DF flag to be cleared on function entry, some operating systems violate this specification by not clearing the DF flag in their exception dispatchers. The exception handler can then be invoked with the DF flag set, which leads to the wrong direction mode when string instructions are used. This option can be enabled by default on 32-bit x86 targets by configuring GCC with the --enable-cld configure option. Generation of cld instructions can be suppressed with the -mno-cld compiler option in this case.
-mcx16
This option will enable GCC to use the CMPXCHG16B instruction in generated code. CMPXCHG16B allows for atomic operations on 128-bit double quadword (or oword) data types. This is useful for high resolution counters that could be updated by multiple processors (or cores). This instruction is generated as part of atomic built-in functions: see Atomic Builtins for details.
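A sketch of such a counter, built on the __sync_bool_compare_and_swap built-in. It assumes that GCC defines __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 when 16-byte compare-and-swap is available (as with -mcx16 on x86-64) and falls back to a word-sized counter otherwise. The names counter_t and counter_increment are hypothetical.

```c
/* Use a 128-bit counter only when the compiler reports that a 16-byte
   compare-and-swap is available; otherwise use a word-sized one. */
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
typedef unsigned __int128 counter_t;
#else
typedef unsigned long counter_t;
#endif

/* Atomically add one to *c and return the new value. The plain read
   of *c is only a guess at the current value; the compare-and-swap
   fails and the loop retries if another thread changed it meanwhile. */
static counter_t counter_increment(counter_t *c)
{
    counter_t old;
    do {
        old = *c;
    } while (!__sync_bool_compare_and_swap(c, old, old + 1));
    return old + 1;
}
```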
-msahf
This option will enable GCC to use the SAHF instruction in generated 64-bit code. Early Intel CPUs with Intel 64 lacked the LAHF and SAHF instructions supported by AMD64 until the introduction of the Pentium 4 G1 step in December 2005. LAHF and SAHF are load and store instructions, respectively, for certain status flags. In 64-bit mode, the SAHF instruction is used to optimize the fmod, drem or remainder built-in functions: see Other Builtins for details.
-mrecip
This option will enable GCC to use RCPSS and RSQRTSS instructions (and their
vectorized variants RCPPS and RSQRTPS) with an additional Newton-Raphson step
to increase precision instead of DIVSS and SQRTSS (and their vectorized
variants) for single precision floating point arguments. These instructions
are generated only when -funsafe-math-optimizations is enabled
together with -finite-math-only and -fno-trapping-math.
Note that while the throughput of the sequence is higher than the throughput
of the non-reciprocal instruction, the precision of the sequence can be
decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).
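The refinement step can be written out in plain C. This sketch shows the standard Newton-Raphson iterations for a reciprocal and for a reciprocal square root, applied to an approximate starting value such as the one a hardware estimate instruction would supply. The function names are illustrative, and this is the textbook formula rather than a transcription of the instruction sequence GCC emits.

```c
/* One Newton-Raphson step for 1/a, starting from the estimate x0:
   x1 = x0 * (2 - a * x0). */
static float refine_reciprocal(float a, float x0)
{
    return x0 * (2.0f - a * x0);
}

/* One Newton-Raphson step for 1/sqrt(a), starting from the estimate y0:
   y1 = y0 * (1.5 - 0.5 * a * y0 * y0). */
static float refine_rsqrt(float a, float y0)
{
    return y0 * (1.5f - 0.5f * a * y0 * y0);
}
```

Each step roughly squares the relative error of the estimate, which is why a single step after a coarse hardware estimate gets close to, but not always exactly, the correctly rounded result.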
-mveclibabi=type
Specifies the ABI type to use for vectorizing intrinsics using an external library. The supported type is acml, for the AMD math core library style of interfacing. GCC will currently emit calls to __vrd2_sin, __vrd2_cos, __vrd2_exp, __vrd2_log, __vrd2_log2, __vrd2_log10, __vrs4_sinf, __vrs4_cosf, __vrs4_expf, __vrs4_logf, __vrs4_log2f, __vrs4_log10f and __vrs4_powf when using this type and -ftree-vectorize is enabled. An ACML ABI-compatible library will have to be specified at link time.
-mpush-args
-mno-push-args
Use PUSH operations to store outgoing parameters. This method is shorter and usually as fast as the method using SUB/MOV operations, and is enabled by default. In some cases disabling it may improve performance because of improved scheduling and reduced dependencies.
-maccumulate-outgoing-args
If enabled, the maximum amount of space required for outgoing arguments will be
computed in the function prologue. This is faster on most modern CPUs
because of reduced dependencies, improved scheduling and reduced stack usage
when preferred stack boundary is not equal to 2. The drawback is a notable
increase in code size. This switch implies -mno-push-args.
-mthreads
Support thread-safe exception handling on Mingw32. Code that relies
on thread-safe exception handling must compile and link all code with the
-mthreads option. When compiling, -mthreads defines
-D_MT; when linking, it links in a special thread helper library
-lmingwthrd which cleans up per thread exception handling data.
-mno-align-stringops
Do not align destination of inlined string operations. This switch reduces
code size and improves performance in case the destination is already aligned,
but GCC doesn't know about it.
-minline-all-stringops
By default GCC inlines string operations only when the destination is known to be aligned to at least a 4 byte boundary. This option enables more inlining and increases code size, but may improve performance of code that depends on fast memcpy, strlen and memset for short lengths.
-minline-stringops-dynamically
For string operations of unknown size, inline runtime checks so that inline code is used for small blocks, while a library call is used for large blocks.
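Conceptually, the generated code behaves like the following hand-written copy routine. This is a sketch only: the function copy_dynamic and the cutoff SMALL_BLOCK_LIMIT are hypothetical, and the threshold GCC actually uses depends on the target and tuning.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff between "small" and "large" blocks. */
#define SMALL_BLOCK_LIMIT 64

/* Copy n bytes: a simple inline loop for small sizes, the library
   routine for everything else. The size is checked at run time
   because it is not known at compile time. */
static void *copy_dynamic(void *dst, const void *src, size_t n)
{
    if (n < SMALL_BLOCK_LIMIT) {
        unsigned char *d = dst;
        const unsigned char *s = src;
        while (n--)
            *d++ = *s++;
        return dst;
    }
    return memcpy(dst, src, n);
}
```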
-mstringop-strategy=alg
Override the internal decision heuristic about the particular algorithm to inline string operations with. The allowed values are rep_byte, rep_4byte, rep_8byte for expanding using the i386 rep prefix of the specified size; byte_loop, loop, unrolled_loop for expanding an inline loop; and libcall for always expanding a library call.
-momit-leaf-frame-pointer
Don't keep the frame pointer in a register for leaf functions. This
avoids the instructions to save, set up and restore frame pointers and
makes an extra register available in leaf functions. The option
-fomit-frame-pointer removes the frame pointer for all functions
which might make debugging harder.
-mtls-direct-seg-refs
-mno-tls-direct-seg-refs
Controls whether TLS variables may be accessed with offsets from the TLS segment register (%gs for 32-bit, %fs for 64-bit), or whether the thread base pointer must be added. Whether or not this is legal depends on the operating system, and whether it maps the segment to cover the entire TLS area.
For systems that use GNU libc, the default is on.
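For reference, a TLS variable in GNU C is declared with the __thread keyword. The sketch below uses illustrative names. How the access to the variable is encoded (a direct segment-relative reference or an explicit addition of the thread base pointer) is what this option controls.

```c
/* Each thread gets its own copy of this counter, starting at zero. */
static __thread unsigned long calls_in_this_thread;

/* Count one more call made by the current thread and return the
   running total for that thread. */
static unsigned long note_call(void)
{
    return ++calls_in_this_thread;
}
```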
-mfused-madd
-mno-fused-madd
Enable automatic generation of fused floating point multiply-add instructions if the ISA supports such instructions. The -mfused-madd option is on by default. The fused multiply-add instructions have a different rounding behavior compared to executing a multiply followed by an add.
These -m switches are supported in addition to the above on AMD x86-64 processors in 64-bit environments.
-m32
-m64
Generate code for a 32-bit or 64-bit environment.
The 32-bit environment sets int, long and pointer to 32 bits and
generates code that runs on any i386 system.
The 64-bit environment sets int to 32 bits and long and pointer to 64 bits and generates code for AMD's x86-64 architecture. For Darwin only, the -m64 option turns off the -fno-pic and -mdynamic-no-pic options.
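The type sizes selected by these switches can be confirmed with a small probe. This is a sketch with illustrative function names. With -m32 one would expect 32, 32 and 32 bits for int, long and pointers, and with -m64 one would expect 32, 64 and 64.

```c
#include <limits.h>
#include <stddef.h>

/* Width of int in bits. */
static size_t int_bits(void)
{
    return sizeof(int) * CHAR_BIT;
}

/* Width of long in bits. */
static size_t long_bits(void)
{
    return sizeof(long) * CHAR_BIT;
}

/* Width of a data pointer in bits. */
static size_t pointer_bits(void)
{
    return sizeof(void *) * CHAR_BIT;
}
```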
-mno-red-zone
Do not use a so-called red zone for x86-64 code. The red zone is mandated by the x86-64 ABI; it is a 128-byte area beyond the location of the stack pointer that will not be modified by signal or interrupt handlers and therefore can be used for temporary data without adjusting the stack pointer. The flag -mno-red-zone disables this red zone.
-mcmodel=small
Generate code for the small code model: the program and its symbols must
be linked in the lower 2 GB of the address space. Pointers are 64 bits.
Programs can be statically or dynamically linked. This is the default
code model.
-mcmodel=kernel
Generate code for the kernel code model. The kernel runs in the
negative 2 GB of the address space.
This model has to be used for Linux kernel code.
-mcmodel=medium
Generate code for the medium model: the program is linked in the lower 2 GB of the address space but symbols can be located anywhere in the address space. Programs can be statically or dynamically linked, but building of shared libraries is not supported with the medium model.
-mcmodel=large
Generate code for the large model: This model makes no assumptions about addresses and sizes of sections.