<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Copyright (C) 1988-2020 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "Funding Free Software", the Front-Cover
Texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below). A copy of the license is included in the section entitled
"GNU Free Documentation License".
(a) The FSF's Front-Cover Text is:
A GNU Manual
(b) The FSF's Back-Cover Text is:
You have freedom to copy and modify this GNU Manual, like GNU
software. Copies published by the Free Software Foundation raise
funds for GNU development. -->
<!-- Created by GNU Texinfo 6.5, http://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Vector Extensions (Using the GNU Compiler Collection (GCC))</title>
<meta name="description" content="Vector Extensions (Using the GNU Compiler Collection (GCC))">
<meta name="keywords" content="Vector Extensions (Using the GNU Compiler Collection (GCC))">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<link href="index.html#Top" rel="start" title="Top">
<link href="Option-Index.html#Option-Index" rel="index" title="Option Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="C-Extensions.html#C-Extensions" rel="up" title="C Extensions">
<link href="Offsetof.html#Offsetof" rel="next" title="Offsetof">
<link href="Return-Address.html#Return-Address" rel="prev" title="Return Address">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smalllisp {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
ul.no-bullet {list-style: none}
-->
</style>
</head>
<body lang="en">
<a name="Vector-Extensions"></a>
<div class="header">
<p>
Next: <a href="Offsetof.html#Offsetof" accesskey="n" rel="next">Offsetof</a>, Previous: <a href="Return-Address.html#Return-Address" accesskey="p" rel="prev">Return Address</a>, Up: <a href="C-Extensions.html#C-Extensions" accesskey="u" rel="up">C Extensions</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Option-Index.html#Option-Index" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<a name="Using-Vector-Instructions-through-Built_002din-Functions"></a>
<h3 class="section">6.52 Using Vector Instructions through Built-in Functions</h3>
<p>On some targets, the instruction set contains SIMD vector instructions which
operate on multiple values contained in one large register at the same time.
For example, on the x86 the MMX, 3DNow! and SSE extensions can be used
this way.
</p>
<p>The first step in using these extensions is to provide the necessary data
types. This should be done using an appropriate <code>typedef</code>:
</p>
<div class="smallexample">
<pre class="smallexample">typedef int v4si __attribute__ ((vector_size (16)));
</pre></div>
<p>The <code>int</code> type specifies the <em>base type</em>, while the attribute specifies
the vector size for the variable, measured in bytes. For example, the
declaration above causes the compiler to set the mode for the <code>v4si</code>
type to be 16 bytes wide and divided into <code>int</code>-sized units. For
a 32-bit <code>int</code> this means a vector of 4 units of 4 bytes, and the
corresponding mode of <code>v4si</code> is <acronym>V4SI</acronym>.
</p>
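<p>The size arithmetic above can be checked at compile time. The following is a minimal sketch, assuming a target where <code>int</code> is 32 bits; the helper <code>v4si_lanes</code> is a hypothetical name used only for illustration:</p>

```c
/* v4si: a 16-byte vector of int, as in the typedef above. */
typedef int v4si __attribute__ ((vector_size (16)));

/* vector_size is measured in bytes, so the whole vector is 16 bytes. */
_Static_assert (sizeof (v4si) == 16, "vector_size is measured in bytes");

/* Hypothetical helper: number of int elements held by a v4si.
   With a 32-bit int this is 16 / 4 == 4. */
static inline unsigned v4si_lanes (void)
{
  return sizeof (v4si) / sizeof (int);
}
```

<p>On such a target, <code>v4si_lanes ()</code> returns 4.</p>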
<p>The <code>vector_size</code> attribute is only applicable to integral and
floating-point scalars, although arrays, pointers, and function return values
are allowed in conjunction with this construct. Only sizes that are
positive power-of-two multiples of the base type size are currently allowed.
</p>
<p>All the basic integer types can be used as base types, both signed
and unsigned: <code>char</code>, <code>short</code>, <code>int</code>, <code>long</code>,
<code>long long</code>. In addition, <code>float</code> and <code>double</code> can be
used to build floating-point vector types.
</p>
<p>Specifying a combination that is not valid for the current architecture
causes GCC to synthesize the instructions using a narrower mode.
For example, if you declare a variable of mode <code>V4SI</code> and your
architecture does not support this specific SIMD mode, GCC
produces code that operates on four separate <code>SI</code>-mode (4-byte integer) values.
</p>
<p>The types defined in this manner can be used with a subset of normal C
operations. Currently, GCC allows using the following operators
on these types: <code>+, -, *, /, unary minus, ^, |, &amp;, ~, %</code>.
</p>
<p>The operations behave like C++ <code>valarrays</code>. Addition is defined as
the addition of the corresponding elements of the operands. For
example, in the code below, each of the 4 elements in <var>a</var> is
added to the corresponding 4 elements in <var>b</var> and the resulting
vector is stored in <var>c</var>.
</p>
<div class="smallexample">
<pre class="smallexample">typedef int v4si __attribute__ ((vector_size (16)));
v4si a, b, c;
c = a + b;
</pre></div>
<p>Subtraction, multiplication, division, and the bitwise operations
operate in a similar manner. Likewise, the result of using the unary
minus or complement operators on a vector type is a vector whose
elements are the negative or complemented values of the corresponding
elements in the operand.
</p>
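<p>A minimal sketch of the element-wise behavior described above, covering <code>%</code>, unary minus, complement, and the bitwise operators. The wrapper functions (<code>v4si_rem</code>, <code>v4si_neg</code>, <code>v4si_not</code>, <code>v4si_mix</code>) are hypothetical names, not GCC built-ins:</p>

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical wrappers: each operator acts on element i of its
   operands independently of the other elements. */

/* Element-wise remainder: {a[0] % b[0], ..., a[3] % b[3]}. */
static v4si v4si_rem (v4si a, v4si b) { return a % b; }

/* Unary minus negates every element. */
static v4si v4si_neg (v4si a) { return -a; }

/* Complement flips every bit of every element. */
static v4si v4si_not (v4si a) { return ~a; }

/* Bitwise operators also work element by element;
   (a & b) | (a ^ b) equals a | b in each element. */
static v4si v4si_mix (v4si a, v4si b) { return (a & b) | (a ^ b); }
```

<p>For example, with <code>a = {7, -8, 9, 10}</code> and <code>b = {2, 3, 4, 5}</code>, <code>v4si_rem (a, b)</code> yields <code>{1, -2, 1, 0}</code>, following the usual C rules for <code>%</code> on each element.</p>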
<p>The shift operators <code>&lt;&lt;</code> and <code>&gt;&gt;</code> can be used on
integer-type vectors. The operation is defined as follows: <code>{a0,
a1, &hellip;, an} &gt;&gt; {b0, b1, &hellip;, bn} == {a0 &gt;&gt; b0, a1 &gt;&gt; b1,
&hellip;, an &gt;&gt; bn}</code>. Vector operands must have the same number of
elements.
</p>
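<p>The following sketch shows both a vector shift count and a scalar shift count (which is applied to every element). It assumes a 32-bit <code>unsigned int</code>; the type name <code>v4su</code> and both function names are hypothetical. As with scalar shifts, each count should be smaller than the element width in bits:</p>

```c
/* Hypothetical type: four 32-bit unsigned elements (logical right shifts). */
typedef unsigned int v4su __attribute__ ((vector_size (16)));

/* Vector-by-vector shift: element i of x is shifted by count[i]. */
static v4su shift_right_each (v4su x, v4su count)
{
  return x >> count;
}

/* Vector-by-scalar shift: every element is shifted left by 4. */
static v4su shift_left_4 (v4su x)
{
  return x << 4;
}
```

<p>With <code>x = {256, 256, 256, 256}</code> and <code>count = {0, 1, 4, 8}</code>, <code>shift_right_each</code> returns <code>{256, 128, 16, 1}</code>.</p>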
<p>For convenience, one operand of a binary vector operation may be a
scalar. In that case the compiler transforms the scalar operand into a
vector in which every element is that scalar. The transformation happens
only if the scalar can be safely converted to the vector-element type.
Consider the following code.
</p>
<div class="smallexample">
<pre class="smallexample">typedef int v4si __attribute__ ((vector_size (16)));
v4si a, b, c;
long l;
a = b + 1; /* a = b + {1,1,1,1}; */
a = 2 * b; /* a = {2,2,2,2} * b; */
a = l + a; /* Error, cannot convert long to int. */
</pre></div>
<p>Vectors can be subscripted as if the vector were an array with
the same number of elements and base type. Out-of-bounds accesses
invoke undefined behavior at run time. Warnings for out-of-bounds
accesses in vector subscripting can be enabled with
<samp>-Warray-bounds</samp>.
</p>
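<p>Subscripting makes it straightforward to loop over the elements of a vector. This sketch, with hypothetical helper names, computes a horizontal sum and overwrites a single element; both keep their indices within the element count, since out-of-bounds subscripts are undefined:</p>

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Element count derived from the sizes rather than hard-coded. */
#define V4SI_LANES (sizeof (v4si) / sizeof (int))

/* Hypothetical helper: sum of all elements, read with ordinary
   array subscripting. */
static int v4si_hsum (v4si v)
{
  int total = 0;
  for (unsigned i = 0; i < V4SI_LANES; i++)
    total += v[i];
  return total;
}

/* Hypothetical helper: subscripts are lvalues, so one element can be
   written.  The caller must pass i < V4SI_LANES. */
static v4si v4si_set_lane (v4si v, unsigned i, int value)
{
  v[i] = value;
  return v;
}
```

<p>For <code>{1, 2, 3, 4}</code>, <code>v4si_hsum</code> returns 10.</p>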
<p>Vector comparison is supported with the standard comparison
operators: <code>==, !=, &lt;, &lt;=, &gt;, &gt;=</code>. Comparison operands can be
vector expressions of integer type or real type. Comparisons between
integer-type vectors and real-type vectors are not supported. The
result of the comparison is a vector of the same width and number of
elements as the comparison operands, with a signed integral element
type.
</p>
<p>Vectors are compared element-wise, producing 0 when comparison is false
and -1 (constant of the appropriate type where all bits are set)
otherwise. Consider the following example.
</p>
<div class="smallexample">
<pre class="smallexample">typedef int v4si __attribute__ ((vector_size (16)));
v4si a = {1,2,3,4};
v4si b = {3,2,1,4};
v4si c;
c = a &gt; b; /* The result would be {0, 0,-1, 0} */
c = a == b; /* The result would be {0,-1, 0,-1} */
</pre></div>
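<p>Because a comparison yields an ordinary integer vector whose elements are either 0 or all bits set, the result can be used directly as a bit mask. The following sketch (function names hypothetical) uses this to compute an element-wise maximum without branches, using only operators that also work in C, and to count equal elements:</p>

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: element-wise maximum.  (a > b) is -1 (all bits
   set) where a[i] > b[i] and 0 elsewhere, so the mask keeps a in those
   elements and b in the others. */
static v4si v4si_max (v4si a, v4si b)
{
  v4si mask = a > b;
  return (a & mask) | (b & ~mask);
}

/* Hypothetical helper: number of elements where a[i] == b[i].
   Each true element of the comparison contributes -1 to the sum. */
static int v4si_count_equal (v4si a, v4si b)
{
  v4si eq = a == b;
  return -(eq[0] + eq[1] + eq[2] + eq[3]);
}
```

<p>With <code>a</code> and <code>b</code> as in the example above, <code>v4si_max (a, b)</code> returns <code>{3, 2, 3, 4}</code> and <code>v4si_count_equal (a, b)</code> returns 2.</p>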
<p>In C++, the ternary operator <code>?:</code> is available. <code>a?b:c</code>, where
<code>b</code> and <code>c</code> are vectors of the same type and <code>a</code> is an
integer vector with the same number of elements of the same size as <code>b</code>
and <code>c</code>, computes all three arguments and creates a vector
<code>{a[0]?b[0]:c[0], a[1]?b[1]:c[1], &hellip;}</code>. Note that unlike in
OpenCL, <code>a</code> is thus interpreted as <code>a != 0</code> and not <code>a &lt; 0</code>.
As in the case of binary operations, this syntax is also accepted when
one of <code>b</code> or <code>c</code> is a scalar that is then transformed into a
vector. If both <code>b</code> and <code>c</code> are scalars and the type of
<code>true?b:c</code> has the same size as the element type of <code>a</code>, then
<code>b</code> and <code>c</code> are converted to a vector type whose elements have
this type and with the same number of elements as <code>a</code>.
</p>
<p>In C++, the logic operators <code>!, &amp;&amp;, ||</code> are available for vectors.
<code>!v</code> is equivalent to <code>v == 0</code>, <code>a &amp;&amp; b</code> is equivalent to
<code>a!=0 &amp; b!=0</code> and <code>a || b</code> is equivalent to <code>a!=0 | b!=0</code>.
For mixed operations between a scalar <code>s</code> and a vector <code>v</code>,
<code>s &amp;&amp; v</code> is equivalent to <code>s?v!=0:0</code> (the evaluation is
short-circuit) and <code>v &amp;&amp; s</code> is equivalent to <code>v!=0 &amp; (s?-1:0)</code>.
</p>
<a name="index-_005f_005fbuiltin_005fshuffle"></a>
<p>Vector shuffling is available using functions
<code>__builtin_shuffle (vec, mask)</code> and
<code>__builtin_shuffle (vec0, vec1, mask)</code>.
Both functions construct a permutation of elements from one or two
vectors and return a vector of the same type as the input vector(s).
The <var>mask</var> is an integral vector with the same width (<var>W</var>)
and element count (<var>N</var>) as the output vector.
</p>
<p>The elements of the input vectors are numbered in memory ordering of
<var>vec0</var> beginning at 0 and <var>vec1</var> beginning at <var>N</var>. The
elements of <var>mask</var> are considered modulo <var>N</var> in the single-operand
case and modulo <em>2*<var>N</var></em> in the two-operand case.
</p>
<p>Consider the following example:
</p>
<div class="smallexample">
<pre class="smallexample">typedef int v4si __attribute__ ((vector_size (16)));
v4si a = {1,2,3,4};
v4si b = {5,6,7,8};
v4si mask1 = {0,1,1,3};
v4si mask2 = {0,4,2,5};
v4si res;
res = __builtin_shuffle (a, mask1); /* res is {1,2,2,4} */
res = __builtin_shuffle (a, b, mask2); /* res is {1,5,3,6} */
</pre></div>
<p>Note that <code>__builtin_shuffle</code> is intentionally semantically
compatible with the OpenCL <code>shuffle</code> and <code>shuffle2</code> functions.
</p>
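<p>A few common permutations can be written directly as masks. This sketch (function names hypothetical) reverses a vector, interleaves the low halves of two vectors, and shows that mask elements wrap modulo <var>N</var> in the single-operand form:</p>

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: reverse the elements; output element i takes
   input element mask[i]. */
static v4si v4si_reverse (v4si v)
{
  v4si mask = {3, 2, 1, 0};
  return __builtin_shuffle (v, mask);
}

/* Hypothetical helper: interleave the low halves of a and b.  In the
   two-operand form, indices 0..3 name elements of a and 4..7 name
   elements of b. */
static v4si v4si_interleave_low (v4si a, v4si b)
{
  v4si mask = {0, 4, 1, 5};
  return __builtin_shuffle (a, b, mask);
}

/* Hypothetical helper: in the one-operand form, mask elements are taken
   modulo N (here 4), so {4, 5, 6, 7} selects elements 0, 1, 2, 3. */
static v4si v4si_wrap (v4si v)
{
  v4si mask = {4, 5, 6, 7};
  return __builtin_shuffle (v, mask);
}
```

<p>For <code>a = {1,2,3,4}</code> and <code>b = {5,6,7,8}</code>, these return <code>{4,3,2,1}</code>, <code>{1,5,2,6}</code>, and <code>{1,2,3,4}</code> respectively.</p>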
<p>You can declare variables and use them in function calls and returns, as
well as in assignments and some casts. You can specify a vector type as
a return type for a function. Vector types can also be used as function
arguments. It is possible to cast from one vector type to another,
provided they are of the same size (in fact, you can also cast vectors
to and from other data types of the same size).
</p>
<p>You cannot combine vectors of different lengths or different
signedness in one operation without a cast.
</p>
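<p>A cast between vector types of the same size reinterprets the bits rather than converting each element. The sketch below (type and function names hypothetical, assuming a 32-bit <code>unsigned int</code>) casts a signed vector so it can be added to an unsigned one, and views four 32-bit elements as sixteen bytes:</p>

```c
typedef int v4si __attribute__ ((vector_size (16)));
typedef unsigned int v4su __attribute__ ((vector_size (16)));   /* hypothetical */
typedef unsigned char v16qu __attribute__ ((vector_size (16))); /* hypothetical */

/* Mixing signedness needs an explicit cast; the cast reuses the same
   16 bytes, so -1 becomes 0xffffffff in the unsigned view. */
static v4su add_mixed (v4su u, v4si s)
{
  return u + (v4su) s;
}

/* Same total size (16 bytes), different element count: the cast
   reinterprets 4 x 32-bit elements as 16 x 8-bit elements. */
static v16qu as_bytes (v4su v)
{
  return (v16qu) v;
}
```

<p>Adding <code>{-1,-2,-3,-4}</code> to <code>{10,10,10,10}</code> this way wraps modulo 2<sup>32</sup> and gives <code>{9,8,7,6}</code>. The byte order seen through <code>as_bytes</code> depends on endianness in general; elements whose bytes are all equal, such as <code>0x01010101</code>, give the same result on either byte order.</p>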
<a name="index-_005f_005fbuiltin_005fconvertvector"></a>
<p>Vector conversion is available using the
<code>__builtin_convertvector (vec, vectype)</code>
function. <var>vec</var> must be an expression with integral or floating
vector type and <var>vectype</var> an integral or floating vector type with the
same number of elements. The result has <var>vectype</var> type and value of
a C cast of every element of <var>vec</var> to the element type of <var>vectype</var>.
</p>
<p>Consider the following example:
</p><div class="smallexample">
<pre class="smallexample">typedef int v4si __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));
typedef double v4df __attribute__ ((vector_size (32)));
typedef unsigned long long v4di __attribute__ ((vector_size (32)));
v4si a = {1,-2,3,-4};
v4sf b = {1.5f,-2.5f,3.f,7.f};
v4di c = {1ULL,5ULL,0ULL,10ULL};
v4sf d = __builtin_convertvector (a, v4sf); /* d is {1.f,-2.f,3.f,-4.f} */
/* Equivalent of:
v4sf d = { (float)a[0], (float)a[1], (float)a[2], (float)a[3] }; */
v4df e = __builtin_convertvector (a, v4df); /* e is {1.,-2.,3.,-4.} */
v4df f = __builtin_convertvector (b, v4df); /* f is {1.5,-2.5,3.,7.} */
v4si g = __builtin_convertvector (f, v4si); /* g is {1,-2,3,7} */
v4si h = __builtin_convertvector (c, v4si); /* h is {1,5,0,10} */
</pre></div>
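<p>The difference between <code>__builtin_convertvector</code> and a plain vector cast is easy to miss: the former converts each element's value, while the latter keeps the bit pattern. A minimal sketch, assuming IEEE 754 single-precision <code>float</code> (function names hypothetical):</p>

```c
typedef int v4si __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));

/* Value conversion: each element is converted as by a C cast,
   so 2.75f becomes 2 (truncation toward zero). */
static v4si to_int_values (v4sf f)
{
  return __builtin_convertvector (f, v4si);
}

/* Bit reinterpretation: a plain cast between same-size vector types keeps
   the bits, so 1.0f becomes its IEEE 754 encoding 0x3f800000. */
static v4si to_int_bits (v4sf f)
{
  return (v4si) f;
}
```

<p>For <code>{1.0f, 2.75f, -2.75f, 0.0f}</code>, <code>to_int_values</code> gives <code>{1, 2, -2, 0}</code>, while element 0 of <code>to_int_bits</code> is <code>0x3f800000</code>, the encoding of <code>1.0f</code>.</p>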
<a name="index-vector-types_002c-using-with-x86-intrinsics"></a>
<p>Sometimes it is desirable to write code using a mix of generic vector
operations (for clarity) and machine-specific vector intrinsics (to
access vector instructions that are not exposed via generic built-ins).
On x86, intrinsic functions for integer vectors typically use the same
vector type <code>__m128i</code> irrespective of how they interpret the vector,
making it necessary to cast their arguments and return values from/to
other vector types. In C, you can make use of a <code>union</code> type:
</p><div class="smallexample">
<pre class="smallexample">#include &lt;immintrin.h&gt;
#include &lt;string.h&gt; /* for memcpy, used below */
typedef unsigned char u8x16 __attribute__ ((vector_size (16)));
typedef unsigned int u32x4 __attribute__ ((vector_size (16)));
typedef union {
__m128i mm;
u8x16 u8;
u32x4 u32;
} v128;
</pre></div>
<p>for variables that can be used with both built-in operators and x86
intrinsics:
</p>
<div class="smallexample">
<pre class="smallexample">v128 x, y = { 0 };
/* ptr points to at least 16 readable bytes. */
memcpy (&amp;x, ptr, sizeof x);
y.u8 += 0x80;
x.mm = _mm_adds_epu8 (x.mm, y.mm);
x.u32 &amp;= 0xffffff;
/* Instead of a variable, a compound literal may be used to pass the
return value of an intrinsic call to a function expecting the union: */
v128 foo (v128);
x = foo ((v128) {_mm_adds_epu8 (x.mm, y.mm)});
</pre></div>
</body>
</html>