<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>StreamComputing</title>
	<atom:link href="http://www.streamcomputing.eu/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.streamcomputing.eu</link>
	<description>for Fast and Scalable Software</description>
	<lastBuildDate>Fri, 18 May 2012 10:17:54 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>NVIDIA: mobile phones, tablets and HPC (cloud)</title>
		<link>http://www.streamcomputing.eu/blog/2012-05-12/nvidia-mobile-phones-tablets-and-hpc-cloud/</link>
		<comments>http://www.streamcomputing.eu/blog/2012-05-12/nvidia-mobile-phones-tablets-and-hpc-cloud/#comments</comments>
		<pubDate>Sat, 12 May 2012 15:53:23 +0000</pubDate>
		<dc:creator>Vincent Hindriksen</dc:creator>
				<category><![CDATA[Home]]></category>
		<category><![CDATA[Cloud]]></category>
		<category><![CDATA[CUDA]]></category>
		<category><![CDATA[Future]]></category>
		<category><![CDATA[HPC]]></category>
		<category><![CDATA[mobile]]></category>
		<category><![CDATA[NVIDIA]]></category>
		<category><![CDATA[OpenCL]]></category>

		<guid isPermaLink="false">http://www.streamcomputing.eu/?p=3499</guid>
		<description><![CDATA[If you want to see what is coming up in the market of consumer-technology (PC, mobile and tablet), then NVIDIA can tell you the most. The company is very flexible, and shows time after time that it really knows which markets it currently operates in and which it can enter. I sometimes strongly disagree with their marketing, but [...]]]></description>
			<content:encoded><![CDATA[<span style="float:right; display:inline;"><a href="http://www.streamcomputing.eu/wp-content/plugins/kalins-pdf-creation-station/kalins_pdf_create.php?singlepost=po_3499" target="_blank" ><img src="http://www.streamcomputing.eu/wp-content/uploads/2011/02/pdf_icon-e1297035459917.png" /></a></span><div id=""><img class="alignright size-medium wp-image-3500" title="Nvidia-Tegra-3-depremier" src="http://www.streamcomputing.eu/wp-content/uploads/2012/05/Nvidia-Tegra-3-depremier-300x225.jpg" alt="" width="300" height="225" />If you want to see what is coming up in the market of consumer-technology (PC, mobile and tablet), then NVIDIA can tell you the most. The company is very flexible, and shows time after time that it really knows which markets it currently operates in and which it can enter. I sometimes strongly disagree with their marketing, but I watch them closely, as they are active in the markets that matter most for the near future: PCs, Mobile/Tablet and <acronym title='High Performance Computing, or super-computing.'>HPC</acronym>.</div>
<div></div>
<div>You might think I completely overlook interconnects (buses between processors, devices and memory) and memory-technologies, as clouds have a large need for high-speed data-transport, but the last 20 years have shown that this is a fairly stable, steadily developing market based on selling IP to the hardware-vendors. With the <a href="http://www.bit-tech.net/news/hardware/2012/04/25/intel-cray-interconnect/1" target="_blank">acquisition of Cray&#8217;s interconnect technology</a>, we have seen this is serious business for Intel, so things might change indeed. For this article I want to focus on NVIDIA&#8217;s choices.<span id="more-3499"></span></div>
<div>
<h1>Profit markets</h1>
<p>NVIDIA sees two markets with growing profits: <acronym title='High Performance Computing, or super-computing.'>HPC</acronym> and mobile. PC-sales are declining.</p>
<h2>HPC</h2>
</div>
<div>AMD has not targeted <acronym title='High Performance Computing, or super-computing.'>HPC</acronym> as a specific market with GPUs and Intel <a href="http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html" target="_blank">MIC</a> has not arrived in the stores yet, so that market is an open road now. At first I was a bit disappointed with the <acronym title='General Purpose GPU, a common name for programming GPUs for non-graphics purposes'>GPGPU</acronym>-capabilities of the current GPUs in the 600-series, but it actually makes sense if NVIDIA wants to get more into <acronym title='High Performance Computing, or super-computing.'>HPC</acronym>. Splitting the market more clearly into <acronym title='General Purpose GPU, a common name for programming GPUs for non-graphics purposes'>GPGPU</acronym>-cards and gaming-cards could support the marketing effort to get a Tesla-card into each computer-centre around the world. If they had invested in <acronym title='General Purpose GPU, a common name for programming GPUs for non-graphics purposes'>GPGPU</acronym>-capabilities on their gaming-cards, it could be disastrous if a research-centre published wrong results computed on those GPUs.</div>
<h2>PCs</h2>
<div>Windows-gaming has been a main driver behind GPUs, but with the release of DirectX 10 the comments in comparison articles got a lot more sarcastic. Also, more gamers bought consoles for their games. Why do you need an even faster <acronym title='The processor on the videocard'>GPU</acronym>? And what drove sales after 2006 besides new PCs? Hence the decline.</div>
<div></div>
<div>Why still make GPUs for PCs? NVidia could easily say: goodbye PC, we don&#8217;t believe in this sinking ship &#8211; go buy a mobile phone with Tegra. Sales are <a href="http://www.forbes.com/sites/patrickmoorhead/2012/05/10/discrete-graphics-the-sky-is-not-falling/" target="_blank">still high enough</a>, but this moment will come. The split between <acronym title='High Performance Computing, or super-computing.'>HPC</acronym>-cards and gaming-cards I see as the beginning.</div>
<h2>Mobile</h2>
<div>A future in which mobile phones and tablets, well connected to <acronym title='High Performance Computing, or super-computing.'>HPC</acronym>-clouds, have completely replaced PCs is very probable by 2016/2017. In the market for PC-components it has been very hard for years to come up with better products <em>actually needed</em> by buyers, hence the growing embedded <acronym title='The processor on the videocard'>GPU</acronym>-market (competition on price, not performance). In such markets it is important to become unique. On the mobile platform there is plenty of room for competition, as faster mobiles using less power will be in demand for years to come. It was a wise choice to specialise in the mobile platform.</div>
<h1>Where <acronym title='A programming-language like OpenCL only for NVIDIA&#039;s GPUs'>CUDA</acronym> and where <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym>?</h1>
<div>NVIDIA currently serves quite a few markets, and this is what I&#8217;ve found about the choices they made between OpenCL and CUDA:</div>
<div id="">
<ul>
<li>HPC: definitely CUDA</li>
<li>Gaming on consoles: NVIDIA lost most of the market to AMD &#8211; but margins are very low anyway</li>
<li>Mobile: <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> (see what Neil Trevett of NVIDIA/Khronos <a title="External link" href="http://www.streamcomputing.eu/blog/2012-04-21/neil-trevett-on-opencl/" rel="external">says on mobiles and OpenCL</a>)</li>
<li>PCs: DirectCompute, <acronym title='A programming-language like OpenCL only for NVIDIA&#039;s GPUs'>CUDA</acronym>, <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> (undecided &#8211; see remark below)</li>
<li>Browser: WebCL (see also the mentioned link)</li>
</ul>
<p>Most of all, NVIDIA has discovered that LLVM is able to handle all types of programming languages. While last December a presenter from the <acronym title='A programming-language like OpenCL only for NVIDIA&#039;s GPUs'>CUDA</acronym>-team got updated on <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym>-capabilities (last minute of <a href="http://www.youtube.com/watch?v=Ux3F5MKuPjI" target="_blank">this video</a>), they seem to have figured out all the possibilities of using LLVM and are betting heavily on it. This will solve their multi-language problem, so even DirectCompute could get an LLVM-frontend. More about LLVM later, as this is a very important subject.</p>
<p>We will certainly hear more about how NVIDIA sees the different market segments at their <a href="http://www.gputechconf.com/page/home.html" target="_blank">GTC conference</a>.</p>
</div>
<h1>Read more</h1>
<div>
<ul>
<li>The future of NVIDIA in 2009:  <a href="http://www.bit-tech.net/hardware/graphics/2009/08/20/does-nvidia-have-a-future/1">http://www.bit-tech.net/hardware/graphics/2009/08/20/does-nvidia-have-a-future/1</a></li>
<li>Why The Market Is Valuing Nvidia Incorrectly: <a href="http://seekingalpha.com/article/569461-why-the-market-is-valuing-nvidia-incorrectly">http://seekingalpha.com/article/569461-why-the-market-is-valuing-nvidia-incorrectly</a></li>
<li>The future of the $200 tablet: <a href="http://www.computerworld.com/s/article/9226572/The_future_of_the_200_tablet">http://www.computerworld.com/s/article/9226572/The_future_of_the_200_tablet</a></li>
</ul>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.streamcomputing.eu/blog/2012-05-12/nvidia-mobile-phones-tablets-and-hpc-cloud/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Neil Trevett on OpenCL</title>
		<link>http://www.streamcomputing.eu/blog/2012-04-21/neil-trevett-on-opencl/</link>
		<comments>http://www.streamcomputing.eu/blog/2012-04-21/neil-trevett-on-opencl/#comments</comments>
		<pubDate>Sat, 21 Apr 2012 14:49:33 +0000</pubDate>
		<dc:creator>Vincent Hindriksen</dc:creator>
				<category><![CDATA[Home]]></category>
		<category><![CDATA[NVIDIA]]></category>
		<category><![CDATA[OpenCL]]></category>
		<category><![CDATA[WebCL]]></category>

		<guid isPermaLink="false">http://www.streamcomputing.eu/?p=3320</guid>
		<description><![CDATA[The Khronos Group gave some talks on their technologies in Shanghai, China on the 17th of March 2012. Neil Trevett made some interesting remarks on the position of NVidia on OpenCL that I would like to share with you. Neil Trevett is both an important member of Khronos and an employee of NVidia. To be more precise, he [...]]]></description>
			<content:encoded><![CDATA[<span style="float:right; display:inline;"><a href="http://www.streamcomputing.eu/wp-content/plugins/kalins-pdf-creation-station/kalins_pdf_create.php?singlepost=po_3320" target="_blank" ><img src="http://www.streamcomputing.eu/wp-content/uploads/2011/02/pdf_icon-e1297035459917.png" /></a></span><p><a href="http://www.streamcomputing.eu/wp-content/uploads/2012/04/Selectie_092.png"><img class="alignright size-medium wp-image-3327" title="Selectie_092" src="http://www.streamcomputing.eu/wp-content/uploads/2012/04/Selectie_092-300x204.png" alt="" width="300" height="204" /></a>The Khronos Group gave some talks on their technologies in Shanghai, China on the 17th of March 2012. Neil Trevett made some interesting remarks on the position of NVidia on <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> that I would like to share with you. Neil Trevett is both an important member of Khronos and an employee of NVidia. To be more precise, he is the Vice President Mobile Content of NVidia and the president of Khronos. I think we can take his comments seriously, but we must be very careful as these are mixed with his personal opinions.</p>
<p>Regular readers of the blog have seen that I am not enthusiastic at all about NVidia&#8217;s marketing, but am a big fan of their hardware. And I am very positive that they are bold enough to position themselves very well in the fast-changing markets of the upcoming years. Having said that, let&#8217;s go to the quotes.</p>
<p>All quotes are from this video. It is best to watch from 41:50 to 45:35.</p>
<p></p>
<p>At 44:05 he states: &#8220;<em>In the mobile I think space <acronym title='A programming-language like OpenCL only for NVIDIA&#039;s GPUs'>CUDA</acronym> is unlikely to be widely adopted</em>&#8221;, and explains: &#8220;<em>A party API in the mobile industry doesn&#8217;t really meet market needs</em>&#8221;. He then continues with his vision of OpenCL: &#8220;<em>I think <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> in the mobile is going to be fundamental to bring parallel computation to mobile devices</em>&#8221; and then &#8220;<em>and into the web through WebCL</em>&#8221;.</p>
<p>Also interesting at 44:55: &#8220;<em>In the end NVidia doesn&#8217;t really mind which API is used, <acronym title='A programming-language like OpenCL only for NVIDIA&#039;s GPUs'>CUDA</acronym> or <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym>. As long as you are get to use great GPUs</em>&#8221;. He ends with a smile, as &#8220;great GPUs&#8221; refers to NVidia&#8217;s of course. <img src='http://www.streamcomputing.eu/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>At 45:10 he describes NVidia&#8217;s plans for <acronym title='High Performance Computing, or super-computing.'>HPC</acronym>, before getting back to: &#8220;<em>NVidia is going to support both [<acronym title='A programming-language like OpenCL only for NVIDIA&#039;s GPUs'>CUDA</acronym> and <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym>] in <acronym title='High Performance Computing, or super-computing.'>HPC</acronym>. In Mobile it&#8217;s going to be all OpenCL</em>&#8221;.</p>
<p>At 45:23 he repeats his statement: &#8220;<em>In the mobile space I expect <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> to be the primary tool</em>&#8221;.</p>
<h1><span id="more-3320"></span>What does it mean?</h1>
<p>The main question in the viral video &#8220;double rainbow&#8221; is a good one. Especially because it is always a hard question.</p>
<p>For me these statements mean that NVidia has thought this through very well. I like that for two reasons. First, they are long-term thinkers; second, they understand the competition in the ARM market on hardware capabilities and pricing.</p>
<p>It may not be emphasised enough, but the market is going to change a lot. In short: the &#8220;middle market&#8221; (PC desktops and laptops) is going to disappear and/or be replaced. NVidia is anticipating this by focusing on the two areas on either side: <acronym title='High Performance Computing, or super-computing.'>HPC</acronym> and mobile.</p>
<h2>HPC</h2>
<p>Know Cobol and Fortran? Those are ancient languages that are still used a lot in <acronym title='High Performance Computing, or super-computing.'>HPC</acronym>. I don&#8217;t know whose law it is, but &#8220;The more security they want, the more ancient the methods&#8221;. Another example of an industry subject to that law is space exploration &#8211; &#8220;better to have fully tested code than anything else&#8221;. <acronym title='A programming-language like OpenCL only for NVIDIA&#039;s GPUs'>CUDA</acronym> would be a very welcome addition to these industries, and they know it.</p>
<p>NVidia mostly aims for the (incredibly growing) cloud-part of the future world. Its most direct competitor: Intel.</p>
<h2>Mobile &amp; ARM</h2>
<p>NVidia sees that the faster they get to a standard, the faster ARM gets accepted as the next X86. The slower AMD moves, the bigger the market share. Also, the &#8220;old world&#8221; ARM companies are not used to the consumer-facing ways that are well known in the X86 world &#8211; ARM vendors are mostly not directly tied to consumers, but have one (or more) vendors between consumers and themselves. So NVidia has all the room to grow, as they have the whole stage for their own marketing.</p>
<p>If NVidia is backing <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym>, then this could give a good push in the right direction to get one standard for compute on mobile devices.</p>
<h2>Desktop &amp; Laptop</h2>
<p>While AMD and Intel go hybrid with X86, NVidia keeps making discrete GPUs. The result: losing market share each quarter. But if you look at the decreasing sales of these devices, the only way to grow is through harsher competition. I don&#8217;t think it is a bad choice at all.</p>
<p>The 2012 GPUs are split in two groups: gaming and <acronym title='High Performance Computing, or super-computing.'>HPC</acronym>. The differences between these two will give more answers on the path NVidia takes.</p>
<p>Also note that Trevett did not say anything about computers &#8211; neither ARM nor X86.</p>
<h2>OpenCL in the browser</h2>
<p>He was very explicit on WebCL as a standard for computing on the HTML5 web. It is still a <a href="https://cvs.khronos.org/svn/repos/registry/trunk/public/webcl/spec/latest/index.html" target="_blank">working draft</a>, but things would have to go really strangely for this standard to be replaced by another one, such as Intel&#8217;s RiverTrail. There are no similar competing products based on <acronym title='A programming-language like OpenCL only for NVIDIA&#039;s GPUs'>CUDA</acronym>, DirectCompute or RenderScript.</p>
<h1>Perspective</h1>
<p>Neil Trevett said that a standard in the mobile industry is very important. To get quick acceptance of ARM as a replacement for X86, standards (such as <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym>) would need to be in place soon. NVidia is ready for a quickly shifting market, so some quotes could be seen differently in that light. Say demand for ARM-based computers rises after the release of Windows 8, and games recompiled for ARM get released; then NVidia will have a better quarter than Intel or AMD for sure.</p>
<p>As these games-on-ARM and other software that needs processing power must start using <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym>, StreamComputing <a href="http://www.streamcomputing.eu/consultancy/">does not mind</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.streamcomputing.eu/blog/2012-04-21/neil-trevett-on-opencl/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>USB-stick sized ARM-computers</title>
		<link>http://www.streamcomputing.eu/blog/2012-04-18/usb-stick-sized-arm-computers/</link>
		<comments>http://www.streamcomputing.eu/blog/2012-04-18/usb-stick-sized-arm-computers/#comments</comments>
		<pubDate>Wed, 18 Apr 2012 14:18:21 +0000</pubDate>
		<dc:creator>Vincent Hindriksen</dc:creator>
				<category><![CDATA[Home]]></category>
		<category><![CDATA[ARM]]></category>
		<category><![CDATA[Future]]></category>

		<guid isPermaLink="false">http://www.streamcomputing.eu/?p=3275</guid>
		<description><![CDATA[Now that smartphones get more powerful and the internet makes it possible to have all functionality and documents with you anywhere, the computer needs to be reinvented. You see all big IT-companies searching for what that could look like, from Windows Metro to complete docking stations to replace the desktop with your phone. A turbulent market. One [...]]]></description>
			<content:encoded><![CDATA[<span style="float:right; display:inline;"><a href="http://www.streamcomputing.eu/wp-content/plugins/kalins-pdf-creation-station/kalins_pdf_create.php?singlepost=po_3275" target="_blank" ><img src="http://www.streamcomputing.eu/wp-content/uploads/2011/02/pdf_icon-e1297035459917.png" /></a></span><p><a href="http://www.brainstuck.com/2008/05/24/identity-crisis/"><img class="alignright size-medium wp-image-3276" title="identity-crisis" src="http://www.streamcomputing.eu/wp-content/uploads/2012/04/identity-crisis-296x300.jpg" alt="" width="296" height="300" /></a>Now that smartphones get more powerful and the internet makes it possible to have all functionality and documents with you anywhere, the computer needs to be reinvented. You see all big IT-companies searching for what that could look like, from Windows Metro to complete docking stations to replace the desktop with your phone. A turbulent market.</p>
<p>One of the new product types is the USB-stick sized computer. Stick it into a TV or monitor, zap in your code and you have your personal working environment. You never need to carry a laptop to your hotel-room or conference, as long as a screen is available &#8211; any screen.</p>
<p>There are several USB-computers entering the market, but I want to introduce you to two. Both of these see a future in a strong processor in a portable device, and neither has a real product with such a strong processor yet. But you can expect that in 2013 you can have a device that can do very fast parallel processing to give you a smooth Photoshop experience&#8230; on your key-ring.</p>
<h1><span id="more-3275"></span>FXI Tech Lollipop</h1>
<p>Just recently a company called <a href="http://www.fxitech.com/" target="_blank">FXI technologies</a> had a presentation in Japan and showed their roadmap containing <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym>-capable USB-computers. Of course we picked that up:</p>
<p><img class="alignnone size-large wp-image-3278" title="Selectie_076" src="http://www.streamcomputing.eu/wp-content/uploads/2012/04/Selectie_076-550x392.png" alt="" width="550" height="392" /></p>
<p>It has the A15 processor, wifi, bluetooth and USB 3.0. It is not very clear which <acronym title='The processor on the videocard'>GPU</acronym> they use, but it may very well be the <a href="http://www.arm.com/products/multimedia/mali-graphics-hardware/mali-t658.php" target="_blank">ARM Mali T658</a>.</p>
<h1>NVIDIA Tegra-stick</h1>
<p>According to <a href="http://www.tomshardware.com/news/nvidia-patent-portable-computer-usb,13878.html" target="_blank">Tom&#8217;s Hardware</a>: &#8220;Nvidia&#8217;s filing outlines the idea of packing a Tegra processor, Flash storage, RAM, at least one USB port, and other interface types such as a parallel port, a serial port, IEEE 1394 (Firewire), VGA, HDMI, S-Video, AV, DVI, LAN and WiFi into a package that is just 40-60 mm long, 10-20 mm wide and 5-10 mm thick. The feature set would be comparable to a entry-level computer system with limited storage and cloud connectivity&#8221;. So very comparable with FXI Tech&#8217;s product, except it has a Tegra-GPU.</p>
<p><img class="alignnone size-full wp-image-3285" title="nvidia_USB" src="http://www.streamcomputing.eu/wp-content/uploads/2012/04/nvidia_USB.jpg" alt="" width="550" height="407" /></p>
<p>NVidia chose not to go for the hybrid X86-based CPU+GPU for desktops and laptops as AMD and Intel did, but to focus on mobile and on (<acronym title='High Performance Computing, or super-computing.'>HPC</acronym>) servers.</p>
<h1>Hardware potential in the coming years</h1>
<p>In 2015 memory-sizes, processing power per Watt and storage-capabilities will be at least 4 times what we have now. When <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> (and similar techniques) are used by the masses, this can be even more &#8211; and that is what these two products bet on. Think 30+GB memory, many-core hybrid processors, GBit internet, Dropbox Free starting at 20+GB, etc.</p>
<p>Companies will have a hard(er) time earning enough money, as the market is mostly satisfied with a TV that can take a USB-stick containing somebody&#8217;s personal environment. So instead of wanting more stuff, getting rid of stuff is a stronger reason to spend money. A powerful computer starting at 35 euro (like the Raspberry Pi) plugged into an impersonal screen+keyboard will help that demand.</p>
<p>As it is very important for StreamComputing (and its customers) to know where the growth-market for <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> is, more on this subject soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.streamcomputing.eu/blog/2012-04-18/usb-stick-sized-arm-computers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PDFs of Monday 16 April</title>
		<link>http://www.streamcomputing.eu/blog/2012-04-16/pdfs-of-monday-16-april/</link>
		<comments>http://www.streamcomputing.eu/blog/2012-04-16/pdfs-of-monday-16-april/#comments</comments>
		<pubDate>Mon, 16 Apr 2012 11:55:04 +0000</pubDate>
		<dc:creator>Vincent Hindriksen</dc:creator>
				<category><![CDATA[Home]]></category>
		<category><![CDATA[CUDA]]></category>
		<category><![CDATA[GPGPU]]></category>
		<category><![CDATA[OpenCL]]></category>
		<category><![CDATA[PDF-Monday]]></category>

		<guid isPermaLink="false">http://www.streamcomputing.eu/?p=3245</guid>
		<description><![CDATA[By exception, another PDF-Monday. OpenCL vs. OpenMP: A Programmability Debate. Sometimes OpenCL produces the faster code, sometimes OpenMP. From the conclusion: &#8220;OpenMP is more productive, while OpenCL is portable for a larger class of devices. Performance-wise, we have found a large variety of ratios between the two solutions, depending on the [...]]]></description>
			<content:encoded><![CDATA[<span style="float:right; display:inline;"><a href="http://www.streamcomputing.eu/wp-content/plugins/kalins-pdf-creation-station/kalins_pdf_create.php?singlepost=po_3245" target="_blank" ><img src="http://www.streamcomputing.eu/wp-content/uploads/2011/02/pdf_icon-e1297035459917.png" /></a></span><p>By exception, another PDF-Monday.</p>
<p><a href="http://www.pds.ewi.tudelft.nl/fileadmin/pds/homepages/shenjie/papers/CPC2012.pdf">OpenCL vs. OpenMP: A Programmability Debate</a>. Sometimes <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> produces the faster code, sometimes <acronym title='A framework for adding parallel programming to C, C++ and Fortran. Like MPI.'>OpenMP</acronym>. From the conclusion: &#8220;<acronym title='A framework for adding parallel programming to C, C++ and Fortran. Like MPI.'>OpenMP</acronym> is more productive, while <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> is portable for a larger class of devices. Performance-wise, we have found a large variety of ratios between the two solutions, depending on the application, dataset sizes, compilers, and architectures.&#8221;</p>
<p><a href="http://www.cdl.uni-saarland.de/papers/karrenberg_opencl.pdf">Improving Performance of <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> on CPUs</a>. Focusing on how to optimise <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym>. From the abstract: &#8220;First, we present a static analysis and an accompanying optimization to exclude code regions from control-flow to data-flow conversion, which is the commonly used technique to leverage vector instruction sets. Second, we present a novel technique to implement barrier synchronization.&#8221;</p>
<p><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MTGP/mtgp3.pdf">Variants of Mersenne Twister Suitable for Graphic Processors</a>. Source-code at <a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MTGP/" target="_blank">http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MTGP/</a></p>
<p><a href="http://www.cs.manchester.ac.uk/resources/library/thesis_abstracts/BkgdReportsMSc11/Livesey-Matthew-bkgd-rept.pdf">Accelerating the FDTD method using SSE and GPUs</a>. &#8220;The Finite-Difference Time-Domain (FDTD) method is a computational technique for modelling the behaviour of electromagnetic waves in 3D space&#8221;. This is a project-plan, but it describes the theory pretty well.<span id="more-3245"></span></p>
<p><a href="http://www.eecis.udel.edu/~grauerg/autoTuneGpu.pdf">Auto-tuning a High-Level Language Targeted to GPU Codes</a>. Auto-tuning is (according to us) a very important field in OpenCL/GPGPU in the coming years. This article describes HMPP, a language which uses pragmas, and how auto-tuning can help to maximise performance.</p>
<p><a href="http://pl887.pairlitesite.com/papers/ispc/ispc_inpar_2012.pdf">Intel SPMD Program Compiler: A SPMD Compiler for High-Performance CPU Programming</a>. If you followed and liked NVidia-vs-Intel on &#8220;speed-up for free&#8221;, then you should read this.</p>
<p><a href="http://impact.crhc.illinois.edu/ftp/report/impact-12-01.parboil.pdf" target="_blank">Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing</a>. Various benchmarks pop up with <acronym title='General Purpose GPU, a common name for programming GPUs for non-graphics purposes'>GPGPU</acronym>-support. This one is <acronym title='A programming-language like OpenCL only for NVIDIA&#039;s GPUs'>CUDA</acronym>-only, (indirectly) funded by Microsoft, Intel and NVidia, but open source. <a href="http://impact.crhc.illinois.edu/parboil.php" target="_blank">http://impact.crhc.illinois.edu/parboil.php</a></p>
<p><a href="http://www.maths.ed.ac.uk/~jmarecek/opencl/opencl-data.pdf">Dynamic Data Structures for Taskgraph Scheduling Policies with Applications in <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> Accelerators</a>. This study is about &#8220;stochastic scheduling with precedencies&#8221;. Also interesting is that it is done in cooperation with ARM.</p>
<p><a href="http://dosen.narotama.ac.id/wp-content/uploads/2012/03/A-domain-specific-approach-to-heterogeneous-parallelism.pdf">A Domain-Specific Approach To Heterogeneous Parallelism</a>. From the abstract: &#8220;We propose leveraging domain-specific languages (DSLs) to map high-level application code to heterogeneous devices. To demonstrate the potential of this approach we present OptiML, a DSL for machine learning&#8221;.</p>
<p><a href="http://www.altera.com/literature/wp/wp-01173-opencl.pdf">Implementing FPGA Design with the OpenCL Standard</a>. The official whitepaper of Altera for their upcoming product.</p>
<p><a href="http://web.info.uvt.ro/~petcu/COST/timisoara_skelCL-dOpenCL_presentation.pdf">Towards A High-Level Approach for Programming Distributed Systems with GPUs</a>. Introducing SkelCL, a higher level language, and dOpenCL, <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> for distributed systems. No source or binaries, but the presentation describes their approach well enough to compare it with other solutions.</p>
<p><a href="http://crpit.com/confpapers/CRPITV127Dinneen.pdf">A Comparative Study of Parallel Algorithms for the Girth Problem</a>. A study that compared <acronym title='A programming-language like OpenCL only for NVIDIA&#039;s GPUs'>CUDA</acronym> against <acronym title='A framework for adding parallel programming to C, C++ and Fortran. Like MPI.'>OpenMP</acronym>, with all the results on the last two pages. The girth of a graph is the length of a shortest cycle contained in the graph.</p>
<p><a href="http://www.mcs.anl.gov/uploads/cels/papers/P2068-0312.pdf">AESOP: Expressing Concurrency in High-Performance System Software</a>. An alternative to current distributed and <acronym title='General Purpose GPU, a common name for programming GPUs for non-graphics purposes'>GPGPU</acronym> programming languages. Read to see what could be improved in <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> according to these researchers.</p>
<p><a href="http://wnbell.com/media/2012-SISC-AMG/GPU-AMG-SISC.pdf">Exposing Fine-grained parallelism in Algebraic Multigrid Methods</a>. 30 pages to describe the development of &#8220;a parallel algebraic multigrid method which exposes substantial fine-grained parallelism in both the construction of the multigrid hierarchy as well as the cycling or solve stage&#8221;.</p>
<p><a href="http://www.cescg.org/CESCG-2012/papers/Toth-Robust_Volume_Segmentation_using_an_Abstract_Distance_Transform.pdf">Robust Volume Segmentation using an Abstract Distance Transform</a>. Segmenting noisy CT and MRI data using <acronym title='General Purpose GPU, a common name for programming GPUs for non-graphics purposes'>GPGPU</acronym>.</p>
<p><a href="http://cis565-spring-2012.github.com/lectures/04-11-Mobile-GPUs.pdf">Mobile GPUs</a>. A short, nice overview of the current state of ARM-based GPUs.</p>
<p>I hope you liked this overview &#8211; help the other readers by sharing interesting discoveries from the PDFs in the comments. And if you have done research in this field, please <a href="http://www.streamcomputing.eu/about-us/contact/">contact</a> us.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.streamcomputing.eu/blog/2012-04-16/pdfs-of-monday-16-april/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lots of loops</title>
		<link>http://www.streamcomputing.eu/blog/2012-04-08/lots-of-loops/</link>
		<comments>http://www.streamcomputing.eu/blog/2012-04-08/lots-of-loops/#comments</comments>
		<pubDate>Sun, 08 Apr 2012 19:03:58 +0000</pubDate>
		<dc:creator>Vincent Hindriksen</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[Home]]></category>

		<guid isPermaLink="false">http://www.streamcomputing.eu/?p=2965</guid>
		<description><![CDATA[In &#8220;Separation of compute, control and transfer&#8221; I talked about node-wise programming as a way we should embrace instead of trying to replace it by loops. In this article I get into loops and discuss for a few types how they can be run in a parallel form. Dependency is the big variable in each type: the [...]]]></description>
			<content:encoded><![CDATA[<span style="float:right; display:inline;"><a href="http://www.streamcomputing.eu/wp-content/plugins/kalins-pdf-creation-station/kalins_pdf_create.php?singlepost=po_2965" target="_blank" ><img src="http://www.streamcomputing.eu/wp-content/uploads/2011/02/pdf_icon-e1297035459917.png" /></a></span><p><img class="alignright size-full wp-image-3247" title="240272" src="http://www.streamcomputing.eu/wp-content/uploads/2012/04/240272.jpg" alt="" width="250" height="214" />In &#8220;<a href="http://www.streamcomputing.eu/blog/2012-03-21/separation-of-compute-control-and-transfer/">Separation of compute, control and transfer</a>&#8221; I talked about node-wise programming as a way we should embrace instead of trying to replace it by loops. In this article I get into loops and discuss for a few types how they can be run in a parallel form. Dependency is the big variable in each type: the lower the dependency on previous iterations, the better it can be parallelised. Another one is whether the iteration-dimensions are known before the loop is started.</p>
<p>The more you think about it, the more you find out that a loop is not a loop.</p>
<h1><span id="more-2965"></span>Loop types</h1>
<p>The following loops are the ones I thought of. I did not find a collection of loop types, so the list might be incomplete. You&#8217;ll find in many places that &#8220;some iterative problems let them better be described in while-loops, while others in for-loops&#8221;, but not an exact definition of the various iterative problems. Please <a href="http://www.streamcomputing.eu/about-us/contact/">let me know</a> if you know research-articles or books which describe this problem-field in detail.</p>
<p>Notation-style is from Matlab/Octave.</p>
<h2>Independent, fixed dimension</h2>
<p>This code is very easily run in parallel.</p>
<blockquote>
<pre>for i=1:n
    B[i]=A[i]*2;
end;</pre>
</blockquote>
<p>The only problems that need to be solved are that each processor gets enough work and that memory is arranged well. These are the problems tackled in all <acronym title='The processor on the videocard'>GPU</acronym>-programming guides. Any problem that can be described in this form is a winner. For large enough n, this is <em>the</em> example of code that can be sped up the easiest.</p>
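<p>A minimal sketch in Python of how such a loop can be cut into chunks, assuming a thread pool as a stand-in for the processors (the names <code>double_chunk</code> and <code>parallel_double</code> are made up for this illustration). It only shows the structure; on a real GPU the chunking and the memory layout are what the programming guides are about.</p>

```python
from concurrent.futures import ThreadPoolExecutor

def double_chunk(chunk):
    # Each element only depends on its own input, so a chunk is self-contained.
    return [a * 2 for a in chunk]

def parallel_double(A, workers=4):
    # Split A into roughly equal chunks so that every worker gets enough to do.
    size = max(1, -(-len(A) // workers))  # ceiling division
    chunks = [A[s:s + size] for s in range(0, len(A), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(double_chunk, chunks)  # results keep the chunk order
        return [b for part in parts for b in part]

B = parallel_double(list(range(10)))
```

<p>Because no iteration needs a result of another iteration, the chunks can be handed out in any order and glued together afterwards.</p>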
<h2>Intermediate answers</h2>
<p>Each repetition the intermediate answer is updated, giving a result at the end. For instance the following loop.</p>
<blockquote>
<pre>k=1;
for i=1:n
    k=k*i;
end;</pre>
</blockquote>
<p>The result is the factorial of n and is not straight-forward to parallelise. One of the ways would be to split the loop into pieces and bring the results together at the end.</p>
<p>Some loops can be replaced by a single mathematical function (without any loops). That is, with hardly any exception, the fastest way to get the answer.</p>
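<p>Splitting the loop in pieces could look like the following Python sketch. It assumes a thread pool as a stand-in for the processors, and the function names are hypothetical. It works because multiplication is associative, so the partial products can be combined in any grouping.</p>

```python
from concurrent.futures import ThreadPoolExecutor
from math import prod

def partial_product(bounds):
    lo, hi = bounds
    # Product of lo..hi inclusive; an empty range gives 1.
    return prod(range(lo, hi + 1))

def chunked_product(n, pieces=4):
    # Cut 1..n into consecutive pieces of (almost) equal length.
    step = max(1, -(-n // pieces))  # ceiling division
    bounds = [(lo, min(lo + step - 1, n)) for lo in range(1, n + 1, step)]
    with ThreadPoolExecutor(max_workers=pieces) as pool:
        partials = list(pool.map(partial_product, bounds))
    # Bring the partial results together at the end.
    return prod(partials)
```

<p>Each piece still has its own intermediate answer, but the pieces no longer depend on each other.</p>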
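<p>As an illustration in Python, with a running sum instead of the product (chosen here only because its closed form is well known):</p>

```python
def sum_by_loop(n):
    # Intermediate answer k is updated in every repetition.
    k = 0
    for i in range(1, n + 1):
        k = k + i
    return k

def sum_closed_form(n):
    # Gauss: 1 + 2 + ... + n = n * (n + 1) / 2, no repetition needed.
    return n * (n + 1) // 2
```

<p>Both functions give the same answer, but the second one does a fixed amount of work regardless of n.</p>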
<h2>Iterative search</h2>
<p>Finding the answer, by checking the intermediate-result each iteration. These are loops with intermediate answers, but without the fixed dimensions.</p>
<blockquote>
<pre>i=0;
result=0;
while result&lt;10
    i++;
    result=A[i];
end;</pre>
</blockquote>
<p>If the programmer wanted the first element in A which is not smaller than 10, then the parallel implementation would be different than if any such element had to be found. To speed things up, decisions are made in such serial code which we would have made differently, had we assumed more cores in the processor. But say the first element needs to be found: striping-like techniques would best be used on the <acronym title='The processor on the videocard'>GPU</acronym>, such that each of 100 cores starts at its own element x and then steps to x+100, x+200 and so on. So this type of loop can be very dependent on the interpretation of the actual goal.</p>
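<p>The striping idea can be sketched in Python as follows. It assumes the goal is the first element that ends the while-loop above (the first one that is at least 10), uses 0-based indices, and uses a thread pool with a handful of workers in place of the GPU cores; all names are hypothetical.</p>

```python
from concurrent.futures import ThreadPoolExecutor

def first_hit_in_stripe(A, start, stride, threshold):
    # Walk start, start+stride, start+2*stride, ... and stop at the first hit.
    for i in range(start, len(A), stride):
        if A[i] >= threshold:
            return i
    return None

def striped_first_index(A, threshold=10, workers=4):
    def search(start):
        return first_hit_in_stripe(A, start, workers, threshold)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = list(pool.map(search, range(workers)))
    found = [i for i in hits if i is not None]
    # Every worker reports the first hit in its own stripe; the overall first is the minimum.
    return min(found) if found else None
```

<p>If any hit would do, the workers could stop as soon as one of them reports a result, which is a different (and cheaper) program.</p>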
<h2>Recursive</h2>
<p>Recursive functions are not really loops, but I do want to mention them. Recursive functions are very close to the human way of thinking about more complex problems. They are very serial.</p>
<blockquote>
<pre>function r = recur (a, b)
    if a&gt;0
        r = recur (a-1, b*2+a);
    else
        r = b;
    end;
end;</pre>
</blockquote>
<p>These functions can be transformed to a simple function (mathematical software is a great help here), or <a href="http://stackoverflow.com/questions/4956336/turning-a-recursive-function-into-a-for-loop" target="_blank">unrolled</a> to stack-based loops (see below) or loops with intermediate answers. As a recursive function they cannot be run in parallel.</p>
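<p>A sketch in Python of such a transformation for the function above, under the assumption that the recursion returns b once a reaches 0 (the original leaves the return value open):</p>

```python
def recur(a, b):
    # Recursive form: every call waits for the call below it.
    if a > 0:
        return recur(a - 1, b * 2 + a)
    return b

def recur_as_loop(a, b):
    # The same computation as a loop with an intermediate answer b.
    while a > 0:
        b = b * 2 + a
        a = a - 1
    return b
```

<p>The loop form is still serial, but it now belongs to the &#8220;intermediate answers&#8221; type and can be analysed as such.</p>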
<h2>Stack-based and backtracking</h2>
<p>A stack-based loop needs more intermediate answers and keeps them on a stack. An example I like is solving a maze: the moment a dead end is found, the current trace is marked dead and the loop continues a step back. Another one is parsing text.</p>
<p>A parallel function is made by finding split-points in the search-area such that each branch has the same weight. These loops (most times) don&#8217;t have fixed dimensions, so weighing the running-time of the branches can be tricky.</p>
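<p>The maze example could be sketched in Python like this. The grid format (a list of strings with <code>#</code> for walls) and the function name are assumptions made for the illustration, and the sketch is the serial version only.</p>

```python
def solve_maze(grid, start, goal):
    # grid: list of equally long strings, '#' is a wall, anything else is open.
    stack = [start]      # the current trace from the start to where we are now
    visited = {start}    # cells that are on the trace or already marked dead
    while stack:
        r, c = stack[-1]
        if (r, c) == goal:
            return list(stack)
        for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            nr, nc = r + dr, c + dc
            inside = nr in range(len(grid)) and nc in range(len(grid[0]))
            if inside and grid[nr][nc] != '#' and (nr, nc) not in visited:
                visited.add((nr, nc))
                stack.append((nr, nc))
                break
        else:
            # Dead end: nothing new to try from here, so continue a step back.
            stack.pop()
    return None
```

<p>A split-point for a parallel version would be a cell with more than one open neighbour: each neighbour could become the start of its own search, but how much work hides behind each of them is not known in advance.</p>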
<h2>Controlled</h2>
<p>Within the loop there are control-statements (like &#8220;if&#8230;then&#8221;) or statements that alter the variables of the loop. Also while-loops are controlled code, as the result is checked each iteration. See also the iterative search.</p>
<blockquote>
<pre>k=3;
for i=1:n
    if mod(k,29)==0
        k=3;
        i=i-1;
    else
        k=(k-1)*k;
    end;
    B[i]=A[i]*k;
end;</pre>
</blockquote>
<p>Altering the loop while running is quite hard to interpret by compilers (both human and machine). So first, clean up the code by finding out how many times the code needs to be repeated. Second decide if it is</p>
<p>(<em>Note that control-statements are expensive on GPUs (especially AMD&#8217;s) and should be avoided, if possible &#8211; in this case very doable as all possible values of k are known and very limited</em>).</p>
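<p>That the values of k are known and limited can be checked with a small Python sketch. It only follows the updates of k and deliberately leaves out what <code>i=i-1</code> does to the loop counter, as that depends on the language; the names are hypothetical.</p>

```python
def k_after_iterations(count):
    # Literal simulation of the k-update in the loop above.
    k = 3
    values = []
    for _ in range(count):
        if k % 29 == 0:
            k = 3
        else:
            k = (k - 1) * k
        values.append(k)
    return values

# k only ever cycles through these four values (870 = 29 * 30 triggers the reset).
K_CYCLE = (6, 30, 870, 3)

def k_lookup(t):
    # Value of k after the t-th pass through the loop body (t starts at 0), without a branch.
    return K_CYCLE[t % 4]
```

<p>With such a table the value of k follows directly from the pass number, so the if-statement is no longer needed to compute it.</p>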
<h1>Final words</h1>
<p>But if you work with 3D-coordinates, it seems no problem to solve many types of puzzles. And what about the fact that we can get all multiples of 5 up to 100 by adding 5 iteratively, but also by multiplying [1..20] by 5? All can be described in loops, while all are very different problems.</p>
<p>Instead of a double loop, why not the following to stay close to node-wise programming?</p>
<blockquote>
<pre>rangerun i=0:0.001:pi, j=pi:0.001:2*pi
    result[i,j]=sin(i)*cos(j);
end;</pre>
</blockquote>
<p>Maybe new keywords will not be the best solution &#8211; IDEs could have a wizard to transform loops into range-runs, decreasing guess-work by the compiler. I strongly think we should not mix these no-dependency loops with high-dependency loops, and should try to make the programmer aware of what he or she is doing: iterating, building up to an answer, or running a range of data-computations.</p>
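<p>Without a new keyword, the range-run idea can be approximated today. The following Python sketch is one possible reading of it, assuming a helper <code>frange</code> for the Matlab-style ranges, a thread pool as the executor and a coarser step than in the example above; none of these names come from an existing library.</p>

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from math import sin, cos, pi

def frange(start, stop, step):
    # Matlab-like start:step:stop, built from an integer count to avoid drifting sums.
    count = int((stop - start) / step + 1e-9) + 1
    return [start + n * step for n in range(count)]

def node(point):
    i, j = point
    # The computation for one coordinate; it does not depend on any other coordinate.
    return sin(i) * cos(j)

def range_run(xs, ys, workers=4):
    points = list(product(xs, ys))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(node, points))
    return dict(zip(points, values))

result = range_run(frange(0, pi, 0.1), frange(pi, 2 * pi, 0.1))
```

<p>The ranges and the per-coordinate computation are stated separately, so nothing has to be guessed about dependencies between iterations.</p>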
]]></content:encoded>
			<wfw:commentRss>http://www.streamcomputing.eu/blog/2012-04-08/lots-of-loops/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Supporting OpenCL on your own hardware</title>
		<link>http://www.streamcomputing.eu/blog/2012-04-02/supporting-opencl-on-your-own-hardware/</link>
		<comments>http://www.streamcomputing.eu/blog/2012-04-02/supporting-opencl-on-your-own-hardware/#comments</comments>
		<pubDate>Mon, 02 Apr 2012 13:49:27 +0000</pubDate>
		<dc:creator>Vincent Hindriksen</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[Home]]></category>

		<guid isPermaLink="false">http://www.streamcomputing.eu/?p=3158</guid>
		<description><![CDATA[Say you have a device which is extremely good in numerical trigonometry (including integrals, transformations, etc to support mainly Fourier transforms) by using massive parallelism. You also have an optimised library which takes care of the transfer to the device and the handling of trigonometric math. Then you find out that the strength of your company is [...]]]></description>
			<content:encoded><![CDATA[<span style="float:right; display:inline;"><a href="http://www.streamcomputing.eu/wp-content/plugins/kalins-pdf-creation-station/kalins_pdf_create.php?singlepost=po_3158" target="_blank" ><img src="http://www.streamcomputing.eu/wp-content/uploads/2011/02/pdf_icon-e1297035459917.png" /></a></span><p><img class="alignright size-medium wp-image-3161" title="Selectie_002" src="http://www.streamcomputing.eu/wp-content/uploads/2012/04/Selectie_002-300x236.png" alt="" width="300" height="236" />Say you have a device which is extremely good in numerical trigonometry (including integrals, transformations, etc to support mainly Fourier transforms) by using massive parallelism. You also have an optimised library which takes care of the transfer to the device and the handling of trigonometric math.</p>
<p>Then you find out that the strength of your company is not the device alone, but also the powerful and easy-to-use library. You also find out that companies are willing to pay for the library, if it would work with other devices too. From your own helpdesk you hear that most questions are about extending the library with specialised functions. Given this information, you <a href="http://www.businessmodelgeneration.com/" target="_blank">define</a> new customer groups for device-only and library-only &#8211; so just by adopting a standard you can increase revenue. Read below which steps you have to take to adopt <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym>.</p>
<h1><span id="more-3158"></span>OpenCL as intermediate language</h1>
<p><acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> could function very well as an intermediate language. It is an open standard (no royalties need to be paid and specifications are freely available), so ownership of the software is not given away while it is plugged in a larger ecosphere. The <a href="http://www.khronos.org/opencl/" target="_blank">specifications</a> need to be implemented including compiler &#8211; this is quite some work, as you can understand. By aiming at the same audience in version 1 (less support for the non-trigonometric functions), you can separate the library and the device with minimal overhead. This can be done without telling anybody, as it is an open standard. But mentioning you use <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> is not allowed, yet.</p>
<h1>Claiming the library supports OpenCL</h1>
<p>The moment you want to sell your library as a separate product, claim it supports <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> and use the logo on your product, you need to sign a <a href="https://developer.apple.com/softwarelicensing/agreements/opencl.html" target="_blank">form</a> and send it to Apple. That is all. You yourself are responsible for making sure it actually supports other devices &#8211; not supporting devices well, or not being explicit about which devices have been tested, will not do your name any good. This includes testing a lot of devices (and optimising the kernels for them) &#8211; I see that most target the well-known ones (recent Intel/AMD CPUs and NVidia/AMD GPUs of 2010 and later).</p>
<h1>Claiming the device supports OpenCL</h1>
<p>Claiming the device supports <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> requires a few more steps, the so-called <a href="http://www.khronos.org/conformance/" target="_blank">Khronos Conformance Process</a>. As you also gain access to any available sample implementations by becoming a member, you might want to do this before implementing the specifications.</p>
<p>First you need to contact Khronos, sign an agreement and pay a fee to enter the process. This is US$ 15 000 for unlimited products per version &#8211; so if <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> 1.3 or 2.0 comes out, you need to pay the fee again, regardless of whether you have one or ten devices and of the number of tests you do (so in other words you can submit unlimited conformance results for that version). Conformance fees for an API cover any number of product submissions using any version of that API up to the paid API Level (some exceptions can apply). From that moment you are allowed to use the name <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym>, but not the logo for the device. This is handy if you want to tell for marketing purposes that you are working on it. Academic members pay US$ 1500 and Khronos members pay US$ 10 000. You can also become a member of Khronos for US$ 10 000 a year, giving you the right to join a working group and voting rights &#8211; but to be clear, you do NOT need to be a member to adopt <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> (or any other Khronos technology).</p>
<p>Second you need to run a suite of tests provided by Khronos. If your drivers work and you send back the results, you can fully use the logo and trademark (with a small disclaimer). After a peer-review of the results with a positive outcome, you can fully use the <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym>-logo for your device and are listed on the Khronos website.</p>
<h1>Final words</h1>
<p>What I wanted to show here is what the &#8220;open&#8221; in OpenCL actually means, while showing how accessible the Khronos group is to new adopters of its technologies. Simply put: anybody can join, if you follow the guidelines and pay for the costs. So anybody can compete in the field, bringing more specialised and faster products.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.streamcomputing.eu/blog/2012-04-02/supporting-opencl-on-your-own-hardware/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Separation of compute, control and transfer</title>
		<link>http://www.streamcomputing.eu/blog/2012-03-21/separation-of-compute-control-and-transfer/</link>
		<comments>http://www.streamcomputing.eu/blog/2012-03-21/separation-of-compute-control-and-transfer/#comments</comments>
		<pubDate>Wed, 21 Mar 2012 16:06:59 +0000</pubDate>
		<dc:creator>Vincent Hindriksen</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[Home]]></category>

		<guid isPermaLink="false">http://www.streamcomputing.eu/?p=3115</guid>
		<description><![CDATA[A while ago I spoke with Patrick Viry, CEO of Ateji. We shared ideas on GPGPU, OpenCL and programming in general. While talking about the strengths of Ateji PX (a Java-like language for parallel programming), he came with a remark which I found important and interesting: separation of transfer. Separation of focus-areas increases effectiveness but is said to [...]]]></description>
			<content:encoded><![CDATA[<span style="float:right; display:inline;"><a href="http://www.streamcomputing.eu/wp-content/plugins/kalins-pdf-creation-station/kalins_pdf_create.php?singlepost=po_3115" target="_blank" ><img src="http://www.streamcomputing.eu/wp-content/uploads/2011/02/pdf_icon-e1297035459917.png" /></a></span><p><a href="http://www.exploringnature.org/db/detail.php?dbID=26&amp;detID=2300"><img class="alignright size-medium wp-image-3143" title="tree_parts_poster72" src="http://www.streamcomputing.eu/wp-content/uploads/2012/03/tree_parts_poster72-231x300.jpg" alt="" width="231" height="300" /></a>A while ago I spoke with Patrick Viry, CEO of <a href="http://www.ateji.com/" target="_blank">Ateji</a>. We shared ideas on <acronym title='General Purpose GPU, a common name for programming GPUs for non-graphics purposes'>GPGPU</acronym>, <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> and programming in general. While talking about the strengths of Ateji PX (a Java-like language for parallel programming), he came with a remark which I found important and interesting: <em>separation of transfer</em>.</p>
<p>Separation of focus-areas increases effectiveness, but is said to be for experts only. For example, the concept of loops is well-known to programmers, so that seems to be reason enough to make it the starting-position for any goal concerning repetition. Current lower-level <acronym title='General Purpose GPU, a common name for programming GPUs for non-graphics purposes'>GPGPU</acronym> languages are kernel-host languages and describe what has to be done at one coordinate (or group of coordinates) in the data, instead of looping over the whole data. What I see is that this idea is getting abandoned in higher level languages instead of turned into a design pattern. I would like to discuss these separations, to see what higher-level languages on top of these low-level languages could and/or should focus on.</p>
<p>I chose the image of a tree, as you have the food coming through the roots, the transport goes through the trunk and the complex stuff happens in the separate leaves.</p>
<h1><span id="more-3115"></span>Separation of compute</h1>
<p>Whereas in the concept of looping one can choose to loop over (i, j) of the input-data or (k, l) of the result-data, in <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> one is forced to do the computations seen from each (k, l) of the result-data (with some exceptions). An important difference is that you cannot have temporary answers and use them in the next iteration. If iterations are needed, then the data can be computed in several steps.</p>
<p>By forcing the programmer to think this way, separating the single computation from the repetition, the code can be optimised and scaled more easily. And exactly this is what is abandoned in higher level <acronym title='General Purpose GPU, a common name for programming GPUs for non-graphics purposes'>GPGPU</acronym>-languages. You see that the low-level languages are all on the left, and most of the higher-level ones on the right. The exception is ArrayFire, which stays close to the Matlab-concept.</p>
<table>
<tbody>
<tr>
<td align="left"><strong>Node-wise</strong></td>
<td align="left"><strong>Functional</strong></td>
<td align="left"><strong>Iterative &amp; directives</strong></td>
</tr>
<tr>
<td align="left">OpenCL<br />
<acronym title='A programming-language like OpenCL only for NVIDIA&#039;s GPUs'>CUDA</acronym><br />
DirectCompute</td>
<td align="left">ArrayFire</td>
<td align="left">C++ AMP<br />
HMPP<br />
Ateji PX<br />
Aparapi<br />
OpenACC</td>
</tr>
</tbody>
</table>
<p>Node-wise programming (describing the computation for each group of data-elements) has the advantage of scaling, but takes some time to implement for programmers who are used to loops. Functional programming solves a lot, but is not applicable to all kinds of problems &#8211; I do like this approach and it could be a good direction. Unlooping is a very well explored research-area, but it is still not optimal when it comes to scaling &#8211; that&#8217;s why it is still a big research-area.</p>
<p>I think the strength of separating the computation from the rest is undervalued.</p>
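<p>To make the node-wise view concrete, here is a sketch in plain Python (not OpenCL) of a matrix multiplication written from the point of view of one result coordinate (k, l). The names <code>kernel</code> and <code>run_over_result</code> are made up for this illustration; in OpenCL the second function would be replaced by the runtime that launches the kernel for every coordinate.</p>

```python
from itertools import product

def kernel(k, l, A, B):
    # Computation for ONE coordinate (k, l) of the result; no loop over the output here.
    return sum(A[k][m] * B[m][l] for m in range(len(B)))

def run_over_result(A, B):
    rows, cols = len(A), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    # The repetition lives outside the kernel; the order of the coordinates does not matter.
    for k, l in product(range(rows), range(cols)):
        C[k][l] = kernel(k, l, A, B)
    return C
```

<p>Because the kernel never reads another result coordinate, the repetition can be spread over as many processors as are available without changing the kernel.</p>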
<h1>Separation of transfer</h1>
<p>When using GPUs, but also when reading a file from disk, part of the time that the whole operation takes is spent on transferring data. This needs scheduling. Scheduling data-transfer makes up most of the host-code in <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym>.</p>
<table>
<tbody>
<tr>
<td align="left"><strong>In host-code</strong></td>
<td align="left"><strong>Explicit</strong></td>
<td align="left"><strong>By compiler</strong></td>
</tr>
<tr>
<td align="left">OpenCL<br />
<acronym title='A programming-language like OpenCL only for NVIDIA&#039;s GPUs'>CUDA</acronym><br />
DirectCompute</td>
<td align="left">Ateji PX</td>
<td align="left">C++ AMP<br />
HMPP<br />
Aparapi<br />
OpenACC<br />
ArrayFire</td>
</tr>
</tbody>
</table>
<p><a href="http://www.ateji.com/blog/java-on-gpu-with-ateji-px/" target="_blank">Here</a> it is explained how Ateji PX does this explicit transfer-scheduling. The choice for most new higher level languages is to leave it to the compiler to find out, even if explicit transfers could reduce the running time. <em>Please let me know if it can be done with e.g. <a href="http://developer.nvidia.com/content/openacc-directives-gpus" target="_blank">OpenACC</a>&#8217;s async</em>.</p>
<p>While <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> and <acronym title='A programming-language like OpenCL only for NVIDIA&#039;s GPUs'>CUDA</acronym> have room to improve this separation, only Ateji PX did not abandon it. Forcing programmers to define this explicitly increases the overall speed. If the programmer does not think about where things happen, then next year hardware with low transfer-speeds will perform best &#8211; this decreases potential growth of many types of dedicated co-processors such as FPGAs.</p>
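<p>What explicit transfer-scheduling buys can be sketched in Python as a small pipeline: while one chunk is being computed, the next one is already being transferred. The functions <code>transfer</code> and <code>compute</code> are stand-ins invented for this illustration (a list copy and a sum of squares), not calls of any GPU library.</p>

```python
from concurrent.futures import ThreadPoolExecutor

def transfer(chunk):
    # Stand-in for a host-to-device copy; here it just copies the list.
    return list(chunk)

def compute(chunk):
    # Stand-in for the kernel that runs on the transferred data.
    return sum(x * x for x in chunk)

def pipelined(chunks):
    results = []
    if not chunks:
        return results
    with ThreadPoolExecutor(max_workers=1) as copier:
        pending = copier.submit(transfer, chunks[0])
        for nxt in chunks[1:]:
            ready = pending.result()                # wait for the current transfer
            pending = copier.submit(transfer, nxt)  # start the next transfer...
            results.append(compute(ready))          # ...while computing on the current chunk
        results.append(compute(pending.result()))
    return results
```

<p>The schedule is written down by the programmer here; a compiler would have to discover on its own that the transfer of the next chunk does not depend on the current computation.</p>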
<h1>Final words</h1>
<p>Even if the old ways of programming have shown their <a href="http://www.extremetech.com/extreme/120353-the-future-of-cpu-scaling-exploring-options-on-the-cutting-edge" target="_blank">scaling</a> <a href="http://www.extremetech.com/computing/116561-the-death-of-cpu-scaling-from-one-core-to-many-and-why-were-still-stuck" target="_blank">limits</a>, most higher level languages which aim to replace <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> and <acronym title='A programming-language like OpenCL only for NVIDIA&#039;s GPUs'>CUDA</acronym> keep trusting the old paradigms. The good thing that <acronym title='A programming-language like OpenCL only for NVIDIA&#039;s GPUs'>CUDA</acronym> and <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> brought is exactly this separation of computation and transfer, and this could help us all in entering the multi-core era. <acronym title='A programming-language like OpenCL only for NVIDIA&#039;s GPUs'>CUDA</acronym> and <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> could still use a lot of optimisation such as automatic pinning of memory, optimisation of simple kernels without coalescing and manual caching, etc. I invite language and compiler designers to focus on that, instead of making the new way of programming look like what we are already used to.</p>
<p>What do you think? More or less separation to cope with scaling on multi-core processors?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.streamcomputing.eu/blog/2012-03-21/separation-of-compute-control-and-transfer/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Evernote for the tweeted PDFs</title>
		<link>http://www.streamcomputing.eu/blog/2012-03-19/evernote-for-the-tweeted-pdfs/</link>
		<comments>http://www.streamcomputing.eu/blog/2012-03-19/evernote-for-the-tweeted-pdfs/#comments</comments>
		<pubDate>Mon, 19 Mar 2012 21:38:06 +0000</pubDate>
		<dc:creator>Vincent Hindriksen</dc:creator>
				<category><![CDATA[Home]]></category>
		<category><![CDATA[Our company]]></category>
		<category><![CDATA[PDF-Monday]]></category>

		<guid isPermaLink="false">http://www.streamcomputing.eu/?p=3124</guid>
		<description><![CDATA[For quite some time StreamComputing has shared many research-papers on GPGPU, OpenCL, CUDA, WebCL and more via twitter. This activity has built up a priceless collection of PDFs, which actually got less easy to structure as the collection grew. Recently Evernote came to the rescue and slowly all these PDFs get archived in a way [...]]]></description>
			<content:encoded><![CDATA[<span style="float:right; display:inline;"><a href="http://www.streamcomputing.eu/wp-content/plugins/kalins-pdf-creation-station/kalins_pdf_create.php?singlepost=po_3124" target="_blank" ><img src="http://www.streamcomputing.eu/wp-content/uploads/2011/02/pdf_icon-e1297035459917.png" /></a></span><p><img class="alignright size-full wp-image-3126" title="evernote-logo" src="http://www.streamcomputing.eu/wp-content/uploads/2012/03/evernote-logo.jpg" alt="" width="256" height="256" />For quite some time StreamComputing has shared many research-papers on <acronym title='General Purpose GPU, a common name for programming GPUs for non-graphics purposes'>GPGPU</acronym>, <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym>, <acronym title='A programming-language like OpenCL only for NVIDIA&#039;s GPUs'>CUDA</acronym>, WebCL and more via twitter. This activity has built up a priceless collection of PDFs, which actually got less easy to structure as the collection grew. Recently <a href="http://www.evernote.com/" target="_blank">Evernote</a> came to the rescue and slowly all these PDFs get archived in a way that lets me find the papers concerning my current tasks again.</p>
<p>As I noticed quite some people read (or at least favorite, share or download) the tweets/PDFs, I was always looking for ways to be able to share this growing library in a way it did not fall out of the short-term memory of Twitter. Evernote seems to be the best solution for this goal. The idea is that the notebook containing all <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> papers will be <a href="https://support.evernote.com/link/portal/16051/16058/Article/629/An-Introduction-to-Sharing" target="_blank">shared</a> read-only with the subscribers. This means that you can search through related papers easily and have a short description of what each one is about without putting time in archiving the PDF. Comments (by me and by people e-mailing me) will help you to determine if you should read it. Tags such as &#8220;paper&#8221;, &#8220;opinion&#8221;, &#8220;poster&#8221;, &#8220;thesis&#8221;, &#8220;fpga&#8221;, &#8220;cuda&#8221;, &#8220;webcl&#8221;, &#8220;memory&#8221;, give you easy ways to filter.</p>
<p>The idea is that in the near future this shared notebook (or a variation on it) will be a paid service (academic and other discounts available). To test its usability for this goal, I seek people who want to give feedback (read: fill in short questionnaires or send spontaneous mails with ideas). Interested? Just fill in the <a href="http://www.streamcomputing.eu/about-us/contact/" target="_blank">contact-form</a>, mentioning &#8220;evernote&#8221;. Testers will get the service for free for at least 6 months. The results of the test I will share in this post.</p>
<p>I will continue posting all PDFs on <a href="http://twitter.com/StreamComputing" target="_blank">twitter</a>, so there is no pay-wall but a sort of freemium model. The reason to ask money for the service is that the hours spent on information-sharing have increased over the past year &#8211; the choice is between reducing the time spent on sharing and offering services like this. If you think I should offer this service for free anyway, or already have other feedback, please let me know in a comment or via the <a href="http://www.streamcomputing.eu/about-us/contact/" target="_blank">contact-form</a>.</p>
<p><em>Edit: at stage 2, the PDFs will be shared via Dropbox or a similar service. The problem is that the shared folder would be writable for each member and larger than 2GB, so this is only possible with paid accounts. Comment if you have ideas.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.streamcomputing.eu/blog/2012-03-19/evernote-for-the-tweeted-pdfs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>StreamComputing flirts with ARM</title>
		<link>http://www.streamcomputing.eu/blog/2012-03-08/streamcomputing-flirts-with-arm/</link>
		<comments>http://www.streamcomputing.eu/blog/2012-03-08/streamcomputing-flirts-with-arm/#comments</comments>
		<pubDate>Thu, 08 Mar 2012 12:26:00 +0000</pubDate>
		<dc:creator>Vincent Hindriksen</dc:creator>
				<category><![CDATA[Home]]></category>
		<category><![CDATA[ARM]]></category>
		<category><![CDATA[Our company]]></category>

		<guid isPermaLink="false">http://www.streamcomputing.eu/?p=3091</guid>
		<description><![CDATA[ With the launch of twitter-channel @OpenCLonARM we now officially show a strong interest in ARM for compute. And we are not the only ones, as the twitter already has 80 followers (60 in 1.5 day and 12 retweets of the welcome-message). ARM has made tremendous progress in both technology and market-share. With ARM-64, companies like [...]]]></description>
			<content:encoded><![CDATA[<span style="float:right; display:inline;"><a href="http://www.streamcomputing.eu/wp-content/plugins/kalins-pdf-creation-station/kalins_pdf_create.php?singlepost=po_3091" target="_blank" ><img src="http://www.streamcomputing.eu/wp-content/uploads/2011/02/pdf_icon-e1297035459917.png" /></a></span><p><img class="alignright size-medium wp-image-3089" title="OpenCLonARM" src="http://www.streamcomputing.eu/wp-content/uploads/2012/03/OpenCLonARM-300x271.png" alt="" width="300" height="271" /> With the launch of the Twitter channel <a href="https://twitter.com/OpenCLonARM" target="_blank">@OpenCLonARM</a> we now officially show a strong interest in ARM for compute. And we are not the only ones, as the channel already has <span id="followers_openclonarm">80</span> followers (60 in 1.5 days, and 12 retweets of the welcome message).</p>
<p>ARM has made tremendous progress in both technology and market share. With ARM-64 and companies like NVidia (and maybe AMD) in the field, X86 seems to be getting a real competitor. This could happen because for a few years now computers have been fast enough: they are no longer replaced by a faster one, but by a smaller one (tablet, phone), or an extra one is added. By the rules of the market, current technologies are replaced by the ones that serve those other needs. ARM is fast (enough), flexible in design, very cheap, low-power and passively cooled. The biggest remaining obstacle seems to be the lack of a standard for a docking station to connect your mobile, tablet or watch to a keyboard, mouse and large screen.</p>
<p><acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> is perfect for ARM, as it brings compute power to the intensive computations that are not already covered by hardware support. In the world of X86 this mainly interests high-performance and big-data companies, whereas on ARM it interests a wider group. Without <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> you can already watch HD video; with <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> you can also encode video to MP4. This year you will certainly hear more about new possibilities of <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> on ARM.</p>
<p>What do you think? Why does Intel not sell IP to ARM companies, given that many technologies could be reused? Could Intel be the next ARM as an IP-seller, or will they stay the defender of X86 for many years to come?</p>
<p><script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js"></script><em>StreamComputing.eu is not affiliated with ARM.</em><br />
<script type="text/javascript">// < ![CDATA[
$(function() {
  $.ajax({
    url: 'http://api.twitter.com/1/users/show.json',
    data: {screen_name: 'openclonarm'},
    dataType: 'jsonp',
    success: function(data) {
      $('#followers_openclonarm').html(data.followers_count);
    }
  });
});
// ]]&gt;</script></p>
]]></content:encoded>
			<wfw:commentRss>http://www.streamcomputing.eu/blog/2012-03-08/streamcomputing-flirts-with-arm/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>AccelerEyes ArrayFire</title>
		<link>http://www.streamcomputing.eu/blog/2012-03-02/accelereyes-arrayfire/</link>
		<comments>http://www.streamcomputing.eu/blog/2012-03-02/accelereyes-arrayfire/#comments</comments>
		<pubDate>Fri, 02 Mar 2012 13:56:29 +0000</pubDate>
		<dc:creator>Vincent Hindriksen</dc:creator>
				<category><![CDATA[Home]]></category>

		<guid isPermaLink="false">http://www.streamcomputing.eu/?p=2747</guid>
		<description><![CDATA[There is a lot going on at the path to GPGPU 2.0 &#8211; the libraries on top of OpenCL and/or CUDA. Among many solutions we see for example Microsoft with C++ AMP on top of DirectCompute, NVidia (and more) with OpenACC, and now AccelerEyes (most known for their Matlab-extension Jacket and libJacket) with ArrayFire. I want you to [...]]]></description>
			<content:encoded><![CDATA[<span style="float:right; display:inline;"><a href="http://www.streamcomputing.eu/wp-content/plugins/kalins-pdf-creation-station/kalins_pdf_create.php?singlepost=po_2747" target="_blank" ><img src="http://www.streamcomputing.eu/wp-content/uploads/2011/02/pdf_icon-e1297035459917.png" /></a></span><p><em><img class="alignright size-medium wp-image-3079" title="product_arrayfire" src="http://www.streamcomputing.eu/wp-content/uploads/2012/03/product_arrayfire-259x300.png" alt="" width="259" height="300" /></em>A lot is happening on the path to <em>GPGPU 2.0</em> &#8211; the libraries on top of <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> and/or <acronym title='A programming-language like OpenCL only for NVIDIA&#039;s GPUs'>CUDA</acronym>. Among the many solutions we see for example Microsoft with C++ AMP on top of DirectCompute, NVidia (and more) with OpenACC, and now <a href="http://www.accelereyes.com/" target="_blank">AccelerEyes</a> (best known for their Matlab extension Jacket and libJacket) with <a href="http://www.accelereyes.com/products/arrayfire" target="_blank">ArrayFire</a>.</p>
<p>I want to show you how easy programming GPUs can be when using such libraries &#8211; be aware that to use all features, such as complex numbers, multi-GPU and linear algebra functions, you need to buy the full version. Prices start at <a href="http://www.accelereyes.com/products/arrayfire_licensing">$2500,-</a> for a workstation/server with 2 GPUs.</p>
<p>It comes in two flavours: for <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> (C++) and for <acronym title='A programming-language like OpenCL only for NVIDIA&#039;s GPUs'>CUDA</acronym> (C, C++, Fortran). The code for both is the same, so you can easily switch &#8211; though you still see references to cuda.h you can compile most examples from the <acronym title='A programming-language like OpenCL only for NVIDIA&#039;s GPUs'>CUDA</acronym>-version using the <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym>-version with little editing. Let&#8217;s look a little into what it can do.</p>
<h1><span id="more-2747"></span>Getting started</h1>
<p><em>Note. If you use ArrayFire on Linux-64 with AMD, be sure you have at least AMD APP 2.5. Older drivers lock up your computer due to a bug in the AMD driver.</em></p>
<p>Be sure you have <acronym title='A programming-language like OpenCL only for NVIDIA&#039;s GPUs'>CUDA</acronym> 4.0 or 4.1 installed for your NVidia <acronym title='The processor on the videocard'>GPU</acronym>, and <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> 1.1 installed for your Intel and AMD devices. Check <a href="http://www.streamcomputing.eu/blog/2011-12-29/opencl-hardware-support/">here</a> to see how to get this going.</p>
<p>You can download ArrayFire from <a href="http://www.accelereyes.com/download_arrayfire">http://www.accelereyes.com/download_arrayfire</a> after registering. You get both the 64-bit and 32-bit libraries in one package, and it installed without any problems. You can start right away by going to bin32 or bin64, but if you want to recompile the examples, go to the examples directory and run &#8220;make clean &amp;&amp; make&#8221;.</p>
<p>The compile-line gives an idea what it uses:</p>
<blockquote>
<pre>g++ -I../include -L../lib64 -Wl,-rpath=../lib64 -l<strong>afcl</strong> -L<strong>libOpenCL</strong>.so.1 \
-Llibcl<strong>AmdBlas</strong>.so.1 -Lcl<strong>AmdFft</strong>.Runtime.so.1.4.82 \
../examples/helloworld.cpp -o ../bin64/helloworld</pre>
</blockquote>
<p>You see it bundles the BLAS-library and FFT-library from AMD.</p>
<p>Let&#8217;s test if it installed correctly, by running &#8220;hello-world&#8221;. Results on my current PC:</p>
<blockquote>
<pre>Arrayfire (<acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> alpha)</pre>
<pre>Device0: Barts (in use)
Device1: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Device2: GeForce GTX 560</pre>
</blockquote>
<p>The remark &#8220;<em>(in use)</em>&#8221; marks the device currently selected by ArrayFire; it does not mean that the device is in use by another process. If you don&#8217;t get any devices, then you have not correctly installed the <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> drivers for your hardware.</p>
<h1>Example: BlackScholes</h1>
<p>BlackScholes is a very popular demo application for <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> and related techniques, as it is slow as a single-threaded CPU version and very fast on <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym>-capable devices. I wrote about it <a href="http://www.streamcomputing.eu/blog/2011-11-29/black-scholes-mixing-on-sandybridge-radeon-and-geforce/">before</a>, in case you want to check what it looks like in <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym>.</p>
<p>Benchmarks for Input Data Size = 184000 x 1:</p>
<ul>
<li>AMD: 1.939260 s</li>
<li>Intel: 0.471669 s (around 0.1686 s in following runs)</li>
<li>NVidia &#8211; OpenCL: <em>TBD</em></li>
<li>NVidia &#8211; CUDA: <em>TBD</em></li>
</ul>
<div><em>(Currently busy with other project, so temporarily no GTX available for this test)</em></div>
<p>You can see the full code in the SDK, so I only focus on a few commands. All is in the namespace &#8220;af&#8221;, but I leave out &#8220;af::&#8221; before the commands. Under every command I describe what it does.</p>
<blockquote>
<pre>device(1);</pre>
</blockquote>
<p>This selects the second device (on this machine the Intel). A nice touch is that the rest of the initialisation is done when needed, so there is no real need to be very precise with it. Skipping this line selects device 0.</p>
<blockquote>
<pre>int N = 6;
float C1[] = {5.0f, 6.0f, 7.0f, 8.0f, 1.0f, 10.0f}; //different in code, 1000 times as long
array array1 = array(C1, N, 1);</pre>
</blockquote>
<p>This is all that is needed to produce an array of dimensions 6 x 1. If you had used N/2 and 2 as the last two parameters, it would have given an array of 3 x 2 using the same data from C1.</p>
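<p>To make the 6 x 1 versus 3 x 2 remark concrete, here is a small plain C++ sketch of how the same flat buffer can be read with different dimensions. It assumes column-major storage (elements run down a column first), which is my understanding of how ArrayFire lays out its arrays. The helper <em>element_at</em> is hypothetical and not part of ArrayFire.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of ArrayFire: read element (row, col) from a flat
// buffer that is interpreted as a rows x cols matrix.
// Assumption: column-major order, so consecutive elements run down a column first.
float element_at(const std::vector<float>& data,
                 std::size_t rows, std::size_t cols,
                 std::size_t row, std::size_t col) {
    assert(data.size() == rows * cols);
    assert(row < rows && col < cols);
    return data[col * rows + row];
}
```

<p>With the six values of C1 from above, reading the buffer as 3 x 2 puts 5, 6, 7 in the first column and 8, 1, 10 in the second. No data is copied or moved; only the index calculation changes.</p>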
<blockquote>
<pre>for (int i = 0; i &lt; iter; i++) {
   black_scholes(Cg, Pg, Sg, Xg, Rg, Vg, Tg);
}</pre>
</blockquote>
<p>This calls the &#8220;kernel&#8221; several times with arrays prepared as in the previous step. Below is the kernel.</p>
<blockquote>
<pre>void black_scholes(array&amp; C, array&amp; P, array&amp; S, array&amp; X, array&amp; R, array&amp; V, const array&amp; T) {
   array d1_ = log(S / X);
   d1_ = d1_ + (R + (V * V) * 0.5) * T;
   array d1 = d1_ / (V * sqrt(T));
   array d2 = d1 - (V * sqrt(T));
   C = S * cnd(d1) - (X * exp((-R) * T) * cnd(d2));
   P = X * exp((-R) * T) * cnd(-d2) - (S * cnd(-d1));
}</pre>
</blockquote>
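<p>For comparison, the same formulas can be written for a single option in plain C++ on the CPU. This is only a reference sketch to check results against, not code from the ArrayFire SDK. The names <em>black_scholes_scalar</em> and <em>OptionPrices</em> are my own, and I assume <em>cnd</em> is the cumulative standard normal distribution, here computed with <em>erfc</em> from the standard math library.</p>

```cpp
#include <cassert>
#include <cmath>

// Cumulative distribution function of the standard normal distribution,
// expressed through the complementary error function from <cmath>.
double cnd(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Hypothetical result type for this sketch.
struct OptionPrices {
    double call;
    double put;
};

// Scalar version of the kernel for a single option.
// S: spot price, X: strike, R: risk-free rate, V: volatility, T: time to expiry in years.
OptionPrices black_scholes_scalar(double S, double X, double R, double V, double T) {
    assert(S > 0.0 && X > 0.0 && V > 0.0 && T > 0.0);
    const double sqrtT = std::sqrt(T);
    const double d1 = (std::log(S / X) + (R + 0.5 * V * V) * T) / (V * sqrtT);
    const double d2 = d1 - V * sqrtT;
    const double discountedStrike = X * std::exp(-R * T);
    OptionPrices out;
    out.call = S * cnd(d1) - discountedStrike * cnd(d2);
    out.put = discountedStrike * cnd(-d2) - S * cnd(-d1);
    return out;
}
```

<p>For S = 100, X = 100, R = 0.05, V = 0.2 and T = 1 this gives a call price of about 10.45 and a put price of about 5.57. A handy sanity check for any implementation is put-call parity: the call minus the put should equal S minus the discounted strike.</p>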
<p>Alternatively, the code can be written inline, without a separate function:</p>
<blockquote>
<pre>for (int i = 0; i &lt; iter; i++) {
  array d1_ = log(Sg / Xg);
  d1_ = d1_ + (Rg + (Vg * Vg) * 0.5) * Tg;
  array d1 = d1_ / (Vg * sqrt(Tg));
  array d2 = d1 - (Vg * sqrt(Tg));
  Cg = Sg * cnd(d1) - (Xg * exp((-Rg) * Tg) * cnd(d2));
  Pg = Xg * exp((-Rg) * Tg) * cnd(-d2) - (Sg * cnd(-d1));
}</pre>
</blockquote>
<p>The nice thing about ArrayFire is that the operators on arrays are overloaded such that computations are off-loaded to the selected compute device. So &#8220;<em>log(Sg / Xg)</em>&#8221; is translated into optimised <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym>-code and run on <acronym title='The processor on the videocard'>GPU</acronym> or AVX/SSE. In this case each element of array Sg is divided by the element at the same location in array Xg. Log is then called on each element of the resulting array.</p>
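<p>The idea of overloading can be illustrated with a few lines of plain C++. The sketch below is a hypothetical stand-in that runs on the CPU; the type <em>Vec</em> is my own and this is not how ArrayFire is implemented internally. It only shows what &#8220;element-wise&#8221; means for the division and for log.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an array type; it runs on the CPU
// and is not ArrayFire's implementation.
struct Vec {
    std::vector<float> values;
};

// Element-wise division: out[i] = a[i] / b[i].
Vec operator/(const Vec& a, const Vec& b) {
    assert(a.values.size() == b.values.size());
    Vec out;
    out.values.reserve(a.values.size());
    for (std::size_t i = 0; i < a.values.size(); ++i) {
        out.values.push_back(a.values[i] / b.values[i]);
    }
    return out;
}

// Element-wise natural logarithm: out[i] = log(a[i]).
Vec log(const Vec& a) {
    Vec out;
    out.values.reserve(a.values.size());
    for (float v : a.values) {
        out.values.push_back(std::log(v));
    }
    return out;
}
```

<p>With these two overloads, an expression such as log(s / x) on two Vec objects compiles and reads exactly like the scalar formula. A library like ArrayFire uses the same syntax, but generates device code instead of running a CPU loop.</p>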
<blockquote>
<pre>af::sync();</pre>
</blockquote>
<p>The sync function forces the compute device to actually carry out the queued work before the program continues.</p>
<p>You see: once you understand the ArrayFire <em>array</em> class, you have everything you need to write an <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> program.</p>
<h1>Advantages &amp; Disadvantages</h1>
<p>I will get more into libraries built on top of <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym>, to explain the common (dis)advantages. Any library has a scope for which it works best and products of AccelerEyes focus much on linear algebra and 2D data (arrays), less on 3D and images.</p>
<p>It is hard to say how much the library squeezes out of the GPU(s) for you. But sometimes it works almost as well as manually optimised code, and in many cases it is comparable, which is great as you get results much faster and can focus on your algorithm instead of coding. As a bonus, the optimisation gets better with each new version.</p>
<p>ArrayFire does not give you the generated kernels, so you depend on their license and libraries. Neither does it provide a fall-back to the CPU. This makes it a good solution for research and in-company deployments, but less for product development.</p>
<p>I am a big fan of splitting device computations from the host code. ArrayFire lets you mix them again, which gives less optimisable code. But others will disagree with me.</p>
<p>Currently, with version 0.3, it is still in alpha, so if you want to use it in production code you need to test it extensively. AccelerEyes has built up its good name over many years and is therefore careful before officially taking the library out of alpha/beta.</p>
<h1>Learning more</h1>
<p>I was happy to discover that ArrayFire is so <a href="http://www.accelereyes.com/arrayfire_opencl/" target="_blank">well</a> <a href="http://www.accelereyes.com/arrayfire_opencl/modules.htm" target="_blank">documented</a> that you can accelerate your algorithm within a day. Whether ArrayFire is the right choice for you depends on your exact situation and goals. The best approach is to give it a try and put your algorithm to the test. If you want to discuss the best option for your software, feel free to <a href="http://www.streamcomputing.eu/about-us/contact/">contact</a> us.</p>
<p>StreamComputing has selected ArrayFire as one of the libraries on top of <acronym title='Where our company is all about. A programming-language for GPUs and other massively parallel processors. OpenCL is a trademark of Apple Inc.'>OpenCL</acronym> with potential and therefore will start offering consulting-services for ArrayFire from April 2012.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.streamcomputing.eu/blog/2012-03-02/accelereyes-arrayfire/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

