<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.2.1" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>Reasonably Logical</title>
	<link>http://www.philipreames.com/Blog</link>
	<description>Reflections on technology, society, and their infinite interactions</description>
	<pubDate>Tue, 28 Feb 2012 20:04:53 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.1</generator>
	<language>en</language>
			<item>
		<title>Type systems: coercion, casts, and conversions</title>
		<link>http://www.philipreames.com/Blog/2012/02/28/type-systems-coercion-casts-and-conversions/</link>
		<comments>http://www.philipreames.com/Blog/2012/02/28/type-systems-coercion-casts-and-conversions/#comments</comments>
		<pubDate>Tue, 28 Feb 2012 20:00:48 +0000</pubDate>
		<dc:creator>reames</dc:creator>
		
		<category><![CDATA[PL Theory/Design]]></category>

		<guid isPermaLink="false">http://www.philipreames.com/Blog/2012/02/28/type-systems-coercion-casts-and-conversions/</guid>
		<description><![CDATA[Programming language designers and users spend a lot of time talking about casts.  The core idea of a cast is to convert between two types - either statically, or dynamically.  Reading through a number of sources recently, I&#8217;ve been noticing that the term &#8220;cast&#8221; is massively overloaded.  This blog post is an [...]]]></description>
			<content:encoded><![CDATA[<p>Programming language designers and users spend a lot of time talking about casts.  The core idea of a cast is to convert between two types - either statically, or dynamically.  Reading through a number of sources recently, I&#8217;ve been noticing that the term &#8220;cast&#8221; is massively overloaded.  This blog post is an attempt to break down the various uses I&#8217;ve seen into their component parts.  </p>
<p>Before jumping into the discussion of type casts, conversions, and coercions, let me remind you that every value has at least two types associated with it.  The dynamic type is the actual type of the value at execution time.  The static type is an approximation of that type available at compile time (a value may have several).  Generally, the static type of a value must be accurate, but need not be precise.  (For example, it&#8217;s perfectly legal and common to refer to a value by a base-type pointer.)</p>
<p>A <strong>type conversion</strong> is a programmatic way to convert a value from one type to another.  Depending on the language involved, this may involve copying the contents, or applying arbitrary user-defined conversion logic.  The core part though is that the <em>value is changing</em>, not merely the type associated with that value.  </p>
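<p>A minimal C++ sketch of a type conversion, assuming two hypothetical helpers (<code>to_int</code> and <code>to_text</code>, which are illustrative names only).  In both cases the result is a new value with a different representation, not the old value viewed through a new type.</p>

```cpp
#include <string>

// Hypothetical helper: an explicit conversion from double to int.
// The result is a new value; the fractional part is discarded
// (truncation toward zero), so the value itself has changed.
int to_int(double d) {
    return static_cast<int>(d);
}

// Hypothetical helper: converting an int to text builds a brand new
// std::string object whose contents are derived from the number.
std::string to_text(int n) {
    return std::to_string(n);
}
```

<p>For instance, <code>to_int(3.9)</code> yields 3 and <code>to_text(42)</code> yields the string "42"; neither result shares a representation with its input.</p>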
<p>A <strong>type cast</strong> is the replacement of one static type with another without changing the actual value.  Generally, the type being cast to is assumed to be an accurate (if potentially less precise) representation of the actual type of the value.  A type cast may be checked or unchecked depending on the semantics of the language.  For <strong>checked casts</strong>, if a type cast fails the program executes defined error behavior such as aborting, or throwing an exception.  For <strong>unchecked casts</strong>, if the type cast fails the program is left in an undefined state and no further guarantees are given.  </p>
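<p>As a rough illustration of a checked cast, here is a small C++ sketch.  The <code>Shape</code>, <code>Circle</code>, and <code>Square</code> types and both helper functions are hypothetical; the point is only that a failed cast has defined behavior (a null pointer or an exception) rather than leaving the program in an undefined state.</p>

```cpp
#include <typeinfo>

// Hypothetical class hierarchy used only for illustration.
struct Shape { virtual ~Shape() = default; };
struct Circle : Shape { double radius = 1.0; };
struct Square : Shape { double side = 2.0; };

// Checked cast through a pointer: on failure the result is nullptr.
bool is_circle(Shape* s) {
    return dynamic_cast<Circle*>(s) != nullptr;
}

// Checked cast through a reference: on failure std::bad_cast is thrown.
double radius_of(Shape& s) {
    return dynamic_cast<Circle&>(s).radius;
}
```

<p>In both helpers the object itself is untouched; only the static type used to refer to it changes, and the runtime verifies that the new static type is accurate.</p>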
<p>A <strong>type coercion</strong> is the forceful reinterpretation of memory as a value of another type.  Arguably, such coercions are also unchecked type casts, but the semantics are slightly different.  A type coercion relies on the structure of the two types and not on their nominal relationship.  To put it another way, a type coercion <em>is expected to violate the type system</em>.  Some languages do provide minimal checking for type coercion, but generally, this is a use at your own (extreme) risk feature.   If a value is coerced to an incompatible type - whatever that might mean - behavior is generally ill-defined.  </p>
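<p>To make the &#8220;reinterpretation of memory&#8221; idea concrete, here is a hedged C++ sketch.  The helper name <code>float_bits</code> is hypothetical, and the code assumes a 32-bit IEEE 754 <code>float</code>.  It copies the bytes with <code>std::memcpy</code> rather than dereferencing a reinterpreted pointer, since the latter is undefined behavior in C++.</p>

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: view the bytes of a float as a 32-bit unsigned
// integer. No numeric conversion happens; the bit pattern is carried
// over unchanged and simply given a different type.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // byte-for-byte copy
    return bits;
}
```

<p>Contrast this with a conversion: converting 1.0f to an integer gives 1, while the coercion above gives the raw encoding of 1.0f (0x3F800000 under the IEEE 754 assumption).</p>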
<p>Each of the above can be either explicit or implicit.  <strong>Explicit (casts, conversions, coercions)</strong> require explicit annotation from the programmer.  They do not happen silently, and must appear directly in code.  <strong>Implicit (casts, conversions, coercions)</strong> are inserted by the compiler based on the language semantics.  C and C++ <a href="http://en.cppreference.com/w/cpp/language/implicit_cast">implicit numeric conversions</a> are a well-known example of the latter.  As a matter of personal opinion, I think implicit (casts, conversions, coercions) are a horrible mistake in any language design.  </p>
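<p>A short C++ sketch of why implicit numeric conversions can surprise people.  The function names are hypothetical; the behavior shown follows from the standard rules for integer division and int-to-double conversion.</p>

```cpp
// Implicit: the division is performed on ints first, and only the
// (already truncated) result is silently converted to double.
double implicit_divide(int a, int b) {
    return a / b;
}

// Explicit: the programmer states where the conversion happens, so the
// division is carried out in floating point. (b is still converted
// implicitly to double by the usual arithmetic conversions.)
double explicit_divide(int a, int b) {
    return static_cast<double>(a) / b;
}
```

<p>With arguments 1 and 2, the first function returns 0.0 and the second returns 0.5, even though both are declared to return a <code>double</code>.</p>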
<p>To put all of this terminology in the context of a language you may know, let&#8217;s consider the <a href="http://www.cplusplus.com/doc/tutorial/typecasting/">&#8220;casts&#8221; available in C++</a>.  <strong>C-style casts</strong> in C++ are generally type coercions, though they may act like type conversions if a cast operator is available.  <strong>static_cast</strong> is a mixture of type casting and conversion; it will only convert between types which are known statically to be compatible or convertible via a user-defined cast operator.  <strong>dynamic_cast</strong> is a checked type cast.  Its semantics are well defined even if the dynamic type of the value doesn&#8217;t match the cast.  <strong>const_cast</strong> is a restricted form of type coercion which only applies to type qualifiers (const, volatile).  <strong>reinterpret_cast</strong> is a full power unchecked type coercion.  </p>
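<p>Two of these are easy to show in a few lines.  The sketch below uses a hypothetical <code>Meters</code> type and hypothetical helper functions: <code>static_cast</code> invoking a user-defined conversion operator, and <code>const_cast</code> stripping a qualifier without touching the value.</p>

```cpp
// Hypothetical wrapper type with a user-defined conversion operator.
struct Meters {
    double value;
    explicit operator double() const { return value; }
};

// static_cast acting as a conversion: it calls Meters::operator double.
double as_double(const Meters& m) {
    return static_cast<double>(m);
}

// const_cast only changes the qualifier on the static type. Writing
// through the result is defined only when the referenced object was
// not declared const in the first place, as assumed here.
void bump(const int& x) {
    const_cast<int&>(x) += 1;
}
```

<p>Calling <code>bump</code> on a genuinely const object would be undefined behavior, which is part of why const_cast is better thought of as a (restricted) coercion than as a safe cast.</p>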
<p>If you&#8217;re interested in type systems, you may find <a href="http://www.philipreames.com/Blog/2011/11/01/understanding-type-systems/">my earlier blog post</a> interesting.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.philipreames.com/Blog/2012/02/28/type-systems-coercion-casts-and-conversions/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Typesetting formal semantics in LaTeX</title>
		<link>http://www.philipreames.com/Blog/2012/02/13/typesetting-formal-semantics-in-latex/</link>
		<comments>http://www.philipreames.com/Blog/2012/02/13/typesetting-formal-semantics-in-latex/#comments</comments>
		<pubDate>Mon, 13 Feb 2012 23:12:57 +0000</pubDate>
		<dc:creator>reames</dc:creator>
		
		<category><![CDATA[PL Theory/Design]]></category>

		<guid isPermaLink="false">http://www.philipreames.com/Blog/2012/02/13/typesetting-formal-semantics-in-latex/</guid>
		<description><![CDATA[For my programming language theory and design course, I&#8217;ve recently been writing up a series of formal proofs in LaTeX for operational and denotational language semantics.  (Axiomatic semantics are up next.)  I can&#8217;t claim to particularly like LaTeX as a tool, but it seems to be significantly better than anything else out there. [...]]]></description>
			<content:encoded><![CDATA[<p>For my <a href="https://sites.google.com/a/cs.berkeley.edu/cs263-sp12/info">programming language theory and design course</a>, I&#8217;ve recently been writing up a series of formal proofs in LaTeX for operational and denotational language semantics.  (Axiomatic semantics are up next.)  I can&#8217;t claim to particularly like LaTeX as a tool, but it seems to be significantly better than anything else out there.  </p>
<p>I couldn&#8217;t find a ready-made LaTeX package, so I figured I&#8217;d share my solution.  It&#8217;s a combination of a few custom macros and a general proof style file.  </p>
<p>First, the macros:<br />
\newcommand{\denote}[1]{\text{$[\![ $#1$ ]\!]$}}<br />
\newcommand{\opsem}[3]{\text{$<$#1,#2$>\Downarrow$#3}}</p>
<p>The first one wraps its single argument in the denotation symbols.  The second takes three arguments and generates the full &lt;op,state&gt;||state expression.  </p>
<p>(In case you haven&#8217;t noticed, I&#8217;m approximating symbols for this post that show up much better in LaTeX!)</p>
<p>For the formal judgments, I used the <a href="http://www.math.ucsd.edu/~sbuss/ResearchWeb/bussproofs/index.html">&#8220;bussproofs&#8221; package</a>.  (\usepackage{bussproofs})  Once you get your head around its post-order handling of arguments, it seems to work pretty well.  One thing to note is that the error recovery for this package is <em>awful</em>.  If you have the slightest mistake, it silently omits elements from the proof.  Make sure you hand check your results!</p>
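<p>As a rough sketch of how the pieces fit together, here is a small complete document that typesets a sequencing rule with the \opsem macro and bussproofs.  The rule itself and the symbols chosen (c_1, c_2, sigma) are only an illustration, and amsmath is assumed to be the source of \text.  Note the post-order style: the premises come first, and the conclusion is given last.</p>

```latex
\documentclass{article}
\usepackage{amsmath}     % assumed here as the provider of \text
\usepackage{bussproofs}

% Same macro as in the post: <#1,#2> \Downarrow #3
\newcommand{\opsem}[3]{\text{$<$#1,#2$>\Downarrow$#3}}

\begin{document}

\begin{prooftree}
  % Premises first ...
  \AxiomC{\opsem{$c_1$}{$\sigma$}{$\sigma'$}}
  \AxiomC{\opsem{$c_2$}{$\sigma'$}{$\sigma''$}}
  % ... then the inference that consumes the two premises above it.
  \BinaryInfC{\opsem{$c_1; c_2$}{$\sigma$}{$\sigma''$}}
\end{prooftree}

\end{document}
```

<p>Since the macro arguments are typeset in text mode, anything mathematical passed to \opsem needs its own dollar signs, as shown.</p>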
<p>Update (Feb 24, 2012): Here are the macros I created for Hoare triples and axiomatic semantics:<br />
\newcommand{\hoarepred}[1]{\text{ $\{$#1$\}$ }}<br />
\newcommand{\hoarerule}[3]{\text{\hoarepred{#1}#2\hoarepred{#3}}}</p>
<p>If you&#8217;re looking for various extra symbols used in formal proofs, you might find <a href="http://www.artofproblemsolving.com/Wiki/index.php/LaTeX:Symbols">this page</a> useful.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.philipreames.com/Blog/2012/02/13/typesetting-formal-semantics-in-latex/feed/</wfw:commentRss>
		</item>
		<item>
		<title>OpenCL Performance Gotcha w/Atomics (in global memory)</title>
		<link>http://www.philipreames.com/Blog/2012/02/09/opencl-performance-gotcha-watomics-in-global-memory/</link>
		<comments>http://www.philipreames.com/Blog/2012/02/09/opencl-performance-gotcha-watomics-in-global-memory/#comments</comments>
		<pubDate>Thu, 09 Feb 2012 05:14:56 +0000</pubDate>
		<dc:creator>reames</dc:creator>
		
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.philipreames.com/Blog/2012/02/09/opencl-performance-gotcha-watomics-in-global-memory/</guid>
		<description><![CDATA[One of the lesser documented facts about the OpenCL global memory atomics on AMD platforms, is they force all memory accesses in the entire program onto the complete (i.e, slow) path.  Having even a single atomic in your program has this effect.  
In our program - which we&#8217;re developing for a research project [...]]]></description>
			<content:encoded><![CDATA[<p>One of the lesser-documented facts about OpenCL global memory atomics on AMD platforms is that they force all memory accesses in the entire program onto the complete (i.e., slow) path.  Having even a single atomic in your program has this effect.  </p>
<p>In our program - which we&#8217;re developing for a research project I hope to discuss here by March - we saw <strong>a 50% plus</strong> performance gain by removing atomic memory ops.  (We were using them to coordinate between workgroups.) </p>
<p>This fact is mentioned in the <a href="http://developer.amd.com/sdks/AMDAPPSDK/assets/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf">AMD OpenCL Programming Guide</a>[1], but is not featured prominently.  There&#8217;s some mention there of being able to mark different memory regions as independent UAVs (which supposedly avoids this), but I haven&#8217;t found information on how to do that just yet.  </p>
<p>There&#8217;s an <a href="http://synergy.cs.vt.edu/pubs/papers/elteir-ieeecluster11-atomic-operations.pdf">interesting paper</a>[2] which includes some fairly decent benchmarking of various cases.  They also provide a software implementation of atomics for memory bound programs.  (Note: Do NOT use this blindly.  Read the discussion about when this is a good idea and when it is not.)  I found the paper to be a bit weak overall, but it does provide useful information if you&#8217;ve run into this particular problem.  </p>
<p>There&#8217;s also an OpenCL <a href="http://www.khronos.org/registry/cl/extensions/ext/cl_ext_atomic_counters_32.txt">extension</a>[3] from AMD for coordinating workgroups on a single device which apparently doesn&#8217;t have this problem.  It can&#8217;t be used for coordinating across devices, but within one device, it may solve the problem.  There are a few catches though; the most obvious one is that - according to the documentation - you can&#8217;t directly read the counter.  </p>
<p>References:</p>
<ol>
<li>AMD. AMD Accelerated Parallel Processing (APP) SDK OpenCL Programming Guide. <a href="http://developer.amd.com/sdks/AMDAPPSDK/assets/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf">http://developer.amd.com/sdks/AMDAPPSDK/assets/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf</a></li>
<li>Elteir, M.; Lin, H.; Feng, W.-C. Performance Characterization and Optimization of Atomic Operations on AMD GPUs. 2011 IEEE International Conference on Cluster Computing (CLUSTER). <a href="http://synergy.cs.vt.edu/pubs/papers/elteir-ieeecluster11-atomic-operations.pdf">http://synergy.cs.vt.edu/pubs/papers/elteir-ieeecluster11-atomic-operations.pdf</a></li>
<li>cl_ext_atomic_counters_32. <a href="http://www.khronos.org/registry/cl/extensions/ext/cl_ext_atomic_counters_32.txt">http://www.khronos.org/registry/cl/extensions/ext/cl_ext_atomic_counters_32.txt</a></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.philipreames.com/Blog/2012/02/09/opencl-performance-gotcha-watomics-in-global-memory/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Coding Best Practice: 5 Whys applied to bug fixing</title>
		<link>http://www.philipreames.com/Blog/2012/01/22/coding-best-practice-5-whys-applied-to-bug-fixing/</link>
		<comments>http://www.philipreames.com/Blog/2012/01/22/coding-best-practice-5-whys-applied-to-bug-fixing/#comments</comments>
		<pubDate>Sun, 22 Jan 2012 21:44:01 +0000</pubDate>
		<dc:creator>reames</dc:creator>
		
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.philipreames.com/Blog/2012/01/22/coding-best-practice-5-whys-applied-to-bug-fixing/</guid>
		<description><![CDATA[When you find a bug, fixing it is not enough.  You should take a moment or two to think about how the bug was able to occur.  Frequently, a single bug report (particularly in more complex systems) hints at a number of underlying issues.  
I generally try to find at least three [...]]]></description>
			<content:encoded><![CDATA[<p>When you find a bug, fixing it is not enough.  You should take a moment or two to think about how the bug was able to occur.  Frequently, a single bug report (particularly in more complex systems) hints at a number of underlying issues.  </p>
<p>I generally try to find at least three independent fixes to ensure that a given bug couldn&#8217;t occur again.  You could think of this as <a href="http://en.wikipedia.org/wiki/Defense_in_Depth_%28computing%29">defense in depth</a> or <a href="http://en.wikipedia.org/wiki/Defensive_programming">defensive programming</a>, but my original inspiration to apply this came from the <a href="http://en.wikipedia.org/wiki/5_Whys">5-whys principle</a>.  In my case, I use 3, but the exact number isn&#8217;t the important part.  </p>
<p>Good questions to reflect on:<br />
1) Was there an earlier point in code where we could have noticed something was going wrong?  Can I improve the error reporting anywhere along the call trace?<br />
2) Can I refactor the interface or code to make this class of mistakes less likely?  Can I better document what the interface is?<br />
3) Is there another way I could hit this same error case?  If so, can I quickly (with low risk) fix that one too?<br />
4) What test could I write which would find this error?  (Is it worth the time?)<br />
5) Are there simple code changes which would have made debugging this much faster?  </p>
]]></content:encoded>
			<wfw:commentRss>http://www.philipreames.com/Blog/2012/01/22/coding-best-practice-5-whys-applied-to-bug-fixing/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Coding Best Practice: The role of comments in source code</title>
		<link>http://www.philipreames.com/Blog/2012/01/16/coding-best-practice-the-role-of-comments-in-source-code/</link>
		<comments>http://www.philipreames.com/Blog/2012/01/16/coding-best-practice-the-role-of-comments-in-source-code/#comments</comments>
		<pubDate>Mon, 16 Jan 2012 18:44:34 +0000</pubDate>
		<dc:creator>reames</dc:creator>
		
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.philipreames.com/Blog/2012/01/16/coding-best-practice-the-role-of-comments-in-source-code/</guid>
		<description><![CDATA[The purpose of a comment in source code is to document why you are doing something or to provide a quick summary of a non-obvious algorithm.  It is not to describe what you are doing.  Generally, a skilled programmer - i.e. hopefully your peers - can figure out what you&#8217;re doing from code [...]]]></description>
			<content:encoded><![CDATA[<p>The purpose of a comment in source code is to document <em>why</em> you are doing something or to provide a <em>quick summary</em> of a non-obvious algorithm.  It is not to describe <em>what</em> you are doing.  Generally, a skilled programmer - i.e. hopefully your peers - can figure out <em>what</em> you&#8217;re doing from code just fine.  The often confusing part is <em>why</em>.  (i.e. &#8220;Is this intentional or a bug?  If it&#8217;s intentional, why is it needed?&#8221;)  </p>
<p><strong>Good example comments:</strong><br />
&#8220;An implementation of merge sort.  See http://en.wikipedia.org/wiki/Merge_sort for an overview.&#8221;</p>
<p>&#8220;There are two obvious implementations here:<br />
a) describe<br />
b) describe<br />
Benchmarking (using the test cases in tests/mybench/*.cxx) shows that option a) is faster by ~15%.&#8221;</p>
<p><strong>Bad example comments:</strong></p>
<p>&#8220;This is broken. (with no explanation or testcase)&#8221;<br />
(some complicated bit of code here)</p>
<p>&#8220;Sorting an array&#8221;<br />
std::sort(vec.begin(), vec.end());</p>
<p>&#8220;(none)&#8221;<br />
(massive block of hard to read code here)</p>
<p>&#8220;(none)&#8221;<br />
(edge case or tricky non-obvious behavior)</p>
<p>If you find yourself writing lots of comments (or none), you&#8217;re probably &#8220;doing it wrong&#8221;.  Go read up on <a href="http://en.wikipedia.org/wiki/Self-documenting">self-documenting code</a> and start practicing.  Remember that self-documenting code isn&#8217;t writing code without comments; it&#8217;s writing code where the comments are executable code themselves.  </p>
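<p>To make the distinction concrete, here is a small C++ sketch.  The function <code>normalized_ids</code> and the requirement it mentions are hypothetical.  The function name carries the <em>what</em>, and the single comment that remains explains the <em>why</em>.</p>

```cpp
#include <algorithm>
#include <vector>

// A "what" comment that adds nothing would look like this:
//     // Sort the vector
//     std::sort(ids.begin(), ids.end());

// Hypothetical example of a "why" comment:
// Callers binary-search the result, so it must be sorted and free of
// duplicates. Sorting first is required because std::unique only
// removes adjacent duplicates.
std::vector<int> normalized_ids(std::vector<int> ids) {
    std::sort(ids.begin(), ids.end());
    ids.erase(std::unique(ids.begin(), ids.end()), ids.end());
    return ids;
}
```

<p>A reader can see what the two calls do from the code itself; what they could not recover without the comment is the reason the ordering and uniqueness matter.</p>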
<p>Please note: Nothing above should be read to discourage the use of function documentation.  That&#8217;s a separate topic which I may mention later.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.philipreames.com/Blog/2012/01/16/coding-best-practice-the-role-of-comments-in-source-code/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Introduction to extending Clang/LLVM</title>
		<link>http://www.philipreames.com/Blog/2012/01/10/introduction-to-extending-clangllvm/</link>
		<comments>http://www.philipreames.com/Blog/2012/01/10/introduction-to-extending-clangllvm/#comments</comments>
		<pubDate>Tue, 10 Jan 2012 19:45:17 +0000</pubDate>
		<dc:creator>reames</dc:creator>
		
		<category><![CDATA[PL Theory/Design]]></category>

		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.philipreames.com/Blog/2012/01/10/introduction-to-extending-clangllvm/</guid>
		<description><![CDATA[I&#8217;ve spent the first part of today watching some of the videos from the LLVM Dev Meeting that occurred back in November.  (I really wish I&#8217;d been able to attend!)  The first talk I watched was the Extending Clang talk by Doug Gregor with Apple.  If you&#8217;re thinking about playing around with [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve spent the first part of today watching some of the videos from the <a href="http://llvm.org/devmtg/2011-11/">LLVM Dev Meeting that occurred back in November</a>.  (I really wish I&#8217;d been able to attend!)  The first talk I watched was the <a href="http://llvm.org/devmtg/2011-11/#talk4">Extending Clang talk by Doug Gregor with Apple</a>.  If you&#8217;re thinking about playing around with Clang, I strongly suggest you watch this video.  I&#8217;ve spent the last few months hacking on clang for a language extension I&#8217;m working on and this was by far the best introduction I&#8217;ve seen.  I really wish I&#8217;d come across this before I spent hours learning it myself.  <img src='http://www.philipreames.com/Blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Other useful links:</p>
<ul>
<li><a href="http://clang.llvm.org/docs/InternalsManual.html">Clang Internals Manual</a> (a great reference for folks looking to extend clang)</li>
<li><a href="http://clang.llvm.org/docs/InternalsManual.html#AddingAttributes">How to add custom attributes </a>(from the above)</li>
<li><a href="http://blog.llvm.org">The LLVM Project Blog </a>- which is a good way to track major project starts</li>
<li><a href="http://llvm.org/docs/LangRef.html">LLVM Reference Manual</a> - Authoritative documentation on the LLVM language</li>
<li>Some useful docs on the <a href="http://llvm.org/docs/LinkTimeOptimization.html">Link Time Optimization project</a> (LTO)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.philipreames.com/Blog/2012/01/10/introduction-to-extending-clangllvm/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Reflections from a crash course in OpenCL</title>
		<link>http://www.philipreames.com/Blog/2012/01/01/reflections-from-a-crash-course-in-opencl/</link>
		<comments>http://www.philipreames.com/Blog/2012/01/01/reflections-from-a-crash-course-in-opencl/#comments</comments>
		<pubDate>Mon, 02 Jan 2012 04:21:19 +0000</pubDate>
		<dc:creator>reames</dc:creator>
		
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.philipreames.com/Blog/2012/01/01/reflections-from-a-crash-course-in-opencl/</guid>
		<description><![CDATA[Over the last few months, I&#8217;ve had an opportunity to spend some time playing with OpenCL.  In short, we&#8217;re trying to use a GPU to accelerate garbage collection for Java.   (Once the work is published, I&#8217;ll post more here.)  We&#8217;ve implemented a simple graph traversal algorithm on an AMD chip using [...]]]></description>
			<content:encoded><![CDATA[<p>Over the last few months, I&#8217;ve had an opportunity to spend some time playing with OpenCL.  In short, we&#8217;re trying to use a GPU to accelerate garbage collection for Java.   (Once the work is published, I&#8217;ll post more here.)  We&#8217;ve implemented a simple graph traversal algorithm on an AMD chip using OpenCL.  This article doesn&#8217;t talk about that effort directly, but instead focuses on a few of the lessons we learned the hard way while getting up to speed on OpenCL.  (So I remember them for next time!)</p>
<p>This has been a group effort, but the content, opinions, and mistakes herein are all my own.   </p>
<p><strong>Stability &#038; Dev Environment</strong></p>
<p>The first and most important lesson we learned was that <strong>each developer needs a dedicated test machine</strong> which is <em>not</em> their primary development box.  This box needs to be local.  When debugging OpenCL programs on real hardware, it is <em>shockingly</em> easy to lock up the entire box.  On multiple occasions, we had to perform hard power cycles on our test machine to get it into a usable state.  </p>
<p>Even when the box didn&#8217;t lock up entirely, a crashed program with an OpenCL kernel outstanding has a bad tendency to prevent future kernels from being executed.  Supposedly, there should be a timeout that will terminate a runaway program, but we never saw this happen in practice.  Instead, we ended up rebooting the box quite frequently.   </p>
<p>In a related vein, we quickly started replacing every while-loop with a for-loop (over a large, but <em>fixed </em>number of iterations).  This allows you to (sometimes) recover from what would otherwise have been an infinite loop without rebooting the box.  </p>
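<p>A sketch of the bounded-loop pattern, written as plain C++ rather than OpenCL C so it can be run anywhere.  The linked-list layout (an array of successor indices ending in -1) and the function name <code>bounded_walk</code> are assumptions made for the example, not the actual kernel code.</p>

```cpp
// Hypothetical data layout: next[i] is the index of the node after
// node i, or -1 at the end of the list.
//
// Returns the number of nodes visited, or -1 if the iteration cap was
// reached before the end of the list (e.g. because of a cycle).
int bounded_walk(const int* next, int start, int max_iters) {
    int node = start;
    int visited = 0;
    // Instead of: while (node != -1) { ... }
    for (int i = 0; i < max_iters && node != -1; ++i) {
        ++visited;
        node = next[node];
    }
    // If we stopped because of the cap, report failure instead of
    // spinning forever.
    return node == -1 ? visited : -1;
}
```

<p>The cap needs to be comfortably larger than any legitimate iteration count; otherwise a valid input gets reported as a failure.</p>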
<p>Another important note is that the documentation available from <a href="http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/">Khronos</a> is at best incomplete and in a couple of cases potentially wrong.  Many of the function descriptions don&#8217;t provide relevant details about usage and none of them provide useful examples.  (You can get some of the latter from the <a href="http://developer.amd.com/sdks/AMDAPPSDK/Pages/default.aspx">AMD</a> and <a href="http://developer.nvidia.com/opencl-sdk-code-samples">NVIDIA</a> SDKs.)  I strongly suggest searching Google for examples before taking the documentation at its word.  </p>
<p>OpenCL does not appear to support a mechanism to forcibly abort a kernel.  Nor does it support an assertion mechanism.  Nor does it have any form of debug logging (e.g. printf or the like).  The only way to exit a kernel function is to return from the <em>main </em>kernel function <em>with all threads.</em>  Unfortunately, this means that error reporting - even for cases where you can easily tell what happened - is extremely hard.  I don&#8217;t have a great solution.  We ended up writing data into global memory - so the CPU could access it after termination - and then trying to exit cleanly.  This worked sometimes, but was error prone to say the least.  </p>
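<p>The &#8220;write a status word and exit cleanly&#8221; idea can be sketched in plain C++.  This is an emulation under stated assumptions, not OpenCL code: <code>work_item</code> stands in for the kernel body of one thread, the <code>status</code> array stands in for a buffer in global memory, and all names and error codes are hypothetical.</p>

```cpp
#include <vector>

// Hypothetical status codes written into the status buffer.
constexpr int kOk = 0;
constexpr int kBadIndex = 1;

// Stand-in for the kernel body executed by work item `gid`.
// status[0] holds an error code, status[1] the id of the work item
// that reported it.
void work_item(int gid, const int* in, int n, int* out, int* status) {
    int idx = in[gid];
    if (idx < 0 || idx >= n) {
        status[0] = kBadIndex;  // record what went wrong
        status[1] = gid;        // record where
        return;                 // no abort available: just return
    }
    out[gid] = idx * 2;
}

// Stand-in for the host: run every work item, then inspect the status
// buffer once the "kernel" has finished.
std::vector<int> run_all(const std::vector<int>& in, std::vector<int>& out) {
    std::vector<int> status(2, kOk);
    const int n = static_cast<int>(in.size());
    for (int gid = 0; gid < n; ++gid) {
        work_item(gid, in.data(), n, out.data(), status.data());
    }
    return status;
}
```

<p>On a real device several work items may hit an error at once, so which one ends up in the status buffer is not predictable; the sequential loop here hides that.</p>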
<p>I haven&#8217;t played with the various <a href="http://developer.amd.com/tools/gDEBugger/Pages/default.aspx">debuggers </a>and <a href="http://code.google.com/p/ocl-emu/">emulators</a> available, but I suspect that would help greatly in debugging.  </p>
<p><strong>Synchronization</strong></p>
<p><strong>OpenCL has different synchronization models for threads within a workgroup vs across workgroups on the same device.  </strong>As far as I can tell, <strong>there is <em>no </em>synchronization available between kernels running on different devices </strong>on the same machine.  (You can use the CPU to coordinate starting and stopping kernels of course.)  </p>
<p><strong><a href="http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/barrier.html">Barriers</a> apply only to threads within a single workgroup. </strong> The CLK_LOCAL/GLOBAL_MEM_FENCE parameters enforce memory consistency within a single workgroup, not across workgroups.  Note that you can have a barrier - where all threads stop - but not have a consistent view of memory if you don&#8217;t pass the appropriate flags.  </p>
<p><strong><em>ALL </em>threads within a workgroup must encounter the <em>same </em>barrier</strong>.  If even a single thread does not, the program will hang indefinitely.  (And require a hard reboot of the machine.)  This is unpleasant to debug to say the least.  </p>
<p><strong>Atomic operations are the only way to synchronize between workgroups. </strong> To avoid memory contention (and thus serialization of requests), you probably want only a single thread per workgroup to execute the atomic operation.  Doing this requires an additional synchronization (using a barrier within the workgroup and a temporary local memory value) to get all threads within a workgroup consistent.  </p>
<p><strong>Be careful about which versions of the atomic functions you use.</strong>  OpenCL provides 32 bit vs 64 bit and local vs global memory versions.  The ones we used - which unfortunately are extensions not part of the language, but thankfully seem pretty common - were <a href="http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/cl_khr_int64_base_atomics.html">cl_khr_int64_base_atomics</a> and <a href="http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/cl_khr_int64_extended_atomics.html">cl_khr_int64_extended_atomics</a>.  I&#8217;ve read some reports that the atomic_op functions don&#8217;t behave the same as the atom_op versions.  I can&#8217;t find confirmation of this in the documentation, but we used the atom_op versions just in case.  Another gotcha is that some cards apparently don&#8217;t support the local versions.  Check your documentation carefully since by some reports the functions will simply fail silently.  </p>
<p>Note that it is unclear whether the atomic operations on global memory are visible to the CPU, different GPUs, or merely different workgroups on the same device.  I haven&#8217;t spent much time digging through the documentation, but if this matters to you, check!  The one thing that is clear from the documentation is that atomic operations executed by different GPUs on a shared address are <em>not guaranteed to be atomic</em>!</p>
<p><strong>Infrastructure</strong></p>
<p>To get good performance - even just to minimize testing time - you should probably be using precompiled files.  (Note: These are not binary files and cannot be moved between machines.  They are purely a caching mechanism.)  You&#8217;ll need a mechanism - hash, command line parameter, build system, etc. - to make sure your cached files stay in sync with your source code, of course.  </p>
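<p>One possible way to do the hash-based variant is sketched below in C++.  This is not what we actually used; the function names and the <code>.clcache</code> file extension are made up for the example.  It derives the cache file name from the kernel source and the build options using a hand-written FNV-1a hash, so the name stays the same across runs and changes whenever either input changes.</p>

```cpp
#include <cstdint>
#include <string>

// 64-bit FNV-1a hash. `h` lets a previous hash be continued so several
// strings can be folded into one value.
std::uint64_t fnv1a(const std::string& data,
                    std::uint64_t h = 0xcbf29ce484222325ULL) {
    for (unsigned char c : data) {
        h ^= c;
        h *= 0x100000001b3ULL;
    }
    return h;
}

// Hypothetical naming scheme: the cache key covers both the kernel
// source and the build options, separated by a zero byte so that the
// boundary between the two strings also affects the hash.
std::string cache_file_name(const std::string& source,
                            const std::string& options) {
    std::uint64_t h = fnv1a(source);
    h = fnv1a(std::string(1, '\0'), h);
    h = fnv1a(options, h);
    return "kernel_" + std::to_string(h) + ".clcache";
}
```

<p>Since the cached files are not portable between machines, it is probably worth folding something that identifies the device and driver into the key as well.</p>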
<p>Having a separate program which sanity checks your files - e.g. as part of your build system - will save you time in the long run.  If I get time, I&#8217;ll clean up the hacky mess I&#8217;ve been using and post it here.  </p>
<p><strong>Generally, the best way to get data from the CPU to the GPU (at least on our setup) is to use CL_MEM_USE_HOST_PTR. </strong> There seems to be a lot of confusion on exactly what this does, the top Google results appear inaccurate, and the <a href="http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/clCreateBuffer.html">documentation</a> isn&#8217;t super clear, but some micro benchmarks gave much better results than either of the other two options.  (As always, you cannot assume that the CPU and GPU have consistent views of this data or that it&#8217;ll be mapped to the same address on both platforms.  All synchronization with the GPU kernels has to be explicit.)  It&#8217;s also unclear to me whether OpenCL is required to copy the data back into the host memory after termination or whether that region can be entirely stale.  That wasn&#8217;t important for our case, so I never tested it.  The best discussion I&#8217;ve seen is <a href="http://www.khronos.org/message_boards/viewtopic.php?f=41&#038;t=3226">here</a>, but even that&#8217;s somewhat unclear on the finer points.  </p>
<p>Depending on what you&#8217;re doing, you may find some of the various utility libraries useful - <a href="http://www.browndeertechnology.com/coprthr_stdcl.htm">COPRTHR: STDCL</a>, <a href="http://www.bigncomputing.org/Big_N_Computing/Big_N_Computing/Entries/2010/2/22_Small_Brick,_Big_%E2%80%98N%E2%80%99.html">SOCL</a>, or oclUtils from the NVIDIA SDK.  The only one of these I&#8217;ve used is the oclUtils files, which were moderately useful.  </p>
<p><strong>Conclusion</strong></p>
<p>I hope this was useful to you.  If you have corrections, or suggestions, please feel free to <a href="http://www.philipreames.com/#contact">contact me</a>.  </p>
]]></content:encoded>
			<wfw:commentRss>http://www.philipreames.com/Blog/2012/01/01/reflections-from-a-crash-course-in-opencl/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Principles of Good Programming</title>
		<link>http://www.philipreames.com/Blog/2011/12/02/principals-of-good-programming/</link>
		<comments>http://www.philipreames.com/Blog/2011/12/02/principals-of-good-programming/#comments</comments>
		<pubDate>Fri, 02 Dec 2011 21:36:58 +0000</pubDate>
		<dc:creator>reames</dc:creator>
		
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.philipreames.com/Blog/2011/12/02/principals-of-good-programming/</guid>
		<description><![CDATA[I&#8217;d been planning to write a post for a while now on what I consider core principals of programming, but instead I found someone who said most of what I would.  Rather than repeat what&#8217;s already been said, I&#8217;ll just recommend you go read Christopher Diggins list.  
http://www.artima.com/weblogs/viewpost.jsp?thread=331531
The one&#8217;s I personally rate most [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;d been planning to write a post for a while now on what I consider core principles of programming, but instead I found someone who said most of what I would.  Rather than repeat what&#8217;s already been said, I&#8217;ll just recommend you go read Christopher Diggins&#8217;s list.  </p>
<p><a href="http://www.artima.com/weblogs/viewpost.jsp?thread=331531">http://www.artima.com/weblogs/viewpost.jsp?thread=331531</a></p>
<p>The ones I personally rate most important are: Write Code for the Maintainer, and Embrace Change.  I see quite a few others on his list as being subitems of the first.  For example, KISS, Avoid Premature Optimization, Don’t make me think, Principle of least astonishment, Single Responsibility Principle, and Hide Implementation Details are all about making sure the reader of the code can scan quickly and not get bogged down.  This is extremely important if any large code base is going to be maintained over the long haul.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.philipreames.com/Blog/2011/12/02/principals-of-good-programming/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Understanding Type Systems</title>
		<link>http://www.philipreames.com/Blog/2011/11/01/understanding-type-systems/</link>
		<comments>http://www.philipreames.com/Blog/2011/11/01/understanding-type-systems/#comments</comments>
		<pubDate>Tue, 01 Nov 2011 05:00:09 +0000</pubDate>
		<dc:creator>reames</dc:creator>
		
		<category><![CDATA[PL Theory/Design]]></category>

		<guid isPermaLink="false">http://www.philipreames.com/Blog/2011/11/01/understanding-type-systems/</guid>
		<description><![CDATA[The purpose of this post is to briefly summarize a few important terms that I&#8217;ve seen thrown around for type systems and try to clarify what they mean.  Usually, I&#8217;d go to Wikipedia for such a thing, but the article on type systems is ill organized and hard to understand.  I&#8217;ve spent the [...]]]></description>
			<content:encoded><![CDATA[<p>The purpose of this post is to briefly summarize a few important terms that I&#8217;ve seen thrown around for type systems and try to clarify what they mean.  Usually, I&#8217;d go to Wikipedia for such a thing, but the <a href="http://en.wikipedia.org/wiki/Type_system">article on type systems</a> is ill organized and hard to understand.  I&#8217;ve spent the last few weeks reading papers on type systems, and I still don&#8217;t understand it!</p>
<p>I&#8217;ve chosen to organize this in terms of several orthogonal axes of typing systems.  This organization is mostly my own, but I&#8217;m freely stealing the best ideas from the papers I&#8217;ve read as well. </p>
<p><strong>Typed vs Untyped</strong> - Typing is an organization of data which classifies fields or records into distinct groups.  These groups can be either predefined or user defined.  A language is typed if there exists a feature which helps to express such typing or if such a distinction is implicit in the language specification.  Note that a language does not need to check or enforce this semantic organization in any way to be typed. </p>
<p>By this definition, even something like assembly language is typed for some aspects.  On most ISAs, there are distinct instructions for operating on floating point numbers and integers.  On the other hand, not all ISAs define separate instructions for operating on signed vs unsigned integers.  This highlights the important point that a language can be typed with respect to some attribute and not typed with respect to another. </p>
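<p>To make the signed vs unsigned point concrete, here is a minimal sketch in Python (my choice for illustration; it is not tied to any particular ISA).  The same byte is read two ways, and the only thing that differs is the operation applied to it.  The variable names are made up for this example.</p>

```python
# The same 8 bits, with no signedness attached to the data itself.
raw = bytes([0xFF])

# The interpretation is chosen by the operation, much like picking a
# signed or unsigned instruction on an ISA that distinguishes the two.
as_unsigned = int.from_bytes(raw, byteorder="big", signed=False)
as_signed = int.from_bytes(raw, byteorder="big", signed=True)

print(as_unsigned, as_signed)
```

<p>Nothing in the data records which reading is the &#8220;right&#8221; one; that knowledge lives entirely in the code that consumes it.</p>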
<p><strong>Strong vs Weak</strong> - The strength of the typing is essentially just how easy it is to get around.  A strongly typed language has a type system which cannot be avoided.  A weakly typed system is one that is more of a suggestion than an enforced rule.  Note that the strength of the typing system says nothing about when the enforcement may happen.  (We&#8217;ll get to that in a second.) </p>
<p>In practice, every language I know of is somewhere in the middle.  A language may be closer to strongly typed or more weakly typed, but it&#8217;s a matter of degree.  As with typing itself, a language can also be more strongly (or weakly) typed <em>with respect to a particular attribute</em>.  </p>
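<p>As a small illustration of strength being a matter of degree, consider the following Python sketch.  The helper <em>try_add</em> is a hypothetical name introduced only for this example.  Python refuses to mix text and numbers, but happily treats truth values as integers.</p>

```python
def try_add(left, right):
    """Return left + right, or the name of the exception that was raised."""
    try:
        return left + right
    except TypeError as err:
        return type(err).__name__

# str + int is rejected: strong with respect to text vs numbers.
print(try_add("1", 1))

# bool + int is silently accepted, because bool is a subclass of int:
# weaker with respect to truth values vs numbers.
print(try_add(True, 1))
```

<p>The first call reports a TypeError while the second quietly produces a number, so the same language sits at different points on the strong/weak axis depending on which attribute you look at.</p>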
<p><strong>Static vs Dynamic</strong> - Expresses <em>when </em>the type is checked.  A statically checked language is checked at compile time.  A dynamically checked language is checked at execution time.  There&#8217;s been much debate as to which is preferable over the years, but the basic arguments come down to safety &#038; performance (static) vs ease of use &#038; expressiveness (dynamic). </p>
<p>The term <strong>hybrid typing</strong> is an acknowledgment that most practical languages are both statically and dynamically typed.  While the terminology has only entered the academic literature in the last few years, it&#8217;s been around in practice for much longer.</p>
<p><strong>Gradual typing</strong> is another recent invention that explicitly merges static and dynamic typing.  The basic idea is that a program can be written in a dynamic style and moved to a fully statically checked version incrementally.  In terms of real languages, the best example I&#8217;ve seen is <a href="http://cython.org/">Cython</a>.  Cython focuses on incremental performance, but incremental safety and changeability are also reasons to consider gradually typed systems.  Expect a full article on gradual typing in the not too distant future; I&#8217;ve been quite engrossed with the idea. </p>
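<p>A rough sketch of the incremental idea, using Python&#8217;s optional annotations rather than Cython (an assumption made purely for illustration; the function names are invented).  The first function is written in a fully dynamic style.  The second adds annotations that an external static checker such as mypy can verify, without changing runtime behavior.</p>

```python
from typing import Sequence

def scale(values, factor):
    # Fully dynamic style: nothing is declared, everything is checked
    # (if at all) when the multiplication actually executes.
    return [v * factor for v in values]

def scale_checked(values: Sequence[float], factor: float) -> list:
    # Same body, now annotated. The interpreter records the annotations
    # but does not enforce them; a separate static checker can.
    return [v * factor for v in values]

print(scale([1.0, 2.0], 2.0))
print(scale_checked([1.0, 2.0], 2.0))
print(sorted(scale_checked.__annotations__))
```

<p>The point is only that the two versions can coexist in one program, so a code base can be migrated one function at a time.</p>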
<p><strong>Nominal vs Structural</strong> - Another important distinction between typing systems is how they define equivalence.  A nominal system defines it by the name of a type.  A structural system defines it by the actual interface and field layout of the respective types. </p>
<p>As a side note, the same language may define equivalence differently for varying stages of validation and/or execution.  One common optimization performed in nominally typed languages is to combine the implementations of structurally equivalent types during code generation.  Some languages also expose this distinction in their syntax. </p>
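<p>The difference between nominal and structural equivalence can be sketched in Python as follows.  The classes <em>Duck</em>, <em>Robot</em>, and <em>Speaker</em> are hypothetical, and using <em>typing.Protocol</em> for the structural side is my choice of mechanism, not something the definitions above depend on.</p>

```python
from typing import Protocol, runtime_checkable

class Duck:
    def speak(self):
        return "quack"

class Robot:
    def speak(self):
        return "beep"

@runtime_checkable
class Speaker(Protocol):
    # Anything with a speak() member matches, whatever it is called.
    def speak(self): ...

robot = Robot()

# Nominal: Robot was never declared to be a Duck, so it is not one,
# even though the two classes have identical interfaces.
nominal_match = isinstance(robot, Duck)

# Structural: Robot has every member Speaker requires, so it matches.
structural_match = isinstance(robot, Speaker)

print(nominal_match, structural_match)
```

<p>Note that the runtime check here only looks at which members are present, not at their signatures, so it is a fairly coarse form of structural matching.</p>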
<p>A <strong>duck type</strong> system is an odd extension of a structural type system which only considers the structural equivalence at point of use.  In particular, two different paths through the same function may have different required interfaces (and thus types).  You could also think of duck typing as an extreme form of dependent typing which includes conditions with runtime values. </p>
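<p>Here is a minimal Python sketch of the &#8220;different paths, different required interfaces&#8221; point.  All of the names (<em>ShortOnly</em>, <em>describe</em>, <em>describe_safely</em>) are invented for this example.</p>

```python
class ShortOnly:
    """Hypothetical type that supports only one of the two operations."""
    def short_name(self):
        return "widget"

def describe(thing, verbose):
    # Each path demands a different interface from `thing`. Only the
    # path actually taken is checked, and only when it runs.
    if verbose:
        return thing.long_name()
    return thing.short_name()

def describe_safely(thing, verbose):
    # Wrapper so the example can show both outcomes without crashing.
    try:
        return describe(thing, verbose)
    except AttributeError:
        return "missing method"

print(describe_safely(ShortOnly(), verbose=False))
print(describe_safely(ShortOnly(), verbose=True))
```

<p>Whether <em>ShortOnly</em> is an acceptable argument to <em>describe</em> depends on the runtime value of <em>verbose</em>, which is what makes the dependent typing comparison tempting.</p>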
<p>A related classification is the rules the system defines for when substituting one type for another is legal.  In short, a <strong>strictly classic type</strong> system allows no substitution, a <strong>subtype type</strong> system allows substitution along lines of inheritance, and a <strong>dependent type system</strong> allows substitution if the dependent conditions are met.  Given that this is a highly complex topic which I&#8217;m not sure I fully understand yet, I plan a future post to discuss this separately.  For now, don&#8217;t worry too much if this paragraph didn&#8217;t make a lot of sense. </p>
<p>I also plan on drilling into type conversion in detail.  It has important implications both with regards to practical usability of any type system and their theoretical design.  </p>
<p>Update (Feb 28, 2012) - A follow-up post on <a href="http://www.philipreames.com/Blog/2012/02/28/type-systems-coercion-casts-and-conversions/">type coercions, casts, and conversions</a> has been posted.  It doesn&#8217;t address more complicated type systems, but does cover the basics.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.philipreames.com/Blog/2011/11/01/understanding-type-systems/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Coding Best Practice: Make your assumptions explicit</title>
		<link>http://www.philipreames.com/Blog/2011/10/19/coding-best-practice-make-your-assumptions-explicit/</link>
		<comments>http://www.philipreames.com/Blog/2011/10/19/coding-best-practice-make-your-assumptions-explicit/#comments</comments>
		<pubDate>Thu, 20 Oct 2011 02:31:25 +0000</pubDate>
		<dc:creator>reames</dc:creator>
		
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.philipreames.com/Blog/2011/10/19/coding-best-practice-make-your-assumptions-explicit/</guid>
		<description><![CDATA[This is a topic I&#8217;m going to be expanding on in future posts, but for now let&#8217;s cover the basics.
Assertions are your single best tool for software reliability.  Why?  Unlike tests, you can write assertions which check any property of your system as it runs. Tests can only check the results of a [...]]]></description>
			<content:encoded><![CDATA[<p>This is a topic I&#8217;m going to be expanding on in future posts, but for now let&#8217;s cover the basics.</p>
<p>Assertions are your single best tool for software reliability.  Why?  Unlike tests, you can write assertions which check any property of your system as it runs. Tests can only check the results of a given execution.</p>
<p>Assertions serve several purposes:</p>
<ol>
<li><strong>Checking your assumptions</strong> - If you write new code that does something you don&#8217;t expect, you&#8217;ll find out on the first execution that violates your assumptions.  Ideally, you&#8217;ll be running your code in the debugger at the point this happens and can immediately inspect the entire state around the failure.  </li>
<li><strong>Enforceable documentation of intent and expectations</strong>.  If you&#8217;re working with a team of folks and someone uses a library in a way you didn&#8217;t expect, asserts will tell them this immediately.  You should still document <em>why</em> your asserts are there, mind you.  </li>
<li><strong>Fault isolation. </strong> If you&#8217;ve written good assertions, your error statement will be something like &#8220;global state does not comply with expectations&#8221; right after your update function.  This is much easier to debug than noticing a corrupt output thousands of operations later.</li>
<li><strong>Preventing corruption.  </strong>If you&#8217;re using an assertion package which calls abort or triggers an exception, you don&#8217;t need to worry about the line following the assertion running if the assertion has been violated.  This simplifies error handling immensely.</li>
<li><strong>Performance.  </strong>Depending on your compiler, it may be able to take advantage of your assertions to optimize the code that follows.  If the compiler &#8220;knows&#8221; - because you told it - that a loop iteration count must be a multiple of four, it can unroll and generate much more efficient code.</li>
</ol>
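<p>To show what a few of these purposes look like in practice, here is a minimal sketch in Python.  The ledger, the <em>transfer</em> function, and the invariant are all hypothetical; they just stand in for &#8220;an update function with expectations about global state&#8221;.</p>

```python
def check_invariant(balances, expected_total):
    # Fault isolation: fail right after the update that broke the state,
    # not thousands of operations later.
    assert sum(balances.values()) == expected_total, \
        "global state does not comply with expectations"

def transfer(balances, src, dst, amount):
    # Enforceable documentation: callers learn immediately that a
    # negative amount is not a supported way to move money backwards.
    assert amount >= 0, "amount must be non-negative"

    total_before = sum(balances.values())
    balances[src] -= amount
    balances[dst] += amount

    # Preventing corruption: nothing after this line runs on bad state.
    check_invariant(balances, total_before)

ledger = {"alice": 10, "bob": 0}
transfer(ledger, "alice", "bob", 4)
print(ledger)
```

<p>One caveat on this sketch: Python drops <em>assert</em> statements entirely when run with the -O flag, so these checks only fire in a normal (non-optimized) run.</p>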
<p>The downside of assertions - as implemented in C and C++ with &lt;assert.h&gt; and &lt;cassert&gt; at least - is that they are extra code which executes at runtime.  Some of your assertions will be pruned by the compiler, but most will remain.  As such, if you add an assertion in the &#8220;wrong&#8221; spot - for example inside a tight loop - you can slow your program down quite a bit.</p>
<p>Before you panic, remember a few classic quotes about optimization:</p>
<ul>
<li>&#8220;More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason — including blind stupidity.&#8221; — W.A. Wulf</li>
<li>&#8220;We should forget about small efficiencies, say about 97% of the time: <strong>premature optimization is the root of all evil</strong>. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only <strong>after that code has been identified</strong>&#8221; — Donald Knuth</li>
<li>&#8220;Bottlenecks occur in surprising places, so don&#8217;t try to second guess and put in a speed hack until you have proven that&#8217;s where the bottleneck is.&#8221; — Rob Pike</li>
</ul>
<p>(Credit: <a href="http://en.wikipedia.org/wiki/Program_optimization#Quotes">Wikipedia</a>, emphasis mine)</p>
<p>You should write your assertions, and only remove (or restructure) those that your profiler tells you are actually at issue.  The tiny amount of performance you might gain by omitting them is not worth hours spent debugging or (more importantly) the lower quality software that would result.  </p>
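<p>As a deliberately contrived sketch of what &#8220;restructure&#8221; can mean, the two Python functions below check the same property.  The first re-validates the whole input on every loop iteration; the second checks it once up front.  The function names and the segment representation are made up for this example.</p>

```python
def span_total_slow(segments):
    total = 0
    for start, end in segments:
        # Checks every segment again on each iteration: quadratic work.
        assert all(e >= s for s, e in segments), "segment ends before it starts"
        total += end - start
    return total

def span_total_fast(segments):
    # Same property, checked once before the loop: linear work.
    assert all(e >= s for s, e in segments), "segment ends before it starts"
    total = 0
    for start, end in segments:
        total += end - start
    return total

data = [(0, 3), (5, 9)]
print(span_total_slow(data), span_total_fast(data))
```

<p>The assertion is kept in both versions.  Only its placement changes, and only that placement is something a profiler would ever flag.</p>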
<p>As a side note: There&#8217;s plenty of ongoing research out there trying to prove or disprove assertions statically and to prune redundant ones.  Expect your compiler to get substantially better over the next few years about giving compile time errors on assertion violations and pruning unnecessary assertions before runtime.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.philipreames.com/Blog/2011/10/19/coding-best-practice-make-your-assumptions-explicit/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>

