C++ Accelerate Massive Parallelism

by Call me dave... 25. June 2011 07:43

I have been an avid follower of Herb Sutter’s writings for years, from his magazine articles in the C++ Report and the C/C++ Users Journal, his books on C++ development and recently his series of posts on parallel programming and concurrency for Dr Dobbs.  Herb is the chair of the C++ standards committee where he and others are near to finalising C++0x standard.  In essence Herb knows his stuff.

Herb recently announced C++ AMP at the AMD Fusion conference and I was blown away.  Microsoft have modified their C++ compiler so that it can target both the CPU and a DirectX compatible graphics card within the same program.  This enables a developer to write code that utilises the hundreds of cores available on graphics cards, to perform blazingly fast parallel computations, without resorting to the current practice of writing code in a language designed for graphics processing (High Level Shader Language [HLSL] or GL Shading Language [GLSL]) or within a C like language (CUDA or OpenCL) where the OO benefits of C++ are missing.

Instead the keyword restrict is used to mark a method as being able to be executed on the graphics card.  The developer then uses a subset of the C++ language (enforced by the compiler) to implement the computation.  The complexity of moving data between the CPU, RAM and the GPU is also simplified by the introduction of a set of classes that automatically handle the marshalling of data between the devices.  The compiler converts the C++ code into HLSL code which is then embedded into the executable.  At runtime the HLSL code is sent to the DirectX driver which in turn converts it into the appropriate machine code (for a particular device) and executes it on the graphics card.

The example below (from Daniel Moth’s presentation - URL below) is a simple matrix by matrix multiplication.  Its clear that the outer for loop can be parallelised.  An interesting point is the complex indexing to pull out values from vA and vb which are both one dimensional arrays but actually store a two dimensional structure.

void MatrixMultiply( vector<float>& C,
const vector<float>& vA,
const vector<float>& vB, int M, int N, int W )
{
  for (int y = 0; y < M; y++) {
    for (int x = 0; x < N; x++) {
      float sum = 0;
      for(int i = 0; i < W; i++)
      sum += vA[y * W + i] * vB[i * N + x];
      vC[y * N + x] = sum;
    }
  }
}
Using C++ AMP the matrices are marshalled across to the GPU by wrapping the vectors in array_views.  Further the array_views are used to project the two dimensional matrix onto the one dimensional vector and so the complex indexing code isn’t present in the C++ AMP version.  The code that executes on the GPGPU are the lines within the lambda expression that marked with the restrict(direct3d) keyword.  On the graphics card the texture that is getting the result of computation is write only hence variable c is of type array_view<writeonly<>>.
void MatrixMultiply( vector<float>& vC,
const vector<float>& vA,
const vector<float>& vB, int M, int N, int W )
{
  array_view<const float,2> a(M,W,vA),b(W,N,vB);
  array_view<writeonly<float>,2> c(M,N,vC);
  parallel_for_each(c.grid, [=](index<2> idx) restrict(direct3d) {
   float sum = 0;
   for(int i = 0; i < a.extent.x; i++)
     sum += a(idx.y, i) * b(i, idx.x);
     c[idx] = sum;
   }
  );
}

There are millions of lines of C++ code used in finance industry for modelling derivatives and many of the larger institutions are now looking at GPGPU technology to give them an edge in the competitive world of algorithmic trading.  CUDA is currently the default choice for GPGPU development it will be very interesting in twelve months to see if this trend reverses due to both the simplicity of parallelising a function using AMP C++.

Herb’s keynote.

Daniel Moth: Blazing-fast code using GPUs and more, with C++ AMP

NVidia will also support C++ AMP but points out CUDA is available for Linux and Mac as well as Windows

Finally Microsoft have indicated that they will take their C++ extensions to a standards board thus enabling other compiler vendors to implement the restrict keyword in their products.  Hopefully this will mean that in a couple of years it will be possible to have C++ code target other devices such as FPGAs, PS3’s cell processor and Intel’s Many Integrated Cores (MIC) daughter board.

 

Side Note: 

Eventually all good ideas come back again. When C with Classes was being created it had a readonly and a writeonly keyword and when C++ was created the C standard committee liked the concept so much they asked that the readonly keyword by renamed to const and added it to the language. Meanwhile writeonly was dropped entirely. Instead of adding writeonly as a keyword Microsoft have gone down the library root i.e. writeonly<T> instead of writeonly T. Personally I think a keyword would have been far cleaner but then again I don’t work on the standards committee and have no idea how complex it would be to add into the official ISO C++ standard…  But considering how long its taken to complete this version of the standard I cant say I am terribly surprised...

Currently rated 1.4 by 465 people

  • Currently 1.377147/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags:

Coding Dojos at the workplace

by Call me dave... 25. June 2011 05:58

Your an IT manager at a small manufacturing company and have a staff of 5.  In two months time your team is slated to replace the internal HR web site that is currently running on Classic ASP. The Architect and lead developer have selected the following software stack:

  • ASP.NET MVC
  • Entity Framework
  • .NET 4.0

The team is very experienced however for the last 2 years they have been maintaining the sales web site which runs on ASP.NET and .NET 2.0.  Due to production support constraints you are unable to send the team on an externally run training course… 

Over my career I have seen this situation played out far too many times.  Teams spend the first couple of weeks or months learning on the job and end up having to replace most of the code that was written at the project’s commencement.

A Coding Dojo is a meeting where a bunch of coders get together to work on a programming challenge. They are there have fun and to engage in DeliberatePractice in order to improve their skills. (ref Coding  Dojo)

What’s the best way to learn a new technology, pattern or programming language?  By playing, doing, prototyping, fiddling and experimenting with it.  What’s the best way for a team to learn a new technology, pattern or programming language? By doing it together!!

Coding Dojos rules:

  • A challenge isn’t solved or challenge beaten without code i.e. A paper design doesn’t cut it
  • Code doesn’t exists unless there are tests
  • Work as a team/hook up with an expert

Formats:

  • Prepared Kata – Presenter solves the problem in front of an audience but can/should/will change his solution based on ideas/enhancements/feedback from the audience
  • Randori Kata – A pair of developers are selected from the audience.  The pair work together together to solve the problem and every 15 or 20 minutes one of the pair is swapped with another audience member.  Preferably everyone in the audience should have an opportunity to sit at the computer and bang out some code.

In a Coding Dojo “meet up” the Randori Kata can be intimidating for a new comer – No one wants to get a mental blank in front of a group of people or have a group commenting of their coding style.  This should be less of an issue in a work environment where the developers already know each other.  In fact the format ensures that everyone participates in the learning experience.

Workplace <Insert Technology here> Coding Dojo:

  1. Prep work
    1. Find the most experienced person in the team who has already used the technology
    2. Ask them to create a 30 minute demo (PowerPoint is fine) where the technology is used
    3. Ask them to create a “Cheat sheet” that consists of 3 or 4 pages of generic code snippets that show how the technology is used for different situations
    4. Create a coding problem (don’t tell the team or the person that wrote the Cheat Sheet) that can be solved reasonably easily using the technology and ensure that the solution to the problem utilises items within the Cheat Sheet
  2. Kata
    1. Have the team member present the 30 minute introduction demo
    2. Provide everyone with a copy of the “Cheat Sheet”
    3. Spend 10 minutes discussing the technology and the contents of the Cheat Sheet
    4. Explain the coding problem
    5. Randomly chose a developer to join the presenter – Coding in a pair is less stressful
    6. Have the presenter/colleague pair start solving the problem (make sure they write tests)
    7. Following Randori Kata rotate through the developers every 15 to 20 minutes
    8. After two hours of coding stop the clock
    9. Spend 10 minutes in a discussion
      1. What was hard?
      2. What was easy?
      3. How confident do people feel?
      4. What else do we need to learn?

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags:

Coding Dojo

WPF, Silverlight and MVVM presentation slide deck

by Call me dave... 27. May 2011 05:03

During the week I presented an introduction to WPF & Silverlight using the Model View ViewModel pattern.  The audience was a group of developers that had been working on a .NET 2.0 based application for the last 5 years and were going to transition to .NET 4.0.  Hence the session material was designed to briefly touch on the main WPF concepts and allow them to drill into the topics in more detail at their own leisure.

Enjoy…

WPF & Silverlight Development using Model View ViewModel.pptx (164.07 kb)

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags:

Back in Australia

by Call me dave... 27. May 2011 04:52

After living in the UK since mid 2008 my wife and I have moved back to Sydney Australia.  We are also bringing back our baby boy Logan who is the reason I haven’t blogged in almost a year.  I’m looking forward to learning about the Open Source, Alt.NET and Agile community in Sydney. 

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags:

BattleLang – A Domain Specific Language in .NET – Part III

by Call me dave... 22. December 2009 15:45

In this series of blog posts I am exploring the creation of a DSL for a mythical Role Playing Game called “Dragons of Kin” at the Puzzle Guzzler software house.  The language will allow the game designers to interact with the underlying game engine.  All the business logic and heavy lifting is performed by the engine meanwhile the language will be used a conduit into the engine.  Characters, layout and animations will be configured using the language.  In this post I will show the first implementation of the DSL.

In the last post Douglas (the development lead) and Jasmine (the level designer) had created the following example of the first cut of the language they had developed.

 Creature Crocodile {
    Life := random_select [30..40] LP
    Magic := 15 MP
    LifeHealingSpeed   := 3 LP/sec
    MagicHealingSpeed   := 1 MP/min
    Weapons
    {
      Bite:
        Inflicts:= random_select [5..10] LP
      PoisonBite:
        Inflicts:= random_select [5..10] LP + 3 LP/sec
        Costs:= 5 MP
    }
  }

Douglas takes the language fragment to Jimmy a developer on his team.

Jimmy So this is the infamous BattleLang
Douglas Yep.  As we discussed in the Stand Up.  We have three weeks to implement a text mode engine that will allow Christoph to move from the town to the spring.  There he will kill the crocodiles and save the town.
Jimmy Everything hard coded then.  So we will have three Creature types – Jimmy, Crocodile and Boss Crocodile.  Lets not do spawning either.  The Crocodiles will be created as soon as the engine starts.  There really should be an Area type in the language so that Jasmine can assign the Crocs to a location.
Douglas Definitely.  Lets put that up onto the Product backlog.  So can you do the first cut of the parser over the next couple of days?
Jimmy I think so.  I was looking at F# Lex support, ANTLR, Bison but if I only have 2 days I don’t want to learn a new library or toolset.  I’ll just crank out something quickly and we’ll throw it away bit further down the track. It looks pretty easy to parse.
Douglas Lets go to the white board and draw out a transition diagram.

image  

Each line represents a token that the scanner is reading from the input.  We start with “Creature” read in the name of the name Creature then we have an open and close bracket and then we reach the end. In this case we are saying its legal to have a Creature with no attributes defined at all.
Jimmy And what we assume that during each transition all the whitespace is thrown away?  It doesn’t make sense for that to be legal does it?
Douglas For now lets assume that around every token there is whitespace.  When the code is rewritten we can throw that assumption away.  So the code should actually look like this.
  Creature Crocodile {
    Life := random_select [ 30 .. 40 ] LP
    Magic := 15 MP
    LifeHealingSpeed   := 3 LP / sec
    MagicHealingSpeed   := 1 MP / min
    Weapons
    {
      Bite :
        Inflicts := random_select [ 5 .. 10 ] LP
      PoisonBite :
        Inflicts := random_select [ 5 .. 10 ] LP + 3 LP / sec
        Costs := 5 MP
    }
  }

What is I mean by it being legal is that the checking if the Creature is valid is a different step in the process. 
image 
By shifting the complex validation out of the Syntax Analyser it makes the code a lot simpler. 
Jimmy Ok.  Let me see so to do the parsing of the character attributes there is another string literal and then we parse the function itself.
 image  

And the function parser is as follows.
image
Douglas I’ll add in the Weapons
 image

 

Building the Proof of Concept BattleLang DSL

As described above there are a couple of stages to process a language:

  • Lexical Analyser which converts the input stream into into Tokens
  • Syntax Analyser which implements the transition diagram
  • Semantic Analyser validates the business rules

 

Lexical Analyser

By simplifying the language definition so that all Tokens are separated by whitespace the scanner is very simple.

public static TokenAndStringIndex GetNextToken(string text, int currentIndex)
{
    string token;
    if (HaveReachedEndOfInput(currentIndex, text))
        return new TokenAndStringIndex("", text.Length);

    currentIndex = RemoveWhitespace(currentIndex, text);
    currentIndex = GetToken(currentIndex, text, out token);
    return new TokenAndStringIndex(token, currentIndex);
}

The TokenAndStringIndex is a tuple that stores the current Index into the array and the Token that was found.  Retrieving a list of all the tokens is a simple matter of looping through the input.  While we could return an IEnumberable<string> in a more complete solution the TokenAndStringIndex would be renamed and extended to to include information such as the line number within the string the token was extract from.  This information is invaluable when it comes to creating useful error messages when problem are found during the Syntax and Semantic Analyser steps.

 

public static IEnumerable<TokenAndStringIndex> GetAllTokens(string text)
{
    int currentIndex = 0;
    while (currentIndex < text.Length)
    {
        var result = GetNextToken(text, currentIndex);
        currentIndex = result.IndexIntoInputString;
        if (!string.IsNullOrEmpty(result.Token))
            yield return result;
    }
}

 

Syntax Analyser

The syntax analyser eats through the streams of tokens moving through the transition diagram.  On nodes such as the Open Bracket it needs to grab the next token and determine which of the three subsequent nodes should be selected (Close Bracket, retrieve Creature attribute or Enter Weapons scope).  There are lots of different ways to model the nodes and transitions.  One very clean way is to have a class per node and implementing the state pattern.  All classes implement an interface INode and member INode ProcessToken(Token t)

However this time I decided to use an even simpler (although far less scalable) approach.

First three enums were created:

  • Keywords – Include nodes for Creature,  Weapons, Inflicts, Costs,  MagicPoints, LifePoints, Seconds, Minutes, RandomSelect and Colon
  • Operators – Includes nodes for Assign, Add, Subtract, Multiply, Divide, Range
  • Scope – Includes nodes for OpenCreateScope, CloseCreateScope, OpenWeaponScope, CloseWeaponScope, OpenRangeScope, CloseRangeScope

I then created the TokenAction class which links the Token string, to the enum and to a delegate that will manage the transition logic and the retrieval of the information about the Creature.

internal class TokenAction<T>
{
    private readonly T _tokenType;
    private readonly string _token;
    private readonly Action _action;

    public TokenAction(T tokenType, string token, Action action)
    {
        _tokenType = tokenType;
        _action = action;
        _token = token;
    }

    public Action Action
    {
        get { return _action; }
    }

    public string Token
    {
        get { return _token; }
    }

    public T TokenType
    {
        get { return _tokenType; }
    }
}

If we are at Node “Creature” the next token will be the Creature’s name then the following node should be a “{“.  The “{“ node includes its own transition logic that we can access by calling execute on its Action.  Hence:

new TokenAction<KeyWords> (KeyWords.Creature, "Creature", () =>

  {
    _creature.Name = GetLiteral(); 
    TransitionToScopeToken(new[] { Scope.OpenCreateScope });
  }

)

The following code searches for either “MP” or “LP” and sets the _dimension to the required value:

TransitionToKeywordToken(new[] { KeyWords.MagicPoints, KeyWords.LifePoints });

new TokenAction<KeyWords>(KeyWords.MagicPoints, "MP", () => _dimension = DimensionType.Magic)

new TokenAction<KeyWords>(KeyWords.LifePoints, "LP",  () => _dimension = DimensionType.Life)

As you can see the code “collects” values such as the DimesionType as it parsers the tree.  All of the Tokens and their relevant Actions were placed into three symbol tables using the TokenActions class which manages the moving executing of the Actions.

internal class TokenActions<T> 
{
      private readonly Dictionary<T, TokenAction<T>> _symbolTable = new Dictionary<T, TokenAction<T>>();
      public void Add(TokenAction<T> tokenAction)
      {
          _symbolTable.Add(tokenAction.TokenType,tokenAction);
      }

      private TokenAction<T> GetTokenAction(string token)
      {
          return _symbolTable.Values.First(x => x.Token == token);
      }

      private TokenAction<T> GetTokenAction(T tokenType)
      {
          return _symbolTable[tokenType];
      }

      public void InvokeAction(string token)
      {
          GetTokenAction(token).Action();
      }

      public void InvokeAction(T tokenType)
      {
          GetTokenAction(tokenType).Action();
      }

      public bool IsTokenFor(string s, T tokenType)
      {
          return GetTokenAction(tokenType).Token == s;
      }

      public bool TokenSymbolExists(string token)
      {
          return _symbolTable.Values.Count(x => x.Token == token) > 0;
      }
  }

The parser populates the following objects:

  • Unit which models the following “10 MP / sec”
  • UnitsType a collection of Unit objects and automatically reduces “10 MP / sec + 11 MP / sec” to “21 MP / sec”
  • Weapon which includes the damage that is inflicted and the cost to the user
  • Creature which will be passed to the engine – Currently implemented as a fake shell to demonstrate the parser
using System.Collections.Generic;
using System.Linq;

namespace BattleLangParser
{
    public enum DimensionType
    {
        Magic,
        Life
    }

    public enum TimespanType
    {
        Now,
        Second,
        Minute
    }

    public class Unit
    {
        public float Amount { get; set; }
        public DimensionType Dimension { get; set; }
        public TimespanType Timespan { get; set; }
    }

    public class UnitsType
    {
        private readonly List<Unit> _units = new List<Unit>();
        public IList<Unit> Units { get { return _units; } }
        
        public void Add(Unit u, Operator @operator)
        {
            if(_units.Where(x => x.Dimension == u.Dimension && x.Timespan == u.Timespan).Count()>0)
            {
                var result = _units.First(x => x.Dimension == u.Dimension && x.Timespan == u.Timespan);
                switch(@operator)
                {
                    case Operator.Add:
                        result.Amount += u.Amount;
                        break;
                    case Operator.Subtract:
                        result.Amount -= u.Amount;
                        break;
                }                
            }
            else
            {
                _units.Add(u);
            }
        }
    }

    public class Weapon
    {
        public Weapon()
        {
            Inflicts = new UnitsType();
            Costs = new UnitsType();
        }

        public UnitsType Inflicts { get; private set; }
        public UnitsType Costs { get; private set; }
    }

    public class Creature
    {
        public Creature()
        {
            Stats = new Dictionary<string, UnitsType>();
            Weapons = new Dictionary<string, Weapon>();
        }

        public string Name { get; set; }
        public Dictionary<string, UnitsType> Stats { get; private set; }
        public Dictionary<string, Weapon> Weapons { get; private set; }
    }
}

Semantic Analyser

In this version of the software it does not include a semantic Analyser.

 

 

Unit Tests

An example of one of the Unit tests:

[Test]
public void ParseCreature_MultipleDimensions_EachDifferent()
{
    const string emptyCreature = @"
Creature TestMonster
{
    RingOfhealing := 3 MP / min + 8 LP / min
}";
    var b = new Parser(emptyCreature);
    Creature result = b.Parse();
    Assert.AreEqual("TestMonster", result.Name);

    Assert.IsNotNull(result.Stats["RingOfhealing"]);
    Assert.AreEqual(3.0, result.Stats["RingOfhealing"].Units[0].Amount);
    Assert.AreEqual(DimensionType.Magic, result.Stats["RingOfhealing"].Units[0].Dimension);
    Assert.AreEqual(TimespanType.Minute, result.Stats["RingOfhealing"].Units[0].Timespan);
    Assert.AreEqual(8.0, result.Stats["RingOfhealing"].Units[1].Amount);
    Assert.AreEqual(DimensionType.Life, result.Stats["RingOfhealing"].Units[1].Dimension);
    Assert.AreEqual(TimespanType.Minute, result.Stats["RingOfhealing"].Units[1].Timespan);
}

Verifying the Creature has been created correctly involves:

  • Checking that expected attributes have been loaded
  • Checking that attributes have the correct Units of Measure
  • Providing the user with appropriate error messages

 

Summary

This post has shown how to very easily write a simple DSL (less than 500 lines of code in total) that can be used to populate a relatively complex object. 

There are problems with the approach I’ve taken:

  • The Lexical Analyser was written by hand – There are some great libraries and frameworks that can do the heavy lifting for us on this one
  • Logic for populating the Creature was intermingled with logic for verifying the grammar – An Abstract Syntax Tree and the Visitor pattern is the answer for decoupling those two domains
  • Currently very poor error messages for badly formed user code

 

Source code

BattleLang.SimpleImplementation.7z (91.06 kb)

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags:

Powered by BlogEngine.NET 1.4.5.0
Theme by Mads Kristensen

Who is Dave

David Ross - Gold Coast QLD

A QLD'er living in Sydney

Email:willmation(at)gmail.com