Friday, January 29, 2010

Brutal cold…

I never thought that I would miss the rain that much…

[Image: weather, 2010-01-29]

Thursday, January 28, 2010

GDP per Capita versus FIFA Rank II

Some observations based on the plot shown in the post GDP per Capita versus FIFA Rank:

1. This year 17 countries have a GDP per capita above the World Cup (WC) 2010 average and 15 countries have one below it. It is almost an even split!

Question: Could that be the case for other World Cups?

Answer: Most probably.

The reason is that all African countries and all South American countries (including French Guiana) have a GDP per capita below the WC average. Africa normally gets 5 berths (6 this year) and South America 4.5 berths. Therefore, unless the economic landscape of the world changes dramatically, roughly a third of the countries participating in the WC will have a low GDP per capita.

On the flip side, most European countries (13 berths) have a GDP per capita above the WC average. Therefore, in practice, roughly a third or more of the countries participating in the WC will have a high GDP per capita.

Finally, the exact distribution would be defined by the participants from North+Central America and Asia+Oceania. However, I would expect it to be close to a 50-50 split with a bias towards the higher GDP.

The data are definitely biased by the distribution of WC berths per continent!
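A quick check of the berth arithmetic, using only the berth counts quoted above (the half berth is South America's intercontinental playoff slot) and the 32-team field:

```latex
\frac{5 + 4.5}{32} \approx 0.30 \ (\text{usual}), \qquad
\frac{6 + 4.5}{32} \approx 0.33 \ (\text{2010}), \qquad
\frac{13}{32} \approx 0.41 \ (\text{Europe})
```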

2. Another thing:

  • Of the top 5 countries, 1 has a low GDP per capita (20%)
  • Of the top 10 countries, 3 have a low GDP per capita (30%)
  • Of the top 15 countries, 6 have a low GDP per capita (40%)
  • Of the top 20 countries, 9 have a low GDP per capita (45%)
  • Of the top 25 countries, 11 have a low GDP per capita (44%)
  • Of all 32 countries, 15 have a low GDP per capita (47%)

That is, outside the top 5 the split between high and low GDP per capita countries is fairly balanced. Money and the quality of the national leagues seem to have limited impact on the ranking of the national team. As it should be!
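The cumulative percentages in the list above can be computed directly from a ranked list. A minimal MATLAB sketch, assuming a hypothetical vector `GDPcapita` already sorted from best to worst FIFA rank (the values are made up for illustration, not the WC 2010 data). The last lines simply recompute the percentages from the counts quoted in this post:

```matlab
% Hypothetical GDP per capita values (thousands USD), sorted best rank first
GDPcapita = [30 5 40 8 2 35 45 12 3 28];

% true where a country is below the group average
isLow = GDPcapita < mean(GDPcapita);

% cumulative count of low-GDP countries among the top k, for every k
cumLow = cumsum(isLow);

% percentage of low-GDP countries among the top N
N = [5 10];
pctLow = 100 * cumLow(N) ./ N;
fprintf('Top %2d: %3.0f%% low GDP per capita\n', [N; pctLow]);

% Same percentages from the counts quoted in the list above
topN = [5 10 15 20 25 32];
nLow = [1 3 6 9 11 15];
pctPost = round(100 * nLow ./ topN);   % -> 20 30 40 45 44 47
```

Using cumsum gives the count for every cut-off in one pass, so any other top-N split can be read off the same vector.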

Monday, January 25, 2010

GDP per Capita versus FIFA Rank

Another typical excuse offered to justify the ability of some countries to win consistently is their economic output. It makes sense to some extent. The richer countries have more money to invest in sports and high tech training facilities. The citizens of the rich countries have a higher standard of living and tend to expose their children to sports at an early age. Rich countries have established leagues and actively participate in international sports associations and sporting events.

But is that really the case? It is well established that money buys superb talent and thus success at the club level (e.g. Exhibit A: Chelsea FC; Exhibit B: Real Madrid C.F.; Exhibit C: F.C. Internazionale; etc.). But does money create football talent that can compete at the national level? Again, it is possible, but is it likely?

In other words: Is it really the economy, stupid?

It is easy to compare the FIFA Rank with the GDP per capita listed in the CIA Factbook. GDP (Gross Domestic Product) is the value of all goods and services produced within a country. GDP per capita is GDP divided by population, and it is a better metric of how well off a country's citizens are. It also helps when dealing with England. The plot of the data versus the 12/2009 FIFA Rank points is shown below.

[Plot: GDP per capita vs FIFA Rank points]

It was more fun generating that plot! The MATLAB code to generate the plot is shown below:

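% Scatter plot of FIFA points vs GDP per capita (thousands USD)
plot( points, GDPcapita*1e-3, 'o', 'markersize', 5, ...
  'markerfaceColor', [0 0.8 0], 'markerEdgeColor', [0 0.2 0.7])

% 1st order least-squares fit: P(1) is the slope, P(2) the intercept
P = polyfit(points, GDPcapita, 1);

vv = polyval(P, points);

% Standard deviation of the residuals about the fitted line
residualSTD = std( vv - GDPcapita);

% corrcoef returns a 2x2 matrix; the correlation coefficient is R(1,2)
R = corrcoef(points, GDPcapita);

% Overlay the fitted line on the data
hold on
plot(points, vv *1e-3, 'k-', 'linewidth', 1.5)
hold off
grid on

axis([200 1800 1 ceil(max( GDPcapita*1e-3 )) ])
% unique() guards against a duplicate top tick, which set() would reject
set(gca, 'XTick', 200:200:1800, ...
   'YTick', unique([1:4:ceil(max( GDPcapita*1e-3 )) ceil(max( GDPcapita*1e-3 ))]), ...
   'fontsize', 9, 'fontname', 'Consolas')
xlabel('FIFA Rank Points')
ylabel('GDP per capita in thousands USD')
title('Data (dots) and 1st order fit (line)')

% Label each dot with the country name (underscores replaced by spaces)
text(points + 30, GDPcapita*1e-3, strrep(country, '_', ' '), ...
   'fontname', 'consolas', 'fontsize', 10, 'color', 'b')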

MATLAB also offers a built-in GUI (plottools) that can do a lot of great things with mouse clicks. My preference is to use the command line. (The good thing about GUIs is that What You See Is What You Get. The bad thing about GUIs is that What You See Is The Only Thing You Get!) Actually, that particular GUI is good in that it does not seem to leave anything out, and sometimes I use it because it is faster than writing code for an unusual aspect of a one-off plot.

Back to the plot: there is a general trend that says rich countries field good football teams, but the correlation is very weak. Also, the data seem to cluster above and below the trend line, which is probably the result of a bimodal distribution (rich versus poor). And Brazil and Argentina, perennial winners, do not have very wealthy citizens. Therefore:

It is not likely that national economic output and thus national wealth is a significant factor in the success of a country’s national team in world football!

Basic statistics (GDP per capita) for the 32 countries participating in the 2010 WC:

Sample size = 32

Max: $47,500 (USA)

Min: $1,500 (Ghana)

(GDP per capita for Greece: $32,100)

Mean: $21,391

Median: $22,100

Intercept: $8,362.5

Slope: $14.1 per FIFA ranking point

Residual std. deviation: $13,978

Correlation = 0.2772
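For reference, here is one way the statistics above can be computed in MATLAB with the same polyfit/corrcoef calls used for the plot. It is a sketch on made-up sample vectors (`points` and `GDPcapita` below are hypothetical, not the WC 2010 data); with the real 32-country vectors the same lines should produce the listed values.

```matlab
% Hypothetical sample data (NOT the WC 2010 values)
points    = [1600 1400 1200 1000 800 600]';        % FIFA ranking points
GDPcapita = [40000 20000 35000 10000 25000 5000]'; % USD

% Descriptive statistics
n       = numel(GDPcapita);
maxGDP  = max(GDPcapita);
minGDP  = min(GDPcapita);
meanGDP = mean(GDPcapita);
medGDP  = median(GDPcapita);

% First-order least-squares fit: polyfit returns [slope intercept]
P         = polyfit(points, GDPcapita, 1);
slope     = P(1);
intercept = P(2);

% Spread of the data around the fitted line
residualSTD = std(GDPcapita - polyval(P, points));

% corrcoef returns a 2x2 matrix; the off-diagonal entry is R
Rmat = corrcoef(points, GDPcapita);
R    = Rmat(1, 2);
R2   = R^2;   % fraction of the variance explained by the line

fprintf('N = %d, mean = %.0f, median = %.0f\n', n, meanGDP, medGDP);
fprintf('intercept = %.1f, slope = %.2f, residual std = %.0f\n', ...
        intercept, slope, residualSTD);
fprintf('R = %.4f, R^2 = %.4f\n', R, R2);
```

Applied to the post's R = 0.2772, R² ≈ 0.077: the straight line accounts for less than 8% of the spread in GDP per capita, which is another way of saying the correlation is weak.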

Friday, January 15, 2010

MATLAB: Read formatted input file

The data collected from the FIFA Ranking table and the CIA Factbook were entered in a 32-row by 8-column tab delimited text file shown below.

[Screenshot: WC2010 tab-delimited text file]

Obviously, it can be easily manipulated with Excel but I also want to do some more work that is easier to do with MATLAB.

It is straightforward to open the file with Excel, copy a column, and paste it into the command window or a file as input to a variable. But that is too crude for my taste, and it does not work for very large files. Plus, MATLAB (7.1 R14) offers a number of functions that can read formatted data. These functions are outlined below. For each case I estimate runtime performance by measuring the elapsed time with tic-toc. Before running each case I clear the memory using clear all, as shown in Case 1.
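Averaging several runs smooths out the run-to-run noise of a single tic-toc measurement. A minimal sketch; `reader` is a hypothetical stand-in (a throwaway computation here) for whichever read call is being timed, e.g. `@() importdata('WC2010_GK Stats.txt')`:

```matlab
% Hypothetical stand-in for the read operation being timed
reader = @() sum(rand(1000, 1));

nRuns   = 20;
elapsed = zeros(nRuns, 1);
for k = 1:nRuns
    tic;                    % start the stopwatch
    reader();               % run the operation once (result discarded)
    elapsed(k) = toc;       % elapsed wall-clock time in seconds
end

fprintf('Average over %d runs: %.4f s (min %.4f s)\n', ...
        nRuns, mean(elapsed), min(elapsed));
```

clear all is left out of the loop on purpose: inside the loop it would also wipe `elapsed` and `k`.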

Case 1: use IMPORTDATA
Function importdata reads the text file and returns a structure whose data field holds the numeric values and whose textdata field holds the text. It works OK but not perfectly, because it can easily confuse data and text.

clear all

tic

M = importdata('WC2010_GK Stats.txt');

fifaRank = str2num( char(M.textdata(:,1)) );
country = M.textdata(:,2);
points = M.data(:,1);
GDPcapita = M.data(:,3);

toc

Case 2: use STRREAD
Function strread reads formatted data from a string. It requires a loop to go through the file line by line, plus some string manipulation. However, it is pretty fast and reliable.

fid = fopen('WC2010_GK Stats.txt', 'r');

n = 1;

% Preallocate (32 rows); in this case it makes little difference to speed
fifaRank = zeros(32, 1);
country = cell(32, 1);
points = zeros(32, 1);
GDPcapita = zeros(32, 1);

% If fgetl encounters the EOF indicator, it returns -1
while 1
   tline = fgetl(fid); % return the next line of the file associated w/ fid
   if ~ischar(tline),   break,   end % terminate loop
   dummy = strread(tline, '%s'); % split the line on whitespace into a cell column
   % NOTE: theoretically str2double is faster than str2num but I have never
   % seen any real advantage!
   fifaRank(n) = str2num( char( dummy(1,:) ) );
   country(n,:) = dummy(2,:);
   points(n) = str2num( char( dummy(3,:) ) );
   GDPcapita(n) = str2num( char( dummy(5,:) ) );
   n = n + 1;
end

fclose(fid);

 

Case 3: use TEXTREAD
Function textread reads formatted data from a text file. It is easy to use once you read the (extensive) documentation.

[fifaRank, country, points, x, GDPcapita, y, z, w] = ...
   textread('WC2010_GK Stats.txt', '%d %s %f %f %f %f %f %f');

 

Case 4: use SSCANF
Function sscanf reads data from a MATLAB string, converts it according to the specified format string, and returns it as a column vector. When the format mixes numeric and character conversions, the function returns everything as numbers (characters come back as their character codes), which can be painful!!

fid = fopen('WC2010_GK Stats.txt', 'r');

n = 1;

% If fgetl encounters EOF indicator, it returns -1
while 1
   tline = fgetl(fid); % return the next line of the file associated w/ fid
   if ~ischar(tline),   break,   end

   % Read in the numerical values; use * to ignore character input on the
   % first pass. The reason is that the country name is variable length.
   % Recall that conversion characters marked with an asterisk are NOT
   % returned.
   A = sscanf(tline, '%e %*s %e %e %e %e %e %e', Inf);
   fifaRank(n) = A(1);
   points(n) = A(2);
   GDPcapita(n) = A(4);

   % Now do a second pass with sscanf, ignoring the numerical fields.
   % Then use char to put the name together; B(:)' forces a row first,
   % so the name is not split into one character per row.
   B = sscanf(tline, '%*e %s %*e %*e %*e %*e %*e %*e', inf);
   country(n,:) = cellstr( char(B(:)') );

   n = n + 1;
end

fclose(fid);
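The first sscanf pass can be tried on a single hypothetical line (made-up name and values). For the country name, this sketch swaps in strtok instead of the second sscanf pass above; it is a different technique, shown only as an alternative:

```matlab
% Hypothetical tab-delimited line: rank, country, then six numbers
tline = sprintf('7\tTeam_A\t1100\t0.5\t21000\t1\t2\t3');

% First pass: %*s skips the country name, numbers come back as a column
A = sscanf(tline, '%e %*s %e %e %e %e %e %e', Inf);   % 7x1 double

% Country name: strtok splits on whitespace (including tabs)
[rankToken, rest] = strtok(tline);   % rankToken is '7'
name = strtok(rest);                 % second whitespace-delimited token
```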

 

Case 5: use FSCANF
Function fscanf is good at handling numerical data and is pretty fast, but it does not handle character data very gracefully. I would need to write additional code to format the character data properly.

fid = fopen('WC2010_GK Stats.txt', 'r');

[A, count] = fscanf(fid, '%e %*s %e %e %e %e %e %e', [7 32]);

fifaRank = A(1,:);
points = A(2,:);
GDPcapita = A(4,:);

% Need to set the file position indicator to the beginning of the file
frewind(fid)

% and use fscanf a second time
B = fscanf(fid, '%*e %s %*e %*e %*e %*e %*e %*e');

fclose(fid);

Case 6: use TEXTSCAN
Function textscan is a fairly new function intended to replace textread and strread. It reads the data, converts them according to the format specifiers, and places each column in a cell of a cell array.

fid = fopen('WC2010_GK Stats.txt', 'r');

A = textscan(fid, '%d %s %f %f %f %f %f %f'); % data are placed in cell array

fifaRank = A{1};   % A is a 1x8 cell array, one cell per format specifier
country = A{2};
points = A{3};
GDPcapita = A{5};

fclose(fid);
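To see what textscan returns without the original file, the sketch below writes a hypothetical two-row file with the same 8-column layout (rank, country name with underscores instead of spaces, then six numeric columns; names and values are made up) to a temporary file and reads it back with the same format string:

```matlab
% Write a hypothetical 2-row, 8-column tab-delimited file
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1\tTeam_A\t1565\t0.5\t36500\t1\t2\t3\n');
fprintf(fid, '20\tTeam_B\t600\t0.4\t10300\t4\t5\t6\n');
fclose(fid);

% Read it back: one cell per format specifier
fid = fopen(fname, 'r');
A = textscan(fid, '%d %s %f %f %f %f %f %f');
fclose(fid);
delete(fname);

fifaRank  = A{1};                      % int32 column vector
country   = strrep(A{2}, '_', ' ');    % cell array of strings
points    = A{3};                      % double column vector
GDPcapita = A{5};
```

Note that %d yields an int32 column while %f yields doubles.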

I mentioned earlier that I quantified the performance of each case on an AMD X2 Dual Core Processor 3800+ with 3 GB of RAM. Average runtime results for my small file are tabulated below.

 

Case  Function    Elapsed time [sec]  Relative to textscan  Comments
1     importdata  0.2177              41x slower            Standard import data function. Need to check data parameters.
2     strread     0.0352              6.6x slower           Pretty reliable.
3     textread    0.0222              4.2x slower           Least code.
4     sscanf      0.0175              3.3x slower           Needs a bit of code and care with text data. Possibly the function I have used the most.
5     fscanf      0.0104              2x slower             Text not properly displayed. More code needed. Very good for numerical data.
6     textscan    0.0053              (baseline)            Fastest! Efficient! Backwards compatibility issue…
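The "x slower" factors in the table are each elapsed time divided by the textscan time. A short sketch that recomputes them from the measured times above:

```matlab
names   = {'importdata', 'strread', 'textread', 'sscanf', 'fscanf', 'textscan'};
elapsed = [0.2177 0.0352 0.0222 0.0175 0.0104 0.0053];   % seconds, from the table

% Slowdown relative to the fastest reader (textscan)
ratio = elapsed / min(elapsed);

for k = 1:numel(names)
    fprintf('%-10s %7.4f s  %5.1fx\n', names{k}, elapsed(k), ratio(k));
end
```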

Sunday, January 10, 2010

Population versus FIFA Rank

It is a classic excuse, mostly heard from small(er) countries: Brazil (or the USA, Spain, France; pick your favorite adversary) is a very big country, and it is much easier to find 20 very good players in a population of 100 million than it is to find 20 equally good players in 10 million.

It is possible but is it likely?

Using the CIA Factbook data for each country participating in the World Cup and the corresponding FIFA Rank, I created the following bar graph (with Excel 2007 and Adobe Photoshop Elements).

[Bar graph: FIFA Rank vs Population]

It seems that population is not a significant factor. Two of the top 5 countries have populations under 20 million, and one of them (Portugal) is barely over 10 million. In the top 20 there are 3 countries under 9 million (Switzerland, Serbia, Uruguay, though they are not expected to do very well) and one more country (Greece) listed at close to 11 million. Between ranks 21 and 32 there are 6 more countries with populations under 10 million. Not to mention that the two most populous countries (China and India) are not participating.

However, if you look at the data in more detail, there is a slight pattern that seems to favor the more populous countries.

Plot population vs. Rank and fit a first order polynomial:

[Plot: Population vs FIFA Rank with first-order fit (MATLAB)]

Though the polynomial fit indicates that, in general, a larger population translates to better national team performance (the slope is negative, and a lower rank number is better), the correlation coefficient (R = -0.225) is small enough that the two quantities are only very weakly correlated. That is:

It is not likely that population is a significant factor in the success of a country’s national team in world soccer!

Some population statistics for the WC2010 countries:

N = 32

Mean: 49.5 million

Median: 22 million

Intercept: 75.8 million

Slope: -1.6 million per rank position

Residual std. deviation: 64.7 million

Correlation = -0.225
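A quick consistency check on these numbers, assuming the rank axis of the plot is the position among the 32 finalists (1 to 32), so that the mean rank is 16.5. A least-squares line always passes through the point of means, and R² gives the share of the variance the line explains:

```latex
\bar{x} = \frac{1 + 32}{2} = 16.5, \qquad
\hat{y}(\bar{x}) = 75.8 - 1.6 \times 16.5 = 49.4 \approx \bar{y} = 49.5 \ \text{million}

R^2 = (-0.225)^2 \approx 0.051
```

So the fit agrees with the listed mean, and the line explains only about 5% of the variance.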


Saturday, January 9, 2010

The sporting event of the year

That is easily the World Cup in South Africa. I know the Winter Olympics start on February 12 in Vancouver, BC, but I do not expect very high TV ratings, not even in the US. The last Winter Olympics, in Turin, were pretty forgettable. For my taste, other than ski jumping (which to me is a mind-boggling sport!), downhill slalom, and the biathlon, all other events seem pretty boring (with curling being the queen of bore!)

In my opinion the best thing about the Vancouver Olympics will be Vancouver itself. Vancouver is a beautiful city. It looks like a cross between a European and an American city, with a lot of parks, high-rises, and sidewalks full of people, especially on Robson Street (shown in the second picture below; among dozens of stores and restaurants there are two Starbucks coffee shops right across from each other!) and in Gastown. I instantly fell in love with it. It is unfortunate for visitors that it is going to be cloudy, rainy, wet, and mostly crappy for the duration of the event.

[Photos: Vancouver, June 2006]

(We took the pictures during our last trip to Vancouver in June 2006.)

Back to South Africa! So what is it that makes some countries successful qualifiers? Obviously the luck of the draw. But theoretically all groups are fairly even in composition, since they include seeded teams, etc. In addition, most powerhouses regularly field good teams. Why is it that Brazil and Italy (two countries that have nothing in common) are such consistent winners? Why is it that Italy has a higher-performing team than Greece and France? What are the intangibles?

So I decided to collect some data about each country that is going to be part of WC2010 and compare them relative to their FIFA rank. The source of data is the CIA Factbook, which provides all kinds of information about each country. Results will be published in the next few days.

BTW, soccer is a great way to catch up with all the latest geographical changes!

Saturday, January 2, 2010

HAPPY 2010!

New Year, new state, new city, new job, new home address, etc., etc.

2009 was a pretty intense year but we managed!! I expect 2010 to be equally intense (at least until the summer.) Here’s to some more life management!

Seattle, much like Athens, will always be in our hearts. So many friends, such natural beauty, so many great memories. We will never forget.

[Photo: New Year's Eve 2010]