Soccer World Cup 2018 Average Face by Team

World Cup 2018 Average Player Face by Team
World Cup 2018 Average Player Face by Team

 

We all know that Spain is going to win this world cup ūüėČ, but watching the games is still entertaining. I thought today while watching England vs. Tunisia that soccer players look very much alike, especially within a team, and so I thought I could compute the average face by soccer team for this world cup.

Data

Getting the data, in this case, images for each player in the current soccer world cup is arguably critical. The images need to have a plain background, similar ilumation, and ideally, equal size. Luckly, the FIFA has done all of this, and the images are available online: https://www.fifa.com/worldcup/players/

If you inspect one of the player’s elemtents, you can access the URL for the player’s image, which can then be downloaded programmatically

Inspecting the player element on the browser console shows the URL.

I downloaded the page’s source code, and downloaded the 736 300 pixels squared images.

Averaging a face

There is a naive way of averaging an image, using numpy, and it only takes a few lines:

import numpy as np
import imageio

countries = ['argentina', 'belgium', 'colombia', 'croatia', ...]

for country in countries:
    images_stacked = np.zeros((23, 300, 300, 3))

    for i in range(0, 23):
        images_stacked[i] = imageio.imread('images-input/%s/%s_%s.jpg' % (country, country, i))

    results = np.zeros((300, 300, 3))

    for color in range(0, images_stacked.shape[3]):
        for x_pos in range(0, images_stacked.shape[1]):
            for y_pos in range(0, images_stacked.shape[2]):
                results[x_pos, y_pos, color] = np.mean(images_stacked[:, x_pos, y_pos, color])

    imageio.imwrite('images-output/%s_naive_averager.png' % country, results)

There is room for improving on this code, but I am not interested in that, rather on the actual results. And here is how they show:

Sweden Naive Average Face

Let’s look at another example:

Brazil Naive Average Face

Even though images are standard, averaging the pixel values doesn’t build a compeling image; sure, we can distinguish some features such as color of the t-shirt or hair, but we can’t put a face to it.

Face morpher

Luckily, there are better ways. Meet Face Morpher (FM from now own). FM works in a different way to find the average face, instead of naively averaging the pixel values, it builds a geometry of the face by identifying elements on it such as the eyes. It then proceeds to average those sections across images.

The results are much more compeling, and we are definitely able to put a face to this teams now.

Sweden FM Average Face

Or in the case of Brazil:

It looks great! I love it! And yet, aren’t we missing important and distinctive elements, such as the hair, ears, or even team t-shirt?

Face art

I decided to them combine the two images: naive + FM for a more compeling result. I blended the images using Sketch, used the naive image as background, and overlayed a semi-transparent FM face.

Quite happy with the result:

World Cup 2018 Average Player Face by Team
World Cup 2018 Average Player Face by Team

Extra

How does the average face look across all countries then?

Average face across all countries
Average face across all countries

Tennis API

I published a tennis API on https://market.mashape.com/f-sm-jorquera/tennis-database.

This API has methods to query all tennis matches on the ATP since 1968,  along with players and qualifications. Dynamic filtering, sorting and pagination of the results are allowed.

Example request to get matches played on Clay that lasted over 350 minutes (almost 6 hours!):

curl --get --include 'https://f-sm-jorquera-tennis-database-v1.p.mashape.com/atp_match/?max_results=10&page=1&sort=tourney_date&where={"minutes":">350", "surface": "Clay"}' \
  -H 'X-Mashape-Key: {mashape_key}' \
  -H 'Accept: application/json
{
  "_items": [
    {
      "id": 125145,
      "tourney_id": "2004-520",
      "tourney_name": "Roland Garros",
      "surface": "Clay",
      "draw_size": 128,
      "tourney_level": "G",
      "tourney_date": "2004-05-24",
      "match_num": 61,
      "winner_id": 102148,
      "winner_seed": 0,
      "winner_entry": "",
      "winner_name": "Fabrice Santoro",
      "winner_hand": "R",
      "winner_ht": 178,
      "winner_ioc": "FRA",
      "winner_age": 31.4552,
      "winner_rank": 58,
      "winner_rank_points": 662,
      "loser_id": 103096,
      "loser_seed": 32,
      "loser_entry": "",
      "loser_name": "Arnaud Clement",
      "loser_hand": "R",
      "loser_ht": 173,
      "loser_ioc": "FRA",
      "loser_age": 26.4339,
      "loser_rank": 33,
      "loser_rank_points": 1005,
      "score": "6-4 6-3 6-7(5) 3-6 16-14",
      "best_of": 5,
      "round": "R128",
      "minutes": 393,
      "winner_ace": "22",
      "w_df": 2,
      "w_svpt": 229,
      "w_1stIn": 108,
      "w_1stWon": 77,
      "w_2ndWon": 60,
      "w_SvGms": 36,
      "w_bpSaved": 13,
      "w_bpFaced": 22,
      "l_ace": 10,
      "l_df": 9,
      "l_svpt": 234,
      "l_1stIn": 121,
      "l_1stWon": 84,
      "l_2ndWon": 52,
      "l_SvGms": 34,
      "l_bpSaved": 17,
      "l_bpFaced": 27,
      "_created": "Thu, 01 Jan 1970 00:00:00 GMT",
      "_updated": "Thu, 01 Jan 1970 00:00:00 GMT",
      "_etag": "d6e3dcd73f0a27191ce602f0d537e697a022e690",
      "_links": {
        "self": {
          "title": "Atp_match",
          "href": "atp_match/125145"
        }
      }
    },
    {
      "id": 164617,
      "tourney_id": "2017-0308",
      "tourney_name": "Munich",
      "surface": "Clay",
      "draw_size": 32,
      "tourney_level": "A",
      "tourney_date": "2017-05-01",
      "match_num": 297,
      "winner_id": 111202,
      "winner_seed": 0,
      "winner_entry": "",
      "winner_name": "Hyeon Chung",
      "winner_hand": "R",
      "winner_ht": 0,
      "winner_ioc": "KOR",
      "winner_age": 20.95,
      "winner_rank": 78,
      "winner_rank_points": 659,
      "loser_id": 105373,
      "loser_seed": 0,
      "loser_entry": "",
      "loser_name": "Martin Klizan",
      "loser_hand": "L",
      "loser_ht": 190,
      "loser_ioc": "SVK",
      "loser_age": 27.8056,
      "loser_rank": 53,
      "loser_rank_points": 880,
      "score": "6-4 3-6 6-2",
      "best_of": 3,
      "round": "QF",
      "minutes": 987,
      "winner_ace": "4",
      "w_df": 2,
      "w_svpt": 81,
      "w_1stIn": 47,
      "w_1stWon": 36,
      "w_2ndWon": 15,
      "w_SvGms": 13,
      "w_bpSaved": 4,
      "w_bpFaced": 6,
      "l_ace": 8,
      "l_df": 4,
      "l_svpt": 95,
      "l_1stIn": 49,
      "l_1stWon": 28,
      "l_2ndWon": 27,
      "l_SvGms": 14,
      "l_bpSaved": 8,
      "l_bpFaced": 12,
      "_created": "Thu, 01 Jan 1970 00:00:00 GMT",
      "_updated": "Thu, 01 Jan 1970 00:00:00 GMT",
      "_etag": "3e4d5985b601dad5b0229b881b6a0fe52aa06af0",
      "_links": {
        "self": {
          "title": "Atp_match",
          "href": "atp_match/164617"
        }
      }
    }
  ]
}

 

Tablifyme API

I created an API that creates a text table from a JSON array. Useful to send tables of data over email, Slack or Markdown.

The API is available on Mashape: https://market.mashape.com/f-sm-jorquera/tablifyme

Example:

curl -X POST --include 'https://tablifyme.p.mashape.com/tablifyme' \
-H 'X-Mashape-Key: My-Secret-key' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
--data-binary '{"body":[[1,2,3],[4,"test",6]],"header":["Y","A","B"]}'

Result:

+-----+------+-----+
|  Y  |  A   |  B  |
+=====+======+=====+
|  1  |  2   |  3  |
+-----+------+-----+
|  4  | test |  6  |
+-----+------+-----+

 

 

Choropleth: percentage of foreigners by US county

Building maps is very easy with Folium. I got my hands on some data from the US Census, specifically, the foreign born population and total population per US county.

To plot it, first download the CSV from the  2011-2015 American Community Survey 5-Year Estimates.

import os
import folium
import pandas as pd

# Load data
data = pd.read_csv('data.csv')

# Add column with ratio of foreign born vs total pop.
data['ratio_foreign_born_vs_total_population'] = 100 * data['Estimate; Foreign born:'] / data['Estimate; Total:']

# Create map
map_1 = folium.Map(location=[39, -96], zoom_start=4)
high_res_county_geo = os.path.relpath('gz_2010_us_050_00_500k.json') # from http://eric.clst.org/Stuff/USGeoJSON

# Add choropleth layer
map_1.choropleth(
 geo_data=high_res_county_geo,
 name='choropleth',
 data=data,
 columns=['Id', 'ratio_foreign_born_vs_total_population'],
 key_on='properties.GEO_ID',
 fill_color='YlGnBu',
 fill_opacity=0.8,
 line_opacity=0.05,
 legend_name='Percentage Foreigners(%)',
 threshold_scale=[0, 5, 10, 15, 20, 25]
)

folium.LayerControl().add_to(map_1)

# Save as index.html
map_1.save('index.html')

The result:

Redshift UDF Phone Number to Country

Redshift’s UDFs (User Defined Functions) permit to execute, with some limitations, certain Python libraries and custom code.

In my case, I wanted to find a way to extract the country code from a phone number in E.164 format. UDFs are a perfect fit for this, the implementation in SQL would most certainly require creating custom views, and hacking your way around, while in python, we can use the library phone-iso3166

>>> from phone_iso3166.country import *
>>> phone_country('+1 202-111-1111')
'US'
>>> phone_country('+34645678901')
'ES'

To upload a library to Redshift, we first need to check it follows the structure:

directory/
    __init__.py
    extra_files.py
    subdirectory/
        __init__.py
        other_files.py

In our case, phone-iso3166 is already in that structure. Now we need to zip the library:

tar -xvzf phone-iso3166.tar.gz
zip -r phone-iso3166.zip phone-iso3166

With our zipped library, we need to upload it to S3. I did this part manually into a bucket named s3://redshift/custom-udfs/

Now, connect to Redshift and issue:

CREATE LIBRARY phone_iso3166 LANGUAGE plpythonu 
FROM 's3://redshift/custom-udfs/phone_iso3166.zip' 
CREDENTIALS 'aws_access_key_id=<your-aws-key>;aws_secret_access_key=<your-aws-pass>' 
region as '<your-region>';

Last step:

CREATE OR REPLACE FUNCTION udf_phone_country (phone_number VARCHAR(64)) RETURNS VARCHAR(64) IMMUTABLE as $$ 
from phone_iso3166.country import phone_country, InvalidPhone
try:
    return phone_country(phone_number)
except:
    return None
$$ LANGUAGE plpythonu;

You should be all set to use this in your queries:

SELECT 
analytics.udf_phone_country('+14151111975'),
analytics.udf_phone_country('+7652112311'),
analytics.udf_phone_country('+34626472918')

That returns:

"US","RU","ES"

 

Redshift Force Drop User

Ever tried dropping a user in Redshift only to discover that user “user_1” cannot be dropped because the user has a privilege on some object.

That’s not of great help Redshift. Luckily for us, there is a solution. Kamlesh Gallani posted a response about revoking permissions for tables and schemas that I user might still be assigned to, along with changing the ownership of tables that the user might have created. After that, dropping the user is straightforward.

I created a python snippet with his response that might save you a few minutes. You will need the view v_get_obj_priv_by_user from amazon-utils, simply create the view on your Redshift cluster, then, copy and paste this Python script into your favorite editor (e.g. Sublime Text) fill in the connection information and the list of users to drop and execute it.

import psycopg2
 
# Connect to Redshift
conn = psycopg2.connect(host="", dbname="", user="", password="", port="5439")
conn.autocommit = True
cursor = conn.cursor()
 
# List of users to drop
users = ['user_to_drop_1', 'user_to_drop_2']
 
# New owner, used when changing ownership
new_owner = 'new_user'
 
# Templates to change ownership, revoke permissions, and drop users
change_ownership = "select 'alter table '+schemaname+'.'+tablename+' owner to %s;' from pg_tables where tableowner like '%s'"
revoke_schema_permissions = "select distinct 'revoke all on schema '+schemaname+' from %s;' from admin.v_get_obj_priv_by_user where usename like '%s'"
revoke_table_permissions = "select distinct 'revoke all on all tables in schema '+schemaname+' from %s;' from admin.v_get_obj_priv_by_user where usename like '%s'"
drop_user = "drop user %s;"
 
for user in users:
    # Change ownership
    cursor.execute(change_ownership % (new_owner, user))
    for r in cursor.fetchall():
        print("Executing: %s" % r[0])
        cursor.execute(r[0])
    # Revoke schema permissions
    cursor.execute(revoke_schema_permissions % (user, user))
    for r in cursor.fetchall():
        print("Executing: %s" % r[0])
        cursor.execute(r[0])
    # Revoke table permissions
    cursor.execute(revoke_table_permissions % (user, user))
    for r in cursor.fetchall():
        print("Executing: %s" % r[0])
        cursor.execute(r[0])
    # Drop user
    cursor.execute(drop_user % (user))

SET Game Visual Solver

For those of you that have never played SET¬ģ before, SET¬ģ is a fast-paced game involving 81 unique cards with different combinations of four characteristics (color [red, green, or purple], shape [diamond, oval, or squiggle], shading [hollow, shaded, or filled], and number [1, 2, or 3]). The point of the game is to find as many 3-card “sets” as possible. If the three cards are either all the same or all different in each of the four characteristics, you know you’ve found a set. Twelve cards are presented on the table at a time, and are continually replenished when a set is taken out, until no cards remain in the deck. Sometimes, however, there are no sets in the twelve cards, in which case 3 more cards are dealt. But, how do you know if there really is no set? I aim to solve this problem using image processing software and eventually create a set recognition mobile application.

So, what determines a set? Let’s take the example¬†below¬†(although it is probably one of the most difficult sets in the game). Looking at the cards in terms of the four characteristics, we have the following: color – red, green, and purple (all different); shape – diamond, oval, and squiggle (all different); shading – filled, hollow, shaded (all different); number – 1, 2, 3 (all different). If the cards were exactly the same, but all red, this would also be considered a set, because, in terms of color, they are all the same.

 

There’s the simplified breakdown of how I used an image processing algorithm to determine sets from ordinary jpeg¬†images:

Step 1: Convert the image to black and white and invert the image to calculate the number of shapes per card

Step 2: Compare the pixel surface area of one of the symbols on each card to determine the shape

Step 3: Normalize the RGB color of the images with an equation to figure out the color

Step 4: Convert the images to gray scale and apply a border detection to determine the shading

Step 5: Run the simple algorithm that deduces the sets

For more details, check out the paper I wrote.

BlackJack Hitting vs. Standing chances

Who has never thought about beating the house in a Casino? As the MIT students said in their documentary Breaking Vegas, it’s not all about the money; but about the feeling of beating a huge corporation in their own game.

It’s very easy to find online the basic strategy for BlackJack, i.e., the correct decision when the only information that you have about the game are your cards, and the dealers facing card. Understanding by correct decision, the one that would minimize your losing chances. Even with this basic strategy the odds against the house are negative, but greatly reduced to a mere -3 $ per 100$ played (aprox).¬†Of course the Breaking Vegas students didn’t just played basic strategy, they also counted cards and played in a team not to raise any attention from the Big Brother.

For this post, I wrote a small Python notebook that computes what are the chances for the different combinations of cards and strategies in their most simplified way, standing and hitting.

The full notebook: https://github.com/Koff/blackjack/blob/master/blackjack_simulation.ipynb

And the results in two images:

 

On the Y-Axis, the starting score of the player, and on the X-Axis, the dealer’s facing up card (11 being the Ace). The color and values represent the chances of winning if the following the strategy.