Model vs. Algorithm

As discussion about Artificial Intelligence has become a mainstream topic, we hear a lot of terms used loosely. One key example is model versus algorithm, with everything typically being called an algorithm (because it sounds more scientific?). In fact, when we are talking about AI in applications (in society) we are almost always talking about models, not about algorithms. A model is a stylized way to describe a relation between two things. Just like a London supermodel is a stylized way to show clothes.

Using Days of the Week to Understand Modulo

Today is Monday, January 1st, 2024; what day of the week will it be in 7 days? That is pretty simple: a week has seven days, so seven days from now exactly one week will have passed, and it will again be a Monday. That's correct. Now let's make this slightly harder: what day of the week will it be in 14 days? That is also pretty simple: a week has seven days, 2 times 7 is 14, so in 14 days exactly 2 weeks will have passed, and it will again be a Monday.
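Since 7 and 14 are multiples of 7, only the remainder after dividing by 7 determines the day; this is exactly the modulo operation. A minimal sketch in R (my own illustration, not code from the post):

{% highlight r %}
# day-of-week arithmetic is modulo-7 arithmetic
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")

# day of the week n days after a Monday (%% is R's modulo operator)
day_in <- function(n) days[(n %% 7) + 1]

day_in(7)   # "Monday"
day_in(14)  # "Monday"
day_in(10)  # "Thursday"
{% endhighlight %}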

Computing on Encrypted Data - What? Why? Toy example

Encryption is typically used to protect information such as written communication from eavesdropping, for example messages sent over the internet. However, encryption can also be used to protect data, and with homomorphic encryption the data can even be computed upon while encrypted: the results remain encrypted (but valid), and can be decrypted again by the original data owner/supplier. A couple of years ago I was skiing on Mont Blanc, I had a bad fall and an X-ray was made. This being a mountain village (Chamonix), there was no medical doctor available, so a veterinarian (there are lots of cows in those mountains) took it. A veterinarian may be able to operate the X-ray machine, but they cannot interpret the photograph (all true up to here).

CKKS encode encrypt in R

This blog post shows how to perform CKKS encoding and encryption, followed by decryption and decoding, to obtain the original vector of complex numbers. The code uses the R package polynom for polynomials. It also uses the HomomorphicEncryption package. If you are reading this before 30 December 2023, you need to install the development version of HomomorphicEncryption from GitHub. If you are reading it on or after 30 December 2023, you can install it from CRAN (make sure it is up to date).

libactivation on PyPI

My new Python package libactivation is now on the Python Package Index (PyPI): https://pypi.org/project/libactivation/ The package implements a series of activation functions - sigmoidal and others, i.a. the Rectified Linear Unit (ReLU) - as well as their derivatives, for various machine learning purposes, such as neural networks. Development takes place on GitHub: https://github.com/bquast/libactivation Bugs can also be filed on GitHub: https://github.com/bquast/libactivation/issues Much of the inspiration came from my 2015 R package sigmoid, which was split out of my Recurrent Neural Network framework RNN:
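As a flavour of what such a package provides, here is a minimal base-R sketch of two activation functions and their derivatives (illustrative only; the names need not match the libactivation API):

{% highlight r %}
sigmoid       <- function(x) 1 / (1 + exp(-x))
sigmoid_prime <- function(x) sigmoid(x) * (1 - sigmoid(x))
relu          <- function(x) pmax(x, 0)
relu_prime    <- function(x) as.numeric(x > 0)

sigmoid(0)      # 0.5
relu(c(-2, 3))  # 0 3
{% endhighlight %}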

attention package on CRAN

The attention R package, describing how to implement from scratch the attention mechanism - which forms the basis of transformers - in the R language, is now available on CRAN. A key example of the results that were achieved using (much larger and more complex forms of) transformers is the change from AlphaFold (1) (which relied primarily on LSTM) to AlphaFold2 (which is primarily based on transformers). This change pushed the results in the protein folding competition CASP-14 to a level of accuracy that made protein structure prediction accurate enough for practical purposes. A major scientific breakthrough, the impact of which can barely be overstated.

Self-Attention from Scratch in R

EDIT 2022-06-24: this code is now available (with helper functions) in the R package attention, which is on CRAN. You can install it simply using: install.packages('attention') See also my blog post attention on CRAN. The development takes place on GitHub. This post describes how to implement the attention mechanism - which forms the basis of transformers - in the R language. The code is translated from the Python original by Stefania Cristina (University of Malta) in her post The Attention Mechanism from Scratch
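The core of the mechanism is small; here is a minimal scaled dot-product attention sketch in base R (my own condensed version, not the package's code):

{% highlight r %}
softmax <- function(x) exp(x) / sum(exp(x))

# scaled dot-product attention: softmax(Q K' / sqrt(d)) V
attention <- function(Q, K, V) {
  scores  <- Q %*% t(K) / sqrt(ncol(K))
  weights <- t(apply(scores, 1, softmax))  # row-wise softmax
  weights %*% V
}
{% endhighlight %}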

Online Office Hours

With over a year of working from home and an end not immediately in sight, I felt it was time to think a bit structurally about how to work remotely as effectively as possible. The clear missing element is the watercooler conversations / coffees at the cafeteria. Universities have, to some extent, always had to deal with faculty not having a default schedule for being at their desks, which would normally make dropping in easy. The way this is typically dealt with is by holding office hours at a set time every week (when not traveling). For the person holding office hours, when nobody comes by, this is normally a good time to get done with some paperwork that otherwise gets forgotten.

Ron Graham's Game

For a job interview at the WHO I was asked to build a numeric version of Noughts and Crosses (Tic-Tac-Toe to some), called Ron Graham’s Game (repo). Ron Graham’s Game is a numerical variant of Noughts and Crosses / Tic-Tac-Toe. In the general form, the board is a square matrix of length L >= 3, Player 1 has stones for all the odd numbers in the range 1:L^2, Player 2 has stones for all the even numbers in the range 2:(L^2-1), and a player wins by completing a row/column/diagonal whose sum equals sum(1:L^2)/L, i.e. L(L^2+1)/2 (15 on the classic 3x3 board).
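The magic-sum win condition can be sketched in a few lines of R (my own illustration, not the repo's code):

{% highlight r %}
# a row/column/diagonal wins when it is full and sums to the magic constant
magic_sum <- function(L) sum(1:L^2) / L          # equals L*(L^2+1)/2
line_wins <- function(line, L) !any(is.na(line)) && sum(line) == magic_sum(L)

magic_sum(3)              # 15
line_wins(c(2, 9, 4), 3)  # TRUE
{% endhighlight %}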

Tech Learn Talks

Today I gave a presentation at the UN Innovation Network’s TechLearnTalks (archived, backup). The slides from my presentation are available here: https://docs.google.com/presentation/d/1qDtY8jrMnDz3tGpqg-AvgIBB5iiY54Jko6lMDZP7c5o/ The live demo spreadsheet is available here: https://docs.google.com/spreadsheets/d/1j1dXgZ_9RzvBdKFyA1Goii6IzPniNWSLs14D-kN_RNo/ EDIT (2020-06-11): due to popular demand I am turning the spreadsheet into a somewhat more formal product: http://spreadsheet.network/ - a neural network in a spreadsheet, with an FAQ. Paper to follow.

homomorphic encryption in R

Homomorphic encryption allows computations to be performed on encrypted data. This has enormous potential in areas of machine learning that deal with private data, such as medical records. Below is an implementation of homomorphic encryption in R. It encrypts two pieces of data, m=10 and m1=2; once they are encrypted (as cipher and cipher2 respectively), the two encrypted forms can be added together (cyphertotal). They can then be decrypted to reveal mess2 to equal 12 (i.e. the sum of 10 and 2).
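To give a feel for the idea (and only the idea - this is NOT the scheme implemented in the post), here is a toy additively homomorphic construction in which encryption adds a secret key modulo n, so that adding ciphertexts adds the underlying plaintexts:

{% highlight r %}
n   <- 1e6      # public modulus (toy sizes, for illustration only)
key <- 12345    # secret key

enc <- function(m) (m + key) %% n
# each ciphertext carries one copy of the key, so subtract it once per ciphertext
dec_sum <- function(ciph, n_ciphertexts) (ciph - n_ciphertexts * key) %% n

cipher      <- enc(10)
cipher2     <- enc(2)
ciphertotal <- (cipher + cipher2) %% n
dec_sum(ciphertotal, 2)  # 12
{% endhighlight %}

Real homomorphic schemes (Paillier, BGV, CKKS, ...) obtain the same additive property with actual cryptographic security.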

Compiling TensorFlow on Arch Linux

Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA The above notification keeps popping up whenever you use TensorFlow, to remind you that your models could be training faster if you used binaries compiled with the right configuration. When TensorFlow first came out, it became available to Arch Linux users as a package in the Arch User Repository (AUR), meaning that it was compiled on your local system.

Promoting Content in Africa

In the keynote at the African Peering and Interconnection Forum (AfPIF) 2016 I presented the Promoting Content in Africa report, written together with Michael Kende. These are the slides, also available here: https://drive.google.com/file/d/11uHhY4yLrvBbmf4N2aZm0SE1aUyoDhWt/ The recording is available here (archived, backup). Blog posts: the report is accompanied by a blog post on local content creation and a blog post on local content availability, each of which summarises one of the chapters of the report.

Making the Next Billion Demand Access

The Local-Content Effect of google.co.za in Setswana. I presented this paper at EEA 2016 in Geneva. Abstract: This paper shows that an exogenous increase in the accessibility of local-language content leads to an increase in demand for internet connectivity among native speakers. Internet connectivity provides enormous improvements in quality of life as well as opportunities for the newly connected, yet recent attempts to connect the current 'next billion' in places such as sub-Saharan Africa have not met expectations. In places where infrastructure has come online and prices have gone down, the expected consequent increase in usage was not observed. The introduction of the Setswana language on the South African Google Search website was a spillover of the Botswana Google Search website being translated from English to Setswana. This exogenous improvement in the accessibility of Setswana-language content has resulted in a substantial increase in the number of native Setswana speakers coming online and owning personal computers. It has also led to increased usage of the Setswana language online, creating a positive feedback loop. This suggests that connecting the fourth billion will require a greater focus on the demand side of connectivity, specifically by means of local content.

sigmoid package

The sigmoid package makes it easy to become familiar with the way neural networks work by demonstrating the key concepts using straightforward code examples. Installation: the package can now be installed from CRAN using: {% highlight r %} install.packages('sigmoid') # case sensitive! {% endhighlight %} Usage: after installation, the package can be loaded using: {% highlight r %} library(sigmoid) {% endhighlight %} For information on using the package, please refer to the help files.

Handcoding a Difference in Differences

In this post we will discuss how to manually implement a Difference-in-Differences (DiD) estimator in R, using simulated data. {% highlight r %}
# reproducible random numbers
set.seed(123)
# untreated and treated independent variable for period 0
xutr <- rnorm(1000, mean=5)
xtr  <- rnorm(1000, mean=1)
# create a data.frame with the dep. var., indep. var., time and id vars for period 0
dfutr <- data.frame(time = 0, id =    1:1000, y = xutr + 15 + rnorm(1000), x = xutr)
dftr  <- data.frame(time = 0, id = 1001:2000, y = xtr  +  9 + rnorm(1000), x = xtr)
df0   <- rbind(dfutr, dftr)
{% endhighlight %}
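The DiD estimate itself is just a double difference of group means. A self-contained sketch (my own continuation of the simulation idea, with an assumed common time trend of 1 and treatment effect of 2):

{% highlight r %}
set.seed(123)
y_untreated_0 <- rnorm(1000, mean = 20)  # untreated group, period 0
y_treated_0   <- rnorm(1000, mean = 10)  # treated group,   period 0
y_untreated_1 <- rnorm(1000, mean = 21)  # untreated group, period 1 (trend of 1)
y_treated_1   <- rnorm(1000, mean = 13)  # treated group,   period 1 (trend 1 + effect 2)

# difference-in-differences: (treated change) - (untreated change)
did <- (mean(y_treated_1) - mean(y_treated_0)) -
       (mean(y_untreated_1) - mean(y_untreated_0))
did  # approximately 2
{% endhighlight %}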

Handcoding a Panel Model

The most basic panel estimation is the Pooled OLS model; this model combines all data across indices and performs a regular Ordinary Least Squares estimation. {% highlight r %}
# load the plm library for panel estimation
library(plm)
# load the Crime data set
data(Crime)
{% endhighlight %} {% highlight r %}
# define the model
m1 <- formula(crmrte ~ prbarr + prbconv + polpc)
# create a panel data.frame (pdata.frame) object
PanelCrime <- pdata.frame(Crime, index = c("county", "year"))
{% endhighlight %}
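Pooled OLS simply stacks all county-year observations and runs an ordinary regression, so (with plm installed) its coefficients coincide with those from lm() - a sketch of the equivalence, assuming the Crime data ships with your version of plm:

{% highlight r %}
library(plm)
data(Crime)

pooled <- plm(crmrte ~ prbarr + prbconv + polpc, data = Crime,
              index = c("county", "year"), model = "pooling")
ols <- lm(crmrte ~ prbarr + prbconv + polpc, data = Crime)

all.equal(unname(coef(pooled)), unname(coef(ols)))  # TRUE
{% endhighlight %}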

Hand Coding Instrumental Variables

In a previous post we discussed the linear model and how to write a function that performs a linear regression. In this post we will use that linear model function to perform a [Two-Stage Least Squares estimation]. This estimation allows us to […] Recall that we built the following linear model function. {% highlight r %}
ols <- function(y, X, intercept=TRUE) {
  if (intercept) X <- cbind(1, X)
  solve(t(X) %*% X) %*% t(X) %*% y  # solve for beta
}
{% endhighlight %}
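A minimal two-stage least squares sketch built on that ols() function (my own illustration of the technique, with a single endogenous regressor and a single instrument):

{% highlight r %}
ols <- function(y, X, intercept=TRUE) {
  if (intercept) X <- cbind(1, X)
  solve(t(X) %*% X) %*% t(X) %*% y
}

tsls <- function(y, x, z) {
  b1   <- ols(x, z)           # first stage: endogenous x on instrument z
  xhat <- cbind(1, z) %*% b1  # fitted (exogenous) part of x
  ols(y, xhat)                # second stage: y on the fitted values
}

# simulated check: u makes x endogenous, z is a valid instrument
set.seed(1)
n <- 10000
z <- rnorm(n); u <- rnorm(n)
x <- z + u + rnorm(n)
y <- 1 + 2 * x + u + rnorm(n)
tsls(y, x, z)[2]  # close to the true coefficient of 2
{% endhighlight %}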

Linear Model and Neural Network

In this short post I want to quickly demonstrate how the most basic neural network (no hidden layer) gives us the same results as the linear model. First we need data. {% highlight r %} data(swiss) str(swiss) {% endhighlight %} {% highlight text %}
'data.frame': 47 obs. of 6 variables:
 $ Fertility       : num 80.2 83.1 92.5 85.8 76.9 76.1 83.8 92.4 82.4 82.9 ...
 $ Agriculture     : num 17 45.1 39.7 36.5 43.5 35.3 70.2 67.8 53.3 45.2 ...
 $ Examination     : int 15 6 5 12 17 9 16 14 12 16 ...
 $ Education       : int 12 9 5 7 15 7 7 8 7 13 ...
 $ Catholic        : num 9.96 84.84 93.4 33.77 5.16 ...
 $ Infant.Mortality: num 22.2 22.2 20.2 20.3 20.6 26.6 23.6 24.9 21 24.4 ...
{% endhighlight %}
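The point can be checked directly: a network with no hidden layer is a linear map, so its least-squares solution is the OLS solution. A sketch (my own code, not the post's):

{% highlight r %}
data(swiss)
X <- cbind(1, as.matrix(swiss[, -1]))  # intercept plus the five predictors
y <- swiss$Fertility

# normal equations: the "weights" of the linear network
beta <- solve(t(X) %*% X) %*% t(X) %*% y

all.equal(unname(drop(beta)),
          unname(coef(lm(Fertility ~ ., data = swiss))))  # TRUE
{% endhighlight %}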

Hand Coding Hinton's Dropout

Andrew Trask wrote an amazing post at I am Trask called A Neural Network in 11 lines of Python. In the post Hand Coding a Neural Network I translated the Python code into R. In a follow-up post, A Neural Network in 13 lines of Python, Andrew shows how to improve the network with optimisation through gradient descent. The third post, Hinton's Dropout in 3 Lines of Python, explains a feature called dropout. The R version of the code is posted below.

Hand Coding Gradient Descent

Andrew Trask wrote an amazing post at I am Trask called A Neural Network in 11 lines of Python. In the post Hand Coding a Neural Network I translated the Python code into R. In a follow-up post, A Neural Network in 13 lines of Python, Andrew shows how to improve the network with optimisation through gradient descent. Below I have translated the original Python code used in the post to R. The original post has an excellent explanation of what each line does. I have tried to stay as close to the original code as possible; all lines and comments correspond directly to the original code.
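The idea of gradient descent in its smallest possible form (my own toy example, not the network from the post): repeatedly step against the gradient until the minimum is reached.

{% highlight r %}
# minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w  <- 0     # starting point
lr <- 0.1   # learning rate

for (i in 1:100) w <- w - lr * 2 * (w - 3)
w  # converges to the minimiser, 3
{% endhighlight %}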

WIOD data sets package

The wiod package is now available on CRAN. The package contains the complete WIOD data sets, in a format compatible with the decompr and gvc packages. Installation: the package can be installed using: {% highlight r %} install.packages('wiod') # case sensitive! {% endhighlight %} Usage: following installation, the package can be loaded using: {% highlight r %} library(wiod) {% endhighlight %} Data can be loaded using the data() function, with wiod followed by the last two digits of the required year as the argument, e.g.

introducing diagonals

A new R package diagonals is now available on CRAN. The package implements several tools for dealing with fat diagonals on matrices, such as this one: {% highlight text %}
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    1    1    1    1    0    0    0    0    0     0     0     0
 [2,]    1    1    1    1    0    0    0    0    0     0     0     0
 [3,]    1    1    1    1    0    0    0    0    0     0     0     0
 [4,]    1    1    1    1    0    0    0    0    0     0     0     0
 [5,]    0    0    0    0    1    1    1    1    0     0     0     0
 [6,]    0    0    0    0    1    1    1    1    0     0     0     0
 [7,]    0    0    0    0    1    1    1    1    0     0     0     0
 [8,]    0    0    0    0    1    1    1    1    0     0     0     0
 [9,]    0    0    0    0    0    0    0    0    1     1     1     1
[10,]    0    0    0    0    0    0    0    0    1     1     1     1
[11,]    0    0    0    0    0    0    0    0    1     1     1     1
[12,]    0    0    0    0    0    0    0    0    1     1     1     1
{% endhighlight %}
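In base R such a fat diagonal can be constructed with a Kronecker product (an illustration of the structure; the diagonals package provides its own helpers):

{% highlight r %}
# 3 blocks of 4x4 ones along the diagonal
fat <- kronecker(diag(3), matrix(1, 4, 4))

dim(fat)   # 12 12
fat[1, 4]  # 1 (inside the first block)
fat[1, 5]  # 0 (outside it)
{% endhighlight %}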

plot.ly

Quick experiment on embedding plot.ly graphics. {% highlight r %} library(ggplot2) library(plotly) {% endhighlight %} Basic plotly: {% highlight r %} plot_ly(iris, x = Petal.Length, y = Petal.Width, color = Species, mode = "markers") {% endhighlight %} {% highlight text %} Error in html_screenshot(x): Please install the webshot package (if not on CRAN, try devtools::install_github("wch/webshot")) {% endhighlight %} Now using ggplot2: {% highlight r %} ggiris <- qplot(Petal.Width, Sepal.Length, data = iris, color = Species) ggplotly(ggiris) {% endhighlight %}

Male/Female Bargaining Power and Child Growth

Increased male bargaining power in households causes greater expenditure on food, an improvement in Weight-for-Age Z-scores in young children, and a deterioration in Height-for-Age Z-scores in very young children, as observed in the context of South Africa's 2010 state pension expansion for males. In 2010 the male eligibility age for the South African state pension was brought to par with the female eligibility age (60, previously 65). I exploit this policy change in order to estimate the effect of increased male bargaining power in the household on the growth of young children living in the same household, as well as on food expenditure. The policy change took place shortly after the completion of the first wave of South Africa's National Income Dynamics Survey and shortly before the start of the second wave, which lends itself well to a Difference-in-Differences approach on the right-hand side. On the left-hand side I use z-scores of growth anthropometrics of young children in the household (against WHO standards) as well as food expenditure.

gvc package on CRAN

A new R package gvc is now available on CRAN. The package implements several global value chain indicators: Importing to Export (i2e(), a.k.a. vertical specialization), Exporting to Re-export (e2r()), and New Revealed Comparative Advantage (nrca()), as well as several other tools. The gvc package can now be installed directly from R using: {% highlight r %} install.packages("gvc") {% endhighlight %} In addition to this, a development version is available on GitHub; this version is to be used at your own risk, and can be installed using:

decompr on CRAN

I am proud to announce that after a few emails back and forth with Prof. Brian Ripley, which consisted mostly of me apologising for not following the proper procedure for submission, I received an email announcing that my decompr package is now available on CRAN. The package can now easily be installed using: {% highlight r linenos %} install.packages("decompr") {% endhighlight %} The published version contains several updates; most importantly, I used a regional input-output table from the WIOD project, which is substantially smaller and makes the decompositions significantly faster.

Data Science Specialisation

Yesterday the Johns Hopkins School of Public Health published a post about their Data Science Specialisation on the online MOOC platform Coursera. The post mentions the first batch of 266 students finishing the specialisation (among them, me :-) ). In total more than 800,000 people have registered for one of the courses, of which 14,000 finished at least one. The Specialisation: the Data Science Specialisation consists of nine courses and a capstone project (which was announced, but is yet to open for registration). The courses are:

Replicable Development Economics

The tagline of this blog says something about replicable development economics using R and git. So far, I have posted gimmicks on new R tools such as shiny, rmarkdown, and my own package. I have also posted on how to use Git, GitHub, and Jekyll to write a website/blog. However, I have never brought the two together and shown how this feeds into creating replicable research. In this post I will briefly describe what Git and R are, and how I use them for my work. I hope to post something tomorrow about useful resources for mastering both these tools (tomorrow's post).

A jekyll blog

What are jekyll, markdown, and git(hub)? And why would you need all of this for a blog, instead of a simple Blogspot or Wordpress page? The short answer is more control: by having fewer and more transparent layers, you retain more control over the content and layout of your blog. Since launching this blog last week, I have received a number of questions about how to set up something similar. Below I briefly describe the main steps for setting up a jekyll blog, and tomorrow I will go into the details of how I customised this one.

The decompr package

I am proud to announce the beta version of the decompr R package. The package implements export decomposition using the Wang-Wei-Zhu (Wang, Wei, and Zhu 2013) and Kung-Fu (Mehrotra, Kung, and Grosky 1990) algorithms. It comes with a sample data set from the WIOD project, and has its own mini site. Update: the decompr package is now available on CRAN, as also announced in this post. Inputs: the package uses Inter-Country Input-Output (ICIO) tables, such as the World Input Output Database (Timmer et al. 2012).

ggvis, shiny, and HTML5 slides

ggvis is a wonderful new tool for creating interactive graphics, which was built with Shiny apps in mind. In this post I will go over how you can create a Shiny app using ggvis and incorporate the 'app' in an rmarkdown slideshow (interactively). Sepal-Modeling is a shiny app (repo) which uses ggvis to fit LOESS smoothers on the sepal ratios of the iris dataset. There are separate smoothers for every species, as well as a general smoother for all observations. The span can be adjusted in order to see if we need to model the sepal ratio per species or if we can just model it jointly.

Hand Coding a Linear Model function

In yesterday's post we developed a method for constructing a multivariate linear model with an intercept. Today we will turn the collection of loose commands into an integrated and easy-to-use function. A small recap from yesterday: we start by loading data and assigning our variables to objects. {% highlight r %} data(iris) x1 <- iris$Petal.Width x2 <- iris$Sepal.Length y <- iris$Petal.Length {% endhighlight %} We now construct our linear model; the fastest way of doing this is using the QR decomposition.
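The QR route can be sketched as follows (my own condensed version; the post's function may differ in details):

{% highlight r %}
data(iris)
x1 <- iris$Petal.Width
x2 <- iris$Sepal.Length
y  <- iris$Petal.Length

X    <- cbind(1, x1, x2)   # design matrix with intercept
beta <- qr.coef(qr(X), y)  # least squares via the QR decomposition

all.equal(unname(beta), unname(coef(lm(y ~ x1 + x2))))  # TRUE
{% endhighlight %}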

Hand Coding Categorical Variables

In last week's posts we discussed hand-coding a linear model and writing a convenient function for it; in today's post we will take this a step further by including a categorical variable. Swiss life: since I live in Geneva, we will use a built-in data set that is close to home. {% highlight r %} data("swiss") {% endhighlight %} This data set compares fertility rates in 47 different French-speaking regions (sub-Cantonal) of Switzerland around the year 1888 (for more information see help("swiss")).
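One way to hand-code a categorical variable is as an explicit dummy column in the design matrix (my own illustration; the post's worked example may differ):

{% highlight r %}
data(swiss)
# dummy variable: 1 if the region is majority Catholic, 0 otherwise
catholic_majority <- as.numeric(swiss$Catholic > 50)

X    <- cbind(1, swiss$Education, catholic_majority)
beta <- solve(t(X) %*% X) %*% t(X) %*% swiss$Fertility

all.equal(unname(drop(beta)),
          unname(coef(lm(Fertility ~ Education + catholic_majority,
                         data = swiss))))  # TRUE
{% endhighlight %}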