Variables and types in Julia


Marie-Hélène Burle

training@westgrid.ca
June 02, 2020

Our first script


In this course, we will use Covid-19 data to become familiar with Julia’s syntax and functioning.

In this lesson, we will write our first Julia script to load and transform data.
As we do so, we will discuss variables and types in Julia.

Create a file (name and location of your choice) with the extension .jl

Julia scripts are text files and—by convention—have that extension.

Load packages


First, we need to load some packages.
This will take some time, as Julia precompiles newly installed or updated packages. The next time you load these packages, it will be much faster.

using CSV
using DataFrames
using Dates
using JLD


Dates is a package from Julia's standard library (it was installed when you installed Julia).
The other packages are ones that you should have installed yourself.

Variables


In Julia, variables are names bound to values.

These names are extremely flexible and can include Unicode characters.

The only rules are:

  • they must begin with a letter or with an underscore (a few of the Unicode characters are also accepted)
  • they cannot be reserved keywords such as if, do, try, or else
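
As a quick illustration of these rules, here is a minimal sketch. All names and values below are made up for the example.

```julia
# Names can start with a letter, an underscore, or certain Unicode characters.
x = 1
_tmp = 2
δ = 0.5          # typed in the REPL as \delta followed by Tab
nb_cases2 = 10   # digits are fine after the first character

# Names are case-sensitive: these are two different variables.
value = 1
Value = 2

# Reserved keywords cannot be used as names; uncommenting the next
# line gives a syntax error:
# else = 3
```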

Variables


The Julia Style Guide recommends the following conventions:

  • use lower case
  • word separation can be indicated by underscores, but these are best avoided if the names are easy enough to read without them

Variables


The first variable we are creating is bound to a string.

That string is the path (on your system) of the file time_series_covid19_confirmed_global.csv (part of the data that you should have installed). It can be given either relative to the directory in which your Julia session is running or as an absolute path.

How can you easily tell in which directory your Julia session is running?
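
One way to check where a session is running, and how a relative path resolves from there, is sketched below. The path data/example.csv is hypothetical and only serves the illustration.

```julia
# Directory in which the current Julia session is running
pwd()

# A relative path is resolved against that directory;
# abspath returns the absolute path it corresponds to.
relative = joinpath("data", "example.csv")   # hypothetical path
absolute = abspath(relative)

# isfile tells you whether a path points to an existing file
isfile(absolute)
```

If isfile returns false for your own path, the path is probably not written relative to the directory returned by pwd().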

Variables


This is what it looks like on my system
(replace it with the proper path on your machine!)

file = "../../data/covid/csse_covid_19_data/csse_covid_19_time_series/" *
    "time_series_covid19_confirmed_global.csv"


Note: the * operator concatenates strings in Julia, so I used it to break the very long path across two lines. You don’t have to do that.
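
A small sketch of string concatenation, using made-up directory and file names:

```julia
# * joins strings end to end
dir = "data/"                 # hypothetical directory
name = "example.csv"          # hypothetical file name
path = dir * name             # "data/example.csv"

# string() and interpolation with $ give the same result
path == string(dir, name)     # true
path == "$dir$name"           # true
```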

Ending semicolon


You might have noticed that the Julia REPL prints the value of an expression, even when you assign it to a variable (this is different from the behaviour of R and Python).

To avoid this, add a semicolon (;) at the end:

file = "../../data/covid/csse_covid_19_data/csse_covid_19_time_series/" *
    "time_series_covid19_confirmed_global.csv";

Types


I mentioned that our first variable was a string.

So let’s talk about types in Julia.

Note that variables don’t have types since they are simply names bound to values. Values have types.

Two main type systems


Static type-checking

Type safety (catching type errors) is checked at compile time.

Dynamic type-checking

Type safety is checked at runtime.

Types


Julia’s type system is dynamic (types are unknown until runtime), but types can be declared, optionally bringing the advantages of static type systems.


This gives users the freedom to choose between an easy and convenient language, or a clearer, faster, and more robust one (or a combination of the two).

To find the type of a value, use typeof.
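
For instance, a minimal sketch with a few literal values (the name x is arbitrary):

```julia
typeof(2) == Int          # true (Int is Int64 on a 64-bit system)
typeof(2.0)               # Float64
typeof("two")             # String
typeof('2')               # Char

# The name x is first bound to an integer, then rebound to a string:
# the type belongs to the value, not to the name.
x = 2
typeof(x) == Int          # true
x = "two"
typeof(x)                 # String
```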

Type declaration


Done with ::

<value>::<type>


Example:

2::Int
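
The sketch below shows three common uses of ::, assuming nothing beyond base Julia. The function names double and half are made up for the example.

```julia
# A type assertion returns the value when the type matches...
a = 2::Int

# ...and throws a TypeError when it does not.
err = try
    2.0::Int
catch e
    e
end
err isa TypeError   # true

# Annotating a function argument restricts which values the method accepts.
double(x::Int) = 2x
double(3)           # 6
# double(3.0) would throw a MethodError (no method for a Float64 argument)

# Declaring the type of a local variable converts assigned values to it.
function half()
    y::Float64 = 1  # the Int 1 is converted to the Float64 1.0
    return y / 2
end
half()              # 0.5
```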

Load the data


Now that we have the path of our file, we can create a new variable.

This one is bound to a data frame that holds the confirmed Covid-19 data:

dat = DataFrame(CSV.File(file))

Let’s explore the data


Some useful functions:

typeof(dat)

names(dat)

size(dat)
nrow(dat)
ncol(dat)

Indexing


Without copying (changes made to it will change dat):

dat[!, 1]
dat[!, "Province/State"]
dat[!, Symbol("Province/State")]
dat."Province/State"


Making a copy (changes made to it will not change dat):

dat[:, 1]
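
The difference between the two forms can be checked on a toy data frame. This is a sketch that assumes the DataFrames package is installed; df and its columns are made up for the example.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = ["x", "y", "z"])

nocopy = df[!, :a]   # the column vector stored in df
copied = df[:, :a]   # a new vector with the same values

nocopy[1] = 100      # modifies df
copied[2] = 200      # leaves df untouched

df.a                 # [100, 2, 3]
nocopy === df.a      # true: same object
copied === df.a      # false: separate object
```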

Indexing


How could you index the 3rd row without copying?
How could you index the 3rd row making a copy?
The cell of the 2nd row and 4th column?
(note: this does not make a copy)

Converting type


typeof(dat."Province/State")

This weird type is not what we want. We want the values to be of type String.

The function string converts a value to a String.


Before applying it to our vector, let’s play with this function a little:

string([1, 2, 3])
Is this what we want?
How can we address this?
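
One way to address this is Julia's dot syntax, which applies a function to each element of a collection (broadcasting). A minimal sketch:

```julia
# Applied to the whole vector, string returns a single String
string([1, 2, 3])      # "[1, 2, 3]"

# With a dot, string is applied to each element in turn
string.([1, 2, 3])     # ["1", "2", "3"]

# The same dot syntax works with any function
sqrt.([1, 4, 9])       # [1.0, 2.0, 3.0]
```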

Converting type


Which form of indexing (with or without copy) do we want to use to convert our column?
Write the code to convert our first column to String
Are there other columns with weird types we need to convert?

Select and order columns


Let’s get rid of columns we won’t use and bring the country column to the left:

select!(dat, vcat(2, 1, collect(5:ncol(dat))))
What does ! do?
Why are we using it here?
How can you find out what vcat does?
Try to understand this line of code by playing with it
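
To explore the pieces of that line, you can type ?vcat in the REPL to open the help for vcat, and try each function on small inputs. The sketch below uses made-up values; the last part illustrates the naming convention for functions ending in ! with sort and sort! from base Julia.

```julia
# collect turns a range into a vector
collect(5:8)                    # [5, 6, 7, 8]

# vcat concatenates its arguments vertically into one vector
cols = vcat(2, 1, collect(5:8)) # [2, 1, 5, 6, 7, 8]

# By convention, functions whose name ends in ! modify their argument
v = [3, 1, 2]
w = sort(v)     # returns a sorted copy; v is still [3, 1, 2]
sort!(v)        # sorts v in place; v is now [1, 2, 3]
```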

Rename columns


The function rename! accepts a dictionary that maps columns (here by position) to their new names:


rename!(dat, Dict(1 => :country, 2 => :province))

or

rename!(dat, Dict([(1, :country), (2, :province)]))
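
Both forms build the same dictionary, as this small sketch shows (the names d1 and d2 are arbitrary):

```julia
# From pairs written with =>
d1 = Dict(1 => :country, 2 => :province)

# From a vector of (key, value) tuples
d2 = Dict([(1, :country), (2, :province)])

d1 == d2                   # true
d1[2]                      # :province
(1 => :country) isa Pair   # true
```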

Long format


Let’s transform our data frame into long format:

datlong = stack(dat, Not([:country, :province]),
                variable_name = :date,
                value_name = :confirmed)

Convert date


This does not look good:

datlong.date


We want the date to look like YYYY-MM-DD and to be of type Date.

datlong.date = Date.(replace.(string.(datlong.date),
                              r"(.*)(..)$" => s"\g<1>20\2"),
                     "m/dd/yy")
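
To see what each step does, here is a sketch of the same idea on two made-up date strings. It uses the format "m/d/y" rather than the one above, which is a choice made for this illustration only.

```julia
using Dates

raw = ["1/22/20", "12/5/20"]   # made-up sample values

# The regular expression captures everything up to the last two characters
# (group 1) and the last two characters (group 2), then inserts "20"
# between them to get a four-digit year.
fixed = replace.(raw, r"(.*)(..)$" => s"\g<1>20\2")

# Parse the strings into Date values
dates = Date.(fixed, "m/d/y")
```

Without the replacement step, the two-digit year would be read literally as the year 20 rather than 2020.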

Our final data frame


Let’s have a look at our final cleaned data frame:

datlong

Save object


In the next session, you will start from here.

There are several ways to do this:

  • you could re-run this script before starting to run the next one
  • you could load this script within the next one (with include("this_script.jl"))
  • you could save datlong in a Julia data file and load it in your next script

Let’s go with the third option.

Save object


For this, we are using the package JLD, which allows us to save and load Julia data in .jld files.

Note that a single .jld file can contain several objects.

save("covid.jld", "confirmed", datlong)

This will save the file covid.jld in the working directory of the REPL.

You can save it elsewhere by giving an absolute or relative path instead of just a file name. For instance, on my machine, this is where I am saving it:

save("../../data/covid.jld", "confirmed", datlong)
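
As a sketch of how several objects can be stored in one file and read back, assuming the JLD package is installed: the file name, object names, and values below are made up, and the file is written to a temporary directory.

```julia
using JLD

# Write to a temporary directory so that nothing is overwritten
path = joinpath(mktempdir(), "example.jld")

# Save two objects under the names "counts" and "label"
save(path, "counts", [1, 2, 3], "label", "confirmed")

# Load everything as a dictionary of name => object...
everything = load(path)

# ...or load a single object by name
counts = load(path, "counts")
```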