Consolidate data


When you download data from Epicollect5, the system creates a file for each form and for each branch. Epicollect5 unique identifiers are added to the data set to link the data together.

Since each project requirements are different, we leave the data consolidation down to the user in the post processing of data.

Epicollect5 identifiers

Each hierarchy form data set will have a colum called "ec5_uuid" with a unique identifier per each row. Any child form down the hierachy will also have a column called "ec5_parent_uuid" to reference data from its parent form.

Each branch data set will have a column "ec5_branch_owner_uuid" which will reference each "ec5_uuid" of a hierarchy form.

The values (which look like d559da55-0df3-4121-8db0-5870d4faf038) are system generated identifiers used to keep the relationships on the data.

These identifiers are always present on any downloaded data sets.

Consolidate data using Google Sheets

We will use Google Sheets for this example, but the same concept can be applied to Excel, Apple Numbers and similar.

An Excel example using VLOOKUP function is shown below.

What follows is just a simple example about how to merge data coming from three files (one hierarchy form and two branches) based on our EC5 Branches Project example project.

First of all, let's download the data in CSV format for that project and save it somewhere handy. The downloaded .zip will contain 3 CSV files:

  • form-1__form-1.csv
  • branch-1__list-your-family-members.csv
  • branch-2__list-your-pets.csv

We created a new spreadsheet and imported the above mentioned files, one per each sheet tab, doing FILE > IMPORT

We use the following settings per each file:

Please make sure you add each file in its own tab sheet. To create a new tab sheet, just click on the "+" button at the bottom left:

We also called our tab sheets "form_1", "branch_family_members", "branch_pets". This is to make easier to reference them in our formulas.

On the "form_1" tab sheet, create a new column called "Family Members" where we will fetch the data from the "branch_family_members" tab sheet. Click on the fist empty cell of that column to select it:

On that cell we add this formula: (if you like to know more about Google Sheets and available formulas, the docs are here)

=iferror(JOIN(", ", QUERY( branch_family_members!$A:$F , "select E where A = '"&$A$2:$A&"'")))

We wrap everything in iferror() to fetch data only if there are some branches for that row, otherwise leave the cell empty.

We use JOIN("," {QUERY(...)}) to concatenate the family members data found with a comma.

In the QUERY statement, we first reference the "branch_family_members" from column A to F, basically all the columns in the "branch_family_members" sheet. On the right side of the QUERY statement, we search for data on column E on that tab sheet (3_Family_member_name) where the column A (ec5_branch_owner_uuid) matches column A on the form_1 tab sheet (ec5_uuid)

The values of "ec5_branch_owner_uuid" and "ec5_uuid" are the relationship link between the data sets.

The result is this:

After the formula gest copied to all the cells, it will look like this:

Using the same steps as above, we can fetch the data from our "branch_pets" tab sheet.

The formula gets updated to

=iferror(JOIN(", ", QUERY( branch_pets!$A:$F , "select F where A = '"&$A$2:$A&"'")))

The final result will be like below:

Awesome!

The final spreadsheet is available here for you to view.

The project used in this example is here.

Consolidate data in Excel using the VLOOKUP function

VLOOKUP is an Excel function to lookup and retrieve data from a specific column in table. VLOOKUP supports approximate and exact matching, and wildcards (* ?) for partial matches. The "V" stands for "vertical". Lookup values must appear in the first column of the table, with lookup columns to the right. More info.

We are going to use the project EC5 VLOOKUP in EXCEL for this example. Download its data (2 files, form-1__class.csv and form-1__student.csv)

The project is really simple, there is CLASS > STUDENT hierachy, one parent form and one child form. We just want to add CLASS entries and all the STUDENT entries attending each class. Obvioulsy, a student can attend more classes.

Let's import the data into Excel, one file per sheet: (See how to do it)

Select the STUDENT sheet. Create a new colum called Class and select its first empty cell:

Let's add the VLOOKUP formula. We would like to show the class name next to each student. The class name can be found on the CLASS sheet.

The formula will look like:

=VLOOKUP(B2,CLASS!A$2:E$4,5,FALSE)

More info on the formula and its arguments can be found here.

For this example, it is basically saying:

  • B2: look for the B2 value (ec5_parent_uuid)
  • CLASS!A$2:E$4: the whole CLASS sheet, all cells
  • 5: return value of column 5 from the CLASS sheet when found (1_Class_name column)
  • FALSE: look for an exact match

The value of "Maths" is returned:

Copying the formula down to the whole column, all class names are returned:

Another simple tutorial about Excel VLOOKUP here.

results matching ""

    No results matching ""