Consolidate data


When you download data from Epicollect5, the system creates a file for each form and for each branch. Epicollect5 unique identifiers are added to the data set to link the data together.

Since each project requirements are different, we leave the data consolidation down to the user in the post processing of data.

Epicollect5 identifiers

Each hierarchy form data set will have a colum called "ec5_uuid" with a unique identifier per each row. Any child form down the hierachy will also have a column called "ec5_parent_uuid" to reference data from its parent form.

Each branch data set will have a column "ec5_branch_owner_uuid" which will reference each "ec5_uuid" of a hierarchy form.

The values (which look like d559da55-0df3-4121-8db0-5870d4faf038) are system generated identifiers used to keep the relationships on the data.

These identifiers are always present on any downloaded data sets.

Consolidate data using Google Sheets

We will use Google Sheets for this example, but the same concept can be applied to Excel, Apple Numbers and similar.

What follows is just a simple example about how to merge data coming from three files (one hierarchy form and two branches) based on our EC5 Branches Project example project.

First of all, let's download the data in CSV format for that project and save it somewhere handy. The downloaded .zip will contain 3 CSV files:

  • form-1__form-1.csv
  • branch-1__list-your-family-members.csv
  • branch-2__list-your-pets.csv

We created a new spreadsheet and imported the above mentioned files, one per each sheet tab, doing FILE > IMPORT

We use the following settings per each file:

Please make sure you add each file in its own tab sheet. To create a new tab sheet, just click on the "+" button at the bottom left:

We also called our tab sheets "form_1", "branch_family_members", "branch_pets". This is to make easier to reference them in our formulas.

On the "form_1" tab sheet, create a new column called "Family Members" where we will fetch the data from the "branch_family_members" tab sheet. Click on the fist empty cell of that column to select it:

On that cell we add this formula: (if you like to know more about Google Sheets and available formulas, the docs are here)

=iferror(JOIN(", ", QUERY( branch_family_members!$A:$F , "select E where A = '"&$A$2:$A&"'")))

We wrap everything in iferror() to fetch data only if there are some branches for that row, otherwise leave the cell empty.

We use JOIN("," {QUERY(...)}) to concatenate the family members data found with a comma.

In the QUERY statement, we first reference the "branch_family_members" from column A to F, basically all the columns in the "branch_family_members" sheet. On the right side of the QUERY statement, we search for data on column E on that tab sheet (3_Family_member_name) where the column A (ec5_branch_owner_uuid) matches column A on the form_1 tab sheet (ec5_uuid)

The values of "ec5_branch_owner_uuid" and "ec5_uuid" are the relationship link between the data sets.

The result is this:

After the formula gest copied to all the cells, it will look like this:

Using the same steps as above, we can fetch the data from our "branch_pets" tab sheet.

The formula gets updated to

=iferror(JOIN(", ", QUERY( branch_pets!$A:$F , "select F where A = '"&$A$2:$A&"'")))

The final result will be like below:

Awesome!

The final spreadsheet is available here for you to view.

The project used in this example is here.

results matching ""

    No results matching ""