The Clean Up Crew

From mn/ifi/inf5750

Group members

  • Nikolai Hegelstad nikolheg
  • Milena Tosic milenato
  • Eirik Berg Nordheim eirikbno

Repository link


  • First meeting - 7th of October
  • Created wiki page - 25th of October
  • First project meeting - 25th of October
  • Project deadline - 29th of November
  • Project presentation - 7th of December

Problem, interpretation

Task A consists of two subtasks:

  • Find duplicate Singleton events.
  • Find duplicate Tracked Entity Instances.

To our understanding, singleton events are mainly minor tasks that are not linked to a specific person. Duplicates often occur when personnel manually transfer paper forms into the DHIS2 system without knowing that someone else, perhaps a colleague on an earlier shift, has already done so. We therefore assume that duplicate singleton events are exact copies, i.e. that they contain no misspellings (or at least very few).

Duplicate TEIs, on the other hand, are mainly caused by misspellings. For instance, a doctor asks for a patient's name to look him up in the database. The patient replies "Bob Bobson", but the doctor searches for "Bob Robson". Since he cannot find a "Bob Robson", he decides to create a new instance for this person, and we end up with two TEIs representing the same person: the original "Bobson" and the duplicate "Robson". We are therefore mainly interested in doing a fuzzy string search on the TEIs' first and last names to check for duplicates. There are of course many other fields that may help verify that two TEIs are indeed the same person, but our algorithm only considers misspellings of the first + last name and leaves the comparison of the other attributes to the user for visual confirmation.
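The page does not state which fuzzy string algorithm the app uses. As one common option, the following minimal sketch computes the Levenshtein edit distance (the number of single-character insertions, deletions and substitutions) between two normalised full names. The helper names `levenshtein` and `full_name` are hypothetical and only illustrate the idea: "Bob Bobson" and "Bob Robson" differ by a single substitution.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b`."""
    # Only the previous row of the dynamic-programming table is kept.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def full_name(first: str, last: str) -> str:
    # Hypothetical helper: normalise case and surrounding whitespace first.
    return f"{first.strip().lower()} {last.strip().lower()}"


print(levenshtein(full_name("Bob", "Bobson"), full_name("Bob", "Robson")))  # 1
```

A small distance relative to the name length suggests a likely misspelling; where to draw the line is a tuning decision the page does not specify.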

Our app is meant to find these duplicates, let the user confirm or reject that they are in fact duplicates, and then export them so that an administrator can handle them later on.


Singleton events

  • The search is performed within a specified clinic.
  • We do not search across different programs.
  • We search for duplicates within a specified time period.
  • Two singleton events are marked as duplicates if they have exactly the same dataElements.
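The criteria above amount to exact matching, so duplicates can be found by grouping events on a fingerprint instead of comparing every pair. The following is a minimal sketch, not the app's actual code. It assumes events shaped like DHIS2 event JSON (`event`, `orgUnit`, `program`, `eventDate`, and `dataValues` entries with `dataElement` and `value`), and it interprets "exactly the same dataElements" as the same data elements holding the same values. The function name `find_singleton_duplicates` is hypothetical.

```python
from collections import defaultdict
from datetime import date


def find_singleton_duplicates(events, org_unit, start, end):
    """Return lists of event IDs that share program and identical data values."""
    groups = defaultdict(list)
    for event in events:
        # Only consider events in the chosen clinic ...
        if event["orgUnit"] != org_unit:
            continue
        # ... and inside the chosen time period (dates compared by day).
        event_day = date.fromisoformat(event["eventDate"][:10])
        if not (start <= event_day <= end):
            continue
        # Order-independent fingerprint of the event's data values.
        values = frozenset(
            (dv["dataElement"], dv["value"]) for dv in event["dataValues"]
        )
        # The program is part of the key, so events from different
        # programs are never grouped together.
        groups[(event["program"], values)].append(event["event"])
    # Any fingerprint shared by two or more events is a duplicate group.
    return [ids for ids in groups.values() if len(ids) > 1]
```

Using a `frozenset` makes the comparison independent of the order in which the data values happen to be listed.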

Tracked Entity Instances

  • The search is performed within a specified clinic or chiefdom.
  • Two TEIs are marked as duplicates if they have similar first and last names.
  • For a more thorough check, the Maiden name, TB Number and National Identifier can also be compared.
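Because misspelled names never match exactly, TEIs cannot simply be grouped like singleton events; instead every pair within the selected organisation unit is scored. The following minimal sketch swaps in Python's standard `difflib.SequenceMatcher` ratio as the similarity measure and uses a threshold of 0.8. Both choices, the function names and the flattened TEI records (`id`, `first_name`, `last_name`) are assumptions for illustration, not the app's implementation.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_key(tei):
    # Lower-case and trim so that "Bob " and "bob" compare equal.
    return f"{tei['first_name'].strip().lower()} {tei['last_name'].strip().lower()}"


def similarity(a, b):
    # Ratio in [0, 1]; 1.0 means the two strings are identical.
    return SequenceMatcher(None, a, b).ratio()


def find_tei_candidates(teis, threshold=0.8):
    """Return (id_a, id_b, score) for every pair of TEIs with similar names."""
    candidates = []
    # Pairwise comparison: quadratic in the number of TEIs in the org unit.
    for a, b in combinations(teis, 2):
        score = similarity(name_key(a), name_key(b))
        if score >= threshold:
            candidates.append((a["id"], b["id"], round(score, 2)))
    # Strongest matches first, so the user reviews the most likely pairs first.
    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates
```

The candidate pairs are only suggestions; other attributes such as the Maiden name or TB Number are left for the user to compare visually.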

Summary of requirements

Functional requirements:

  • The user should be given some initial information and be able to choose between searching for singletons or TEIs. Acceptance test result: Accomplished
  • The user should then be redirected to the appropriate functionality. Acceptance test result: Accomplished
  • The user should be prompted to select an organisation unit of the appropriate level, after which the search for duplicates should start. Acceptance test result: Accomplished
  • The user should then be shown the duplicates for visual confirmation and be able to select/unselect true/false positives. Acceptance test result: Accomplished
  • Finally, the user should be able to export the singletons as a JSON file. Acceptance test result: Not accomplished
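The export step was not accomplished, so the following is only a minimal sketch of what it could look like: the pairs the user left selected are kept as true positives and serialised to JSON. The input fields (`first`, `second`, `confirmed`), the output layout and the function names are all hypothetical.

```python
import json


def build_export(pairs):
    """Serialise the pairs the user confirmed as true duplicates to JSON."""
    confirmed = [
        {"original": p["first"], "duplicate": p["second"]}
        for p in pairs
        if p["confirmed"]  # unselected pairs are treated as false positives
    ]
    return json.dumps({"count": len(confirmed), "duplicates": confirmed}, indent=2)


def save_export(pairs, path):
    # Write the export to disk, e.g. "duplicates.json", for an administrator.
    with open(path, "w", encoding="utf-8") as f:
        f.write(build_export(pairs))
```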

Time schedule

Tentative meeting schedule
hrs Monday Tuesday Wednesday Thursday Friday
10-12 x
14-16 x x

During the project period, each member aimed to work approximately 2 hours a day on the project.

This was upheld for the whole period, except during the exam period.

How we are dividing tasks within the group

Our approach to completing this project is divided into two parts.

  • Part 1:

During this part, we will collaborate as much as possible to create an MVP that serves as the foundation we can build on.

Furthermore, gaining knowledge about the system, React and JavaScript in general will be fundamental to this part.

  • Part 2:

During this part we define clear tasks for each group member based on their wishes and past experiences. This will be done during the group meetings.

Singleton algorithms: Milena

Tracked Entity Instance algorithms: Eirik

Layout, Redux: Nikolai

Screenshots and screen flows

Early sketch:

Early model:

Final model:


Documented learning during project

  • JavaScript (ES6)
  • React - Learned a tremendous amount
  • Redux - Learned a tremendous amount
  • Fuzzy string algorithms
  • NPM ecosystem

Suggested improvements to the API

  • Mark offline registrations (Android) as probable duplicates.
  • Validate upon registration whether a patient is a possible duplicate.

Unresolved issues

Exporting data to the DHIS2 dataStore did not work.