Canada Needs to Urgently Feed More Data into Healthcare AI Solutions During COVID-19

March 26, 2020
By Noel Courage and Ray Kovarik

For companies developing healthcare artificial intelligence (“AI”) solutions, a large volume of high-quality input data is essential for the AI to learn quickly and provide useful output. AI companies are IP-based companies that typically keep their proprietary algorithms secret and commercialize them by providing services or selling a software product1.

The Power of AI in Healthcare

One example of a successful Canadian AI company is BlueDot, a digital health company that uses big data analytics to track and anticipate the spread of infectious disease. Its AI-driven algorithm reviews news, airline ticketing data, demographics, government statements and more to inform clients of risks. BlueDot had a successful US$7 million Series A financing round in 2019. BlueDot reportedly sent its first warnings about COVID-19 to its clients on December 31, 2019. The Canadian government recently signed a deal with BlueDot to use its analytics platform to track and monitor the disease. AI can be applied in other ways in an infectious disease outbreak. For example, AI can process large amounts of data to support decisions on the allocation of healthcare resources and best practices for treatment. AI has also been used to suggest drug candidates.

More Data Means Better AI Results

The recurring theme is that the larger the dataset that an AI-driven algorithm uses, the more accurate its output becomes. Belgium has been a leader in combining data from telecom operators with health data to produce such a large dataset, all under the supervision of its Data Protection Authority (DPA). Taiwan has also done well by linking medical records in its national health insurance database with customs and immigration records. Canada should make more data about COVID-19 available to researchers and, to the extent possible, publicly available to everyone. Most of the country is relying on daily provincial data about new coronavirus cases, which appears to be the only publicly available information2. However, this data is only a small snapshot, and there are questions about its reliability. For example, Ontario had identified over 1,000 cases of coronavirus as of March 25, 2020, but the number on any reporting day clearly underrepresents the current situation3.

Data Gaps

As one basic example of where more data is needed, at the time of writing, Ontario’s daily updated COVID-19 web report does not include the number of patients hospitalized or in ICUs4. This information is being made available to the media5, but it should be in the daily report. Complete, transparent data should also be compiled and provided on demographics and comorbidities (underlying health conditions), along with full details such as whether and how each case is believed to be linked to community spread (and in what geography) or to travel (the Ontario online data provides some of these details, but is not up to date on case reports)6. New York is a little further ahead in providing detailed information; matching its reporting would be a reasonable short-term goal, to be built on from there. We do recognize that there are serious logistical and resource issues in compiling and reporting data when healthcare resources are being overwhelmed by an outbreak, and we greatly appreciate the efforts of public healthcare organizations and government.
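As a rough illustration of the fields discussed above, a daily report could be represented as a simple record in which any unreported item is visibly missing. This is only a sketch: the class name `DailyReport`, the field names, and the helper `missing_fields` are hypothetical and do not reflect any government's actual reporting format.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class DailyReport:
    """Hypothetical daily outbreak report; field names are illustrative only."""
    report_date: date
    region: str
    confirmed_cases: int
    hospitalized: Optional[int] = None              # patients currently in hospital
    in_icu: Optional[int] = None                    # patients currently in ICU
    cases_by_age_group: Optional[dict] = None       # demographics breakdown
    cases_with_comorbidities: Optional[int] = None  # underlying health conditions
    cases_by_transmission: Optional[dict] = None    # e.g. travel vs. community spread


def missing_fields(report: DailyReport) -> list:
    """List the names of fields that were left unreported (None)."""
    return [f.name for f in fields(report) if getattr(report, f.name) is None]
```

Calling `missing_fields` on a report that contains only a case count returns the names of every gap, which makes it straightforward to track which items a given day's report still lacks.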

This absence of basic information only scratches the surface of how much more data is not yet publicly available. Quality data is everything, whether AI is used or not. For example, the absence of rigorous testing and comprehensive public data can lead to underestimation of the number of contagious COVID-19 carriers in the community7, giving the public a false impression. Younger demographics may believe they are at low risk if they see mortality data (low risk of youth death) but don’t see breakdowns of hospitalization and ICU admissions. Policy decisions, such as shutting down businesses, become more difficult without full information. AI outputs can themselves be useless or misleading if the AI systems are not trained on adequate data.

Bridging the Data Gaps – How to Feed Data into AI Healthcare Solutions

Providing More Data

On the ground, physicians are calling for comprehensive basic data to help them in their daily practices8. Even comprehensive basic data would help inform conventional assessment tools, treatment plans and community outreach/education. Governments must prioritize meeting the needs of those on the front lines of healthcare, and also provide additional data to support AI work. A recent news report in the Toronto Star states that AI companies are calling for more data. It flags several issues with government data sharing that must be addressed.

Collaboration is Critical – Sharing Data and Resources

To combat the spread of COVID-19, the government, the scientific community and private business are pooling resources to provide the big data that makes AI so effective. For example, the Vector Institute in Toronto has highlighted how this can be done for a number of purposes. One example is AI-driven development of small molecule therapeutics for SARS-CoV-2 (the virus that causes COVID-19)9. The Vector Institute has also provided a list of COVID-19 Research Tools, which includes links to datasets and existing AI tools available to the public.

Applying the Data to COVID-19

Data needs to be made available so that the power of AI can be used to inform healthcare decisions. BlueDot tracking disease spread is one example. Self-assessment tools are currently lacking, but AI could learn about symptoms and risks10. Drug combinations could be matched up with patient outcomes to help inform treatment, to bridge the gap while waiting for formal clinical trial results. Treatment of new patients with ventilation and other measures could be informed by outcomes of prior patients. As discussed above, the Vector Institute has provided links to current resources and tools around the world. At a more local level, ICES, a non-profit corporation in Ontario, has a data repository of de-identified and linkable health data that could be invaluable for Ontarians (and possibly the rest of Canada). The ICES data repository contains 20 million patient life histories with 500 billion data points that can be used as a dataset for ICES’ AI-driven analysis. Not only could this data help identify where COVID-19 could strike next, but it could also be used to drill down into other data about patient assessments, treatments, and outcomes. Governments need to consult with these AI leaders on an ongoing basis to support their efforts. Canada can’t develop all potentially useful AI from scratch, so the globe should be scoured for AI solutions that already exist, and local development work should be considered where there is no existing technology.

The common denominator for all AI solutions is that they need to be fed as much data as is available to develop solutions. Patient privacy laws still must be respected, for example, by using anonymized data that cannot be traced back to a particular patient. However, it is critical for governments to collect comprehensive data and make it available to their AI collaborators and the general public, to the fullest extent possible.
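One simplified way to picture the anonymization step is sketched below. It removes direct identifiers, coarsens age and postal code, and withholds any record whose remaining combination of attributes is shared by fewer than k patients. The field names, the identifier list, and the threshold are all assumptions made for illustration; real de-identification has to follow the applicable privacy legislation and would typically be more rigorous than this.

```python
from collections import Counter

# Hypothetical field names; a real health dataset would use its own schema.
DIRECT_IDENTIFIERS = {"name", "health_card_number"}


def age_band(age: int, width: int = 10) -> str:
    """Replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record: dict) -> dict:
    """Drop direct identifiers and coarsen age and postal code."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["age"] = age_band(record["age"])
    # Keep only the first three characters of the postal code (assumed format).
    out["postal_code"] = record["postal_code"][:3]
    return out


def suppress_small_groups(records: list, keys=("age", "postal_code"), k: int = 2) -> list:
    """Withhold records whose combination of `keys` appears fewer than k times."""
    counts = Counter(tuple(r[key] for key in keys) for r in records)
    return [r for r in records if counts[tuple(r[key] for key in keys)] >= k]
```

In use, each raw record would pass through `deidentify` first, and the resulting list through `suppress_small_groups` before release. The suppression step reflects the idea that even without names, a rare combination of attributes can point to one person; a higher k offers more protection at the cost of withholding more records.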

1 Patents may be filed on some aspects of AI inventions, but this is a complex area where a patent attorney should be consulted for guidance.

2 Researchers and entities that have data sharing agreements with government and healthcare authorities would be privy to more information.

3 Dr. J Kwan (@jkwan_md), March 25, 2020.

5 680 News.

6 At the time of writing, many cases in the Ontario government daily report on coronavirus do not provide details on region or means of transmission.

7 Dr. Yoni Freedhoff (@YoniFreedhoff), March 25, 2020.

8 Dr. Kulvinder Kaur (@dockaurG), March 25, 2020.

9 Vector Institute COVID-19 Updates, March 25, 2020.

10 Dr. Yoni Freedhoff (@YoniFreedhoff), March 25, 2020.
