Description
Assignment
1. Attend Unit 6 live lecture online via Blackboard Collaborate.
2. Complete the assigned readings, follow links to online materials, and listen to the lecture recording as necessary.
3. By the deadline, complete and post the following Data Science / System Architecture Challenge deliverables to the assignment folder in this unit.
A newly minted technology startup with a good group of angel investors providing sufficient financial infusion into software and infrastructure has hired you as a Solutions Architect to be part of a team working on a novel idea connecting previously disconnected data types across health care technology segments.
The idea is as follows:
Most life sciences companies currently use clinical trials as their sole basis for evaluating a researched drug's performance, its benefits, its side effects, and its predicted market success. From pre-clinical trials to post-clinical trials, all data is contained within a single standalone computer application, where it is thoroughly evaluated/massaged/analyzed using a group of patient volunteers who consent to a clinical study. This typically means a few dozen to several hundred patients before the product is released to wider distribution as part of the later clinical trial phases and, subsequently, to the FDA-approved market, i.e. retail pharmacies. The problem: life sciences companies have little visibility into the “real life” drug performance, except for anecdotal evidence, periodic observations, periodic reports, and big lawsuits and newspaper headlines when something goes wrong. FDA decisions to take drugs off the market could push smaller life sciences companies into bankruptcy, especially if a company relies on one or a few drugs as its lifelines. We want to change this and turn the world of drug development upside down, in a positive sense, with Big Data. How? This is something you will be responsible for answering.
Your goal is to create a high-level architecture for a system that will analyze real-life drug performance in the market using Electronic Medical Record (EMR) data from providers. You will contract with providers to pull in their data from any EMR regardless of the vendor, bring it into a batch, process and normalize it, and make it available to your Data Scientists internally for evaluation. More specifically, you are looking for certain patterns indicative of an issue such as side effects, collecting information about the details and quantity of those side effects, and reporting on a set of attributes selected by an analyst to address the research question about a drug and its real-life performance. You will use a specific drug called Darvocet (which was taken off the market for a specific reason in the past) as your pilot for evaluating whether the system works and how it works. Assume a few pilot provider sites may participate in your study. They will gain first-adopter benefits and related discounts for a finished product, should you be successful. Your general steps are as follows. Your detailed steps are completely open to your interpretation, based on your research, attendance at the Unit 6 lecture, and the readings.
1. Find and research Darvocet. Pay special attention to its purpose, intended clinical goals and patients, side effects, and the reasons it was taken off the market. Provide a 1- to 2-page write-up of your findings with corresponding supporting literature. 5 points.
2. Determine how you would structure your system to (a) extract relevant clinical data from provider EMRs, (b) process data at the arrival point when it is loaded in bulk from various EMR sources into your system, (c) store the data at the arrival point, (d) analyze data inside your database, and (e) supply relevant reports to your life sciences clients. Describe your logic. 5 points.
3. Develop an architecture diagram of your data flows for an architecture of your choice, using a software application of your choice, e.g. Visio (free from UIC webstore), Gliffy, Lucid Chart, OmniGraffle, etc. Please convert the diagram to PDF prior to submission in Blackboard. 5 points.
4. Define the clinical data you need, the clinical vocabularies used to retrieve that data, and specific code examples. Please note that you do not need to supply an exhaustive list of codes; 2-5 examples would be sufficient.
Here is an example of a data table format you could use to deliver outcomes of your research:
Data Type | Data Transport Mechanism | Vocabulary Type | Specific Code | Briefly Justify / Explain
Note: data transport mechanism means the medium of delivery, that is, the data interoperability method by which the data gets from the provider source into your analytics application.
Table = 5 points.
5. In conclusion, explain how and why your system would work, represent an innovation, and justify value for your clients. Do remember that you need to return value to your health care providers who signed up as early testers, in addition to your “primary” clients in life sciences. The question you will strive to answer is: if you had this pilot in your hands before Darvocet was pulled off the market, how could you have either prevented the withdrawal or helped your client improve the drug by supplying early trouble indicators and feeding into the Version 2 development process? 5 points.
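As one way to picture the stages named in the system-structure step (process, store, analyze, report) and the "early trouble indicator" idea, here is a minimal sketch. Everything in it is an assumption made for illustration: the `Event` fields, the site and local code names, the placeholder codes `DRUG_X` and `EFFECT_Y` (not real vocabulary codes), and the ratio threshold of 2.0. Extraction from the EMR is replaced by a hard-coded list, and the crude rate comparison is not a validated pharmacovigilance method; it ignores confounding and sample size.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One normalized clinical fact; all field names are hypothetical."""
    patient_id: str  # de-identified patient key
    kind: str        # "medication" or "diagnosis"
    code: str        # code in the shared vocabulary


# (a) Stand-in for extracted EMR rows: (site, patient, kind, local code).
RAW_ROWS = [
    ("siteA", "p1", "medication", "LOCAL-DRUG"),
    ("siteA", "p1", "diagnosis", "LOCAL-ARR"),
    ("siteA", "p2", "medication", "LOCAL-DRUG"),
    ("siteA", "p2", "diagnosis", "LOCAL-ARR"),
    ("siteB", "p1", "medication", "rx-77"),
    ("siteB", "p2", "diagnosis", "dx-9"),
    ("siteB", "p3", "diagnosis", "dx-other"),
    ("siteB", "p4", "diagnosis", "dx-other"),
    ("siteB", "p5", "diagnosis", "dx-other"),
]

# Hypothetical mapping from each site's local codes to placeholder shared codes.
CODE_MAP = {
    ("siteA", "LOCAL-DRUG"): "DRUG_X",
    ("siteA", "LOCAL-ARR"): "EFFECT_Y",
    ("siteB", "rx-77"): "DRUG_X",
    ("siteB", "dx-9"): "EFFECT_Y",
    ("siteB", "dx-other"): "OTHER",
}


def normalize(raw_rows, code_map):
    """(b) Map each site's local codes onto one shared vocabulary."""
    events = []
    for site, patient, kind, local_code in raw_rows:
        code = code_map.get((site, local_code))
        if code is not None:  # rows that cannot be mapped are dropped
            events.append(Event(f"{site}:{patient}", kind, code))
    return events


def store(events):
    """(c) Stand-in for a data store: codes indexed by patient and kind."""
    db = defaultdict(lambda: defaultdict(set))
    for event in events:
        db[event.patient_id][event.kind].add(event.code)
    return db


def analyze(db, drug_code, effect_code):
    """(d) Count patients with the effect among exposed and unexposed groups."""
    counts = {"exposed": [0, 0], "unexposed": [0, 0]}  # [with effect, total]
    for facts in db.values():
        group = "exposed" if drug_code in facts["medication"] else "unexposed"
        counts[group][1] += 1
        if effect_code in facts["diagnosis"]:
            counts[group][0] += 1
    return counts


def report(counts, ratio_threshold=2.0):
    """(e) Flag the drug when the exposed rate is far above the unexposed rate."""
    rates = {g: (hit / total if total else 0.0) for g, (hit, total) in counts.items()}
    flagged = rates["unexposed"] > 0 and rates["exposed"] / rates["unexposed"] >= ratio_threshold
    return {"rates": rates, "flagged": flagged}


events = normalize(RAW_ROWS, CODE_MAP)
counts = analyze(store(events), "DRUG_X", "EFFECT_Y")
summary = report(counts)
print(summary)
```

In this toy data, two of three exposed patients and one of four unexposed patients carry the effect code, so the report flags the drug. A real design would swap each function for the component chosen in the architecture diagram.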
Your deliverables for this assignment are:
a. Word document with answers to points (1), (2), (4), and (5). Include your name and unit number on the document. Submit to the assignment folder in this unit.
b. PDF with an architecture diagram as an answer to point (3). Include your name and unit number on the document. Submit to the assignment folder in this unit.
Note 1: you are a designer starting with a clean sheet, so if there is a ton of ambiguity in this assignment, then this is the way it was intended to be. This is the situation you should expect in the job marketplace when you enter or re-enter the workforce. Successful data scientists do not attempt to solve or improve existing solutions. They either resolve known big challenges or create new innovations.
Note 2: please remember to cite and reference in APA, not to plagiarize, and to avoid using resources representing someone’s unverified opinions such as Wikipedia and online blogs. Professional literature can be used, but it should be carefully scrutinized for the quality and reputation of the knowledge source.
See Attached document for reading
The New World of Healthcare Data Science: Blending Data, Environments, and Creating Disruptive Business Models
Jacob Krive, PhD
Biomedical and Health Information Sciences
University of Illinois at Chicago
The “new” old wave
• Just a short time ago, the “new way” was EMR replacement of paper
• In provider environments, EMRs become legacy “base” environments where primary data resides, gathered at the point of care
• In Pharma, clinical studies were performed in single isolated computer applications: no sharing, no (automated) cross-study analysis, no link to real-time provider data
• Provider, Pharma, and Retail are separate “old” worlds: no data linkage
• Can we do better and think different in these different times?
Phased drug development
How can we do better?
• Shorten drug development process
• Ensure market success of the drug longer term
• Predict and prevent drug market flops
• Prolong life of a drug in the market
• Detect and prevent harmful side effects
• Pull drugs off the market early to harm fewer patients, when necessary
• Integrate and blend data confined in the points of care systems with R&D data confined in the pharma applications
Data integration: Proliferation of the medical research clouds
[Diagram; labels: Data Source, Clinical Data, Transport, Consent Management, Data Lake, Data De-Identification, Data Reservoir, Data Anonymization, Data Integration, Data Ocean, HIPAA (Providers), GxP (Life Sciences)]
It is not just text data anymore: Image classification and model recognition architectures
Integration of the healthcare data domains
[Diagram; labels: Provider EMR (two instances), data transport, data batch, data normalization, data processing, Big Data: Hadoop, HBase, HDFS, Datamart, mobile applications, computer applications, Patients, Clinical Investigators, Data Scientists, Life Sciences Applications]
Data no longer has to be confined
• Data transport mechanisms: HL7, FHIR, XML, JSON
• Medical data vocabularies: SNOMED, LOINC, CPT, ICD, RxNorm
• Data normalization and data processing
• Data visualization
• Data can move in multiple directions, a true exchange – not just upload or download or a single translation
• Let’s discuss your assignments
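To make the transport and vocabulary bullets concrete, the sketch below reads a small FHIR-style JSON bundle and turns each coded concept into a row shaped like the assignment's example table (data type, transport mechanism, vocabulary, code). It is an illustration under stated assumptions, not a reference implementation: the bundle is a hand-written, heavily trimmed example, the element name `medicationCodeableConcept` assumes the FHIR R4 layout, the RxNorm code is left as the literal string `PLACEHOLDER` because no real code is asserted here, and the function and dictionary names are invented for this example.

```python
import json

# Hypothetical, trimmed FHIR-style bundle; real resources carry many more fields.
BUNDLE_JSON = """
{
  "resourceType": "Bundle",
  "entry": [
    {"resource": {
      "resourceType": "Condition",
      "code": {"coding": [{
        "system": "http://hl7.org/fhir/sid/icd-10-cm",
        "code": "I49.9",
        "display": "Cardiac arrhythmia, unspecified"}]}}},
    {"resource": {
      "resourceType": "MedicationRequest",
      "medicationCodeableConcept": {"coding": [{
        "system": "http://www.nlm.nih.gov/research/umls/rxnorm",
        "code": "PLACEHOLDER",
        "display": "example drug"}]}}}
  ]
}
"""

# Code system URIs as used in FHIR, mapped to the vocabulary names on the slide.
VOCABULARIES = {
    "http://hl7.org/fhir/sid/icd-10-cm": "ICD-10-CM",
    "http://www.nlm.nih.gov/research/umls/rxnorm": "RxNorm",
    "http://loinc.org": "LOINC",
    "http://snomed.info/sct": "SNOMED CT",
}

# Which element holds the coded concept for each resource type handled here
# (assumes FHIR R4 element names).
CODE_FIELD = {
    "Condition": "code",
    "Observation": "code",
    "MedicationRequest": "medicationCodeableConcept",
}


def bundle_to_rows(bundle_text, transport="FHIR (JSON)"):
    """Flatten coded concepts in a bundle into table-style rows."""
    rows = []
    bundle = json.loads(bundle_text)
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        resource_type = resource.get("resourceType")
        field = CODE_FIELD.get(resource_type)
        if field is None:  # skip resource types this sketch does not handle
            continue
        for coding in resource.get(field, {}).get("coding", []):
            rows.append({
                "data_type": resource_type,
                "transport": transport,
                "vocabulary": VOCABULARIES.get(coding.get("system"), "unknown"),
                "code": coding.get("code"),
            })
    return rows


rows = bundle_to_rows(BUNDLE_JSON)
for row in rows:
    print(row)
```

Keying the vocabulary lookup on the `system` URI rather than on the code text is a deliberate choice: the same code string can exist in more than one vocabulary, so the URI is what identifies which one applies.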
Questions and Discussion
Unit 6 Readings/Resources
Attached Files:
• New World of Data Science Lecture Krive 2017.pdf (514.553 KB)
Unit 6
Required
Ebook: Reddy, C. K., & Aggarwal, C. C. (Eds.). (2015). Healthcare data analytics. Boca Raton, FL: CRC Press, Taylor and Francis Group. ISBN: 978-1-482-23211-0. Chapters 11 and 18.
Download the chapters from the links provided:
Chapter 11: Temporal Data Mining for Healthcare Data
http://proxy.cc.uic.edu/login?url=https://www.taylorfrancis.com/books/9781482232127/chapters/10.1201%2Fb18588-17
Chapter 18: Data Analytics for Pharmaceutical Discoveries
http://proxy.cc.uic.edu/login?url=https://www.taylorfrancis.com/books/9781482232127/chapters/10.1201%2Fb18588-25
Here is the link for access to the entire book, if needed.
http://proxy.cc.uic.edu/login?url=https://www.taylorfrancis.com/books/9781482232127
Krive, J. (2017). The new world of healthcare data science: Blending data, environments, and creating disruptive business models.
course. link at top of page.
Recommended
Cloud Computing
1. Stokes, D. (2013). Compliant cloud computing-managing risks. Pharmaceutical Engineering, 33(4), 111.
https://docplayer.net/1761713-Compliant-cloud-computing-managing-the-risks.html
2. Smith, R. (2011). Storm clouds? Cloud computing in a regulated environment. Journal of GXP Compliance, 15(4), 71-76.
http://proxy.cc.uic.edu/login?url=https://search.proquest.com/docview/905946024/fulltextPDF/9246E4FE860749ABPQ/1?accountid=14552
3. Driscoll, A., Daugelaite, J., Sleator, R. (2013). Big data, Hadoop and cloud computing in genomics. Journal of Biomedical Informatics, 46, 774-781.
http://proxy.cc.uic.edu/login?url=http://www.sciencedirect.com.proxy.cc.uic.edu/science/article/pii/S1532046413001007?via%3Dihub
4. Lougheed, C., Jain, A., Meil, D., Jarrell, B. (2014). U.S. Patent No. 2014/0032240 A1. Washington, DC: U.S. Patent and Trademark Office.
https://docs.google.com/viewer?url=patentimages.storage.googleapis.com/pdfs/US20140032240.pdf
5. Architecting for HIPAA security and compliance on Amazon Web Services. (2017). Retrieved from
https://docs.aws.amazon.com/whitepapers/latest/architecting-hipaa-security-and-compliance-on-aws/architecting-hipaa-security-and-compliance-on-aws.html
6. FDA resource on Real World Evidence. (2020). Retrieved from https://www.fda.gov/science-research/science-and-research-special-topics/real-world-evidence
Minute Paper
Please complete the minute paper by the deadline.