The past year has brought many exciting developments. We moved back to the United States after living abroad in Canada and Germany for eight years. My spouse returned to her medical career at the local hospital, and I am very happy in my new role as a Data Scientist at Lexical Intelligence/National Institutes of Health (Office of Portfolio Analysis). We are currently in the market to purchase a home, and are thriving in our new jobs. I’ve been occupied by my research, and have been thinking about how to make this space productive and insightful. So, I’d like to take a minute to share some ideas and some new directions for this site.
As this site moves forward, I plan to focus solely on the emergent field of Data Science (DS). The field of DS is vast, dynamic, and challenging. What excites me most about DS is the methodological rigor involved and the breadth of intellectual disciplines it incorporates. Two facets of DS, in particular, fascinate me.
First are the epistemological constructs involved. Insofar as DS seeks to analyze data, to discover and tell the stories of the data, and to translate those stories into objective metrics and performance, the data scientist must engage with and utilize coding, data models, statistics, linear algebra, calculus, machine learning, deep learning, visualizations, graph theory, networks, human psychology, and storytelling. The overall objective isn’t to study data for data’s sake, but to implement a data-driven decision model to inform policy makers and stakeholders. The work of DS is an empiricist’s heaven: an oxymoron, for sure! It’s a discipline for those who like to learn and to develop their understanding of knowledge and empirical reasoning.
Second are the various fields of research involved in a challenging DS project. Sure, we can fire up a Jupyter Notebook and pretend we are doing DS with the Titanic problem set, but such a data set does not emulate the challenges posed by a real research agenda. That agenda must wrangle and clean data, use exploratory data analysis to iterate over the data, mine the text(s)/data, begin to discover hidden insights about the topic, ask new questions, and decide when to use machine or deep learning, or whether computer vision might help with problems that are not customarily framed as visual ones. The goal is to explicate the research objectives, spell out the findings, challenge erroneous intuitions (if needed), and establish a knowledge base.
As I take this blog in a new direction, I will remove its older posts on the material study of late antiquity. I will convert those posts to a PDF and post them on academia.edu. What is more, there will be an application on this site dedicated to the material study of late antiquity, where I will publish my editions of the Community Rule, Midrash of the Maskil, A Document of Community Law (a.k.a. The Damascus Document), Codex Alexandrinus, etc. The material study will be conducted using methods of DS, especially computer vision and natural language processing.
So, when I say this site will focus solely on DS, I am saying, on the one hand, that I want to keep a finger on the pulse of cutting-edge DS developments and topics, and, on the other, that I am not limiting myself to run-of-the-mill DS projects. For the remainder of this year, I’d like to spend some time on Computer Vision and Natural Language Processing. I will use a blog format to introduce and summarize DS projects, review papers and books in the field of DS, and build out this blog as a resource for data-driven decision models.