By The Numbers: Thru 2021-08-28
August 28, 2021
Departed Chicago and arrived near Boston. To date, that is 3.8k miles driven with 49.7k records captured. And yes, there were donuts.
Read More
By The Numbers: Thru 2021-08-24
August 24, 2021
During my road trips, I like to write “by the numbers” themed posts where I share a little bit about how the data is shaping up during the trip. Here’s the first update for the 2021 Epic Road Trip!
Read More
Road trip delay, so let's build a data pipeline!
August 18, 2021
Today was supposed to mark the beginning of my road trip. Oops! I am grateful for the delay as I got to see some great people and build some data goodness instead.
Read More
New blog, who dis?
August 16, 2021
Ok so “new blog” is a stretch. This is the same blog that’s hosted via GitHub Pages but with a new backend and purpose. Check out this post to learn about recent updates to my blog and the new stuff I will be posting here.
Read More
Vacation Project: Final Code Update
June 27, 2020
Final vacation project update! Decided to post it as a video instead of a blog post. Now I’m going to let it run for a week and then measure how well my process did at repairing a data stream.
Read More
Vacation Project: Do a little step function dance
June 24, 2020
Today’s accomplishment was crafting the first cut of a Step Function deployed via SAM & CloudFormation. I went head first into writing code… and quickly realized my previous drawing needed some more love. I redrew my previous step function so I could track the input parameters and detail the decision points. Here is that new drawing along with how AWS visualized it via the Step Functions console:
Read More
Vacation Project: Parsing results from Athena
June 23, 2020
Part of today was spent at Mount Rainier (Link: Photos in iCloud), so I did not put in a full day’s effort. Today’s updates (Link: Commits in GitHub) involve parsing the response from Athena’s get_query_results() method. It is not pretty but it does the job:
Read More
Vacation Project: Weekend Update
June 22, 2020
I said I was not going to work on this project over the weekend. That was a lie: I pushed a few commits into the repo through the weekend. Last I wrote, I was a bit frustrated by the response structure provided by Athena’s get_query_results() method. This response is a row-based dictionary where each element lists the related columns’ values. It is probably the simplest way to share tabular results that do not have a primary key, but it flies in the face of the column-oriented data types that have become a modern standard.
Read More
Vacation Project: Day Three
June 19, 2020
You see this image below? It scares me. This is how a query looks when you ask Athena to grab query results for you. It mimics the rows and columns in terms of how a person would think of a query result. The raw JSON file is available at the end of this post. Let’s take a step back and talk about how I got here.
Read More
Vacation Project: Day Two
June 18, 2020
At this point, all of the infrastructure work is complete and I am pulling stock data every minute for 11 stocks. The biggest additions from yesterday include using a time-triggered Lambda to queue up stock data requests for another Lambda to go get. The results get moved into a data stream and are then stored in a data lake. Now I have a data catalog available that lets me query the data with Amazon Athena (a serverless query service) to do some basic analytics. The latest hand-drawn monstrosity of an architecture diagram looks like this:
Read More
Vacation Project: Day One Learning
June 17, 2020
The basic infrastructure is complete: code grabs stock data from Finnhub and pushes the result into a data stream where it eventually gets stored in a data lake. The API secret token is stored in AWS Secrets Manager and never exposed. Everything is deployed via CloudFormation; you can start looking at my code on GitHub.
Read More
Vacation Project: Self-Healing Data Stream
June 16, 2020
Yesterday I started a two-week vacation from Amazon Web Services. While I am excited to take a break, I want to use some of this time to play with AWS technologies I have not yet used. I wanted to go beyond tutorials so I came up with a small project: a self-healing data stream. I will create routine data streams that are processed continuously to create downstream facts and aggregations. The challenge is that the data capture mechanism has an intentional flaw that may result in incomplete data sets being used by downstream clients. This requires building an “auditor” to step in, analyze the data sets, and take corrective actions if data quality is impaired.
Read More