Automating Data Extraction

Recently, our Tech Team received a request from our Marketing Team to automate the extraction of certain social media statistics, such as the number of views, comments, and likes on our posts within a specified period.

Turns out there was a new initiative to track the performance of our marketing activities, and part of it involves monitoring various metrics - such as reach, impressions and interactions - on our public-facing platforms, like:

Each platform has its own analytics dashboard - Facebook has a Creator Studio for Facebook and Instagram, Spotify has Spotify for Artists, etc. - but having to log in to each platform to extract the exact information we want (“how many views do we have on each video posted between Jan and March?”) is a tedious and unenviable manual task.

Design

Luckily, these platforms also have public-facing APIs (Application Programming Interfaces) that we can use to automate this process. Our general approach is to :

  • Extract the data we want (impressions, views, likes, saves, etc.) from our social media platforms using scripts (or extractors).

  • Load the data into a database.

  • Build visualisations and a dashboard based on the data.

  • Repeat this process every day.

This is a miniature of a classic procedure called ETL (Extract, Transform, Load) used to consolidate data for data analysis and business intelligence applications.

Untitled Diagram-3.png

There are 4 components to our system:

  • Our extractors are simple scripts written in Python

  • Airflow is a task management tool for running and scheduling the extractor jobs

  • PostgreSQL is a database for storing all the raw and transformed data

  • Redash is a data visualisation tool

These applications are all run on our own DigitalOcean server.

API

In order to use the APIs, we have to first create API keys, used to authenticate API requests. The instructions can be easily found on the respective platform’s websites, such as:

Afterwards, it is just a matter of looking through the various documentation to find out which APIs can give us the data we want.

  • For YouTube, we only need the views, likes and comments, which can be retrieved through the YouTube Data API. We also needed the data for unique viewers, but sadly it is no longer available through their API.

  • For Facebook and Instagram, everything is queried through their Graph API.

  • For Spotify, we need the number of listens per track. Unfortunately. there is no public API available for the Spotify for Artists platform, but we were able to design a web scraping workflow to get the data we want.

ETL pipeline

More accurately, we are using an ELT workflow (Load first, then Transform).

ELT is a more flexible workflow for our use case; since we do not need to optimise for storage yet (our social media analytics isn’t exactly big data, plus data is cheap nowadays), we can just dump the raw data into the database, and design the transformation step accordingly. Right now, we are still not 100% certain of the metrics we need, so this workflow allows us to have the flexibility of changing how we process the raw data into a query-friendly format.

Our extractors are run using Airflow, which organises processes in units called tasks. Airflow allows us to visualise the flow easily, and inspect the logs/output and status of each task as they run.

3D976254-E076-480E-950A-03A3822C9477.png

To keep our data updated, Airflow can also be configured to run our extractors automatically on a regular basis. We have set them up to run daily.

Data Visualisation

Now that we have the data we want in our database, we can use Redash to query the data and visualise the output.

Redash is a BI tool that is meant for users with some knowledge of SQL. Thankfully, the SQL knowledge we need to get what we want is not very complicated - we just need to know the basic select, group, order and some window functions.

1DB2081F-C2B8-4773-A40E-1EE1231CDADC.png

From the output, we can even create some simple time series graphs! The chart below shows the number of new impressions/reach/engagements/likes for all our posts this year, over the past few days.

22E7CA44-2DF7-411E-8047-AE12B31E148B.png

We can then create a dashboard to view all the charts and tables in one place.

CFDA8AD9-090C-4A80-ADFC-C3E4054E0359.png

Acknowledgements

Believe it or not - this was all set up in a single day.

We would like to thank Andrew Goh from Grab for helping us with building this system! Andrew reached out to us through Grab’s Social Impact initiative, which enabled him to commit one full day to this project.

We are also very grateful to DigitalOcean for supporting us with infrastructure credits through their Hub for Good program.

A Chat with Qin Li-Wei

After having performed to his first live audience in over half a year, we had a chat with Qin Li-Wei to ask him about his thoughts on the pandemic situation for the arts industry, and what it meant to be able to perform live once again.

DSC06988.JPG

Q: You are used to travelling and performing with world-class orchestras worldwide. Was it difficult to adapt to the new “stay-home” lifestyle caused by the global pandemic? What were some of the things you did at home?

A: Yes, it was definitely difficult to adapt to this new lifestyle. My last concert was at the beginning of March, at the Suntory Hall in Tokyo. Once that was over, I had to stay in Singapore and all my other performances were cancelled. China cancelled first, followed by Europe, and then Australia. It was quite a difficult period psychologically, even though I got to rest a bit more physically. I usually have around 50 concerts a year, but this year I’ve probably played only 5! But I think it’s a good time to find new ways to find a purpose in our artistic lives.

I’ve been doing quite a bit of teaching (over the internet) for various conservatories, even webinars with the Melbourne Youth Orchestra just to talk to the youngsters about life as a musician, etc. I’ve also been learning new works - which probably wouldn’t have been possible if not for this pandemic! Works like the Schnittke Cello Sonata and the Hindemith Cello Concerto - These are mammoth works, and I’m usually on the road so much I just don’t have sufficient time to delve into these works meaningfully. But now I suddenly have all this time! And so how better to spend the time than to learn music I’ve always wanted to learn and finding other ways to improve myself!

My usual lifestyle before the pandemic was driven by deadlines. There were always new deadlines to meet. Every week there’s a new concerto, a new recital, etc. So my practice schedule is always centred around these concerts, working towards these deadlines. Now, there are no such deadlines! So we really have to find a purpose - why are we practising? Why do we love music?

Overall, I’ve gotten used to it now after occupying myself with other things, but at the beginning it definitely took some time getting used to. I’ve never been at one place for so long since high school!

Q: Your children must definitely appreciate you being at home for a prolonged period of time?

A: Actually it’s the reverse. My kids were very happy at the start (of the Circuit Breaker period) but now they’re almost fed up with me! Because I’m practising with my younger son on the cello every day, so it must be quite pressurising for him.

DSC07051.JPG

Q: It has now been over five months since the Circuit Breaker was implemented. How did it feel to be back on stage, performing to a live audience?

A: It’s amazing. Music is all about communication and sharing. Yes, we only had 50 people in the audience. But still, there were moments of interaction. Also with the orchestra, I wasn’t so used to it at the beginning because my memory of working with the orchestra is at a much closer distance to each other. Both visually, and also hearing. It took some time to get used to. But once that was over I thought it was fine. And to be making live music once again, as opposed to just facing a camera (which I had been doing quite a lot prior to this), was really really refreshing. Yes, there is still an audience behind the camera, but the microphone was right next to me, it almost felt like I was recording a CD, but at the same time you have to play it like a concert, because there are still people watching you (through the camera). For the live performance last week, I was focused on my projection, being able to speak to the last person who was sitting in the last row. Now this is really what I’m used to, and it was nice to be back!

Q: You performed the first movement of Haydn’s Second Cello Concerto to a live audience last week, and will be recording all three movements at the end of the month. Would your approach towards the music be any different? Why or why not?

A: I will treat the recording we’re doing as a performance, especially because there are cameras everywhere, acting as the eyes and ears of the live audience. The only difference is that these cameras are able to go much closer to you than any audience member would be able to do live. It’s really exciting for us performers because it demands a more polished sound, while still maintaining the stage presence of a live concert. I’ll admit - it will be harder. But at the same time, to have a video to keep as a memory of this challenging period, and to do it with OMM, is such a privilege for me! Especially recording at the Esplanade - magnificent hall, recording a wonderful concerto, I think it should work very well! And this is probably the new norm. We have to get used to being able to perform like at a concert while being recorded at a very high level. On stage (during the recording), I will probably still treat it as a concert! Imagining the concert, imagining the interaction with the audience - only then will the recording be spontaneous and exciting. It will definitely be challenging but I’m looking forward to it!

DSC07042.JPG

Q: Could you share with us what the usual practice with cadenzas is, and what is so special about this particular one you will be performing (and eventually recording)?

A: Cadenzas were originally written to demonstrate the virtuosic capabilities of the performer. The tradition is that when you’re young, you play what’s written on the score. When you mature and develop your own artistic identity, you then try to write something for yourself that fits your style, something that you’re comfortable with. That is usually the case. These cadenzas also have to fit (musically) with the rest of the movement, based on what the composer has written. Hence, it is actually not easy to write your own cadenzas. I’m not such a great composer, and my own cadenzas are absolute rubbish, so Jonathan’s cadenza will stay with me for the foreseeable future. I think it’s really difficult to top his cadenza! Also, since one of the objectives of this project is to support freelance musicians, I thought, why not invite a Singaporean composer to contribute to this project in a meaningful way? And when Jonathan Shin (who also plays the cello) happened to be available, I thought it was the perfect opportunity!

Prior to this, I did have plans to record both Haydn Cello Concertos. For the C major concerto, I also had another prominent Singaporean composer, Ho Chee Kong, compose a cadenza for me when I performed it with the Singapore Symphony Orchestra back in 2018. With Jonathan’s latest contribution, I now have two Haydn Cello Concertos with cadenzas composed by two Singaporean composers! I think this will shine a unique Singaporean light on these two works, which is fantastic.

Learning Jonathan’s cadenza is really the same as learning any other piece. But because he plays the cello, he was not afraid to explore the different capabilities of the instrument. I really thank him for that, because it not only made the learning process very interesting, but was also a treat for the audience members listening to it! Overall, I really enjoyed the entire process, and I can’t wait to record it with OMM.

We would like to thank Qin Li-Wei for taking time out of his busy schedule to speak to us. Stay tuned to our social media channels to catch Li-Wei play Haydn with OMM!

Rolling out our own e-ticketing solution

When planning a concert, one of our very first steps - typically - would be to approach a ticketing agent such as SISTIC or APACTix (now Ticketmaster). For our latest event OMM Restarts!, we were allowed a very small audience of 50, so we thought it was the perfect opportunity to explore the possibility of rolling out our own e-ticketing solution.

Our original idea was to use the good ol’ perforated paper tickets. However, to minimise the risk of COVID-19 transmission, we decided that our solution needed to be digital.

Enter the OMM Tech team - we recently formed this team with the aim to modernise our backend processes, capitalise on every opportunity to automate workflows, and ultimately help our volunteer-run committee be as efficient as possible. So far, we had been working on smaller projects such as simple Telegram bots and HR automation using Apps Script, so we were excited to take this up as our first major customer-facing product.

Scope and Requirements

A few minutes into designing the system, we realised that creating a minimally functional ticketing platform is pretty straightforward. Let’s break down the basic requirements of a ticketing system:

  • Customers are able to reserve seats, pay for, and receive tickets easily

  • Customers are able to view event and seat information on the ticket

  • Ushers are able to validate tickets and mark them as scanned

That’s about it! To accomplish the above digitally, all we need is

  • Some way to receive payment from the audience

  • An automated solution to distribute the tickets

  • A database to store ticket data - seat numbers, whether the ticket has been scanned, etc.

  • An admin platform for ushers to validate the ticket, and mark the ticket as scanned.

  • A simple frontend to display the seat number, and the QR code for ushers to validate the ticket.

To simplify things, we decided to take the seat selection and payment handling aspects out of the scope for this project, and handle ticket requests via email - we had the capacity to do so since we only had to fill 100 seats (two shows, 5pm and 8pm, of 50 in the audience). Payment was handled via PayNow or PayLah!, two of the most popular payment methods here in Singapore.

User Experience and Interface Design

We originally wanted the interface to be something that both the audience and ushers can use to view the seat information and mark them as scanned. The idea is that

  1. We send the audience an email with a QR code

  2. Ushers scan the QR code from the email, which leads them to the ticketing page

  3. Ushers input the password given to them at the start of the day, and hit the “Mark as scanned” button.

  4. Audience is admitted into the hall

After testing the flow, we eventually decided to separate the password scanning part into a separate interface altogether, and move the QR code for the ushers to scan from the email into the e-ticket.

Check out our design iterations, made using Sketch. Credits to Yun En, our latest OMM Tech member!

Implementation

For the flow described above, only two API calls were needed:

  • GET /<ticket_id>/status - used by the audience to retrieve ticket details, and by the admin page to validate the ticket

  • POST /<ticket_id>/submit - used by the admin page to mark a ticket as scanned

In order to put out a working solution within 2 weeks, we needed to use something that we are familiar with, is easy to deploy, and is reliable. AWS was thus the obvious choice as we use them in our day jobs and we are extremely familiar with the ecosystem. To implement the APIs, we use

  • API Gateway, as the reverse proxy to direct HTTP requests to our…

  • Lambda, written in Python, which handles the requests and interacts with…

  • DynamoDB, the database that stores the data of our tickets

As for frontend, there are way too many solutions to choose from - even vanilla HTML/JS/CSS wouldn’t be a bad option since it is a very simple webapp we are creating - so we defaulted to what we were most familiar with: Elm. We linked our repository to Netlify to build and deploy the static site automatically on their CDN.

Our code is publicly accessible on GitHub.

Ticket Distribution

Sendgrid was our original solution for sending out the tickets to our audience. However, Sendgrid was recently hacked, and thus many of the IPs used by their free plans were put on spam lists such as Spamhaus. Hence, we were unable to get emails through to @hotmail and @outlook users.

We promptly switched to Mailgun, which was just as easy to configure and use, and fortunately we were able to export the HTML of our Sendgrid dynamic template (from Sendgrid itself no less!) to send out the remaining emails with the same visual consistency as the first batch.

Launch Day

Someone in the audience came for the 5pm show even though their ticket was for the 8pm show, and the mistake was only discovered after he was already admitted into the hall. There were a few things we could have done to prevent this:

  • Make tickets for different shows appear different visually, to signal to the ushers that the audience came for the wrong show

  • Disallow the successful scanning of the ticket through some backend logic

  • Prevent the QR code from being generated through some frontend logic

We also accidentally discovered an iOS 14 bug: one of our ticket URLs triggered the new iOS 14 App Clip feature when we scanned it using the QR code app. We have submitted a bug report to Apple.

Further Work

There is a lot more work to be done in order support regular-sized concerts with our system. We would need a portal for a potential customer to select their desired seats, make payment, and receive their e-tickets automatically. We also need to audit the security of our system to ensure that the system is reasonably resilient against potential attack vectors, such as ticket forgery and denial of service attacks.

Fortunately, these are not difficult problems. Overall, this experience provided an encouraging insight into creating an e-ticketing solution for concerts, and we strive to develop and refine it further for our future concerts.