A union of curiosity and data science

Knowledgebase and brain dump of a database engineer


Set Up and Install Apache Airflow on an Ubuntu 18.04 GCP (Google Cloud) VM

First, log into the GCP console.

Next, create a VM in Compute Engine.

For this demo I created a small VM named Airflow.

I chose Ubuntu 18.04 LTS Minimal as the image, then created the VM.

Connect to the VM using the browser SSH client.

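sudo su                                   # elevate to root
apt-get update                            # refresh the package lists
apt-get install -y python                 # Python 2.7 on Ubuntu 18.04
apt-get install -y software-properties-common
apt-get install -y python-pip
export SLUGIFY_USES_TEXT_UNIDECODE=yes    # avoid the GPL unidecode dependency error during install
pip install apache-airflow
pip uninstall -y marshmallow-sqlalchemy
pip install marshmallow-sqlalchemy==0.17.1
airflow initdb                            # create the default SQLite metadata database
airflow webserver -p 8080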

The first thing I'll do when connected is elevate my user to root.

Next, I'll refresh the package lists with apt-get update.

Next, install Python.

Next, we'll install software-properties-common, which provides tools such as add-apt-repository for managing the repositories we install software from.

Next, let's install pip.

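We also export the SLUGIFY_USES_TEXT_UNIDECODE environment variable. Without it, the Airflow install can abort with an error about a GPL-licensed dependency (unidecode); setting the variable makes that dependency use text-unidecode instead.

You can read more about this here: https://stackoverflow.com/questions/52203441/error-while-install-airflow-by-default-one-of-airflows-dependencies-installs-a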

Now install Apache Airflow using pip.

As of October 2019, you'll get a marshmallow-sqlalchemy error if you attempt to initialize the default SQLite database.

To prevent this error, uninstall marshmallow-sqlalchemy and install the earlier version 0.17.1.

Initialize the database. By default this is a SQLite file, airflow.db, in the AIRFLOW_HOME directory (~/airflow).
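
To confirm that initdb actually created the metadata tables, you can list them. This is a minimal Python 3 sketch, assuming python3 is available on the VM, the default AIRFLOW_HOME (~/airflow, i.e. /root/airflow when run as root) and the default file name airflow.db; list_tables is a hypothetical helper name.

```python
import os
import sqlite3
from contextlib import closing


def list_tables(db_path: str) -> list:
    """Return the table names in a SQLite file.

    The file is opened read-only (mode=ro), so a wrong path raises
    sqlite3.OperationalError instead of silently creating an empty database.
    """
    with closing(sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Assumption: default AIRFLOW_HOME and default SQLite file name
    home = os.environ.get("AIRFLOW_HOME", os.path.expanduser("~/airflow"))
    db_path = os.path.join(home, "airflow.db")
    if os.path.exists(db_path):
        tables = list_tables(db_path)
        print(f"{len(tables)} tables in {db_path}")
        # A few of the tables Airflow 1.10's initdb creates
        for expected in ("dag", "dag_run", "task_instance"):
            print(expected, "OK" if expected in tables else "missing")
    else:
        print(f"{db_path} not found; has airflow initdb been run?")
```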

Run the web server on port 8080.

Open the GCP firewall to allow inbound traffic on port 8080 to the Airflow server.
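
Once the firewall rule is in place, a plain TCP connection test from your own machine shows whether port 8080 is reachable. This is a minimal sketch; port_open is a hypothetical helper, and the VM's external IP (from the Compute Engine VM list) is passed on the command line.

```python
import socket
import sys


def port_open(host: str, port: int = 8080, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout) and DNS failures
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    host = sys.argv[1]  # the VM's external IP address
    state = "reachable" if port_open(host) else "NOT reachable (check the firewall rule)"
    print(f"{host}:8080 is {state}")
```

For example: python3 check_port.py <external IP>. A successful connection only proves the port is open; the web UI itself is then at http://<external IP>:8080.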

At this point you may be wondering why there is a warning at the top of the page about the scheduler. The warning appears because the scheduler isn't running yet. With SQLite as the database, the "max_threads" setting in the Airflow config also needs to be set to 1 before the scheduler is started.

OK, I'm going to log back into the console and use the browser to SSH into my instance.
Once I'm in, I'll switch to root (sudo su) and open the Airflow config file, airflow.cfg in the AIRFLOW_HOME directory (/root/airflow/airflow.cfg here, since Airflow was installed and initialized as root). Scroll down to the [scheduler] section until you see "max_threads". If you're using SQLite, change this value to 1 and save the file.
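
If you'd rather script the edit than scroll through the file, here is a minimal Python 3 sketch. It assumes the default config location ($AIRFLOW_HOME/airflow.cfg, with AIRFLOW_HOME defaulting to ~/airflow) and only rewrites max_threads inside [scheduler] when [core] sql_alchemy_conn points at SQLite; uses_sqlite and set_max_threads are hypothetical names. The line is rewritten in place instead of through configparser's write(), so the comments in airflow.cfg survive.

```python
import configparser
import os
import re


def uses_sqlite(cfg_text: str) -> bool:
    """True if [core] sql_alchemy_conn points at a SQLite database."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(cfg_text)
    conn = parser.get("core", "sql_alchemy_conn", fallback="")
    return conn.lower().startswith("sqlite")


def set_max_threads(cfg_text: str, value: int = 1) -> str:
    """Set max_threads in the [scheduler] section; every other line is left untouched."""
    out, section = [], None
    for line in cfg_text.splitlines(keepends=True):
        header = re.match(r"\s*\[([^\]]+)\]", line)
        if header:
            section = header.group(1).strip()
        elif section == "scheduler" and re.match(r"\s*max_threads\s*=", line):
            ending = "\n" if line.endswith("\n") else ""
            line = f"max_threads = {value}{ending}"
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    home = os.environ.get("AIRFLOW_HOME", os.path.expanduser("~/airflow"))
    path = os.path.join(home, "airflow.cfg")
    if not os.path.exists(path):
        print(f"No config found at {path}; run airflow initdb first.")
    else:
        with open(path) as f:
            text = f.read()
        if uses_sqlite(text):
            with open(path, "w") as f:
                f.write(set_max_threads(text, 1))
            print(f"Set [scheduler] max_threads = 1 in {path}")
        else:
            print("Not using SQLite; leaving max_threads unchanged.")
```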

Now we can start the scheduler:

airflow scheduler

Airflow docs: https://airflow.apache.org/start.html

Magento 2 API Product Get

Create a token in the Magento Admin under System > Integrations: create a new integration (or edit an existing one), activate it, and copy its Access Token.

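<?php
// Magento 2 REST API: fetch a single product by SKU.
// Replace the token, the SKU and the store URL with your own values.
$token = '<token value>';
$sku   = '<your sku>';

$headers = array("Authorization: Bearer " . $token);

// rawurlencode() keeps SKUs with spaces or other special characters from breaking the URL
$requestUrl = 'https://www.site.com/index.php/rest/V1/products/' . rawurlencode($sku);

$ch = curl_init($requestUrl);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it

$response = curl_exec($ch);

// Check for transport errors before trying to decode the response
if (curl_errno($ch)) {
    print curl_error($ch);
} else {
    $result = json_decode($response);
    print_r($result);
}
curl_close($ch);

?>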
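
The same request can be sketched with Python's standard library. This assumes the same endpoint (GET /rest/V1/products/<sku>) and bearer token as the PHP above; build_product_request, get_product and the MAGENTO_* environment variable names are hypothetical.

```python
import json
import os
import urllib.parse
import urllib.request


def build_product_request(base_url: str, token: str, sku: str) -> urllib.request.Request:
    """Build the GET /rest/V1/products/<sku> request with a Bearer token."""
    # quote(..., safe="") percent-encodes spaces and slashes in the SKU
    url = f"{base_url.rstrip('/')}/rest/V1/products/{urllib.parse.quote(sku, safe='')}"
    return urllib.request.Request(url, headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    })


def get_product(base_url: str, token: str, sku: str) -> dict:
    """Fetch one product as decoded JSON; raises urllib.error.HTTPError on 401/404."""
    with urllib.request.urlopen(build_product_request(base_url, token, sku), timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical env vars; nothing is sent unless a token and SKU are provided
    token = os.environ.get("MAGENTO_TOKEN")
    sku = os.environ.get("MAGENTO_SKU")
    if token and sku:
        base = os.environ.get("MAGENTO_BASE_URL", "https://www.site.com/index.php")
        product = get_product(base, token, sku)
        print(product.get("name"), product.get("price"))
    else:
        print("Set MAGENTO_TOKEN and MAGENTO_SKU to run the request.")
```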

Google Cloud Speech-to-Text API - WAV file speech recognition with Python 3


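This API transcribes up to 1 minute of audio per request. You'll need a Google Cloud account with the Speech-to-Text API enabled and a service-account JSON key for the credentials. The script uses the SpeechRecognition package (pip install SpeechRecognition).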

https://cloud.google.com/speech-to-text/docs/

#!/usr/bin/env python3
# Requires the SpeechRecognition package: pip install SpeechRecognition
import speech_recognition as sr

audio_path = "K:\\random\\part4.wav"
transcript_path = "K:\\random\\part4.txt"

# Recognizer object; it can call various services, including Google, Microsoft and IBM Watson
r = sr.Recognizer()
with sr.AudioFile(audio_path) as source:
    audio = r.record(source)  # read the entire audio file

# Google Cloud Speech recognition: paste the full contents of your JSON key here
credentials_json = r"""<entire contents of json credentials from google>"""
try:
    transcript = r.recognize_google_cloud(audio, credentials_json=credentials_json)
    print("Transcribed:...  " + transcript)
    # Save the transcript to a text file
    with open(transcript_path, "w", encoding="utf-8") as f:
        f.write(transcript)
except sr.RequestError as e:
    print("Sorry, can't help you: {0}".format(e))
except sr.UnknownValueError:
    print("What was that?")
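
Because a single request is limited to about a minute of audio, longer recordings have to be split first. This is a minimal sketch using only the standard wave module, assuming uncompressed PCM WAV input; wav_duration and split_wav are hypothetical helper names, the 55-second chunk size is an assumption chosen to stay under the limit, and chunk files are written next to the original.

```python
import math
import wave

MAX_SECONDS = 55  # a little under the 1-minute limit, to leave some margin


def wav_duration(path: str) -> float:
    """Length of a WAV file in seconds (frames / frame rate)."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def split_wav(path: str, max_seconds: int = MAX_SECONDS) -> list:
    """Write consecutive chunks of at most max_seconds each; return the chunk file paths."""
    chunk_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * max_seconds
        total_chunks = math.ceil(src.getnframes() / frames_per_chunk)
        stem = path[:-4] if path.lower().endswith(".wav") else path
        for i in range(total_chunks):
            frames = src.readframes(frames_per_chunk)
            chunk_path = f"{stem}_chunk{i:03d}.wav"
            with wave.open(chunk_path, "wb") as dst:
                dst.setparams(params)  # same channels, sample width and rate as the source
                dst.writeframes(frames)  # the header's frame count is corrected on close
            chunk_paths.append(chunk_path)
    return chunk_paths
```

Each chunk path can then go through the same sr.AudioFile, r.record and recognize_google_cloud steps as the script above, with the transcripts joined in order.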