I found some free time to update my blog. This post is dedicated to my first serious piece of code.
Our company has a database of users who have subscribed to at least one of our services and are MTNIrancell users. On the other side, MTNIrancell has a database of their users who are subscribed to one or more of our services.
The problem was that MTNIrancell's DB and our DB were out of sync.
MTNIrancell gave us their user list as a .csv file and asked us to compare it with our DB and send back the diff.
Abraham had been struggling with our DB and MTNIrancell's csv file for a couple of days. At the time, I was reading the Files and Exceptions chapter of Eric Matthes' Python Crash Course. Abe asked me to help him. I was glad, because it was a chance to write serious, operational code.
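# Output file: lines that are in our export but not in MTNIrancell's
filename = 'ghosts.csv'

# Load MTNIrancell's export as a list of lines
with open('irancell.csv') as mtn:
    mtn_lines = mtn.readlines()

# Load our export as a list of lines
# (readlines(), not read(): read() returns one string, and looping over it yields single characters)
with open('vada.csv') as vada:
    vada_lines = vada.readlines()

line_number = 0
for line in vada_lines:
    line_number += 1
    print("Now reading line number: " + str(line_number) + " of vada.csv")
    # Each membership test scans the whole mtn_lines list
    if line not in mtn_lines:
        with open(filename, 'a') as file_diff:
            file_diff.write(line)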
The code worked: it compared MTNIrancell's csv with our csv file and exported the users that exist in our DB but not in MTNIrancell's DB. MTNIrancell's DB had about 2 M records, while our DB had about 1.1 M records.
My code worked, but it was too slow. It compared those csv files at about 5 records per second. Abraham and Ali didn't have enough free time to wait for it to finish, so they googled and found the comm tool. comm solved the problem in a fraction of the time my Python 3 code needed. I first thought this was because my Python 3 code is single-threaded and single-process, but the bigger problem is the `line not in mtn_lines` check: mtn_lines is a list, so every lookup may scan up to 2 M entries. comm expects both files to be sorted and walks through them once, side by side. I am not sure multithreading or multiprocessing would shorten the comparison much, but who knows, maybe I will refactor the code someday to try it.
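A likely first step for that refactor does not need multiple processes at all. Testing `line not in mtn_lines` against a list scans the list on every lookup, while a Python set answers the same question in roughly constant time on average. The sketch below is one possible rewrite, not the code we actually ran. The function name `write_missing_lines` and its parameters are made up for illustration. It assumes that both files fit in memory and that an exact line-by-line match is what we want, as in the original code.

```python
def write_missing_lines(ours_path, theirs_path, out_path):
    """Write the lines of ours_path that do not appear in theirs_path.

    Returns the number of lines written. The name and parameters are
    hypothetical; they are not part of the original script.
    """
    # Build a set once: membership tests on a set take constant time
    # on average, instead of scanning a list for every line.
    with open(theirs_path) as theirs:
        their_lines = {line.rstrip("\n") for line in theirs}

    missing = 0
    # Open the output file once instead of reopening it for every hit.
    with open(ours_path) as ours, open(out_path, "w") as out:
        for line in ours:
            # Strip the newline so a last line without one still matches.
            record = line.rstrip("\n")
            if record not in their_lines:
                out.write(record + "\n")
                missing += 1
    return missing
```

With the file names from this post, the call would be `write_missing_lines('vada.csv', 'irancell.csv', 'ghosts.csv')`. Unlike the original loop, this version overwrites `ghosts.csv` rather than appending to it, so running it twice does not duplicate the results.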