Photo by Dan Loran on Unsplash

I recently came across data published by the Government of India on farm produce for different crops, by season and district, from 1996 to 2015. I wanted to use this data to answer some questions I was interested in; one was how the per-unit-area production of a particular crop varies across states. I chose wheat for my analysis, but you can use any other crop in the data. From the data I am not sure whether area is in acres or hectares and whether produce is in quintals or tons, but that doesn’t affect the analysis as long as the units are consistent, which is my assumption. …


Python and linear algebra to help a hypothetical farmer

A Farmer’s Problem That Can Be Solved by Matrix Algebra

Photo by Shelley Pauls on Unsplash

Suppose you are a farmer in rural India and you want to sell your rice and potato produce. One option is to go to mandis in more than one city to sell all your produce. The other option is to sell all your produce in your own city. The buyer in your own city sets just one condition: the farmer can set the price, but he will buy the same amount of rice and potatoes. Buyers in the other markets set the condition that they will buy the produce at the same price the local buyer pays, but they can buy different amounts of potatoes and rice as per their requirements. You, as the farmer, find out the demand from the 2 mandis for rice and potatoes. …


Clustering Names from a Wikipedia Article Using Word Embeddings and K-Means Clustering

Photo by Bee Balogun on Unsplash

In this article I try to cluster the names extracted from a Wikipedia article. I’ll be using k-means clustering, and the distance between names will be calculated from the word embedding vectors provided by spaCy. In an earlier article we extracted names from a wiki page, using spaCy’s named entity recognizer to identify the names on that page.

In this article we’ll go a step further and apply an unsupervised machine learning technique, k-means clustering, to group the names, and then analyse whether the groups make any sense.

Word Embedding

A word embedding is a vector representation of words that places similar words closer together in vector space. In this mechanism each word is a one-dimensional vector. The length of the vector depends…
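To make the idea of "closeness" concrete, here is a minimal sketch of cosine similarity, one common way to compare embedding vectors. The three vectors below are made up for illustration and are not spaCy output; the function name is also an assumption of this sketch.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up 3-dimensional vectors, for illustration only.
# Real embeddings are learned from text and have many more dimensions.
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "potato": [0.1, 0.2, 0.9],
}

sim_royal = cosine_similarity(toy_vectors["king"], toy_vectors["queen"])
sim_mixed = cosine_similarity(toy_vectors["king"], toy_vectors["potato"])

# The two related words score higher than the unrelated pair.
print(sim_royal > sim_mixed)
```

A clustering algorithm such as k-means works on the same principle: vectors that are close to each other end up in the same group.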


Photo by Hans Vivek on Unsplash

The question of interest here is which dynasty ruled in India for the longest period of time.

Here I try to extract all the dynasties from the following page, determine the length of their rule, and plot a graph to find the longest-ruling dynasty: https://en.wikipedia.org/wiki/List_of_Indian_monarchs

We’ll get the data from the wiki, apply some Python techniques and turn it into a visualization. Along the way we’ll apply and understand some Python concepts.

Step 1. Get content from wiki page

import pandas as pd
import requests
import wikipedia
import re
url = r'https://en.wikipedia.org/wiki/List_of_Indian_monarchs'
page = wikipedia.page('List_of_Indian_monarchs')
content = page.content

If you look at the page, you can see that the time duration for each dynasty is given in the content itself. We’ll try to extract this information from…
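One way such an extraction could look is sketched below with a regular expression. The heading shape "Name (c. START – END CE)" is an assumption made for this sketch, not the confirmed format of the page content, and the sample text and the names PERIOD and extract_durations are hypothetical.

```python
import re

# Assumed heading shape: "Some dynasty (c. 100 – 350 CE)".
# This pattern is an illustration, not the page's confirmed format.
PERIOD = re.compile(
    r"(?P<name>[^()\n]+?)\s*\(\s*(?:c\.\s*)?"
    r"(?P<start>\d{1,4})\s*[–-]\s*(?P<end>\d{1,4})\s*(?P<era>BCE|CE)\s*\)"
)

def extract_durations(text):
    """Return (name, years) pairs for every heading that matches PERIOD."""
    results = []
    for m in PERIOD.finditer(text):
        start, end = int(m.group("start")), int(m.group("end"))
        # BCE years count down and CE years count up,
        # so the absolute difference covers both cases.
        results.append((m.group("name").strip(), abs(end - start)))
    return results

# Placeholder text with invented names and years, only to exercise the pattern.
sample = (
    "Example dynasty A (c. 100 – 350 CE)\n"
    "Example dynasty B (c. 400 – 250 BCE)\n"
)
durations = extract_durations(sample)
longest = max(durations, key=lambda pair: pair[1])
print(longest)
```

A range that crosses from BCE into CE would need separate handling; the sketch ignores that case.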


Named Entity Recognition from a Wikipedia Article Using spaCy

Photo by Julian Rivera on Unsplash

In this article we’ll try to find the names of persons in a Wikipedia article using the Python spaCy library. If you plan to run the source code from this article, I assume you have already installed the spacy and wikipedia libraries from PyPI.

Often articles are too long and we are only interested in certain information: either a summary, or the major events and major characters associated with the current article. Here we are just trying to find person names in different articles. Determining whether a word is the name of a person is done using pretrained models, and spaCy does a good job of labeling these. …


Binaca Geetmala Analysis

Photo by Brian McGowan on Unsplash

If you were born between the early 1940s and the late 1980s in the Hindi heartland of India, or had any other association with Bollywood in that period, chances are that your childhood is dominated by memories of Bollywood songs played on radio stations like Vividh Bharti. I had one such childhood. Recently I came across some recommendations on my YouTube account for songs from the popular Binaca Geetmala of that period. It made me a little curious about those days.

Now, I knew pretty well who my favorite singers were and who was most popular in those days, but I got a little curious to know more about all the songs that were ever played on Binaca Geetmala and which artists were featured most in that playlist during that era. …


Photo by chuttersnap on Unsplash

Pickling in Python is persisting an object’s state to secondary storage so that it is still present after the program terminates, and we are able to recreate the object’s state from that file instead of rerunning all the previous steps required to create it.

This seems to be a very powerful option, but it gets troublesome sometimes. You need to be careful about what you can and cannot pickle.

Here is a canonical pickle and unpickle structure. We are trying to pickle and unpickle a simple variable called t, which stores a list of strings.
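As a small illustration of that caveat, the sketch below checks whether a few objects can be pickled. The helper name try_pickle is hypothetical; the objects are chosen only as well-known examples.

```python
import pickle

def try_pickle(obj):
    """Return True if obj can be serialized with pickle, else False."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

# Plain containers of built-in values pickle without trouble.
print(try_pickle(["aa", 1, {"k": 2.5}]))

# Functions are pickled by reference to their name, and a lambda has no
# importable name, so this fails.
print(try_pickle(lambda x: x + 1))

# Generators hold running interpreter state and cannot be pickled.
print(try_pickle(i for i in range(3)))
```

Open file handles, sockets and database connections fall in the same category as the generator: they wrap live resources rather than plain data.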

import pickle
t = ["aa"]
with open("test.pickle", "wb") as f:
    pickle.dump(t, f …
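For reference, a complete round trip could look like the sketch below. Writing into a temporary directory is a choice made for this sketch so that no file is left behind; the file name matches the snippet above.

```python
import os
import pickle
import tempfile

t = ["aa"]

# Write to a temporary directory so the sketch leaves no file behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.pickle")

    # Pickle files must be opened in binary mode.
    with open(path, "wb") as f:
        pickle.dump(t, f)

    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored list is equal to the original but is a new object.
print(restored == t, restored is t)
```

Note that pickle.load should only be used on files you trust, since unpickling can execute arbitrary code.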


Immutable Objects: Some Insights and Implementation

Objects and Need for Immutability

Photo by Thomas Habr on Unsplash

An object is immutable if you cannot change its state once it’s created.

An object has 3 properties:

Identity
State
Behavior.

I got the following image from https://slideplayer.com/slide/5854201/, which depicts this idea very well.
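The three properties, and what immutability means for them, can also be sketched in Python. This uses a frozen dataclass, which is one of several ways to get an immutable object; the Point class is a hypothetical example and not taken from the article.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Point:
    # State: the values stored in the object.
    x: int
    y: int

    def moved(self, dx, dy):
        """Behavior: return a new Point instead of changing this one."""
        return Point(self.x + dx, self.y + dy)

p = Point(1, 2)
q = p.moved(3, 0)

# Trying to change the state of a frozen instance raises an error.
try:
    p.x = 10
    mutated = True
except FrozenInstanceError:
    mutated = False

# Identity: p and q are distinct objects, and p still holds its original state.
print(p, q, p is q, mutated)
```

Because the state cannot change, any "modification" has to produce a new object with its own identity, which is what moved does here.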


Photo by Andrew Neel on Unsplash

I always wonder what to advise when I see a question like: should I learn ‘Python or R’, ‘Java or Ruby’, ‘Lisp or Prolog’, ‘Ada or COBOL’? (I never actually saw the last one.) In my opinion the answer is always obvious: both. Probably my approach to learning a new language is a little different from others’.

Learning any new thing requires time, patience, practice and motivation. One of the reasons we face a dilemma like the one above is the scarcity of some of these essential elements, most often time. …


Photo by Patrick Tomasso on Unsplash

Collaborative filtering is one of the simplest approaches to recommendation systems. I am going to use the Python surprise package to make a simple recommendation system. In collaborative filtering we rely on other users’ ratings of common items to estimate a user’s rating for an item, when the item has already been rated by other users and we have already established a similarity measure between those users. There are some issues we need to address while creating a recommendation system based on collaborative filtering.

The first question is how to calculate similarity between users, and in particular how to calculate it when most of their ratings are missing because they have not rated those items. …
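One common answer is to compare two users only on the items both have rated. The sketch below does this with cosine similarity in plain Python. It is not the surprise package's API, and the function name, users and ratings are made up for illustration.

```python
import math

def cosine_on_common_items(ratings_a, ratings_b):
    """Cosine similarity using only the items both users have rated.

    Returns None when the users share no rated items (or a norm is zero),
    because the similarity is undefined in that case.
    """
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return None
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return None
    return dot / (norm_a * norm_b)

# Made-up ratings: each user has rated only some of the items.
alice = {"item1": 5, "item2": 3, "item3": 4}
bob = {"item1": 4, "item2": 2, "item4": 5}
carol = {"item5": 1}

sim_alice_bob = cosine_on_common_items(alice, bob)      # uses item1 and item2
sim_alice_carol = cosine_on_common_items(alice, carol)  # no overlap

print(sim_alice_bob, sim_alice_carol)
```

A similarity computed from only one or two shared items is not very reliable, so practical systems usually also require a minimum number of common items.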

About

pankaj kumar

Data Scientist / Data Engineer
