I already shared some clustering approaches using TF-IDF Vectorizer for grouping keywords together in Python. That works well for keywords that share similar text strings; however, it cannot group keywords by meaning and semantic relationships.
One way to capture semantics is to build, for example, word2vec models and cluster the keywords with Word Mover's Distance. The downside: you have to spend some effort building such models. For this reason, I want to show you a more accessible solution you can download and run right away.
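To give an idea of what that route involves, here is a minimal sketch, assuming you already have pretrained word2vec vectors on disk (the file name "vectors.bin", the example keywords, and the distance threshold are placeholders): it connects keywords whose Word Mover's Distance is small and treats each connected group as a cluster. This is not the approach used in the rest of this article.
# Sketch of the "build your own model" route (NOT the approach used in this article).
# Assumes a pretrained word2vec file; "vectors.bin" is a placeholder name.
# Word Mover's Distance in gensim needs the pyemd / POT package installed.
from gensim.models import KeyedVectors
import networkx as nx

keywords = ["buy running shoes", "best jogging sneakers", "marathon training plan"]
wv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# connect keywords whose Word Mover's Distance is below a hand-picked threshold
G = nx.Graph()
G.add_nodes_from(keywords)
for i, kw1 in enumerate(keywords):
    for kw2 in keywords[i + 1:]:
        if wv.wmdistance(kw1.split(), kw2.split()) < 1.0:
            G.add_edge(kw1, kw2)

# every connected component is treated as one semantic cluster
print(list(nx.connected_components(G)))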
Use the Google SERP results to discover semantic relationships
Google uses NLP models to offer the best search results for the user. Yes, it's a black box – but we can use it to our advantage. Instead of building our own models, we use this black box to cluster keywords by their semantics in Python. Here is how the Python program logic works:
- The starting point is a list of keywords for a topic
- For each keyword, the SERP results are fetched
- A graph is built from the relationship between keywords and ranking pages: if identical pages rank for different keywords, those keywords appear to be related. This is the principle we use to create the semantic keyword clusters (a short illustration follows this list).
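Before putting everything together, here is a small toy sketch of that principle with made-up SERP data: keywords become graph nodes, a shared ranking URL creates an edge, and Louvain community detection (the same community module used in the full script) returns the cluster assignment.
# Toy illustration of the clustering principle with invented SERP data.
import networkx as nx
import community  # python-louvain

# keyword -> URLs that rank for it (made-up example data)
serps = {
    "running shoes": ["site-a.com/shoes", "site-b.com/run"],
    "jogging sneakers": ["site-a.com/shoes", "site-c.com/sneakers"],
    "marathon plan": ["site-d.com/training"],
}

G = nx.Graph()
G.add_nodes_from(serps)
# connect two keywords if at least one identical page ranks for both
for kw1 in serps:
    for kw2 in serps:
        if kw1 != kw2 and set(serps[kw1]) & set(serps[kw2]):
            G.add_edge(kw1, kw2)

partition = community.best_partition(G)  # {keyword: cluster id}
print(partition)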
Let’s put everything together in Python
The Python script covers these functionalities:
- Download the SERPs for a given list of keywords using Google's Custom Search Engine. The results are saved to an SQLite database. You need to set up a Custom Search API here. After that, you can use the free quota of 100 requests per day – the paid plan costs $5 per 1,000 requests if you have larger keyword sets and need results right away. If you have time to go with the SQLite solution, the SERP results are appended to the table on every run (take a new set of 100 keywords on the next day when the free quota is available again). In the Python script, you have to set up these variables:
- CSV_FILE = "keywords.csv"  => store your keywords here
- LANGUAGE = "en"
- COUNTRY = "en"
- API_KEY = "xxxxxxx"
- CSE_ID = "xxxxxxx"
Running getSearchResult(CSV_FILE, LANGUAGE, COUNTRY, API_KEY, CSE_ID, DATABASE, SERP_TABLE) will write the SERP results to the database.
- The clustering is done using networkx and the community detection module. The data is fetched from the SQLite database – the clustering is called with getCluster(DATABASE, SERP_TABLE, CLUSTER_TABLE, TIMESTAMP)
- The clustering results can be found in the SQLite table – if you do not change the name, it is "keyword_clusters" by default.
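Put together, a typical run could look like this minimal sketch. It assumes the variable values from the list above and the getSearchResult / getCluster functions defined in the full script; the DATABASE, SERP_TABLE and CLUSTER_TABLE names are just examples.
# Example run – assumes the functions from the full script below are available.
CSV_FILE = "keywords.csv"            # your keyword list
LANGUAGE = "en"
COUNTRY = "en"
API_KEY = "xxxxxxx"                  # your Custom Search API key
CSE_ID = "xxxxxxx"                   # your Custom Search Engine id
DATABASE = "keywords.db"             # example file name
SERP_TABLE = "serp_results"          # example table name
CLUSTER_TABLE = "keyword_clusters"   # default name mentioned above

# 1. fetch the SERPs and store them in SQLite
getSearchResult(CSV_FILE, LANGUAGE, COUNTRY, API_KEY, CSE_ID, DATABASE, SERP_TABLE)

# 2. build the keyword graph, run community detection, write the clusters
getCluster(DATABASE, SERP_TABLE, CLUSTER_TABLE, TIMESTAMP="max")

# 3. inspect the resulting clusters
import sqlite3
import pandas as pd
with sqlite3.connect(DATABASE) as connection:
    print(pd.read_sql(f"select * from {CLUSTER_TABLE}", connection))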
You can get the full code below:
# Semantic Keyword Clustering by Pemavor.com
# Author: Stefan Neefischer (stefan.neefischer@gmail.com)
from googleapiclient.discovery import build
import pandas as pd
import Levenshtein
from datetime import datetime
from fuzzywuzzy import fuzz
from urllib.parse import urlparse
from tld import get_tld
import langid
import json
import numpy as np
import networkx as nx
import community
import sqlite3
import math
import io
from collections import defaultdict
def cluster_return(searchTerm,partition):
    # look up the cluster id assigned to a search term by the community partition
    return partition[searchTerm]

def language_detection(str_lan):
    # detect the language of a string with langid
    lan=langid.classify(str_lan)
    return lan[0]

def extract_domain(url, remove_http=True):
    # return the domain part of a URL, optionally keeping the scheme
    uri = urlparse(url)
    if remove_http:
        domain_name = f"{uri.netloc}"
    else:
        domain_name = f"{uri.scheme}://{uri.netloc}"
    return domain_name

def extract_mainDomain(url):
    # return the registered main domain (first level domain) of a URL
    res = get_tld(url, as_object=True)
    return res.fld

def fuzzy_ratio(str1,str2):
    # fuzzywuzzy simple similarity score between two strings
    return fuzz.ratio(str1,str2)

def fuzzy_token_set_ratio(str1,str2):
    # fuzzywuzzy token set similarity score between two strings
    return fuzz.token_set_ratio(str1,str2)
def google_search(search_term, api_key, cse_id, hl, gl, **kwargs):
    try:
        service = build("customsearch", "v1", developerKey=api_key, cache_discovery=False)
        res = service.cse().list(q=search_term, hl=hl, gl=gl,
                                 fields='queries(request(totalResults,searchTerms,hl,gl)),items(title,displayLink,link,snippet)',
                                 num=10, cx=cse_id, **kwargs).execute()
        return res
    except Exception as e:
        print(e)
        return e

def google_search_default_language(search_term, api_key, cse_id, gl, **kwargs):
    try:
        service = build("customsearch", "v1", developerKey=api_key, cache_discovery=False)
        res = service.cse().list(q=search_term, gl=gl,
                                 fields='queries(request(totalResults,searchTerms,hl,gl)),items(title,displayLink,link,snippet)',
                                 num=10, cx=cse_id, **kwargs).execute()
        return res
    except Exception as e:
        print(e)
        return e
def getCluster(DATABASE, SERP_TABLE, CLUSTER_TABLE, TIMESTAMP="max"):
    dateTimeObj = datetime.now()
    connection = sqlite3.connect(DATABASE)
    if TIMESTAMP == "max":
        df = pd.read_sql(f'select * from {SERP_TABLE} where requestTimestamp=(select max(requestTimestamp) from {SERP_TABLE})', connection)
    else:
        df = pd.read_sql(f'select * from {SERP_TABLE} where requestTimestamp="{TIMESTAMP}"', connection)
    G = nx.Graph()
    # add graph nodes from the dataframe column
    G.add_nodes_from(df['searchTerms'])
    # add edges between graph nodes: keywords sharing a ranking URL get connected
    for index, row in df.iterrows():
        df_link = df[df["link"] == row["link"]]
        for index1, row1 in df_link.iterrows():
            G.add_edge(row["searchTerms"], row1['searchTerms'])
    # compute the best partition for the community (clusters)
    partition = community.best_partition(G)