[ENG/ITA] Python and Hive: A Tool to Simplify Curation | Work in Progress!

in #hive-146620 · 2 months ago





Python and Hive: A Tool to Simplify Curation | Work in Progress!

My first Python project was a small bot that upvotes and comments on posts under which I have previously left a comment containing a certain keyword. Its purpose is to let me decide, from a single account, whether and how much to upvote a post with my secondary account.

Sometimes I might want to upvote only with my main account, sometimes only with my secondary account, sometimes with both but with different percentages. That's why setting up a curation trail with hive.vote would be too restrictive in these cases, while a custom bot that lets me choose what to do each time makes me much more flexible and avoids wasting precious upvotes.

Compared with the code I shared last time, I have made some small improvements: some were suggested by other users, others make the code more robust and less likely to crash unexpectedly.

Now I am finishing some last small details, but meanwhile you can already find the script on GitHub... or at least you will be able to find it as soon as I set the privacy to “public” 😂 so if you click on the link shortly after this post is published you will sadly see nothing yet.


Now let's move on to a new project!

Having (almost) finished the first project, it's time to move on to something different, in an effort to learn new stuff!

This time the idea comes from a suggestion by @stewie.wieno, who asked me whether Python could be used to build something that would make it easier to set up a sort of curation trail to support Italian users on Hive.

Therefore, my idea was to design a script with the following features:

  • find posts with a particular tag (e.g., ita);
  • check if the post is written in Italian;
  • check if the post has at least 500 words (or 1000 if the post is written in two languages).

If these requirements are met, the post is added to a special list.
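
The three checks above can be condensed into a single decision function. This is only a minimal sketch: `is_eligible`, its parameters and the constants are names chosen for illustration, and the language detection is assumed to have been done beforehand.

```python
TARGET_TAG = "ita"          # tag used as an example in the list above
MIN_WORDS_ONE_LANG = 500    # threshold for single-language posts
MIN_WORDS_TWO_LANGS = 1000  # threshold for posts written in two languages


def is_eligible(tags, detected_langs, word_count):
    """Return True if a post passes the tag, language and length checks.

    tags           -- list of tags attached to the post
    detected_langs -- language codes detected in the body, e.g. ["it", "en"]
    word_count     -- number of words in the post body
    """
    if TARGET_TAG not in tags:
        return False
    if "it" not in detected_langs:
        return False
    if len(detected_langs) == 1:
        required = MIN_WORDS_ONE_LANG
    else:
        required = MIN_WORDS_TWO_LANGS
    return word_count >= required


print(is_eligible(["ita", "python"], ["it"], 620))        # True
print(is_eligible(["ita", "python"], ["it", "en"], 620))  # False
```

The same 620-word post passes as a single-language post but fails once a second language is detected, which is the behaviour the third bullet describes.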

Here the task of this first script ends.

The list can then be checked manually by one or more curators, who make sure that the posts are of good quality, are not spam and do not violate any Hive rule.

After that I would like to create a second script that takes the cleaned-up list and upvotes the selected posts, also leaving a comment on each one.
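
As a rough idea of how the second script could start, here is a sketch that turns the curated list back into author/permlink pairs. It assumes the list contains one peakd.com URL per line, as written by the first script; `parse_post_url` and `load_approved_posts` are hypothetical helpers, and the actual voting and commenting calls are left out because they depend on the library used.

```python
from urllib.parse import urlparse


def parse_post_url(url):
    """Split 'https://peakd.com/@author/permlink' into (author, permlink)."""
    path = urlparse(url.strip()).path  # e.g. "/@author/permlink"
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2 or not parts[-2].startswith("@"):
        raise ValueError(f"Not a post URL: {url!r}")
    return parts[-2][1:], parts[-1]


def load_approved_posts(lines):
    """Return (author, permlink) pairs, skipping blank lines and duplicates."""
    seen = set()
    posts = []
    for line in lines:
        if not line.strip():
            continue
        post = parse_post_url(line)
        if post not in seen:
            seen.add(post)
            posts.append(post)
    return posts


sample = [
    "https://peakd.com/@alice/my-first-post\n",
    "\n",
    "https://peakd.com/@alice/my-first-post\n",
    "https://peakd.com/@bob/un-post-in-italiano\n",
]
print(load_approved_posts(sample))
# [('alice', 'my-first-post'), ('bob', 'un-post-in-italiano')]
```

In real use the `sample` list would be replaced by the lines of the file reviewed by the curators.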

This would greatly simplify and speed up the curators' work, with the two scripts taking care of almost the entire process automatically.

Of course this is only the beginning, but building such a tool seemed like an interesting exercise, so I wanted to try this little experiment :)


And here's the code!

Below is the code for the first of the two scripts I am working on, already working and ready to be polished:


#!/usr/bin/env python3
"""A script to simplify curation on Hive"""
from beem import Hive
from beem.blockchain import Blockchain
import beem.instance
import os
import json
import markdown
from bs4 import BeautifulSoup
import re
from langdetect import detect_langs, LangDetectException as lang_e

# Instantiate Hive
HIVE_API_NODE = "https://api.deathwing.me"
HIVE = Hive(node=[HIVE_API_NODE])

beem.instance.set_shared_blockchain_instance(HIVE)


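def get_block_number():
    """Return the last processed block number, or None if none is saved."""

    if not os.path.exists("last_block.txt"):
        return None

    with open("last_block.txt", "r") as infile:
        block_num = infile.read().strip()

    # An empty or corrupted file is treated as "no saved block"
    return int(block_num) if block_num.isdigit() else None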


def set_block_number(block_num):

    with open("last_block.txt", "w") as outfile:
        outfile.write(f"{block_num}")


def convert_and_count_words(md_text):
    # Convert text from markdown to HTML
    html = markdown.markdown(md_text)

    # Get text
    soup = BeautifulSoup(html, "html.parser")
    text = soup.get_text()

    # Count text words
    words = re.findall(r"\b\w+\b", text)
    return len(words)


def text_language(text):
    # Detect languages
    try:
        languages = detect_langs(text)
    except lang_e:
        return False, 0

    # Count languages
    num_languages = len(languages)

    # Sort languages from more to less probable
    languages_sorted = sorted(languages, key=lambda x: x.prob, reverse=True)

    # Check most probable languages (up to 2); slicing is safe on shorter lists
    top_languages = languages_sorted[:2]

    # Check if target language is among the top languages
    contains_target_lang = any(lang.lang == "it" for lang in top_languages)

    # Return True/False and number of languages detected
    return contains_target_lang, num_languages


def hive_comments_stream():

    blockchain = Blockchain(node=[HIVE_API_NODE])

    start_block = get_block_number()

    for op in blockchain.stream(
        opNames=["comment"], start=start_block, threading=False, thread_num=1
    ):
        set_block_number(op["block_num"])

        # Skip comments (only root posts have an empty "parent_author")
        if op.get("parent_author") != "":
            continue

        # Deserialize 'json_metadata' (it may be missing, empty or malformed)
        try:
            json_metadata = json.loads(op.get("json_metadata") or "{}")
        except (TypeError, ValueError):
            continue

        # Check if the metadata is a dict containing the key "tags"
        if not isinstance(json_metadata, dict) or "tags" not in json_metadata:
            continue

        # Check if there's the tag we are looking for
        if "ita" not in (json_metadata["tags"] or []):
            continue

        post_body = op.get("body") or ""

        # Check post language
        is_valid_language, languages_num = text_language(post_body)

        if not is_valid_language:
            continue

        # Check post length: 500 words, or 1000 if more than one language
        word_count = convert_and_count_words(post_body)
        min_words = 500 if languages_num == 1 else 1000

        if word_count < min_words:
            print("Post is too short")
            continue

        # data of the post
        post_author = op["author"]
        post_permlink = op["permlink"]
        post_url = f"https://peakd.com/@{post_author}/{post_permlink}"
        terminal_message = (
            f"Found eligible post: {post_url} in block {op['block_num']}"
        )
        print(terminal_message)

        with open("urls", "a", encoding="utf-8") as file:
            file.write(post_url + "\n")


if __name__ == "__main__":

    hive_comments_stream()
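
To see what the word-counting regex in `convert_and_count_words` actually counts, here is a small standard-library sketch. It skips the Markdown-to-HTML step and uses `html.parser` in place of BeautifulSoup, so it only approximates the script's behaviour; `TextExtractor` and `count_words` are illustrative names.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def count_words(html):
    """Count the words in the visible text of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Same pattern as the script: runs of word characters between boundaries
    return len(re.findall(r"\b\w+\b", text))


print(count_words("<p>Perché <b>no</b>?</p>"))  # 2
print(count_words("<p>l'idea è buona</p>"))     # 4
```

Accented letters are treated as word characters, while an apostrophe splits a word in two ("l'idea" counts as two words), which slightly inflates the count for Italian text.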



This time there are no templates or configuration files.
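
One possible refinement, offered as a suggestion and not as part of the script above: editing a post broadcasts another `comment` operation with the same author and permlink, so the same URL could end up in the `urls` file more than once. A minimal sketch of a guard, where `append_unique` is a hypothetical helper:

```python
import os
import tempfile


def append_unique(path, url):
    """Append url to the file at path unless it is already listed.

    Returns True if the URL was written, False if it was a duplicate.
    """
    existing = set()
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as infile:
            existing = {line.strip() for line in infile}
    if url in existing:
        return False
    with open(path, "a", encoding="utf-8") as outfile:
        outfile.write(url + "\n")
    return True


# Demo on a temporary file so that no real list is touched
demo_path = os.path.join(tempfile.mkdtemp(), "urls")
print(append_unique(demo_path, "https://peakd.com/@alice/my-post"))  # True
print(append_unique(demo_path, "https://peakd.com/@alice/my-post"))  # False
```

Re-reading the file on every call is fine for a short list; for a long-running stream, keeping the set in memory would avoid the repeated reads.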

Like last time I would be very happy to receive suggestions and advice on how to make the code even more efficient and correct :)


images property of their respective owners

to support the #OliodiBalena community, @balaenoptera is 3% beneficiary of this post


If you've read this far, thank you! If you want to leave an upvote, a reblog, a follow, a comment... well, any sign of life is very much appreciated!




Great ideas, and I'm following this adventure of yours in creating useful scripts with great interest
!discovery 50
!PIMP
!hiqvote
@tipu curate 2

It will probably all just amount to a bit of practice and something new learned, but you never know, something more useful and concrete might come out of it one day :)

Thanks a lot for all the support and the mega-upvotes!

!PIZZA !LOL !LUV


Ohhh
This is quite useful...
I'll check the previous post...

Learned a bit about Python in a compulsory course in school, although the language didn't stick😂😂😂😭

I'm just scratching the surface, but the possibilities seem so many that I couldn't help but attempt to write something :)

I'm trying to learn it on my own, but I can confirm that a lot of exercise is required not to forget what one has learned in the previous days/weeks/months !LOL I had to stop for a few months and I almost forgot everything 😂


Beautiful Soup, interesting.., I haven't tried scraping as yet. You can get a post count using BEEM, I can dig it out of my script if you like. BEEM is also deprecated and I am going to re-write my BOT soon, using alternative code.

I was looking for a way to build a word counter of my own and Beautiful Soup seemed like the way to go... I had no idea BEEM already had its own counter 😅 I just started using it and I still have to check a lot of stuff! :)

“BEEM is also deprecated”

Really!? What sad news, I was just starting to experiment with it... is there already an alternative around?

I started learning python roughly 2 months ago and I thought that doing something tangible on Hive could help me stay motivated and focused :)

Btw, many thanks for the support! This evening I already made some improvements to the code above:

  1. I added one more function that creates a new file every 24 hours, with each file having its creation date in its name
  2. I polished the code a bit and made it more readable
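
The new function isn't shown here, so the following is only a guess at what such a daily-file helper could look like. `daily_filename`, `save_url` and the `urls_YYYY-MM-DD.txt` naming scheme are assumptions made for illustration.

```python
from datetime import date


def daily_filename(day=None, prefix="urls"):
    """Build a file name such as 'urls_2024-05-17.txt' for the given day."""
    day = day or date.today()
    return f"{prefix}_{day.isoformat()}.txt"


def save_url(url, day=None):
    """Append url to the file of the given day (today by default)."""
    with open(daily_filename(day), "a", encoding="utf-8") as outfile:
        outfile.write(url + "\n")


print(daily_filename(date(2024, 5, 17)))  # urls_2024-05-17.txt
```

Because the name is derived from the current date on every write, a new file appears automatically after midnight without any timer or scheduler.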

Now I'm going to work on the other two scripts I'd like to write!

This will give you some idea of how to get the body content size using BEEM.

# Get approximate body size (in words) of the post
# (BEEMComment_post is assumed to be a beem Comment object for the post)
bodysize = BEEMComment_post.body.split(" ")
bodylen = len(bodysize)

BEEM was maintained by an ex-witness named @holger80 who vanished some time ago. His library remains but it's getting more outdated every day.

I am starting to look at the HIVE Condenser API, it's here:
https://developers.hive.io/apidefinitions/condenser-api.html
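
For anyone curious about what talking to the Condenser API without BEEM might look like, here is a minimal sketch using only the standard library. It builds a JSON-RPC request for `condenser_api.get_content`; the node URL and the helper names `build_get_content_request` and `fetch_post` are assumptions, and `fetch_post` is defined but not called because it needs network access.

```python
import json
import urllib.request

API_NODE = "https://api.hive.blog"  # assumption: any public Hive API node


def build_get_content_request(author, permlink, request_id=1):
    """Build the JSON-RPC payload for condenser_api.get_content."""
    return {
        "jsonrpc": "2.0",
        "method": "condenser_api.get_content",
        "params": [author, permlink],
        "id": request_id,
    }


def fetch_post(author, permlink, node=API_NODE):
    """Send the request and return the 'result' field of the response."""
    payload = json.dumps(build_get_content_request(author, permlink))
    request = urllib.request.Request(
        node,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))["result"]


# Only the payload is printed here; no network call is made
print(json.dumps(build_get_content_request("alice", "my-post")))
```

Calling `fetch_post("some-author", "some-permlink")` should return a dictionary describing the post, including its `body`, which could then be fed to a word counter.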

“I started learning python roughly 2 months ago and I thought that doing something tangible on Hive could help me stay motivated and focused :)”

Do it, it's very rewarding...

Thanks for all this info and for your snippet!

The Condenser API docs look full of new stuff to digest... grasping all of that is not going to be easy 😅

My dear friends, you are doing great. I got admission into software engineering and nowadays I am learning the basics, which is C. Your work motivates me very much. I love your work, keep it up and keep motivating us. Tell me, is it easy for me to learn coding and work on the Hive blockchain? Can I do it?

Nice idea, let's hope it works out,
!PIZZA
!BEER

Thanks :)

It will probably always remain just an idea, but as a form of practice I still aim to finish everything and create something that works... then who knows, maybe one day it will be useful for something :)

!LOL !PIZZA

It will surely be useful,
!PGM


I think this is a very, very interesting project, and I would also appreciate a comment from @libertycrypto27, with whom I had already discussed the topic.

A question from a non-expert: where does this script run? On a server? And how does it read the information on the blockchain in order to curate?
