-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reddit #997
base: master
Are you sure you want to change the base?
Conversation
@@ -25,6 +25,7 @@ ftest/*.csv | |||
*.sqlar | |||
*-wal | |||
*-shm | |||
*.csv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Porque?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c'était pour faire les tests de mes fonctions, histoire d'éviter de push des fichiers de données inutiles
minet/cli/reddit/__init__.py
Outdated
""", | ||
variadic_input={ | ||
"dummy_column": "subreddit", | ||
"item_label": "subreddit url, subreddit shortcode or subreddit id", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
subreddit url, shortcode or id
ce sera probablement un peu moins long dans l'aide
minet/cli/reddit/__init__.py
Outdated
}, | ||
arguments=[ | ||
{ | ||
"flags": ["-n", "--number"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D'habitude cet argument s'appelle -l, --limit
dans minet. Et on ne le met en général que quand c'est pas trivial à faire avec un xan slice
je crois.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Là c'est pas le maximum qu'il doit récupérer c'est le nombre de résultats à récupérer. Tu préfères que je change ça et que je mette une limite ?
minet/cli/reddit/__init__.py
Outdated
""", | ||
variadic_input={ | ||
"dummy_column": "post", | ||
"item_label": "post url, post shortcode or post id", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idem
minet/cli/reddit/__init__.py
Outdated
) | ||
|
||
REDDIT_USER_POSTS_SUBCOMMAND = command( | ||
"user_posts", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Les noms de commandes sont en kebab case, pas snake case :)
Il faudra changer l'exemple aussi.
minet/reddit/scraper.py
Outdated
|
||
|
||
def extract_t1_ids(text): | ||
pattern = r"t1_(\w+)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compile cette regex en dehors de la fonction. Sinon dans ce cas là est-ce qu'un split fait pas le taf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ça marcherait pas un split, c'est une chaine qui contient plus d'infos mais où je récupère juste ceux de cette forme là
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
par contre je viens de remarquer que sur certaines pages les données sont pas stockées avec le même format...
je finis de faire les autres changements et je répare ça
minet/reddit/scraper.py
Outdated
error, | ||
): | ||
try_author = post.select_one("a[class*='author']") | ||
author = try_author.get_text() if try_author else "Deleted" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Est-il possible qu'il existe un user qui s'appelle "Deleted"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alors oui j'y avais pas pensé mais il y a un user qui s'appelle deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
du coup soit je trouve un autre moyen d'afficher les users supprimés soit je peux faire précéder les username de "u/" comme c'est souvent fait sur Reddit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ou pour les utilisateurs supprimés je peux mettre "[Deleted]", c'est indiqué comme ça sur le site
minet/reddit/scraper.py
Outdated
|
||
def get_childs_l500(self, url, list_comments, parent_id): | ||
_, soup, _ = reddit_request(url, self.pool_manager) | ||
comments = soup.select("div[class='commentarea']>div>div[class*='comment']") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[class='commentarea']
> .commentarea
ou alors tu veux spécifiquement qu'il n'y ait qu'une seule classe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
en fait je veux juste récupérer le premier élément dont la classe contient 'comment' qui va contenir le premier commentaire de la page (le principal vu qu'on est sur la page du commentaire) et tous ses fils imbriqués direct dans sa balise. si je fais pas comme ça je vais récupérer un élément qui contient le premier commentaire et tous ses fils, un autre qui contient son premier fils et tous ses fils, etc.
minet/cli/reddit/comments.py
Outdated
with loading_bar.step(url): | ||
try: | ||
if cli_args.all: | ||
comments = scraper.get_comments(url, True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comments = scraper.get_comments(url, cli_args.all)
sinon?
minet/cli/reddit/posts.py
Outdated
with loading_bar.step(url): | ||
try: | ||
if cli_args.number: | ||
if cli_args.text: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Même remarque
No description provided.