The mistake: wrong name for the train set and dev set for UP_Portuguese-Bosque #4

Open
scofield7419 opened this issue Aug 19, 2019 · 11 comments

@scofield7419

The train set and dev set files for UP_Portuguese-Bosque appear to have been given each other's names by mistake.
The two file names need to be swapped.

@arademaker arademaker self-assigned this Jan 29, 2020
arademaker added a commit that referenced this issue Feb 6, 2020
@arademaker
Member

This issue is solved, but many problems were found while merging the current UD data from the master branch of http://github.com/universaldependencies/UD_Portuguese-Bosque with the SRL annotations in https://github.com/System-T/UniversalPropositions/tree/master/UP_Portuguese-Bosque.

See https://github.com/System-T/UniversalPropositions/blob/issue-4/UP_Portuguese-Bosque/README.md#merging-up-and-ud-datasets

We will need to manually inspect some sentences and decide on the final format of the files in this repository. To keep the Portuguese data consistent with the remaining files, I will temporarily convert the new, valid CoNLL-U files to CoNLL files (one extra column for each predicate).

@arademaker
Member

@huaiyu-zhu comments are welcome.

@arademaker
Member

I found at least one sentence in the current set of PT sentences in the MASTER branch with a wrong sent_id. Fixed in 109816f

@yunyaoli
Contributor

@arademaker Can we close this issue? Thanks.

@alanakbik
Collaborator

Hello @arademaker, are you planning to merge the issue-4 branch? We just stumbled across the problems in the Portuguese UP data, and that branch seems to fix them.

@arademaker
Member

Yes, I need to finish this issue. I will need 1-2 weeks for that.

@yunyaoli
Contributor

yunyaoli commented Oct 5, 2020

@arademaker Any updates?

@arademaker
Member

Sorry, I will solve it this week. But we still need to reevaluate the plans for PR #7.

@andreabac3

UP

@arademaker
Member

Sorry, can you elaborate?

@andreabac3

andreabac3 commented Sep 28, 2021

@arademaker Sorry, I'm referring to the train and dev file naming problem. I only noticed it after producing the dataset statistics. With my comment ("UP"), I wanted to draw attention to the fact that the names are still reversed.

I take this opportunity to thank all of you for your work.

Kind regards,
Andrea
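
For anyone who wants to reproduce the kind of check mentioned above, here is a minimal sketch of counting sentences per split. It assumes only that sentences in a CoNLL-U or CoNLL file are blocks of non-blank lines separated by blank lines, and that a train split is normally larger than its dev split. The function names are hypothetical and not part of this repository.

```python
def count_sentences(conll_text: str) -> int:
    """Count sentences, i.e. blocks of non-blank lines separated by blank lines."""
    count = 0
    in_sentence = False
    for line in conll_text.splitlines():
        if line.strip():
            # First non-blank line after a gap starts a new sentence.
            if not in_sentence:
                count += 1
                in_sentence = True
        else:
            in_sentence = False
    return count


def splits_look_swapped(train_text: str, dev_text: str) -> bool:
    """Heuristic only: flag the pair when 'train' has fewer sentences than 'dev'."""
    return count_sentences(train_text) < count_sentences(dev_text)
```

To use it, read the contents of the two split files into strings and pass them to `splits_look_swapped`. A `True` result is only a hint that the names may be exchanged, not proof.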
