Replies: 5 comments 8 replies
-
I'd agree with @Aspie96 here.
If you ever consider releasing the training configuration, it would help others and allow iterating further. The final decision is yours, but I believe the details would allow more TTS engines to be built at a later date. And it would be great for people with visual impairments.
While you're right, this is not the only TTS system, and broader change will come from people getting more acquainted with TTS. There may be some harm, but there is going to be a lot of improvement as well. EleutherAI previously weighed similar considerations:
-
Hey there, thank you for starting this discussion. It felt awkward for me to start such a thread myself. In general, I'd like to hear more feedback on this topic.

I appreciate the sentiment from Eleuther that was expressed above: I feel more or less the same way. However, I will state that the potential for harm that fine-tuning Tortoise represents far surpasses the potential harm that could come from any of Eleuther's projects that I have seen. It is a deep-fake-producing machine. It will fool anyone who does not know this exists, and that is a few billion people. Here is an example that I put together and have been showing people lately: kennedy_moon.mp4

I like to ask them, "what is wrong with this audio?" Older folks pick up pretty quickly that it was Reagan who made this speech, but no one immediately says "that's not Kennedy".

I'd also like to ask for input on: what would a release of training configs actually look like? I would be quite comfortable dropping the DLAS configs I used to train these models, but they would be missing the dataset. I would also never recommend that someone attempt to use and understand my DLAS trainer. I built it for myself, I modify and break it all the time while doing experiments, and I will not commit to maintaining it. I simply don't have the time as an employed solo developer. My "side time" is spent doing research; I'm not really interested in maintaining repos. :)

The majority of the effort I spent building this went into collecting, curating, cleaning, and transcribing my dataset, and I cannot legally release that. As a result, I'm pretty skeptical that releasing my training details, which amount to "copy and paste the DALL-E + Improved Diffusion papers", would be meaningful.

With all of this being said, I am happy to discuss training details with anyone. I do intend to put together a whitepaper with details like hyperparameters, methodology, order of model training, etc. I just haven't finished it yet.

I do appreciate that this is a bit backwards from the way most ML research is done. :)
-
Got it, some records are limited in that regard, especially historical ones.
I think this is actually fine: someone may donate more voices, and things may improve at that point. You mention that DLAS may break, but IIRC, JAX broke at one point, blocking the use of GPT-J, and GPT-J survived that.
-
I would say another point is that someone will release an open-source model soon; it may not be you, but it will happen. The sooner you do it, the sooner you enable people like us who want to do good and create personal projects for education, entertainment, or enablement. The more people who get a grasp of the tech, the less fear and coercion we will have in a future where the tech would otherwise be used only in shadowy internet corners.
-
I agree with the sentiment shared by the rest here. Tortoise is a very promising solution that is currently blocked from being better. Additionally, with new advancements like VALL-E, it has been or soon will be surpassed regardless. With the amount of information you have already shared, any small company of devs could already figure out a way to fine-tune Tortoise. So, in essence, the only people benefiting from your solution right now are people who wish to monetize the tech, which is fine, but look at the recent Stable Diffusion release and how it changed the landscape for AI image generation. By sharing it with the public, many techniques were invented and amazing new use cases were formed. When SD came out, MJ was the main model around, and while that's still true, it didn't allow for things like init images, img2img, or even certain words like "guts" (Berserk fans sadge). SD opened the industry up to change and probably changed many industries forever, for the better in most cases, even including MJ.

The work you put in here is very inspiring, and it's a shame to see it repurposed by folks like Elevenlabs.ai. It's the same story we had with Disco Diffusion and MJ, where a small group of devs benefits greatly from the ignorant masses.

On a personal note, anything can be put to bad use. We can't really stop bad actors from doing anything, and we're approaching a reality in which imagery, audio, and text will be easily generated by the common person. That will require a general change in the way we consume information, and while it's alarming and the future is uncertain, the bad actors' actions aren't on the creators of these models; they're on the bad actors. This reality is coming regardless.

Personally, as an ML hobbyist, I would love to learn more through your experience and code.
-
First, thank you for releasing TorToiSe!
The results really are remarkable and impressive. You did a fantastic job.
Your readme says:
Since there is no discussion yet on the topic, I hope you don't mind me creating one.
I am not a contributor and, of course, you are the final authority on this.
My suggestion is to release these details, which could indeed be quite helpful and informative, for at least two reasons: