Replies: 5 comments 8 replies
-
I'd agree with @Aspie96 here.
If you ever consider releasing the training configuration, it would help others and allow iterating further. The final decision is yours, but I believe the details would allow more TTS engines to be built at a later date. And it would be great for people with visual impairments.
While you're right, this is not the only TTS system, and broader change will come from people getting more acquainted with TTS. There may be some harm, but there is going to be a lot of improvement as well. EleutherAI previously weighed similar considerations:
-
Hey there, thank you for starting this discussion. It felt awkward for me to start such a thread myself. In general, I'd like to hear more feedback on this topic.

I appreciate the sentiment from Eleuther that was expressed above: I feel more or less the same way. However, I will state that the potential for harm that fine-tuning Tortoise represents far surpasses the potential harm that could come from any of Eleuther's projects that I have seen. It is a deep-fake-producing machine. It will fool anyone who does not know this exists, and that is a few billion people. Here is an example that I put together and have been showing people lately: kennedy_moon.mp4

I like to ask them, "what is wrong with this audio?" Older folks pick up pretty quickly that it was Reagan who made this speech, but no one immediately says "that's not Kennedy".

I'd also like to ask for input on: what would a release of training configs actually look like? I would be quite comfortable dropping the DLAS configs I used to train these models, but they would be missing the dataset. I would also never recommend that someone attempt to use and understand my DLAS trainer. I built it for myself, I modify and break it all the time while doing experiments, and I will not commit to maintaining it. I simply don't have the time as an employed solo developer. My "side time" is spent doing research; I'm not really interested in maintaining repos. :)

The majority of the effort I spent building this went into collecting, curating, cleaning, and transcribing my dataset, and I cannot legally release that. As a result, I'm pretty skeptical that releasing my training details, which amount to "copy and paste the DALL-E + Improved Diffusion papers", would be meaningful.

With all of this being said, I am happy to discuss training details with anyone. I do intend to put together a whitepaper with details like hyperparameters, methodology, order of model training, etc. I just haven't finished it yet.

I do appreciate that this is a bit backwards from the way most ML research is done. :)
-
Got it, some records are limited in that regard, especially historical ones.
I think this is actually fine: someone may donate more voices, and things may improve at that point. You mention that DLAS may break, but IIRC, JAX broke at one point, blocking the use of GPT-J, and GPT-J survived that.
-
I would say another point is that someone will release an open-source model soon; it may not be you, but it will happen. The sooner you do it, the sooner you enable people like us who want to do good and create personal projects for education, entertainment, or enablement. The more people who get a grasp of the tech, the less fear and coercion we will have in a future where the tech would otherwise be used only in shadowy internet corners.
-
I agree with the sentiment shared by the rest here. Tortoise is a very promising solution that is currently blocked from being better. Additionally, with new advancements like VALL-E, it has been or soon will be surpassed regardless. With the amount of information you have already shared, any small company of devs could already figure out a way to fine-tune Tortoise. So, in essence, the only people benefiting from your solution right now are people who wish to monetize the tech, which is fine, but look at the recent Stable Diffusion release and how it changed the landscape for AI image generation. By sharing it with the public, many techniques were invented and amazing new use cases were formed. When SD came out, MJ was the main model around, and while that's still true, it didn't allow for things like init images, img2img, or even certain words like "guts" (Berserk fans sadge). SD opened the industry up to change and probably changed many industries forever, for the better in most cases, even including MJ.

The work you put in here is very inspiring, and it's a shame to see it repurposed by folks like Elevenlabs.ai. It's the same story we had with Disco Diffusion and MJ, where a small group of devs benefits greatly from the ignorant masses.

On a personal note, anything can be put to bad use. We can't really stop bad actors from doing anything, and we're approaching a reality in which imagery, audio, and text will be easily generated by the common person. That will require a general change in the way we consume information, and while it's alarming and the future is uncertain, the bad actors' actions aren't on the creators of these models; they're on the bad actors. This reality is coming regardless.

Personally, as an ML hobbyist, I would love to learn more through your experience and code.
-
First, thank you for releasing TorToiSe!
The results really are remarkable and impressive. You did a fantastic job.
Your readme says:
Since there is no discussion yet on the topic, I hope you don't mind me creating one.
I am not a contributor and, of course, you are the final authority on this.
My suggestion is to release these details, which could indeed be quite helpful and informative, for at least two reasons: