HTML Cleaner - Comment Section
I'm currently using your HTML Cleaner to strip the links from some pages so I can convert them into PDFs. Your tool is the best thing I've found for that; it's quick and clean. So thank you for making it available.
How difficult would it be for you to add an "ignore" function to the cleaner? I've noticed that the cleaner converts some standard tags, like <figcaption>, into <p> tags, and it does the same for the elements I wrote myself. For instance, I have one element, <whitley>, that styles the font size, spacing, and margins of a piece of text. The cleaner turns that into a <p>. Another, <authorpic>, floats an image to the right, sets margins around it, and gives it a border. That gets turned into a <p> as well.
But the cleaner does leave alone all the odd classes I wrote, like "brad1" and "brad2", which set the initial background colors on an image and then change the color on :hover.
Looking at the comment list on the cleaner site, I have to echo the universal thanks you get from users there, but it looks like you aren't developing it much anymore (the last comment is from 2016). So if you've moved on to other projects, let me just express my gratitude. But if you're considering adding some functionality, a place where users could list one or more tags to ignore would be great.
Again, thanks,
Joe
Thank you for the feedback.
I keep upgrading the cleaner on another domain: HTML6.com/editor. I keep this website online because people have been using it for over 10 years and have become accustomed to it.
The editor corrects invalid HTML tags by default. Disable the "Correct invalid tags" option in the Settings and it will let you define your own tags.
A <figcaption> should always be the child of a <figure>; that's why the cleaner replaces it with a <p>. The correct structure is:
<figure>
<img src="image.jpg" alt="alt">
<figcaption>Caption text</figcaption>
</figure>
What got me totally confused was all the cruft; your site was just what I was looking for!
Is there any chance you can help me run your site offline? I travel a lot with my laptop, so I'm often without an internet connection, and I might need to tweak another Word document soon.
Let me know if there's a "light" version or an offline manifest app that you've got working, so I can continue even when a connection is unavailable. All the best, thanks again!
Thanks
This is the most brilliant thing we have found in years. Thank you for making this tool available to us. Our team of developers and designers builds and revises online courses. Courses we inherit often have no CSS and absolutely ugly HTML littering the pages. To save time in the long run, we go through the entire course, converting every page to reference a CSS file and cleaning up all the HTML.
You have literally saved us hours just cleaning up one lesson of content. The time you will save us in a year will likely let us close 10-20% more revision projects, and close them up to 50% quicker. Thank you! Thank you! Thank you!
Hi there,
I was wondering if you could make available a version that I can put on my local web server. I am looking to create an application that works with your tool and lets me automate cleaning hundreds of pages.
This is a pretty awesome tool, my friend. Great job.
Thanks in advance
Hello, I like Html-Cleaner.
But with a big document (2 pages in Word, half of it a table) copied from Word, I get the error "Input is long". The HTML has 587 lines. Is it a bug or a limitation of your service?
Yes, there's a limitation to avoid freezing the web browser. The program doesn't allow you to clean the source if it contains more than 75,000 characters. In that case a blue window appears with the warning "Input too long". You can track the number of characters in real time at the bottom of the source editor.
Two pages of Word can be very long in HTML format, depending on the content, because Word adds a lot of extra inline styles and unnecessary code.
Try to clean it in two steps or send me the doc so I can have a look at it.
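If you need to clean a long document in two or more steps, one rough way to prepare the pieces is to cut the source at line breaks so that no piece exceeds the 75,000-character limit. Below is a minimal Python sketch. split_for_cleaner is a hypothetical helper, not part of the site, and it assumes that line breaks are acceptable places to cut. A cut can still fall inside a long table or element, so check each piece before cleaning it.

```python
LIMIT = 75_000  # character limit quoted for the online cleaner


def split_for_cleaner(source: str, limit: int = LIMIT) -> list[str]:
    """Split HTML source into chunks of at most `limit` characters,
    cutting at line breaks whenever possible."""
    chunks: list[str] = []
    current = ""
    for line in source.splitlines(keepends=True):
        # A single line longer than the limit has to be hard-cut.
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        # Start a new chunk if this line would push us over the limit.
        if len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

Paste each chunk into the cleaner separately and join the cleaned results in the same order.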
"Akademie múzických umění v Praze"
vs.
"Akademie múzických umění v Praze"
Where can I switch off the replacement of accented characters like "´"?
I use this tool very often and there's just one thing I'd like to suggest. Every time the program starts it checks some cleaning options by default and I always have to set them up the way I like it. I'd like to be able to save my options or set the program to remember my settings.
That's a good idea. The next major upgrade will use a cookie to remember your settings.
The <Remove tag attributes> option should not remove the src attribute of the images and the href attribute of the links. I think this would be essential because there's a separate option to remove images and links.
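The suggested behaviour (strip attributes, but keep src on images and href on links) can be sketched with Python's standard html.parser. This is an illustrative re-serializer, not the cleaner's own code. The KEEP table and the AttributeStripper class are hypothetical names, and comments and doctypes are simply dropped to keep the sketch short.

```python
from html import escape
from html.parser import HTMLParser

# Attributes to preserve per tag; every other attribute is dropped.
KEEP = {"img": {"src"}, "a": {"href"}}


class AttributeStripper(HTMLParser):
    def __init__(self) -> None:
        # Keep entity references in text exactly as written.
        super().__init__(convert_charrefs=False)
        self.out: list[str] = []

    def _emit_start(self, tag, attrs, self_closing=False):
        allowed = KEEP.get(tag, set())
        parts = [tag]
        for name, value in attrs:
            if name in allowed:
                # The parser decodes entities in values, so re-escape them.
                parts.append(f'{name}="{escape(value or "")}"')
        closer = " />" if self_closing else ">"
        self.out.append("<" + " ".join(parts) + closer)

    def handle_starttag(self, tag, attrs):
        self._emit_start(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def strip_attributes(html_source: str) -> str:
    parser = AttributeStripper()
    parser.feed(html_source)
    parser.close()
    return "".join(parser.out)
```

Extending KEEP (for example, adding alt to img for accessibility) is a one-line change.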
An automatic case converter would be very handy with the following options: Sentence case. | lower case | UPPER CASE | Capitalized Case
I always have to go to a different website to do this.
That was a great idea. I added it to a separate page: www.html-cleaner.com/case-converter. I hope you'll like it.
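If you need the same four conversions offline, they are easy to approximate in Python. This is a rough sketch, not the code behind the case-converter page. The sentence rule (capitalize the first letter of the text and the first letter after ".", "!" or "?" plus whitespace) is a simplifying assumption, so abbreviations like "e.g." will trip it up.

```python
import re


def sentence_case(text: str) -> str:
    # Lower-case everything, then capitalize the first letter of the text
    # and the first letter after each ., ! or ? followed by whitespace.
    lowered = text.lower()
    return re.sub(r"(^\s*|[.!?]\s+)(\w)",
                  lambda m: m.group(1) + m.group(2).upper(), lowered)


def capitalized_case(text: str) -> str:
    # Capitalize the first character of each whitespace-separated word.
    # Unlike str.title(), this leaves "don't" as "Don't", not "Don'T".
    return re.sub(r"\S+",
                  lambda m: m.group(0)[0].upper() + m.group(0)[1:].lower(),
                  text)


CONVERTERS = {
    "Sentence case.": sentence_case,
    "lower case": str.lower,
    "UPPER CASE": str.upper,
    "Capitalized Case": capitalized_case,
}
```

For example, CONVERTERS["Capitalized Case"]("don't STOP me now") gives "Don't Stop Me Now".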
I wish I could use the Twitter bootstrap css in the editor.
I'm glad to announce that the cleaner has been upgraded, implementing all of your suggestions! Go ahead and give it a try.
Hello! I just discovered HTML cleaner and I cannot tell you how glad I am that I did! It is saving me some serious time on a very big project.
I have noticed that when I run the cleaner, my img src attributes formatted like this: src="/images/foo.jpg" get reformatted to this: src="images/foo.jpg", without the leading forward slash. I'm not sure if I have something set incorrectly or if it's a bug. For now I am having to add them back in manually.
Thank you!
Katy
Thank you Katy for your remark. I have fixed the problem. Keep cleaning :)
1. I'd like to retain my styles almost 100% from my Word doc. The only things I'd like to remove are extra spaces between sections. Other than the Remove Successive Space option, are there any other options you would recommend before clicking the clean HTML option?
2. Is there an easy way to align/indent bulleted lists, and to align text so that when the text following a bullet is lengthy, the words wrap cleanly?
Great product. I've learned quite a bit in a matter of days already, and your UI is very intuitive.
Thanks, Mike
I want letters like "ä" and "ö" to stay as plain characters in the HTML code, not converted into HTML entities.
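If a page has already been cleaned with entities switched on, you can turn the references back into plain letters afterwards. The Python sketch below uses the standard html.unescape, but it only decodes references whose result is a single non-ASCII character. That way markup-significant ones such as &lt; and &amp; survive. decode_non_ascii_entities is a hypothetical helper. Keeping &nbsp; encoded is a design choice, because a literal no-break space is invisible in the source.

```python
import re
from html import unescape

# Named (&auml;), decimal (&#228;) and hex (&#xE4;) character references.
ENTITY = re.compile(r"&(#\d+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def decode_non_ascii_entities(html_source: str) -> str:
    """Replace character references with the literal character, but only
    when that character is non-ASCII (and not a no-break space)."""
    def repl(m: re.Match) -> str:
        char = unescape(m.group(0))  # unknown entities come back unchanged
        if len(char) == 1 and ord(char) > 127 and char != "\u00a0":
            return char
        return m.group(0)
    return ENTITY.sub(repl, html_source)
```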
thanks a lot!
The only update I would love is to the UI and formatting. I currently use your tool and DirtyMarkup, which work amazingly well together. But two in one? That would be insane!
Another issue I find is that sometimes, if the tables are more than 3 points to the right, it forgets to add closing tags and the output gets a bit jumbled, so I have to go over it every time I clean.
Thanks a lot!
Your program is amazing and almost solves my problem. The "almost" is because I need to paste Word text into Blackboard with target="_blank" added to all links. Unfortunately, Word for Mac does not let you change the default target of links.
Could this be an additional function on your website? Thank you in any case.
You can use a little trick with the find and replace tool like this:
Find: <a
Replace with: <a target='_blank'
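The plain find-and-replace works, but "<a" also matches the start of tags such as <abbr>, <address>, <article> and <aside>. It also adds a second target to links that already have one. If you post-process the cleaned HTML yourself, a slightly safer regular-expression version in Python might look like the sketch below. It is not a feature of the cleaner, and it assumes no ">" appears inside attribute values.

```python
import re

# "<a" only as a complete tag name, optionally followed by attributes.
ANCHOR_OPEN = re.compile(r"<a(\s[^>]*)?>", re.IGNORECASE)


def add_blank_target(html_source: str) -> str:
    def repl(m: re.Match) -> str:
        attrs = m.group(1) or ""
        if re.search(r"\btarget\s*=", attrs, re.IGNORECASE):
            return m.group(0)  # leave links that already set a target
        return f'<a target="_blank"{attrs}>'
    return ANCHOR_OPEN.sub(repl, html_source)
```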
I love this site; I started using it about a month ago. I work for a college, and ADA compliance is very important. I would donate or pay to help keep it running. I would be glad to tell more of my colleagues and associates about it.
Also... have to say BEST site of this kind!
This is a great resource for instructional designers who need an easy remote tool to ensure ADA compliance. I cannot tell you how much of a time saver HTML-Cleaner is!!
So glad I found this site. It works really well for removing those pesky mso tags
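Much of the mso markup Word produces sits inside inline style attributes as mso- declarations (for example mso-bidi-font-family). To illustrate what removing it involves, the Python sketch below drops every mso-* declaration from double-quoted style attributes and deletes any attribute left empty. It is not how the cleaner itself works. It also ignores the conditional comments and Mso* class names Word emits, and it splits naively on ";".

```python
import re

STYLE_ATTR = re.compile(r'\sstyle="([^"]*)"', re.IGNORECASE)


def strip_mso_styles(html_source: str) -> str:
    def clean(m: re.Match) -> str:
        declarations = [d.strip() for d in m.group(1).split(";")]
        kept = [d for d in declarations
                if d and not d.lower().startswith("mso-")]
        if not kept:
            return ""  # drop the now-empty style attribute entirely
        return ' style="' + "; ".join(kept) + '"'
    return STYLE_ATTR.sub(clean, html_source)
```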
Thank you so much for this cleaner. Were it not for this, I would have wasted a lot more time trying to get the same results. I can't find anything quite like it! It's so useful! It makes my life 100x easier, because I can get blog posts up a lot faster. Thank you so much for all your hard work.
Thank you sooo much for this :) It helped me a lot. But I see some pages get an automatic script like "document edited by html-cleaner". Does it happen randomly, or once every 10 uses or so? I'd like to estimate how many pages are affected.
Every fifth document is scripted if it's large enough, short documents are not affected. But you can leave those links to support our project :)
When I paste the Word doc into the editor, copy the HTML, and place it on the web, it looks good there. But when I send the doc in a test email, question marks appear all over the place, even outside the text, which makes the doc look unprofessional. Here is a copy of the emailed version after using the HTML from the site:
… [Content hidden by admin]
It's hard to tell from this but you can check the character set of your email provider or try if the Encode special characters cleaning option makes any difference.
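The reasoning behind trying the Encode special characters option goes like this. If a page contains only ASCII, with everything else written as character references, it displays the same whatever charset the email client assumes, so letters no longer turn into question marks. In Python you can get the same effect with the standard xmlcharrefreplace error handler. This sketch only illustrates the principle. The cleaner itself may emit named entities such as &uuml; rather than numeric ones, and to_ascii_html is a hypothetical helper.

```python
def to_ascii_html(html_source: str) -> str:
    # Characters outside ASCII become numeric references like &#252;;
    # markup and plain ASCII text pass through unchanged.
    return html_source.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ascii_html("<p>Grüße, 5 €</p>"))  # <p>Gr&#252;&#223;e, 5 &#8364;</p>
```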
Hello html-cleaner,
First off, let me thank you for this awesome website. Very cool! Is there a button to see the CSS code that goes with the clean HTML code? I can't find that option. Thanks
The visual editor is using this css, if this is your question: https://html-cleaner.com/tinymce/skins/lightgray/content.min.css
Hello, I want to ask about the Word cleaner converter. I have CSV files and need to convert them to HTML code so that I can insert it into my product description. Does this software support that?
Second, is there any way to see the output? I tried a sample, but I didn't see any output; it all appeared blank.
The CSV is a simple text file with comma-separated cells. It's not clear what you mean.
Try to open it with Excel and copy the whole content in the visual editor of the html-cleaner.com site and see the result.
Or send me your file so I can see what's this about.
Hi, when I load a Word doc that has text in bold and colored red, the bold is kept but the color is removed.
Is there a way to carry the color of a word in an MS Word doc over to the HTML as well?
It's OK, I got it working: save the Word doc as HTML via Word's 'Save As', then copy that HTML into the HTML side of the cleaner.
Carry on. Cheers
Is there a tool, app, converter, or chrome extension, etc. that would change the HTML formatting code to inline CSS code either for the whole page or for one string?
Otherwise I have to do it manually.
Right now I have <div><b><font color="#660000" size="4">My old HTML formated text</font></b></div>
I need to get the code <div style="color:#660000;font-size:large;font-weight:bold;">An inline CSS formatted text</div>
Please ignore the text change from "My old HTML formated text" to "An inline CSS formatted text"; the text should stay the same: My old HTML formated text
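No ready-made tool is named in the thread, but a small script can handle this one fixed pattern. The Python sketch below assumes every snippet has exactly the shape <div><b><font color="..." size="...">text</font></b></div>. The size table follows the usual mapping of legacy <font size> values 1 to 7 onto CSS keywords. Anything more varied would need a real HTML parser rather than a regular expression.

```python
import re

# Legacy <font size> values mapped to CSS font-size keywords.
FONT_SIZES = {"1": "x-small", "2": "small", "3": "medium", "4": "large",
              "5": "x-large", "6": "xx-large", "7": "xxx-large"}

PATTERN = re.compile(
    r'<div><b><font color="(?P<color>[^"]+)" size="(?P<size>[1-7])">'
    r'(?P<text>.*?)</font></b></div>',
    re.IGNORECASE | re.DOTALL,
)


def font_to_inline_css(html_source: str) -> str:
    def repl(m: re.Match) -> str:
        style = (f"color:{m['color']};"
                 f"font-size:{FONT_SIZES[m['size']]};"
                 "font-weight:bold;")
        # The text between the tags is carried over unchanged.
        return f'<div style="{style}">{m["text"]}</div>'
    return PATTERN.sub(repl, html_source)
```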
Hello! Could you tell me how to convert Word into HTML with only <b>, </b> and <br /> tags and nothing else?
We don't want to see <p></b></strong>. If you know, please write back to us. Many thanks, Tendak
Paste your Word doc in the visual editor, activate the 'Remove all tags' cleaning option, and clean it. Then copy the cleaned source and paste it in the visual editor. You will get the source code just like you want.
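If you would rather filter the markup yourself, the usual approach is a tag whitelist: keep <b> and <br />, and drop every other tag while keeping its text. Below is a minimal Python sketch using the standard html.parser. Treating <strong> as <b> is an assumption, based on Word often emitting <strong> for bold. The contents of <style> and <script> blocks are discarded.

```python
from html import escape
from html.parser import HTMLParser


class BoldBreakOnly(HTMLParser):
    """Keep only <b> (with <strong> mapped to <b>) and <br />; drop all
    other tags but keep their text."""

    BOLD = {"b", "strong"}
    SKIP = {"style", "script"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: text arrives decoded
        self.out: list[str] = []
        self.skipping = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skipping += 1
        elif tag in self.BOLD:
            self.out.append("<b>")
        elif tag == "br":
            self.out.append("<br />")

    def handle_startendtag(self, tag, attrs):
        if tag == "br":
            self.out.append("<br />")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skipping = max(0, self.skipping - 1)
        elif tag in self.BOLD:
            self.out.append("</b>")

    def handle_data(self, data):
        if not self.skipping:
            # Re-escape &, < and > so the output stays valid HTML.
            self.out.append(escape(data, quote=False))


def keep_bold_and_breaks(html_source: str) -> str:
    parser = BoldBreakOnly()
    parser.feed(html_source)
    parser.close()
    return "".join(parser.out)
```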
Use a custom CSS instead, with my last CSS example as a starting point, and set individual widths/heights for certain cells. It can be done but needs a little styling.
I'd like to suggest that you add an option to keep scripts intact in the table-to-div conversion. I'm seeing "<" changed to "&lt;" (less-than) even with that option off.
This is a section of code that includes VBScript for an ASP page. When it converts, it changes the VBScript calls, such as "<%=strCaller%>", to improper codes. This particular snippet of the page creates a series of form buttons for my home automation system. I was hoping to reformat this and other pages to divs with your tool.
In the meantime, I suggest replacing every <% and %> with "vbopening" and "vbclosing" strings and, when you have finished cleaning, reverting them to <% and %>.
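You can script the placeholder workaround so that no ASP block is missed. The Python sketch below is a variant of the vbopening/vbclosing suggestion. It stashes each whole <% ... %> block and swaps in a numbered placeholder word (vbblock0x, vbblock1x, ...), which also protects any < or > inside the block. It assumes those placeholder words never occur in your pages and that the cleaner leaves plain words untouched. Both function names are hypothetical.

```python
import re

ASP_BLOCK = re.compile(r"<%.*?%>", re.DOTALL)


def protect_asp(source: str) -> tuple[str, list[str]]:
    """Swap every <% ... %> block for a numbered placeholder word."""
    blocks: list[str] = []

    def stash(m: re.Match) -> str:
        blocks.append(m.group(0))
        # Trailing "x" stops a following digit from merging into the index.
        return f"vbblock{len(blocks) - 1}x"

    return ASP_BLOCK.sub(stash, source), blocks


def restore_asp(cleaned: str, blocks: list[str]) -> str:
    """Put the original ASP blocks back after cleaning."""
    return re.sub(r"vbblock(\d+)x",
                  lambda m: blocks[int(m.group(1))], cleaned)
```

Run protect_asp before pasting the page into the cleaner, keep the returned list, and call restore_asp on the cleaned output.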
<code removed by admin>
Can you explain exactly what the problem is? Where are you stuck?
I will definitely buy this.
Somehow Google picked up my website and conjoined it with someone else's. E-host says it's not able to correct the mistake and that it's a web engine's problem.
I would like to remove Google's mess through its HTML. Is that possible?
Also, can a wildcard character be used in the find and replace?
Update: this has been implemented
Please read through the previously answered questions before you address your issue.