python unicode unescape (html unescape)

annyoung

nopsled 2015. 11. 7. 15:12

Before we start

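This one cost me a huge amount of wasted effort, mostly because nothing I tried worked. Python's Unicode escape sequences start with \u, but the string taken from the HTML page used %u instead, so Python could not unescape it directly. Such a simple difference, and it still took me a very long time to figure out.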
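
To make the mismatch concrete, here is a small Python 3 sketch. The variable and function names are illustrative and not from the original post. It only shows that the unicode_escape codec understands \u but leaves %u alone.

```python
# Python 3 sketch; all names here are illustrative assumptions.
python_style = r"\uC548\uB155"   # Python-style escapes
html_style = "%uC548%uB155"      # %u-style escapes, as found in the HTML


def decode_escapes(text):
    """Decode Python-style unicode escapes in an ASCII-only string."""
    return text.encode("ascii").decode("unicode_escape")


print(decode_escapes(python_style))  # decoded to two Hangul syllables
print(decode_escapes(html_style))    # comes back unchanged
```

The second call returns its input untouched, which is why the %u prefix has to be rewritten before decoding.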

How to do it

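#!/usr/bin/python
# -*- coding: utf-8 -*-
# Python 2 only (uses reload, unicode() and the print statement).

import sys

# Python 2 hack: make UTF-8 the default codec so printing unicode does not fail.
reload(sys)
sys.setdefaultencoding('utf-8')

# Rewrite each %uXXXX as \uXXXX, then decode with the unicode-escape codec.
word = unicode('%uC548%uB155%uD558%uC138%uC694%7E%21%20%uC774%uAC74%20%uBB38%uC790%uC5F4%uC785%uB2C8%uB2E4.%24_%24'.replace('%u','\\u'), 'unicode-escape')

print word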

Output: 안녕하세요%7E%21%20이건%20문자열입니다.%24_%24
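
The code above is Python 2. As a rough sketch, the same replace-and-decode trick might look like this in Python 3, where the unicode() builtin no longer exists. The names raw and word are illustrative, and the approach assumes the input is pure ASCII and contains no literal backslashes (those would also be interpreted as escapes).

```python
# Python 3 sketch of the replace-then-decode step (assumes ASCII-only input).
raw = ('%uC548%uB155%uD558%uC138%uC694%7E%21%20%uC774%uAC74%20'
       '%uBB38%uC790%uC5F4%uC785%uB2C8%uB2E4.%24_%24')

# Rewrite %uXXXX as \uXXXX, then run the bytes through the unicode_escape codec.
word = raw.replace('%u', '\\u').encode('ascii').decode('unicode_escape')

print(word)  # the %XX sequences are still percent-encoded at this point
```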

The leftover sequences such as %7E%21 are URL-encoded (percent-encoded) characters. They can be URL-decoded as follows.

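#!/usr/bin/python
# -*- coding: utf-8 -*-
# Python 2 only (uses reload, unicode(), urllib.unquote and the print statement).

import sys, urllib

# Python 2 hack: make UTF-8 the default codec so printing unicode does not fail.
reload(sys)
sys.setdefaultencoding('utf-8')

# Rewrite each %uXXXX as \uXXXX, then decode with the unicode-escape codec.
word = unicode('%uC548%uB155%uD558%uC138%uC694%7E%21%20%uC774%uAC74%20%uBB38%uC790%uC5F4%uC785%uB2C8%uB2E4.%24_%24'.replace('%u','\\u'), 'unicode-escape')

# urllib.unquote decodes the remaining %XX sequences.
print urllib.unquote(word)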

Output: 안녕하세요~! 이건 문자열입니다.$_$
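
For Python 3, both steps could be combined in one helper. This is only a sketch under stated assumptions: the function name js_unescape and the regular expression are my own, not from the original post. Instead of the unicode_escape codec it converts each %uXXXX with a regular expression, which avoids the problem of literal backslashes in the input being interpreted.

```python
import re
from urllib.parse import unquote

# Matches %u followed by exactly four hex digits (hypothetical helper pattern).
_PERCENT_U = re.compile(r'%u([0-9A-Fa-f]{4})')


def js_unescape(text):
    """Decode %uXXXX sequences first, then ordinary %XX percent-encoding."""
    # Step 1: turn each %uXXXX into the character with that code point.
    step1 = _PERCENT_U.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Step 2: unquote() treats %XX as UTF-8 bytes by default; input that
    # encodes %XX as Latin-1 may need unquote(step1, encoding='latin-1').
    return unquote(step1)


raw = ('%uC548%uB155%uD558%uC138%uC694%7E%21%20%uC774%uAC74%20'
       '%uBB38%uC790%uC5F4%uC785%uB2C8%uB2E4.%24_%24')
print(js_unescape(raw))
```

One limitation of this sketch: characters outside the Basic Multilingual Plane, which arrive as two %u surrogate halves, are not recombined into a single character.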

And that is all there is to it.

License: Attribution, NonCommercial