

Collecting XML data into a CSV file, part 2: public emergency alert message data

In this post I fix the issue I mentioned in the previous post.

Each value in the msg column starts with a bracketed sender tag such as [동대문구청] (Dongdaemun-gu Office). I will pull that tag out into a separate column.

 

1. Source code

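import csv

f = open('SOSmsg.csv','r')
rdr = csv.reader(f)
list_a = []
for line in rdr:
    # line[3] is the msg column: drop the first '[' and split once on ']'
    a = line[3].replace('[', '', 1)
    a = a.split(']', 1)
    # keep the first three columns, then the sender tag and the message body
    list_a.append([line[0],line[1],line[2],a[0],a[-1]])
ft = open('SOSmsg_split.csv','w',newline='')
wr = csv.writer(ft)
wr.writerows(list_a)

# close both files so the output is flushed to disk
f.close()
ft.close()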

 

2. Source code explanation

 

f = open('SOSmsg.csv','r')

This opens the SOSmsg.csv file saved in the previous post in read mode.

 

line[3].replace('[', '', 1 )

line[3] is the msg column. replace swaps the [ for an empty string, and the third argument 1 limits this to a single replacement.

This removes only the leading [ and leaves any later [ in the message untouched.
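
As a small illustration of the count argument, the sketch below compares replace with and without the limit. The sample message is made up for this example and is not taken from the actual data.

```python
# Hypothetical sample message; real rows come from SOSmsg.csv.
msg = "[동대문구청] 안내 [문의] 보건소"

# count=1: only the first '[' is removed.
first_only = msg.replace('[', '', 1)

# No count: every '[' is removed, including the one inside the body.
all_removed = msg.replace('[', '')

print(first_only)
print(all_removed)
```

With the limit in place, the second bracket pair inside the message body stays intact.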

 

a = a.split( ']' , 1 )

The msg text is split on ].

A ] can also appear later in the message, so the split count is limited to 1 to keep the split from firing there.
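
The effect of the split limit can be sketched as follows. The sample strings are hypothetical. The last case shows what happens when a message has no ] at all: split returns a one-element list, so a[0] and a[-1] in the code above would both be the whole message.

```python
# Hypothetical message after the leading '[' has been removed.
msg = "동대문구청] 안내 [문의] 보건소"

# maxsplit=1: split only at the first ']' -> [sender, body]
limited = msg.split(']', 1)

# No limit: the body is also cut at the later ']'
unlimited = msg.split(']')

# A message without any ']' comes back as a single-element list.
no_tag = "안내 문자".split(']', 1)

print(limited)
print(unlimited)
print(no_tag)
```

If rows without a sender tag matter for the analysis, checking len(a) before appending would be one way to leave the sender column empty for them.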

 

ft = open('SOSmsg_split.csv','w',newline='') 

The result is saved to a new file, SOSmsg_split.csv.

 

 

3. Checking the result