Introduction to and installation of PPT modules: reading PPT documents with Python

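The python-pptx module can create and modify PowerPoint (.pptx) files. It is not part of the Python standard library and must be installed separately. Official documentation: https://python-pptx.readthedocs.io/en/latest/

It can be installed directly with pip: pip install python-pptx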
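
After installing, it can help to confirm that the package is visible to the interpreter you are actually running. The following is a minimal sketch that relies only on the standard library's importlib.metadata; the helper name installed_version is made up for this illustration.

```python
from importlib import metadata


def installed_version(dist_name="python-pptx"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("python-pptx is not installed; run: pip install python-pptx")
else:
    print("python-pptx", version)
```

Note that the distribution is named python-pptx, while the name used in import statements is pptx.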

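PPT structure

Slide: a slide page; Shape: a shape on a slide; Paragraph: a paragraph inside a shape's text; Run: a run of text within a paragraph (the same concept as in Word).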

pptx: a presentation file created by Microsoft PowerPoint or another presentation program.

Reading a PPT document

prs.slides returns a list-like collection that contains every slide in the presentation.

from pptx import Presentation

# Open an existing file; '1.pptx' must be in the working directory.
prs = Presentation('1.pptx')
for slide in prs.slides:
    print(slide)
# Example output (object addresses will differ):
# <pptx.slide.Slide object at 0x000000000345C9F8>
# <pptx.slide.Slide object at 0x000000000345C778>
# <pptx.slide.Slide object at 0x000000000345C278>

Getting shapes with python-pptx

slide.shapes returns the shapes on a slide.

from pptx import Presentation

prs = Presentation('1.pptx')
for slide in prs.slides:
    # Every object placed on a slide (placeholder, picture, text box, ...) is a shape.
    for shape in slide.shapes:
        print(shape)

# Example output (object addresses will differ):
# <pptx.shapes.placeholder.SlidePlaceholder object at 0x000000000345B4C8>
# <pptx.shapes.placeholder.SlidePlaceholder object at 0x000000000345BE88>
# <pptx.shapes.placeholder.SlidePlaceholder object at 0x000000000345B388>
# <pptx.shapes.placeholder.SlidePlaceholder object at 0x000000000345B4C8>
# <pptx.shapes.placeholder.SlidePlaceholder object at 0x000000000345BFC8>
# <pptx.shapes.placeholder.PlaceholderPicture object at 0x000000000345BC88>

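Checking whether a shape contains text

shape.has_text_frame tells whether a shape has a text frame (and can therefore hold text); shape.text_frame returns that text frame.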

from pptx import Presentation

prs = Presentation('1.pptx')
for slide in prs.slides:
    for shape in slide.shapes:
        # Skip shapes that cannot hold text, such as pictures.
        if shape.has_text_frame:
            text_frame = shape.text_frame
            print(text_frame)

# Example output (object addresses will differ):
# <pptx.text.text.TextFrame object at 0x000000000345BBC8>
# <pptx.text.text.TextFrame object at 0x000000000345BF08>
# <pptx.text.text.TextFrame object at 0x000000000345BBC8>
# <pptx.text.text.TextFrame object at 0x000000000345B2C8>
# <pptx.text.text.TextFrame object at 0x000000000345BBC8>

Finding paragraphs in a shape

text_frame.paragraphs returns the paragraphs in a shape's text frame.

from pptx import Presentation

prs = Presentation('1.pptx')
for slide in prs.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            text_frame = shape.text_frame
            # A text frame holds one or more paragraphs.
            for paragraph in text_frame.paragraphs:
                print(paragraph)

# Example output (object addresses will differ):
# <pptx.text.text._Paragraph object at 0x000000000345B388>
# <pptx.text.text._Paragraph object at 0x000000000345BFC8>
# <pptx.text.text._Paragraph object at 0x0000000003460208>
# <pptx.text.text._Paragraph object at 0x0000000003460308>
# <pptx.text.text._Paragraph object at 0x00000000034602C8>
# <pptx.text.text._Paragraph object at 0x0000000003460348>
# <pptx.text.text._Paragraph object at 0x00000000034605C8>
# <pptx.text.text._Paragraph object at 0x000000000345BFC8>