Open Semantic Search

@OpenSemSearch

#Semantic #search engine (#opensource) to search & analyze document sets, archives & news (exploratory search, #textmining, #nlp, #annotation, #ocr, #ddj, #dh)

Pinned

Free #OpenSource research tools & tutorials for search, analysis, annotation, structure & textmining of large document collections, archives & leaks on your own laptop or server: opensemanticsearch.org #NICAR #NICAR22 #NICAR2022 #ddj #datajournalism #investigative #journalism

Some scientists & librarians had documents (e.g. a "Sammelband") or digitized books for which named entity recognition failed because of the NER library's default limit of one million characters. Thanks to github.com/wsldankers from Tilburg University for the new config option to extend the limit!
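A complementary way to cope with a per-document character limit is to split long text into pieces before running NER. The following is only a minimal sketch of that idea, not the config option mentioned above and not Open Semantic Search's implementation; the function name `split_into_chunks` and its parameters are hypothetical.

```python
def split_into_chunks(text: str, max_chars: int = 1_000_000) -> list[str]:
    """Split text into pieces of at most max_chars characters.

    Prefers to break after whitespace so words are not cut in half;
    falls back to a hard cut if a window contains no usable whitespace.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Find the last space or newline inside the current window.
            cut = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if cut > start:
                end = cut + 1  # keep the whitespace with the preceding chunk
        chunks.append(text[start:end])
        start = end
    return chunks
```

Each chunk can then be passed to the NER step separately and the results merged. Entities that span a chunk boundary may be missed, which is one reason raising the limit can be preferable when memory allows.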


Upgraded text extraction to new Apache Tika release 2.5.0: dist.apache.org/repos/dist/rel…


Open Semantic Search reposted

Want rendering in addition to extracted text and metadata? Please contribute to the design!

Open Semantic Search reposted

Imagine having 1000 PDFs and needing to find those with specific keywords. Here’s a tool many, many journalists need that someone could easily write and share. 🔍🗞
- Take PDFs as input
- Convert to text
- Search for keywords (UTF-8)
- Output result as CSV
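The four steps in the post above can be sketched roughly as follows. This is an illustration only, not how any existing tool works: the PDF-to-text step is left to a caller-supplied function (`extract_text` is a hypothetical parameter) because it needs a PDF or text extraction library, and the name `search_documents` and the CSV column names are assumptions as well.

```python
import csv
from typing import Callable, Iterable, TextIO


def search_documents(
    paths: Iterable[str],
    keywords: list[str],
    extract_text: Callable[[str], str],
    out: TextIO,
) -> None:
    """Write one CSV row per (file, keyword) pair that has at least one hit."""
    writer = csv.writer(out)
    writer.writerow(["file", "keyword", "count"])
    # casefold() gives case-insensitive matching that also works for non-ASCII text.
    needles = [keyword.casefold() for keyword in keywords]
    for path in paths:
        text = extract_text(path).casefold()
        for keyword, needle in zip(keywords, needles):
            count = text.count(needle)
            if count:
                writer.writerow([path, keyword, count])
```

Passing the extractor in as a function keeps the search logic independent of the extraction backend, so the same code works whether the text comes from a PDF library, an OCR step, or plain text files.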


Open Semantic Search reposted

Open Semantic Search does exactly this and more. A lot more. It's an amazing tool. github.com/opensemanticse…


Open Semantic Search reposted

Today @acka47 and @fsteeg gave a presentation, "Integrating external authority data sources for matching & enriching local data in #OpenRefine", at a colloquium of the Chair of Economic and Social History at @UniHalle. slides.lobid.org/2022-halle-rec… #reconciliation


Open Semantic Search reposted

One more short-notice event tip for those interested in #wikibase and #linkedData: "NFDI-InfraTalk: Wikibase and the challenges and possibilities of knowledge graphs for RDM in NFDI4Culture". Today, 7 March 2022, 4 pm, in the live stream at youtube.com/watch?v=RPMkuD… #nfdi


Graphs in the architecture documentation are now in #Mermaid format (mermaid-js.github.io/mermaid/) inside the documentation (Markdown) inside the GitHub repo, so everyone can edit them like other parts of the docs in github.com/opensemanticse… - Thanks to the #opensource #MkDocs plugin github.com/fralau/mkdocs-…

Working on more prioritization features for the task queue, to improve the chance of processing more relevant documents earlier. If you want to contribute to setting priorities for filename extensions or relevant keywords which often occur in filenames: github.com/opensemanticse… #ddj
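As a rough illustration of the idea (not the project's actual task queue code), prioritizing by filename extension and filename keywords could look like the sketch below. All weights, keywords, and names (`EXTENSION_PRIORITY`, `KEYWORD_BONUS`, `priority_for`, `processing_order`) are hypothetical.

```python
import heapq
from pathlib import PurePath

# Hypothetical weights for illustration; a lower score is processed earlier.
EXTENSION_PRIORITY = {".pdf": 0, ".docx": 0, ".eml": 1, ".jpg": 5}
KEYWORD_BONUS = {"contract": -2, "invoice": -1}
DEFAULT_PRIORITY = 3


def priority_for(filename: str) -> int:
    """Score a file by its extension, adjusted by keywords found in its name."""
    path = PurePath(filename)
    score = EXTENSION_PRIORITY.get(path.suffix.lower(), DEFAULT_PRIORITY)
    name = path.name.lower()
    for keyword, bonus in KEYWORD_BONUS.items():
        if keyword in name:
            score += bonus
    return score


def processing_order(filenames: list[str]) -> list[str]:
    """Return filenames sorted by priority; ties keep their input order."""
    heap = [(priority_for(name), index, name) for index, name in enumerate(filenames)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

In a real task queue the score would be attached to each queued task rather than used to sort a list up front, but the scoring function itself would stay the same.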


New #OpenSource release (beta!) of #Open #Semantic #Search Server for teams with many upgrades (Tika 2, Solr 8, spaCy NLP 3 & Flower task monitoring out of the box) available for download (package for Debian 11): opensemanticsearch.org #ddj #datajournalism #dh #digitalhumanities


New #OpenSource release (beta!) of #Open Semantic Desktop #Search VM with many upgrades (Debian 11 Bullseye, Apache Tika 2, Apache Solr 8, spaCy NLP 3 ...) available for download: opensemanticsearch.org/doc/desktop_se… #ddj #datajournalism #dh #digitalhumanities


Migrated build of the Open Semantic Desktop Search VM (Virtual Box appliance) to Ansible: github.com/opensemanticse…


Most #opensource contributors to #Open Semantic Search are not listed in the GitHub user interface as "Contributors" because our repo is structured with Git submodules (additional Git repos). Added a "Contributors" section to github.com/opensemanticse… (feel free to extend it). Thanks to all!


The next Open Source release of Open Semantic Search Server comes with automatic setup of Celery Flower (web user interface) for monitoring the document processing task queue (ETL) out of the box.


Working on a #spaCy NER plugin to run named-entity recognition (NER) with multiple different #MachineLearning models for the same document language (currently "only" one #ml model per language is configurable) to fill faceted search/interactive filters. #opensource #textmining #nlp #nlproc

