# Project

# Preliminary results

These are the topics fitted to 20,000 documents from the quant-ph section of the arXiv. This data set contains nearly 80% of the world's knowledge about quantum information theory.

The LDA model was estimated with a Gibbs sampler run for 400 iterations, with the number of topics set to 30. The algorithm automatically discovers the 30 topics, t1 to t30, that best describe the document collection.
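
To make the estimation step concrete, here is a minimal sketch of a collapsed Gibbs sampler for LDA in plain Python. It is not the code used for the results on this page. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the `seed` argument are assumptions chosen for illustration; only the 400 iterations and the 30 topics come from the run described above.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, n_iter=400,
              alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a list of integer word ids
          in the range [0, vocab_size).
    alpha, beta: symmetric Dirichlet hyperparameters (assumed values).
    Returns (doc_topic_counts, topic_word_counts).
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]               # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                                # total tokens per topic
    z = []                                              # topic assignment of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments),
                # known only up to a normalising constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                # Resample the topic and put the token back.
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_dk, n_kw
```

On a corpus the size of quant-ph, a pure-Python loop like this would be slow; the sketch is meant to show the update rule, and a call such as `lda_gibbs(docs, n_topics=30, vocab_size=V)` (with `V` the vocabulary size) mirrors the settings above.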

A topic is a probability distribution over words. For each topic, I have tried to pick the label that best describes its subject.
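
As a rough illustration of what "a probability distribution over words" means in practice, the sketch below turns the word counts of one topic into probabilities and ranks the vocabulary by them. The helper names `topic_word_distribution` and `top_words`, the smoothing value `beta`, and the toy counts in the usage note are all hypothetical; they are not taken from the actual run.

```python
def topic_word_distribution(word_counts, beta=0.01):
    """Normalise the word counts of one topic into probabilities.

    beta is an assumed smoothing constant, so no word gets probability zero.
    """
    total = sum(word_counts) + beta * len(word_counts)
    return [(c + beta) / total for c in word_counts]


def top_words(word_counts, vocab, n=12, beta=0.01):
    """Return the n most probable words of a topic, most probable first."""
    probs = topic_word_distribution(word_counts, beta)
    ranked = sorted(range(len(vocab)), key=lambda w: probs[w], reverse=True)
    return [vocab[w] for w in ranked[:n]]
```

For example, with the made-up vocabulary `["bob", "alice", "protocol", "qubit"]` and made-up counts `[40, 35, 20, 5]`, `top_words(counts, vocab, n=3)` ranks `bob`, `alice` and `protocol` ahead of `qubit`.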

Here are the top words in each topic:

[t1=general QM theory] set quantum space operator observables hilbert theory definition theorem defined observable projection …

[t2=algebraic constructs] group space representation algebra invariant dimensional point lie product vector form generator …

[t3=decoherence] quantum system decoherence equation environment dynamic classical initial evolution decay density phy …

[t5=linear algebra] operator matrix vector basis matrices set element unitary density product form diagonal … [t18] function term equation order exp result integral expression approximation method obtain limit … [t11] phase cos sin case angle parameter sin2 rotation phases geometric cos2 adiabatic …

[t6=quantum cryptography] bob alice protocol bit key communication information quantum eve channel teleportation pair …

[t17=quantum information] entanglement states entangled pure qubit local mixed separable maximally system operation bipartite … [t19] measurement states information probability quantum optimal fidelity result povm input cloning case … [t7] channel theorem map quantum entropy proof bound positive log lemma follow information … [t28] quantum classical information theory question game player physic point chapter strategy problem …

[t9=quantum algorithms] quantum problem graph probability algorithm bound random function theorem log proof set … [t12=quantum computation] quantum algorithm computation computer number bit step register problem unitary classical operation …

[t10=references] phy rev quantum lett quant 2002 2000 2001 2003 2004 1998 1999 … [t15] table 111 100 000 101 vol review 110 1994 york 123 105 … [t25] fig figure values line parameter numerical number result shown show small large …

[t13] states operator function coherent exp number gaussian distribution phy wigner mode relation … [t14] equation field quantum space theory particle mechanic classical relativistic motion momentum dirac … [t8] field force energy effect phy casimir surface frequency vacuum result mirror radiation …

[t16=error correcting codes] qubit error gate operation quantum gates code computation codes circuit single correction …

[t20=high energy physics] wave particle momentum function position energy particles packet probability scattering point potential … [t21=Schrödinger equation] potential solution equation energy function phy hamiltonian problem oscillator real dinger schr …

[t4=quantum optics] photon beam detector single polarization detection optical fig experiment mode signal splitter … [t24] noise field phase squeezing mode signal fluctuation quantum feedback beam input output …

[t22] atom field cavity atomic laser ion level transition frequency trap coupling optical … [t23] system hamiltonian interaction evolution case states level condition initial term consider operator … [t27] spin control pulse electron magnetic quantum coupling pulses dot field single nmr … [t29] spin model energy ground number system chain lattice hamiltonian interaction transition temperature …

[t26=bell inequalities] bell measurement correlation local experiment inequality quantum inequalities result particle spin test …

[t30=measurement theory] quantum theory mechanic physical measurement interpretation observer object physic event probability collapse …

# Final report

Report: arxiv-lda-gibbs-python-writeup.pdf

This is an excellent review paper that explains the concepts much better: http://www.arbylon.net/publications/text-est2.pdf

# Observations

* Wow, that was done in just a few hours on a laptop.

# Ideas

- autotagging to
- regex features
- subtopic classification using Kullback-Leibler divergence
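
For the last idea, a minimal sketch of how two topic (or document-topic) distributions could be compared is given below. The function names are hypothetical. Since the Kullback-Leibler divergence is not symmetric, and so not a true distance, a symmetrised variant is included as one possible choice, not as a settled design.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats.

    p and q are probability distributions of equal length.
    Assumes q[i] > 0 wherever p[i] > 0 (e.g. smoothed distributions).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Symmetrised divergence: the average of D(p || q) and D(q || p)."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
```

A candidate subtopic could then be assigned to the parent topic whose word distribution gives the smallest divergence.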